Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft


The project’s leader says that allowing everyone to access the collection of public-domain books will help “level the playing field” in the AI industry.

Harvard University announced Thursday it’s releasing a high-quality dataset of nearly one million public-domain books that could be used by anyone to train large language models and other AI tools. The dataset was created by Harvard’s newly formed Institutional Data Initiative with funding from both Microsoft and OpenAI. It contains books scanned as part of the Google Books project that are no longer protected by copyright.


Copyright of this story belongs solely to www.wired.com.