Thousands of free audiobooks: Project Gutenberg and Microsoft cooperate
The open-access book collection Project Gutenberg and Microsoft are collaborating to offer approximately 5,000 audiobooks for free. To do this, they use an AI model that is meant to narrate thousands of written books realistically. In the future, the project operators want to make the audiobooks created this way available on platforms such as Spotify, Apple Podcasts, Google Podcasts and the Internet Archive.
On first listening, the English-language reading of Wuthering Heights sounds very natural for an artificial intelligence, mainly thanks to its comparatively natural prosody. Listen closely, however, and you will notice that the name Heathcliff is mispronounced. The researchers behind the recently published study on the project, “Large-Scale Intelligent Microservices,” say that some audiobooks “contain errors, strange pronunciations, offensive language, or content” that is not suitable for all audiences. Anyone who finds problems with the recordings can report them.
Project Gutenberg holds over 70,000 digitized books in English, whose content is often used for automated text analysis or for training large language models (LLMs). Project Gutenberg already offers audiobooks, but these are read aloud in a robot-like voice. With the new system, developed by researchers from Microsoft, the Massachusetts Institute of Technology (MIT), Project Gutenberg and Google, users will in future also be able to customize an audiobook’s speech output. For example, the speed, style and emotional intonation are meant to be adjustable. A user’s own voice, or any desired voice, can also be used on the basis of a short audio sample.
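How such a speed setting might be passed to a speech engine can be sketched with SSML, the W3C Speech Synthesis Markup Language. This is only an illustration: the article does not say which markup or API the project uses, and the function name build_ssml is hypothetical. The sketch covers speaking rate only; style and emotion are usually controlled through vendor-specific extensions that are not shown here.

```python
import xml.etree.ElementTree as ET

# Namespace of the W3C Speech Synthesis Markup Language (SSML).
SSML_NS = "http://www.w3.org/2001/10/synthesis"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Rate labels defined by SSML for the <prosody> element.
RATE_LABELS = {"x-slow", "slow", "medium", "fast", "x-fast", "default"}


def build_ssml(text, rate="medium", lang="en-US"):
    """Wrap text in an SSML document that requests a speaking rate.

    Hypothetical helper for illustration; not the project's actual code.
    """
    if rate not in RATE_LABELS:
        raise ValueError(f"unsupported rate label: {rate!r}")
    # Serialize SSML elements without a namespace prefix.
    ET.register_namespace("", SSML_NS)
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    prosody = ET.SubElement(speak, f"{{{SSML_NS}}}prosody", {"rate": rate})
    prosody.text = text  # special characters such as & are escaped on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("I have just returned from a visit to my landlord.", rate="slow"))
```

Building the document with an XML library rather than string formatting keeps the markup well-formed even when the book text contains characters such as & or <.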
SynapseML as a basis
In their paper (PDF), the researchers present a scalable system that converts HTML-based e-books into audiobooks. To do this, Microsoft uses the open-source library Synapse Machine Learning (SynapseML), which is available on GitHub under the MIT license. SynapseML can “train and evaluate models (…) with one or more nodes (…) in scalable size.” It works with various databases and cloud storage services. Developers are meant to be able to combine different machine learning frameworks, for example in supervised learning. In addition, SynapseML can be used from several programming languages, including Python, R, Scala and Java.
What should be read aloud?
While text-to-speech conversion is already well advanced, according to the researchers, less research has gone into which text in an e-book the artificial voice should read out. The researchers therefore focus primarily on cleaning up the texts and on determining the point in the text at which automatic speech generation should start. LSTM (long short-term memory) networks are used here. The team around Markus Weimer wants to ensure that, for example, footnotes, page numbers, tables, figures and tables of contents are not read out.
The research team created the Document Object Model (DOM) of the e-books partly automatically and partly manually in order to categorize the HTML files. This allowed them to create rules to structure the texts automatically. The parsed text could then be passed on to text-to-speech algorithms.
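A rule-based cleanup of this kind can be sketched with Python's standard HTML parser. This is a minimal illustration, not the researchers' implementation: the tag names and class names in the rule sets (footnote, pagenum, toc) are assumptions about how an e-book might mark such elements, and the LSTM-based detection of the starting point is not covered.

```python
from html.parser import HTMLParser

# Hypothetical rules: elements whose content should not be narrated.
SKIP_TAGS = {"table", "figure", "nav", "script", "style"}
SKIP_CLASSES = {"footnote", "pagenum", "toc"}
# Elements without a closing tag; they never open a region.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}
# Elements that end a paragraph of narration.
BLOCK_TAGS = {"p", "div", "li", "blockquote", "h1", "h2", "h3", "h4", "h5", "h6"}


class ReadableTextExtractor(HTMLParser):
    """Collects text outside of skipped elements."""

    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, skipped) for every open element
        self.skip_depth = 0  # number of open skipped elements
        self.chunks = []     # text fragments and paragraph breaks

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = set((dict(attrs).get("class") or "").split())
        skipped = tag in SKIP_TAGS or bool(classes & SKIP_CLASSES)
        self.stack.append((tag, skipped))
        if skipped:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        # Ignore void elements and stray closing tags.
        if tag in VOID_TAGS or all(t != tag for t, _ in self.stack):
            return
        # Pop up to the matching element, tolerating unclosed children.
        while self.stack:
            open_tag, skipped = self.stack.pop()
            if skipped:
                self.skip_depth -= 1
            if open_tag == tag:
                break
        if tag in BLOCK_TAGS and self.skip_depth == 0:
            self.chunks.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def extract_readable_text(html):
    """Return the narratable paragraphs of an HTML document as a list."""
    parser = ReadableTextExtractor()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.chunks).split("\n")
    cleaned = (" ".join(line.split()) for line in lines)
    return [line for line in cleaned if line]
```

For a chapter whose page numbers sit in span elements with the class pagenum, extract_readable_text returns the heading and the paragraphs as separate strings with the page numbers removed. Each string could then be handed to a text-to-speech step.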
Making audiobooks accessible worldwide
Unlike on other platforms such as LibriVox, the content is generated automatically. LibriVox is run by a team of volunteers who record the audiobooks themselves, so its resources are limited. Project Gutenberg’s automatic conversion is designed to make literature more accessible to audiobook lovers around the world.