A website offering free datasets for machine learning was launched on April 15. Its founder, Magic Data Tech, holds a leading position in the volume of conversational speech data and is the first Chinese company to release open-source datasets on an independent website, which may change the way users obtain data.

Large, diverse datasets are released on the site. The datasets are subdivided along multiple dimensions, giving AI engineers a more efficient way to find datasets for their various AI models and leaving them more energy for algorithm optimization.

Magic Data Tech welcomes all data producers or discoverers to join and release datasets on the site. Together, we can build a better open-source ecosystem. Please contact us if interested. The site has released more than 30 open-source datasets, including Mandarin Chinese, English, and Shanghai dialect (Wu Chinese) conversational speech, NLP text corpora, TTS corpora, and lexicons. Wherever possible, datasets are categorized by language, scene, and industry.

The company continues to release high-quality datasets and more content on the site, and it always appreciates comments, shares, or any other form of support. Together, let’s make a better place for inspiration and the spirit of sharing.

In 1969, Unix released source code on Unix Community, initiating the first “open-source act” in human history.

In 1991, the Linux kernel was released.

In 1998, Netscape Communications released the source code for its Communicator suite, which led to the coining of the term “open source.”

In 2005, the source-code management system Git was released, which later gave rise to hosted Git code repositories.

Since the concept of Artificial Intelligence (AI) was put forward at the Dartmouth Summer Research Project, the field has gone through countless ups and downs. The Internet, big data, cloud computing, 5G, and numerous other new technologies have emerged and played increasingly important roles.

AI has opened a new era, and open source has risen with it. Platforms for machine learning keep emerging. Generation after generation, developers contribute their intelligence to the evolution of AI in the spirit of openness, freedom, and cooperation. An increasing number of governments, NGOs, companies, academic institutions, and individuals release their image, text, and audio data to the public, forming platforms such as Kaggle, UCI, OpenML, ImageNet, and OpenSLR. Data has become a core driver of AI development.
