Magic Data Technology Co., Ltd.: Training Data for Machine Learning
Data, as the foundation of the AI industry, has grown in importance. However, raw data is heterogeneous and hard to use directly. As application scenarios and smart applications diversify, the need to process unstructured data has soared. To meet the market demand for structured data, Magic Data Technology has independently developed a structured big data RPA platform that uses AI technology to facilitate data labeling and quality inspection. The platform greatly improves the efficiency and quality of AI data labeling in this labor-intensive industry. Leveraging this platform, Magic Data provides its clients with well-structured and precise data. “We contribute to the industrialization of AI technology by producing high-quality data,” remarks Zhang Qingqing, Founder, CTO, and CEO of Magic Data Technology.
In the field of speech recognition, Magic Data has collected more than 100,000 hours of multilingual standard datasets, to which it holds the copyright, for AI modeling under various scenarios. Magic Data has also designed and built a benchmark-level recording studio with independent intellectual property rights. The studio's reverberation time is adjustable to meet the demanding acoustic requirements of clients in different industries and projects, which greatly improves the quality and reliability of the data and substantially reduces the cost of subsequent processing and cleaning.
Magic Data has a complete data processing system with a human-in-the-loop setup. Because big data is unstructured and heterogeneous and must be processed quickly, the company uses intelligent data sorting technology to classify and store big data, intelligent data labeling technology to accelerate the manual structuring process, and intelligent data quality inspection technology to improve efficiency and accuracy. “Through a whole-process intelligent algorithm, we try to optimize the production model of the supply terminal and actively contribute toward the ‘4.0 era’ of big data structural processing,” states Zhang.
Magic Data’s solution combines supervised learning, semi-supervised and unsupervised learning, transfer learning, self-adaptation, and other AI technologies to provide clients with a full-chain data service. At the same time, the company provides clients with structured and versatile standard datasets. By combining customized datasets with standard datasets, Magic Data greatly shortens training time, improves recognition efficiency, and reduces modeling cost.
Based on its assessment of industry trends and its insight into customer needs, Magic Data has established multilingual lexicons and a labeling system to provide clients with efficient, customized services. “Our experts are from diverse backgrounds, and their experience can ensure the credibility and efficiency,” says Zhang. Magic Data has recently been exploring a new solution: adding real-time audit functions to its crowdsourced data collection applications.
This function uses AI technology to audit online recordings in real time, reminding users to record their speech correctly and sparing them unnecessary do-overs, which improves data collection efficiency.
Zhang, the founder of Magic Data, is experienced in algorithms. She is skilled at formulating solutions tailored to clients’ needs and has put forward the concept of “Big Data Structural Processing 4.0”. She divides the core needs of clients into two categories: basic datasets for AI modeling and customized datasets. Currently, Magic Data has established multilingual standard datasets under different scenarios to provide clients with standardized data products and help them build their AI models promptly. Once the AI models are built, Magic Data helps clients make structured annotations that add value to their data and improve its usability. Recently, Daniel Povey, a former associate research professor at the Center for Language and Speech Processing at Johns Hopkins University, joined the company as a Principal Scientist Advisor. He is the lead developer of Kaldi, a well-known open-source speech recognition toolkit. He is also known for his contributions to speech recognition technology, including discriminative training, now known as sequence training.
One implementation example comes from Xu Ran, a former R&D Director at Nuance Greater China and currently a Senior Director at Roobo Research Speech Technology. He says, “We used nearly 10,000 hours of MagicData's conversational speech data to update automatic speech recognition models for noisy speech and conversational speech. The final performance improvements were significant. For conversational speech recognition, the word error rate was reduced by 30 percent relative. At the same time, we were surprised that their data could also help us reduce the word error rate by 10 percent relative for noisy speech recognition. This shows that the spontaneity of their data can not only help models capture the naturalness of speech, but also enhance robustness against noise!”
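The "percent relative" figures in the quote are reductions measured against the baseline word error rate, not drops in absolute percentage points. The short Python sketch below illustrates the arithmetic. The 20 percent baseline is a hypothetical value chosen only for illustration, since the article does not report baseline error rates, and the function names are likewise illustrative.

```python
def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


def wer_after_relative_reduction(baseline_wer: float, relative_reduction: float) -> float:
    """Return the WER that results from applying a relative reduction."""
    return baseline_wer * (1.0 - relative_reduction)


# Hypothetical baseline WER of 20% (an assumption, not a figure from the article).
baseline = 0.20

# A 30 percent relative reduction, as quoted for conversational speech.
conversational = wer_after_relative_reduction(baseline, 0.30)

# A 10 percent relative reduction, as quoted for noisy speech.
noisy = wer_after_relative_reduction(baseline, 0.10)

print(f"conversational: {conversational:.2%}, noisy: {noisy:.2%}")
```

Under this assumed baseline, a 30 percent relative reduction moves the error rate from 20 percent to 14 percent, a drop of six absolute points rather than thirty.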
Magic Data is developing a training system for data annotators, formulating data standards, increasing customer loyalty, and boosting the development of the industry. Delivered as a SaaS product, Magic Data’s human-in-the-loop platform for structural data processing will empower clients who want to build AI models efficiently and promptly. In these ways, Magic Data aims to satisfy users’ needs and deliver more successful cases.
“With the growth of the big data industry, we believe that the quality of data annotation is becoming more and more important, and it is extremely urgent to establish industrial and national standards,” says Zhang. In the future, Magic Data will also contribute to the development of data structuring. The company will tap into more application scenarios with high standards so as to provide data services for clients in various fields.