Although the term can be hackneyed and is sometimes misused, the AI market affects virtually every sector of activity: education, finance, logistics and healthcare, as well as very specific applications such as the discovery of new materials or drugs, and even deep-tech research like nuclear fusion. Here we will focus on the part dealing with communication between humans and technology, through speech and text recognition, but also on the emerging market for generative AIs and how they are trained.
1.a The AI market today.
For a few years now, AI in linguistics has been maturing and entering a go-to-market phase. Where its use was once reserved for companies looking to offer a new way of using a product, AI is now falling into the hands of users, benefiting from an acceleration of innovation and becoming the product itself. Large Language Models have entered mass adoption and are becoming increasingly general-purpose thanks to multi-modal agents and generative AI such as Stable Diffusion.
According to studies, the total AI market is expected to grow from around $200 billion today to roughly $2,000 billion by 2030, a 10-fold increase in less than a decade.
While voice assistants like Siri and connected speakers like Alexa are long-standing innovations, they have unfortunately remained closer to gadgets because of their limited capacity for action. Moreover, almost the only companies able to offer high-performance voice recognition were large technology firms such as the GAFAM, since they had the resources and user base needed to train their AIs. Other companies that tried, such as car manufacturers with GPS voice commands or voice chatbots for telephone assistance, often shipped products that were nearly unusable in practice during the previous decade.
But this could be about to change, not least with new types of AI such as Large Action Models, and with hardware and software layers built specifically for non-app-based interactions, like the recent Rabbit R1 or Humane's AI Pin. There is a clear trend towards changing the way we interact with technology in a more profound way, while freeing ourselves from screens, whose addictive pull is becoming increasingly problematic from a societal point of view.
Therefore, the voice recognition market is likely to take a new turn and could become much more present in our daily lives, for instance in the home automation sector, but also in AI assistants, which are increasingly popular as companions. Language support has also always been far stronger for English than for other major languages, and mass adoption will require catching up to better cover linguistic diversity, particularly in high-population emerging countries.
The data collection and labeling market for text, image and voice recognition is also expected to grow significantly, from around $1.2 billion in 2021 to over $7 billion by 2030 according to Verified Market Research, while other analyses see this market climbing past $17 billion by the end of the decade.
It is interesting to note that today's key companies in this market, such as Amazon Mechanical Turk, Playment, Scale AI and Labelbox, specialize in automatic or human annotation of existing data collections rather than in data production.
Most of the data used in AI training today is not created specifically for it; LLMs, for example, use data sourced almost exclusively from the Internet, which can reduce efficiency, degrade data quality and introduce strong biases if the data is not properly selected and cleaned. That is why many established companies specialize in data sorting, anomaly detection and making data readable for algorithms through labeling. There could therefore be growing interest in the large-scale production of high-quality data for more specific training and to cover the blind spots in existing datasets.
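As a concrete illustration, the sketch below shows, in plain Python, the kind of basic filtering and deduplication that web-scraped text typically goes through before training. The thresholds (min_words, max_symbol_ratio) are arbitrary assumptions chosen for the example, not any particular company's actual pipeline.

    import hashlib
    import re

    def clean_corpus(raw_texts, min_words=20, max_symbol_ratio=0.3):
        # Drops near-empty or markup-heavy documents and exact duplicates
        # from a list of scraped strings.
        seen, cleaned = set(), []
        for text in raw_texts:
            text = re.sub(r"\s+", " ", text).strip()  # normalize whitespace
            if len(text.split()) < min_words:         # too short to be useful
                continue
            symbols = sum(not c.isalnum() and not c.isspace() for c in text)
            if symbols / len(text) > max_symbol_ratio:  # likely HTML/encoding debris
                continue
            digest = hashlib.sha256(text.lower().encode()).hexdigest()
            if digest in seen:                        # exact duplicate
                continue
            seen.add(digest)
            cleaned.append(text)
        return cleaned

Real pipelines layer near-duplicate detection, language identification and toxicity filtering on top of such simple rules, which is precisely the labor-intensive work the companies above sell.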
2.b Micro-tasking and trustless collaborative data production and annotation.
The micro-tasking market is particularly active in AI data annotation, but also in everything to do with moderating social networks, and in all kinds of “botting” on the Internet. With its aim of dividing work horizontally by parceling it out into unskilled tasks, it is often seen as a neo-Taylorism adapted to the digital age and its needs.
In 2019, the micro-tasking market was worth $1.4 billion, and forecasts predicted very significant growth by 2025.
However, more recent forecasts are less enthusiastic, reducing that figure to only $3 billion by 2027. The main reason is that the market has run into several difficulties, notably in terms of quality, but also in terms of trust in the services offered by this type of company.
Without a game-theoretic mechanism, it is difficult for micro-tasking companies to ensure that their micro-workers' output is honest, and to prove to their customers that quality checks are carried out properly.
This is why blockchain can be a major asset for building large collaborative networks: it offers transparency as well as traceability of workers' tasks and checkers' proofs, while enabling economic mechanisms that keep all participants honest. Blockchain can also improve the ethics of this type of work by ensuring fair remuneration for workers without intermediaries.
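To illustrate what such an economic mechanism could look like, here is a deliberately simplified stake-and-slash verification game in Python. The Submission structure, the strict-majority rule and every name in it are illustrative assumptions for this document, not Ta-Da's actual on-chain protocol.

    from dataclasses import dataclass, field

    @dataclass
    class Submission:
        worker: str                    # worker's address (hypothetical)
        stake: float                   # tokens the worker bonds with the task
        reward: float                  # payment promised by the requester
        votes: dict = field(default_factory=dict)  # checker address -> approve?

    def record_vote(sub: Submission, checker: str, approve: bool) -> None:
        # On a real chain, each vote would be a signed, publicly auditable transaction.
        sub.votes[checker] = approve

    def settle(sub: Submission) -> float:
        # A strict majority of checkers must approve; otherwise the bonded
        # stake is slashed, making dishonest work unprofitable.
        approvals = sum(sub.votes.values())
        if approvals * 2 > len(sub.votes):
            return sub.stake + sub.reward   # honest work: stake back + reward
        return 0.0                          # rejected work: stake forfeited

    sub = Submission(worker="0xWorker", stake=10.0, reward=5.0)
    record_vote(sub, "0xCheckerA", True)
    record_vote(sub, "0xCheckerB", True)
    record_vote(sub, "0xCheckerC", False)
    assert settle(sub) == 15.0

The key design point is that a worker stands to lose more by cheating (the forfeited stake) than they can gain, while every vote and payout remains publicly traceable on-chain.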
Ta-Da's Role in Advancing AI Data Utilization
The advancement of AI hinges on several critical factors: access to extensive datasets, cutting-edge algorithms, enhanced computing power, and substantial AI research investment. These elements collectively propel AI forward, influencing everything from everyday technology to complex industrial systems. Central to these advancements is the need for high-quality, extensive data.
Data continues to be a significant barrier
Securing data for AI training is a complex and costly process, yet vital for algorithm accuracy. High-grade data is crucial but hard to source. AI systems need diverse data to be effective, yet capturing that diversity is a challenge, and the custom datasets required for specific AI needs come at a high price. The financial toll is heavy: data acquisition often consumes nearly 60% of an AI project's budget. Overcoming these barriers is critical for AI progress, demanding innovative solutions for economical, varied, and top-tier data collection.
Ta-Da aims to meet the growing demand for qualified data by becoming the first decentralized data collection platform on the blockchain. Its inception is more than an addition to existing data collection methods; it represents a new approach. It aims to provide cost-effective, high-quality data collection, with an emphasis on flexibility and transparency via blockchain technology. This strategy not only ensures fair compensation for data contributors but also guarantees thorough traceability in data handling.