Africa is home to an estimated 1,500 to 3,000 languages. For decades, the digital world has ignored most of them. But a wave of AI development is beginning to close a gap that has left hundreds of millions of people on the margins of the internet age.
The Problem With Building on Silence
Training a language model requires large volumes of text. English benefits from an almost unlimited supply scraped from books, articles, and websites. Most African languages have no such reservoir. As Vukosi Marivate, a computer science professor at the University of Pretoria, noted, the number of Wikipedia articles for a language is a telling proxy for how much AI can learn about it. For many African languages, that number is close to zero.

Some languages barely have a written tradition at all. Researchers working on isiNdebele (spoken in South Africa and Zimbabwe) had such difficulty finding written source material that they turned to a government manual for goat herders to help construct their prompts. The challenge, in other words, is one of building entire linguistic infrastructures from scratch.
Homegrown Models Leading the Way
South African startup Lelapa AI is among the most prominent African-led efforts in this space. In 2024, it launched InkubaLM, described as Africa’s first multilingual large language model, covering Swahili, Yoruba, isiXhosa, Hausa, and isiZulu. The model, named after the dung beetle for its lean, efficient design, was built to work in low-resource environments. This was a deliberate choice that positions African developers ahead of global trends now shifting toward smaller, more efficient models.
Separately, the African Next Voices project has gathered 9,000 hours of speech recordings across 18 languages in Nigeria, Kenya, and South Africa. Rather than pulling from existing digital text, researchers engaged directly with native speakers, asking them to describe images and respond to prompts in their own languages. The resulting datasets are being released openly for developers across the continent to use.
The Big Tech Push
Global players are also moving, if belatedly. Google Translate’s single largest expansion ever added over 110 new languages using its PaLM 2 model, with nearly a quarter of those additions being African, bringing the platform’s African language total to over 50. Languages added include Dholuo, spoken by more than four million people in the Nilotic regions, and N’Ko, a standardised form of several West African Manding languages.
More recently, Google extended its AI Overviews and AI Mode search tools to 13 African languages, including Akan, Amharic, Wolof, Kinyarwanda, and Afaan Oromoo. The rollout builds on its Waxal language project, which combines machine learning, linguistic fieldwork, and community input to improve how AI tools understand and generate African language content.
The Stakes Are High
Progress is real, but so are the risks. Researchers are careful to distinguish between contexts where AI error is tolerable and where it is not. A chatbot that stumbles over casual queries is one thing; a health or banking tool that misunderstands a user’s language is another matter entirely.
There is also the question of what happens if the work stalls. For Marivate, languages without digital representation face a quiet kind of extinction, gradually squeezed out as people switch to languages the internet understands. AI, in this framing, may be one of the most consequential factors in whether these languages survive the century.