Neuphonic Raises £3M to Build the World's Fastest AI Text-to-Speech Engine

May 31, 2024

Neuphonic, a London-based AI company, has raised £3 million in pre-seed funding to accelerate the development of what it claims is the world's fastest text-to-speech technology. The round was led by Moonfire VC, with participation from Tiny VC, Salica Oryx Fund, and Cur8 Capital. The company will use the proceeds to expand its language capabilities, enhance the performance of its text-to-speech models, and develop on-device deployment solutions. At the time of the raise, Neuphonic had already launched a closed beta of its technology and attracted over 1,000 unique users from its growing waitlist.

Neuphonic was co-founded by Jiameng Gao, who previously co-founded Papercup (a UK AI video dubbing company), and Sohaib Ahmad, who worked as a quant trader at a hedge fund before moving into AI research. The pair met at Cambridge University while studying machine learning, and both bring a personal perspective to the problem they are solving: as multilingual first-generation immigrants with roots in China, Ireland, and Pakistan, they have direct experience of the language barriers and cultural nuances that voice AI must overcome to be genuinely useful across global markets.

The problem Neuphonic is targeting is a fundamental bottleneck in conversational AI. Current text-to-speech models generate speech in chunks, typically waiting until a complete sentence or phrase is available before producing audio. This introduces latency that makes AI interactions feel stilted and unnatural. When a user asks an AI system a question and the system pauses for several seconds before it begins to speak, the illusion of a real conversation collapses. Neuphonic's patent-pending algorithm works differently: it generates speech incrementally, word by word, as text arrives from the language model, achieving a latency of just 25 milliseconds. Responses can therefore begin within the length of a natural pause in human conversation, which the company says makes AI-driven dialogue feel genuinely interactive rather than query-and-response.
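The difference between the two approaches can be illustrated with a toy timing model. The sketch below is not Neuphonic's algorithm, which the article does not describe in detail. It is a minimal simulation under stated assumptions: the language model emits one word every 50 ms (a hypothetical figure), and the article's 25 ms latency is applied as the synthesis delay for both strategies so that only the waiting behaviour differs. All function and variable names are illustrative.

```python
from typing import List, Tuple

# Hypothetical timings, chosen only for illustration.
TOKEN_INTERVAL_MS = 50   # assumed gap between words emitted by the language model
SYNTH_LATENCY_MS = 25    # synthesis delay, borrowing the article's 25 ms figure

Arrival = Tuple[int, str]  # (arrival time in ms, word)


def token_arrivals(text: str, interval_ms: int = TOKEN_INTERVAL_MS) -> List[Arrival]:
    """Simulate a language model streaming one word per interval."""
    return [((i + 1) * interval_ms, word) for i, word in enumerate(text.split())]


def first_audio_chunked(arrivals: List[Arrival], synth_ms: int = SYNTH_LATENCY_MS) -> int:
    """Chunked synthesis: audio starts only once a full sentence has arrived."""
    for arrival_ms, word in arrivals:
        if word.endswith((".", "?", "!")):
            return arrival_ms + synth_ms
    # No sentence boundary found: wait for the final word instead.
    return arrivals[-1][0] + synth_ms


def first_audio_incremental(arrivals: List[Arrival], synth_ms: int = SYNTH_LATENCY_MS) -> int:
    """Incremental synthesis: audio starts as soon as the first word arrives."""
    return arrivals[0][0] + synth_ms


reply = "Your order shipped this morning and should arrive on Friday. Anything else?"
arrivals = token_arrivals(reply)
chunked_ms = first_audio_chunked(arrivals)
incremental_ms = first_audio_incremental(arrivals)

print(f"chunked: first audio after {chunked_ms} ms")
print(f"incremental: first audio after {incremental_ms} ms")
```

Under these assumptions the chunked strategy waits for all ten words of the first sentence before speaking, while the incremental strategy waits for only one. The gap grows with sentence length and with slower token generation.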

The technology is designed to be language-agnostic and model-agnostic, integrating with any large language model through Neuphonic's API. This positions it as infrastructure that can serve a broad range of applications: customer service automation, digital avatars, AI-powered gaming characters, real-time translation, accessibility tools, and content creation. Akshat Goenka, Partner at Moonfire, noted that voice AI has long been constrained by technical limitations that prevented natural interaction, and that Neuphonic's solution has the potential to unlock entirely new business models across multiple industries. Professor Steve Young CBE, Emeritus Professor of Information Engineering at Cambridge and a former Pro-Vice-Chancellor, joined the round as both an advisor and an investor, lending significant academic credibility to the company's technical claims.

The voice AI market is projected to reach $41.39 billion by 2030, a growth trajectory driven by the deployment of conversational AI across enterprise customer service, autonomous vehicles, consumer electronics, and gaming. Neuphonic is positioning itself as the layer that makes this market technically viable: the low-latency speech synthesis infrastructure without which truly conversational AI cannot exist.
