Niel, Saadé and Schmidt bet 60 million euros on Gradium and real-time voice AI

For years, the voice AI industry focused on producing believable voices that mimic human nuance with enough realism for voice acting, marketing content, or scripted voice assistants. This first generation progressed rapidly and converged on a level of quality that now makes the models interchangeable for many uses. Producing a beautiful voice is no longer a competitive advantage. The market is shifting towards voice as a complete conversational interface.

Voice AI can no longer limit itself to generating a realistic voice; it must now converse, react to the unexpected and adapt to the rhythm of a human exchange. This is exactly the ground on which Gradium wants to build its offer. The startup draws on work carried out at the Kyutai laboratory, in particular on the Moshi model, which breaks with the classic “speech-to-text then text-to-speech” chain. Instead of going through an intermediate transcription, Moshi adopts a direct speech-to-speech architecture, designed to reduce latency and enable more natural and seamless interaction.

By skipping the transcription step, this approach avoids the delays inherent in traditional pipelines. It also paves the way for dialogues in which speaking, listening and understanding happen simultaneously, a capability that is becoming essential for the next generation of AI agents.

Gradium is banking on this real-time requirement to differentiate its offer, and is building in multilingual support from the start. The startup is, however, entering a field already occupied by several heavily capitalized players. ElevenLabs, already well established in dubbing and voice creation, has raised $287 million. Cartesia and Deepgram, funded to the tune of $86 million each, are positioning themselves on audio multimodality and advanced conversational AI. These players enjoy a substantial financial advantage, large-scale data and a significant commercial lead. Gradium’s strategy is to sidestep head-on competition by specializing in a segment that is still underserved: real-time voice and fine-grained synchronization with AI agents.

This positioning nevertheless raises several issues for the European ecosystem. First, the ability to maintain a multilingual advantage remains uncertain in the face of American models trained on much larger volumes of data. Second, integrating voice into multimodal systems driven by large LLMs requires considerable infrastructure, which few European players have. Finally, there is the structural risk of European startups becoming technical building blocks integrated into larger foreign platforms, without control over the application layer or the customer relationship.

Founded in September 2025 by Neil Zeghidour, a former researcher at Google DeepMind and Meta and a founding member of Kyutai, Gradium brings together a team from the Parisian laboratory, including Laurent Mazaré, Alexandre Défossez and Olivier Teboul. The startup today announced a 60 million euro funding round led by FirstMark Capital and Eurazeo, which gathers a group of investors rarely united around a European project: DST Global Partners, Xavier Niel (iliad), Eric Schmidt, Rodolphe Saadé (CMA CGM), Korelya Capital led by Fleur Pellerin, Amplify Partners, Liquid 2 Ventures and Drysdale Ventures. Several major figures in AI and tech are also participating in the round, including Yann LeCun, Olivier Pomel, Ilkka Paananen via the Illusian Founder Office, Thomas Wolf, Guillermo Rauch and Mehdi Ghissassi on behalf of the Tiny Supercomputer Investment Company.

Gradium plans to address uses such as real-time interpreting, video games, medical transcription, automated surveys and language education, betting on voice AI capable of sustaining a natural, multilingual conversation.