While the artificial intelligence race has focused on training models, another problem emerges as those models become more capable: running them.
Fractile, a British startup, is positioning itself on this critical layer. It has announced a funding round of 220 million dollars, or approximately 187 million euros. The round is led by Accel, Factorial Funds and Founders Fund, with participation from Conviction, Felicis, 8VC, Gigascale, O1A and Buckley Ventures.
Founded in 2022, the startup develops hardware architectures designed to accelerate inference for frontier models, that is, the phase in which models actually produce results. Long secondary to training, the subject has become central with the emergence of reasoning models and, in the future, autonomous agents.
Fractile’s thesis rests on a simple idea: the most advanced models will soon be limited not by their theoretical capabilities, but by the time required to execute their chains of reasoning.
“We bet on the fact that the most advanced AI systems would end up being limited in their impact by the time required to produce useful results,” explains a representative of the startup. “The only way to truly unlock this latent value was to radically reinvent the hardware on which the frontier models run.”
This shift is gradually transforming the economics of the sector. Each request to an AI model consumes computing resources, and the more complex models become, the higher inference costs climb. New reasoning systems now generate long processing sequences, sometimes running to several tens of millions of tokens.
Fractile estimates that some models already produce up to 100 million tokens to solve complex problems. At execution speeds approaching 40 tokens per second on current architectures, such processing can require nearly a month of continuous computing.
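The "nearly a month" figure can be checked with a few lines of arithmetic. The sketch below uses only the two numbers quoted above and assumes strictly sequential generation at a constant speed; the function and constant names are illustrative, not Fractile's.

```python
# Figures quoted in the article (Fractile's estimates).
TOKENS_TO_GENERATE = 100_000_000  # tokens produced to solve one complex problem
TOKENS_PER_SECOND = 40            # approximate speed on current architectures

SECONDS_PER_DAY = 24 * 60 * 60


def generation_days(total_tokens: int, tokens_per_second: float) -> float:
    """Days of continuous generation, assuming a constant sequential speed."""
    return total_tokens / tokens_per_second / SECONDS_PER_DAY


days = generation_days(TOKENS_TO_GENERATE, TOKENS_PER_SECOND)
print(f"{days:.1f} days")  # 28.9 days
```

At 2.5 million seconds of generation, the result comes to just under 29 days, consistent with the company's estimate.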
For the company, this constraint goes far beyond a simple performance issue: “Inference is both the revenue driver of the AI industry and the main factor limiting its expansion.”
Fractile draws a parallel with the systems developed by DeepMind for AlphaGo. AlphaGo did not rely solely on a neural network producing an immediate response, but on a succession of inferences that explored different scenarios before each decision.
According to the British startup, large language models are now moving in this direction. “Complex intellectual work involves many sequential steps, each dependent on the previous one,” explains the company, which sees reasoning models as a first step towards systems capable of maintaining long and structured chains of analysis.
The main technical bottleneck Fractile identifies is memory bandwidth. The company argues that current architectures are not improving fast enough to absorb the growing demands of long contexts and reasoning models.
“To compress this month of computation into a day, we would need to achieve around 1,200 tokens per second while managing the complexity and capacity constraints of large models operating on very long contexts,” the company says.
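The target of around 1,200 tokens per second follows from the same figures read in reverse. A minimal sketch, assuming the 100 million tokens and the 40 tokens per second baseline cited by the company, with hypothetical function names:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def required_tokens_per_second(total_tokens: int, target_seconds: float) -> float:
    """Sustained speed needed to emit total_tokens within target_seconds."""
    return total_tokens / target_seconds


# Figures quoted by Fractile: 100 million tokens, 40 tokens/s today.
needed = required_tokens_per_second(100_000_000, SECONDS_PER_DAY)
speedup = needed / 40

print(f"{needed:.0f} tokens/s, about {speedup:.0f}x the current speed")
# 1157 tokens/s, about 29x the current speed
```

The exact requirement is roughly 1,157 tokens per second, which the company rounds to 1,200, or about a thirtyfold increase over the speeds it cites for current architectures.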
To address this issue, Fractile works across the entire technology stack: microarchitecture, system design, manufacturing processes and hardware optimization. This vertically integrated approach puts the company in the same camp as players such as Cerebras Systems and Groq.
This battle over inference has become one of the main industrial fronts in AI. Several groups are looking to reduce their dependence on traditional GPU architectures dominated by NVIDIA. AMD, Google, Amazon Web Services and Intel are accelerating their investments in AI accelerators, while startups such as SambaNova Systems, Etched, Tenstorrent and d-Matrix are developing specialized architectures for workloads related to reasoning and AI agents.
Europe is also trying to preserve a presence on this strategic layer of infrastructure. In France, SiPearl develops processors intended for European supercomputers, while Kalray works on parallel processing architectures suited to massive data flows and AI workloads. Scaleway and Mistral AI are also contributing to the emergence of a European computing and inference infrastructure. In the United Kingdom, Graphcore remains one of the main industrial precedents in this segment, despite its commercial difficulties in competing with NVIDIA.
Fractile believes, however, that the issue goes beyond the current uses of generative AI. “The workloads that are pushing the boundaries today are already transformational. Those beyond this boundary will redefine the entire economy,” the company says.
The company is currently recruiting in London, Bristol, San Francisco and Taipei.