Language models have dominated the media conversation for two years, so the idea may be surprising: the next breakthrough in artificial intelligence may well not come from them. Behind their stylistic fluency and analytical capacity, LLMs remain prisoners of an architecture designed for a single task: predicting the next word. However sophisticated this mechanism may be, it does not allow them to reason reliably or to understand the dynamics of the real world.
As Gregory Renard explained to us in 2023 on our podcast dedicated to AI, an LLM does not know what physical movement is, does not understand temporal continuity, has no structured memory of the world, cannot explicitly manipulate causal relationships, and cannot anticipate the effects of an action. When it "explains" how an object falls or how a scene evolves, it accesses no representation of the physical laws governing that scene; it reassembles statistical correlations, often correct, sometimes unreliable.
And this limit is not a bug but the direct result of their training, because a linguistic model observes no objects, trajectories, or interactions. The contrast with human learning is instructive here: a four-year-old child has already absorbed, through vision alone, more information than is contained in all the public text accessible online. Above all, the child has learned through experience, constantly correcting its predictions.
Generative AI thus faces a structural contradiction: it produces text with impressive fluency, yet it has no internal mechanism for organizing a representation of reality. It can imitate but does not understand, and its ability to progress is therefore constrained by its very nature.
And it is this impasse that many researchers, including Yann LeCun, are pointing out today. To cross a real threshold, AI must be able to predict how the world will evolve. To do so, it will need to acquire a form of "internal model" that lets it anticipate, test hypotheses, plan sequences of actions, and correct its errors. This is precisely the promise of the emerging architectures that will be at the heart of the next decade: World Models and Joint Embedding Predictive Architectures (JEPA).
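The core idea behind JEPA can be illustrated with a deliberately toy sketch: instead of predicting raw observations (pixels or words), the model predicts the *embedding* of a future observation from the embedding of the current context, and the error is measured in that latent space. Everything below is illustrative, not an actual JEPA implementation: the "world" is a single moving point, the encoders are frozen random linear maps (real systems such as I-JEPA train deep encoders and use extra machinery to prevent representation collapse), and all names are invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy world: a point drifting at constant speed; each "observation"
# is a noisy 8-dimensional rendering of its position.
def render(pos):
    basis = np.linspace(0.0, 1.0, 8)
    return np.sin(basis * pos) + rng.normal(scale=0.01, size=8)

# Frozen random encoders mapping observations into a 4-d latent space.
W_ctx = rng.normal(size=(4, 16))   # context = two stacked frames (16 dims)
W_tgt = rng.normal(size=(4, 8))    # target  = the next frame (8 dims)

# Linear predictor trained in latent space: the JEPA-style loss compares
# the predicted embedding to the target embedding, never raw observations.
P = np.zeros((4, 4))
lr = 0.01
losses = []
for step in range(2000):
    t = rng.uniform(0.0, 3.0)
    ctx = np.concatenate([render(t), render(t + 0.1)])  # frames t, t+dt
    tgt = render(t + 0.2)                               # frame t+2dt
    s_ctx = W_ctx @ ctx
    s_tgt = W_tgt @ tgt
    err = P @ s_ctx - s_tgt
    losses.append(float(err @ err))
    P -= lr * np.outer(err, s_ctx)  # SGD step on ||P s_ctx - s_tgt||^2

print("early loss:", np.mean(losses[:100]))
print("late loss: ", np.mean(losses[-100:]))
```

The point of the sketch is the loss function's location: the predictor is never asked to reconstruct the 8-dimensional observation itself, only its compact latent representation, which is what lets a world model ignore unpredictable surface detail and focus on the dynamics.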
These models seek to fill what today's AI lacks, and as the limits of the LLM paradigm become visible, this new approach could emerge as the true ground for innovation. This is the bet that Yann LeCun is making with his future startup dedicated to Advanced Machine Intelligence, as is Jeff Bezos with Prometheus.