Why robotics has (finally) become the ultimate application of generative AI

At the NVIDIA GTC 2025 conference, one of the most anticipated panels brought together leaders from Onex, Skilled AI, Agility Robotics, Boston Dynamics and NVIDIA around a shared observation: robotics is entering a new phase. Thanks to advances in foundation models, falling hardware costs and large-scale data generation, artificial intelligence is leaving the screen to take root in the physical world.

Long held back by system complexity, the scarcity of motor data and the cost of machines, robotics is finally joining the exponential trajectory that language processing and computer vision have already followed. AI no longer merely predicts: it acts, tests and learns. And it now does so in a closed loop, in a world where gravity, inertia and objects impose their constraints. It is this paradigm shift that the panel laid out plainly.

Robotics, long lagging behind, enters its ChatGPT moment

Despite sharing its origins with AI, robotics has long remained on the fringes of the field's great advances. Its limitations were less theoretical than practical: little usable data, strong physical constraints, slow adoption and high cost.

“Generative AI was built on an easy fuel: text. In robotics, there is no Wikipedia of gestures” – Jim Fan, Co-Lead of the NVIDIA GEAR Lab

But this structural brake is giving way under the combined effect of three revolutions: the maturity of multimodal models, access to affordable compute, and the creation of massive synthetic data pipelines.

From perception to action: AI becomes embodied

What distinguishes embodied AI from pure software is the obligation to interact. A chatbot hallucinates? You correct it afterwards. A robot hallucinates? It breaks a cup, misses a grasp or becomes dangerous.

The robot cannot be content to predict. It must experiment.

“The robot has no right to make mistakes. It acts in a world where gravity punishes imprecision” – Deepak Pathak, CEO of Skilled AI

This principle underpins the renewed interest in embodied AI. Unlike LLMs, which learn passively from static corpora, robotic AI learns in a closed loop: perception, action, feedback from reality. Experience is the teacher.
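
To make the loop concrete, here is a minimal, self-contained sketch of the perception-action-feedback cycle on a toy 1-D reaching task. The task, the noise model and the update rule are illustrative assumptions, not anything described by the panelists; only the structure of the loop matters: observe, act, let reality push back, adjust.

```python
# Toy closed perception-action-feedback loop: a 1-D "reach the target" task.
# All quantities are made up; the point is the structure of the loop itself.
import random

target = 1.0      # where the "cup" is
position = 0.0    # current end-effector position
gain = 0.1        # policy parameter, tuned from experience

for step in range(200):
    # Perception: a noisy reading of the current position.
    observed = position + random.gauss(0.0, 0.01)
    # Prediction/action: move proportionally toward the target.
    action = gain * (target - observed)
    # Reality: the world applies the action, with some friction-like loss.
    position += 0.9 * action
    # Feedback: the remaining error is the signal the policy learns from.
    error = abs(target - position)
    gain = min(gain + 0.05 * error, 1.0)  # crude update: larger errors push the gain up

print(f"final position {position:.3f}, learned gain {gain:.2f}")
```

The essential difference from a passively trained model is that the policy parameter is adjusted from the outcome of the robot's own actions, not from a static corpus.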

A new architecture: from photons to movement

At NVIDIA, this approach gave birth to Project GR00T, unveiled during Jensen Huang's keynote. It is an open-source foundation model of 2 billion parameters, capable of transforming images captured by a camera into continuous motor signals.

“The objective is simple: create an AI capable of going from pixels to actions, without an intermediate pipeline” – Jim Fan

This approach recalls the one that allowed LLMs to take off: a single model, a universal task, massive training. The model learns from three data sources, organized as a pyramid (a training-mix sketch follows the list):

  • Real data: from teleoperation of physical robots.
  • Simulated data: generated with the Isaac Sim engine.
  • Synthetic data: videos generated by neural simulation models.
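
One way to read this pyramid is as a weighted training mix: scarce but grounded real data at the tip, abundant generated data at the base. The sketch below is purely illustrative; the dataset names and mixing weights are assumptions, not the GR00T team's actual recipe.

```python
# Hypothetical sketch: sampling training batches from a three-layer data pyramid.
# Dataset names and mixing weights are illustrative, not NVIDIA's actual recipe.
import random

data_pyramid = {
    "real_teleoperation":      {"weight": 0.1, "examples": ["pick_mug_001", "open_drawer_007"]},
    "isaac_sim_rollouts":      {"weight": 0.3, "examples": ["sim_grasp_042", "sim_stack_118"]},
    "neural_video_synthesis":  {"weight": 0.6, "examples": ["gen_kitchen_993", "gen_shelf_250"]},
}

def sample_batch(pyramid, batch_size=8):
    """Draw a batch whose composition follows the pyramid's mixing weights."""
    sources = list(pyramid)
    weights = [pyramid[s]["weight"] for s in sources]
    batch = []
    for _ in range(batch_size):
        source = random.choices(sources, weights=weights, k=1)[0]
        batch.append((source, random.choice(pyramid[source]["examples"])))
    return batch

print(sample_batch(data_pyramid))
```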

Hardware prices are falling. Use cases are opening up.

Until now, hardware costs have weighed heavily on experimentation. But robotic components now benefit from progress in consumer electronics: batteries, sensors, cameras, compute units.

“10 years ago, a humanoid robot cost $1.5 million. Today, we can produce one for less than €40,000” – Aaron Saunders, CTO of Boston Dynamics

This drop allows companies like Agility Robotics or Onex to consider large-scale deployments in warehouses, on assembly lines, even in households. The humanoid robot becomes a potentially scalable product.

Cross-embodiment: the great challenge of the universal model

One of the major obstacles remains generalizing the same model across several robotic bodies. What is called “cross-embodiment” raises complex questions of dynamics, inertia, calibration and perception.

“Even two identical robots do not react identically. Mechanics introduces noise, even within the same generation of machines” – Bernd Bornik, CEO of Onex

Several strategies are being tested:

    • Diversification learning: multiply physical configurations in simulation to train for variability (see the randomization sketch after this list).
    • Encoding the robot's structure: describe its morphology as a vector sequence (a robot grammar).
    • Dynamic contextualization: feed the model the robot's behavioral history so that it adapts.
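
As an illustration of the first strategy, the sketch below randomizes a simulated robot's physical parameters at the start of every training episode, so the policy never overfits to a single body. The parameter names and ranges are hypothetical and not drawn from any panelist's pipeline.

```python
# Hypothetical sketch of "diversification learning": randomize the simulated
# robot's physics at every episode so the policy learns to tolerate variability.
import random

def randomize_embodiment():
    """Draw one physical configuration; ranges are illustrative only."""
    return {
        "link_mass_scale": random.uniform(0.8, 1.2),    # +/-20% on each link's mass
        "joint_friction":  random.uniform(0.01, 0.10),  # per-joint friction coefficient
        "motor_latency_s": random.uniform(0.00, 0.03),  # actuation delay in seconds
        "camera_offset_m": random.uniform(-0.02, 0.02), # camera calibration error
    }

def train(num_episodes=3):
    for episode in range(num_episodes):
        config = randomize_embodiment()
        # In a real pipeline, a simulator (e.g. Isaac Sim) would be reset with
        # this configuration, the policy rolled out, and its weights updated.
        print(f"episode {episode}: {config}")

train()
```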

Humans, the primary source of motor data

In the absence of massive databases of robotic gestures, researchers are turning to an omnipresent source: humans.

Filmed everyday gestures become a gold mine for inferring motor behavior. It is no longer a question of copying, but of interpreting the logic of the gesture.

“The robot does not need five fingers to learn how to open a fridge. It needs to understand why we reach for the handle” – Deepak Pathak
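
Here is a toy sketch of that distinction, interpreting the gesture rather than copying it: from a made-up sequence of human hand positions (standing in for keypoints extracted from video), it keeps only the intent, the goal point and the approach direction, and retargets that to a gripper with a different body. All data, names and offsets are hypothetical.

```python
# Toy sketch: infer the intent of a human gesture (goal point and approach direction)
# and retarget it to a robot gripper, instead of copying the hand motion joint by joint.
# The hand trajectory below stands in for keypoints extracted from ordinary video.
import math

# (x, y, z) positions of a human hand reaching for a fridge handle, in meters.
hand_trajectory = [
    (0.00, 0.50, 1.00),
    (0.10, 0.45, 1.05),
    (0.25, 0.42, 1.10),
    (0.38, 0.40, 1.12),   # final frame: hand on the handle
]

def gesture_intent(trajectory):
    """Reduce a gesture to its logic: the goal point and the approach direction."""
    goal, prev = trajectory[-1], trajectory[-2]
    direction = tuple(g - p for g, p in zip(goal, prev))
    norm = math.sqrt(sum(d * d for d in direction))
    approach = tuple(d / norm for d in direction)
    return goal, approach

def retarget_to_gripper(goal, approach, standoff=0.05):
    """A different body, same intent: stop 5 cm short of the goal along the approach axis."""
    pregrasp = tuple(g - standoff * a for g, a in zip(goal, approach))
    return {"pregrasp_pose": pregrasp, "grasp_pose": goal}

goal, approach = gesture_intent(hand_trajectory)
print(retarget_to_gripper(goal, approach))
```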

What robotics teaches AI (and not the other way around)

Since ChatGPT, people have been asking what LLMs can do for robotics. But the reverse question is becoming more strategic: what if embodied AI became the ultimate artificial intelligence laboratory?

    • Robotics imposes grounding in reality.
    • It forces the model to generate its own data.
    • It eliminates hallucinations by confronting them with physical consequences.

“A model that acts in the world learns better than a model that comments on the world” – Bernd Bornik

What comes next: a change of scale more than a revolution

In the next two to five years, generalist robots will not replace humans. But they will reach a sufficient utility threshold to be integrated into workflows for repetitive, dangerous or strenuous tasks.

The question will no longer be whether a robot can accomplish a task, but how many tasks it can accomplish without reprogramming.

“The adoption of robots will be faster than you think. The brain is ready. The body is almost there.” – Jim Fan