Understanding the impact of hardware and algorithmic costs on the AI economy

We are launching a new series devoted to the behind-the-scenes of AI, to better understand how it works and the real challenges facing the players who operate it.

What is the real competitive pressure on the prices and profitability of AI startups? How many SaaS applications, and which ones, could generative AI replace? What is the real trajectory of OpenAI, Google and Anthropic, and where are LLM models heading? When will AI match humans in various fields? Which GenAI startups are finding product-market fit?

So many questions that we will answer in the most accessible way possible. A glossary of AI terms will accompany this series, covering the essential concepts to know.

In this first episode, we tackle the subject of the costs associated with artificial intelligence.

As we hear daily, the rise of generative artificial intelligence rests on a hardware infrastructure of considerable complexity and cost. Training and inference for the most advanced models require ever-increasing compute capacity, pushing companies and researchers to explore alternatives to optimize efficiency. In the short term, the scarcity of specialized components and the high concentration of the market leave little room for any significant reduction in costs. In the medium term, however, algorithmic innovation and the arrival of new players could change this trajectory.

The cost of compute, a structural brake on the accessibility of AI

Generative artificial intelligence relies on architectures whose operation is particularly energy-intensive. A model like GPT-4 mobilized tens of thousands of NVIDIA A100 GPUs during its training and demands equally imposing infrastructure for inference. Each request submitted to ChatGPT consumes three to five times more energy than a Google search, illustrating the scale of the computing resources involved.
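
To put that ratio in perspective, here is a back-of-envelope estimate; the figure of roughly 0.3 Wh per Google search is a commonly cited external estimate, not a number from this article:

```latex
E_{\text{ChatGPT}} \approx (3\ \text{to}\ 5) \times E_{\text{search}}
                   \approx (3\ \text{to}\ 5) \times 0.3\,\text{Wh}
                   \approx 0.9\ \text{to}\ 1.5\,\text{Wh per request}
```

At the scale of hundreds of millions of requests per day, a gap of about one watt-hour per request adds up to hundreds of megawatt-hours of extra daily consumption.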

These technical requirements translate into prohibitive operating costs, which directly affect every market player:

  • The hyperscalers (Microsoft, Google, Amazon) are investing tens of billions of euros in their data centers to meet this demand.
  • AI startups, often dependent on this infrastructure, must contend with high unit costs that threaten their profitability.
  • Client companies seeking to integrate generative AI into their products run up against a bill that is difficult to absorb.

Faced with this financial pressure, the entire ecosystem is seeking to optimize its use of resources and exploring ways to lower these costs without sacrificing model performance.

NVIDIA, master of the AI semiconductor market, but for how long?

While computing power needs are exploding, the supply of components capable of meeting them remains concentrated in the hands of a single player. NVIDIA dominates the GPU market dedicated to artificial intelligence, with an estimated share of more than 80%. Its CUDA software ecosystem, essential for exploiting these chips, has reinforced its hegemony and limits the emergence of competing solutions.

The success of generative AI has consolidated this position, allowing NVIDIA to command unprecedented prices: an H100 chip, a central element of AI infrastructure, currently trades at between 30,000 and 40,000 dollars, well beyond its manufacturing cost. Scarce supply combined with exploding demand has allowed the company to post a record gross margin, capturing most of the profits generated by the development of AI.

This near-monopoly is, however, starting to provoke reactions. Several initiatives aim to reduce dependence on NVIDIA GPUs:

  • AMD and Intel are accelerating their efforts with competitive alternatives (MI300X, Gaudi 3).
  • Google and AWS are developing their own specialized chips (TPU, Inferentia).
  • Chinese manufacturers such as Huawei and Biren are investing massively despite American restrictions.

While these alternatives remain marginal for now, within a few years they could fragment the market and lead to a gradual drop in hardware costs.

Algorithmic optimization, a decisive lever to reduce hardware dependence

Unable to count on a rapid fall in component prices, companies are investing massively in improving software efficiency. The goal is simple: reduce the compute consumption required to run models, without any perceptible loss of performance.

Several major technical advances contribute to this optimization:

  • Quantization: reducing the numerical precision of computations (moving from FP32 to INT8 or FP16) speeds up model execution while cutting energy consumption (see the sketch after this list).
  • Pruning and sparsity: eliminating superfluous neural connections yields lighter, faster models (a pruning sketch also follows below).
  • Model distillation: by training lightweight versions from larger models, researchers retain comparable effectiveness with a reduced compute footprint (a distillation sketch closes the series below).
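
To make the first of these techniques concrete, here is a minimal sketch using PyTorch's dynamic quantization API. The layer dimensions are arbitrary stand-ins for a transformer feed-forward block, not those of any real model:

```python
import os

import torch
import torch.nn as nn

# Toy feed-forward block standing in for a transformer MLP layer.
model = nn.Sequential(
    nn.Linear(4096, 11008),
    nn.ReLU(),
    nn.Linear(11008, 4096),
)

# Dynamic quantization: weights are stored as INT8 and activations are
# quantized on the fly at inference time; only nn.Linear layers are converted.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

def size_mb(m: nn.Module) -> float:
    """Serialize the model and report its on-disk size in megabytes."""
    torch.save(m.state_dict(), "tmp.pt")
    size = os.path.getsize("tmp.pt") / 1e6
    os.remove("tmp.pt")
    return size

# INT8 weights occupy roughly a quarter of the FP32 footprint.
print(f"FP32: {size_mb(model):.0f} MB -> INT8: {size_mb(quantized):.0f} MB")
```

That roughly fourfold reduction in weight storage is what allows the same model to be served on smaller, cheaper hardware.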
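
A pruning pass can be sketched just as briefly with PyTorch's built-in utilities. Note that zeroed weights only translate into actual speedups when the runtime has sparse kernels to exploit them:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(1024, 1024)

# Zero out the 60% of weights with the smallest absolute value (L1 criterion).
prune.l1_unstructured(layer, name="weight", amount=0.6)

# Make the pruning permanent: drop the mask and keep the sparsified weights.
prune.remove(layer, "weight")

sparsity = (layer.weight == 0).float().mean().item()
print(f"Sparsity: {sparsity:.0%}")  # ~60% of the connections eliminated
```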
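
Model distillation, finally, usually comes down to a blended training loss, as in this sketch of the classic formulation; the temperature and weighting values are illustrative hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      labels: torch.Tensor,
                      temperature: float = 2.0,
                      alpha: float = 0.5) -> torch.Tensor:
    """Blend the teacher's softened predictions with the ground-truth labels."""
    # KL divergence between the softened student and teacher distributions.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    kd = F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
    # Standard cross-entropy against the hard labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce
```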

These techniques, already used in models such as Mistral 7B or Gemma 7B, make it possible to achieve performance comparable to that of much larger models, but at a significantly lower energy cost.

Other approaches focus on optimizing the architectures themselves:

  • FlashAttention, which improves the processing of long sequences by reducing memory requirements.
  • Mixture of Experts (MoE), which activates only certain parts of the model depending on the request, optimizing resource allocation (a minimal routing sketch follows this list).
  • Parallel training and new data storage approaches, which minimize the duplication of redundant information.
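
Here is a deliberately naive sketch of the routing idea behind MoE layers: a small router scores all experts, and each token is processed by only the top-k of them, so most parameters stay idle on any given request. Dimensions and expert count are arbitrary, and production implementations add load-balancing losses and fused kernels:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMoE(nn.Module):
    """Minimal mixture-of-experts layer with top-k token routing."""

    def __init__(self, dim: int = 512, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)        # keep k experts per token
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Each token flows through just its k selected experts.
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# 16 tokens each pass through 2 of the 8 experts: most weights stay idle.
moe = TinyMoE()
print(moe(torch.randn(16, 512)).shape)  # torch.Size([16, 512])
```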

Finally, inference optimization, that is, how models are run in production, is a strategic axis for cost reduction. Solutions like TensorRT, ONNX Runtime or JAX make it possible to execute models more efficiently, while dedicated infrastructure such as AWS Inferentia or NVIDIA Triton servers reduces their energy consumption.
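
As an illustration of this kind of tooling, the sketch below exports a toy model to ONNX and serves it with ONNX Runtime, which applies graph-level optimizations such as operator fusion. The model and file name are placeholders:

```python
import torch
import torch.nn as nn
import onnxruntime as ort

# A toy classifier standing in for a real production model.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
dummy = torch.randn(1, 128)

# Export the PyTorch graph to the ONNX interchange format.
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["logits"])

# ONNX Runtime optimizes the graph and executes it on the chosen backend.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
logits = session.run(["logits"], {"input": dummy.numpy()})[0]
print(logits.shape)  # (1, 10)
```

The same exported graph can then be handed to TensorRT or served behind NVIDIA Triton without touching the training code, which is what makes these runtimes attractive levers for cost reduction.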

Thanks to these advances, some estimates suggest that compute consumption per AI task could fall by 30 to 50% by 2026, which would mark a turning point for the economic viability of AI applications.

An unstable balance between costs and exploding demand

While progress in hardware and software optimization offers hope of lower unit compute costs, this dynamic is offset by continuously exploding demand for AI. Three factors drive a sustained increase in computing power needs:

  1. Growing model size: each generation of LLMs requires 10 to 100 times more power than the previous one, even if the communication around DeepSeek might suggest otherwise.
  2. AI is spreading to all sectors: from cloud computing to SaaS applications, including industrial automation and consumer services.
  3. The rise of autonomous agents: models capable of performing tasks in a loop without supervision (AutoGPT, Devin AI) could multiply compute consumption tenfold.

Thus, even as technological progress reduces the unit cost of compute, the volume of use largely offsets these gains. The economic accessibility of generative AI will therefore depend on the market's ability to balance these opposing trends.

Generative AI today rests on an economic model under strain:

  • Hardware costs remain extremely high, due to a limited supply of components and exponential demand.
  • NVIDIA still dominates the market, but alternatives are gradually emerging.
  • Algorithmic optimization partially offsets these constraints by reducing computing power requirements.
  • The explosion of demand could, however, absorb these gains, maintaining constant pressure on costs.

In the medium term, only a profound transformation of infrastructure and training methods will ensure a sustainable democratization of AI without compromising its profitability.