Quantity: optimize AI models to reduce their cost

With our partner Salesforce, unify sales, marketing and customer service. Accele your growth!

Definition of quantity

Quantity is a technique to Reduce digital accuracy of calculations Carried out by an AI model, which decreases its memory consumption and accelerates its execution without significant loss of performance.

Why is quantity crucial?

  • She reduces calculation power requirements without altering the quality of the results.
  • It allows you to perform models on Less powerful deviceslike smartphones.

Quantity

  • Passage from FP32 to int8 : replace the values ​​in 32 floating bits (FP32) by values ​​in 8 whole bits (int8).
  • Dynamic quantity vs static : Adapt the accuracy according to use to avoid loss of quality.

Concrete examples

🔹 Mistral 7b and Gemma 7b : Use quantity to compete with larger models.
🔹 Meta Llama 2-Chat : offers quantified versions to facilitate less powerful GPU inference.

Benefits and challenges of quantity

Benefits Challenge
🚀 Faster execution ❗ Risk of loss of precision
🔋 Reduced energy consumption ⚙️ requires careful adjustment
📱 Possible mobile deployment 🔄 Compatibility with certain limited architectures

The future of quantity

Hybrid techniques Combining quantity and sparsity.
Unified standards To facilitate integration on all hardwares.
Large adoption on mobile and edge computing For on -board AI.