Quantity: optimize AI models to reduce their cost

With our partner Salesforce, unify sales, marketing and customer service. Accele your growth!

Definition of quantity

Quantity is a technique to Reduce digital accuracy of calculations Carried out by an AI model, which decreases its memory consumption and accelerates its execution without significant loss of performance.

Why is quantity crucial?

  • She reduces calculation power requirements without altering the quality of the results.
  • It allows you to perform models on Less powerful deviceslike smartphones.

Quantity

  • Passage from FP32 to int8 : replace the values ​​in 32 floating bits (FP32) by values ​​in 8 whole bits (int8).
  • Dynamic quantity vs static : Adapt the accuracy according to use to avoid loss of quality.

Concrete examples

πŸ”Ή Mistral 7b and Gemma 7b : Use quantity to compete with larger models.
πŸ”Ή Meta Llama 2-Chat : offers quantified versions to facilitate less powerful GPU inference.

Benefits and challenges of quantity

Benefits Challenge
πŸš€ Faster execution ❗ Risk of loss of precision
πŸ”‹ Reduced energy consumption βš™οΈ requires careful adjustment
πŸ“± Possible mobile deployment πŸ”„ Compatibility with certain limited architectures

The future of quantity

βœ… Hybrid techniques Combining quantity and sparsity.
βœ… Unified standards To facilitate integration on all hardwares.
βœ… Large adoption on mobile and edge computing For on -board AI.