Hi

I am David Corvoysier, a versatile developer and open-source enthusiast.

Aligning quantization scales before incompatible operations

30 May 2023 by David Corvoysier

As explained in my introduction to Machine Learning quantization, important restrictions apply to operations performed on quantized inputs.

First, the integer mantissas of quantized inputs can only be added if they share the same scale.

This comes from the representation of the quantized numbers:

$a = (n - zeropoint_a) * scale_a$

$b = (m - zeropoint_b) * scale_b$

The integer mantissas of $a$ and $b$ can only be added if $scale_a == scale_b$, allowing us to write directly:

$a + b = (n - zeropoint_a + m - zeropoint_b) * scale_a$

Intuitively, this is analogous to saying that you cannot add two quantities expressed in different units (like bytes and kilobytes) without converting one representation to the other.
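The restriction can be illustrated with a minimal Python sketch. The helper names (`quantize`, `dequantize`, `add_same_scale`) and the numeric values are hypothetical and chosen only for illustration; the sketch assumes the representation given above, with a result zero-point of 0.

```python
def quantize(x, scale, zeropoint=0):
    """Return the integer n such that x is approximately (n - zeropoint) * scale."""
    return round(x / scale) + zeropoint


def dequantize(n, scale, zeropoint=0):
    """Return the float value represented by the integer n."""
    return (n - zeropoint) * scale


def add_same_scale(n, zeropoint_a, m, zeropoint_b):
    """Add two quantized values that share one scale.

    The result is an integer expressed in that same scale, with zero-point 0.
    """
    return (n - zeropoint_a) + (m - zeropoint_b)


scale = 0.5
n = quantize(3.0, scale, zeropoint=2)    # 3.0 / 0.5 + 2 = 8
m = quantize(1.5, scale, zeropoint=-1)   # 1.5 / 0.5 - 1 = 2

# Same scale: the integer sum is a valid quantized representation of 4.5.
total = add_same_scale(n, 2, m, -1)
same_scale_result = dequantize(total, scale)

# Different scales: summing the integers directly gives a wrong value.
m_other = quantize(1.5, 0.25)            # 1.5 / 0.25 = 6
naive_total = add_same_scale(n, 2, m_other, 0)
naive_result = dequantize(naive_total, scale)
```

With identical scales the integer sum decodes to the expected 4.5, whereas mixing scales of 0.5 and 0.25 decodes to 6.0, which is the bytes-plus-kilobytes mistake described above.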

(more…)

Resolve quantization scales after an operation

29 May 2023 by David Corvoysier

As explained in my introduction to Machine Learning quantization, the inputs, weights and outputs of a quantized operation are each quantized with a different scale.

In the same post, I explain how these scales can be folded into a single output scale, allowing the operation to be performed on the integer mantissa of the quantized inputs and weights:

$scale_{folded} = \frac{scale_{out}}{scale_{in} . scale_{w}}$

In another post I explain how heterogeneous input scales could be converted to a fixed-point representation and aligned before the operation, resulting in yet another implicit scale expressed as a power-of-two that needs to be applied to the output scale.

In this post I explain how these output scales can be applied using integer arithmetic only.
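As a rough illustration of the idea, and not necessarily the method detailed in the full post, a float rescaling factor can be approximated by an integer multiplier followed by a right shift. The function names, the factor `0.0375` and the 16 fractional bits below are assumptions made for this sketch.

```python
def to_fixed_point(factor, frac_bits):
    """Approximate a float factor by an integer m, with factor ~ m * 2**-frac_bits."""
    return round(factor * (1 << frac_bits))


def rescale(acc, multiplier, frac_bits):
    """Multiply the integer acc by multiplier * 2**-frac_bits using integer ops only.

    Adding half of the divisor before shifting rounds to the nearest integer.
    """
    rounding = 1 << (frac_bits - 1)
    return (acc * multiplier + rounding) >> frac_bits


factor = 0.0375                           # hypothetical float rescaling factor
frac_bits = 16                            # hypothetical precision of the multiplier
multiplier = to_fixed_point(factor, frac_bits)

acc = 1000                                # hypothetical integer accumulator
out = rescale(acc, multiplier, frac_bits)
```

Here the float product would be 37.5, and the integer-only version yields 38. The number of fractional bits trades accuracy of the multiplier against the bit width needed for the intermediate product.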

(more…)

Fixed-point representation for quantization

26 May 2023 by David Corvoysier

As explained in my introduction to Machine Learning quantization, the quantization of an ML model produces a graph of operations applied to quantized tensors.

Quantized tensors are actually integer tensors that share the same float scale and integer zero-point.

The implementation of the quantized operations is device-specific.

One of the main design decisions is how the inputs, weights and output float scales are propagated and applied in the quantized graph.

In two other posts I will explain how it is possible to use integer arithmetic operators for that purpose if the scales are represented as fixed-point numbers.

This post is a brief introduction to the fixed-point representation and to the fixed-point arithmetic operators.
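To give a flavour of what such operators look like, here is a minimal sketch assuming a fixed-point number is stored as an integer mantissa plus a count of fractional bits. The `FixedPoint` class and its methods are hypothetical names for this example, not an API from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FixedPoint:
    """Represents the value mantissa * 2**-frac_bits."""
    mantissa: int
    frac_bits: int

    @classmethod
    def from_float(cls, x, frac_bits):
        # Scale by 2**frac_bits and round to the nearest integer.
        return cls(round(x * (1 << frac_bits)), frac_bits)

    def to_float(self):
        return self.mantissa / (1 << self.frac_bits)

    def __add__(self, other):
        # Align both operands on the larger number of fractional bits,
        # then add the integer mantissas.
        frac_bits = max(self.frac_bits, other.frac_bits)
        a = self.mantissa << (frac_bits - self.frac_bits)
        b = other.mantissa << (frac_bits - other.frac_bits)
        return FixedPoint(a + b, frac_bits)

    def __mul__(self, other):
        # Mantissas multiply, fractional bits add up: no alignment needed.
        return FixedPoint(self.mantissa * other.mantissa,
                          self.frac_bits + other.frac_bits)


x = FixedPoint.from_float(1.5, 4)      # mantissa 24, 4 fractional bits
y = FixedPoint.from_float(0.375, 3)    # mantissa 3, 3 fractional bits
total = x + y
product = x * y
```

Addition requires aligning the fractional bits first, which mirrors the same-scale restriction on quantized additions, while multiplication only accumulates fractional bits.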

(more…)

A brief introduction to Machine Learning models quantization

25 May 2023 by David Corvoysier

Even before the development of Large Language Models (LLM), the increasing memory and computing requirements of Deep Neural Networks (DNN) have been a concern.

Functionally, DNNs are graphs of arithmetic operations: the inputs are fed at the stem and the chain of operations produces the outputs at the head.

From an implementation perspective, the operations are performed on floating-point numbers, which are a digital representation of decimal numbers composed of a mantissa and an exponent:

\[x = mantissa . 2^{exponent}\]
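This decomposition can be inspected with Python's standard `math.frexp` and `math.ldexp` functions. Note that `math.frexp` normalizes the mantissa to the range [0.5, 1), which may differ from the convention used by a given hardware format; the value `6.0` is an arbitrary example.

```python
import math

x = 6.0

# Split x into a float mantissa and an integer exponent: x = mantissa * 2**exponent.
mantissa, exponent = math.frexp(x)

# Rebuild the original number from its two components.
rebuilt = math.ldexp(mantissa, exponent)
```

For `6.0` this gives a mantissa of `0.75` and an exponent of `3`, since 0.75 * 2^3 = 6.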

(more…)
