As explained in my introduction to Machine Learning quantization,
important restrictions apply to operations performed on quantized inputs.
First, the integer mantissas of quantized inputs can only be added if they are expressed in the same scale.
This comes from the representation of the quantized numbers:
$a = (n - zeropoint_a) * scale_a$
$b = (m - zeropoint_b) * scale_b$
The integer mantissas of $a$ and $b$ can only be added if $scale_a == scale_b$, which allows us to write directly:
$a + b = (n - zeropoint_a + m - zeropoint_b) * scale_a$
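
To make the formula concrete, here is a minimal Python sketch of a same-scale addition. The `Quantized` class and the `add_same_scale` function are hypothetical names chosen for illustration, not part of any library, and the result is stored with a zero-point of 0, which is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quantized:
    """Hypothetical container: value = (mantissa - zeropoint) * scale."""
    mantissa: int
    zeropoint: int
    scale: float

    def dequantize(self) -> float:
        return (self.mantissa - self.zeropoint) * self.scale


def add_same_scale(a: Quantized, b: Quantized) -> Quantized:
    # Adding the integer mantissas is only meaningful if the scales match.
    if a.scale != b.scale:
        raise ValueError("inputs must be expressed in the same scale")
    # (n - zeropoint_a) + (m - zeropoint_b), kept as a pure integer sum.
    # Assumption: the result uses a zero-point of 0 for simplicity.
    mantissa = (a.mantissa - a.zeropoint) + (b.mantissa - b.zeropoint)
    return Quantized(mantissa=mantissa, zeropoint=0, scale=a.scale)


a = Quantized(mantissa=12, zeropoint=2, scale=0.5)  # represents 5.0
b = Quantized(mantissa=7, zeropoint=1, scale=0.5)   # represents 3.0
total = add_same_scale(a, b)
print(total.mantissa, total.dequantize())
```

Only integer arithmetic is involved in the sum itself; the scale is carried along unchanged and applied once when the result is converted back to a float.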
Intuitively, this is analogous to saying that you cannot add two quantities expressed in different units (like bytes and kilobytes) without first converting one
representation to the other.
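
The bytes and kilobytes analogy can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the scales are integers, the zero-points are 0, and the coarser scale is an exact multiple of the finer one. The `rescale` helper is a hypothetical name, not a library function.

```python
def rescale(mantissa: int, scale: int, target_scale: int) -> int:
    """Re-express an integer mantissa (zero-point 0) in target_scale.

    Assumption: scale is an exact multiple of target_scale, so the
    conversion stays in integer arithmetic without rounding.
    """
    if scale % target_scale != 0:
        raise ValueError("scale must be a multiple of target_scale")
    return mantissa * (scale // target_scale)


KILOBYTE = 1024  # scale of the first quantity, in bytes
BYTE = 1         # scale of the second quantity, in bytes

kilobytes = 3    # 3 * 1024 bytes
some_bytes = 512  # 512 * 1 bytes

# Adding 3 + 512 directly would be meaningless: the units differ.
# Convert the kilobyte mantissa to bytes first, then add.
total_bytes = rescale(kilobytes, KILOBYTE, BYTE) + some_bytes
print(total_bytes)
```

Once both mantissas are expressed in bytes, the plain integer sum is valid and the shared scale applies to the result.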