<?xml version="1.0"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Kaizou</title>
    <link>http://www.kaizou.org/</link>
    <atom:link href="http://www.kaizou.org/feed/index.xml" rel="self" type="application/rss+xml" />
    <description>Technology blog about Web development and Open Source</description>
    <language>en-us</language>
    <pubDate>Sun, 11 Jun 2023 10:25:07 +0000</pubDate>
    <lastBuildDate>Sun, 11 Jun 2023 10:25:07 +0000</lastBuildDate>

    
    <item>
      <title>
          <![CDATA[
          Aligning quantization scales before incompatible operations
          ]]>
      </title>
      <link>http://www.kaizou.org/2023/05/quantization-scales-alignment.html</link>
      <pubDate>Tue, 30 May 2023 12:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2023/05/quantization-scales-alignment</guid>
      <description>
          <![CDATA[
          <p>As explained in my introduction to <a href="/2023/05/ml-quantization-introduction#quantized-linear-operations">Machine Learning quantization</a>,
 important restrictions apply to operations performed on quantized inputs.</p>

<p>First, additions between the integer mantissas of quantized inputs can only be performed if they share the same scale.</p>

<p>This comes from the representation of the quantized numbers:</p>

<p>$a = (n - zeropoint_a) * scale_a$</p>

<p>$b = (m - zeropoint_b) * scale_b$</p>

<p>The integer mantissas of $a$ and $b$ can only be added if $scale_a == scale_b$, allowing us to write directly:</p>

<p>$a + b = (n - zeropoint_a + m - zeropoint_b) * scale_a$</p>

<p>Intuitively, this is analogous to saying that you cannot add two quantities expressed in different units (like bytes and kilobytes) without first converting one
representation to the other.</p>

<!--more-->

<p>The same kind of restriction can also be extended to operations that combine the channels of the inputs, such as the Matrix Multiplication or the
Convolution.</p>

<p>For such operations, the channels must all be in the same scale: in other words, the inputs of these operations must be quantized per-tensor.</p>

<p>The first restriction is a major issue for all Machine Learning models that are not purely sequential. In other words, it is a major issue for virtually all models
of the 2020s, as they all include parallel branches that are eventually merged with an addition layer.</p>

<p>The second restriction used to be rather harmless: most models used to have very homogeneous activations, allowing a lossless quantization to 8-bit per-tensor.</p>

<p>This changed with the introduction of Transformer models, whose activation ranges can vary by a factor of up to 100 between channels, making
per-tensor quantization less efficient.</p>

<p>On devices that support float arithmetic, not being able to use the integer mantissa directly is hardly a problem, except maybe for efficiency.</p>

<p>On devices supporting only integer arithmetic, this is a serious issue.</p>

<p>In the next paragraphs I will detail a method to align inputs using only integer operations.</p>

<h2 id="explicitly-apply-input-scale-using-fixed-point-arithmetics">Explicitly apply input scale using fixed-point arithmetics</h2>

<p>In a previous post, I introduced the <a href="/2023/05/quantization-fixed-point">fixed-point representation</a> and explained how it relates to quantization.</p>

<p>Going back to our problem, we see immediately that if the scales of the inputs were power-of-two’s, then the inputs
could be interpreted as fixed-point numbers, and it would become trivial to align them.</p>

<p>Here comes the trick: it is actually not that difficult to obtain a fixed-point representation of the inputs, even
with a scale that is not a power-of-two.</p>

<p>As a reminder, a quantized number is represented as:</p>

<p>$x = (n - zeropoint) * scale$</p>

<p>Our goal here is to obtain a fixed-point representation of $x$.</p>

<p>The thing is: fixed-point arithmetic operations produce fixed-point numbers, and the first term is already an 8-bit integer,
i.e. a fixed-point with zero fractional bits, so all we have to do is to make sure the scale is a fixed-point number.</p>

<p>Since the inputs are quantized to 8-bit anyway, an 8-bit mantissa is enough to accurately represent a <code class="language-plaintext highlighter-rouge">float32</code> scale, so
we only need to keep the 8 most significant bits of the scale mantissa.</p>

<p>You can refer to this <a href="/2023/05/quantization-fixed-point">fixed-point conversion algorithm</a> for an example of how we can
convert the scale to a fixed-point representation.</p>

<p>Now that we have a fixed-point representation of the scale as:</p>

<p>$scale \approx i_s . 2^{-fracbits_s}$</p>

<p>We can derive an approximated fixed-point representation of $x$:</p>

<p>$x \approx ((n - zeropoint) * i_s). 2^{-fracbits_s}$</p>

<p>Due to the multiplication of the two integers, this representation has a higher bitwidth than the original quantized
number, but this should not be an issue since the resulting mantissa only needs to be calculated when the operation is
performed, using an intermediate buffer with a larger bitwidth.</p>

<blockquote>
  <p>Note: If that is an issue, then it could still be reduced using a right bitshift whose magnitude would be evaluated using the
calibration information.</p>
</blockquote>

<h2 id="align-inputs-explicitly-after-converting-them-to-fixed-point">Align inputs explicitly after converting them to fixed-point</h2>

<p>Using the fixed-point scales obtained as specified in the previous paragraph, it is now possible to align
inputs expressed with different scales.</p>

<p>$a \approx ((n - zeropoint_a) * p) . 2^{-fracbits_a} = a_i . 2^{-fracbits_a}$</p>

<p>$b \approx ((m - zeropoint_b) * q) . 2^{-fracbits_b} = b_i . 2^{-fracbits_b}$</p>

<p>At quantization time, we can evaluate channel-wise the maximum number of fractional bits for the two inputs we
want to combine and produce two relative shifts to be applied to each one of them:</p>

<p>$maxfracbits = max(fracbits_a, fracbits_b)$</p>

<p>$shift_a = maxfracbits - fracbits_a$</p>

<p>$shift_b = maxfracbits - fracbits_b$</p>

<p>Then the sequence of operations before the addition is to:</p>

<ul>
  <li>convert the inputs’ integer mantissas to a fixed-point representation:</li>
</ul>

<p>$a_i = (n - zeropoint_a) * p$</p>

<p>$b_i = (m - zeropoint_b) * q$</p>

<ul>
  <li>align the resulting fixed-point:</li>
</ul>

<p>$a_i = a_i << shift_a$</p>

<p>$b_i = b_i << shift_b$</p>

<ul>
  <li>perform the integer addition</li>
</ul>

<p>$s_i = a_i + b_i$</p>

<p>This produces a fixed-point tensor with an implicit scale of $2^{-maxfracbits}$.</p>
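<p>To make the procedure concrete, below is a minimal Python sketch of the whole alignment on scalar inputs. The quantization parameters and the simplified <code>to_fixed_point</code> helper are purely illustrative:</p>

```python
import numpy as np

def to_fixed_point(x, bitwidth=8):
    """Return (mantissa, frac_bits) such that x ~= mantissa * 2**-frac_bits."""
    frac_bits = bitwidth - int(np.ceil(np.log2(abs(x))))
    return int(round(x * 2.0 ** frac_bits)), frac_bits

# Illustrative quantized inputs: (integer mantissa, zero-point, float scale)
n, zp_a, scale_a = 100, 5, 0.043
m, zp_b, scale_b = 50, -3, 0.017

# At quantization time: convert each scale to fixed-point
p, frac_a = to_fixed_point(scale_a)   # scale_a ~= p * 2**-frac_a
q, frac_b = to_fixed_point(scale_b)   # scale_b ~= q * 2**-frac_b
maxfrac = max(frac_a, frac_b)

# At inference time: integer-only conversion, alignment and addition
a_i = (n - zp_a) * p                  # fixed-point, frac_a fractional bits
b_i = (m - zp_b) * q                  # fixed-point, frac_b fractional bits
s_i = (a_i << (maxfrac - frac_a)) + (b_i << (maxfrac - frac_b))

# s_i * 2**-maxfrac approximates the float sum
s_float = (n - zp_a) * scale_a + (m - zp_b) * scale_b
assert abs(s_i * 2.0 ** -maxfrac - s_float) < 1e-2
```

<p>Only the two multiplications and shifts happen at inference time: the fixed-point scales and the shift amounts are evaluated once, at quantization time.</p>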

<p>This additional scale needs to be taken into account when quantizing the outputs of the addition.</p>

<p>Mathematically, this means that the scale of the outputs obtained after calibration must be multiplied
by $2^{-maxfracbits}$.</p>

<blockquote>
  <p>Note: as mentioned in a previous note, I will explain in another post how this can be achieved using integer arithmetics
only.</p>
</blockquote>

<h2 id="generalization-to-per-axis-inputs">Generalization to per-axis inputs</h2>

<p>The same kind of alignment can be applied to inputs quantized per-axis when reaching an operation that requires
per-tensor inputs.</p>

<p>The only difference is that the maximum number of fractional bits is a scalar value corresponding to the aligned
per-tensor scale:</p>

<p>$maxfracbits = max(fracbits_a)$</p>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Resolve quantization scales after an operation
          ]]>
      </title>
      <link>http://www.kaizou.org/2023/05/quantization-scale-out.html</link>
      <pubDate>Mon, 29 May 2023 12:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2023/05/quantization-scale-out</guid>
      <description>
          <![CDATA[
          <p>As explained in my introduction to <a href="/2023/05/ml-quantization-introduction#quantized-linear-operations">Machine Learning quantization</a>,
 the inputs, weights and outputs of a quantized operation are quantized each with a different scale.</p>

<p>In the same post, I explain how these scales can be folded into a single output scale, allowing the operation to be performed on the integer mantissa
of the quantized inputs and weights:</p>

<p>$scale_{folded} = \frac{scale_{out}}{scale_{in} . scale_{w}}$</p>

<p>In <a href="/2023/05/quantization-scales-alignment">another post</a> I explain how heterogeneous input scales can be converted to a fixed-point representation
and aligned before the operation, resulting in yet another implicit scale expressed as a power-of-two that needs to be applied to the output scale.</p>

<p>In this post I explain how these output scales can be applied using integer arithmetics only.</p>

<!--more-->

<h2 id="reminder-how-are-output-scales-applied-in-a-quantized-graph">Reminder: how are output scales applied in a quantized graph</h2>

<p>As a general principle, the last step of a quantized operation is a downscale to reduce the output bitwidth.</p>

<p>When applied to float outputs, the general formula for the downscale is:</p>

<p>$outputs_{uint8} = saturate(round(\frac{outputs_{float32}}{scale_{out}}) + zp_{out})$</p>

<p>For a quantized output of scale $scale_{out}$ and zero-point $zp_{out}$.</p>

<p>As explained in my <a href="/2023/05/ml-quantization-introduction#quantized-linear-operations">quantization introduction</a>,
some compatible operations can be applied directly on the integer mantissa of the quantized inputs and weights,
folding the inputs and weights scale into the output scale.</p>

<p>The downscale operation then becomes:</p>

<p>$outputs_{uint8} = saturate(round(\frac{outputs_{int32}}{scale_{folded}}) + zp_{out})$</p>

<p>with $scale_{folded} = \frac{scale_{out}}{scale_{in} . scale_{w}}$</p>

<p>This operation still requires a division and a rounding that are not easily implemented using integer arithmetic operators.</p>

<h2 id="use-fixed-point-folded-scale-reciprocal-to-obtain-rescaled-fixed-point-outputs">Use fixed-point folded scale reciprocal to obtain rescaled fixed-point outputs</h2>

<p>The idea is to convert the scale to a fixed-point representation to be able to take advantage of integer arithmetic operators
and obtain a fixed-point representation of the downscaled outputs.</p>

<p>Since the fixed-point division is a lossy operation, instead of dividing by the folded output scale, we can multiply by its reciprocal $\frac{1}{scale_{folded}}$.</p>

<p>The first step is to obtain a fixed-point representation of the reciprocal of the folded scale:</p>

<p>$rec_{folded} = to\_fixed\_point(\frac{scale_{in}.scale_{w}}{scale_{out}}) = rec_{int} . 2^{-fracbits_{rec}}$</p>

<p>You can refer to this <a href="/2023/05/quantization-fixed-point">fixed-point conversion algorithm</a> for an example of how we can
convert the scale to a fixed-point representation.</p>

<p>Then the rescaled outputs are simply evaluated as:</p>

<p>$outputs_{int32} = outputs_{int32} . rec_{int}$</p>

<h2 id="reduce-the-precision-of-the-fixed-point-rescaled-outputs-using-a-rounded-right-shift">Reduce the precision of the fixed-point rescaled outputs using a rounded right-shift</h2>

<p>The rescaled outputs are represented as a fixed-point number with an implicit scale of $2^{-fracbits_{rec}}$.</p>

<p>To obtain the actual 8-bit integer values corresponding to the original downscale operation, we must apply this implicit
scale.</p>

<p>We use the rounded right-shift operation described in the <a href="/2023/05/quantization-fixed-point">fixed-point introduction post</a>:</p>

<p>$outputs_{int8} = (outputs_{int32} + 2^{fracbits_{rec} - 1}) >> fracbits_{rec}$</p>

<p>Then we can apply the zero-point:</p>

<p>$outputs_{uint8} = saturate(outputs_{int8} + zp_{out})$</p>
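<p>Putting the three steps together, here is a minimal <code>numpy</code> sketch of the integer-only downscale. The scales and accumulator values are illustrative, and the compact <code>to_fixed_point</code> helper stands in for the full conversion algorithm:</p>

```python
import numpy as np

def to_fixed_point(x, bitwidth=8):
    """Return (mantissa, frac_bits) such that x ~= mantissa * 2**-frac_bits."""
    frac_bits = bitwidth - int(np.ceil(np.log2(abs(x))))
    return int(round(x * 2.0 ** frac_bits)), frac_bits

# Illustrative calibration results
scale_in, scale_w, scale_out = 0.05, 0.002, 0.1
zp_out = 3

# Offline: fixed-point reciprocal of the folded scale
rec = scale_in * scale_w / scale_out               # 1 / scale_folded
rec_int, frac_rec = to_fixed_point(rec)

# At inference, on the int32 accumulator outputs:
outputs_int32 = np.array([12345, -6789], dtype=np.int64)
rescaled = outputs_int32 * rec_int                 # fixed-point, frac_rec bits
rounded = (rescaled + (1 << (frac_rec - 1))) >> frac_rec   # rounded right-shift
outputs_uint8 = np.clip(rounded + zp_out, 0, 255)

# The result matches the float downscale formula on these values
reference = np.clip(np.round(outputs_int32 * rec) + zp_out, 0, 255)
assert (outputs_uint8 == reference).all()
```

<p>Note that only the multiplication, addition and shift appear in the inference path: the reciprocal conversion happens once, offline.</p>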

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Fixed-point representation for quantization
          ]]>
      </title>
      <link>http://www.kaizou.org/2023/05/quantization-fixed-point.html</link>
      <pubDate>Fri, 26 May 2023 12:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2023/05/quantization-fixed-point</guid>
      <description>
          <![CDATA[
          <p>As explained in my introduction to <a href="/2023/05/ml-quantization-introduction.html#quantized-linear-operations">Machine Learning quantization</a>,
 the quantization of a ML model produces a graph of operations applied on quantized tensors.</p>

<p>Quantized tensors are actually integer tensors that share the same float scale and integer zero-point.</p>

<p>The implementation of the quantized operations is device-specific.</p>

<p>One of the main design decisions is how the input, weight and output float scales are propagated and applied in the quantized graph.</p>

<p>In two other posts I will explain how it is possible to use integer arithmetic operators for that purpose if the scales are represented
as fixed-point numbers.</p>

<p>This post is a brief introduction to the fixed-point representation and to the fixed-point arithmetic operators.</p>

<!--more-->

<h2 id="fixed-point-representation">Fixed-point representation</h2>

<p>Before the introduction of the floating point representation, decimal values were expressed using a fixed-point representation.</p>

<p>This representation also uses a mantissa and an exponent, but the latter is implicit: it defines the number of bits in the mantissa
dedicated to the fractional part of the number.</p>

<p>The minimum non-zero value that can be represented for a given number of fractional bits is $2^{-fracbits}$.</p>

<p>For instance, with three fractional bits, the smallest non-zero float number that can be represented is $2^{-3} = 0.125$.</p>

<p>Below is an example of an unsigned 8-bit fixed-point number with 4 fractional bits.</p>

<pre class="diagram">
.------------------------------------.  
|  0   1   0   1 |  1   1   1    0   |
.------------------------------------.  
|  integer bits  |  fractional bits  |
.------------------------------------.  
|  3   2   1   0 | -1  -2  -3   -4   |
'------------------------------------'  
</pre>

<p>The decimal value of that number is: $2^{2} + 2^{0} + 2^{-1} + 2^{-2} + 2^{-3} = 5.875$</p>

<p>The precision of the representation is directly related to the number of fractional bits.</p>

<p>Below are some more examples of PI represented with unsigned 8-bit fixed-point numbers using different numbers of fractional bits:</p>

<table>
  <thead>
    <tr>
      <th>float</th>
      <th>frac_bits</th>
      <th>mantissa</th>
      <th>binary</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>3.140625</td>
      <td>6</td>
      <td>201</td>
      <td>11001001</td>
    </tr>
    <tr>
      <td>3.15625</td>
      <td>5</td>
      <td>101</td>
      <td>01100101</td>
    </tr>
    <tr>
      <td>3.125</td>
      <td>4</td>
      <td>50</td>
      <td>00110010</td>
    </tr>
    <tr>
      <td>3.125</td>
      <td>3</td>
      <td>25</td>
      <td>00011001</td>
    </tr>
    <tr>
      <td>3.25</td>
      <td>2</td>
      <td>13</td>
      <td>00001101</td>
    </tr>
    <tr>
      <td>3.0</td>
      <td>1</td>
      <td>6</td>
      <td>00000110</td>
    </tr>
  </tbody>
</table>

<h2 id="obtaining-a-fixed-point-representation-of-a-float">Obtaining a fixed-point representation of a float</h2>

<p>As a reminder, a float number is represented as:</p>

\[x = mantissa * 2^{exponent}\]

<p>Our goal here is to obtain a fixed-point representation of $x$.</p>

<p>Technically, we could directly take the float mantissa, but it is 24-bit, with a high risk of overflows in the downstream
fixed-point operations.</p>

<p>For the range of numbers used in Machine Learning, an 8-bit mantissa is usually enough to accurately represent a <code class="language-plaintext highlighter-rouge">float32</code> number.</p>

<p>As a consequence, we only need to keep the 8 most significant bits of the mantissa, which effectively means quantizing the float to
8-bit with the power-of-two scale that minimizes the precision loss.</p>

<p>This can be achieved in several ways depending on the level of abstraction you are comfortable with: below is an algorithm
relying only on high-level mathematical operations.</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
<span class="kn">import</span> <span class="nn">numpy</span> <span class="k">as</span> <span class="n">np</span>

<span class="k">def</span> <span class="nf">to_fixed_point</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">bitwidth</span><span class="p">,</span> <span class="n">signed</span><span class="o">=</span><span class="bp">True</span><span class="p">):</span>
    <span class="s">"""Convert a number to a FixedPoint representation

    The representation is composed of a mantissa and an implicit exponent expressed as
    a number of fractional bits, so that:

    x ~= mantissa . 2 ** -frac_bits

    The mantissa is an integer whose bitwidth and signedness are specified as parameters.

    Args:
        x: the source number or array
        bitwidth: the bitwidth of the mantissa
        signed: whether the mantissa is signed (defaults to True)

    Returns:
        the integer mantissa and the number of fractional bits
    """</span>
    <span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">x</span><span class="p">,</span> <span class="n">np</span><span class="p">.</span><span class="n">ndarray</span><span class="p">):</span>
        <span class="n">x</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">array</span><span class="p">(</span><span class="n">x</span><span class="p">)</span>
    <span class="c1"># Evaluate the number of bits available for the mantissa
</span>    <span class="n">mantissa_bits</span> <span class="o">=</span> <span class="n">bitwidth</span> <span class="o">-</span> <span class="mi">1</span> <span class="k">if</span> <span class="n">signed</span> <span class="k">else</span> <span class="n">bitwidth</span>
    <span class="c1"># Evaluate the number of bits required to represent the whole part of x
</span>    <span class="c1"># as the power of two enclosing the absolute value of x
</span>    <span class="c1"># Note that it can be negative if x &lt; 0.5
</span>    <span class="n">whole_bits</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">ceil</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">log2</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">abs</span><span class="p">(</span><span class="n">x</span><span class="p">))).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">int32</span><span class="p">)</span>
    <span class="c1"># Deduce the number of bits required for the fractional part of x
</span>    <span class="c1"># Note that it can be negative if the whole part exceeds the mantissa
</span>    <span class="n">frac_bits</span> <span class="o">=</span> <span class="n">mantissa_bits</span> <span class="o">-</span> <span class="n">whole_bits</span>
    <span class="c1"># Evaluate the 'scale', which is the smallest value that can be represented (as 1)
</span>    <span class="n">scale</span> <span class="o">=</span> <span class="mf">2.</span> <span class="o">**</span> <span class="o">-</span><span class="n">frac_bits</span>
    <span class="c1"># Evaluate the minimum and maximum values for the mantissa
</span>    <span class="n">mantissa_min</span> <span class="o">=</span> <span class="o">-</span><span class="mi">2</span> <span class="o">**</span> <span class="n">mantissa_bits</span> <span class="k">if</span> <span class="n">signed</span> <span class="k">else</span> <span class="mi">0</span>
    <span class="n">mantissa_max</span> <span class="o">=</span> <span class="mi">2</span> <span class="o">**</span> <span class="n">mantissa_bits</span> <span class="o">-</span> <span class="mi">1</span>
    <span class="c1"># Evaluate the mantissa by quantizing x with the scale, clipping to the min and max
</span>    <span class="n">mantissa</span> <span class="o">=</span> <span class="n">np</span><span class="p">.</span><span class="n">clip</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="nb">round</span><span class="p">(</span><span class="n">x</span> <span class="o">/</span> <span class="n">scale</span><span class="p">),</span> <span class="n">mantissa_min</span><span class="p">,</span> <span class="n">mantissa_max</span><span class="p">).</span><span class="n">astype</span><span class="p">(</span><span class="n">np</span><span class="p">.</span><span class="n">int32</span><span class="p">)</span>
    <span class="k">return</span> <span class="n">mantissa</span><span class="p">,</span> <span class="n">frac_bits</span>


</code></pre></div></div>

<p>The algorithm above produces a fixed-point representation of $x$ such that:</p>

\[x_{float} \approx x_{int} . 2^{-x_{fracbits}}\]
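<p>As a quick sanity check, the snippet below (a condensed, self-contained copy of the function above) recovers the first row of the PI table:</p>

```python
import numpy as np

def to_fixed_point(x, bitwidth, signed=True):
    # Condensed copy of the conversion algorithm above
    mantissa_bits = bitwidth - 1 if signed else bitwidth
    whole_bits = np.ceil(np.log2(np.abs(x))).astype(np.int32)
    frac_bits = mantissa_bits - whole_bits
    scale = 2.0 ** -frac_bits
    mantissa_min = -2 ** mantissa_bits if signed else 0
    mantissa_max = 2 ** mantissa_bits - 1
    mantissa = np.clip(np.round(x / scale), mantissa_min, mantissa_max).astype(np.int32)
    return mantissa, frac_bits

mantissa, frac_bits = to_fixed_point(np.pi, 8, signed=False)
# pi ~= 201 * 2**-6 = 3.140625, the first row of the PI table above
```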

<h2 id="fixed-point-addition-or-subtraction">Fixed-point addition (or subtraction)</h2>

<p>The reason why the fixed-point representation comes to mind when it comes to quantization is that it has exactly the same
restrictions regarding the addition of numbers: they must be expressed using the same amount of fractional bits.</p>

<p>The addition can then be performed directly on the underlying integer.</p>

<p>The resulting sum is a fixed-point number with the same fractional bits. It is exact unless it overflows.</p>

<p>What is really interesting here is that the alignment of fixed-point numbers is trivial: it can just be performed
using a left bitshift.</p>

<p>Example:</p>

<p>The following fixed-point (values, fractional bits) pairs represent the following float values:</p>

<p>$a: (84, 3) = 84 * 2^{-3} = 10.5$</p>

<p>$b: (113, 4) = 113 * 2^{-4} = 7.0625$</p>

<p>Before summing $a$ and $b$, we need to shift $a$ to the left to align it with $b$:</p>

<p>$s = a + b = (84 << 1) + 113 = 168 + 113 = 281$</p>

<p>The sum is a fixed-point number with 4 fractional bits:</p>

<p>$s: (281, 4) = 281 * 2^{-4} = 17.5625$</p>
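<p>The same example, replayed in a few lines of Python for clarity:</p>

```python
# a: (84, 3) = 10.5 and b: (113, 4) = 7.0625
a, frac_a = 84, 3
b, frac_b = 113, 4

# Align both integers to the larger number of fractional bits, then add
maxfrac = max(frac_a, frac_b)
s = (a << (maxfrac - frac_a)) + (b << (maxfrac - frac_b))

# s = 281 with 4 fractional bits, i.e. 281 * 2**-4 = 17.5625
```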

<h2 id="fixed-point-multiplication">Fixed-point multiplication</h2>

<p>The multiplication of two fixed-point numbers can be performed directly on the underlying integer numbers.</p>

<p>The resulting product is a fixed-point number with a number of fractional bits corresponding to the sum of the fractional bits of the inputs. It is exact unless it overflows.</p>

<p>Example:</p>

<p>Going back to our two numbers:</p>

<p>$a: (84, 3) = 84 * 2^{-3} = 10.5$</p>

<p>$b: (113, 4) = 113 * 2^{-4} = 7.0625$</p>

<p>Their fixed-point product is:</p>

<p>$p = a.b = (84 . 113, 3 + 4) = (9492, 7) = 74.15625$</p>

<h2 id="fixed-point-downscale">Fixed-point downscale</h2>

<p>The mantissa of the product of two fixed-point numbers can grow very quickly, which would eventually lead to an overflow when chaining multiple operations.</p>

<p>It is therefore common to ‘downscale’ the result of a multiplication using a right-shift.</p>

<p>Example:</p>

<p>Going back to our previous product:</p>

<p>$p = a.b = (84 . 113, 3 + 4) = (9492, 7) = 74.15625$</p>

<p>It can be downscaled to fit in 8-bit by shifting right and adjusting the fractional bits:</p>

<p>$downscale(p) = p >> 6 = (148, 1) = 74$</p>

<p>Note that the right-shift operation always performs a floor, which may lead to a loss of precision.</p>

<p>For that reason, it is often implemented as a ‘rounded’ right-shift by adding $2^{n-1}$ before shifting by $n$.</p>
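<p>A short Python sketch of the product and its downscale, with the numbers used above:</p>

```python
a, frac_a = 84, 3     # 10.5
b, frac_b = 113, 4    # 7.0625

# Fixed-point product: multiply the integers, add the fractional bits
p, frac_p = a * b, frac_a + frac_b    # (9492, 7) = 74.15625

# Downscale by 6 bits with a plain (flooring) right-shift
floored = p >> 6                      # (148, 1) = 74.0

# Rounded right-shift: add 2**(6 - 1) before shifting
rounded = (p + (1 << 5)) >> 6         # here the fractional remainder is below
                                      # one half, so the result is unchanged
```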

<blockquote>
  <p>Note: this is mathematically equivalent to adding $0.5$ to $\frac{x}{2^{n}}$ before taking its floor.</p>
</blockquote>

<h2 id="fixed-point-division">Fixed-point division</h2>

<p>The division of two fixed-point numbers can be performed directly on the underlying integer numbers.</p>

<p>The resulting quotient is a fixed-point number with a number of fractional bits corresponding to the subtraction of the fractional bits of the inputs. It is usually not exact.</p>

<p>Example:</p>

<p>Going back to our two numbers:</p>

<p>$a: (84, 3) = 84 * 2^{-3} = 10.5$</p>

<p>$b: (113, 4) = 113 * 2^{-4} = 7.0625$</p>

<p>Their fixed-point division is:</p>

<p>$p = \frac{b}{a} = (\frac{113}{84}, 4 - 3) = (1, 1) = 0.5$</p>

<p>A possible mitigation is to left-shift the dividend before the division to increase its precision: the resulting quotient will in turn have an increased precision.</p>

<p>$b: (113, 4) << 3 = (113 << 3, 4 + 3) = (904, 7) = 904 * 2^{-7} = 7.0625$</p>

<p>$p = \frac{b}{a} = (\frac{904}{84}, 7 - 3) = (10, 4) = 0.625$</p>
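<p>The two division examples, replayed in Python:</p>

```python
a, frac_a = 84, 3     # 10.5
b, frac_b = 113, 4    # 7.0625

# Naive fixed-point division: divide the integers, subtract the fractional bits
q_naive = (b // a, frac_b - frac_a)                # (1, 1) = 0.5, very imprecise

# Left-shift the dividend first to increase the quotient precision
shift = 3
q = ((b << shift) // a, frac_b + shift - frac_a)   # (10, 4) = 0.625
```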

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          A brief introduction to Machine Learning models quantization
          ]]>
      </title>
      <link>http://www.kaizou.org/2023/05/machine-learning-quantization-introduction.html</link>
      <pubDate>Thu, 25 May 2023 12:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2023/05/machine-learning-quantization-introduction</guid>
      <description>
          <![CDATA[
          <p>Even before the development of Large Language Models (LLM), the increasing
memory and computing requirements of Deep Neural Networks (DNN) has been a concern.</p>

<p>Functionally, DNN are graphs of arithmetic operations: the inputs are fed at the
stem and the chain of operations produces the outputs at the head.</p>

<p>From an implementation perspective, the operations are performed on floating point
numbers, which are a digital representation of decimal numbers composed of a mantissa and an
exponent:</p>

\[x = mantissa . 2^{exponent}\]

<!--more-->

<p>The 32-bit floating point representation is the most common, as it can represent
numbers in a range that is sufficient for most operations. The <code class="language-plaintext highlighter-rouge">float32</code> mantissa is composed of
24 bits (including the sign), and the exponent is 8-bit.</p>

<p>Each operation performed at an operating node in the inference device requires its inputs
to be transferred from either a static memory location or the previous processing nodes.</p>

<p>The cost of these transfers adds up to the cost of the operations themselves.</p>

<p>The DNN terminology for operation data is “weights” for static inputs and “activations” for dynamic inputs/outputs.</p>

<p>Note: the outputs of an operation are designated as “activations” even if the operation is not actually an activation function.</p>

<p>The process of representing the n-bit weights and activations of a DNN with a smaller
number of bits is called quantization<sup id="fnref:quant" role="doc-noteref"><a href="#fn:quant" class="footnote" rel="footnote">1</a></sup>.</p>

<p>It is typically used in DNN to “quantize” <code class="language-plaintext highlighter-rouge">float32</code> weights and activations into 8-bit integers.</p>

<p>This brings several benefits:</p>
<ul>
  <li>reducing the weights to 8-bit requires 4 times less memory on the device to store them,</li>
  <li>reducing the activations to 8-bits reduces the amount of data exchanged between nodes, which impacts latency,</li>
  <li>using 8-bit instead of 32-bit inputs for an operation improves vectorization (multiple data processed at the same time for a single operation),</li>
  <li>all standard integer arithmetic operations but the division are faster than their floating point counterpart,</li>
  <li>GPU devices may include specific mechanisms to process 8-bit inputs (like NVIDIA’s 8-bit Tensor cores).</li>
</ul>

<h2 id="a-mathematical-formulation-of-linear-quantization">A mathematical formulation of linear quantization</h2>

<p>The most widespread type of quantization is the <em>linear</em> or <em>affine</em> quantization scheme first introduced in tensorflow lite<sup id="fnref:qtf" role="doc-noteref"><a href="#fn:qtf" class="footnote" rel="footnote">2</a></sup>.</p>

<p>The representation of a linearly quantized number is composed of:</p>
<ul>
  <li>an integer mantissa,</li>
  <li>a float scale,</li>
  <li>an integer zero-point.</li>
</ul>

\[x = (mantissa - zeropoint).scale\]

<p>The scale is used to project back the integer numbers into a float representation.</p>

<p>The zero point corresponds to the value that zero takes in the target representation.</p>

<p>Comparing that formula with the floating point representation, one can see
immediately that each floating point number can be represented exactly with the same
mantissa, a scale corresponding to the exponent and a null zero-point.</p>

<p>Of course, this representation would be very inefficient, because it would require two
integers and a float to represent each number.</p>

<h2 id="applicability-of-quantization-to-machine-learning">Applicability of quantization to Machine-Learning</h2>

<p>When quantizing Machine-Learning models, one can take advantage of the fact that
training produces weights and activations that stay within reasonably stable ranges
for a given operation.</p>

<p>This comes from several empirical techniques used to improve convergence:</p>
<ul>
  <li>weights initialization<sup id="fnref:qinit" role="doc-noteref"><a href="#fn:qinit" class="footnote" rel="footnote">3</a></sup>,</li>
  <li>weights and/or activation regularization<sup id="fnref:qreg" role="doc-noteref"><a href="#fn:qreg" class="footnote" rel="footnote">4</a></sup>,</li>
  <li>explicit normalization layers<sup id="fnref:qbn" role="doc-noteref"><a href="#fn:qbn" class="footnote" rel="footnote">5</a></sup>.</li>
</ul>

<p>This means that the weights and activations tensors for a specific operation can be represented
using the same scale and zero-point, thus leading to a very compact representation.</p>

<blockquote>
  <p>Note: this is why quantization is often categorized as a form of compression, although unlike most
compression techniques, it produces numbers that can be directly used for arithmetic operations.</p>
</blockquote>

<p>There are various subtypes of quantization.</p>

<p>The first two subtypes are related to the dimensions of the scale and zero-point:</p>
<ul>
  <li><em>per-tensor</em> quantization uses a single scalar value for scale and zero-point for a whole
tensor of weights or activations,</li>
  <li><em>per-axis</em> quantization uses a vector of scales and zero-points whose length corresponds
to a single axis of the tensor (typically the <em>channels</em> or <em>embeddings</em> axis).</li>
</ul>

<p>The second pair of subtypes relates to the <em>symmetry</em> of the resulting quantized numbers:</p>
<ul>
  <li><em>symmetric</em> quantization assumes that the quantization range is symmetric, which leads to a zero-point equal
to zero and a signed integer representation of the values,</li>
  <li><em>asymmetric</em> quantization does not assume anything, and zero-point is typically non-null.</li>
</ul>

<p>Weights are typically quantized symmetrically per-axis.</p>

<p>Activations are typically quantized asymmetrically, most of the time per-tensor.</p>

<h2 id="quantizing-a-float-tensor">Quantizing a float tensor</h2>

<p>The first step to quantize a float tensor is to choose the quantization range, i.e. the
minimum and maximum float values one wants to represent: $[Min, Max]$.</p>

<p>Since the weights are constant tensors, they are typically quantized using the minimum and maximum
values of the tensor, globally or along the channel axis.</p>

<p>Evaluating the quantization range of the activations is more difficult, as they depend on the inputs
of the previous operation. Their range is therefore evaluated globally inside a model, as explained in the next
paragraph.</p>

<p>For a target bitwidth of $n$ bits for the mantissa, one evaluates the scale as:</p>

\[scale = \frac{Max - Min}{2^n - 1}\]

<p>The zero-point is then deduced from the scale to make sure that $Min$ is mapped to the
lowest integer value and $Max$ to the highest integer value.</p>

<p>This leads to the following formulas for signed/unsigned representations:</p>

<ul>
  <li>unsigned: $zeropoint = -round(\frac{Min}{scale})$</li>
  <li>signed: $zeropoint = -round(\frac{Min}{scale}) - 2^{n - 1}$</li>
</ul>

<p>The quantization of a float tensor is then:</p>

\[mantissa = saturate(round(\frac{x}{scale}) + zeropoint)\]

<p>Again, the saturation depends on the signedness of the target representation:</p>
<ul>
  <li>unsigned: $[0, 2^n - 1]$,</li>
  <li>signed: $[-2^{n-1}, 2^{n-1} - 1]$.</li>
</ul>

<p>Note that the zero-point always has the same signedness as the mantissa.</p>
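<p>Putting the formulas of this section together, here is a minimal NumPy sketch of tensor quantization and dequantization (function names are mine, not a specific framework's API):</p>

```python
import numpy as np

def quantize(x, n=8, signed=False):
    # Quantization range taken as the min/max of the tensor
    fmin, fmax = float(x.min()), float(x.max())
    scale = (fmax - fmin) / (2**n - 1)
    # The zero-point maps fmin to the lowest integer value
    offset = -(2**(n - 1)) if signed else 0
    zeropoint = -round(fmin / scale) + offset
    lo, hi = (offset, -offset - 1) if signed else (0, 2**n - 1)
    mantissa = np.clip(np.round(x / scale) + zeropoint, lo, hi).astype(np.int32)
    return mantissa, scale, zeropoint

def dequantize(mantissa, scale, zeropoint):
    return (mantissa - zeropoint) * scale

x = np.array([0.0, 0.5, 1.0])
m, s, zp = quantize(x)           # unsigned: zero-point is 0 since fmin is 0
x_approx = dequantize(m, s, zp)  # error is at most one quantization step
```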

<h2 id="quantizing-a-machine-learning-model">Quantizing a Machine Learning Model</h2>

<p>As mentioned before, a Machine Learning model uses two types of tensors: weights and activations.</p>

<p>The static weights need to be quantized only once, each weight tensor producing three new static
tensors for the mantissa, scale and zero-point.</p>

<p>Since weights can contain positive and negative values, they are typically quantized into <code class="language-plaintext highlighter-rouge">int8</code>.</p>

<pre class="diagram">
             .----------.
             |  Weights |
             |  float32 |
             | constant |
             +----+-----+
            /     |      \
           v      v       v
.----------. .----------. .------------.
|  Weights | |  scale   | | zero-point |
|   int8   | | float32  | |    int8    |
| constant | | constant | |  constant  |
'----------' '----------' '------------'
</pre>

<p>The dynamic activations on the other hand need to be quantized on-the-fly by inserting the quantization
operations in the graph:</p>

<ul>
  <li>evaluate the quantization range,</li>
  <li>quantize.</li>
</ul>

<p>The evaluation of the quantization range is costly because it requires a full scan of the activations tensor,
which is a bottleneck for parallel processing.</p>

<p>For that reason, the activations quantization ranges are often evaluated before the inference on a selected
number of samples: this is called the calibration of the quantized model.</p>

<blockquote>
  <p>Note: the operations that clip their outputs like the bounded ReLU are an exception and don’t require an
explicit calibration, since the exact range of their outputs is known in advance.</p>
</blockquote>
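<p>Calibration can be sketched as a small observer that records the running range of an activation over the calibration samples, then derives the quantization parameters (class and method names are mine; real frameworks use more robust range estimators):</p>

```python
import numpy as np

class MinMaxObserver:
    # Tracks the running range of an activation over calibration samples
    def __init__(self):
        self.min, self.max = np.inf, -np.inf

    def observe(self, x):
        self.min = min(self.min, float(x.min()))
        self.max = max(self.max, float(x.max()))

    def qparams(self, n=8):
        # Unsigned (uint8) parameters, e.g. for the output of a ReLU
        scale = (self.max - self.min) / (2**n - 1)
        zeropoint = -round(self.min / scale)
        return scale, zeropoint

obs = MinMaxObserver()
for batch in (np.array([0.0, 1.0]), np.array([0.2, 4.0])):  # calibration data
    obs.observe(batch)
scale, zeropoint = obs.qparams()
```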

<p>After calibration, each activation float variable is mapped to an integer variable and two static tensors.</p>

<pre class="diagram">
               .-----------.
              | Activations |
              |   float32   |
              |  variable   |
              /'-----+-----'\
             /       |       \
            v        v        v
 .-----------.  .----------. .------------.
| Activations | |  scale   | | zero-point |
|   (u)int8   | | float32  | |  (u)int8   |
|  variable   | | constant | |  constant  |
 '-----------'  '----------' '------------'
</pre>

<blockquote>
  <p>Note: the activations can be quantized into either <code class="language-plaintext highlighter-rouge">int8</code> or <code class="language-plaintext highlighter-rouge">uint8</code>. It is simpler to quantize them to <code class="language-plaintext highlighter-rouge">uint8</code>
if they correspond to the output of a ReLU operation, since in that case the zero-point will be 0.</p>
</blockquote>

<p>Conceptually, the resulting graph is a clone of the original graph where all compatible operations are replaced
by a version that operates on tuples of (mantissa, scale, zero-point).</p>

<p>Separating the constant and variable tensors, this leads to the following graphs:</p>

<pre class="diagram">
              .---------.                   .--------.  .----------. .------------.
             |  Inputs   |                 |  Inputs  | |  scale   | | zero-point |
             |  float32  |                 |  (u)int8 | | float32  | |  (u)int8   |
             | variable  |                 | variable | | constant | |  constant  |
              '----+----'                   '----+---'  '-----+----' '------+-----'
                   |             .               '------------+-------------'
.----------.       v             |\      .----------.         |
| Weights  |   .------.       +--' \     | Weights  |         |
| float32  +-&gt;| Matmul |      +--. /     |  int8    +-.       |
| constant |   '---+--'          |/      | constant | |       |         .------------.
'----------'       |             '       '----------' |       |         |   scale    |
                   v                                  |       |       .-+  float32   |
              .---------.                .----------. |       v       | |  constant  |
             |  Outputs  |               |  scale   | |   .-------.   | '------------'
             |  float32  |               | float32  +-+-&gt;| QMatMul |&lt;-+
             |  variable |               | constant | |   '---+---'   | .------------.
              '---------'                '----------' |       |       | | zero-point |
                                                      |       |       '-+  (u)int8   |
                                         .----------. |       |         |  constant  |
                                         |zero-point| |       |         '------------'
                                         |  int8    +-'       |
                                         | constant |         |
                                         '----------'         |
                                                              v
                                                          .--------.
                                                         | Outputs  |
                                                         |  (u)int8 |
                                                         | variable |
                                                          '--------'
</pre>

<h2 id="quantized-linear-operations">Quantized linear operations</h2>

<p>Most basic Machine Learning operations can be performed using integer arithmetics, which makes them compatible
with linearly quantized inputs.</p>

<p>This does not mean however that one can simply replace every floating point operation by an equivalent integer operation:
 the scale and zero-point of all weights and activations must be taken into account to produce an equivalent graph.</p>

<p>Also, there are two important restrictions with respect to the inputs quantization:</p>
<ul>
  <li>additions between the integer mantissa of inputs can only be performed if they are in the same scale,</li>
  <li>operations that combine the integer mantissa of the input channels can only be performed if the channels are in the same scale,
i.e. if the inputs are quantized per-tensor.</li>
</ul>

<blockquote>
  <p>Note: in <a href="/2023/05/quantization-scales-alignment.html">another post</a> I explain how it is possible to add two inputs quantized with different scales
by adding an explicit alignment operation beforehand.</p>
</blockquote>

<p>From an implementation perspective, operations accepting linearly quantized inputs are very specific to each device.</p>

<p>In the next paragraph, I will detail a possible implementation of a quantized matrix multiplication.</p>

<h2 id="wrap-up-example-a-quantized-matrix-multiplication">Wrap-up example: a quantized matrix multiplication</h2>

<p>Let’s consider a simple matrix multiplication of an $X(I, J)$ input by a $W(J, K)$ set of weights:</p>

<p>$Y = X.W$</p>

<p>Since the matrix multiplication multiplies all inputs along the dimension of length $J$ and adds them,
 $X$ cannot be quantized per-axis, as that would lead to additions of quantized numbers that are not in the same scale.</p>

<p>There is no such restriction on $W$, since the filters along $K$ are all applied independently.</p>

<p>After quantization of the weights per-axis and calibration of the inputs per-tensor, we obtain:</p>

<p>$X \approx X_s * (X_q - X_{zp})$, with $X_s()$, $X_q(I, J)$, $X_{zp}()$</p>

<p>$W \approx W_s * (W_q - W_{zp})$, with $W_s(K)$, $W_q(J, K)$, $W_{zp}(K)$</p>

<p>We can also approximate the outputs per-axis, assuming that the next operation does not require per-tensor inputs.</p>

<p>$Y \approx Y_s * (Y_q - Y_{zp})$, with $Y_s(K)$, $Y_q(I, K)$, $Y_{zp}(K)$</p>

<p>The operation is summarized on the graph below (note that the intermediate integer output $Y_q$ can be implicit):</p>

<pre class="diagram">
    .-----.  .-----. .------.
   |  X_q  | | X_s | | X_zp |
    '--+--'  '--+--' '--+---'
       '--------+-------'
.-----.         |
| W_q +-.       |
'-----' |       |          .-----.
        |       v        .-+ Y_s |
.-----. |  .---------.   | '-----'
| W_s +-+-&gt;| QMatMul |&lt;--+
'-----' |  '----+----'   | .-----.
        |       |        '-+ Y_zp|
.-----. |       |          '-----'
|W_zp +-'       |(Y_q)
'-----'         |
                v
               .-.
              | Y |
               '-'
</pre>

<p>Going through the graph step by step:</p>

<ul>
  <li>evaluate the matrix multiplication of the quantized inputs to produce float outputs</li>
</ul>

<p>$O = X_s * (X_q - X_{zp}) . W_s * (W_q - W_{zp})$</p>

<ul>
  <li>quantize the float outputs to obtain 8-bit integer outputs</li>
</ul>

<p>$Y_q = saturate(round(\frac{O}{Y_s}) + Y_{zp})$</p>

<ul>
  <li>convert back the 8-bit integer outputs to float outputs</li>
</ul>

<p>$Y \approx Y_s * (Y_q - Y_{zp})$</p>

<p>Since $X_s$ is a scalar, and $W_s$ has the same dimension as the last dimension of the outputs,
the first operation can also be written:</p>

<p>$O = (X_s * W_s) * (X_q - X_{zp}) . (W_q - W_{zp})$</p>

<p>This means that the matrix multiplication can equivalently be performed on integer values,
and the result is a quantized integer number whose scale is the product of the input and weight
scales, with a null zero-point.</p>

<p>The quantized sequence of operations is then to:</p>

<ul>
  <li>evaluate the matrix multiplication of the 8-bit integer inputs to produce n-bit integer outputs</li>
</ul>

<p>$O_q = (X_q - X_{zp}) . (W_q - W_{zp})$</p>

<ul>
  <li>convert the n-bit integer outputs to float outputs</li>
</ul>

<p>$O = (X_s * W_s) * O_q$</p>

<ul>
  <li>quantize the float outputs to obtain 8-bit integer outputs</li>
</ul>

<p>$Y_q = saturate(round(\frac{O}{Y_s}) + Y_{zp})$</p>

<ul>
  <li>convert back the 8-bit integer outputs to float outputs</li>
</ul>

<p>$Y \approx Y_s * (Y_q - Y_{zp})$</p>
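<p>The four steps above can be checked numerically. Below is a NumPy sketch assuming symmetric per-axis weights (so $W_{zp} = 0$) and symmetric per-axis outputs:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K = 2, 4, 3

# Float reference operation: Y = X.W
X = rng.uniform(-1.0, 1.0, (I, J))
W = rng.uniform(-1.0, 1.0, (J, K))

# Per-tensor asymmetric int8 inputs: scalar X_s and X_zp
X_s = (X.max() - X.min()) / 255.0
X_zp = -round(X.min() / X_s) - 128
X_q = np.clip(np.round(X / X_s) + X_zp, -128, 127).astype(np.int32)

# Per-axis symmetric int8 weights: W_s has shape (K,), W_zp is 0
W_s = np.abs(W).max(axis=0) / 127.0
W_q = np.clip(np.round(W / W_s), -127, 127).astype(np.int32)

# Step 1: integer matrix multiplication (wide accumulators)
O_q = (X_q - X_zp) @ W_q

# Step 2: convert to float with the product of the scales
O = (X_s * W_s) * O_q

# Steps 3-4: requantize to 8-bit per-axis symmetric outputs, then dequantize
Y_s = np.abs(O).max(axis=0) / 127.0
Y_q = np.clip(np.round(O / Y_s), -128, 127).astype(np.int32)
Y = Y_s * Y_q
```

<p>The dequantized result stays within a few quantization steps of the float matrix multiplication.</p>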

<p>The question that should immediately arise at this stage is: why do we need another quantization
operation after the matrix multiplication, since we already have a quantized output?</p>

<p>The reason is simply the bitwidth of the outputs: we need an explicit quantization to make
sure that the results of the integer matrix multiplication fit in 8 bits.</p>

<blockquote>
  <p>Note: when the operation is followed by a bias addition, the biases are typically quantized to
32-bit with a scale precisely equal to $X_s * W_s$ so that they can be added directly to the outputs
before quantizing.</p>
</blockquote>

<p>Going one step further and replacing $O$, since $Y_s$ has the same shape as $X_s * W_s$, we can fold the
float conversion into the quantization step and write directly:</p>

<ul>
  <li>evaluate the matrix multiplication of the integer inputs to produce n-bit integer outputs</li>
</ul>

<p>$O_q = (X_q - X_{zp}) . (W_q - W_{zp})$</p>

<ul>
  <li>quantize the n-bit integer outputs to obtain 8-bit integer outputs</li>
</ul>

<p>$Y_q = saturate(round(\frac{X_s * W_s}{Y_s} * O_q) + Y_{zp})$</p>

<ul>
  <li>convert back the 8-bit integer outputs to float outputs</li>
</ul>

<p>$Y \approx Y_s * (Y_q - Y_{zp})$</p>

<p>This reveals that we can directly ‘downscale’ the integer outputs of the operation with a folded scale
  $F_s = \frac{Y_s}{X_s * W_s}$.</p>

<p>The downscaling operation can be implemented as a float division and a round.</p>

<blockquote>
  <p>Note: I will detail in another post an implementation using only integer arithmetic.</p>
</blockquote>

<p>The simplified graph can be summarized below:</p>

<pre class="diagram">
        .-----.   .------. 
       |  X_q  |  | X_zp |
        '--+--'   '--+---'
           '----+----'
.-----.         |
| W_q +-.       v
'-----' |  .----------.
        +-&gt;|IntMatMul |
.-----. |  '----+-----'
|W_zp +-'       |         .-----.   
'-----'         v       .-+ F_s |  
           .---------.  | '-----'   
           |Downscale|&lt;-+         
           '----+----'  | .-----.      
                v       '-+ Y_zp|     
               .-.        '-----'
              | Y |
               '-'
</pre>

<p>This can be further simplified by removing the zero-points if we assume a symmetric quantization.</p>

<pre class="diagram">
           .-----.  
          |  X_q  | 
           '--+--'  
              |
              v 
.-----.  .----------.
| W_q +-&gt;|IntMatMul |
'-----'  '----+-----'
              |             
              v          
         .---------.  .-----.   
         |Downscale|&lt;-+ F_s |         
         '----+----'  '-----'  
              v          
             .-.        
            | Y |
             '-'
</pre>

<blockquote>
  <p>Note: the quantized matrix multiplication can be implemented in very different ways on devices that do not have efficient
implementations of the integer Matrix Multiplication.</p>
</blockquote>

<h2 id="references">References</h2>

<div class="footnotes" role="doc-endnotes">
  <ol>
    <li id="fn:quant" role="doc-endnote">
      <p>Yunchao Gong, Liu Liu, Ming Yang, Lubomir Bourdev, “Compressing Deep Convolutional Networks using Vector Quantization”
      <a href="https://arxiv.org/abs/1412.6115">arxiv</a>, 2014. <a href="#fnref:quant" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:qtf" role="doc-endnote">
      <p>Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko,
    “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference”
    <a href="https://arxiv.org/abs/1712.05877">arxiv</a>, 2017. <a href="#fnref:qtf" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:qinit" role="doc-endnote">
      <p>Stone Yun, Alexander Wong, “Where Should We Begin? A Low-Level Exploration of Weight Initialization Impact on Quantized Behaviour of Deep Neural Networks”,
      <a href="https://arxiv.org/abs/2011.14578">arxiv</a>, 2020. <a href="#fnref:qinit" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:qreg" role="doc-endnote">
      <p>Arash Ahmadian, Saurabh Dash, Hongyu Chen, Bharat Venkitesh, Stephen Gou, Phil Blunsom, Ahmet Üstün, Sara Hooker, “Intriguing Properties of Quantization at Scale”,
     <a href="https://arxiv.org/abs/2305.19268">arxiv</a>, 2023. <a href="#fnref:qreg" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
    <li id="fn:qbn" role="doc-endnote">
      <p>Elaina Teresa Chai, “Analysis of quantization and normalization effects in deep neural networks”, <a href="https://searchworks.stanford.edu/view/13971425">stanford</a>, 2021. <a href="#fnref:qbn" class="reversefootnote" role="doc-backlink">&#8617;</a></p>
    </li>
  </ol>
</div>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Identify Repeating Patterns using Spiking Neural Networks in Tensorflow
          ]]>
      </title>
      <link>http://www.kaizou.org/2018/07/stdp-tensorflow.html</link>
      <pubDate>Thu, 26 Jul 2018 10:38:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2018/07/stdp-tensorflow</guid>
      <description>
          <![CDATA[
          <p>Spiking neural networks (<a href="https://en.wikipedia.org/wiki/Spiking_neural_network">SNN</a>) are the 3rd generation of neural networks.</p>

<p>SNNs do not react to each stimulus, but rather accumulate inputs until they reach a threshold potential and generate a ‘spike’.</p>

<p>Because of their very nature, SNNs cannot be trained like 2nd generation neural networks using gradient descent.</p>

<p>Spike Timing Dependent Plasticity (<a href="https://en.wikipedia.org/wiki/Spike-timing-dependent_plasticity">STDP</a>) is a biological process that
inspired an unsupervised training method for SNNs.</p>

<p>In this article, I will provide an illustration of how STDP can be used to teach a single neuron to identify a repeating pattern in a continuous stream of input spikes.</p>

<!--more-->

<p>For this, I will reproduce the STDP experiments described in 
<a href="https://www.semanticscholar.org/paper/Spike-Timing-Dependent-Plasticity-Finds-the-Start-Masquelier-Guyonneau/432b5bfa6fc260289fef45544a43ebcd8892915e">Masquelier &amp; Thorpe (2008)</a> using <a href="https://www.tensorflow.org/">Tensorflow</a> instead of Matlab.</p>

<h2 id="lif-neuron-model">LIF neuron model</h2>

<p>The LIF neuron model used in this experiment is based on Gerstner’s <a href="http://lcn.epfl.ch/~gerstner/SPNM/node26.html#SECTION02311000000000000000">Spike Response Model</a>.</p>

<p>At every time-step, the neuron membrane potential p is given by the formula:</p>

\[p=\eta(t-t_{i})+\sum_{j|t_{j}&gt;t_{i}}w_{j}\varepsilon(t-t_{j})\]

<p>where $\eta(t-t_{i})$ is the membrane response after a spike at time $t_{i}$:</p>

\[\eta(t-t_{i})=K_{1}exp(-\frac{t-t_{i}}{\tau_{m}})-K_{2}(exp(-\frac{t-t_{i}}{\tau_{m}})-exp(-\frac{t-t_{i}}{\tau_{s}}))\]

<p>and $\varepsilon(t)$ describes the Excitatory Post-Synaptic Potential of each synapse spike at time $t_{j}$:</p>

\[\varepsilon(t-t_{j})=K(exp(-\frac{t-t_{j}}{\tau_{m}})-exp(-\frac{t-t_{j}}{\tau_{s}}))\]

<p>Note that K has to be chosen so that the max of $\varepsilon(t)$ is 1, knowing that $\varepsilon(t)$ is maximum when:
\(t=\frac{\tau_{m}\tau_{s}}{\tau_{m}-\tau_{s}}ln(\frac{\tau_{m}}{\tau_{s}})\)</p>
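<p>The normalization constant can be computed directly from that expression. A short sketch, assuming for illustration $\tau_{m}=10\,ms$ and $\tau_{s}=2.5\,ms$:</p>

```python
import math

tau_m, tau_s = 10.0, 2.5  # membrane and synapse time constants (ms)

# Time at which the unnormalized EPSP kernel peaks
t_max = (tau_m * tau_s) / (tau_m - tau_s) * math.log(tau_m / tau_s)

# Choose K so that the peak of the EPSP kernel is exactly 1
K = 1.0 / (math.exp(-t_max / tau_m) - math.exp(-t_max / tau_s))

def epsilon(t):
    # Excitatory Post-Synaptic Potential, normalized to peak at 1
    return K * (math.exp(-t / tau_m) - math.exp(-t / tau_s)) if t >= 0 else 0.0
```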

<p>In this simplified version of the neuron, the synaptic weights $w_{j}$ remain constant.</p>

<p>The main graph operations are described below (please refer to my 
<a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/STDP_masquelier_2008.ipynb">jupyter notebook</a> for details):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1"># Excitatory post-synaptic potential (EPSP)
</span>    <span class="k">def</span> <span class="nf">epsilon_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>

        <span class="c1"># We only use the negative value of the relative spike times
</span>        <span class="n">spikes_t_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">negative</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">t_spikes</span><span class="p">)</span>

        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">K</span> <span class="o">*</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">spikes_t_op</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_m</span><span class="p">)</span> <span class="o">-</span> <span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">spikes_t_op</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_s</span><span class="p">))</span>
    
    <span class="c1"># Membrane spike response
</span>    <span class="k">def</span> <span class="nf">eta_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        
        <span class="c1"># We only use the negative value of the relative time
</span>        <span class="n">t_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">negative</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">last_spike</span><span class="p">)</span>
        
        <span class="c1"># Evaluate the spiking positive pulse
</span>        <span class="n">pos_pulse_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">K1</span> <span class="o">*</span> <span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">t_op</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_m</span><span class="p">)</span>
        
        <span class="c1"># Evaluate the negative spike after-potential
</span>        <span class="n">neg_after_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">K2</span> <span class="o">*</span> <span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">t_op</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_m</span><span class="p">)</span> <span class="o">-</span> <span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">t_op</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_s</span><span class="p">))</span>

        <span class="c1"># Evaluate the new post synaptic membrane potential
</span>        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">T</span> <span class="o">*</span> <span class="p">(</span><span class="n">pos_pulse_op</span> <span class="o">-</span> <span class="n">neg_after_op</span><span class="p">)</span>
    
    <span class="c1"># Neuron behaviour during integrating phase (t_rest = 0)
</span>    <span class="k">def</span> <span class="nf">w_epsilons_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        
        <span class="c1"># Evaluate synaptic EPSPs. We ignore synaptic spikes older than the last neuron spike
</span>        <span class="n">epsilons_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">logical_and</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">t_spikes</span> <span class="o">&gt;=</span><span class="mi">0</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">t_spikes</span> <span class="o">&lt;</span> <span class="bp">self</span><span class="p">.</span><span class="n">last_spike</span> <span class="o">-</span> <span class="bp">self</span><span class="p">.</span><span class="n">tau_rest</span><span class="p">),</span>
                               <span class="bp">self</span><span class="p">.</span><span class="n">epsilon_op</span><span class="p">(),</span>
                               <span class="bp">self</span><span class="p">.</span><span class="n">t_spikes</span><span class="o">*</span><span class="mf">0.0</span><span class="p">)</span>
                          
        <span class="c1"># Agregate weighted incoming EPSPs 
</span>        <span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">reduce_sum</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">w</span> <span class="o">*</span> <span class="n">epsilons_op</span><span class="p">)</span>  
   <span class="p">...</span>
   <span class="k">def</span> <span class="nf">default_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        
        <span class="c1"># Update weights
</span>        <span class="n">w_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">default_w_op</span><span class="p">()</span>
        
        <span class="c1"># By default, the membrane potential is given by the sum of the eta kernel and the weighted epsilons
</span>        <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">control_dependencies</span><span class="p">([</span><span class="n">w_op</span><span class="p">]):</span>
            <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">eta_op</span><span class="p">()</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">w_epsilons_op</span><span class="p">()</span>
        
    <span class="k">def</span> <span class="nf">integrating_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>

        <span class="c1"># Evaluate the new membrane potential, integrating both synaptic input and spike dynamics
</span>        <span class="n">p_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">eta_op</span><span class="p">()</span> <span class="o">+</span> <span class="bp">self</span><span class="p">.</span><span class="n">w_epsilons_op</span><span class="p">()</span>

        <span class="c1"># We have a different behavior if we reached the threshold
</span>        <span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="n">p_op</span> <span class="o">&gt;</span> <span class="bp">self</span><span class="p">.</span><span class="n">T</span><span class="p">,</span>
                       <span class="bp">self</span><span class="p">.</span><span class="n">firing_op</span><span class="p">,</span>
                       <span class="bp">self</span><span class="p">.</span><span class="n">default_op</span><span class="p">)</span>
    
    <span class="k">def</span> <span class="nf">get_potential_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        
        <span class="c1"># Update our internal memory of the synapse spikes (age older spikes, add new ones)
</span>        <span class="n">update_spikes_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">update_spikes_times</span><span class="p">()</span>
        
        <span class="c1"># Increase the relative time of the last spike by the time elapsed
</span>        <span class="n">last_spike_age_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">last_spike</span><span class="p">.</span><span class="n">assign_add</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">dt</span><span class="p">)</span>
        
        <span class="c1"># Update the internal state of the neuron and evaluate membrane potential
</span>        <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">control_dependencies</span><span class="p">([</span><span class="n">update_spikes_op</span><span class="p">,</span> <span class="n">last_spike_age_op</span><span class="p">]):</span>
            <span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">cond</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">t_rest</span> <span class="o">&gt;</span> <span class="mf">0.0</span><span class="p">,</span>
                           <span class="bp">self</span><span class="p">.</span><span class="n">resting_op</span><span class="p">,</span>
                           <span class="bp">self</span><span class="p">.</span><span class="n">integrating_op</span><span class="p">)</span>
</code></pre></div></div>
<h2 id="stimulate-neuron-with-predefined-synapse-input">Stimulate neuron with predefined synapse input</h2>

<p>We replicate $figure\,3$ of the original paper by stimulating a LIF neuron with six consecutive synapse spikes (dotted gray lines on the figure).</p>

<p>The neuron has a refractory period of $1\,ms$ and a threshold of $1$.</p>

<p><img src="/images/posts/masquelier_1.png" alt="LIF Neuron response" /></p>

<p>As in the original paper, we see that because of the leaky nature of the neuron, the stimulating spikes have to be nearly synchronous
for the threshold to be reached.</p>

<h2 id="generate-poisson-spike-trains-with-varying-rate">Generate Poisson spike trains with varying rate</h2>

<p>The original paper uses Poisson spike trains with a rate varying in the $[0, 90]\,Hz$ interval, with a variation speed that itself varies in the $[-1800, 1800]\,Hz$ interval (in random uniform increments in the $[-360,360]$ interval).</p>

<p>Optionally, we may force each synapse to spike at least every $\Delta_{max}\,ms$.</p>

<p>Please refer to my 
<a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/STDP_masquelier_2008.ipynb">jupyter notebook</a> for the details of the Spike
trains generator.</p>
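<p>For reference, here is a simplified, self-contained sketch of such a generator (I assume here that the variation speed takes a fresh uniform increment at every time step; the notebook differs in detail):</p>

```python
import numpy as np

def poisson_train(duration, dt=0.001, rng=None):
    # Poisson spike train whose rate drifts in [0, 90] Hz; the drift speed
    # takes uniform increments in [-360, 360] and is clipped to [-1800, 1800]
    if rng is None:
        rng = np.random.default_rng()
    steps = int(duration / dt)
    spikes = np.zeros(steps, dtype=bool)
    rate, speed = rng.uniform(0.0, 90.0), 0.0
    for i in range(steps):
        spikes[i] = rng.random() < rate * dt
        speed = np.clip(speed + rng.uniform(-360.0, 360.0), -1800.0, 1800.0)
        rate = np.clip(rate + speed * dt, 0.0, 90.0)
    return spikes

spikes = poisson_train(duration=10.0, rng=np.random.default_rng(42))
```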

<p>We test our spike trains generator and draw the corresponding spikes.
Both sets of spike trains use varying rates in the $[0, 90]\,Hz$ interval.
The second set imposes $\Delta_{max}=50\,ms$.</p>

<p><img src="/images/posts/masquelier_2.png" alt="Varying spike trains" />
<img src="/images/posts/masquelier_2_1.png" alt="Varying spike trains with delta_max" /></p>

<p>We note the increased mean rate of the second set of spike trains, due to the minimum $20\,Hz$ rate we impose (i.e. the maximum interval we allow between two spikes is $50\,ms$).</p>

<h2 id="stimulate-a-lif-neuron-with-random-spike-trains">Stimulate a LIF Neuron with random spike trains</h2>

<p>We now feed the neuron with $500$ synapses that generate spikes at random interval with varying rates.</p>

<p>The synaptic efficacy weights are arbitrarily set to $0.475$ and remain constant throughout the simulation.</p>

<p>We draw the neuron membrane response to the $500$ random synaptic spike trains.</p>

<p><img src="/images/posts/masquelier_3.png" alt="Varying spike trains" />
<img src="/images/posts/masquelier_3_1.png" alt="LIF Neuron response" /></p>

<p>We can see that the neuron mostly saturates and continuously generates spikes.</p>

<h2 id="introduce-spike-timing-dependent-plasticity">Introduce Spike Timing Dependent Plasticity</h2>

<p>We extend the LIFNeuron by allowing it to modify its synapse weights using a Spike Timing Dependent Plasticity algorithm (<strong>STDP</strong>).</p>

<p>The <strong>STDP</strong> algorithm rewards synapses where spikes occurred immediately before a neuron spike, and inflicts penalties to the synapses where spikes occur after the neuron spike.</p>

<p>The ‘rewards’ are called Long Term synaptic Potentiation (<strong>LTP</strong>), and the penalties Long Term synaptic Depression (<strong>LTD</strong>).</p>

<p>For each synapse that spiked $\Delta{t}$ before a neuron spike:</p>

\[\Delta{w} = a^{+}exp(-\frac{\Delta{t}}{\tau^{+}})\]

<p>For each synapse that spikes $\Delta{t}$ after a neuron spike:</p>

\[\Delta{w} = -a^{-}exp(-\frac{\Delta{t}}{\tau^{-}})\]

<p>As in the original paper, we only apply <strong>LTP</strong>, resp. <strong>LTD</strong> to the first spike before, resp. after a neuron spike on each synapse.</p>
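<p>Outside of the Tensorflow graph, the update rule itself fits in a few lines of NumPy (the constants below are illustrative defaults, not necessarily the ones used in the experiment):</p>

```python
import numpy as np

# Illustrative STDP constants (see the notebook for the exact values)
A_PLUS, A_MINUS = 0.03125, 0.85 * 0.03125
TAU_PLUS, TAU_MINUS = 16.8, 33.7  # ms

def ltp(w, dt):
    # Reward synapses whose last spike occurred dt ms *before* the neuron spike
    return np.minimum(w + A_PLUS * np.exp(-dt / TAU_PLUS), 1.0)

def ltd(w, dt):
    # Penalize synapses whose first spike occurred dt ms *after* the neuron spike
    return np.maximum(w - A_MINUS * np.exp(-dt / TAU_MINUS), 0.0)

w = np.full(4, 0.475)                         # initial synaptic weights
w = ltp(w, np.array([1.0, 5.0, 20.0, 60.0]))  # closer spikes earn larger rewards
```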

<p>The main <strong>STDP</strong> graph operations are described below (please refer to my 
<a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/STDP_masquelier_2008.ipynb">jupyter notebook</a> for details:</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1"># Long Term synaptic Potentiation
</span>    <span class="k">def</span> <span class="nf">LTP_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        
        <span class="c1"># We only consider the last spike of each synapse from our memory
</span>        <span class="n">last_spikes_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">reduce_min</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">t_spikes</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">0</span><span class="p">)</span>

        <span class="c1"># Reward all last synapse spikes that happened after the previous neuron spike
</span>        <span class="n">rewards_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">last_spikes_op</span> <span class="o">&lt;</span> <span class="bp">self</span><span class="p">.</span><span class="n">last_spike</span><span class="p">,</span>
                              <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">a_plus</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">n_syn</span><span class="p">])</span> <span class="o">*</span> <span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">negative</span><span class="p">(</span><span class="n">last_spikes_op</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_plus</span><span class="p">)),</span>
                              <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">n_syn</span><span class="p">]))</span>
        
        <span class="c1"># Evaluate new weights
</span>        <span class="n">new_w_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">,</span> <span class="n">rewards_op</span><span class="p">)</span>
        
        <span class="c1"># Update with new weights clamped to [0,1]
</span>        <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">clip_by_value</span><span class="p">(</span><span class="n">new_w_op</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">))</span>
    
    <span class="c1"># Long Term synaptic Depression
</span>    <span class="k">def</span> <span class="nf">LTD_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>

        <span class="c1"># Inflict penalties on new spikes on synapses that have not spiked
</span>        <span class="c1"># The penalty is equal for all new spikes, and inversely exponential
</span>        <span class="c1"># to the time since the last spike
</span>        <span class="n">penalties_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">logical_and</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">new_spikes</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">logical_not</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">syn_has_spiked</span><span class="p">)),</span>
                                <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">a_minus</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">n_syn</span><span class="p">])</span> <span class="o">*</span> <span class="n">tf</span><span class="p">.</span><span class="n">exp</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">negative</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">last_spike</span><span class="o">/</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_minus</span><span class="p">)),</span>
                                <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="mf">0.0</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">n_syn</span><span class="p">]))</span>
        
        <span class="c1"># Evaluate new weights
</span>        <span class="n">new_w_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">,</span> <span class="n">penalties_op</span><span class="p">)</span>
        
        <span class="c1"># Update the list of synapses that have spiked
</span>        <span class="n">new_spikes_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">syn_has_spiked</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">syn_has_spiked</span> <span class="o">|</span> <span class="bp">self</span><span class="p">.</span><span class="n">new_spikes</span><span class="p">)</span>
        
        <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">control_dependencies</span><span class="p">([</span><span class="n">new_spikes_op</span><span class="p">]):</span>
            <span class="c1"># Update with new weights clamped to [0,1]
</span>            <span class="k">return</span> <span class="bp">self</span><span class="p">.</span><span class="n">w</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">clip_by_value</span><span class="p">(</span><span class="n">new_w_op</span><span class="p">,</span> <span class="mf">0.0</span><span class="p">,</span> <span class="mf">1.0</span><span class="p">))</span>
</code></pre></div></div>

<h2 id="test-stdp-with-predefined-input">Test STDP with predefined input</h2>

<p>We apply the same predefined spike train to an <strong>STDP</strong> capable LIFNeuron with a limited number of synapses, and draw the resulting rewards (<em>green</em>) and penalties (<em>red</em>).</p>

<p><img src="/images/posts/masquelier_4.png" alt="Synapse spikes and STDP" />
<img src="/images/posts/masquelier_4_1.png" alt="LIF Neuron response" /></p>

<p>On the graph above, we verify that the rewards (<em>green</em> dots) are assigned only when the neuron spikes, and that they are assigned to synapses where a spike occurred before the neuron spike (big <em>blue</em> dots).</p>

<p>Note: a reward is assigned even if the synapse spike is not synchronous with the neuron spike, but it is then smaller.</p>

<p>We also verify that a penalty (<em>red</em> dot) is inflicted on every synapse where a first spike occurs after a neuron spike.</p>

<p>Note: these penalties may later be counter-balanced by a reward if a neuron spike closely follows.</p>

<h2 id="stimulate-an-stdp-lif-neuron-with-random-spike-trains">Stimulate an STDP LIF Neuron with random spike trains</h2>

<p>The goal here is to check the effects of the <strong>STDP</strong> learning on the neuron behaviour when it is stimulated with our random spike trains.</p>

<p>We test the neuron response with three sets of spike trains, with mean rates of $35$, $45$ and $55$ $Hz$ respectively.</p>

<p><img src="/images/posts/masquelier_5.png" alt="LIF Neuron response 35Hz" />
<img src="/images/posts/masquelier_5_1.png" alt="Mean weights 35 Hz" />
<img src="/images/posts/masquelier_5_2.png" alt="LIF Neuron response 45Hz" />
<img src="/images/posts/masquelier_5_3.png" alt="Mean weights 45 Hz" />
<img src="/images/posts/masquelier_5_4.png" alt="LIF Neuron response 55Hz" />
<img src="/images/posts/masquelier_5_5.png" alt="Mean weights 55 Hz" /></p>

<p>We see that the evolution of the synapse weights in response to this steady stimulation is highly dependent on the mean input frequency.</p>

<p>If the mean input frequency is too low, the synaptic efficacy weights slowly decrease, down to the point where the neuron is no longer able to fire.</p>

<p>If the mean input frequency is too high, the synaptic efficacy weights on the contrary increase, up to the point where the neuron fires regardless of the input.</p>

<p>Using the <strong>STDP</strong> values of the original paper, only the exact mean frequency of $45$ $Hz$ (the one also used in the paper) exhibits some kind of stability.</p>

<p>In conclusion, either our implementations differ, or the adverse effect of this particular <strong>STDP</strong> algorithm was overlooked in the original paper, because, as we will see later, the actual mean stimulation rate will be around $64$ $Hz$.</p>

<h2 id="generate-recurrent-spike-trains">Generate recurrent spike trains</h2>

<p>We don’t follow exactly the same procedure as in the original paper, since modern hardware and software allow us to generate spike trains more easily. The result, however, is equivalent.</p>

<p>We generate $2000$ spike trains, and force the first $1000$ to repeat a $50\,ms$ pattern at random intervals.</p>

<p>The time to the next pattern is chosen with a probability of $0.25$ among the next slices of $50\,ms$ (omitting the first one to avoid consecutive patterns).</p>
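<p>A possible sketch of this timing procedure (the function name and default values are mine, not the notebook’s):</p>

```python
import numpy as np

def pattern_starts(duration_ms, pattern_ms=50, p=0.25, seed=0):
    """Draw the start times of the repeated pattern.

    After each pattern, every following 50 ms slice is selected with
    probability p, skipping the slice right next to the pattern, which
    amounts to a gap of pattern_ms * (1 + Geometric(p))."""
    rng = np.random.default_rng(seed)
    starts = [0]
    while True:
        # geometric draw over 50 ms slices, offset by the one skipped slice
        nxt = starts[-1] + pattern_ms * (1 + rng.geometric(p))
        if nxt + pattern_ms > duration_ms:
            return starts
        starts.append(nxt)
```

<p>By construction, two consecutive pattern starts are always at least two slices apart, so patterns are never back to back.</p>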

<p>We display the resulting synapse mean spiking rates, and some samples of the spike trains, identifying the pattern (<em>gray</em> areas).</p>

<p><img src="/images/posts/masquelier_6.png" alt="Synapses Mean firing rate" />
<img src="/images/posts/masquelier_6_1.png" alt="Spike trains with pattern 1" />
<img src="/images/posts/masquelier_6_2.png" alt="Spike trains with pattern 2" />
<img src="/images/posts/masquelier_6_3.png" alt="Spike trains with pattern 3" /></p>

<p>We verify that the mean spiking rate is the same for both populations of synapses (approximately $64\,Hz = 54\,Hz + 10\,Hz$).</p>

<p>We nevertheless notice that the standard deviation is much higher for the synapses involved in the pattern.</p>

<p>On the spike trains samples, one can visually recognize the patterns thanks to the <em>gray</em> background, but otherwise
they would go unnoticed by the human eye.</p>

<p>We also verify that each pattern is slightly modified by the $10\,Hz$ spontaneous activity.</p>

<h2 id="stimulate-an-stdp-lif-neuron-with-recurrent-spiking-trains">Stimulate an STDP LIF neuron with recurrent spiking trains</h2>

<p>We perform a simulation on our <strong>STDP</strong> LIF neuron with the generated spike trains, and draw the neuron response at the 
beginning, middle and end of the simulation.</p>

<p>On each sample, we identify the pattern interval with a <em>gray</em> background.</p>

<p><img src="/images/posts/masquelier_7.png" alt="STDP training 1" />
<img src="/images/posts/masquelier_7_1.png" alt="STDP training 2" />
<img src="/images/posts/masquelier_7_2.png" alt="STDP training 3" />
<img src="/images/posts/masquelier_7_3.png" alt="STDP training 4" /></p>

<p>At the beginning of the stimulation, the neuron spikes continuously, inside and outside the pattern.</p>

<p>Midway through the stimulation, the neuron fires mostly inside the pattern and occasionally outside it (false positives).</p>

<p>At the end of the stimulation, the neuron fires only inside the pattern.</p>

<blockquote>
  <p><strong>Important note:</strong>
With the rates specified in the original paper, the neuron quickly saturates and doesn’t learn anything.
With a tweaked LTD factor $a^{-}$, which seems to depend on the spike trains themselves, the neuron learns the pattern after only a few seconds of presentation: Hurray!
For a given set of spike trains, you may need to adjust this factor to achieve a successful training.</p>
</blockquote>

<p>The neuron has become more and more selective as the pattern presentations were repeated, up to the point where the synapses involved in the pattern have dominant weights, as displayed on the graph below.</p>

<p><img src="/images/posts/masquelier_8.png" alt="Weights after training" /></p>

<h2 id="discussion">Discussion</h2>

<p>We managed to reproduce the experiments described in <a href="https://www.semanticscholar.org/paper/Spike-Timing-Dependent-Plasticity-Finds-the-Start-Masquelier-Guyonneau/432b5bfa6fc260289fef45544a43ebcd8892915e">Masquelier &amp; Thorpe (2008)</a> using <a href="https://www.tensorflow.org/">Tensorflow</a>.</p>

<p>However, we found out that the <strong>STDP</strong> parameters needed to be tweaked to adjust to the input spike train mean rate,
and possibly also to adjust to the generated spike trains themselves, as for a given rate, the neuron did not react
identically for different sets of spike trains.</p>

<p>Also, we found out that the neuron doesn’t necessarily identify the beginning of the pattern, but sometimes its end.</p>

<p>These differences with the original paper raise questions about the differences between our implementation and the original one done in Matlab.</p>


          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Leaky Integrate and Fire neuron with Tensorflow
          ]]>
      </title>
      <link>http://www.kaizou.org/2018/07/lif-neuron-tensorflow.html</link>
      <pubDate>Wed, 25 Jul 2018 10:38:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2018/07/lif-neuron-tensorflow</guid>
      <description>
          <![CDATA[
<p>Spiking Neural Networks (SNN) are the next generation of neural networks. They operate using spikes, 
which are discrete events that take place at points in time, rather than continuous values.</p>

<p>Essentially, once a stimulated neuron reaches a certain potential, it spikes, and the potential of that neuron is reset.</p>

<p>In this article, I will detail how the Leaky Integrate and Fire (LIF) spiking neuron model can be implemented
using <a href="https://www.tensorflow.org/">Tensorflow</a>.</p>

<!--more-->

<h2 id="leaky-integrate-and-fire-model">Leaky-integrate-and-fire model</h2>

<p>We use the model described in <a href="http://lcn.epfl.ch/~gerstner/SPNM/node26.html#SECTION02311000000000000000">§ 4.1 of “Spiking Neuron Models”, by Gerstner and Kistler (2002)</a>.</p>

<p>The leaky integrate-and-fire (LIF) neuron is probably one of the simplest spiking neuron models, but it is still very popular due to the ease with which it can be analyzed and simulated.</p>

<p>The basic circuit of an integrate-and-fire model consists of a capacitor $C$ in parallel with a resistor $R$, driven by a current $I(t)$:</p>

<p><img alt="Leaky Integrate and Fire model" src="/images/posts/gerstner.gif" style="margin: auto; display:block" /></p>

<p>The driving current can be split into two components, $I(t) = I_{R} + I_{C}$.</p>

<p>The first component is the resistive current $I_{R}$ which passes through the linear resistor $R$.</p>

<p>It can be calculated from Ohm’s law as $I_{R} = \frac{u}{R}$, where $u$ is the voltage across the resistor.</p>

<p>The second component $I_{C}$ charges the capacitor $C$.</p>

<p>From the definition of the capacity as $C = \frac{q}{u}$ (where $q$ is the charge and $u$ the voltage), we find a capacitive current $I_{C} = C\frac{du}{dt}$. Thus:</p>

\[I(t) = \frac{u(t)}{R} + C\frac{du}{dt}\]

<p>By multiplying the equation by $R$ and introducing the time constant $\tau_{m} = RC$ this yields the standard form:</p>

\[\tau_{m}\frac{du}{dt}=-u(t) + RI(t)\]

<p>where $u(t)$ represents the membrane potential at time $t$, $\tau_{m}$ is the membrane time constant and $R$ is the
membrane resistance.</p>

<p>When the membrane potential reaches the spiking threshold $u_{thresh}$, the neuron ‘spikes’ and enters a resting state for a duration $\tau_{rest}$.</p>

<p>During the resting period, the membrane potential remains constant at $u_{rest}$.</p>
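<p>The model can be checked with a direct Euler integration in plain NumPy (all parameter values below are illustrative):</p>

```python
import numpy as np

tau_m, R = 10.0, 1.0           # membrane time constant (ms) and resistance
u_thresh, u_rest = 1.0, 0.0    # spiking threshold and resting potential
tau_rest, dt = 4.0, 0.1        # refractory duration (ms) and time step (ms)

def simulate(I):
    """Return the membrane potential over time for an input current array."""
    u, t_rest, trace = u_rest, 0.0, []
    for i_t in I:
        if t_rest > 0.0:              # resting phase: potential stays at u_rest
            u, t_rest = u_rest, t_rest - dt
        elif u > u_thresh:            # firing phase: reset, start refractory period
            u, t_rest = u_rest, tau_rest
        else:                         # integration: tau_m du/dt = -u + R I(t)
            u += dt * (-u + R * i_t) / tau_m
        trace.append(u)
    return np.array(trace)
```

<p>With a constant current such that $RI &gt; u_{thresh}$, the potential crosses the threshold and the neuron spikes periodically; below that, the potential converges to $RI$ without ever firing.</p>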

<h2 id="step-1-create-a-single-lif-model">Step 1: Create a single LIF model</h2>

<p>In a first step, we create a Tensorflow graph to evaluate the membrane response of a LIF neuron.</p>

<p>For encapsulation and isolation, the graph is a member of a LIFNeuron object that takes all model parameters at initialization.</p>

<p>The LIFNeuron object exposes the membrane potential Tensorflow ‘operation’ as a member.</p>

<p>The input current and considered time interval are passed as Tensorflow placeholders.</p>

<p>The main graph operations are described below (please refer to my 
<a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/leaky_integrate_fire.ipynb">jupyter notebook</a> for details):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code>    <span class="c1"># Neuron behaviour during integration phase (below threshold)
</span>    <span class="k">def</span> <span class="nf">get_integrating_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>

        <span class="c1"># Get input current
</span>        <span class="n">i_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_input_op</span><span class="p">()</span>

        <span class="c1"># Update membrane potential
</span>        <span class="n">du_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">divide</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">r</span><span class="p">,</span> <span class="n">i_op</span><span class="p">),</span> <span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">),</span> <span class="bp">self</span><span class="p">.</span><span class="n">tau</span><span class="p">)</span> 
        <span class="n">u_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">.</span><span class="n">assign_add</span><span class="p">(</span><span class="n">du_op</span> <span class="o">*</span> <span class="bp">self</span><span class="p">.</span><span class="n">dt</span><span class="p">)</span>
        <span class="c1"># Refractory period is 0
</span>        <span class="n">t_rest_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">t_rest</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="mf">0.0</span><span class="p">)</span>
        
        <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">control_dependencies</span><span class="p">([</span><span class="n">t_rest_op</span><span class="p">]):</span>
            <span class="k">return</span> <span class="n">u_op</span>

    <span class="c1"># Neuron behaviour during firing phase (above threshold)    
</span>    <span class="k">def</span> <span class="nf">get_firing_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>                  

        <span class="c1"># Reset membrane potential
</span>        <span class="n">u_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">u_rest</span><span class="p">)</span>
        <span class="c1"># Refractory period starts now
</span>        <span class="n">t_rest_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">t_rest</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">tau_rest</span><span class="p">)</span>

        <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">control_dependencies</span><span class="p">([</span><span class="n">t_rest_op</span><span class="p">]):</span>
            <span class="k">return</span> <span class="n">u_op</span>

    <span class="c1"># Neuron behaviour during resting phase (t_rest &gt; 0)
</span>    <span class="k">def</span> <span class="nf">get_resting_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>

        <span class="c1"># Membrane potential stays at u_rest
</span>        <span class="n">u_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">u_rest</span><span class="p">)</span>
        <span class="c1"># Refractory period is decreased by dt
</span>        <span class="n">t_rest_op</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">t_rest</span><span class="p">.</span><span class="n">assign_sub</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">dt</span><span class="p">)</span>
        
        <span class="k">with</span> <span class="n">tf</span><span class="p">.</span><span class="n">control_dependencies</span><span class="p">([</span><span class="n">t_rest_op</span><span class="p">]):</span>
            <span class="k">return</span> <span class="n">u_op</span>

    <span class="k">def</span> <span class="nf">get_potential_op</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
        
        <span class="k">return</span> <span class="n">tf</span><span class="p">.</span><span class="n">case</span><span class="p">(</span>
            <span class="p">[</span>
                <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">t_rest</span> <span class="o">&gt;</span> <span class="mf">0.0</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_resting_op</span><span class="p">),</span>
                <span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">u</span> <span class="o">&gt;</span> <span class="bp">self</span><span class="p">.</span><span class="n">u_thresh</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">get_firing_op</span><span class="p">),</span>
            <span class="p">],</span>
            <span class="n">default</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">get_integrating_op</span>
        <span class="p">)</span>
</code></pre></div></div>
<h2 id="step-2-stimulation-by-a-square-input-current">Step 2: Stimulation by a square input current</h2>

<p>We stimulate the neuron with three square input currents of varying intensity: 0.5, 1.2 and 1.5 mA.</p>

<p><img alt="Square input current" src="/images/posts/lif_1.png" />
<img alt="LIF neuron response" src="/images/posts/lif_1_1.png" /></p>

<p>The first current step is not sufficient to trigger a spike. The other two trigger several spikes whose frequency increases with the input current.</p>

<h2 id="step-3-stimulation-by-a-random-varying-input-current">Step 3: Stimulation by a random varying input current</h2>

<p>We now stimulate the neuron with a varying current corresponding to a normal distribution of mean 1.5 mA and standard deviation 1.0 mA.</p>

<p><img alt="Varying input current" src="/images/posts/lif_2.png" />
<img alt="LIF neuron response" src="/images/posts/lif_2_2.png" /></p>

<p>The input current triggers spikes at regular intervals: the neuron mostly saturates, with consecutive spikes separated by the resting period.</p>

<h2 id="step-4-stimulate-neuron-with-synaptic-currents">Step 4: Stimulate neuron with synaptic currents</h2>

<p>We now assume that the neuron is connected to input neurons through $m$ synapses.</p>

<p>The contribution of the synapses to the neuron input current is given by the general formula below:</p>

\[I =\sum_{i}^{}w_{i}\sum_{f}{}I_{syn}(t-t_i^{(f)})\]

<p>Where $t_i^{(f)}$ is the time of the f-th spike of the synapse $i$.</p>

<p>A typical implementation of the $I_{syn}$ function is:</p>

\[I_{syn}(t)=\frac{q}{\tau}exp(-\frac{t}{\tau})\]

<p>where $q$ is the total charge that is injected in a postsynaptic neuron via a synapse with efficacy $w_{i} = 1$.</p>

<p>Note that $\frac{dI_{syn}}{dt}=-\frac{I_{syn}(t)}{\tau}$.</p>
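<p>As a quick illustration, the kernel and the double sum over synapses and spike times can be written directly (the values of $q$ and $\tau$ below are hypothetical):</p>

```python
import numpy as np

q, tau = 1.5, 10.0   # illustrative total charge and synaptic time constant (ms)

def i_syn(t):
    """Postsynaptic current kernel: (q / tau) * exp(-t / tau) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0.0, (q / tau) * np.exp(-t / tau), 0.0)

def total_current(t, weights, spike_times):
    """I(t) = sum_i w_i * sum_f I_syn(t - t_i^(f))."""
    return sum(w * i_syn(t - np.asarray(ts)).sum()
               for w, ts in zip(weights, spike_times))
```

<p>Each spike injects a current that peaks at $\frac{q}{\tau}$ and decays exponentially, and the contributions of all spikes on all synapses simply add up.</p>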

<p>We create a new neuron model derived from the LIFNeuron.</p>

<p>The graph for this neuron includes a modified operation to evaluate the input current at each time step based on a memory of synaptic spikes.</p>

<p>The graph requires a new boolean Tensorflow placeholder that contains the synapse spikes over the last time step.</p>

<p>The modified operation is displayed below (please refer to my 
<a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/leaky_integrate_fire.ipynb">jupyter notebook</a> for details):</p>

<pre><code class="language-python">    # Override parent get_input_op method
    def get_input_op(self):
        
        # Update our memory of spike times with the new spikes
        t_spikes_op = self.update_spike_times()

        # Evaluate synaptic input current for each spike on each synapse
        i_syn_op = tf.where(t_spikes_op &gt;=0,
                            self.q/self.tau_syn * tf.exp(tf.negative(t_spikes_op/self.tau_syn)),
                            t_spikes_op*0.0)

        # Add each synaptic current to the input current
        i_op =  tf.reduce_sum(self.w * i_syn_op)
        
        return tf.add(self.i_app, i_op)     
</code></pre>

<p>Each synapse spikes according to an independent Poisson process at $\lambda = 20$ $Hz$.</p>

<p>We perform a simulation by evaluating the contribution of each synapse to the input current over time.</p>

<p>At every time step, we draw a single sample $r$ from a uniform distribution in the $[0,1]$ interval, and if it is lower than
the probability of a spike over the time interval (i.e. $r &lt; \lambda \cdot dt$), then a spike occurred.</p>

<p>Note that this assumes that the chosen time interval is much smaller than the mean synapse inter-spike interval.</p>
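<p>This Bernoulli approximation of a Poisson process can be sketched as follows (the function name and default values are mine):</p>

```python
import numpy as np

def poisson_spike_trains(rate_hz=20.0, dt_ms=1.0, steps=1000, n_syn=25, seed=0):
    """Approximate independent Poisson processes by drawing, at each time
    step, a Bernoulli spike with probability rate * dt (valid when
    rate * dt << 1)."""
    rng = np.random.default_rng(seed)
    p_spike = rate_hz * dt_ms * 1e-3              # spike probability per step
    return rng.random((steps, n_syn)) < p_spike   # boolean spike raster
```

<p>Over a long enough simulation, the empirical spiking rate of each synapse converges to $\lambda$.</p>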

<p><img alt="Synapse spikes" src="/images/posts/lif_3.png" />
<img alt="Synaptic input current" src="/images/posts/lif_3_1.png" />
<img alt="LIF neuron response" src="/images/posts/lif_3_2.png" /></p>

<p>As expected, the neuron spikes when several synapses spike together.</p>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Simulating spiking neurons with Tensorflow
          ]]>
      </title>
      <link>http://www.kaizou.org/2018/07/simulating-spiking-neurons-with-tensorflow.html</link>
      <pubDate>Tue, 24 Jul 2018 10:38:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2018/07/simulating-spiking-neurons-with-tensorflow</guid>
      <description>
          <![CDATA[
<p>Spiking Neural Networks are the next generation of machine learning, according to the literature.</p>

<p>After the feed-forward perceptrons of the last century and the bi-directional deep networks trained
using gradient descent of today, this 3rd generation of neural networks uses biologically-realistic
models of neurons to carry out computation.</p>

<p>A spiking neural network (SNN) operates using spikes, which are discrete events that take place at
points in time, rather than continuous values. The occurrence of a spike is determined by differential
equations that represent the membrane potential of the neuron.
Essentially, once a neuron reaches a certain potential, it spikes, and the potential of that neuron is reset.</p>

<p>In this article, I will detail how this kind of network can be modelled using <a href="https://www.tensorflow.org/">Tensorflow</a>.</p>

<!--more-->

<p>You can find a jupyter notebook corresponding to this article in my 
<a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/simple_spiking_model.ipynb">tensorflow sandbox</a>.</p>

<p>The article is based on an existing exercise using <a href="http://www.mjrlab.org/wp-content/uploads/2014/05/CSHA_matlab_2012.pdf">Matlab</a>.</p>

<h2 id="spiking-neuron-model">Spiking neuron model</h2>

<p>The neuron model is based on <a href="http://www.izhikevich.org/publications/spikes.htm">“Simple Model of Spiking Neurons”</a>, by Eugene M. Izhikevich.</p>

<p><img src="/images/posts/izhik.gif" alt="Simple model on spiking neuron" width="100%" /></p>

<p>Electronic version of the figure and reproduction permissions are freely available at www.izhikevich.com</p>

<p>The behaviour of the neuron is determined by its membrane potential $v$, which increases over time when the neuron is stimulated by an input current $I$.
Whenever the membrane potential reaches the spiking threshold, it is reset.</p>

<p>The membrane potential increase is counteracted by a recovery effect modelled by the $u$ variable.</p>

<p>Tensorflow doesn’t natively solve differential equations, so we approximate the evolution of the membrane potential and
membrane recovery by evaluating their variations over small time intervals $dt$:</p>

\[dv = 0.04v^2 + 5v + 140 -u + I\]

\[du = a(bv -u)\]

<p>We can then apply the variations, multiplied by the time interval $dt$:</p>

\[v \leftarrow v + dv \cdot dt\]

\[u \leftarrow u + du \cdot dt\]

<p>As stated in the model, the $0.04$, $5$ and $140$ values have been defined so that $v$ is in $mV$, $I$ is in $A$ and $t$ in $ms$.</p>
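<p>As a sanity check, the update rule can be run for a single neuron in plain Python, using the regular-spiking parameters from Izhikevich’s paper (the input current and time step below are illustrative):</p>

```python
# Regular-spiking parameters from Izhikevich (2003); I and dt are illustrative
a, b, c, d = 0.02, 0.2, -65.0, 8.0
dt, I = 0.25, 10.0                     # time step (ms) and constant input current

v, u = c, b * c                        # initial membrane potential and recovery
spike_times = []
for step in range(int(1000 / dt)):     # simulate one second
    if v >= 30.0:                      # spike: reset potential, bump recovery
        spike_times.append(step * dt)
        v, u = c, u + d
    dv = 0.04 * v * v + 5.0 * v + 140.0 - u + I
    du = a * (b * v - u)
    v, u = v + dv * dt, u + du * dt
```

<p>Under this constant current the neuron fires tonically, which is the behaviour the Tensorflow graph reproduces for a whole population of neurons.</p>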

<p>The corresponding Tensorflow code looks like this (see the <a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/simple_spiking_model.ipynb">jupyter notebook</a> for details):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># Evaluate membrane potential increment for the considered time interval
# dv = 0 if the neuron fired, dv = 0.04v*v + 5v + 140 + I -u otherwise
</span><span class="n">dv_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">has_fired_op</span><span class="p">,</span>
                 <span class="n">tf</span><span class="p">.</span><span class="n">zeros</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">v</span><span class="p">.</span><span class="n">shape</span><span class="p">),</span>
                 <span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">add_n</span><span class="p">([</span><span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">square</span><span class="p">(</span><span class="n">v_reset_op</span><span class="p">),</span> <span class="mf">0.04</span><span class="p">),</span>
                                       <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">v_reset_op</span><span class="p">,</span> <span class="mf">5.0</span><span class="p">),</span>
                                       <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="mf">140.0</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">]),</span>
                                       <span class="n">i_op</span><span class="p">]),</span>
                             <span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">))</span>
                        
<span class="c1"># Evaluate membrane recovery decrement for the considered time interval
# du = 0 if the neuron fired, du = a*(b*v -u) otherwise
</span><span class="n">du_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">has_fired_op</span><span class="p">,</span>
                 <span class="n">tf</span><span class="p">.</span><span class="n">zeros</span><span class="p">([</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">]),</span>
                 <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">A</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">B</span><span class="p">,</span> <span class="n">v_reset_op</span><span class="p">),</span> <span class="n">u_reset_op</span><span class="p">)))</span>
    
<span class="c1"># Increment membrane potential, and clamp it to the spiking threshold
# v += dv * dt
</span><span class="n">v_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">v</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">minimum</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">SPIKING_THRESHOLD</span><span class="p">,</span> <span class="n">shape</span><span class="o">=</span><span class="p">[</span><span class="bp">self</span><span class="p">.</span><span class="n">n</span><span class="p">]),</span>
                                                 <span class="n">tf</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">v_reset_op</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">dv_op</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">dt</span><span class="p">))))</span>

<span class="c1"># Decrease membrane recovery
</span><span class="n">u_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">u</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">u_reset_op</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">du_op</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">dt</span><span class="p">)))</span>
</code></pre></div></div>

<h2 id="simulate-a-single-neuron-with-injected-current">Step 1: Simulate a single neuron with injected current</h2>

<p>As a first step, we stimulate the neuron model with a square input current.</p>

<p><img src="/images/posts/simulating_spiking_1_0.png" alt="square input current" />
<img src="/images/posts/simulating_spiking_1.png" alt="Neuron response with square input current" /></p>

<p>The neuron spikes at regular intervals. After each spike, the membrane potential returns to its resting value
before starting to increase again.</p>

<h2 id="step-2-simulate-a-single-neuron-with-synaptic-input">Step 2: Simulate a single neuron with synaptic input</h2>

<p>This is a simple variation of the previous experiment, where the input current is the combination of the currents coming from several synapses (here, a hundred).</p>

<p>The synaptic current is the weighted sum of the currents generated by each synapse:</p>

\[Isyn = \sum_{j}w_{in}(j).Isyn(j)\]

<p>The current $Isyn(j)$ generated by each synapse is the product of:</p>
<ul>
  <li>a linear response to the membrane potential, driving it towards a target potential $E_{in}(j)$: ($E_{in}(j) - v$),</li>
  <li>a conductance term $g_{in}(j)$ with exponential decay, defined by the differential equation below.</li>
</ul>

\[\frac{dg_{in}(j)}{dt} = -\frac{g_{in}(j)}{\tau}\]

<p>Each input synapse emits spikes following a Poisson process of rate $frate$. The probability that a synapse fires during the time interval $dt$ is thus $frate.dt$.</p>

<p>To simulate the neuron, we draw a random number $r$ in the $[0,1]$ interval for each synapse at each timestep, and if $r$ is less than $frate.dt$, we generate a synapse spike by increasing the conductance term for that synapse:</p>

\[g_{in}(j) = g_{in}(j) + 1\]
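<p>This random draw can be sketched in a few lines of plain Python (the <code>draw_synapse_spikes</code> helper is hypothetical; the resulting boolean vector plays the role of <code>self.syn_has_spiked</code> in the code below):</p>

```python
import random

# Bernoulli approximation of a Poisson process: at each timestep, each of
# the n_syn synapses fires with probability frate * dt (e.g. frate = 2 Hz
# and dt = 0.001 s give a 0.002 firing probability per timestep).
def draw_synapse_spikes(n_syn, frate, dt, rng=random.random):
    return [rng() < frate * dt for _ in range(n_syn)]
```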

<p>The complete synaptic current formula at each timestep is:</p>

\[Isyn = \sum_{j}w_{in}(j)g_{in}(j)(E_{in}(j) - v(t)) = \sum_{j}w_{in}(j)g_{in}(j)E_{in}(j) - (\sum_{j}w_{in}(j)g_{in}(j)).v(t)\]

<p>The corresponding Tensorflow code looks like this (see the <a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/simple_spiking_model.ipynb">jupyter notebook</a> for details):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># First, update synaptic conductance dynamics:
# - increment by one the current factor of synapses that fired
# - decrease by tau the conductance dynamics in any case
</span><span class="n">g_in_update_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">syn_has_spiked</span><span class="p">,</span>
                          <span class="n">tf</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g_in</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">g_in</span><span class="p">.</span><span class="n">shape</span><span class="p">)),</span>
                          <span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g_in</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">dt</span><span class="p">,</span><span class="n">tf</span><span class="p">.</span><span class="n">divide</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g_in</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">tau</span><span class="p">))))</span>

<span class="c1"># Update the g_in variable
</span><span class="n">g_in_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g_in</span><span class="p">,</span> <span class="n">g_in_update_op</span><span class="p">)</span>

<span class="c1"># We can now evaluate the synaptic input currents
# Isyn = Σ w_in(j)g_in(j)E_in(j) - (Σ w_in(j)g_in(j)).v(t)
</span><span class="n">i_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">'nm,m-&gt;n'</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">W_in</span><span class="p">),</span> <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">g_in_op</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">E_in</span><span class="p">))),</span>
                   <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">'nm,m-&gt;n'</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">W_in</span><span class="p">),</span> <span class="n">g_in_op</span><span class="p">),</span> <span class="n">v_op</span><span class="p">))</span>
</code></pre></div></div>
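<p>The <code>einsum</code> expressions compute the same double weighted sum as this plain-Python sketch (a hypothetical <code>synaptic_current</code> helper, shown for a single timestep only):</p>

```python
# Plain-Python sketch of the synaptic current formula:
# Isyn(i) = sum_j w[i][j] * g[j] * (e[j] - v[i])
# for n neurons and m synapses (w is an n x m matrix).
def synaptic_current(w, g, e, v):
    return [sum(w[i][j] * g[j] * (e[j] - v[i]) for j in range(len(g)))
            for i in range(len(w))]
```

<p>For instance, a single neuron at $v = -65$ with two unit-conductance synapses of weights $1$ and $2$ and target potentials $0$ and $-85$ receives a current of $65 - 40 = 25$.</p>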
<p>We stimulate a neuron with $100$ synapses, each firing at $2 Hz$, between $200$ and $700 ms$.</p>

<p>Every millisecond, there are $0.001 * 2 * 100 = 0.2$ synapse spikes on average.</p>

<p>In other words, a synapse spike occurs every $5 ms$ on average.</p>

<p>The resulting membrane potential is displayed below:</p>

<p><img src="/images/posts/simulating_spiking_2_0.png" alt="synaptic input current" />
<img src="/images/posts/simulating_spiking_2.png" alt="Neuron response with synaptic input current" /></p>

<p>The synaptic input current oscillates around a mean value of approximately $10 mA$.</p>

<p>Due to the increased input current, the neuron spikes faster than in the previous stimulation.</p>

<h2 id="step-3-simulate-1000-neurons-with-synaptic-input">Step 3: Simulate 1000 neurons with synaptic input</h2>

<p>Each neuron is either:</p>

<ul>
  <li>an inhibitory fast-spiking neuron $(a=0.1, d=2.0)$,</li>
  <li>or an excitatory regular spiking neuron $(a=0.02, d=8.0)$.</li>
</ul>

<p>with a proportion of $20\%$ inhibitory neurons.</p>

<p>We therefore define a random uniform vector $p$ on $[0,1]$, and condition the $a$ and $d$ vectors of our neuron population on $p$.</p>

\[a[p&lt;0.2] = 0.1,\quad a[p&gt;=0.2] = 0.02\]

\[d[p&lt;0.2] = 2.0,\quad d[p&gt;=0.2] = 8.0\]
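<p>This conditioning can be sketched in plain Python (the <code>make_population</code> helper is hypothetical, not taken from the notebook):</p>

```python
import random

# Sketch of the population setup: draw p uniformly on [0, 1], then pick the
# (a, d) parameters of each neuron depending on whether it is inhibitory
# (p < 0.2, fast spiking) or excitatory (regular spiking).
def make_population(n, ratio=0.2, seed=None):
    rng = random.Random(seed)
    p = [rng.random() for _ in range(n)]
    a = [0.1 if pi < ratio else 0.02 for pi in p]
    d = [2.0 if pi < ratio else 8.0 for pi in p]
    return a, d
```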

<p>Each neuron is randomly connected to $10\%$ of the input synapses, and thus receives an input synapse spike every $50 ms$ on average.</p>

<p>Instead of displaying the membrane potentials, we just plot the neuron spikes for inhibitory (blue) and excitatory (yellow) neurons:</p>

<p><img src="/images/posts/simulating_spiking_3.png" alt="Inhibitory and Excitatory spikes" /></p>

<p>The neurons spike in ‘stripes’ at somewhat regular intervals, with a bit of dispersion.</p>

<p>The neuron dynamics seem to act as a regulator of the synaptic ‘noise’.</p>

<h2 id="step-4-simulate-1000-neurons-with-recurrent-connections">Step 4: Simulate 1000 neurons with recurrent connections</h2>

<p>Each neuron $i$ is sparsely connected (with probability $prc = 0.1$) to the other neurons $j$.</p>

<p>Neuron $i$ thus receives an additional recurrent current $Isyn(i)$ of the same form as the synaptic input:</p>

\[Isyn(i) = \sum_{j}w(i,j)g(j)(E(j) - v(t))\]

<p>The corresponding Tensorflow code looks like this (see the <a href="https://github.com/kaizouman/tensorsandbox/blob/master/snn/simple_spiking_model.ipynb">jupyter notebook</a> for details):</p>

<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># First, update recurrent conductance dynamics:
# - increment by one the current factor of synapses that fired
# - decrease by tau the conductance dynamics in any case
</span><span class="n">g_update_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">where</span><span class="p">(</span><span class="n">has_fired_op</span><span class="p">,</span>
                       <span class="n">tf</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">ones</span><span class="p">(</span><span class="n">shape</span><span class="o">=</span><span class="bp">self</span><span class="p">.</span><span class="n">g</span><span class="p">.</span><span class="n">shape</span><span class="p">)),</span>
                       <span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">dt</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">divide</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g</span><span class="p">,</span> <span class="bp">self</span><span class="p">.</span><span class="n">tau</span><span class="p">))))</span>
        
<span class="c1"># Update the g variable
</span><span class="n">g_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">assign</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">g</span><span class="p">,</span> <span class="n">g_update_op</span><span class="p">)</span>

<span class="c1"># We can now evaluate the recurrent conductance
# I_rec = Σ wjgj(Ej -v(t))
</span><span class="n">i_rec_op</span> <span class="o">=</span> <span class="n">tf</span><span class="p">.</span><span class="n">einsum</span><span class="p">(</span><span class="s">'ij,j-&gt;i'</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">W</span><span class="p">),</span> <span class="n">tf</span><span class="p">.</span><span class="n">multiply</span><span class="p">(</span><span class="n">g_op</span><span class="p">,</span> <span class="n">tf</span><span class="p">.</span><span class="n">subtract</span><span class="p">(</span><span class="n">tf</span><span class="p">.</span><span class="n">constant</span><span class="p">(</span><span class="bp">self</span><span class="p">.</span><span class="n">E</span><span class="p">),</span> <span class="n">v_op</span><span class="p">)))</span>

<span class="c1"># Get the synaptic input currents from parent
</span><span class="n">i_in_op</span> <span class="o">=</span> <span class="nb">super</span><span class="p">(</span><span class="n">SimpleSynapticRecurrentNeurons</span><span class="p">,</span> <span class="bp">self</span><span class="p">).</span><span class="n">get_input_ops</span><span class="p">(</span><span class="n">has_fired_op</span><span class="p">,</span> <span class="n">v_op</span><span class="p">)</span>
        
<span class="c1"># The actual current is the sum of both currents
</span><span class="n">i_op</span> <span class="o">=</span> <span class="n">i_in_op</span> <span class="o">+</span> <span class="n">i_rec_op</span>
</code></pre></div></div>

<p>Weights $w$ are Gamma distributed (scale $0.003$, shape $2$).</p>

<p>Inhibitory to excitatory connections are twice as strong.</p>

<p>$E(j)$ is set to $-85$ for inhibitory neurons, and to $0$ otherwise.</p>
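<p>Putting these three rules together, the connectivity can be sketched in plain Python using the standard library Gamma generator (the <code>make_recurrent_weights</code> helper is hypothetical, not the notebook code):</p>

```python
import random

# Sketch of the recurrent connectivity: each connection exists with
# probability prc, weights are Gamma distributed (shape 2, scale 0.003),
# inhibitory-to-excitatory weights are doubled, and E(j) is -85 for
# inhibitory pre-synaptic neurons and 0 otherwise.
def make_recurrent_weights(inhibitory, prc=0.1, seed=None):
    rng = random.Random(seed)
    n = len(inhibitory)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):          # post-synaptic neuron
        for j in range(n):      # pre-synaptic neuron
            if i != j and rng.random() < prc:
                w_ij = rng.gammavariate(2, 0.003)
                if inhibitory[j] and not inhibitory[i]:
                    w_ij *= 2.0  # inhibitory to excitatory: twice as strong
                w[i][j] = w_ij
    e = [-85.0 if inh else 0.0 for inh in inhibitory]
    return w, e
```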

<p>We again plot the neuron spikes for inhibitory (blue) and excitatory (yellow) neurons:</p>

<p><img src="/images/posts/simulating_spiking_4.png" alt="Inhibitory and Excitatory spikes with recurrent connections" /></p>

<p>The addition of recurrent connections has drastically reduced the dispersion of the neuron spikes.</p>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Explore Tensorflow features with the CIFAR10 dataset
          ]]>
      </title>
      <link>http://www.kaizou.org/2017/06/tensorflow-cifar10.html</link>
      <pubDate>Mon, 26 Jun 2017 16:51:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2017/06/tensorflow-cifar10</guid>
      <description>
          <![CDATA[
<p>The reason I started using Tensorflow was the limitations of my
experiments so far, where I had coded my models from scratch following the
guidance of the <a href="http://cs231n.github.io/">CNN for visual recognition</a> course.</p>

<p>I already knew how CNNs worked, and already had a good sense of what it
takes to train a good model. I had also read a lot of papers presenting
multiple variations of CNN topologies, some aiming at increasing accuracy,
others at reducing model complexity and size.</p>

<p>I work in the embedded world, so performance is obviously one of my primary
concerns, but I soon realized that the CNN state of the art for computer vision
had not yet reached a consensus on the best compromise between accuracy and
performance.</p>

<!--more-->

<p>In particular, I noticed that some papers had neglected to investigate how the
multiple characteristics of their models contribute to the overall results they
obtain: I assume that this is because it takes an awful lot of time to train a
single model, thus leaving no time for musing around.</p>

<p>Anyway, my goal was therefore to multiply experiments on several models to
better isolate how each feature contributes to the efficiency of the training
and to the performance of the inference.</p>

<p>More specifically, my goals were:</p>

<ul>
  <li>to verify that Tensorflow allowed me to improve the efficiency of my
trainings (going numpy-only is desperately slow, even with BLAS and/or MKL),</li>
  <li>to use this efficiency to multiply experiments, changing one model parameter
at a time to see how it contributes to the overall accuracy,</li>
  <li>to experiment with alternative CNN models to verify the claims in the
corresponding papers.</li>
</ul>

<p>Thanks to the <a href="http://cs231n.github.io/">CNN for visual recognition</a> course, I
had already used the CIFAR10 dataset extensively, and I was sure that its
complexity was compatible with the hardware setup I had.</p>

<p>I therefore used the <a href="https://www.tensorflow.org/tutorials/deep_cnn">tensorflow CIFAR10 image
tutorial</a> as a starting point.</p>

<h2 id="setting-up-a-tensorflow-environment">Setting up a Tensorflow environment</h2>

<p>I have a pretty good experience in setting up development environments, and am
very much aware of the mess your host system can become if you don’t maintain
a good isolation between these development environments.</p>

<p>After having tried several containment techniques (including chroots, Virtual
Machines and virtual env), I now use <a href="https://www.docker.com/">docker</a>, like
everybody else in the industry.</p>

<p>Google provides <a href="https://hub.docker.com/r/tensorflow/tensorflow/">docker images</a>
for the latest Tensorflow versions (both CPU and GPU), and also a development
image that you can use to rebuild Tensorflow with various optimizations for
your SoC.</p>

<p>You can refer to my <a href="https://github.com/kaizouman/tensorsandbox/tree/master/docker">step by step recipe</a>
to create your environment using docker.</p>

<h2 id="creating-a-cifar10-training-framework">Creating a CIFAR10 training framework</h2>

<p>Taking the Tensorflow image tutorial as an inspiration, I developed a
generic model training framework for the CIFAR10 dataset.</p>

<p>The framework uses several types of scripts for training and evaluation.</p>

<p>All scripts rely on the same data provider based on the tensorflow <a href="https://www.tensorflow.org/programmers_guide/reading_data">batch input
pipeline</a>.</p>

<p>The training scripts use Tensorflow <a href="https://www.tensorflow.org/api_docs/python/tf/train/MonitoredTrainingSession">monitored training sessions</a>, whose benefits
are twofold:</p>
<ul>
  <li>they neatly take care of tedious tasks like logs, saving checkpoints and
summaries,</li>
  <li>they almost transparently give access to the <a href="https://www.tensorflow.org/deploy/distributed">Tensorflow distributed
mode</a> to create training clusters.</li>
</ul>

<p>There is one script for training on a single host and another one for clusters.</p>

<p>There is also a single evaluation script, and a script to ‘freeze’ a model, i.e.
combine its graph definition with its trained weights into a single <a href="https://www.tensorflow.org/extend/tool_developers/">model
file</a> that can be loaded by
another Tensorflow application.</p>

<p>I tested the framework on a model I had already created for the assignments of
my course, verifying that I achieved the same accuracy.</p>

<p>The framework is in this <a href="https://github.com/kaizouman/tensorsandbox/tree/master/cifar10">github
repository</a>.</p>

<h2 id="reproducing-the-tutorial-performance">Reproducing the tutorial performance</h2>

<p>The next step was to start experimenting to figure out what really matters in
a CNN model for the CIFAR10 dataset.</p>

<p>The idea was to isolate the specific characteristics of the tutorial model to
evaluate how they contribute to the overall model accuracy.</p>

<p>As a first step, I implemented the same model as the tutorial in my framework,
but without all the training bells and whistles.</p>

<h3 id="basic-hyperparameters">Basic hyperparameters</h3>

<p>Learning rate and batch size are two of the most important hyperparameters, and
are usually well evaluated by model designers, as they have a direct impact on
model convergence.</p>

<p>So I would assume they are usually well-defined. I nevertheless tried different
training parameters, and finally decided to keep the ones provided by the
tutorial, as they gave the best results:</p>
<ul>
  <li>learning rate = 0.1,</li>
  <li>batch size = 128.</li>
</ul>

<p>Note: the learning rate is more related to the model, and the batch size to the
dataset.</p>

<h3 id="initialization">Initialization</h3>

<p>For the initialization parameters, I was a bit reluctant to investigate much,
as there were too many variations.</p>

<p>Moreover, I had already tried the <a href="http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf">Xavier initialization</a>
with good success, so I decided to initialize all variables with a Xavier
initializer.</p>
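<p>As a reminder, the Xavier (Glorot) uniform scheme draws weights from $U[-limit, limit]$ with $limit = \sqrt{6 / (fan_{in} + fan_{out})}$. A minimal Python sketch (the <code>xavier_uniform</code> helper is hypothetical; Tensorflow provides equivalent initializers):</p>

```python
import math
import random

# Sketch of the Xavier (Glorot) uniform initialization: weights are drawn
# uniformly in [-limit, limit] with limit = sqrt(6 / (fan_in + fan_out)),
# which keeps the variance of activations roughly constant across layers.
def xavier_uniform(fan_in, fan_out, rng=random.uniform):
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng(-limit, limit) for _ in range(fan_out)]
            for _ in range(fan_in)]
```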

<h3 id="weight-decay">Weight decay</h3>

<p>For the weight decay, I used a global parameter for each model, but refined
it for each variable, dividing it by the matrix size: my primary concern was
to make sure that the induced loss did not explode.</p>

<h3 id="gradually-improving-from-my-first-results">Gradually improving from my first results</h3>

<p>With my basic setup, I achieved results a bit lower than the tutorial (for
exactly the same model):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>75.3% accuracy after 10,000 iterations instead of 81.3%.
</code></pre></div></div>

<p>Then, I added data augmentation, which smoothed the training process a lot:</p>

<ul>
  <li>drastic reduction of the overfitting,</li>
  <li>lower results for early iterations,</li>
  <li>much higher results after 5000+ iterations.</li>
</ul>

<p>With data augmentation:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>78.8% accuracy after 10,000 iterations.
</code></pre></div></div>

<p>Finally, I used moving averages of the trainable variables instead of their raw
values, and it gave me the extra missing accuracy to match the tutorial performance:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>81.4% accuracy after 10,000 iterations.
</code></pre></div></div>
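<p>The moving average mechanism itself is simple: each trainable variable gets a shadow copy that is updated at every step and used at evaluation time. A minimal sketch of the update rule (Tensorflow provides this through <code>tf.train.ExponentialMovingAverage</code>; the decay value is an assumption):</p>

```python
# Sketch of the exponential moving average applied to trainable variables:
# the shadow value slowly tracks the raw value, smoothing out the noise of
# individual gradient updates.
def ema_update(shadow, value, decay=0.999):
    return decay * shadow + (1.0 - decay) * value
```

<p>Evaluating with the shadow values instead of the raw ones averages the model over the last training steps, which is likely what provides the extra accuracy.</p>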

<p>After 300,000 iterations, the model with data augmentation even reached 87%
accuracy.</p>

<h3 id="conclusion">Conclusion</h3>

<p>For the CIFAR10 dataset, data augmentation is a key factor for a successful
training, and using variable moving averages really helps convergence.</p>

<h3 id="tutorial-model-metrics">Tutorial model metrics</h3>

<p>Without data augmentation (32x32x3 images):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Size  : 1.76 Millions of parameters
Flops : 66.98 Millions of operations
</code></pre></div></div>

<p>With data augmentation (24x24x3 images):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Size     : 1.07 Millions of parameters
Flops    : 37.75 Millions of operations
</code></pre></div></div>

<h2 id="experimenting-with-the-tutorial-model-topology">Experimenting with the tutorial model topology</h2>

<p>To better understand the tutorial model topology, I tested a few <a href="https://github.com/kaizouman/tensorsandbox/tree/master/cifar10/models/alex">AlexNet-style
model</a>
variants.</p>

<p>Note: I call these models Alex-like as the tutorial is based on the models
defined by Alex Krizhevsky, winner of the ImageNet challenge in 2012.</p>

<p>I didn’t save all variants I tried, but to summarize my experiments:</p>

<ul>
  <li>Local-response-normalization is useless,</li>
  <li>One of the FC layer can be removed without harming accuracy too much,</li>
  <li>For the same amount of parameters, more filters with smaller kernels are
equivalent to the base setup.</li>
</ul>

<p>My conclusion is that the tutorial model can be improved a bit in terms of size
and processing power (see the Alex 4 variant for instance), but that it is
already a good model for that specific topology that combines two standard
convolutional layers with two dense layers.</p>

<h2 id="experimenting-with-alternative-models">Experimenting with alternative models</h2>

<p>The next step was to experiment further with different models:</p>

<ul>
  <li><a href="https://github.com/kaizouman/tensorsandbox/tree/master/cifar10/models/nin">NiN
networks</a> that remove dense layers altogether,</li>
  <li><a href="https://github.com/kaizouman/tensorsandbox/tree/master/cifar10/models/squeeze">SqueezeNets</a> that parallelize convnets.</li>
</ul>

<p>The idea was to stay within the same range in terms of computational cost and
model size, but trying to find a better compromise between model accuracy,
model size and inference performance.</p>

<p>The figure below provides accuracy for the three best models I obtained,
compared to the tutorial version and one of the Alex-style variants.</p>

<p><img src="/images/posts/cifar10@300000.jpg" alt="cifar10 accuracy for various models after 300,000 iterations" /></p>

<p>For each model, I evaluated the model size in number of parameters, and its
computational cost in number of operations.</p>

<p>To put these theoretical counters in perspective, I also got ‘real’ numbers by
checking:</p>
<ul>
  <li>the actual disk size of the saved models,</li>
<li>the inference time using the C++ label_image tool (I added some traces).</li>
</ul>

<p>The ratio between the number of parameters and the actual size on disk is
consistent for all models, but the inference time is not, and may vary greatly
depending on the actual local optimizations. The winner is however the model
with the fewest operations.</p>

<p>Here are the detailed numbers for all trained models:</p>

<h3 id="tuto">Tuto</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Accuracy : 87.2%
Size     : 1.07 Millions of parameters  / 4,278,750 bytes
Flops    : 37.75 Millions of operations / 44 ms
</code></pre></div></div>

<h3 id="alex-alex4">Alex (alex4)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Accuracy : 87.5%
Size     : 1.49 Millions of parameters  / 5,979,938 bytes
Flops    : 35.20 Millions of operations / 50 ms
</code></pre></div></div>

<h3 id="nin-nin2">NiN (nin2)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Accuracy : 89.8%
Size     : 0.97 Millions of parameters   / 3,881,548 bytes
Flops    : 251.36 Millions of operations / 90 ms
</code></pre></div></div>

<h3 id="squeezenet-squeeze1">SqueezeNet (squeeze1)</h3>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Accuracy : 87.8%
Size     : 0.15 Millions of parameters   / 602,892 bytes
Flops    : 22.84 Millions of operations  / 27 ms
</code></pre></div></div>

<h3 id="conclusion-1">Conclusion</h3>

<p>Of all the model topologies I studied here, the SqueezeNet architecture is by far
the most efficient, reaching the same level of accuracy with a model that is
more than six times lighter than the tutorial version, and more than 1.5 times
faster.</p>

<h2 id="further-experiments">Further experiments</h2>

<p>In my alternative models, I had first included <a href="https://arxiv.org/abs/1409.4842">Inception</a>, but I ruled it out
after finding out how costly NiN already was: it would nevertheless be
interesting to evaluate <a href="https://arxiv.org/pdf/1610.02357.pdf">Xception</a>, one of
its derivatives that uses depthwise separable convolutions.</p>

<p>Last, I would like to check how these models could be compressed using iterative
pruning and quantization.</p>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Build and boot a minimal Linux system with qemu
          ]]>
      </title>
      <link>http://www.kaizou.org/2016/09/boot-minimal-linux-qemu.html</link>
      <pubDate>Fri, 23 Sep 2016 16:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2016/09/boot-minimal-linux-qemu</guid>
      <description>
          <![CDATA[
          <p>When you want to build a Linux system for an embedded target these days, it is very unlikely that you decide to do it from scratch.</p>

<p>Embedded Linux build systems are really smart and efficient, and will fit almost all use cases: should you need only a simple system, <a href="https://buildroot.org/">buildroot</a> should be your first choice, and if you want to include more advanced features, or even create a full distribution, <a href="https://www.yoctoproject.org/">Yocto</a> is the way to go.</p>

<p>That said, even if these tools will do all the heavy-lifting for you, they are not perfect, and if you are using less common configurations, you may stumble upon issues that were not expected. In that case, it may be important to understand what happens behind the scenes.</p>

<p>In this post, I will describe step-by-step how you can build a minimal Linux system for an embedded target and boot it using <a href="http://wiki.qemu.org/Main_Page">QEMU</a>.</p>

<!--more-->

<h1 id="install-qemu">Install QEMU</h1>

<p><a href="http://wiki.qemu.org/Main_Page">QEMU</a> is available for all major distros.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">sudo </span>apt-get <span class="nb">install </span>qemu
</code></pre></div></div>

<p>In this post I will create a system for an ARM target, so that I don’t mix up my host and target systems (see the last paragraph of <a href="https://landley.net/writing/docs/cross-compiling.html">this introduction on cross-compilation</a>).</p>

<p>You can list the ARM machines your <a href="http://wiki.qemu.org/Main_Page">QEMU</a> setup supports from the command-line:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-system-arm <span class="nt">--machine</span> <span class="nb">help
</span>Supported machines are:
versatileab          ARM Versatile/AB <span class="o">(</span>ARM926EJ-S<span class="o">)</span>
...
mainstone            Mainstone II <span class="o">(</span>PXA27x<span class="o">)</span>
...
midway               Calxeda Midway <span class="o">(</span>ECX-2000<span class="o">)</span>
virt                 ARM Virtual Machine
borzoi               Borzoi PDA <span class="o">(</span>PXA270<span class="o">)</span>
</code></pre></div></div>

<p>In this tutorial, I will use an old Intel ARM platform, the Mainstone.</p>

<blockquote>
  <p>The only reason I chose this platform is that its maintainer is Robert Jarzmik, who has been sitting next to me in the open space for the last year. He is <em>very</em> knowledgeable about the Kernel, and also very nice. Thanks, Bob!</p>
</blockquote>

<h1 id="generate-the-toolchain">Generate the toolchain</h1>

<p>To generate the binaries for our embedded target, we need a toolchain, which is a set of tools targeting the corresponding processor architecture.</p>

<p>Most of the time, the board manufacturer will have provided the toolchain as part of the BSP (Board Support Package).</p>

<p>Generating a toolchain used to be quite painful, but since the awesome <a href="http://crosstool-ng.org/">crosstool-ng</a> tool has been made available, this is a piece of cake.</p>

<blockquote>
  <p>More namedropping: kudos to my friend <a href="http://ymorin.is-a-geek.org/">Yann E. Morin</a> for developing <a href="http://crosstool-ng.org/">crosstool-ng</a>.</p>
</blockquote>

<p>First, we need to fetch and install the tool.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>wget http://crosstool-ng.org/download/crosstool-ng/crosstool-ng-1.22.0.tar.xz
<span class="nv">$ </span><span class="nb">tar </span>xf crosstool-ng-1.22.0.tar.xz
<span class="nv">$ </span><span class="nb">cd </span>crosstool-ng/
<span class="nv">$ </span>./configure
<span class="nv">$ </span>make
<span class="nv">$ </span><span class="nb">sudo </span>make <span class="nb">install</span>
</code></pre></div></div>

<p>You can list the pre-configured toolchains that your crosstool-NG version supports:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ct-ng list-samples
Status  Sample name
  LN    config
  MKDIR config.gen
  IN    config.gen/arch.in
  IN    config.gen/kernel.in
  IN    config.gen/cc.in
  IN    config.gen/binutils.in
  IN    config.gen/libc.in
  IN    config.gen/debug.in
<span class="o">[</span>G..]   alphaev56-unknown-linux-gnu
...
<span class="o">[</span>G..]   armeb-unknown-linux-uclibcgnueabi
...
<span class="o">[</span>G..]   xtensa-unknown-linux-uclibc
 L <span class="o">(</span>Local<span class="o">)</span>       : sample was found <span class="k">in </span>current directory
 G <span class="o">(</span>Global<span class="o">)</span>      : sample was installed with crosstool-NG
 X <span class="o">(</span>EXPERIMENTAL<span class="o">)</span>: sample may use EXPERIMENTAL features
 B <span class="o">(</span>BROKEN<span class="o">)</span>      : sample is currently broken
</code></pre></div></div>

<p>For the Mainstone board, we will use a generic ARM toolchain with <a href="https://www.uclibc.org/">uClibc</a>, a smaller C library for embedded targets.</p>

<p>You can get the details of the toolchain that will be produced from the command-line:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ct-ng show-arm-unknown-linux-uclibcgnueabi
  IN    config.gen/arch.in
  IN    config.gen/kernel.in
  IN    config.gen/cc.in
  IN    config.gen/binutils.in
  IN    config.gen/libc.in
<span class="o">[</span>G..]   arm-unknown-linux-uclibcgnueabi
    OS             : linux-4.3
    Companion libs : gmp-6.0.0a mpfr-3.1.3 mpc-1.0.3 libelf-0.8.13 expat-2.1.0 ncurses-6.0
    binutils       : binutils-2.25.1
    C compilers    : gcc  |  5.2.0
    Languages      : C,C++
    C library      : uClibc-ng-1.0.9 <span class="o">(</span>threads: nptl<span class="o">)</span>
    Tools          : dmalloc-5.5.2 duma-2_5_15 gdb-7.10 ltrace-0.7.3 strace-4.10
</code></pre></div></div>

<p>Let’s generate (this <em>will</em> take a while):</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>ct-ng arm-unknown-linux-uclibcgnueabi
<span class="nv">$ </span>ct-ng build
</code></pre></div></div>

<p>By default, the toolchain will be installed under $HOME/x-tools/arm-unknown-linux-uclibcgnueabi. In order to use it, we add the toolchain bin directory to the PATH:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span><span class="s2">"</span><span class="k">${</span><span class="nv">PATH</span><span class="k">}</span><span class="s2">:</span><span class="k">${</span><span class="nv">HOME</span><span class="k">}</span><span class="s2">/x-tools/arm-unknown-linux-gnueabi/bin"</span>
<span class="nv">$ </span>arm-unknown-linux-uclibcgnueabi-gcc <span class="nt">--version</span>
arm-unknown-linux-uclibcgnueabi-gcc <span class="o">(</span>crosstool-NG crosstool-ng-1.22.0<span class="o">)</span> 5.2.0
Copyright <span class="o">(</span>C<span class="o">)</span> 2015 Free Software Foundation, Inc.
This is free software<span class="p">;</span> see the <span class="nb">source </span><span class="k">for </span>copying conditions.  There is NO
warranty<span class="p">;</span> not even <span class="k">for </span>MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
</code></pre></div></div>

<blockquote>
  <p>Note that I have added a small routine to my shell startup script to automatically add paths to toolchains:</p>
  <div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for </span><span class="nb">dir </span><span class="k">in</span> <span class="sb">`</span><span class="nb">ls</span> ~/x-tools<span class="sb">`</span><span class="p">;</span> <span class="k">do
</span><span class="nv">PATH</span><span class="o">=</span>~/x-tools/<span class="nv">$dir</span>/bin:<span class="nv">$PATH</span>
<span class="k">done
</span><span class="nb">export </span>PATH
</code></pre></div>  </div>
</blockquote>

<h1 id="sanity-check-test-cross-compilation-environment">Sanity check: test cross-compilation environment</h1>

<p>It is always a good practice to verify at regular intervals that your setup is correct. Here, we will make sure that the toolchain is able to generate ARM code that can be run by qemu-arm, the QEMU ARM CPU emulator.</p>

<p>For those unfamiliar with cross-compilation, this may also help to put things in perspective.</p>

<p>We will compile a very simple program:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>main.c

#include &lt;stdio.h&gt;

int main(int argc, char *argv[])
{
	printf("Genuinely generated by the toolchain\n");
	return 0;
}
</code></pre></div></div>

<p>Let’s first build with a naive command:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>arm-unknown-linux-uclibcgnueabi-gcc main.c <span class="nt">-o</span> sanity
<span class="nv">$ </span><span class="nb">chmod</span> +x sanity
</code></pre></div></div>

<p>We verify that sanity is an ARM executable that cannot run on our host system:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./sanity
bash: ./sanity: cannot execute binary file: Exec format error
<span class="nv">$ </span>file sanity
sanity: ELF 32-bit LSB  executable, ARM, EABI5 version 1 <span class="o">(</span>SYSV<span class="o">)</span>, dynamically linked <span class="o">(</span>uses shared libs<span class="o">)</span>, not stripped
</code></pre></div></div>

<p>Now, let’s try to run it with QEMU:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-arm sanity
/lib/ld-uClibc.so.0: No such file or directory
</code></pre></div></div>

<p>What happened? We get this error because, by default, GCC generated a sanity executable that requires dynamic linking against system libraries, as the following command reveals:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>readelf <span class="nt">-d</span> sanity

Dynamic section at offset 0x4f0 contains 18 entries:
  Tag        Type                         Name/Value
 0x00000001 <span class="o">(</span>NEEDED<span class="o">)</span>                     Shared library: <span class="o">[</span>libc.so.1]
 0x0000000c <span class="o">(</span>INIT<span class="o">)</span>                       0x102d4
...
</code></pre></div></div>

<p>Here, QEMU needs to find the C library and load it using the dynamic linker, which is itself a library, ld-uClibc.so.0, as the INTERP program header reveals:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>readelf <span class="nt">-l</span> sanity
Elf file <span class="nb">type </span>is EXEC <span class="o">(</span>Executable file<span class="o">)</span>
Entry point 0x10334
There are 6 program headers, starting at offset 52

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x00010034 0x00010034 0x000c0 0x000c0 R E 0x4
  INTERP         0x0000f4 0x000100f4 0x000100f4 0x00014 0x00014 R   0x1
      <span class="o">[</span>Requesting program interpreter: /lib/ld-uClibc.so.0]
...
</code></pre></div></div>

<p>Both libraries are under the toolchain ‘sysroot’ directory.</p>

<blockquote>
  <p>Should you decide to support dynamic linking, the dynamic linker and the C library should at some point end up on your target.</p>
</blockquote>

<p>Specifically for that purpose, QEMU supports specifying the path to dynamically linked libraries using the -L option or the QEMU_LD_PREFIX environment variable.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-arm <span class="nt">-L</span> ~/x-tools/arm-unknown-linux-uclibcgnueabi/arm-unknown-linux-uclibcgnueabi/sysroot/ sanity
Genuinely generated by the toolchain
</code></pre></div></div>

<p>or</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ QEMU_LD_PREFIX</span><span class="o">=</span>~/x-tools/arm-unknown-linux-uclibcgnueabi/arm-unknown-linux-uclibcgnueabi/sysroot/ qemu-arm sanity
Genuinely generated by the toolchain
</code></pre></div></div>

<p>If you want to avoid these linking issues, you can tell GCC to generate a static executable instead:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>arm-unknown-linux-uclibcgnueabi-gcc <span class="nt">-static</span> main.c <span class="nt">-o</span> sanity
<span class="nv">$ </span>qemu-arm sanity
Genuinely generated by the toolchain
</code></pre></div></div>

<h1 id="configure-and-build-the-linux-kernel">Configure and build the Linux Kernel</h1>

<p>At the time of writing, the latest stable Kernel version is 4.7.5.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">mkdir </span>linux
<span class="nv">$ </span>wget https://cdn.kernel.org/pub/linux/kernel/v4.x/linux-4.7.5.tar.xz <span class="nt">-O</span> linux/linux-4.7.5.tar.xz
<span class="nv">$ </span><span class="nb">tar </span>xf linux/linux-4.7.5.tar.xz <span class="nt">-C</span> linux
</code></pre></div></div>

<p>We select the mainstone configuration to build the Kernel:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make <span class="nt">-C</span> linux/linux-4.7.5 <span class="nv">ARCH</span><span class="o">=</span>arm mainstone_defconfig <span class="nv">O</span><span class="o">=</span>linux/build
</code></pre></div></div>

<blockquote>
  <p>You need to specify the architecture to tell the Kernel where it should look for existing configurations (here arch/arm/configs)</p>
</blockquote>

<p>The Linux Kernel is very versatile in the way it boots, and it can frankly be overwhelming if you consider all the options.</p>

<p>In this article, I will illustrate two boot modes: a stand-alone Kernel with a RAM initrd, and a Kernel that boots on a root filesystem on an SD card.</p>

<p>As per the Linux Kernel documentation:</p>

<blockquote>
  <p>initrd provides the capability to load a RAM disk by the boot loader.
This RAM disk can then be mounted as the root file system and programs
can be run from it. Afterwards, a new root file system can be mounted
from a different device. The previous root (from initrd) is then moved
to a directory and can be subsequently unmounted.</p>

  <p>initrd is mainly designed to allow system startup to occur in two phases,
where the kernel comes up with a minimum set of compiled-in drivers, and
where additional modules are loaded from initrd.</p>
</blockquote>

<p>initrd is primarily intended to be a bootstrap in RAM that allows the Kernel to get access to the ‘real’ rootfs, but we can also use it to simply boot the Kernel without providing a rootfs.</p>

<p>We will see how we can create an initrd in the subsequent paragraphs.</p>

<p>The mainstone default configuration is fairly minimal, and we will need to add a few options to support these two boot modes.</p>

<p>First, we need to add initrd support by activating the BLK_DEV_INITRD configuration option.</p>

<p>Second, we need to add SD card support for the mainstone board, which belongs to the PXA family. The driver is the MultiMedia Card driver for PXA, and it requires Direct Memory Access: we therefore need to select MMC, MMC_PXA, DMADEVICES and PXA_DMA.</p>

<p>We also need to activate the AEABI configuration to make sure the Kernel uses the latest ARM EABI convention. As per the Linux Kernel documentation:</p>

<blockquote>
  <p>This option allows for the kernel to be compiled using the latest ARM ABI (aka EABI).  This is only useful if you are using a user space environment that is also compiled with EABI.</p>
</blockquote>

<p>We need to add these options manually using the curses menuconfig interface:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make <span class="nt">-C</span> linux/build <span class="nv">ARCH</span><span class="o">=</span>arm menuconfig
</code></pre></div></div>

<blockquote>
  <ul>
    <li>General Setup-&gt;Initial RAM filesystem and RAM disk (initramfs/initrd) support</li>
    <li>Device Drivers-&gt;MMC/SD/SDIO card support-&gt;Intel PXA25x/.. Multimedia Card Interface support</li>
    <li>Device Drivers-&gt;DMA Engine support-&gt;PXA DMA support</li>
    <li>Kernel Features-&gt;Use the ARM EABI to compile the kernel</li>
  </ul>
</blockquote>
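<p>For reference, once these entries are selected, the generated .config should contain options along these lines (a sketch; verify the exact symbol names against your Kernel version):</p>

```
CONFIG_BLK_DEV_INITRD=y
CONFIG_MMC=y
CONFIG_MMC_PXA=y
CONFIG_DMADEVICES=y
CONFIG_PXA_DMA=y
CONFIG_AEABI=y
```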

<p>Once our Kernel has been properly configured, we can build it:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make <span class="nt">-C</span> linux/build <span class="nv">ARCH</span><span class="o">=</span>arm <span class="nv">CROSS_COMPILE</span><span class="o">=</span>arm-unknown-linux-uclibcgnueabi-
</code></pre></div></div>

<p>At the end of the build, our Kernel will be under linux/build/arch/arm/boot.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">ls </span>linux/build/arch/arm/boot/
compressed  Image  zImage
</code></pre></div></div>

<h1 id="sanity-check-launch-the-linux-kernel-with-qemu">Sanity check: launch the Linux Kernel with QEMU</h1>

<p>We verify that the Kernel has been properly generated by launching it with qemu-system-arm, the <a href="http://wiki.qemu.org/Main_Page">QEMU</a> system emulator (note the difference with qemu-arm, the CPU emulator).</p>

<p>We pass four parameters on the command-line:</p>

<ul>
  <li>kernel: path to our Kernel,</li>
  <li>machine: the machine we use (here ‘mainstone’),</li>
  <li>serial: set to ‘stdio’ to get the Kernel printk logs in the console,</li>
  <li>append: parameters to add to the Kernel command-line</li>
</ul>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-system-arm <span class="nt">-kernel</span> linux/zImage <span class="nt">-serial</span> stdio <span class="nt">-append</span> <span class="s1">'console=ttyS0'</span> <span class="nt">-M</span> mainstone
Two flash images must be given with the <span class="s1">'pflash'</span> parameter
</code></pre></div></div>

<p>The mainstone board has two 64 MB flash banks whose images must be provided on the qemu-system-arm command-line.</p>

<p>We create two empty images:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/zero <span class="nv">of</span><span class="o">=</span>mainstone-flash0.img <span class="nv">bs</span><span class="o">=</span>1024 <span class="nv">count</span><span class="o">=</span>65536
<span class="nv">$ </span><span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/zero <span class="nv">of</span><span class="o">=</span>mainstone-flash1.img <span class="nv">bs</span><span class="o">=</span>1024 <span class="nv">count</span><span class="o">=</span>65536
</code></pre></div></div>

<p>We can now launch the Kernel.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-system-arm <span class="nt">-kernel</span> linux/zImage <span class="nt">-append</span> <span class="s1">'console=ttyS0'</span> <span class="nt">-machine</span> mainstone <span class="nt">-serial</span> stdio <span class="nt">-pflash</span> mainstone-flash0.img <span class="nt">-pflash</span> mainstone-flash1.img
Booting Linux on physical CPU 0x0
Linux version 4.7.5 <span class="o">(</span>xxx@yyy<span class="o">)</span> <span class="o">(</span>gcc version 5.2.0 <span class="o">(</span>crosstool-NG crosstool-ng-1.22.0<span class="o">)</span> <span class="o">)</span> <span class="c">#1 Tue Sep 27 09:35:52 CEST 2016</span>
CPU: XScale-PXA270 <span class="o">[</span>69054117] revision 7 <span class="o">(</span>ARMv5TE<span class="o">)</span>, <span class="nv">cr</span><span class="o">=</span>00007977
...
XScale iWMMXt coprocessor detected.
VFS: Cannot open root device <span class="s2">"(null)"</span> or unknown-block<span class="o">(</span>0,0<span class="o">)</span>: error <span class="nt">-6</span>
Please append a correct <span class="s2">"root="</span> boot option<span class="p">;</span> here are the available partitions:
Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block<span class="o">(</span>0,0<span class="o">)</span>
...
</code></pre></div></div>

<p>It still fails because we provided neither a rootfs nor an initrd.</p>

<h1 id="create-a-tiny-init">Create a tiny init</h1>

<p>Let’s create a simplistic bootstrap:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>main.c:

#include &lt;stdio.h&gt;

int main(void)
{
	printf("Tiny init ...\n");
	while (1);
}
</code></pre></div></div>

<p>We compile it using the ARM toolchain, passing a few CFLAGS to specify the mainstone CPU instruction set:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>arm-unknown-linux-uclibcgnueabi-gcc <span class="nt">-static</span> <span class="nt">-march</span><span class="o">=</span>armv5te <span class="nt">-mtune</span><span class="o">=</span>xscale <span class="nt">-Wa</span>,-mcpu<span class="o">=</span>xscale main.c <span class="nt">-o</span> init
<span class="nv">$ </span><span class="nb">chmod</span> +x init
</code></pre></div></div>

<p>We will now use that bootstrap to boot the system after the Kernel has been loaded.</p>

<h1 id="ram-boot-using-initrd">RAM boot using initrd</h1>

<p>We create a CPIO RAM image that contains only the init program:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">echo </span>init | cpio <span class="nt">-o</span> <span class="nt">--format</span><span class="o">=</span>newc <span class="o">&gt;</span> initramfs
</code></pre></div></div>

<p>Now, if we launch the Kernel again, specifying our initramfs, we end up in the tiny init loop:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-system-arm <span class="nt">-kernel</span> linux/zImage <span class="nt">-append</span> <span class="s1">'console=ttyS0'</span> <span class="nt">-machine</span> mainstone <span class="nt">-serial</span> stdio <span class="nt">-pflash</span> mainstone-flash0.img <span class="nt">-pflash</span> mainstone-flash1.img <span class="nt">-initrd</span> initramfs
Booting Linux on physical CPU 0x0
Linux version 4.7.5 <span class="o">(</span>xxx@yyy<span class="o">)</span> <span class="o">(</span>gcc version 5.2.0 <span class="o">(</span>crosstool-NG crosstool-ng-1.22.0<span class="o">)</span> <span class="o">)</span> <span class="c">#1 Tue Sep 27 09:35:52 CEST 2016</span>
CPU: XScale-PXA270 <span class="o">[</span>69054117] revision 7 <span class="o">(</span>ARMv5TE<span class="o">)</span>, <span class="nv">cr</span><span class="o">=</span>00007977
...
XScale iWMMXt coprocessor detected.
Freeing unused kernel memory: 148K <span class="o">(</span>c03cf000 - c03f4000<span class="o">)</span>
This architecture does not have kernel memory protection.
Tiny init ...
</code></pre></div></div>

<h1 id="boot-on-a-sd-card-image">Boot on a SD card image</h1>

<p>We will now create an SD card image containing the tiny init code.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-img create init.img 128K
</code></pre></div></div>

<p>We format the SD card image with an ext2 file-system.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>mkfs.ext2 init.img
mke2fs 1.42.13 <span class="o">(</span>17-May-2015<span class="o">)</span>
Discarding device blocks: <span class="k">done
</span>Creating filesystem with 128 1k blocks and 16 inodes

Allocating group tables: <span class="k">done
</span>Writing inode tables: <span class="k">done
</span>Writing superblocks and filesystem accounting information: <span class="k">done</span>
</code></pre></div></div>

<p>Then, we can mount it and copy the init program into the image:</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span><span class="nb">mkdir </span>tmp
<span class="nv">$ </span><span class="nb">sudo </span>mount <span class="nt">-o</span> loop init.img tmp
<span class="nv">$ </span><span class="nb">sudo mkdir</span> <span class="nt">-p</span> tmp/sbin
<span class="nv">$ </span><span class="nb">sudo cp </span>init tmp/sbin/
<span class="nv">$ </span><span class="nb">sudo </span>umount tmp
<span class="nv">$ </span><span class="nb">rmdir </span>tmp
</code></pre></div></div>

<blockquote>
  <p>Note that the Kernel expects the init bootstrap to be under /sbin/init, and not at the root of the file system like in the initram file system.</p>
</blockquote>

<p>We can now launch the Kernel specifying that the rootfs is on /dev/mmcblk0, which is the pseudo-device for the SD card passed to <a href="http://wiki.qemu.org/Main_Page">QEMU</a> with the -sd option.</p>

<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>qemu-system-arm <span class="nt">-kernel</span> linux/zImage <span class="nt">-append</span> <span class="s1">'console=ttyS0 root=/dev/mmcblk0'</span> <span class="nt">-machine</span> mainstone <span class="nt">-serial</span> stdio <span class="nt">-pflash</span> mainstone-flash0.img <span class="nt">-pflash</span> mainstone-flash1.img <span class="nt">-sd</span> init.img
Booting Linux on physical CPU 0x0
Linux version 4.7.5 <span class="o">(</span>xxx@yyy<span class="o">)</span> <span class="o">(</span>gcc version 5.2.0 <span class="o">(</span>crosstool-NG crosstool-ng-1.22.0<span class="o">)</span> <span class="o">)</span> <span class="c">#1 Tue Sep 27 09:35:52 CEST 2016</span>
CPU: XScale-PXA270 <span class="o">[</span>69054117] revision 7 <span class="o">(</span>ARMv5TE<span class="o">)</span>, <span class="nv">cr</span><span class="o">=</span>00007977
...
XScale iWMMXt coprocessor detected.
mmc0: host does not support reading read-only switch, assuming write-enable
mmc0: new SD card at address 4567
mmcblk0: mmc0:4567 QEMU! 1.00 GiB
VFS: Mounted root <span class="o">(</span>ext2 filesystem<span class="o">)</span> <span class="nb">readonly </span>on device 179:0.
Freeing unused kernel memory: 152K <span class="o">(</span>c03ee000 - c0414000<span class="o">)</span>
This architecture does not have kernel memory protection.
Tiny init ...
</code></pre></div></div>

<p>Voilà!</p>

<p>In a follow-up article, I will demonstrate how to create a small rootfs using <a href="https://busybox.net/">BusyBox</a>.</p>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Benchmarking build systems for a large C project
          ]]>
      </title>
      <link>http://www.kaizou.org/2016/09/build-benchmark-large-c-project.html</link>
      <pubDate>Thu, 01 Sep 2016 16:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2016/09/build-benchmark-large-c-project</guid>
      <description>
          <![CDATA[
          <p>The performance of build systems has been discussed at large in the developer community, with a strong emphasis made on the limitations of the legacy Make tool when dealing with large/complex projects.</p>

<p>I recently had to develop a build-system to create firmwares for embedded targets from more than 1000 source files.</p>

<p>The requirements were to use build recipes that could be customized for each directory and file in the source tree, similar to what the Linux Kernel does with <a href="https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt">kbuild</a>.</p>

<p>I designed a custom recursive Make solution inspired by <a href="https://www.kernel.org/doc/Documentation/kbuild/makefiles.txt">kbuild</a>.</p>

<!--more-->

<blockquote>
  <p>Note: for those interested, this is the build system used in the <a href="https://github.com/CurieBSP/main">Intel Curie SDK for wearables</a>.</p>
</blockquote>

<p>After one major release, I had some time to muse over the abundant literature on build systems, and in particular the infamous <a href="http://aegis.sourceforge.net/auug97.pdf">“Recursive Make considered harmful”</a>, and started to wonder whether I had made the right design choice.</p>

<p>Obviously, my solution had the same limitation as all recursive Make solutions: it was unable to express explicit dependencies from one part of the tree to another. We had easily overcome that, however, by relying solely on headers to express dependencies, and taking advantage of the <a href="http://www.evanjones.ca/makefile-dependencies.html">GCC automatic dependencies</a>, pretty much like all C projects do anyway.</p>

<p>The solution was also relatively fast, which seemed to contradict the claims of <a href="http://stackoverflow.com/questions/559216/what-is-your-experience-with-non-recursive-make">many people</a>.</p>

<p>I therefore decided to do a little benchmark to sort it out.</p>

<blockquote>
  <p>You can check for yourself the several solutions in this <a href="https://github.com/kaizouman/build-benchmark">repo</a>.</p>
</blockquote>

<h1 id="the-benchmark">The benchmark</h1>

<p>The benchmark is to compile a hierarchical source tree whose directories each contain two source files (a header and an implementation), and one build fragment specifying a custom preprocessor definition. Each directory’s implementation ‘depends’ on its child directories’ sources by including their headers.</p>

<blockquote>
  <p>Yes, it is a wacky design, but I just wanted to challenge the build system.</p>
</blockquote>
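<p>To give a concrete idea of the input, here is a sketch of how such a tree could be generated (names, depth and fragment contents are illustrative, not the actual generator from the benchmark repo):</p>

```shell
# Generate a chain of nested directories, each with a header, an
# implementation, and a Makefile fragment defining a custom CFLAG.
path=tree
for n in d0 d1 d2; do
    path="$path/$n"
    mkdir -p "$path"
    printf 'void %s_hello(void);\n' "$n" > "$path/$n.h"
    printf '#include "%s.h"\nvoid %s_hello(void) {}\n' "$n" "$n" > "$path/$n.c"
    printf 'obj-y = %s.c\ncflags-y = -DFRAG_%s\n' "$n" "$n" > "$path/Makefile"
done
```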

<p>The benchmark script tests several build-system invocations in four configurations:</p>

<ul>
  <li>cold start (full build from a fresh tree),</li>
  <li>full rebuild (touch all sources and rebuild),</li>
  <li>build leaf (only touch one of the leaf headers),</li>
  <li>nothing to do.</li>
</ul>

<h1 id="the-solutions">The solutions</h1>

<h2 id="kbuild">Kbuild</h2>

<p>The first solution is a variant of my kbuild clone. The design is dead simple:</p>

<ul>
  <li>each directory has a Makefile fragment that produces a C static library,</li>
  <li>a directory archive aggregates the object files in this directory and the static libraries of its subdirectories,</li>
  <li>a generic Makefile is launched recursively on the source tree to generate libraries and aggregate them to the top.</li>
</ul>

<p>The syntax of the Makefile fragments is the same as the one used by the Linux Kernel:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>obj-y = foo.c bar/
cflags-y = -Isomepath -DFOO
</code></pre></div></div>

<p>The generic Makefile is a bit cryptic for those not familiar with the Make syntax, but it is actually not very complicated.</p>

<p>This Makefile starts by including the Makefile fragment, then does some processing on the local obj-y variable, to identify local objects and subdirectories.</p>

<p>It then defines rules to:</p>

<ul>
  <li>build subdirectory archives by relaunching itself on each subdirectory,</li>
  <li>build local objects, taking into account local CFLAGS,</li>
  <li>create the directory library as a ‘thin’ archive, i.e. a list of references to the actual object files.</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>THIS_FILE := $(abspath $(lastword $(MAKEFILE_LIST)))

all:

# Those are supposed to be passed on the command line
OUT ?= build
SRC ?= src

# Look for a Makefile in the current source directory
-include $(SRC)/Makefile

# First, identify if there are any directories specified in obj-y that we need
# to descend into
subdir-y := $(sort $(patsubst %/,%,$(filter %/, $(obj-y))))

# Next, update the list of objects, replacing any specified directory by the
# aggregated object that will be produced when descending into it
obj-y := $(patsubst %/, %/built-in.a, $(obj-y))

# Prepend the subdirectories with the actual source directory
subdir-y := $(addprefix $(SRC)/,$(subdir-y))

# Prepend the objects with the actual build DIR
obj-y := $(addprefix $(OUT)/$(SRC)/,$(obj-y))

# Fake target used to force subdirectories to be visited on every Make call
.FORCE:
# Go into each subdirectory to build aggregated objects
$(OUT)/$(SRC)/%/built-in.a: .FORCE
	$(MAKE) -f $(THIS_FILE) SRC=$(SRC)/$* OUT=$(OUT)

# Include dependency files that may have been produced by a previous build
-include $(OUT)/$(SRC)/*.d

# Evaluate local CFLAGS
LOCAL_CFLAGS := -MD $(CFLAGS) $(cflags-y)

# Build C files
$(OUT)/$(SRC)/%.o: $(SRC)/%.c
	mkdir -p $(OUT)/$(SRC)
	$(CC) $(LOCAL_CFLAGS) -c -o $@ $&lt;

# Create an aggregated object for this directory
$(OUT)/$(SRC)/built-in.a: $(obj-y)
	mkdir -p $(OUT)/$(SRC)
	$(AR) -rcT $@ $^

all: $(OUT)/$(SRC)/built-in.a
</code></pre></div></div>

<p>Note that since we cannot ‘guess’ if a nested library needs to be rebuilt, we force going into each subdirectory using a fake target. This is the main drawback of this solution, as every single directory of the source tree will be parsed even if no file has changed in the source tree.</p>

<p>The top-level Makefile has only two targets:</p>

<ul>
  <li>one to create the target executable based on the top aggregated library,</li>
  <li>one to create the library by launching the generic Makefile at the top of the source tree.</li>
</ul>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>all: $(OUT)/foo

$(OUT)/foo: $(OUT)/built-in.a
        $(CC) -o $@ $^

$(OUT)/built-in.a: .FORCE
        mkdir -p $(OUT)
        $(MAKE) -C $(SRC) -f $(CURDIR)/Makefile.kbuild \
                SRC=. \
                OUT=$(OUT)

.FORCE:
</code></pre></div></div>

<h2 id="non-recursive-makefile">Non recursive Makefile</h2>

<p>The second solution is inspired by the principles of Peter Miller’s paper, <em>Recursive Make Considered Harmful</em>.</p>

<p>It uses the same Makefile fragments, but instead of recursively launching Make on subdirectories, it recursively includes the fragments.</p>

<p>The whole process is implemented using a recursive <a href="https://www.gnu.org/software/make/manual/html_node/Eval-Function.html">GNU Make template</a>.</p>

<p>For performance reasons, we use a single parameterized generic rule to build the objects in the source tree.</p>

<p>During the evaluation of each subdirectory, we gather object files in a global variable, and customize the generic build rule by defining the value
of the CFLAGS for each object in the subdirectory.</p>

<blockquote>
  <p>I first designed a variant that created a rule for each subdirectory, but its performance degraded exponentially with the number of directories.</p>
</blockquote>

<p>At the end of the Makefile, we use a single <code class="language-plaintext highlighter-rouge">foreach</code> instruction to include dependency files based on the list of objects.</p>

<blockquote>
  <p>I also tried to include these during the subdirectories evaluation, but it performed worse</p>
</blockquote>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># These are actually passed to us, but provide default values for easier reuse
OUT ?= build
SRC ?= src

# We parse each subdirectory to gather object files
OBJS :=

# Sub-directory parsing function
define parse_subdir

# Reset sub-Makefile variables as a precaution
obj-y :=
cflags-y :=

# Include sub-Makefile
include $(1)/Makefile

# Isolate objects from subdirectories and prefix them with the output directory
_objs := $$(addprefix $(OUT)/$(1)/,$$(sort $$(filter-out %/, $$(obj-y))))

# Define a specific CFLAGS for objects in this subdir
$$(_objs): SUBDIR_CFLAGS := -MD $$(CFLAGS) $$(cflags-y)

# Add subdir objects to global list
OBJS += $$(_objs)

# Isolate subdirs from objects and prefix them with source directory
_subdirs := $$(addprefix $(1)/,$$(sort $$(patsubst %/,%,$$(filter %/, $$(obj-y)))))

# Recursively parse subdirs
$$(foreach subdir,$$(_subdirs), $$(eval $$(call parse_subdir,$$(subdir))))

endef

# Start parsing subdirectories at the root of the source tree
$(eval $(call parse_subdir,$(SRC)))

# Generic rule to compile C files
$(OUT)/%.o: %.c
        mkdir -p $(dir $@)
        $(CC) $(SUBDIR_CFLAGS) -c -o $@ $&lt;

# Include GCC dependency files for each source file
$(foreach obj,$(OBJS),$(eval -include $(obj:%.o=%.d)))

</code></pre></div></div>

<p>The top-level Makefile just includes the “subdirectories” Makefile.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>all: $(OUT)/foo

include $(CURDIR)/Makefile.subdir

$(OUT)/foo: $(OBJS)
        $(CC) -o $@ $^
</code></pre></div></div>

<blockquote>
  <p>It could be a single Makefile, but I found it neater to keep the “generic” template in a separate file.</p>
</blockquote>

<h2 id="custom-generated-makefile">Custom generated Makefile</h2>

<p>As a variant to the previous solution, I tried to parse the Makefile fragments only once to generate a Makefile in the output directory, which is then used to build the target.</p>

<p>This uses basically the same template as the previous solution: the only difference is that the list of objects and custom CFLAGS per directory are evaluated in memory AND written to the actual Makefile.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># These are actually passed to us, but provide default values for easier reuse
OUT ?= build
SRC ?= src

# The only goal of this Makefile is to generate the actual Makefile
all: $(OUT)/Makefile

...


$(OUT)/Makefile::
        @echo "Generating $@"
        @echo "all: $(OUT)/foo" &gt;&gt; $@

# Sub-directory parsing function
define parse_subdir

...

# Include sub-Makefile
include $(1)/Makefile

...

# Define a specific CFLAGS for objects in this subdir
$(1)_CFLAGS := -MD $$(CFLAGS) $$(cflags-y)
# Insert the corresponding goal modifier in the target Makefile
$(OUT)/Makefile::
        echo "$(OUT)/$(1)/%.o: LOCAL_CFLAGS=$$($(1)_CFLAGS)" &gt;&gt; $$@

...

endef

# Start parsing subdirectories at the root of the source tree
$(eval $(call parse_subdir,$(SRC)))

# Finalize target Makefile inserting generic C compilation rule and GCC
# dependencies for each source file
$(OUT)/Makefile::
        echo "OBJS:= $(OBJS)" &gt;&gt; $@
        echo "$(OUT)/%.o: %.c" &gt;&gt; $@
        echo '  mkdir -p $$(dir $$@)' &gt;&gt; $@
        echo '  $$(CC) $$(LOCAL_CFLAGS) -c -o $$@ $$&lt;' &gt;&gt; $@
        echo "" &gt;&gt; $@
        $(foreach obj,$(OBJS),echo "-include $(obj:%.o=%.d)" &gt;&gt; $@;)
        @echo "$(OUT)/foo: $(OBJS)" &gt;&gt; $@
        @echo ' $$(CC) -o $$@ $$^' &gt;&gt; $@
        @echo "Done $@"
</code></pre></div></div>

<p>The top-level Makefile includes the generated Makefile and provides a rule to generate it: GNU Make takes care of the rest.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>all: $(OUT)/foo

$(OUT)/foo: $(OUT)/Makefile .FORCE
        $(MAKE) -C $(OUT)

FRAGMENTS := \
        $(shell find $(SRC) -name Makefile -cnewer $(OUT)/Makefile 2&gt;/dev/null)

$(OUT)/Makefile: $(FRAGMENTS)
        mkdir -p $(OUT)
        $(MAKE) -C $(SRC) -f $(CURDIR)/Makefile.gen \
                SRC=$(SRC) \
                OUT=$(OUT)

.FORCE:
</code></pre></div></div>

<blockquote>
  <p>Note the trick used to make sure the Makefile is properly regenerated: since Make has difficulty coping with a large number of dependencies, we use the shell to identify the fragments that have changed.</p>
</blockquote>

<h2 id="cmake">CMake</h2>

<p>CMake is a Makefile generator. I added a CMake solution to compare it with the previous custom generated Makefile.</p>

<p>Two issues I had to solve were:</p>

<ul>
  <li>how to recursively select sources for the final target</li>
  <li>how to express different CFLAGS for a directory</li>
</ul>

<p>The simple solution I found was to use CMake subdirectories and to define a static library in each one of them.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_LIBRARY(output_src_1 STATIC foo.c)
ADD_SUBDIRECTORY(1)
TARGET_LINK_LIBRARIES(output_src_1 output_src_1_1)
...
TARGET_LINK_LIBRARIES(output_src_1 output_src_1_9)
ADD_SUBDIRECTORY(10)
TARGET_LINK_LIBRARIES(output_src_1 output_src_1_10)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -D'CURDIR=output/src/1'")
</code></pre></div></div>

<blockquote>
  <p>It seems to lead CMake to create a recursive Makefile. It would be interesting to try a different approach, using <code class="language-plaintext highlighter-rouge">include</code> to gather fragments and per-source properties to set the CFLAGS.</p>
</blockquote>

<p>The top-level Makefile has two rules: one to build the generated Makefile, the other one to create the target using it.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$(OUT)/foo: $(OUT)/Makefile .FORCE
        $(MAKE) -C $(OUT)

-include $(OUT)/Makefile

$(OUT)/Makefile:
        mkdir -p $(OUT)
        cd $(OUT) &amp;&amp; cmake -Wno-dev $(SRC)

.FORCE:
</code></pre></div></div>

<blockquote>
  <p>Note that the generated Makefile will automatically detect changes made to the Makefile fragments and regenerate the target Makefile, thanks to CMake’s built-in checks.</p>
</blockquote>

<h2 id="boilermake">Boilermake</h2>

<p><a href="https://github.com/dmoulding/boilermake">Boilermake</a> is an awesome generic non-recursive Make template. I included it in order to compare it to my own non-recursive solution.</p>

<h2 id="cninja-cmake--ninja">Cninja (CMake + Ninja)</h2>

<p>CMake is able to generate <a href="https://ninja-build.org/">Ninja</a> files, so I only had to adapt my CMake-based solution to compare the generated GNU Make build with the generated Ninja build.</p>

<p>One issue I had with Ninja is that it doesn’t cope well with large command lines.</p>

<p>There is an ugly fix that was introduced to address that for the WebKit project.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set(CMAKE_NINJA_FORCE_RESPONSE_FILE 1)
</code></pre></div></div>

<blockquote>
  <p>Guys, gotcha: when are you gonna fix this?</p>
</blockquote>

<h2 id="ninja">Ninja</h2>

<p>The CMake generated Ninja build performance was awesome for incremental builds, but not so good for full builds as soon as the number of files increased.</p>

<p>I suspected it might come from the way CMake generated the Ninja file, especially with the intermediate libraries I had to declare.</p>

<p>Having received similar feedback from the Ninja mailing-list, I wrote a small parser in Python to generate the <code class="language-plaintext highlighter-rouge">build.ninja</code> file directly from the Makefile fragments.</p>

<p>Here is how the generated file looks:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>rule cc
  deps = gcc
  depfile = $out.d
  command = cc -MD -MF $out.d $cflags -c $in -o $out
rule ld
  command = cc @$out.rsp -o $out
  rspfile = $out.rsp
  rspfile_content = $in
build /home/david/dev/make-benchmark/output/src/main.o: cc /home/david/dev/make-benchmark/output/src/main.c
  cflags = -D'CURDIR=output/src'
build /home/david/dev/make-benchmark/output/src/foo.o: cc /home/david/dev/make-benchmark/output/src/foo.c
  cflags = -D'CURDIR=output/src'
build /home/david/dev/make-benchmark/output/src/1/foo.o: cc /home/david/dev/make-benchmark/output/src/1/foo.c
  cflags = -D'CURDIR=output/src/1'

...

build foo : ld /home/david/dev/make-benchmark/output/src/main.o ...

</code></pre></div></div>

<p>The results are indeed much better, as you will see in the next section.</p>

<h1 id="the-raw-results">The raw results</h1>

<p>I ran the benchmark on an Intel Core i7 with 16 GB RAM and an SSD drive.</p>

<p>All build times are in seconds.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$make --version
GNU Make 3.81
$cmake --version
cmake version 2.8.12.2
$ninja --version
1.3.4
</code></pre></div></div>

<p>Tree = 2 levels, 10 subdirectories per level (12 .c files)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|               | kbuild | nrecur | static | cmake | b/make | cninja | ninja |
|---------------|--------|--------|--------|-------|--------|--------|-------|
| cold start    |  0.08  |  0.06  |  0.08  | 0.55  |  0.08  |  0.36  | 0.08  |
| full rebuild  |  0.06  |  0.06  |  0.06  | 0.23  |  0.07  |  0.04  | 0.06  |
| rebuild leaf  |  0.04  |  0.03  |  0.03  | 0.16  |  0.04  |  0.05  | 0.02  |
| nothing to do |  0.01  |  0.00  |  0.00  | 0.06  |  0.01  |  0.00  | 0.00  |
</code></pre></div></div>

<p>Tree = 3 levels, 10 subdirectories per level (112 .c files)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|               | kbuild | nrecur | static | cmake | b/make | cninja | ninja |
|---------------|--------|--------|--------|-------|--------|--------|-------|
| cold start    |  0.47  |  0.45  |  0.51  | 1.84  |  0.52  |  0.91  | 0.53  |
| full rebuild  |  0.48  |  0.46  |  0.44  | 1.34  |  0.54  |  0.39  | 0.32  |
| rebuild leaf  |  0.10  |  0.09  |  0.09  | 0.46  |  0.11  |  0.07  | 0.05  |
| nothing to do |  0.06  |  0.05  |  0.06  | 0.40  |  0.07  |  0.00  | 0.01  |
</code></pre></div></div>

<p>Tree = 4 levels, 10 subdirectories per level (1112 .c files)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|               | kbuild | nrecur | static | cmake | b/make | cninja | ninja |
|---------------|--------|--------|--------|-------|--------|--------|-------|
| cold start    |  4.62  |  4.57  |  5.78  | 16.72 |  5.48  |  7.50  |  4.00 |
| full rebuild  |  4.85  |  4.57  |  4.78  | 15.12 |  5.56  |  6.39  |  3.90 |
| rebuild leaf  |  0.98  |  0.86  |  1.04  |  4.47 |  1.07  |  0.28  |  0.21 |
| nothing to do |  0.53  |  0.67  |  0.82  |  4.44 |  0.88  |  0.05  |  0.03 |
</code></pre></div></div>

<p>Tree = 5 levels, 10 subdirectories per level (11112 .c files)</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>|               | kbuild | nrecur | static | cmake  | b/make | cninja | ninja |
|---------------|--------|--------|--------|--------|--------|--------|-------|
| cold start    |  59.01 |  54.07 | 118.00 | 509.96 |  72.41 | 175.58 | 46.98 |
| full rebuild  |  63.41 |  61.38 | 103.95 | 376.40 |  80.17 | 101.76 | 46.66 |
| rebuild leaf  |  10.86 |  17.18 |  59.03 | 215.44 |  20.19 |   2.81 |  2.28 |
| nothing to do |   5.13 |  14.95 |  56.87 | 220.49 |  17.78 |   0.47 |  0.03 |
</code></pre></div></div>

<h1 id="my-two-cents">My two cents</h1>

<p>From the results above, I conclude that:</p>

<ul>
  <li>for my use case, and with my hardware (I suspect SSD is a huge bonus for recursive Make), non-recursive and recursive Makefiles are equivalent,</li>
  <li>my generated Makefile is completely suboptimal (would need to investigate),</li>
  <li>CMake generated Makefiles are pretty darn slow …</li>
  <li>As long as you don’t generate the <code class="language-plaintext highlighter-rouge">build.ninja</code> with CMake, Ninja is faster than any Make based solution, especially when only a few files have changed.</li>
</ul>


          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Decentralized modules declarations in C using ELF sections
          ]]>
      </title>
      <link>http://www.kaizou.org/2016/08/decentralized-modules-c-elf-sections.html</link>
      <pubDate>Wed, 17 Aug 2016 16:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2016/08/decentralized-modules-c-elf-sections</guid>
      <description>
          <![CDATA[
          <p>In modular programming, a standard practice is to define common interfaces allowing the same type of operation to be performed on a set of otherwise independent modules.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
modules = [a,b,...]

for each m in modules:
    m.foo
    m.bar

</code></pre></div></div>

<p>To implement this pattern, two mechanisms are required:</p>

<ul>
  <li>instantiation, to allow each module to define an ‘instance’ of the common interface,</li>
  <li>registration, to allow each module to ‘provide’ this instance to other modules.</li>
</ul>

<p>Instantiation is typically supported natively in high-level languages.</p>

<p>Registration is more difficult and usually requires specific code to be written, or relying on external frameworks.</p>

<p>Let’s see how these two mechanisms can be implemented for C programs.</p>

<!--more-->

<blockquote>
  <p>Note: the code snippets in this post can be browsed on the following github <a href="https://github.com/kaizouman/c_modules_section_sample">repo</a></p>
</blockquote>

<h2 id="interface-instantiation">Interface instantiation</h2>

<p>In C programs, interface instantiation is implemented using function pointers: basically, the common interface is specified using a struct whose members are the functions that need to be implemented.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
module.h:

struct module {
	void (*foo)(void);
	void (*bar)(void);
};

a.c:

#include "module.h"

static void foo_a(void)
{

}

static void bar_a(void)
{

}

struct module module_a = {
	.foo = foo_a,
	.bar = bar_a
};

b.c:

#include "module.h"

static void foo_b(void)
{

}

static void bar_b(void)
{

}

struct module module_b = {
	.foo = foo_b,
	.bar = bar_b
};
</code></pre></div></div>

<h2 id="interface-registration">Interface registration</h2>

<p>The goal here is to allow client code to be able to ‘find’ the interface instances provided by the modules.</p>

<p>The first question we need to address is whether we register interfaces statically at design time or dynamically at runtime.</p>

<p>Some systems like Linux provide mechanisms for special ‘constructor’ functions to be called at program initialization. We could take advantage of that feature to allow each module to register its interfaces: see a full example <a href="https://github.com/idjelic/lttng2lxt">here</a>.</p>

<p>In this article, I assume that we are on a system without such a capability, and that, as a consequence, we can only rely on static registration.</p>

<blockquote>
  <p>Note that static registration is also more efficient, and always desirable on devices with limited hardware resources.</p>
</blockquote>

<p>A first solution for static registration of modules is to give the client code a direct access to the interface instances, by exposing them in public headers.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
module.h:

struct module {
	void (*foo)();
	void (*bar)();
};

extern struct module module_a;
extern struct module module_b;

foo.c

#include "module.h"

void foo()
{
    module_a.foo();
    module_b.foo();
}

bar.c

#include "module.h"

void bar()
{
    module_a.bar();
    module_b.bar();
}

</code></pre></div></div>

<p>This works, but it is not quite satisfactory: as more modules are added to the program, the client code needs to be modified.</p>

<p>A better solution would be to store the instances anonymously in a static array:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
module.h:

struct module {
	void (*foo)(void);
	void (*bar)(void);
};

extern struct module *modules[];

extern int modules_size;

module.c:

#include "module.h"

extern struct module module_a;
extern struct module module_b;

struct module *modules[2] = {
	&amp;module_a,
	&amp;module_b
};

int modules_size = 2;

foo.c

#include "module.h"

void foo()
{
	int i;
	for (i = 0; i &lt; modules_size; i++) {
		modules[i]-&gt;foo();
	}
}

bar.c

#include "module.h"

void bar()
{
	int i;
	for (i = 0; i &lt; modules_size; i++) {
		modules[i]-&gt;bar();
	}
}

</code></pre></div></div>

<p>This is quite neat, as we will only need to modify the <code class="language-plaintext highlighter-rouge">module.c</code> file when a new module is added.</p>

<p>This could be even better though: what if we could add modules without editing any other files?</p>

<h2 id="taking-advantage-of-elf-sections-to-create-decentralized-module-tables">Taking advantage of ELF sections to create decentralized module tables</h2>

<p>The only reason why we need to edit the <code class="language-plaintext highlighter-rouge">module.c</code> file is because we need to add new entries to the global modules static array.</p>

<p>The array in itself is just a bunch of pointers written one after the other in a contiguous memory space: what if we could find a way to populate it directly from the modules themselves?</p>

<p>This cannot be achieved by either the preprocessor or the compiler, as they process compilation units atomically (when a file is processed, the compiler has no knowledge of the other files it has compiled or will compile in the future).</p>

<p>The linker, however, has knowledge of all symbols declared in the program, and is even capable of grouping them according to section definitions, as long as we specify them in a custom linker script.</p>

<p>We can take advantage of that to make sure that all references to the interface instances are stored in the same section, and define the modules array as being the start address of the section.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
module.lds:

SECTIONS
{
	.modules : {
		modules_start = .;
		*(.modules)
		modules_end = .;
	}
}
INSERT AFTER .rodata;

module.h:

struct module {
	void (*foo)();
	void (*bar)();
};

extern const struct module modules_start[];
extern const struct module modules_end[];

Makefile:

OBJS := main.o a.o b.o foo.o bar.o

modules: $(OBJS)
	gcc -o $@ -T module.lds $(OBJS)

</code></pre></div></div>

<p>What we do here is extend the generic linker script with a <code class="language-plaintext highlighter-rouge">.modules</code> section.
We also insert two labels at the beginning and end of the section that can be accessed from the C code.</p>

<p>In the <code class="language-plaintext highlighter-rouge">module.h</code> file, we use these labels to declare external references to the start and end of the section.</p>

<blockquote>
  <p>Note that the external references have to be declared as arrays, and not pointers, to make sure the compiler correctly maps the address to the memory region containing the modules: had we declared them as pointers, the compiler would have mapped the beginning of the memory region to a pointer, then dereferenced it to get access to the modules.
You can refer to <a href="http://eli.thegreenplace.net/2009/10/21/are-pointers-and-arrays-equivalent-in-c">this post</a> for a really good explanation of the differences between arrays and pointers.</p>
</blockquote>

<p>The modules have to be slightly modified, to make sure they assign their interfaces to the new section:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
a.c:

...
struct module __attribute__ ((section (".modules"))) module_a = {
	.foo = foo_a,
	.bar = bar_a
};

b.c:

...
struct module __attribute__ ((section (".modules"))) module_b = {
	.foo = foo_b,
	.bar = bar_b
};

</code></pre></div></div>

<p>The syntax is quite ugly, so you would probably hide it inside a preprocessor macro in the <code class="language-plaintext highlighter-rouge">module.h</code> file.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#define DECLARE_MODULE(name, ...) \
    struct module __attribute__ ((section (".modules"))) name = { __VA_ARGS__ };
</code></pre></div></div>

<p>Now we just have to access the global array from the client code using the variables defining its boundaries:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
foo.c:

#include "module.h"

void foo()
{
	const struct module *m = modules_start;
	while (m &lt; modules_end) {
		m-&gt;foo();
		m++;
	}
}

bar.c:

#include "module.h"

void bar()
{
	const struct module *m = modules_start;
	while (m &lt; modules_end) {
		m-&gt;bar();
		m++;
	}
}

</code></pre></div></div>

<p>What we have now is a modules framework that can be extended without modifying its core. The module registration being static, it is very efficient both in terms of RAM and CPU consumption.</p>

<h2 id="pitfalls-with-interface-sections">Pitfalls with interface sections</h2>

<p>There are a few things that you need to be aware of when using this framework.</p>

<p>First, you need to make sure that the linker aligns the modules the same way the compiler would: otherwise, when going through the table, the iteration may drift and access the wrong data.</p>

<p>This is usually taken care of by enforcing alignment in the linker script:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
module.lds:

SECTIONS
{
	.modules ALIGN(8) : {
		modules_start = .;
		*(.modules)
		modules_end = .;
	}
}
INSERT AFTER .rodata;
</code></pre></div></div>

<p>Second, depending on your link configuration, your modules section may be optimized out, as the linker has no way of knowing that it is actually used.</p>

<p>In particular, the <code class="language-plaintext highlighter-rouge">--gc-sections</code> option will definitely make your table disappear.</p>

<p>The workaround is to explicitly tell the linker that it should keep these symbols:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
module.lds:

SECTIONS
{
	.modules : {
		modules_start = .;
		KEEP(*(.modules))
		modules_end = .;
	}
}
INSERT AFTER .rodata;
</code></pre></div></div>

<p>Last, if some of your modules are distributed as static libraries, the linker may also optimize out the corresponding symbols when linking the whole binary.</p>

<p>The workaround in that case is to prevent optimization by using the linker <code class="language-plaintext highlighter-rouge">--whole-archive</code> option.</p>


          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Better understanding Linux secondary dependencies solving with examples
          ]]>
      </title>
      <link>http://www.kaizou.org/2015/01/linux-libraries.html</link>
      <pubDate>Thu, 08 Jan 2015 14:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2015/01/linux-libraries</guid>
      <description>
          <![CDATA[
<p>A few months ago I stumbled upon a linking problem with secondary dependencies that I couldn’t solve without <a href="https://wiki.mageia.org/en/Overlinking_issues_in_packaging"><strong>overlinking</strong></a> the corresponding libraries.</p>

<p>I only realized today in a discussion with my friend <a href="http://ymorin.is-a-geek.org/">Yann E. Morin</a> that not only did I use the wrong solution for that particular problem, but that my understanding of the gcc linking process was not as good as I had imagined.</p>

<p>This blog post is to summarize what I have now understood.</p>

<p>There is also a <a href="https://github.com/kaizouman/linux-shlib-link-samples">small repository on github</a> with the mentioned samples.</p>

<!--more-->

<h1 id="a-few-words-about-linux-libraries">A few words about Linux libraries</h1>

<p>This paragraph is only a brief summary of what is very well described in <a href="http://tldp.org/HOWTO/Program-Library-HOWTO/introduction.html">The Linux Documentation Project library howto</a>.</p>

<p>Man pages for the Linux <a href="http://linux.die.net/man/1/ld">linker</a> and <a href="http://linux.die.net/man/8/ld-linux">loader</a> are also a good source of information.</p>

<p>There are three kinds of libraries in Linux: static, shared and dynamically loaded (DL).</p>

<p>Dynamically loaded libraries are very specific to some use cases like plugins, and would deserve an article on their own. I will only focus here on static and shared libraries.</p>

<h2 id="static-libraries">Static libraries</h2>

<p>A static library is simply an archive of object files conventionally starting with the <code class="language-plaintext highlighter-rouge">lib</code> prefix and ending with the <code class="language-plaintext highlighter-rouge">.a</code> suffix.</p>

<p><em>Example:</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>libfoobar.a
</code></pre></div></div>

<p>Static libraries are created using the <strong>ar</strong> program:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ ar rcs libfoobar.a foo.o bar.o
</code></pre></div></div>

<p>Linking a program with a static library is as simple as adding it to the link command either directly with its full path:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c /path/to/foobar/libfoobar.a
</code></pre></div></div>

<p>or indirectly using <a href="http://linux.die.net/man/1/ld">the <code class="language-plaintext highlighter-rouge">-l</code>/<code class="language-plaintext highlighter-rouge">-L</code> options</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -lfoobar -L/path/to/foobar
</code></pre></div></div>

<h2 id="shared-libraries">Shared libraries</h2>

<p>A shared library is an <strong>ELF</strong> object loaded by programs when they start.</p>

<p>Shared libraries follow the same naming conventions as static libraries, but with the <code class="language-plaintext highlighter-rouge">.so</code> suffix instead of <code class="language-plaintext highlighter-rouge">.a</code>.</p>

<p><em>Example:</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>libfoobar.so
</code></pre></div></div>

<p>Shared library objects need to be compiled with the <code class="language-plaintext highlighter-rouge">-fPIC</code> option that produces position-independent code, i.e. code that can be relocated in memory.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -fPIC -c foo.c
$ gcc -fPIC -c bar.c
</code></pre></div></div>

<p>The <strong>gcc</strong> command to create a shared library is similar to the one used to create a program, with the addition of the <code class="language-plaintext highlighter-rouge">-shared</code> option.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -shared -o libfoobar.so foo.o bar.o
</code></pre></div></div>

<p>Linking against a shared library is achieved using the exact same commands as linking against a static library:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c libfoobar.so
</code></pre></div></div>

<p>or</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -lfoobar -L/path/to/foobar
</code></pre></div></div>

<h2 id="shared-libraries-and-undefined-symbols">Shared libraries and undefined symbols</h2>

<p>An <strong>ELF</strong> object maintains a table of all the symbols it uses, including symbols belonging to another <strong>ELF</strong> object that are marked as undefined.</p>

<p>At compilation time, the linker will try to <strong>resolve</strong> an undefined symbol by linking it either statically to code included in the overall output <strong>ELF</strong> object or dynamically to code provided by a shared library.</p>

<p>If an undefined symbol is found in a shared library, a <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entry is created for that library in the output <strong>ELF</strong> target.</p>

<p>The content of the <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> field depends on the link command:</p>

<ul>
  <li>the full path to the library if the library was linked with an absolute path,</li>
  <li>the library name otherwise (or the library <a href="#library-versioning-and-compatibility"><strong>soname</strong></a> if it was defined).</li>
</ul>

<p>You can check the dependencies of an <strong>ELF</strong> object using the <strong>readelf</strong> command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d main
</code></pre></div></div>

<p>or</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d libbar.so
</code></pre></div></div>

<p>When producing an executable, any symbol that remains undefined after the link raises an error: all dependencies must therefore be available to the linker in order to produce the output binary.</p>

<p>For historical reasons, this behavior is disabled when building a shared library: you need to specify the <code class="language-plaintext highlighter-rouge">--no-undefined</code> (or <code class="language-plaintext highlighter-rouge">-z defs</code>) flag explicitly if you want errors to be raised when an undefined symbol is not resolved.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Wl,--no-undefined -shared -o libbar.so -fPIC bar.c
</code></pre></div></div>

<p>or</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -Wl,-zdefs -shared -o libbar.so -fPIC bar.c
</code></pre></div></div>

<blockquote>
  <p>Note that when producing a static library, which is just an archive of object files, no actual ‘linking’ operation is performed, and undefined symbols are kept unchanged.</p>
</blockquote>

<h2 id="library-versioning-and-compatibility">Library versioning and compatibility</h2>

<p>Several versions of the same library can coexist in the system.</p>

<p>By convention, two versions of the same library will use the same library name with a different version suffix composed of three numbers:</p>

<ul>
  <li>major revision,</li>
  <li>minor revision,</li>
  <li>build revision.</li>
</ul>

<p><em>Example:</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>libfoobar.so.1.2.3
</code></pre></div></div>

<p>This is often referred to as the library <strong>real name</strong>.</p>

<p>Also by convention, the library major version should be modified every time the library binary interface (<a href="http://en.wikipedia.org/wiki/Application_binary_interface">ABI</a>) is modified.</p>

<p>Following that convention, an executable compiled with a shared library version is theoretically able to link with another version of the <strong>same major revision</strong>.</p>

<p>This concept is so fundamental for expressing compatibility between programs and shared libraries that each shared library can be associated with a <strong>soname</strong>, which is the library name followed by a period and the major revision:</p>

<p><em>Example:</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>libfoobar.so.1
</code></pre></div></div>

<p>The library <strong>soname</strong> is stored in the <code class="language-plaintext highlighter-rouge">DT_SONAME</code> field of the <strong>ELF</strong> shared object.</p>

<p>The <strong>soname</strong> has to be passed as a linker option to <strong>gcc</strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -shared -Wl,-soname,libfoobar.so.1 -o libfoobar.so foo.o bar.o
</code></pre></div></div>

<p>As mentioned before, whenever a library defines a <strong>soname</strong>, it is that <strong>soname</strong> that is stored in the <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> field of <strong>ELF</strong> objects linked against that library.</p>

<h2 id="solving-versioned-libraries-dependencies-at-build-time">Solving versioned libraries dependencies at build time</h2>

<p>As mentioned before, libraries to be linked against can be specified using a shortened name and a path:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -lfoobar -L/path/to/foobar
</code></pre></div></div>

<p>When installing a library, the installer program will typically create a symbolic link from the library <strong>real name</strong> to its <strong>linker name</strong> (the bare <code class="language-plaintext highlighter-rouge">.so</code> name used with the <code class="language-plaintext highlighter-rouge">-l</code> option) to allow the linker to find the actual library file.</p>

<p><em>Example:</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/lib/libfoobar.so -&gt; libfoobar.so.1.5.3
</code></pre></div></div>

<p>The linker uses the following search paths to locate required shared libraries:</p>

<ul>
  <li>directories specified by <code class="language-plaintext highlighter-rouge">-rpath-link</code> options (more on that later)</li>
  <li>directories specified by <code class="language-plaintext highlighter-rouge">-rpath</code> options (more on that later)</li>
  <li>directories specified by the environment variable <code class="language-plaintext highlighter-rouge">LD_RUN_PATH</code></li>
  <li>directories specified by the environment variable <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code></li>
  <li>directories specified in <code class="language-plaintext highlighter-rouge">DT_RUNPATH</code> or <code class="language-plaintext highlighter-rouge">DT_RPATH</code> of a shared library are searched for shared libraries needed by it</li>
  <li>default directories, normally <code class="language-plaintext highlighter-rouge">/lib</code> and <code class="language-plaintext highlighter-rouge">/usr/lib</code></li>
  <li>directories listed in the <code class="language-plaintext highlighter-rouge">/etc/ld.so.conf</code> file</li>
</ul>

<h2 id="solving-versioned-shared-libraries-dependencies-at-runtime">Solving versioned shared libraries dependencies at runtime</h2>

<p>On GNU glibc-based systems, which includes most Linux systems, starting up an <strong>ELF</strong> binary executable automatically causes the program loader to be loaded and run.</p>

<p>On Linux systems, this loader is named <a href="http://linux.die.net/man/8/ld-linux"><code class="language-plaintext highlighter-rouge">/lib/ld-linux.so.X</code></a> (where X is a version number). This loader, in turn, finds and loads recursively all other shared libraries listed in the <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> fields of the <strong>ELF</strong> binary.</p>

<p>Please note that if a <strong>soname</strong> was specified for a library when the executable was compiled, the loader will look for the <strong>soname</strong> instead of the library real name. For that reason, installation tools automatically create symbolic links from the library <strong>soname</strong> to its <strong>real name</strong>.</p>

<p><em>Example:</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>/usr/lib/libfoobar.so.1 -&gt; libfoobar.so.1.5.3
</code></pre></div></div>

<p>When looking for a specific library, if the value stored in the <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entry doesn’t contain a <code class="language-plaintext highlighter-rouge">/</code>, the loader will look consecutively in:</p>

<ul>
  <li>directories specified at compilation time in the <strong>ELF</strong> object <code class="language-plaintext highlighter-rouge">DT_RPATH</code> (deprecated),</li>
  <li>directories specified using the environment variable <code class="language-plaintext highlighter-rouge">LD_LIBRARY_PATH</code>,</li>
  <li>directories specified at compile time in the <strong>ELF</strong> object <code class="language-plaintext highlighter-rouge">DT_RUNPATH</code>,</li>
  <li>from the cache file <code class="language-plaintext highlighter-rouge">/etc/ld.so.cache</code>, which contains a compiled list of candidate libraries previously found in the augmented library path (can be disabled at compilation time),</li>
  <li>in the default path <code class="language-plaintext highlighter-rouge">/lib</code>, and then <code class="language-plaintext highlighter-rouge">/usr/lib</code> (can be disabled at compilation time).</li>
</ul>

<h1 id="proper-handling-of-secondary-dependencies">Proper handling of secondary dependencies</h1>

<p>As mentioned in the introduction, my issue was related to secondary dependencies, i.e. shared library dependencies that one library exports to a target.</p>

<p>Let’s imagine for instance a program <strong>main</strong> that depends on a library <strong>libbar</strong> that itself depends on a shared library <strong>libfoo</strong>.</p>

<p>We will use either a static <strong>libbar.a</strong> or a shared <strong>libbar.so</strong>.</p>

<p><em>foo.c</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int foo()
{
    return 42;
}
</code></pre></div></div>

<p><em>bar.c</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int foo();

int bar()
{
    return foo();
}
</code></pre></div></div>

<p><em>main.c</em></p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>int bar();

int main(int argc, char** argv)
{
    return bar();
}
</code></pre></div></div>

<h2 id="creating-the-libfooso-shared-library">Creating the libfoo.so shared library</h2>

<p><strong>libfoo</strong> has no dependencies other than the <strong>libc</strong>, so we can create it with the simplest command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -shared -o libfoo.so -fPIC foo.c
</code></pre></div></div>

<h2 id="creating-the-libbara-static-library">Creating the libbar.a static library</h2>

<p>As said before, static libraries are just archives of object files, without any means to declare external dependencies.</p>

<p>In our case, there is therefore no explicit connection whatsoever between libbar.a and libfoo.so.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -c bar.c
$ ar rcs libbar.a bar.o
</code></pre></div></div>

<h2 id="creating-the-libbarso-dynamic-library">Creating the libbar.so dynamic library</h2>

<p>The proper way to create the <strong>libbar.so</strong> shared library is to specify explicitly that it depends on <strong>libfoo</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -shared -o libbar.so -fPIC bar.c -lfoo -L$(pwd)
</code></pre></div></div>

<p>This will create the library with a proper <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entry for <strong>libfoo</strong>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d libbar.so
Dynamic section at offset 0xe08 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libfoo.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
...
</code></pre></div></div>

<p>However, since undefined symbols are not by default resolved when building a shared library, we can also create a “dumb” version without any <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entry:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -shared -o libbar_dumb.so -fPIC bar.c
</code></pre></div></div>

<p>Note that it is very unlikely that someone would create such an incomplete library on purpose, but it may happen that by misfortune you encounter one of these beasts in binary form and still <strong>need</strong> to link against it (yeah, sh… happens!).</p>

<h2 id="linking-against-the-libbara-static-library">Linking against the libbar.a static library</h2>

<p>As mentioned before, when linking an executable, the linker must resolve all undefined symbols before producing the output binary.</p>

<p>Trying to link only with <strong>libbar.a</strong> produces an error, since it has an undefined symbol and the linker has no clue where to find it:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app_s main.c libbar.a
libbar.a(bar.o): In function `bar':
bar.c:(.text+0xa): undefined reference to `foo'
collect2: error: ld returned 1 exit status
</code></pre></div></div>

<p>Adding <strong>libfoo.so</strong> to the link command solves the problem:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c libbar.a -L$(pwd) -lfoo
</code></pre></div></div>

<p>You can verify that the <strong>app</strong> binary now explicitly depends on <strong>libfoo</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d app
Dynamic section at offset 0xe18 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libfoo.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
...
</code></pre></div></div>

<p>At run-time, the dynamic linker will look for <strong>libfoo.so</strong>, so unless you have installed it in standard directories (<code class="language-plaintext highlighter-rouge">/lib</code> or <code class="language-plaintext highlighter-rouge">/usr/lib</code>) you need to tell it where it is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>LD_LIBRARY_PATH=$(pwd) ./app
</code></pre></div></div>

<p>To summarize, when linking an executable against a static library, you need to specify explicitly all dependencies towards shared libraries introduced by the static library on the link command.</p>

<blockquote>
  <p>Note however that expressing, discovering and adding implicit static libraries dependencies is typically a feature of your build system (<strong>autotools</strong>, <strong>cmake</strong>).</p>
</blockquote>
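<p>With <strong>cmake</strong> for instance, declaring the dependency as <code class="language-plaintext highlighter-rouge">PUBLIC</code> on the static library is enough for consumers to inherit it; a sketch with hypothetical target names:</p>

```cmake
# Sketch (hypothetical targets): PUBLIC link requirements on the static
# library propagate to every target that links against it.
add_library(bar STATIC bar.c)
target_link_libraries(bar PUBLIC foo)    # bar needs foo

add_executable(app main.c)
target_link_libraries(app PRIVATE bar)   # foo is added to app's link line automatically
```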

<h2 id="linking-against-the-libbarso-shared-library">Linking against the libbar.so shared library</h2>

<p>As specified in the <a href="http://linux.die.net/man/1/ld">linker documentation</a>, when the linker encounters an input shared library it processes all its <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entries as secondary dependencies:</p>

<ul>
  <li>if the linker output is a shared relocatable <strong>ELF</strong> object (i.e. a shared library), and the <code class="language-plaintext highlighter-rouge">--copy-dt-needed-entries</code> option is set (this is the legacy behavior), it will add all <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entries from the input library as new <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entries in the output,</li>
  <li>if the linker output is a shared relocatable <strong>ELF</strong> object (i.e. a shared library), and the <code class="language-plaintext highlighter-rouge">--no-copy-dt-needed-entries</code> option is set (this is the new default behavior for binutils, following <a href="http://fedoraproject.org/wiki/UnderstandingDSOLinkChange">a move initiated by major distros like Fedora</a>), it will simply ignore all <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entries from the input library,</li>
  <li>if the linker output is a non-shared, non-relocatable link (our case), it will automatically add the libraries listed in the <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entries of the input library to the link command line, producing an error if it can’t locate them.</li>
</ul>

<p>So, let’s see what happens when dealing with our two shared libraries.</p>

<h3 id="linking-against-the-dumb-library">Linking against the “dumb” library</h3>

<p>When trying to link an executable against the “dumb” version of <strong>libbar.so</strong>, the linker encounters undefined symbols in the library itself that it cannot resolve, since the library lacks the <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entry related to <strong>libfoo</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -L$(pwd) -lbar_dumb
libbar_dumb.so: undefined reference to `foo'
collect2: error: ld returned 1 exit status
</code></pre></div></div>

<p>Let’s see how we can solve this.</p>

<h4 id="adding-explicitly-the-libfooso-dependency">Adding explicitly the libfoo.so dependency</h4>

<p>Just like we did when we linked against the static version, we can just add <strong>libfoo</strong> to the link command to solve the problem:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -L$(pwd) -lbar_dumb -lfoo
</code></pre></div></div>

<p>It creates an explicit dependency in the <strong>app</strong> binary:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d app
Dynamic section at offset 0xe18 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libbar_dumb.so]
 0x0000000000000001 (NEEDED)             Shared library: [libfoo.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
...
</code></pre></div></div>

<p>Again, at runtime you may need to tell the dynamic linker where <strong>libfoo.so</strong> is:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LD_LIBRARY_PATH=$(pwd) ./app
</code></pre></div></div>

<p>Note that having an explicit dependency on <strong>libfoo</strong> is not quite right, since our application doesn’t directly use any symbols from <strong>libfoo</strong>. What we’ve just done here is called <a href="https://wiki.mageia.org/en/Overlinking_issues_in_packaging"><strong>overlinking</strong></a>, and it is <strong>BAD</strong>.</p>

<p>Let’s imagine for instance that in the future we provide a newer version of <strong>libbar</strong> with the same <strong>ABI</strong>, but based on a new version of <strong>libfoo</strong> with a different <strong>ABI</strong>. We should theoretically be able to use that new version of <strong>libbar</strong> without recompiling our application, but what would really happen is that the dynamic linker would try to load the two versions of <strong>libfoo</strong> at the same time, leading to unpredictable results. We would therefore need to recompile our application even though it is still compatible with the newest <strong>libbar</strong>.</p>

<blockquote>
  <p>As a matter of fact, this <a href="https://lists.debian.org/debian-devel-announce/2005/11/msg00016.html">actually happened in the past</a>: a libfreetype update in the debian distro caused 583 packages to be recompiled, with only 178 of them actually using it.</p>
</blockquote>

<h4 id="ignoring-libfoo-dependency">Ignoring libfoo dependency</h4>

<p>There is another option you can use when dealing with the “dumb” library: tell the linker to ignore its undefined symbols altogether:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -L$(pwd) -lbar_dumb -Wl,--allow-shlib-undefined
</code></pre></div></div>

<p>This will produce a binary that doesn’t declare its hidden dependencies towards <strong>libfoo</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d app
Dynamic section at offset 0xe18 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libbar_dumb.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
...
</code></pre></div></div>

<p>This isn’t without consequences at runtime though, since the dynamic linker is now unable to resolve the executable dependencies:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LD_LIBRARY_PATH=$(pwd) ./app
./app: symbol lookup error: ./libbar_dumb.so: undefined symbol: foo
</code></pre></div></div>

<p>Your only option is then to load <strong>libfoo</strong> explicitly (yes, this is getting uglier and uglier):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ LD_PRELOAD=$(pwd)/libfoo.so LD_LIBRARY_PATH=$(pwd) ./app
</code></pre></div></div>

<h3 id="linking-against-the-correct-library">Linking against the “correct” library</h3>

<h4 id="doing-it-the-right-way">Doing it the right way</h4>

<p>As mentioned before, when linking against the correct shared library, the linker encounters the <strong>libfoo.so</strong> <code class="language-plaintext highlighter-rouge">DT_NEEDED</code> entry, adds it to the link command and finds it at the path specified by <code class="language-plaintext highlighter-rouge">-L</code>, thus solving the undefined symbols … or at least that is what I expected:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -L$(pwd) -lbar
/usr/bin/ld: warning: libfoo.so, needed by libbar.so, not found (try using -rpath or -rpath-link)
/home/diec7483/dev/linker-example/libbar.so: undefined reference to `foo'
collect2: error: ld returned 1 exit status
</code></pre></div></div>

<p>Why the error? I thought I had done everything by the book!</p>

<p>Okay, let’s take a look at the <code class="language-plaintext highlighter-rouge">ld</code> man page again, looking at the <code class="language-plaintext highlighter-rouge">-rpath-link</code> option. This says:</p>

<blockquote>
  <p>When using ELF or SunOS, one shared library may require another. This happens when an “ld -shared” link includes a shared library as one of the input files.
When the linker encounters such a dependency when doing a non-shared, non-relocatable link, it will automatically try to locate the required shared library and include it in the link, if it is not included explicitly. In such a case, the -rpath-link option specifies the first set of directories to search. The -rpath-link option may specify a sequence of directory names either by specifying a list of names separated by colons, or by appearing multiple times.</p>
</blockquote>

<p>Ok, this is not crystal-clear, but what it actually means is that when specifying the path for a secondary dependency, you should not use <code class="language-plaintext highlighter-rouge">-L</code> but <code class="language-plaintext highlighter-rouge">-rpath-link</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ gcc -o app main.c -L$(pwd) -lbar -Wl,-rpath-link=$(pwd)
</code></pre></div></div>

<p>You can now verify that <strong>app</strong> depends only on <strong>libbar</strong>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ readelf -d app
Dynamic section at offset 0xe18 contains 25 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libbar.so]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
...
</code></pre></div></div>

<p>And this is <strong>finally how things should be done</strong>.</p>

<blockquote>
  <p>You may also use <code class="language-plaintext highlighter-rouge">-rpath</code> instead of <code class="language-plaintext highlighter-rouge">-rpath-link</code>, but in that case the specified path will be stored in the resulting executable, which is not suitable if you plan to relocate your binaries. Tools like <strong>cmake</strong> use <code class="language-plaintext highlighter-rouge">-rpath</code> during the build phase (<code class="language-plaintext highlighter-rouge">make</code>), but remove the specified path from the executable during the installation phase (<code class="language-plaintext highlighter-rouge">make install</code>).</p>
</blockquote>

<h1 id="conclusion">Conclusion</h1>

<p>To summarize, when linking an executable against:</p>

<ul>
  <li>
    <p>a <strong>static</strong> library, you need to specify all dependencies towards other shared libraries this static library depends on explicitly on the link command.</p>
  </li>
  <li>
    <p>a <strong>shared</strong> library, you don’t need to specify dependencies towards other shared libraries this shared library depends on, but you may need to specify the path to these libraries on the link command using the <code class="language-plaintext highlighter-rouge">-rpath</code>/<code class="language-plaintext highlighter-rouge">-rpath-link</code> options.</p>
  </li>
</ul>

<blockquote>
  <p>Note however that expressing, discovering and adding implicit libraries dependencies is typically a feature of your build system (<strong>autotools</strong>, <strong>cmake</strong>), as demonstrated in my samples.</p>
</blockquote>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          Unit testing with GoogleTest and CMake
          ]]>
      </title>
      <link>http://www.kaizou.org/2014/11/gtest-cmake.html</link>
      <pubDate>Wed, 05 Nov 2014 22:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2014/11/gtest-cmake</guid>
      <description>
          <![CDATA[
          <p>Continuous integration requires a robust test environment to be able to detect regressions as early as possible.</p>

<p>A typical test environment is composed of integration tests for the whole system and unit tests for each component.</p>

<p>This post explains how to create unit tests for a <code class="language-plaintext highlighter-rouge">C++</code> component using <a href="https://github.com/google/googletest"><strong>GoogleTest</strong></a> and <a href="http://www.cmake.org/"><strong>CMake</strong></a>.</p>

<!--more-->

<h2 id="project-structure">Project structure</h2>

<p>I will assume here that the project structure follows the model described in a <a href="/2014/11/typical-cmake-project/">previous post</a>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+-- CMakeLists.txt
+-- main
|    +-- CMakeLists
|    +-- main.cpp
|
+-- test
|    +-- CMakeLists.txt
|    +-- testfoo
|       +-- CMakeLists.txt
|       +-- main.cpp
|       +-- testfoo.h
|       +-- testfoo.cpp
|       +-- mockbar.h
|
+-- libfoo
|    +-- CMakeLists.txt
|    +-- foo.h
|    +-- foo.cpp
|
+-- libbar
     +-- CMakeLists.txt
     +-- bar.h
     +-- bar.cpp

</code></pre></div></div>

<p>The <code class="language-plaintext highlighter-rouge">main</code> subdirectory contains the main project target, an executable providing the super-useful <code class="language-plaintext highlighter-rouge">libfoo</code> service using the awesome <code class="language-plaintext highlighter-rouge">libbar</code> backend (for example <code class="language-plaintext highlighter-rouge">libfoo</code> could be a generic face recognition library and <code class="language-plaintext highlighter-rouge">libbar</code> a GPU-based image processing library).</p>

<p>The <code class="language-plaintext highlighter-rouge">test</code> directory contains a single executable allowing to test the <code class="language-plaintext highlighter-rouge">libfoo</code> service using a <em>mock</em> version of <code class="language-plaintext highlighter-rouge">libbar</code>.</p>

<blockquote>
  <p>From <a href="http://en.wikipedia.org/wiki/Mock_object">Wikipedia</a>: In object-oriented programming, mock objects are simulated objects that mimic the behavior of real objects in controlled ways.</p>
</blockquote>

<p>For those interested, the code for this sample project is on <a href="https://github.com/kaizouman/gtest-cmake-example">github</a>.</p>

<h2 id="a-closer-look-at-the-test-directory">A closer look at the test directory</h2>

<p>In my simplistic example, there is only one subdirectory under <code class="language-plaintext highlighter-rouge">test</code>, but in a typical project, it would contain several subdirectories, one for each test program.</p>

<p>Tests programs are based on Google’s <a href="https://github.com/google/googletest/blob/master/googletest/docs/Primer.md">Googletest</a> framework and its <a href="https://github.com/google/googletest/blob/master/googlemock/README.md">GoogleMock</a> extension.</p>

<p>Since all test programs will be using these packages, the root <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> file should contain all directives required to resolve the corresponding dependencies. This is where things get a bit hairy, since Google <a href="https://github.com/google/googletest/blob/master/googletest/docs/FAQ.md">does not recommend installing these packages in binary form</a>, but instead recommends recompiling them with your project.</p>

<h3 id="resolving-googletest-and-googlemock-dependencies">Resolving GoogleTest and GoogleMock dependencies</h3>

<p>There are at least three options to integrate your project with <strong>GoogleTest</strong> and <strong>GoogleMock</strong>.</p>

<h4 id="having-both-packages-integrated-in-your-build-system">Having both packages integrated in your build system</h4>

<p>Obviously, this is only an option if you actually <em>do</em> have a build system, but if you do, this would be my recommendation.</p>

<p>Depending on how your buildsystem is structured, your mileage may vary, but in the end you should be able to declare <strong>GoogleTest</strong> and <strong>GoogleMock</strong> as dependencies using <code class="language-plaintext highlighter-rouge">CMake</code> functions like the built-in <code class="language-plaintext highlighter-rouge">find_package</code> or the <code class="language-plaintext highlighter-rouge">pkg-config</code> based <code class="language-plaintext highlighter-rouge">pkg_check_modules</code>.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>find_package(PkgConfig)
pkg_check_modules(GTEST REQUIRED gtest&gt;=1.7.0)
pkg_check_modules(GMOCK REQUIRED gmock&gt;=1.7.0)

include_directories(
    ${GTEST_INCLUDE_DIRS}
    ${GMOCK_INCLUDE_DIRS}
)
</code></pre></div></div>

<h4 id="add-both-packages-sources-to-your-project">Add both packages sources to your project</h4>

<p>Adding the <strong>GoogleTest</strong> and <strong>GoogleMock</strong> sources as subdirectories of <code class="language-plaintext highlighter-rouge">test</code> would allow you to compile them as part of your project.</p>

<p>This is however really ugly, and I wouldn’t recommend doing that …</p>

<h4 id="add-both-packages-as-external-cmake-projects">Add both packages as external CMake projects</h4>

<p>According to various answers posted on <a href="http://stackoverflow.com/questions/9689183/cmake-googletest">StackOverflow</a>, this seems to be the recommended way of resolving <strong>GoogleTest</strong> and <strong>GoogleMock</strong> dependencies on a per project basis.</p>

<p>It takes advantage of the <code class="language-plaintext highlighter-rouge">CMake</code> <code class="language-plaintext highlighter-rouge">ExternalProject</code> module to fetch <strong>GoogleTest</strong> and <strong>GoogleMock</strong> sources from the internet and compile them as third-party dependencies in your project.</p>

<p>Below is a working example, with a few comments explaining what’s going on:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># We need thread support
find_package(Threads REQUIRED)

# Enable ExternalProject CMake module
include(ExternalProject)

# Download and install GoogleTest
ExternalProject_Add(
    gtest
    URL https://github.com/google/googletest/archive/master.zip
    PREFIX ${CMAKE_CURRENT_BINARY_DIR}/gtest
    # Disable install step
    INSTALL_COMMAND ""
)

# Get GTest source and binary directories from CMake project
ExternalProject_Get_Property(gtest source_dir binary_dir)

# Create a libgtest target to be used as a dependency by test programs
add_library(libgtest IMPORTED STATIC GLOBAL)
add_dependencies(libgtest gtest)

# Set libgtest properties
set_target_properties(libgtest PROPERTIES
    "IMPORTED_LOCATION" "${binary_dir}/googlemock/gtest/libgtest.a"
    "IMPORTED_LINK_INTERFACE_LIBRARIES" "${CMAKE_THREAD_LIBS_INIT}"
)

# Create a libgmock target to be used as a dependency by test programs
add_library(libgmock IMPORTED STATIC GLOBAL)
add_dependencies(libgmock gtest)

# Set libgmock properties
set_target_properties(libgmock PROPERTIES
    "IMPORTED_LOCATION" "${binary_dir}/googlemock/libgmock.a"
    "IMPORTED_LINK_INTERFACE_LIBRARIES" "${CMAKE_THREAD_LIBS_INIT}"
)

# I couldn't make it work with INTERFACE_INCLUDE_DIRECTORIES
include_directories("${source_dir}/googletest/include"
                    "${source_dir}/googlemock/include")
</code></pre></div></div>

<blockquote>
  <p>Note: It should theoretically be possible to set the <strong>GoogleTest</strong> and <strong>GoogleMock</strong> include directories as target properties using INTERFACE_INCLUDE_DIRECTORIES, but this fails because these directories don’t exist yet when the imported targets are declared. As a workaround, I had to specify them explicitly using include_directories.</p>
</blockquote>
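<p>For the record, a workaround sometimes suggested (which I haven’t verified here) is to create the include directories at configure time, so that CMake accepts them as <code class="language-plaintext highlighter-rouge">INTERFACE_INCLUDE_DIRECTORIES</code> on the imported targets:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Create the directories before declaring them, so that CMake
# does not complain that they don't exist yet
file(MAKE_DIRECTORY "${source_dir}/googletest/include")
file(MAKE_DIRECTORY "${source_dir}/googlemock/include")

set_target_properties(libgtest PROPERTIES
    "INTERFACE_INCLUDE_DIRECTORIES" "${source_dir}/googletest/include")
set_target_properties(libgmock PROPERTIES
    "INTERFACE_INCLUDE_DIRECTORIES" "${source_dir}/googlemock/include")
</code></pre></div></div>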

<h3 id="writing-a-testfoo-test-program-for-libfoo">Writing a <strong>testfoo</strong> test program for <strong>libfoo</strong></h3>

<p>The <strong>testfoo</strong> program depends on <strong>libfoo</strong>, <strong>GoogleTest</strong> and <strong>GoogleMock</strong>.</p>

<p>Here is what the <strong>testfoo</strong> <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> file would look like:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>file(GLOB SRCS *.cpp)

add_executable(testfoo ${SRCS})

target_link_libraries(testfoo
    libfoo
    libgtest
    libgmock
)

install(TARGETS testfoo DESTINATION bin)
</code></pre></div></div>

<p>The libraries required for the build are listed under <code class="language-plaintext highlighter-rouge">target_link_libraries</code>.
CMake will then add the appropriate include directories and link options.</p>

<p>The <strong>testfoo</strong> program will provide unit tests for the <code class="language-plaintext highlighter-rouge">Foo</code> class of the <strong>libfoo</strong> library defined below.</p>

<h4 id="fooh"><code class="language-plaintext highlighter-rouge">foo.h</code></h4>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>class Bar;

class Foo
{
public:
    Foo(const Bar&amp; bar);
    bool baz(bool useQux);
protected:
    const Bar&amp; m_bar;
};
</code></pre></div></div>

<h4 id="foocpp"><code class="language-plaintext highlighter-rouge">foo.cpp</code></h4>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "bar.h"
#include "foo.h"

Foo::Foo(const Bar&amp; bar)
 : m_bar(bar) {}

bool Foo::baz(bool useQux) {
    if (useQux) {
        return m_bar.qux();
    } else {
        return m_bar.norf();
    }
}
</code></pre></div></div>

<p>The sample Test program described in the <a href="http://code.google.com/p/googletest/wiki/Primer">GoogleTest Documentation</a> fits in a single file, but I prefer splitting the Unit Tests code in three types of files.</p>

<h3 id="maincpp"><code class="language-plaintext highlighter-rouge">main.cpp</code></h3>

<p>The <code class="language-plaintext highlighter-rouge">main.cpp</code> file will contain only the test program <code class="language-plaintext highlighter-rouge">main</code> function.
This is where you will put the generic <strong>GoogleTest</strong> macro invocation that launches the tests, plus any initializations that need to happen in <code class="language-plaintext highlighter-rouge">main</code> (none in this particular case).</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "gtest/gtest.h"

int main(int argc, char **argv)
{
    ::testing::InitGoogleTest(&amp;argc, argv);
    int ret = RUN_ALL_TESTS();
    return ret;
}
</code></pre></div></div>

<h3 id="testfooh"><code class="language-plaintext highlighter-rouge">testfoo.h</code></h3>

<p>This file contains the declaration of the <code class="language-plaintext highlighter-rouge">FooTest</code> class, which is the test fixture for the <code class="language-plaintext highlighter-rouge">Foo</code> class.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "gtest/gtest.h"
#include "mockbar.h"

// The fixture for testing class Foo.
class FooTest : public ::testing::Test {

protected:

    // You can do set-up work for each test here.
    FooTest();

    // You can do clean-up work that doesn't throw exceptions here.
    virtual ~FooTest();

    // If the constructor and destructor are not enough for setting up
    // and cleaning up each test, you can define the following methods:

    // Code here will be called immediately after the constructor (right
    // before each test).
    virtual void SetUp();

    // Code here will be called immediately after each test (right
    // before the destructor).
    virtual void TearDown();

    // The mock bar library shared by all tests
    MockBar m_bar;
};
</code></pre></div></div>

<h3 id="mockbarh"><code class="language-plaintext highlighter-rouge">mockbar.h</code></h3>

<p>Assuming the <strong>libbar</strong> library implements a public <code class="language-plaintext highlighter-rouge">Bar</code> interface, we use <strong>GoogleMock</strong> to provide a fake implementation for test purposes only:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "bar.h"
#include "gmock/gmock.h"

class MockBar: public Bar
{
public:
    MOCK_METHOD0(qux, bool());
    MOCK_METHOD0(norf, bool());
};
</code></pre></div></div>

<p>This will allow us to inject controlled values into the <strong>libfoo</strong> library when it invokes the <code class="language-plaintext highlighter-rouge">Bar</code> class methods.</p>

<blockquote>
  <p>Please refer to the <a href="http://code.google.com/p/googlemock/wiki/V1_7_ForDummies">GoogleMock documentation</a> for a detailed description of the <code class="language-plaintext highlighter-rouge">GoogleMock</code> features.</p>
</blockquote>

<h3 id="testfoocpp"><code class="language-plaintext highlighter-rouge">testfoo.cpp</code></h3>

<p>This file contains the implementation of the <code class="language-plaintext highlighter-rouge">FooTest</code> fixture class.</p>

<p>This is where the actual tests are written.</p>

<p>We will test the output of the <code class="language-plaintext highlighter-rouge">Foo::baz()</code> method, first having default values for the <code class="language-plaintext highlighter-rouge">Bar::qux()</code> and <code class="language-plaintext highlighter-rouge">Bar::norf()</code> methods returned by our mock, then overriding the value returned by <code class="language-plaintext highlighter-rouge">Bar::norf()</code> with a value specific to our test.</p>

<p>In all test cases, we use <strong>GoogleTest</strong> expectations to verify the output of the <code class="language-plaintext highlighter-rouge">Foo::baz</code> method.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "foo.h"
#include "mockbar.h"
#include "testfoo.h"

using ::testing::Return;

FooTest::FooTest()
{
    // Have qux return true by default
    ON_CALL(m_bar,qux()).WillByDefault(Return(true));
    // Have norf return false by default
    ON_CALL(m_bar,norf()).WillByDefault(Return(false));
}

FooTest::~FooTest() {}

void FooTest::SetUp() {}

void FooTest::TearDown() {}

TEST_F(FooTest, ByDefaultBazTrueIsTrue) {
    Foo foo(m_bar);
    EXPECT_EQ(foo.baz(true), true);
}

TEST_F(FooTest, ByDefaultBazFalseIsFalse) {
    Foo foo(m_bar);
    EXPECT_EQ(foo.baz(false), false);
}

TEST_F(FooTest, SometimesBazFalseIsTrue) {
    Foo foo(m_bar);
    // Have norf return true for once
    EXPECT_CALL(m_bar,norf()).WillOnce(Return(true));
    EXPECT_EQ(foo.baz(false), true);
}

</code></pre></div></div>

<blockquote>
  <p>Please refer to the <a href="http://code.google.com/p/googletest/wiki/Primer">GoogleTest documentation</a> for a much more detailed presentation of how to create unit tests with GoogleTest.</p>
</blockquote>

<h2 id="building-tests">Building tests</h2>

<p>As usual, it is recommended to build your program out-of-tree, i.e. in a directory separate from the sources.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir build
cd build
</code></pre></div></div>

<p>First, you need to invoke the <code class="language-plaintext highlighter-rouge">cmake</code> command to generate the build files.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake ..
</code></pre></div></div>

<p>This should produce an output similar to this one:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>-- The C compiler identification is GNU 4.8.2
-- The CXX compiler identification is GNU 4.8.2
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Configuring done
-- Generating done
-- Build files have been written to: ~/gtest-cmake-example/build
</code></pre></div></div>

<p>Then, build the project targets.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make
</code></pre></div></div>

<p>The following output corresponds to the case where <strong>GoogleTest</strong> and <strong>GoogleMock</strong> are automatically fetched from their repositories and built as third-party dependencies.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Scanning dependencies of target libfoo
[  7%] Building CXX object libfoo/CMakeFiles/libfoo.dir/foo.cpp.o
Linking CXX static library liblibfoo.a
[  7%] Built target libfoo
Scanning dependencies of target libbar
[ 15%] Building CXX object libbar/CMakeFiles/libbar.dir/bar.cpp.o
Linking CXX static library liblibbar.a
[ 15%] Built target libbar
Scanning dependencies of target myApp
[ 23%] Building CXX object main/CMakeFiles/myApp.dir/main.cpp.o
Linking CXX executable myApp
[ 23%] Built target myApp
Scanning dependencies of target gtest
[ 30%] Creating directories for 'gtest'
[ 38%] Performing download step (download, verify and extract) for 'gtest'
-- downloading...
     src='https://github.com/google/googletest/archive/master.zip'
     dst='/home/david/perso/gtest-cmake-example/build/test/gtest/src/master.zip'
     timeout='none'
-- [download 0% complete]
-- [download 1% complete]
...
-- [download 99% complete]
-- [download 100% complete]
-- downloading... done
-- verifying file...
     file='/home/david/perso/gtest-cmake-example/build/test/gtest/src/master.zip'
-- verifying file... warning: did not verify file - no URL_HASH specified?
-- extracting...
     src='/home/david/perso/gtest-cmake-example/build/test/gtest/src/master.zip'
     dst='/home/david/perso/gtest-cmake-example/build/test/gtest/src/gtest'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 46%] No patch step for 'gtest'
[ 53%] No update step for 'gtest'
[ 61%] Performing configure step for 'gtest'
-- The C compiler identification is GNU 4.8.4
-- The CXX compiler identification is GNU 4.8.4
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Found PythonInterp: /usr/bin/python (found version "2.7.6")
-- Looking for include file pthread.h
-- Looking for include file pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - not found
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Configuring done
-- Generating done
-- Build files have been written to: /home/david/perso/gtest-cmake-example/build/test/gtest/src/gtest-build
[ 69%] Performing build step for 'gtest'
Scanning dependencies of target gmock
[ 14%] Building CXX object googlemock/CMakeFiles/gmock.dir/__/googletest/src/gtest-all.cc.o
[ 28%] Building CXX object googlemock/CMakeFiles/gmock.dir/src/gmock-all.cc.o
Linking CXX static library libgmock.a
[ 28%] Built target gmock
Scanning dependencies of target gmock_main
[ 42%] Building CXX object googlemock/CMakeFiles/gmock_main.dir/__/googletest/src/gtest-all.cc.o
[ 57%] Building CXX object googlemock/CMakeFiles/gmock_main.dir/src/gmock-all.cc.o
[ 71%] Building CXX object googlemock/CMakeFiles/gmock_main.dir/src/gmock_main.cc.o
Linking CXX static library libgmock_main.a
[ 71%] Built target gmock_main
Scanning dependencies of target gtest
[ 85%] Building CXX object googlemock/gtest/CMakeFiles/gtest.dir/src/gtest-all.cc.o
Linking CXX static library libgtest.a
[ 85%] Built target gtest
Scanning dependencies of target gtest_main
[100%] Building CXX object googlemock/gtest/CMakeFiles/gtest_main.dir/src/gtest_main.cc.o
Linking CXX static library libgtest_main.a
[100%] Built target gtest_main
[ 76%] No install step for 'gtest'
[ 84%] Completed 'gtest'
[ 84%] Built target gtest
Scanning dependencies of target testfoo
[ 92%] Building CXX object test/testfoo/CMakeFiles/testfoo.dir/main.cpp.o
[100%] Building CXX object test/testfoo/CMakeFiles/testfoo.dir/testfoo.cpp.o
Linking CXX executable testfoo
[100%] Built target testfoo
</code></pre></div></div>

<h2 id="running-tests">Running tests</h2>

<p>Once the test programs have been built, you can run them individually …</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>test/testfoo/testfoo
</code></pre></div></div>

<p>… producing a detailed output …</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from FooTest
[ RUN      ] FooTest.ByDefaultBazTrueIsTrue

GMOCK WARNING:
Uninteresting mock function call - taking default action specified at:
~/gtest-cmake-example/test/testfoo/testfoo.cpp:8:
    Function call: qux()
          Returns: true
Stack trace:
[       OK ] FooTest.ByDefaultBazTrueIsTrue (0 ms)
[ RUN      ] FooTest.ByDefaultBazFalseIsFalse

GMOCK WARNING:
Uninteresting mock function call - taking default action specified at:
~/gtest-cmake-example/test/testfoo/testfoo.cpp:10:
    Function call: norf()
          Returns: false
Stack trace:
[       OK ] FooTest.ByDefaultBazFalseIsFalse (0 ms)
[ RUN      ] FooTest.SometimesBazFalseIsTrue
[       OK ] FooTest.SometimesBazFalseIsTrue (0 ms)
[----------] 3 tests from FooTest (0 ms total)

[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 3 tests.
</code></pre></div></div>

<blockquote>
  <p>Note: You can get rid of <strong>GoogleMock</strong> warnings by using a <a href="https://github.com/google/googletest/blob/master/googlemock/docs/CheatSheet.md"><strong>nice</strong> <strong>mock</strong></a>.</p>
</blockquote>
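<p>As a sketch (reusing the <code class="language-plaintext highlighter-rouge">FooTest</code> fixture above), wrapping the mock member in <code class="language-plaintext highlighter-rouge">::testing::NiceMock</code> is enough to silence those warnings:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "gmock/gmock.h"
#include "mockbar.h"

class FooTest : public ::testing::Test {
protected:
    // NiceMock ignores uninteresting calls, but keeps the ON_CALL
    // default actions defined in the fixture constructor
    ::testing::NiceMock&lt;MockBar&gt; m_bar;
};
</code></pre></div></div>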

<p>… or globally through <code class="language-plaintext highlighter-rouge">CTest</code> …</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make test
</code></pre></div></div>

<p>… producing only a test summary.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Running tests...
Test project ~/gtest-cmake-example/build
    Start 1: testfoo
1/1 Test #1: testfoo ..........................   Passed    0.00 sec

100% tests passed, 0 tests failed out of 1

Total Test time (real) =   0.00 sec
</code></pre></div></div>

          ]]>
      </description>
    </item>
    
    <item>
      <title>
          <![CDATA[
          A typical Linux project using CMake
          ]]>
      </title>
      <link>http://www.kaizou.org/2014/11/typical-cmake-project.html</link>
      <pubDate>Mon, 03 Nov 2014 22:00:00 +0000</pubDate>
      <author>kaizouman@kaizou.org (David Corvoysier)</author>
      <guid>http://www.kaizou.org/2014/11/typical-cmake-project</guid>
      <description>
          <![CDATA[
<p>When it comes to choosing a make system on Linux, you basically have only two options: autotools or CMake. I have always found Autotools a bit counter-intuitive, but was reluctant to make the effort to switch to CMake because I was worried the learning curve would be too steep for a task you don’t have to perform that often (I mean, you usually spend more time writing code than writing build rules).</p>

<p>A recent project of mine required writing a lot of new Linux packages, and I decided it was a good time to give CMake a try. This article is about how I have used it to build plain old Linux packages almost effortlessly.</p>

<!--more-->

<p>Although CMake is fairly well documented, I personally found the documentation (and especially the tutorial) a bit too CMake-oriented, pushing me to use CMake-dedicated tools for tasks I already had tools for (tests and delivery, for instance).</p>

<p>This is therefore my own tutorial to CMake, based on my primary requirement: just generate the makefiles using CMake, and use my own tools for everything else.</p>

<h2 id="project-structure">Project structure</h2>

<p>The project structure is partly driven by the project design, but it would usually contain at least two common sub-directories, along with several “module” sub-directories:</p>

<pre class="diagram">
project
.
+-. main
+-. test
+-. moduleA
+-. moduleB
</pre>

<p>The <code class="language-plaintext highlighter-rouge">main</code> subdirectory contains the main project target, typically an executable.</p>

<p>The <code class="language-plaintext highlighter-rouge">test</code> directory contains one or more test executables.</p>

<p>The <code class="language-plaintext highlighter-rouge">moduleX</code> directories contain libraries to be used by either the tests or main executables.</p>

<p>At the root of the project, the main <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> should contain the common CMake directives that apply to all subdirectories.</p>

<p>First, the <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> would specify a minimum CMake version, name your project and define a few common behaviours.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CMAKE_MINIMUM_REQUIRED(VERSION 2.8)

PROJECT(MyProject)

SET(CMAKE_INCLUDE_CURRENT_DIR ON)
</code></pre></div></div>

<p>Here, I only set one option, but it is of the utmost importance if you want to build out-of-tree AND generate some of your source files automatically (which you almost certainly do if you are using any modern framework like Qt). It adds <code class="language-plaintext highlighter-rouge">${CMAKE_CURRENT_SOURCE_DIR}</code> and, more importantly, <code class="language-plaintext highlighter-rouge">${CMAKE_CURRENT_BINARY_DIR}</code> to the include path, allowing generated include files to be found by the compiler.</p>
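<p>As a hypothetical illustration, with this option on, a header generated into the build tree can be included as if it sat next to the sources:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># config.h is generated into ${CMAKE_CURRENT_BINARY_DIR}
CONFIGURE_FILE(config.h.in config.h)
# With CMAKE_INCLUDE_CURRENT_DIR set to ON, sources can simply do
# #include "config.h", even when building out-of-tree
</code></pre></div></div>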

<p>Finally, the <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> would list all subdirectories to be included in the project:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_SUBDIRECTORY(main)
ADD_SUBDIRECTORY(test)
ADD_SUBDIRECTORY(moduleA)
ADD_SUBDIRECTORY(moduleB)
...
</code></pre></div></div>

<h2 id="configuring-modules">Configuring Modules</h2>

<p>As explained in the previous paragraph, each subdirectory would contain at least either one executable or one library defined in a dedicated <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> file.</p>

<p>Executables are declared using the <a href="http://www.cmake.org/cmake/help/v3.0/command/add_executable.html#command:add_executable"><code class="language-plaintext highlighter-rouge">ADD_EXECUTABLE</code></a> command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_EXECUTABLE(myapp
    ${MY_SRCS}
)
</code></pre></div></div>

<p>Libraries are declared using the <a href="http://www.cmake.org/cmake/help/v3.0/command/add_library.html#command:add_library"><code class="language-plaintext highlighter-rouge">ADD_LIBRARY</code></a> command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_LIBRARY(mylib STATIC
    ${MY_SRCS}
)
</code></pre></div></div>

<p>Source files are specified either explicitly or using a wildcard:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SET(MY_SRC
    fileA.cpp
    fileB.cpp
    ...
)
</code></pre></div></div>

<p>or</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>file(GLOB MY_SRC
    "*.h"
    "*.cpp"
)
</code></pre></div></div>

<blockquote>
  <p>Note that using a wildcard, you need to rerun CMake if you add more files to a module</p>
</blockquote>

<h2 id="solving-dependencies-between-modules">Solving dependencies between modules</h2>

<h3 id="link-dependencies">Link dependencies</h3>

<p>Link dependencies between modules are solved using the <a href="http://www.cmake.org/cmake/help/v3.0/command/target_link_libraries.html"><code class="language-plaintext highlighter-rouge">TARGET_LINK_LIBRARIES</code></a> command.</p>

<p>CMake maintains throughout the whole project a named object for each target created by a command such as <code class="language-plaintext highlighter-rouge">ADD_EXECUTABLE()</code> or <code class="language-plaintext highlighter-rouge">ADD_LIBRARY()</code>.</p>

<p>This target name can be passed to the <a href="http://www.cmake.org/cmake/help/v3.0/command/target_link_libraries.html"><code class="language-plaintext highlighter-rouge">TARGET_LINK_LIBRARIES</code></a> command to tell CMake that an object A depends on an object B.</p>

<p>Example:</p>

<p>Given a library defined in a specific subdirectory</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_LIBRARY(mylib STATIC
    ${MY_LIBSRCS}
)
</code></pre></div></div>

<p>One can specify a dependency from an application to that library</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_EXECUTABLE(myapp
    ${MY_APPSRCS}
)

TARGET_LINK_LIBRARIES(myapp
    mylib
)
</code></pre></div></div>

<h3 id="include-dependencies">Include dependencies</h3>

<p>Include dependencies are automatically solved for dependent libraries declared in the <a href="http://www.cmake.org/cmake/help/v3.0/command/target_link_libraries.html"><code class="language-plaintext highlighter-rouge">TARGET_LINK_LIBRARIES</code></a> command if the corresponding libraries have properly declared their include directories using the <a href="http://www.cmake.org/cmake/help/v3.0/command/target_include_directories.html"><code class="language-plaintext highlighter-rouge">TARGET_INCLUDE_DIRECTORIES</code></a> command.</p>

<p>Example:</p>

<p>Given a library defined in a specific subdirectory</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_LIBRARY(mylib STATIC
    ${MY_LIBSRCS}
)
</code></pre></div></div>

<p>Specifying</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>TARGET_INCLUDE_DIRECTORIES(mylib PUBLIC
    /path/to/includes
)
</code></pre></div></div>

<p>allows a dependent app to be aware of the mylib include path simply by adding the library to its <code class="language-plaintext highlighter-rouge">TARGET_LINK_LIBRARIES</code> command:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_EXECUTABLE(myapp
    ${MY_APPSRCS}
)

TARGET_LINK_LIBRARIES(myapp
    mylib
)
</code></pre></div></div>

<p>Additional include dependencies can be solved explicitly using the <a href="http://www.cmake.org/cmake/help/v3.0/command/include_directories.html"><code class="language-plaintext highlighter-rouge">INCLUDE_DIRECTORIES</code></a> command, but most of the time, you won’t need it unless you have nested sub-directories that don’t have a <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> of their own (as a matter of fact, needing to add an explicit <code class="language-plaintext highlighter-rouge">INCLUDE_DIRECTORIES</code> may be a good hint that something is wrong with your other directives).</p>

<h2 id="resolving-dependencies-towards-external-packages">Resolving Dependencies towards external packages</h2>

<h3 id="packages-known-by-cmake">Packages known by CMake</h3>

<p>CMake provides a set of tools to register and retrieve information about packages stored in a CMake package registry.</p>

<p>Dependencies on CMake packages are easily solved by declaring them using the built-in CMake <code class="language-plaintext highlighter-rouge">FIND_PACKAGE</code> command.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FIND_PACKAGE(Qt5Core)
</code></pre></div></div>

<p>This command will create a CMake target Qt5::Core that can be referenced in <a href="http://www.cmake.org/cmake/help/v3.0/command/target_link_libraries.html"><code class="language-plaintext highlighter-rouge">TARGET_LINK_LIBRARIES</code></a> commands.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_LIBRARY(mylib STATIC
    ${MY_LIBSRCS}
)

TARGET_LINK_LIBRARIES(mylib
    Qt5::Core
)
</code></pre></div></div>

<blockquote>
  <p>Note: The <code class="language-plaintext highlighter-rouge">FIND_PACKAGE</code> command will also export <a href="http://qt-project.org/doc/qt-5/cmake-manual.html#variable-reference">several related variables</a>.</p>
</blockquote>

<p>Just like when referencing an internal module, the paths to the specific includes of libraries found using <code class="language-plaintext highlighter-rouge">FIND_PACKAGE</code> are automatically added to the include search path. There is therefore no need to add them explicitly using an <code class="language-plaintext highlighter-rouge">INCLUDE_DIRECTORIES</code> directive.</p>

<h3 id="other-packages-pkg-config">Other packages: pkg-config</h3>

<p>For packages whose definition is not maintained in CMake (i.e. there is no FIND_PACKAGE module written for them), you may rely on the generic pkg-config tool instead.</p>

<p><a href="http://www.freedesktop.org/wiki/Software/pkg-config/">pkg-config</a> is a helper tool used when compiling applications and libraries. It helps you insert the correct compiler options on the command line, so an application can for instance be built with <code class="language-plaintext highlighter-rouge">gcc -o test test.c $(pkg-config --libs --cflags glib-2.0)</code> rather than hard-coding where to find glib (or other libraries). It is language-agnostic, so it can also be used to define the location of documentation tools, for instance.</p>

<p>pkg-config compatible packages declare their include path, compiler options and linking flags in dedicated <code class="language-plaintext highlighter-rouge">.pc</code> files installed on the system.</p>

<p>Here is for instance the <code class="language-plaintext highlighter-rouge">glib-2.0</code> pkg-config file:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>prefix=/usr
exec_prefix=${prefix}
libdir=${prefix}/lib/x86_64-linux-gnu
includedir=${prefix}/include

glib_genmarshal=glib-genmarshal
gobject_query=gobject-query
glib_mkenums=glib-mkenums

Name: GLib
Description: C Utility Library
Version: 2.36.0
Requires.private: libpcre
Libs: -L${libdir} -lglib-2.0 
Libs.private: -pthread  -lpcre    
Cflags: -I${includedir}/glib-2.0 -I${libdir}/glib-2.0/include
</code></pre></div></div>

<p>Before using pkg-config, you need to make sure the tool is available by inserting the following line in your <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code>:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FIND_PACKAGE(PkgConfig)
</code></pre></div></div>

<p>Then, insert the following <a href="http://www.cmake.org/cmake/help/v3.0/module/FindPkgConfig.html">PKG_CHECK_MODULES</a> command in your <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code> file to tell CMake to resolve pkg-config dependencies for a specific package:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>PKG_CHECK_MODULES(GLIB2 REQUIRED glib-2.0&gt;=2.36.0)
</code></pre></div></div>

<p>The command will export several variables, including <code class="language-plaintext highlighter-rouge">XXX_LIBRARIES</code>, which can be used in <a href="http://www.cmake.org/cmake/help/v3.0/command/target_link_libraries.html"><code class="language-plaintext highlighter-rouge">TARGET_LINK_LIBRARIES</code></a> commands.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>ADD_LIBRARY(mylib STATIC
    ${MY_LIBSRCS}
)

TARGET_LINK_LIBRARIES(mylib
    ${GLIB2_LIBRARIES}
)
</code></pre></div></div>

<p>Unfortunately, I was unable to get the include paths of libraries found through pkg-config added automatically to the compiler include paths, as happens when using the standard <code class="language-plaintext highlighter-rouge">FIND_PACKAGE</code> function, so I had to add them explicitly:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INCLUDE_DIRECTORIES(
    ${GLIB2_INCLUDE_DIRS}
)
</code></pre></div></div>

<h2 id="exporting-dependencies-towards-external-packages">Exporting dependencies towards external packages</h2>

<p>Although CMake supports its <a href="http://www.cmake.org/Wiki/CMake:How_To_Find_Libraries">own mechanism to export dependencies</a>, it is recommended to take advantage of the more generic pkg-config files.</p>

<p>CMake doesn’t provide any specific mechanism to generate <code class="language-plaintext highlighter-rouge">.pc</code> files.</p>

<p>However, one can take advantage of CMake variable substitution to generate a specific pkg-config file from a predefined template.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CONFIGURE_FILE(
  "${CMAKE_CURRENT_SOURCE_DIR}/pkg-config.pc.cmake"
  "${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}.pc"
)
</code></pre></div></div>

<p>A typical <code class="language-plaintext highlighter-rouge">.pc</code> template could be:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Name: ${PROJECT_NAME}
Description: ${PROJECT_DESCRIPTION}
Version: ${PROJECT_VERSION}
Requires: ${PKG_CONFIG_REQUIRES}
prefix=${CMAKE_INSTALL_PREFIX}
includedir=${PKG_CONFIG_INCLUDEDIR}
libdir=${PKG_CONFIG_LIBDIR}
Libs: ${PKG_CONFIG_LIBS}
Cflags: ${PKG_CONFIG_CFLAGS}
</code></pre></div></div>

<p>Where the following variables are provided by CMake:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">PROJECT_NAME</code></li>
  <li><code class="language-plaintext highlighter-rouge">PROJECT_DESCRIPTION</code></li>
  <li><code class="language-plaintext highlighter-rouge">PROJECT_VERSION</code></li>
  <li><code class="language-plaintext highlighter-rouge">CMAKE_INSTALL_PREFIX</code></li>
</ul>

<p>And these ones need to be specified explicitly:</p>

<ul>
  <li><code class="language-plaintext highlighter-rouge">PKG_CONFIG_REQUIRES</code></li>
  <li><code class="language-plaintext highlighter-rouge">PKG_CONFIG_INCLUDEDIR</code></li>
  <li><code class="language-plaintext highlighter-rouge">PKG_CONFIG_LIBDIR</code></li>
  <li><code class="language-plaintext highlighter-rouge">PKG_CONFIG_LIBS</code></li>
  <li><code class="language-plaintext highlighter-rouge">PKG_CONFIG_CFLAGS</code></li>
</ul>

<p>Example:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SET(PKG_CONFIG_REQUIRES glib-2.0)
SET(PKG_CONFIG_LIBDIR
    "\${prefix}/lib"
)
SET(PKG_CONFIG_INCLUDEDIR
    "\${prefix}/include/mylib"
)
SET(PKG_CONFIG_LIBS
    "-L\${libdir} -lmylib"
)
SET(PKG_CONFIG_CFLAGS
    "-I\${includedir}"
)

CONFIGURE_FILE(
  "${CMAKE_CURRENT_SOURCE_DIR}/pkg-config.pc.cmake"
  "${CMAKE_CURRENT_BINARY_DIR}/${PROJECT_NAME}.pc"
)
</code></pre></div></div>
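<p>For illustration, with the values above and assuming a hypothetical project named <code class="language-plaintext highlighter-rouge">mylib</code> at version <code class="language-plaintext highlighter-rouge">1.0</code>, with a <code class="language-plaintext highlighter-rouge">PROJECT_DESCRIPTION</code> of “My utility library”, installed under the <code class="language-plaintext highlighter-rouge">/usr</code> prefix, the generated <code class="language-plaintext highlighter-rouge">mylib.pc</code> would read:</p>

```
Name: mylib
Description: My utility library
Version: 1.0
Requires: glib-2.0
prefix=/usr
includedir=${prefix}/include/mylib
libdir=${prefix}/lib
Libs: -L${libdir} -lmylib
Cflags: -I${includedir}
```

<p>Note how the escaped <code class="language-plaintext highlighter-rouge">\${prefix}</code> occurrences survive as literal pkg-config variables, while the unescaped CMake variables have been substituted.</p>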

<h2 id="installing-files-on-target">Installing files on target</h2>

<p>Installing files on target is as simple as adding the corresponding <a href="http://www.cmake.org/cmake/help/v3.0/command/install.html"><code class="language-plaintext highlighter-rouge">INSTALL</code></a> command to the target <code class="language-plaintext highlighter-rouge">CMakeLists.txt</code>.</p>

<p>To install the main targets of a project, use the <code class="language-plaintext highlighter-rouge">TARGETS</code> directive:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INSTALL(TARGETS myapp
        DESTINATION bin)
</code></pre></div></div>

<p>or</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INSTALL(TARGETS mylib ARCHIVE
        DESTINATION lib)
</code></pre></div></div>

<blockquote>
  <p>Note: The files will be installed relative to the path specified in the <code class="language-plaintext highlighter-rouge">CMAKE_INSTALL_PREFIX</code> CMake variable, prepended by the <code class="language-plaintext highlighter-rouge">DESTDIR</code> variable passed on the command line (i.e. <code class="language-plaintext highlighter-rouge">make install DESTDIR=/home/toto</code>).</p>
</blockquote>
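<p>The final location is thus a plain concatenation of <code class="language-plaintext highlighter-rouge">DESTDIR</code> and the install prefix, as this shell sketch illustrates (with the same illustrative <code class="language-plaintext highlighter-rouge">/home/toto</code> staging directory as in the note above):</p>

```shell
# The install location is DESTDIR prepended to CMAKE_INSTALL_PREFIX
DESTDIR=/home/toto
CMAKE_INSTALL_PREFIX=/usr/local   # CMake's default prefix

# A target installed with "DESTINATION bin" ends up here:
echo "${DESTDIR}${CMAKE_INSTALL_PREFIX}/bin"
# → /home/toto/usr/local/bin
```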

<p>Other project files can also be installed using the <code class="language-plaintext highlighter-rouge">FILES</code> directive:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INSTALL(FILES header.h
        DESTINATION include/mylib)
</code></pre></div></div>

<p>or</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>INSTALL(FILES "${CMAKE_BINARY_DIR}/${PROJECT_NAME}.pc"
        DESTINATION lib/pkgconfig)
</code></pre></div></div>

<h2 id="building-the-project">Building the project</h2>

<p>I personally always recommend building a project out-of-tree, i.e. putting all build subproducts into a separate directory. Incidentally, building out-of-tree is also a good way to find out whether your project is properly configured …</p>

<p>So, the first step is to create a build directory:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mkdir build &amp;&amp; cd build
</code></pre></div></div>

<p>Then you need to tell CMake to generate the project makefiles, following any specific directives you may pass on the command line (typically by setting variables).
Most of the time, you can let CMake apply default values:</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake ..
</code></pre></div></div>

<p>But you may need, for instance, to specify a custom installation prefix (by default CMake will use <code class="language-plaintext highlighter-rouge">/usr/local</code>):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cmake -DCMAKE_INSTALL_PREFIX:PATH=usr ..
</code></pre></div></div>

<p>Once the makefiles have been generated, you can simply build the project using make commands.</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make
</code></pre></div></div>

<p>Finally, you can install the targets, either using defaults …</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>make install
</code></pre></div></div>

<p>… or specifying the destination directory (CMake uses <code class="language-plaintext highlighter-rouge">/</code> as the default destination directory):</p>

<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DESTDIR=/custom-destdir make install
</code></pre></div></div>

          ]]>
      </description>
    </item>
    

  </channel> 
</rss>
