Huffman code

Huffman code

    The Huffman code belongs between basic loss-free codes where no losses of data occur [4]. Because it is based on entropy, it is often arranged among the category of entropy codes. It uses a similar principle   as the Morse alphabet, where the most frequently used sounds of the English language  is assigned the shortest code, and less frequently ones have a longer code. The Huffman code aims for an optimum distribution of the binary symbol sequence in accordance with the frequency of the occurrence of the coded elements, concurrently with an effort to approximate to their entropy.


    Entropy H is defined for random variables f1, f2, ?, fk and probability p (fi) = pi as follows: 


    Let's have the values q1, q2, ?, q8 with the following probabilities: p3=0,08    p4=0,2    p5=0,12    p6=0,08    p7=0,04    p8=0.

    For binary coding of eight values we would need 3 bits for every value.

Entropy is H = -[0,4.log 2 (0,4) + 3 x 0.08. log 2 (0.08) + 
                           + 0.2 x log 2 (0.2) + 0.12 x log 2 (0.12) +
                           + 0.04 x log 2 (0.04)] =
                        = 2.42 bit
    The Huffman coding  aims

The procedure of coding is the following [4]:

  1. To find two lowest frequencies, and to total them
  2. In the result of proceeding step to find two lowest frequencies and to sum them
  3. To proceed in previous step until the relative frequency 1
  5. To code words either comprising of 0 and 1 in accordance with the tree arising through summing so that we continue in a reverse way from the frequency one where 1 is left and 0 is right, which are the higher bits, and using the same procedure, we count lower the bits.

You can see the procedure on the following applet.


Applet Huffman coding