The Huffman code belongs among the basic lossless codes, in which no data are lost [4]. Because it is based on entropy, it is usually classified as an entropy code. It uses a principle similar to the Morse alphabet, in which the most frequently used letters of the English language are assigned the shortest codes, while less frequent letters have longer ones. The Huffman code aims at an optimal assignment of binary code words according to the frequency of occurrence of the coded elements, while trying to approach their entropy.
Definition:
Entropy $H$ is defined for random variables $f_1, f_2, \dots, f_k$ with probabilities $p(f_i) = p_i$ as follows:

$$H = -\sum_{i=1}^{k} p_i \log_2 p_i$$
Example:
Let us have the values $q_1, q_2, \dots, q_8$ with the following probabilities: $p_1 = 0.4$, $p_2 = 0.08$, $p_3 = 0.08$, $p_4 = 0.2$, $p_5 = 0.12$, $p_6 = 0.08$, $p_7 = 0.04$, $p_8 = 0$.
For a fixed-length binary coding of eight values we would need 3 bits for every value.
The entropy is

$$H = -\left[\,0.4 \cdot \log_2 0.4 + 3 \cdot 0.08 \cdot \log_2 0.08 + 0.2 \cdot \log_2 0.2 + 0.12 \cdot \log_2 0.12 + 0.04 \cdot \log_2 0.04\,\right] \approx 2.42\ \text{bits}$$

(the term for $p_8 = 0$ contributes nothing, since $0 \cdot \log_2 0$ is taken as $0$).
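This value can be checked numerically. The following short Python sketch (the variable names are our own, not part of the original example) computes the entropy of the distribution above:

    import math

    # Probabilities p1 ... p8 of the values q1 ... q8 from the example above.
    probabilities = [0.4, 0.08, 0.08, 0.2, 0.12, 0.08, 0.04, 0.0]

    # H = -sum(p * log2(p)); terms with p = 0 contribute nothing.
    H = -sum(p * math.log2(p) for p in probabilities if p > 0)

    print(round(H, 2))  # prints 2.42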
Huffman coding aims to approach this entropy: the average code word length of a Huffman code always lies between $H$ and $H + 1$ bits per symbol.
The coding procedure is the following [4]:

1. Arrange the symbols in descending order of probability.
2. Join the two symbols with the lowest probabilities into a new node whose probability is the sum of theirs.
3. Repeat step 2 with the reduced set of symbols until a single node with probability 1 remains.
4. Assign the binary digit 0 to one branch and 1 to the other branch of every join; the code word of a symbol is the sequence of digits read from the root down to that symbol.
You can see the procedure in the following applet.
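As an illustration only (this sketch is not the applet itself), a minimal Python version of the procedure above could look as follows; the symbol names and probabilities are taken from the example, and the zero-probability symbol $q_8$ is left out, since a symbol that never occurs needs no code word:

    import heapq

    # Example distribution from above; q8 with p8 = 0 is omitted.
    probs = {'q1': 0.4, 'q2': 0.08, 'q3': 0.08, 'q4': 0.2,
             'q5': 0.12, 'q6': 0.08, 'q7': 0.04}

    # Each heap entry: (probability, tie-breaker, {symbol: code so far}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    tie = len(heap)

    while len(heap) > 1:
        # Steps 2-3: join the two least probable nodes.
        p0, _, left = heapq.heappop(heap)
        p1, _, right = heapq.heappop(heap)
        # Step 4: prepend 0 to one branch and 1 to the other.
        joined = {s: '0' + c for s, c in left.items()}
        joined.update({s: '1' + c for s, c in right.items()})
        heapq.heappush(heap, (p0 + p1, tie, joined))
        tie += 1

    codes = heap[0][2]
    avg = sum(probs[s] * len(c) for s, c in codes.items())
    print(codes)
    print(round(avg, 2))  # 2.48

For this distribution the average code word length comes out at 2.48 bits, which indeed lies between the entropy of 2.42 bits and the 3 bits of the fixed-length code.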