## A hash function primer II: CRC

Today I will continue the series on hash functions with the cyclic redundancy check (CRC). It is typically used to detect errors in data transmission, but NOT for direct cryptographic applications. For definiteness we’ll consider a popular variant named CRC-32. That means the output/checksum/hash will be a 32 bit number.

To make it short, CRC-32 value of a message or a stream of bits is easily calculated in four steps:

1. Pad the message by 32 zero-valued bits.
2. Interpret the padded message as a polynomial with coefficients in $\mathbb{Z}_2$.
3. Divide this polynomial by the generator polynomial $x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$, which is the corresponding polynomial to the bit sequence 100000100110000010001110110110111.
4. Take the polynomial remainder of this division. Its bit representation is the CRC-32 of the message.

Well, steps 2 and 4 are kind of metaphysical. Now let’s give an example, say the message “goob“. In binary

"goob" = 01100111011011110110111101100010


Now the steps:

"goob" + 32 zeroes = 0110011101101111011011110110001000000000000000000000000000000000

2. Well, we “interpret” the above as the polynomial
$x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}$
Note that $x$ is only a dummy variable. Really a polynomial is only the family of coefficients, we only introduce $x$ for convenience.
3. Now we need to do the division
$\frac{x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}}{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1} = ?$
Remember from school how that works? To find the first “digit” we need to find something that, when multiplied by $x^{32}$, gives $x^{62}$. The answer is of course $x^{30}$. So lets write that down and see what remains
$\frac{x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}}{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1} = x^{30}+\text{the rest}$
where
4. where
$\text{the rest} = \frac{x^{61}+x^{58}+x^{57}+x^{54}-x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}-x^{35}-x^{34}+x^{33}-x^{32}-x^{31}-x^{30}}{x^{32} + \dots} = \frac{x^{61}+x^{58}+x^{57}+x^{54}+x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}+x^{35}+x^{34}+x^{33}+x^{32}+x^{31}+x^{30}}{x^{32} + \dots}$
The last equality follows from the fact that in $\mathbb{Z}_2$ plus one is the same as minus one, $+1 = -1 \mod 2$.
To continue, we need to repeat the same on “$\text{the rest}$“:
$\frac{x^{61}+x^{58}+x^{57}+x^{54}+x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}+x^{35}+x^{34}+x^{33}+x^{32}+x^{31}+x^{30} }{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1}= x^{29} + \text{the other rest}$
where $\text{the other rest} = \frac{x^{58} + \dots}{x^{32} + \dots}$.
Now the computation has to be repeated until the numerator of the last rest is a polynomial of degree 31 or less. Then we have to stop, there is nothing more to divide. We call the numerator the remainder of the division, which represents our CRC-32 value. Doing this by hand in the way I described it is rather tedious. Note that we actually do not care about the “non-remainder” (or “whole”) part in the end. The remainder is nothing else than the original expression minus the (whole part times generator polynomial). Note that “minus” is the same as XOR in $\mathbb{Z}_2$. Therefore, for every true bit of the padded message we can subsequently XOR it with the generator polynomial from left to right.
Example: our first division is in this formalism

 0110011101101111011011110110001000000000000000000000000000000000
^100000100110000010001110110110111000000000000000000000000000000
=  10011001011111001010000000111111000000000000000000000000000000

The number of padded zeroes to the generator polynomial is the highest power of the whole part, here $30 \leftrightarrow x^{30}+\text{the rest}$. One repeats this step, always “eliminating” the leftmost of the previous result. For leading zero-valued bits nothing has to be done, the corresponding coefficient in the whole part is zero. The entire calculation in its full length is

0110011101101111011011110110001000000000000000000000000000000000
100000100110000010001110110110111000000000000000000000000000000
10011001011111001010000000111111000000000000000000000000000000
10000010011000001000111011011011100000000000000000000000000000
11011000111000010111011100100100000000000000000000000000000
10000010011000001000111011011011100000000000000000000000000
1011010100000011111100111111111100000000000000000000000000
1000001001100000100011101101101110000000000000000000000000
11011101100011011111010010010010000000000000000000000000
10000010011000001000111011011011100000000000000000000000
1011111111011010111101001001001100000000000000000000000
1000001001100000100011101101101110000000000000000000000
11110110111010011110100100100010000000000000000000000
10000010011000001000111011011011100000000000000000000
1110100100010010110011111111001100000000000000000000
1000001001100000100011101101101110000000000000000000
110101101110010010000010010100010000000000000000000
100000100110000010001110110110111000000000000000000
10101001000010000001100100010101000000000000000000
10000010011000001000111011011011100000000000000000
101011011010001001011111001110100000000000000000
100000100110000010001110110110111000000000000000
1011111100001011010001111000011000000000000000
1000001001100000100011101101101110000000000000
11110101101011110010010101110110000000000000
10000010011000001000111011011011100000000000
1110111110011111010101110101101100000000000
1000001001100000100011101101101110000000000
110110111111111110110011000000010000000000
100000100110000010001110110110111000000000
10110011001111100111101110110101000000000
10000010011000001000111011011011100000000
110001010111101111010101101110100000000
100000100110000010001110110110111000000
10001110001101101011011011000011000000
10000010011000001000111011011011100000
1100010101100011100000011000100000
1000001001100000100011101101101110
100011100000011000011110101001110
100000100110000010001110110110111
11000110011010010000011111001

5. The last line 11000110011010010000011111001 = 0x18cd20f9 = 416096505 is the final code (or sometimes called checksum).

Intermediate remark: There are many conventions for CRC-32 “out in the wild”. Sometimes the first 32 bits in the message are inverted, the result is inverted, the bit sequence of the message/result is reversed, etc. Make sure to check the conventions if you want to compare outputs by different programs.

If one appends the CRC-32 value to the original message (in 32 bit form) and then calculates the CRC-32 of it (now without padding) one obtains zero, here

0110011101101111011011110110001000011000110011010010000011111001
100000100110000010001110110110111000000000000000000000000000000
.
.
.
0

The reason is that we have added the remainder polynomial. Since adding is the same as subtracting in $\mathbb{Z}_2$ that means we have subtracted the remainder. The result of this subtraction is, by construction, remainder free under the division with the generator and thus the algorithm will output zero. This approach is rather customary.

I will close with several properties of the CRC (assume the above, i.e. appending the CRC-32 value to the message):

1. If one (and only one) bit in the message is transmitted incorrectly, CRC will detect this error. Well, it will detect that there was an error, not what kind of error. Let $m(x)$ be the message polynomial and $M(x)$ be $m(x)$ padded by the CRC remainder. Then the generator polynomial $g(x)$ divides $M(x)$ as explained. If there is an one-bit error in the transmission then one effectively sends $M(x) + x^n$ with some index $n$. But then the division by $g(x)$ has a nonzero remainder, since $x^n/g(x)$ does so. Consequently, the error is detected.
2. If exactly two bits are transmitted incorrectly (of course bits in two distinct positions, otherwise there is no error), then one effectively transmits $M(x) + x^i + x^j = M(x) + x^i(1 + x^{j-i})$, where we assume $i < j$ without loss of generality. The error is detected iff $g(x)$ does not divide $1+x^{j-i}$. For a given $g(x)$, sooner or later that will be violated for some $i-j$. The large value of $i-j$ that guarantees that $g(x)$ does not divide $1+x^{k} \,\,\,\forall \,1\leq k\leq j-i$ can be turned into a condition for a good choice for $g(x)$. In fact $g(x)= x^{32} + x^{26} +\dots$ is a constructed in such a way, maybe I discuss details in a later post.
3. If we had chosen our $g(x)$ such that its number of terms is even, that it would contain the factor $1+x$. Such a choice would detect all odd-numbered bit errors, because an error term of the type $1+x^2+x^5$ is not divisible by $1+x$. However our chosen $g(x)$ does not fulfil that requirement.
4. Burst errors are detected, provided that the length $k$ of the burst is smaller than or equal to the degree of $g(x)$. A burst error is an error of the form $e(x)=x^n(1+\dots +x^{k-1})$, i.e. an error that “begins” at position $n+k-1$ and “ends” at $n$ with arbitrary errors in between. Roughly speaking there is an error at $n$ and $n+k-1$ and anything can happen in between. Now $g(x)$ does not divide $x^n$, since $g(0)=1$, therefore the burst error can go undetected iff $g(x)$ divides $1+\dots +x^{k-1}$. That can not happen since $k-1 < \mathrm{deg}(g)$.
5. Prepending the message with zeroes does not change the CRC-value. A way around it would be to always invert the first 32 bits of the message.
6. Removing leading zeroes of the message does not change the CRC-value. This can also be fixed by inversion of the first bits, see 5.
7. Adding zeroes after $M(x)$ cannot be detected. Or, if by chance $m(x)$ is divisible by $g(x)$, then adding zeroes after $m(x)$ cannot be detected. A way out is to use the prescription of reversing the CRC values of the message.