Today I will continue the series on hash functions with the cyclic redundancy check (CRC). It is typically used to detect errors in data transmission, but NOT for direct cryptographic applications. For definiteness we’ll consider a popular variant named CRC-32. That means the output/checksum/hash will be a 32 bit number.
To make it short, CRC-32 value of a message or a stream of bits is easily calculated in four steps:
- Pad the message by 32 zero-valued bits.
- Interpret the padded message as a polynomial with coefficients in .
- Divide this polynomial by the generator polynomial , which is the corresponding polynomial to the bit sequence
- Take the polynomial remainder of this division. Its bit representation is the CRC-32 of the message.
Well, steps 2 and 4 are kind of metaphysical. Now let’s give an example, say the message “
goob“. In binary
"goob" = 01100111011011110110111101100010
Now the steps:
"goob" + 32 zeroes = 0110011101101111011011110110001000000000000000000000000000000000
- Well, we “interpret” the above as the polynomial
Note that is only a dummy variable. Really a polynomial is only the family of coefficients, we only introduce for convenience.
- Now we need to do the division
Remember from school how that works? To find the first “digit” we need to find something that, when multiplied by , gives . The answer is of course . So lets write that down and see what remains
The last equality follows from the fact that in plus one is the same as minus one, .
To continue, we need to repeat the same on ““:
Now the computation has to be repeated until the numerator of the last rest is a polynomial of degree 31 or less. Then we have to stop, there is nothing more to divide. We call the numerator the remainder of the division, which represents our CRC-32 value. Doing this by hand in the way I described it is rather tedious. Note that we actually do not care about the “non-remainder” (or “whole”) part in the end. The remainder is nothing else than the original expression minus the (whole part times generator polynomial). Note that “minus” is the same as XOR in . Therefore, for every true bit of the padded message we can subsequently XOR it with the generator polynomial from left to right.
Example: our first division is in this formalism
0110011101101111011011110110001000000000000000000000000000000000 ^100000100110000010001110110110111000000000000000000000000000000 = 10011001011111001010000000111111000000000000000000000000000000
The number of padded zeroes to the generator polynomial is the highest power of the whole part, here . One repeats this step, always “eliminating” the leftmost of the previous result. For leading zero-valued bits nothing has to be done, the corresponding coefficient in the whole part is zero. The entire calculation in its full length is
0110011101101111011011110110001000000000000000000000000000000000 100000100110000010001110110110111000000000000000000000000000000 10011001011111001010000000111111000000000000000000000000000000 10000010011000001000111011011011100000000000000000000000000000 11011000111000010111011100100100000000000000000000000000000 10000010011000001000111011011011100000000000000000000000000 1011010100000011111100111111111100000000000000000000000000 1000001001100000100011101101101110000000000000000000000000 11011101100011011111010010010010000000000000000000000000 10000010011000001000111011011011100000000000000000000000 1011111111011010111101001001001100000000000000000000000 1000001001100000100011101101101110000000000000000000000 11110110111010011110100100100010000000000000000000000 10000010011000001000111011011011100000000000000000000 1110100100010010110011111111001100000000000000000000 1000001001100000100011101101101110000000000000000000 110101101110010010000010010100010000000000000000000 100000100110000010001110110110111000000000000000000 10101001000010000001100100010101000000000000000000 10000010011000001000111011011011100000000000000000 101011011010001001011111001110100000000000000000 100000100110000010001110110110111000000000000000 1011111100001011010001111000011000000000000000 1000001001100000100011101101101110000000000000 11110101101011110010010101110110000000000000 10000010011000001000111011011011100000000000 1110111110011111010101110101101100000000000 1000001001100000100011101101101110000000000 110110111111111110110011000000010000000000 100000100110000010001110110110111000000000 10110011001111100111101110110101000000000 10000010011000001000111011011011100000000 110001010111101111010101101110100000000 100000100110000010001110110110111000000 10001110001101101011011011000011000000 10000010011000001000111011011011100000 1100010101100011100000011000100000 1000001001100000100011101101101110 100011100000011000011110101001110 100000100110000010001110110110111 11000110011010010000011111001
- The last line
11000110011010010000011111001 = 0x18cd20f9 = 416096505is the final code (or sometimes called checksum).
Intermediate remark: There are many conventions for CRC-32 “out in the wild”. Sometimes the first 32 bits in the message are inverted, the result is inverted, the bit sequence of the message/result is reversed, etc. Make sure to check the conventions if you want to compare outputs by different programs.
If one appends the CRC-32 value to the original message (in 32 bit form) and then calculates the CRC-32 of it (now without padding) one obtains zero, here
0110011101101111011011110110001000011000110011010010000011111001 100000100110000010001110110110111000000000000000000000000000000 . . . 0
The reason is that we have added the remainder polynomial. Since adding is the same as subtracting in that means we have subtracted the remainder. The result of this subtraction is, by construction, remainder free under the division with the generator and thus the algorithm will output zero. This approach is rather customary.
I will close with several properties of the CRC (assume the above, i.e. appending the CRC-32 value to the message):
- If one (and only one) bit in the message is transmitted incorrectly, CRC will detect this error. Well, it will detect that there was an error, not what kind of error. Let be the message polynomial and be padded by the CRC remainder. Then the generator polynomial divides as explained. If there is an one-bit error in the transmission then one effectively sends with some index . But then the division by has a nonzero remainder, since does so. Consequently, the error is detected.
- If exactly two bits are transmitted incorrectly (of course bits in two distinct positions, otherwise there is no error), then one effectively transmits , where we assume without loss of generality. The error is detected iff does not divide . For a given , sooner or later that will be violated for some . The large value of that guarantees that does not divide can be turned into a condition for a good choice for . In fact is a constructed in such a way, maybe I discuss details in a later post.
- If we had chosen our such that its number of terms is even, that it would contain the factor . Such a choice would detect all odd-numbered bit errors, because an error term of the type is not divisible by . However our chosen does not fulfil that requirement.
- Burst errors are detected, provided that the length of the burst is smaller than or equal to the degree of . A burst error is an error of the form , i.e. an error that “begins” at position and “ends” at with arbitrary errors in between. Roughly speaking there is an error at $n$ and and anything can happen in between. Now does not divide , since , therefore the burst error can go undetected iff divides . That can not happen since .
- Prepending the message with zeroes does not change the CRC-value. A way around it would be to always invert the first 32 bits of the message.
- Removing leading zeroes of the message does not change the CRC-value. This can also be fixed by inversion of the first bits, see 5.
- Adding zeroes after cannot be detected. Or, if by chance is divisible by , then adding zeroes after cannot be detected. A way out is to use the prescription of reversing the CRC values of the message.