A hash function primer II: CRC

Today I will continue the series on hash functions with the cyclic redundancy check (CRC). It is typically used to detect errors in data transmission, but NOT for direct cryptographic applications. For definiteness we’ll consider a popular variant named CRC-32. That means the output/checksum/hash will be a 32 bit number.

To make it short, CRC-32 value of a message or a stream of bits is easily calculated in four steps:

  1. Pad the message by 32 zero-valued bits.
  2. Interpret the padded message as a polynomial with coefficients in \mathbb{Z}_2.
  3. Divide this polynomial by the generator polynomial x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1, which is the corresponding polynomial to the bit sequence 100000100110000010001110110110111.
  4. Take the polynomial remainder of this division. Its bit representation is the CRC-32 of the message.

Well, steps 2 and 4 are kind of metaphysical. Now let’s give an example, say the message “goob“. In binary

"goob" = 01100111011011110110111101100010

Now the steps:

  1. Padding:
    "goob" + 32 zeroes = 0110011101101111011011110110001000000000000000000000000000000000
  2. Well, we “interpret” the above as the polynomial
    Note that x is only a dummy variable. Really a polynomial is only the family of coefficients, we only introduce x for convenience.
  3. Now we need to do the division
    \frac{x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}}{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1} = ?
    Remember from school how that works? To find the first “digit” we need to find something that, when multiplied by x^{32}, gives x^{62}. The answer is of course x^{30}. So lets write that down and see what remains
    \frac{x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}}{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1} = x^{30}+\text{the rest}
  4. where
    \text{the rest} = \frac{x^{61}+x^{58}+x^{57}+x^{54}-x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}-x^{35}-x^{34}+x^{33}-x^{32}-x^{31}-x^{30}}{x^{32} + \dots} = \frac{x^{61}+x^{58}+x^{57}+x^{54}+x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}+x^{35}+x^{34}+x^{33}+x^{32}+x^{31}+x^{30}}{x^{32} + \dots}
    The last equality follows from the fact that in \mathbb{Z}_2 plus one is the same as minus one, +1 = -1 \mod 2.
    To continue, we need to repeat the same on “\text{the rest}“:
    \frac{x^{61}+x^{58}+x^{57}+x^{54}+x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}+x^{35}+x^{34}+x^{33}+x^{32}+x^{31}+x^{30} }{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1}= x^{29} + \text{the other rest}
    where \text{the other rest} = \frac{x^{58} + \dots}{x^{32} + \dots}.
    Now the computation has to be repeated until the numerator of the last rest is a polynomial of degree 31 or less. Then we have to stop, there is nothing more to divide. We call the numerator the remainder of the division, which represents our CRC-32 value. Doing this by hand in the way I described it is rather tedious. Note that we actually do not care about the “non-remainder” (or “whole”) part in the end. The remainder is nothing else than the original expression minus the (whole part times generator polynomial). Note that “minus” is the same as XOR in \mathbb{Z}_2. Therefore, for every true bit of the padded message we can subsequently XOR it with the generator polynomial from left to right.
    Example: our first division is in this formalism

    =  10011001011111001010000000111111000000000000000000000000000000

    The number of padded zeroes to the generator polynomial is the highest power of the whole part, here 30 \leftrightarrow x^{30}+\text{the rest}. One repeats this step, always “eliminating” the leftmost of the previous result. For leading zero-valued bits nothing has to be done, the corresponding coefficient in the whole part is zero. The entire calculation in its full length is

  5. The last line 11000110011010010000011111001 = 0x18cd20f9 = 416096505 is the final code (or sometimes called checksum).

Intermediate remark: There are many conventions for CRC-32 “out in the wild”. Sometimes the first 32 bits in the message are inverted, the result is inverted, the bit sequence of the message/result is reversed, etc. Make sure to check the conventions if you want to compare outputs by different programs.

If one appends the CRC-32 value to the original message (in 32 bit form) and then calculates the CRC-32 of it (now without padding) one obtains zero, here


The reason is that we have added the remainder polynomial. Since adding is the same as subtracting in \mathbb{Z}_2 that means we have subtracted the remainder. The result of this subtraction is, by construction, remainder free under the division with the generator and thus the algorithm will output zero. This approach is rather customary.

I will close with several properties of the CRC (assume the above, i.e. appending the CRC-32 value to the message):

  1. If one (and only one) bit in the message is transmitted incorrectly, CRC will detect this error. Well, it will detect that there was an error, not what kind of error. Let m(x) be the message polynomial and M(x) be m(x) padded by the CRC remainder. Then the generator polynomial g(x) divides M(x) as explained. If there is an one-bit error in the transmission then one effectively sends M(x) + x^n with some index n. But then the division by g(x) has a nonzero remainder, since x^n/g(x) does so. Consequently, the error is detected.
  2. If exactly two bits are transmitted incorrectly (of course bits in two distinct positions, otherwise there is no error), then one effectively transmits M(x) + x^i + x^j = M(x) + x^i(1 + x^{j-i}), where we assume i < j without loss of generality. The error is detected iff g(x) does not divide 1+x^{j-i}. For a given g(x), sooner or later that will be violated for some i-j. The large value of i-j that guarantees that g(x) does not divide 1+x^{k} \,\,\,\forall \,1\leq k\leq j-i can be turned into a condition for a good choice for g(x). In fact g(x)= x^{32} + x^{26} +\dots is a constructed in such a way, maybe I discuss details in a later post.
  3. If we had chosen our g(x) such that its number of terms is even, that it would contain the factor 1+x. Such a choice would detect all odd-numbered bit errors, because an error term of the type 1+x^2+x^5 is not divisible by 1+x. However our chosen g(x) does not fulfil that requirement.
  4. Burst errors are detected, provided that the length k of the burst is smaller than or equal to the degree of g(x). A burst error is an error of the form e(x)=x^n(1+\dots +x^{k-1}), i.e. an error that “begins” at position n+k-1 and “ends” at n with arbitrary errors in between. Roughly speaking there is an error at $n$ and n+k-1 and anything can happen in between. Now g(x) does not divide x^n, since g(0)=1, therefore the burst error can go undetected iff g(x) divides 1+\dots +x^{k-1}. That can not happen since k-1 < \mathrm{deg}(g).
  5. Prepending the message with zeroes does not change the CRC-value. A way around it would be to always invert the first 32 bits of the message.
  6. Removing leading zeroes of the message does not change the CRC-value. This can also be fixed by inversion of the first bits, see 5.
  7. Adding zeroes after M(x) cannot be detected. Or, if by chance m(x) is divisible by g(x), then adding zeroes after m(x) cannot be detected. A way out is to use the prescription of reversing the CRC values of the message.

About goobypl5

pizza baker, autodidact, particle physicist
This entry was posted in Hash functions and tagged , , , , . Bookmark the permalink.

Share your thoughts

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s