A hash function primer II: CRC

Today I will continue the series on hash functions with the cyclic redundancy check (CRC). It is typically used to detect errors in data transmission, but NOT for direct cryptographic applications. For definiteness we’ll consider a popular variant named CRC-32. That means the output/checksum/hash will be a 32 bit number.

To make it short, CRC-32 value of a message or a stream of bits is easily calculated in four steps:

  1. Pad the message by 32 zero-valued bits.
  2. Interpret the padded message as a polynomial with coefficients in \mathbb{Z}_2.
  3. Divide this polynomial by the generator polynomial x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1, which is the corresponding polynomial to the bit sequence 100000100110000010001110110110111.
  4. Take the polynomial remainder of this division. Its bit representation is the CRC-32 of the message.

Well, steps 2 and 4 are kind of metaphysical. Now let’s give an example, say the message “goob“. In binary

"goob" = 01100111011011110110111101100010

Now the steps:

  1. Padding:
    "goob" + 32 zeroes = 0110011101101111011011110110001000000000000000000000000000000000
    
  2. Well, we “interpret” the above as the polynomial
    x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}
    Note that x is only a dummy variable. Really a polynomial is only the family of coefficients, we only introduce x for convenience.
  3. Now we need to do the division
    \frac{x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}}{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1} = ?
    Remember from school how that works? To find the first “digit” we need to find something that, when multiplied by x^{32}, gives x^{62}. The answer is of course x^{30}. So lets write that down and see what remains
    \frac{x^{62}+x^{61}+x^{58}+x^{57}+x^{56}+x^{54}+x^{53}+x^{51}+x^{50}+x^{49}+x^{48}+x^{46}+x^{45}+x^{43}+x^{42}+x^{41}+x^{40}+x^{38}+x^{37}+x^{33}}{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1} = x^{30}+\text{the rest}
    where
  4. where
    \text{the rest} = \frac{x^{61}+x^{58}+x^{57}+x^{54}-x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}-x^{35}-x^{34}+x^{33}-x^{32}-x^{31}-x^{30}}{x^{32} + \dots} = \frac{x^{61}+x^{58}+x^{57}+x^{54}+x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}+x^{35}+x^{34}+x^{33}+x^{32}+x^{31}+x^{30}}{x^{32} + \dots}
    The last equality follows from the fact that in \mathbb{Z}_2 plus one is the same as minus one, +1 = -1 \mod 2.
    To continue, we need to repeat the same on “\text{the rest}“:
    \frac{x^{61}+x^{58}+x^{57}+x^{54}+x^{52}+x^{51}+x^{50}+x^{49}+x^{48}+x^{45}+x^{43}+x^{35}+x^{34}+x^{33}+x^{32}+x^{31}+x^{30} }{x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1}= x^{29} + \text{the other rest}
    where \text{the other rest} = \frac{x^{58} + \dots}{x^{32} + \dots}.
    Now the computation has to be repeated until the numerator of the last rest is a polynomial of degree 31 or less. Then we have to stop, there is nothing more to divide. We call the numerator the remainder of the division, which represents our CRC-32 value. Doing this by hand in the way I described it is rather tedious. Note that we actually do not care about the “non-remainder” (or “whole”) part in the end. The remainder is nothing else than the original expression minus the (whole part times generator polynomial). Note that “minus” is the same as XOR in \mathbb{Z}_2. Therefore, for every true bit of the padded message we can subsequently XOR it with the generator polynomial from left to right.
    Example: our first division is in this formalism

     0110011101101111011011110110001000000000000000000000000000000000
     ^100000100110000010001110110110111000000000000000000000000000000
    =  10011001011111001010000000111111000000000000000000000000000000

    The number of padded zeroes to the generator polynomial is the highest power of the whole part, here 30 \leftrightarrow x^{30}+\text{the rest}. One repeats this step, always “eliminating” the leftmost of the previous result. For leading zero-valued bits nothing has to be done, the corresponding coefficient in the whole part is zero. The entire calculation in its full length is

    0110011101101111011011110110001000000000000000000000000000000000
     100000100110000010001110110110111000000000000000000000000000000
      10011001011111001010000000111111000000000000000000000000000000
      10000010011000001000111011011011100000000000000000000000000000
         11011000111000010111011100100100000000000000000000000000000
         10000010011000001000111011011011100000000000000000000000000
          1011010100000011111100111111111100000000000000000000000000
          1000001001100000100011101101101110000000000000000000000000
            11011101100011011111010010010010000000000000000000000000
            10000010011000001000111011011011100000000000000000000000
             1011111111011010111101001001001100000000000000000000000
             1000001001100000100011101101101110000000000000000000000
               11110110111010011110100100100010000000000000000000000
               10000010011000001000111011011011100000000000000000000
                1110100100010010110011111111001100000000000000000000
                1000001001100000100011101101101110000000000000000000
                 110101101110010010000010010100010000000000000000000
                 100000100110000010001110110110111000000000000000000
                  10101001000010000001100100010101000000000000000000
                  10000010011000001000111011011011100000000000000000
                    101011011010001001011111001110100000000000000000
                    100000100110000010001110110110111000000000000000
                      1011111100001011010001111000011000000000000000
                      1000001001100000100011101101101110000000000000
                        11110101101011110010010101110110000000000000
                        10000010011000001000111011011011100000000000
                         1110111110011111010101110101101100000000000
                         1000001001100000100011101101101110000000000
                          110110111111111110110011000000010000000000
                          100000100110000010001110110110111000000000
                           10110011001111100111101110110101000000000
                           10000010011000001000111011011011100000000
                             110001010111101111010101101110100000000
                             100000100110000010001110110110111000000
                              10001110001101101011011011000011000000
                              10000010011000001000111011011011100000
                                  1100010101100011100000011000100000
                                  1000001001100000100011101101101110
                                   100011100000011000011110101001110
                                   100000100110000010001110110110111
                                       11000110011010010000011111001
    
  5. The last line 11000110011010010000011111001 = 0x18cd20f9 = 416096505 is the final code (or sometimes called checksum).

Intermediate remark: There are many conventions for CRC-32 “out in the wild”. Sometimes the first 32 bits in the message are inverted, the result is inverted, the bit sequence of the message/result is reversed, etc. Make sure to check the conventions if you want to compare outputs by different programs.

If one appends the CRC-32 value to the original message (in 32 bit form) and then calculates the CRC-32 of it (now without padding) one obtains zero, here

0110011101101111011011110110001000011000110011010010000011111001
 100000100110000010001110110110111000000000000000000000000000000
                                                               .
                                                               .
                                                               .
                                                               0

The reason is that we have added the remainder polynomial. Since adding is the same as subtracting in \mathbb{Z}_2 that means we have subtracted the remainder. The result of this subtraction is, by construction, remainder free under the division with the generator and thus the algorithm will output zero. This approach is rather customary.

I will close with several properties of the CRC (assume the above, i.e. appending the CRC-32 value to the message):

  1. If one (and only one) bit in the message is transmitted incorrectly, CRC will detect this error. Well, it will detect that there was an error, not what kind of error. Let m(x) be the message polynomial and M(x) be m(x) padded by the CRC remainder. Then the generator polynomial g(x) divides M(x) as explained. If there is an one-bit error in the transmission then one effectively sends M(x) + x^n with some index n. But then the division by g(x) has a nonzero remainder, since x^n/g(x) does so. Consequently, the error is detected.
  2. If exactly two bits are transmitted incorrectly (of course bits in two distinct positions, otherwise there is no error), then one effectively transmits M(x) + x^i + x^j = M(x) + x^i(1 + x^{j-i}), where we assume i < j without loss of generality. The error is detected iff g(x) does not divide 1+x^{j-i}. For a given g(x), sooner or later that will be violated for some i-j. The large value of i-j that guarantees that g(x) does not divide 1+x^{k} \,\,\,\forall \,1\leq k\leq j-i can be turned into a condition for a good choice for g(x). In fact g(x)= x^{32} + x^{26} +\dots is a constructed in such a way, maybe I discuss details in a later post.
  3. If we had chosen our g(x) such that its number of terms is even, that it would contain the factor 1+x. Such a choice would detect all odd-numbered bit errors, because an error term of the type 1+x^2+x^5 is not divisible by 1+x. However our chosen g(x) does not fulfil that requirement.
  4. Burst errors are detected, provided that the length k of the burst is smaller than or equal to the degree of g(x). A burst error is an error of the form e(x)=x^n(1+\dots +x^{k-1}), i.e. an error that “begins” at position n+k-1 and “ends” at n with arbitrary errors in between. Roughly speaking there is an error at $n$ and n+k-1 and anything can happen in between. Now g(x) does not divide x^n, since g(0)=1, therefore the burst error can go undetected iff g(x) divides 1+\dots +x^{k-1}. That can not happen since k-1 < \mathrm{deg}(g).
  5. Prepending the message with zeroes does not change the CRC-value. A way around it would be to always invert the first 32 bits of the message.
  6. Removing leading zeroes of the message does not change the CRC-value. This can also be fixed by inversion of the first bits, see 5.
  7. Adding zeroes after M(x) cannot be detected. Or, if by chance m(x) is divisible by g(x), then adding zeroes after m(x) cannot be detected. A way out is to use the prescription of reversing the CRC values of the message.
Advertisements

About goobypl5

pizza baker, autodidact, particle physicist
This entry was posted in Hash functions and tagged , , , , . Bookmark the permalink.

Share your thoughts

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s