cksum

I had trouble unearthing the CRC used by GNU’s cksum. Its manpage mentions the particular algorithm is a POSIX standard. On seeing cksum print 32-bit integers on test inputs, I assumed the algorithm was CRC-32, the most popular 32-bit CRC.

I eventually realized I should have searched for “posix cksum”. Although it shares the same polynomial, the algorithm differs slightly from CRC-32.

Here’s a simple C implementation (typical implementations use a lookup table for the polynomial reduction):

#include<stdint.h>
#include<stdio.h>
int main() {
  unsigned int p = 0x04C11DB7, r = 0;
  void f(char c) {
    for(int i = 7; i >= 0; i--) {
      int msb = r & (1<<31);
      r <<= 1;
      if (msb) r = r ^ p;
    }
    r ^= c;
  }
  int c, len = 0;
  while(EOF != (c = getchar())) {
    len++;
    f(c);
  }
  int n = len;
  do {
    f(n & 0xff);
    n >>= 8;
  } while(n);
  f(0), f(0), f(0), f(0);
  printf("%u %u\n", ~r, len);
  return 0;
}

and a Go version, with support for filename arguments:

package main
import ("bufio";"flag";"fmt";"io";"os")
func main() {
  cksum := func(file *os.File) {
    in := bufio.NewReader(file)
    var p, r uint
    p = 0x04C11DB7
    r = 0
    f := func(b byte) {
      for i := 7; i >= 0; i-- {
        msb := r & (1<<31)
        r = r<<1
        if msb != 0 { r = r^p }
      }
      r ^= uint(b)
    }
    n := 0
    for done := false; !done; {
      switch b,er := in.ReadByte(); er {
      case io.EOF: done = true
      case nil:
        f(b)
        n++
      default:
        println("cksum:", file.Name() + ":", er)
        return
      }
    }
    for m := n;; {
      f(byte(m) & 0xff)
      m = m >> 8
      if m == 0 { break }
    }
    f(0); f(0); f(0); f(0)
    fmt.Print(^r, n);
    if file == os.Stdin {
      fmt.Println()
    } else {
      fmt.Printf(" %v\n", file.Name())
    }
  }
  flag.Parse()
  if 0 == flag.NArg() {
    cksum(os.Stdin)
    return
  }
  for i := 0; i < flag.NArg(); i++ {
    if flag.Arg(i) == "-" {
      cksum(os.Stdin)
      continue
    }
    f,er := os.Open(flag.Arg(i))
    if f == nil {
      println("cksum:", er)
      continue
    }
    cksum(f)
    f.Close()
  }
}

Bits

Go seems at least as good as C when it comes to bit-fiddling. One might argue it is better, because of little differences:

  • the operator precedence is more natural

  • bitwise NOT is "^", rather than "~"

  • the dedicated "&^" operator combines AND and NOT.

  • shifts are defined in the specification, rather than left to the implementation; shifts are arithmetic when the first operand is signed, and logical otherwise.

  • lack of implicit casting reduces bugs

The CRC32 package

A reader noted that Go has a crc32 package, which can compute more commonly encountered CRC-32 checksums.

Unfortunately, I can’t remember how cksum differs (perhaps it goes MSB first instead of LSB first or something like that); it’s been a while since I looked into it, and I’m busy.





Ben Lynn blynn@cs.stanford.edu