tr
The humble tr tool is surprisingly handy. It readily disposes of many little tasks:
- conversion of newlines from one operating system to another
- substitution ciphers
- extraction of, say, alphabetic characters from a file
- changing lowercase to uppercase or vice versa
- replacing consecutive spaces with a single space
Let’s look at a simplified tr, which only translates (it can neither delete nor squeeze) and only supports sets of single characters (no ranges, escapes, or classes).
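#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Print a formatted message to stderr and exit with a failure status. */
void die(const char *err, ...) {
  va_list params;
  va_start(params, err);
  vfprintf(stderr, err, params);
  va_end(params);
  fputc('\n', stderr);
  exit(1);
}

int main(int argc, char **argv) {
  if (argc < 2) die("tr: missing operand");
  if (argc < 3) die("tr: missing operand after `%s'", argv[1]);
  if (argc > 3) die("tr: extra operand `%s'", argv[3]);
  if (!*argv[2]) die("tr: second operand must not be empty");

  /* Start from the identity mapping on all 256 byte values. */
  unsigned char tab[256];
  for (int i = 0; i < 256; i++) tab[i] = i;

  /* Map each byte of the first set to the corresponding byte of the
     second set; if the second set is shorter, its last byte is reused. */
  char *q = argv[2];
  for (char *p = argv[1]; *p; p++) {
    /* Index through unsigned char so bytes >= 128 stay inside the table. */
    tab[(unsigned char)*p] = *q;
    if (*(q+1)) q++;
  }

  int c;
  while (EOF != (c = getchar())) {
    if (EOF == putchar(tab[c])) perror("tr"), exit(1);
  }
  if (ferror(stdin)) perror("tr"), exit(1);
  return 0;
}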
UTF-8
This time, instead of moving to a Go program that behaves identically, we take advantage of Go’s features to make our program more versatile. Our Go version supports UTF-8, despite resembling the C original.
We use a map instead of an array, because there are many more than 256 Unicode characters. Thankfully, Go provides a built-in map type; in C, we’d have to supply our own.
package main

import (
	"bufio"
	"flag"
	"fmt"
	"io"
	"os"
)

// die prints a formatted message to stderr and exits with a failure status.
func die(s string, v ...interface{}) {
	fmt.Fprint(os.Stderr, "tu: ")
	fmt.Fprintf(os.Stderr, s, v...)
	fmt.Fprintln(os.Stderr)
	os.Exit(1)
}

func main() {
	flag.Parse()
	if 1 > flag.NArg() {
		die("missing operand")
	}
	if 2 > flag.NArg() {
		die("missing operand after `%s'", flag.Arg(0))
	}
	if 2 < flag.NArg() {
		die("extra operand after `%s'", flag.Arg(1))
	}

	// Converting a string to []rune decodes its UTF-8 into code points.
	set1 := []rune(flag.Arg(0))
	set2 := []rune(flag.Arg(1))
	if len(set2) == 0 {
		die("second operand must not be empty")
	}

	// Map each rune of set1 to the corresponding rune of set2; if set2 is
	// shorter, its last rune is reused.
	tab := make(map[rune]rune)
	j := 0
	for i := 0; i < len(set1); i++ {
		tab[set1[i]] = set2[j]
		if j < len(set2)-1 {
			j++
		}
	}

	in := bufio.NewReader(os.Stdin)
	out := bufio.NewWriter(os.Stdout)
	flush := func() {
		if er := out.Flush(); er != nil {
			die("flush: %v", er)
		}
	}
	writeRune := func(r rune) {
		if _, er := out.WriteRune(r); er != nil {
			die("write: %v", er)
		}
	}

	for done := false; !done; {
		switch r, _, er := in.ReadRune(); er {
		case io.EOF:
			done = true
		case nil:
			if s, found := tab[r]; found {
				writeRune(s)
			} else {
				writeRune(r)
			}
			// Flush at each newline so interactive use shows output promptly.
			if '\n' == r {
				flush()
			}
		default:
			die("%s: %v", os.Stdin.Name(), er)
		}
	}
	flush()
}
Then if the binary is named tu:
$ tu 0123456789 〇一二三四五六七八九 <<< 31415
三一四一五
Full translation
A complete tr utility takes a bit more work. For a classic version, we can get by with manipulating arrays of size 256. For a Unicode-aware version, complications arise with set complements and ranges.