Chapter 4. C Craft

A few crafty C tricks.

When are size 0 arrays useful?

In some dialects, such as GNU C, these can denote flexible arrays. For example:

struct {
  int n;
  char c[0];  // Nonstandard; prefer c[].
} *foo = malloc(sizeof(*foo) + 16);

foo->n = 16;
for (int i = 0; i < 16; i++) {
  foo->c[i] = 'a' + i;
}

Originally, this was an improvement on ISO C90, which required arrays to be declared with size at least 1. ISO C99 added support for flexible arrays, but still bans zero-size array declarations: code adhering to the standard should omit the 0.
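A conforming C99 version of the struct above uses a flexible array member instead of the size-0 hack; a minimal sketch (charbuf and charbuf_new are made-up names):

```c
#include <stdlib.h>

// C99 flexible array member: it must be the last member of a struct
// that has at least one other member.
struct charbuf {
  int n;
  char c[];  // no size: storage comes from over-allocating the struct
};

struct charbuf *charbuf_new(int n) {
  struct charbuf *p = malloc(sizeof(*p) + n);
  p->n = n;
  return p;
}
```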

When are size 1 arrays useful?

Some compilers forbid size 0 arrays, so size 1 arrays must be used instead for flexible arrays.

They can also be used to give both stack and heap variables a uniform interface. In other words, we can avoid the unary "&", and always use "->" and never ".". For example:

struct stat buf[1];
stat("/tmp", buf);
printf("%d\n", (int) buf->st_ino);

struct stat *buf2 = malloc(sizeof(*buf2));
stat("/home", buf2);
printf("%d\n", (int) buf2->st_ino);

When is while(0) useful?

For macros, enclosing several statements in a do...while(0) loop allows the macro to be used safely in conditionals.
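A minimal sketch of the idiom (SWAP_INT is a made-up macro): without the do...while(0), the unbraced if/else below would not parse as intended.

```c
// do...while(0) turns the multi-statement macro into a single
// statement, so it composes safely with an unbraced if/else.
#define SWAP_INT(a, b) do { int _tmp = (a); (a) = (b); (b) = _tmp; } while (0)

void sort2(int *x, int *y) {
  if (*x > *y)
    SWAP_INT(*x, *y);  // the trailing ; belongs to the while (0)
  else
    ;                  // the else still binds correctly
}
```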

This is unneeded in GNU C, where a compound statement in parentheses becomes an expression (a "statement expression"), a far more useful construction. For example:

#define maxint(a,b) ({int _a = (a), _b = (b); _a > _b ? _a : _b; })

This macro is safe even if the operands have side effects. A generic max macro can be written using typeof, another GNU C extension.

Iverson brackets

Many are irked by the slight mismatch between logic in C and mathematical logic. In C90, there are no booleans; instead, logical and comparison operators return 1 or 0, and any nonzero value is considered true.

C99 adds a mathematically faithful boolean type to the language, but I consider it pointless. Firstly, the slight differences have never caused problems in my code. Secondly, the && and || operations still mismatch because of lazy evaluation. Thirdly, the faux booleans of C translate naturally to machine code. Fourthly, writing "true" and "false" as "1" and "0" is widespread, convenient, and international (think Karnaugh maps).

But most importantly, C logic is superior, because we gain the power of Iverson brackets for free. For example:

if ((x > 0) + (x < 0) + (x == 0) != 1) {
  puts("law of trichotomy broken!");
}
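The same idea gives branch-free arithmetic on truth values; a small sketch (sign and count_positive are illustrative names):

```c
// (x > 0) and (x < 0) are Iverson brackets: 1 if true, 0 if false.
int sign(int x) { return (x > 0) - (x < 0); }

// Count elements satisfying a predicate with no branches.
int count_positive(const int *a, int n) {
  int count = 0;
  for (int i = 0; i < n; i++)
    count += a[i] > 0;
  return count;
}
```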

Tagged pointers

Because of pointer alignment, the least significant bits of each pointer act as free storage: we can tag each pointer with a boolean, or a bit more, depending on the context. We just need to remember to mask out the lowest bits when dereferencing the modified pointer.
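For instance, assuming pointers are at least 2-byte aligned, the lowest bit can carry a boolean flag (the helper names below are made up):

```c
#include <stdint.h>

// The low bit of an aligned pointer is always 0, so it is free storage.
static inline void *ptr_tag(void *p)     { return (void *)((uintptr_t)p | 1); }
static inline int   ptr_get_tag(void *p) { return (int)((uintptr_t)p & 1); }
// Mask the tag out before dereferencing.
static inline void *ptr_untag(void *p)   { return (void *)((uintptr_t)p & ~(uintptr_t)1); }
```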

This trick should be used sparingly; recall premature optimization is a sin.

Nested functions

Closures are cumbersome in standard C, as one must provide a function pointer along with a void * pointer carrying the captured arguments. Moreover, APIs may only accept a bare function pointer while the programmer wants to pass a closure. For example, consider counting the number of comparisons made by qsort(). It seems the only way would be to use thread-local static variables in the comparison function.
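In standard C the workaround looks like this sketch, using a C11 thread-local counter (counting_cmp and count_comparisons are made-up names):

```c
#include <stdlib.h>

// The count must live outside the comparison function, since qsort()
// accepts only a bare function pointer with no argument slot.
// _Thread_local (C11) keeps it safe across threads.
static _Thread_local long cmp_count;

static int counting_cmp(const void *x, const void *y) {
  int a = *(const int *)x, b = *(const int *)y;
  cmp_count++;
  return (a > b) - (a < b);
}

long count_comparisons(int *a, size_t n) {
  cmp_count = 0;
  qsort(a, n, sizeof(*a), counting_cmp);
  return cmp_count;
}
```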

Happily, GNU C has nested functions. While not as general as true closures, they handle many common cases. Trampolining, a clever trick that places code on the stack, preserves existing function call conventions, but executable stacks are prohibited in some environments; by default, OS X suffers this restriction.

Section 2.7 of "The Practice of Programming" shows why function pointers are more convenient: one only writes the iterating loop once. But sometimes we gain efficiency as well as elegance.

Consider a routine for factoring a number via trial division. One could design the API so that it places the factors on a list and returns the list. Better is an API that calls a given closure for each factor it finds. We avoid introducing a list type, and furthermore, many callers can get by without looking back at previous factors.

For example, a routine that counts the number of prime factors simply increments a counter each time a factor is found. We have saved time and memory because we avoid initializing, appending to, and freeing a list.

Routines that do need to refer to previous factors are free to define their own storage scheme rather than convert from some list format we have chosen.

The following routine performs trial division using the GMP library:

void trial_divide(void (*fun)(mpz_t factor, int multiplicity), mpz_t n) {
  mpz_t p, m, fac;
  int mul;

  mpz_init_set(m, n);
  mpz_init_set_ui(p, 2);
  mpz_init(fac);

  while (mpz_cmp_ui(m, 1)) {
    if (mpz_probab_prime_p(m, 10)) {
      mpz_set(p, m);
    }
    if (mpz_divisible_p(m, p)) {
      mul = 0;
      mpz_set(fac, p);
      do {
        mpz_divexact(m, m, p);
        mul++;
      } while (mpz_divisible_p(m, p));
      fun(fac, mul);
    }
    mpz_nextprime(p, p);
  }
  mpz_clears(p, m, fac, NULL);
}

We can now compute the number of distinct prime factors of n as follows:

  int count = 0;
  // A nested function. I would make it anonymous if I could.
  void f(mpz_t fac, int mul) { count++; }
  trial_divide(f, n);

Computing the number of factors of n is similar:

  int count = 1;
  void f(mpz_t fac, int mul) { count *= mul + 1; }
  trial_divide(f, n);

Goto considered helpful

Although it should be mostly avoided, there are cases when goto is the right solution.

Virtual machines

The obvious way to implement a state machine is to enclose a switch statement within a loop. A pointer iterates through the bytecode, and the switch statement decides what to do for each opcode. But this requires two jumps per opcode: the break at the end of each case statement, and the jump back to the top of the loop.

We can save one jump per opcode by replacing the break with a macro that uses a computed goto (a GNU C extension) to jump straight to the location that handles the next opcode. See the Dalvik VM.
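A toy sketch of this threaded dispatch; the three opcodes and their encoding are invented for illustration:

```c
// Threaded dispatch with computed goto (GNU C): each handler jumps
// directly to the next opcode's handler, skipping the loop top.
enum { OP_PUSH, OP_ADD, OP_HALT };

int run(const unsigned char *code) {
  static void *dispatch[] = { &&op_push, &&op_add, &&op_halt };
  int stack[64], *sp = stack;
#define NEXT goto *dispatch[*code++]
  NEXT;
op_push:
  *sp++ = *code++;  // one-byte operand follows the opcode
  NEXT;
op_add:
  sp--;
  sp[-1] += *sp;
  NEXT;
op_halt:
  return sp[-1];
#undef NEXT
}
```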

Inner loops

Goto can save a few cycles in performance-critical routines, and perhaps should be tried before resorting to assembly.
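One common case is escaping a nested loop in a hot search routine with a single jump, instead of threading a flag through both loop conditions (find2d is a made-up example):

```c
// A flag-free exit from doubly nested loops: goto leaves both at once.
int find2d(const int *grid, int rows, int cols, int target,
           int *row, int *col) {
  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      if (grid[i * cols + j] == target) {
        *row = i;
        *col = j;
        goto found;
      }
  return 0;
found:
  return 1;
}
```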


Coroutines

Simon Tatham writes coroutines in C: macros and gotos hijack seemingly normal function calls to build stack-based coroutines.

Binary decision diagrams

I’ve raved about this data structure before; it is often best represented as a goto-heavy program.

Hiding information via file scope

Take delta.h from the git version control system:

#ifndef DELTA_H
#define DELTA_H

/* opaque object for delta index */
struct delta_index;

/*
 * create_delta_index: compute index data from given buffer
 */
extern struct delta_index *
create_delta_index(const void *buf, unsigned long bufsize);

#endif

The header contains the interface, the whole interface, and nothing but the interface. The internals of struct delta_index are hidden until link time: we can even select a particular implementation by providing a different object file. Most importantly, the internals are hidden from programmers, who cannot learn any implementation details from the header.

Opacity versus performance

Suppose you have a data type that is frequently used, and often has a short lifespan, just as you would use int. Then if speed is important enough, move some of the memory usage from the heap to the stack by exposing the struct. I recommend something along these lines:

struct foo_s {
  int i;
  char c;
  void *p;
};
typedef struct foo_s foo_t[1];
typedef struct foo_s *foo_ptr;

void foo_some_method(foo_t f);

void some_function() {
  // Using foo_t is like using a plain old data type.
  foo_t f;  // lives on the stack; decays to a pointer when passed
  foo_some_method(f);
}

I first saw this convention in the GMP library. I use it in a library that heavily involves computational mathematics, where speed is vital. Out of laziness, and also to give the library a consistent feel, most of my data types in the library follow this convention, even when performance is less important.

By the way, POSIX reserves identifiers ending in "_t", a rule echoed by the GCC documentation, but it is widely flouted.

Opt-in visibility

Compile libraries with the flag -fvisibility=hidden, which causes all symbols to be hidden by default. Then use __attribute__((visibility("default"))) for symbols that are part of your API. You can confirm it worked by running objdump or nm. See GCC’s wiki page on visibility.

Memory mapped files

Instead of opening files as streams, use memory maps. The following code shows how you could open a file for reading on Linux:

void *mmap_file(size_t *len, const char *path) {
  // open_or_die(): a wrapper around open() that quits on error.
  int fd = open_or_die(path, O_RDONLY);
  struct stat st[1];
  if (fstat(fd, st)) die_perror("fstat");
  *len = (size_t) st->st_size;
  if (!*len) {
    close(fd);
    return NULL;
  }
  void *map = mmap(NULL, *len, PROT_READ, MAP_PRIVATE, fd, 0);
  if (MAP_FAILED == map) die("mmap failed");
  close(fd);  // the mapping outlives the file descriptor
  return map;
}
Now you can read files as follows:

size_t len;
char *buf = mmap_file(&len, "example.txt");

and the entire contents are in the buf array. No specialized stream functions are needed; instead treat buf like any other char array, though be aware buf is not NUL-terminated.

To unmap the file, call:

munmap(buf, len);

Behind the scenes, the file is not actually loaded into memory. Instead, relevant parts of the file are only loaded into memory on demand. For huge file reads, parts of the file that haven’t been examined in a while are freed from memory, so new parts can be loaded. This is done transparently by the kernel virtual memory subsystem; the programmer can act as if the whole file were in the buffer.

Incidentally, I learned the handy "die" functions from the source code of Git:

extern void die(const char *err, ...) NORETURN
    __attribute__((format (printf, 1, 2)));

void die(const char *err, ...) {
  va_list params;

  va_start(params, err);
  vfprintf(stderr, err, params);
  fputc('\n', stderr);
  va_end(params);
  exit(1);
}

void die_perror(const char *err, ...) {
  va_list params;

  va_start(params, err);
  vfprintf(stderr, err, params);
  fprintf(stderr, ": %s\n", strerror(errno));
  va_end(params);
  exit(1);
}
Little languages

I was once hampered by the lack of native arbitrary precision arithmetic in C: during a programming competition, I belatedly realized the input involved numbers too big to fit into a machine word or two. I scrambled to rewrite my entry to use the GMP library, but its assembly-like interface slowed me down and I ran out of time.

Since then, I wrote a library that interprets mathematical expressions, evaluating them via GMP. Though less efficient than hand-written code, it gives me, to some extent, the rapid-prototyping ability I associate with scripting languages. My library does no optimization, except that it attempts to detect when it can use modular exponentiation. For example, the following solves Problem 97 of Project Euler:

#include <stdio.h>
#include <gmp.h>
#include "mpx.h"

int main() {
  mpz_t z;
  mpx_eval("mpz a; a = (28433 * 2^7830457 + 1) % 10^10;", z);
  gmp_printf("%Zd\n", z);
  return 0;
}
One could go further and compile the expression to bytecode so subsequent runs are more efficient. Another useful extension would be to output equivalent C code to an expression.

Just as specialists in a field develop their own jargon to better communicate with one another, a C programmer can create little languages to more concisely solve problems in a specific domain. Chapter 9 of "The Practice of Programming" mentions this, and cites printf, whose format string language is so effective that other languages have adopted it.


Metaprogramming

Since C lacks reflection, and since I despise the preprocessor, I typically metaprogram by writing code that generates C code. The metalanguage should suit the task at hand: Flex and Bison for generating parsers, and C itself for more general problems.

A good example of a bad metalanguage is C++ templates (a Turing tarpit). While intellectually challenging, template metaprogramming leads to inscrutable code and slow compilation. Indeed, as templates are Turing-complete, determining whether a given program can compile is undecidable.

When feasible, I distribute the generated C code along with its source so the recipients only require a C compiler. Of course, this is impossible when the metaprogram depends on some property of the target platform, which is especially troublesome when cross compiling.