Families of sets

Suppose we have a collection, or family, of sets of letters, such as \(\{ \{A, B, X, Y, Z\}, \{A, C, X, Y, Z\} \}\) (a family consisting of two sets). How would you represent a family on a computer so that you could easily manipulate them? By this, we mean performing typical set-theoretic operations such as finding the union or intersection of some of the sets, testing for membership, and testing if a given set is equal to one of our sets.

The most obvious solution is bit fields: use 26 bits to represent a set, where we set the \(i\)th bit if and only if the set contains the \(i\)th letter of the alphabet. So our example family would be represented by \(\{ 1100...0111, 1010...0111 \}\).

This is a good solution for many cases. It is easy to understand and implement. Computing the union, intersection, set difference and so on is simply a matter of performing an appropriate bitwise boolean operation. We could store our sets in a hash table to quickly test whether we have a given set.

As an aside, when a family always consists of disjoint sets, the Galler-Fischer representation is more efficient than bit fields.

Multiple families

Suppose we have two or more families of sets. For example, we might have the family \(F = \{ \{A, B, X, Y, Z\}, \{A, C, X, Y, Z\} \}\), as well as \(G = \{ \{ W, X \}, \{C, D, E \} \}\). In some problems, we want to find all sets which are the union of some set in \(F\) and some set in \(G\), which we denote \(F \sqcup G\) ("F join G"). In our example,

\[ F \sqcup G = \{ \{A, B, W, X, Y, Z\}, \{A, C, W, X, Y, Z\}, \{A, B, C, D, E, X, Y, Z\}, \{A,C,D,E,X,Y,Z\} \} . \]

There are 4 sets in the result because we have 2 choices in \(F\) times 2 in \(G\), and furthermore, none of the resulting unions turn out the same.

If we use bit fields to represent sets, computing \(F \sqcup G\) means we must iterate through all sets in \(F\) and all the sets in \(G\) for a total of \(|F| |G|\) unions. Also, after each union, we must check for duplicates. For instance, \(\{ \{A\}, \{B\}\} \sqcup \{\{A\},\{B\}\}\) = \(\{\{A\},\{B\},\{A,B\}\}\) only has 3 elements, not 4.

The same goes if we replace the union operation with the intersection operation in the above, in which case we have \(F \sqcap G\) aka "F meet G" (for example, \(\{ \{A\}, \{B\}\} \sqcap \{\{A\},\{B\}\}\) = \(\{\{A\},\{B\},\phi\}\)), or some other set operation. Whichever one we do, we end up performing \(|F| |G|\) operations on bit fields.

This may be adequate if we perform few operations on families. But many problems are heavy on family algebra, and the cost of bit fields rapidly becomes prohibitive. For these, we’ll need to exploit the fact that sets of our families often have much in common (there’s a family resemblance, so to speak).


Ben Lynn blynn@cs.stanford.edu 💡