Quine Golf Rebooted

Our hex boot sector quine is a painful reminder that hexdumps are grossly inefficient, as they need two bytes to represent one. One byte per nibble, we might say.

What if we use the Z85 encoding instead? That is, what if we take basenc --z85 -d to be our compiler?

xUT&{g4@ed0j#khg18z<Hq-&TfX{)BqzDQ^{W::C]hmo#B[ts#H#Iress:/b57zJ%+ZG-4pUj[p!
.*Mn1uJ[1+[bABg&}TnYZbLHeJLyBulH%KkpeGRc[ewaDWOd9bQio?0000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000
00000000000000000000000000000330

It’s hard to argue this is reasonable programming language. One can imagine figuring out opcodes and entering them in hexadecimal manually, and indeed, this must have been done to create the first assembler. But why would anyone also Z85-encode the binary by hand?

Whether or not a plausible setting for this language exists, it’s hard to resist writing a quine. There’s a delectable tension: encoding to Z85 rather than hex requires more code, but at the same time the more efficient encoding reduces the output size. Can we come out ahead?

As before, we prepare by pointing DS:SI to the start of our code. But this time we’ll only loop 128 times, because we’ll read 4 bytes each iteration.

org 0x7c00
push 0x7c0
pop ds
xor si,si
mov cx, 128
dump:

We treat 4 bytes as a big-endian 32-bit number. Intel is little-endian, thus after each LODSW we swap the higher and lower bytes with XCHG.

We convert the number to base 85. Divisions require DX and AX, while BX holds 85, and CX and DI are required by LOOP and LODSW. This leaves the DI register for use as a temporary variable.

Since we’re in real mode, to divide a 32-bit number by 85, roughly speaking we slide a 32-bit window, taking 16-bit steps and starting from the most-significant bits. This ensures that the quotient of DX:AX divided by BX fits in AX. The process is analgous to dividing a big number by a single digit with pen and paper. The successive remainders are the base-85 digits, which we push onto the stack so we can later print them in reverse order.

For the last two digits, we know the dividend fits into AX, and that the quotient and remainder are both less than 85 thus fit in a byte. This means we can divide by BL.

As before, we abuse ZF to loop twice, by exploiting that CH = 0 and BL is nonzero (because it is 85). This saves us repeating code a couple of times.

mov bx,85
xor dx,dx

loaddivtwice:
xchg ax, di
lodsw
xchg al, ah
div bx
xor ch, bl
jnz loaddivtwice

push dx  ; 1s place

divtwice:
xor dx,dx
xchg ax,di
div bx
xchg ax,di
div bx
push dx  ; 85s then 85^2s place
xor ch, bl
jnz divtwice

div bl
mov dl,ah
push dx  ; 85^3s place
push ax  ; 85^4s place

The next step is to print the character corresponding to each base-85 digit. The numerals and letters that represent [0..61] are easy enough to support, and once again shenanigans with AAM and AAD lead to compact code.

For [62..84], we use XLAT to look up its representation in a table of size 23. Unfortunately, there seems to be no easy way to avoid the table.

quintet:
pop ax
sub al,10
jl digit
aam 26
dec ah
jng letter
mov bl, syms - 0x7c00
xlat
jmp pr

letter:
aad 256-32
add al, 65 - 0x3a

digit: add al, 0x3a
pr: mov ah, 0xe
int 0x10

We need to loop 5 times. We abuse CF (Carry Flag) to do this by repeatedly adding 52 to DH = 0. Afterwards DH will be dirty, but it gets reset to 0 at the start of the next iteration.

add dh, 52
jnc quintet

Apart from the bulky lookup table, the remainder is the same as last time.

loop dump
jmp $
syms db ".-:+=^!/*?&<>()[]{}@%$#"
times 510-($-$$) db 0
dw 0xaa55

Our hex boot sector quine takes about 100 characters, and the Z85 edition has about 130. It takes about 30 characters to encode the lookup table, so it feels like a moral victory. We’ve managed to squeeze in the logic for a base-85 conversion of a 32-bit number using only 16-bit operations and a messy mapping to ASCII for free.

If we chose the variant of Ascii85 which simply chooses 85 consecutive printable characters starting from 33 ("!"), then we would have won easily.


Ben Lynn blynn@cs.stanford.edu 💡