## Friday, October 12, 2012

### Battle of Compression

Everyone else was doing it, so I might as well give it a try too. Here's my use case: I want to compress the C++ source code and anything else needed to rebuild my firmware (mostly Makefiles) into one tight little package, then append that package to the firmware itself. Naturally, smaller is better. In particular, I would like the Bootloader++ code and source pack to fit within the 64 KiB it has allocated for itself. (I'll explain Bootloader++ when I'm ready to publish, but it's a bootloader for a Logomatic-type circuit which handles FAT32 and SDHC.)

So, the test case. I already have a rule in my makefile to pack the code:
$(TARGET).tar.$(TAR_EXT): $(ALLTAR)
	$(CC) --version --verbose > /tmp/gccversion.txt 2>&1
	tar $(TAR_FORMAT)cvf $(TARGET).tar.$(TAR_EXT) -C .. $(addprefix $(TARGETBASE)/,$(ALLTAR)) /tmp/gccversion.txt

$(TARGET).tar.$(TAR_EXT).o: $(TARGET).tar.$(TAR_EXT)
	$(OBJCOPY) -I binary -O elf32-littlearm $(TARGET).tar.$(TAR_EXT) $(TARGET).tar.$(TAR_EXT).o --rename-section .data=.xz -B arm

I'm quite proud of the latter, as it packs the archive into a normal object file, which my linker script makes sure gets packed into the final firmware image, with symbols bracketing it so I can dump just the source code bundle.
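As a side note on those bracketing symbols: when objcopy converts a raw binary file into an object file, it also emits start, end, and size symbols of its own, named after the input path with every non-alphanumeric character replaced by an underscore. Whether my linker script uses these or defines its own is a separate matter; the sketch below only shows how the objcopy names are derived. The file name `firmware.tar.xz` is a made-up example, not the real value of `$(TARGET)`.

```python
import re


def objcopy_binary_symbols(input_path):
    """Return the (start, end, size) symbol names objcopy -I binary derives.

    objcopy builds the names from the input path exactly as it was given on
    the command line, replacing each non-alphanumeric character with '_'.
    """
    mangled = re.sub(r"[^A-Za-z0-9]", "_", input_path)
    base = "_binary_" + mangled
    return base + "_start", base + "_end", base + "_size"


if __name__ == "__main__":
    # Hypothetical archive name, for illustration only.
    for name in objcopy_binary_symbols("firmware.tar.xz"):
        print(name)
```

Because the path is mangled as written, running objcopy on `build/firmware.tar.xz` instead would put `build_` into every symbol name, which is one reason to run it from a consistent directory.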

Anyway, we will look at our challengers:
• No compression, just a tar file. This one is actually a bit bigger than the total of the file sizes.
• gzip, the old standard, both with no special flags and with the -9 option
• compress, the really old standard .Z file using the LZW algorithm, whose patent has since expired
• bzip2, the second-generation compression algorithm notable for both better compression and longer compression time than gzip, used both with no special flags and with the -9 option
• Lempel-Ziv-Markov chain algorithm (LZMA), implemented by the Ubuntu commands lzip and xz. This is a third-generation compression algorithm: once again better compression, once again longer time.
• lzop, a compressor optimized for speed and memory consumption rather than size
• PKZIP, implemented via the zip command available in Ubuntu. This might not be a fair test, as it is not compressing the TAR file, but is in fact using its own method to compress each file individually. So it carries its own index, and each file is compressed from scratch, meaning later files gain nothing from redundancy with earlier ones.
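• 7z, implemented via the 7z command available in Ubuntu. Like PKZIP, it builds its own archive with its own index rather than compressing the TAR file, although by default the 7z format uses solid compression, so files are not necessarily compressed independently of each other.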
• zpaq, a compressor which at each step tries several methods and picks the best. This one takes a monumental amount of time and memory, but seems to be worth it if minimum file size is the goal.
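For anyone who wants to repeat a small version of this contest without installing every tool, here is a minimal sketch using only the Python standard library. It is not the procedure I used: the `gzip`, `bz2`, and `lzma` modules stand in for the gzip, bzip2, and xz commands, so sizes may differ slightly from the command-line tools, and compress, lzop, zip, 7z, and zpaq are not covered. The function names and the sample files are made up for illustration.

```python
import bz2
import gzip
import io
import lzma
import tarfile


def make_tar(files):
    """Pack a {name: bytes} mapping into an uncompressed tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for name, data in sorted(files.items()):
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare(tar_bytes):
    """Return {label: size in bytes} for the tar and several compressed forms."""
    return {
        "tar": len(tar_bytes),
        # The gzip command defaults to level 6; Python's gzip module defaults
        # to 9, so both levels are requested explicitly here.
        "gzip -6": len(gzip.compress(tar_bytes, compresslevel=6)),
        "gzip -9": len(gzip.compress(tar_bytes, compresslevel=9)),
        # Level 9 is the largest bzip2 block size (900k).
        "bzip2 -9": len(bz2.compress(tar_bytes, compresslevel=9)),
        "xz -6": len(lzma.compress(tar_bytes, format=lzma.FORMAT_XZ, preset=6)),
        "xz -9e": len(
            lzma.compress(
                tar_bytes,
                format=lzma.FORMAT_XZ,
                preset=9 | lzma.PRESET_EXTREME,
            )
        ),
    }


if __name__ == "__main__":
    # Stand-in "source tree"; replace with real file contents to get
    # meaningful numbers.
    sample = {
        "main.cpp": b"int main() { return 0; }\n" * 200,
        "Makefile": b"all:\n\techo building\n" * 50,
    }
    for label, size in sorted(compare(make_tar(sample)).items(), key=lambda kv: kv[1]):
        print(f"{label:10s} {size:8d} bytes")
```

Compressing the tar as one stream, as this does, is what lets later files benefit from patterns seen in earlier ones, which is the advantage the per-file archivers give up.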

So, we notice a couple of things. One, sometimes -9 doesn't improve things measurably, and sometimes it makes things worse. Next, zpaq rocks out loud as far as compressing C++ source code. Even so, the zpaq archive is still larger than the firmware binary image, which is 12423 bytes. zpaq might take more time and more memory than any other compressor, but all that time and memory is spent on a beefy desktop machine, and not in the Loginator.