Thursday, February 16, 2012

Memory Dumping

I mentioned below that I had developed a method of attaching all the source code to the firmware and then having that very firmware dump itself to its log. The problem is that it takes a long, long time to do this. This is partly due to the one-packet-per-second dump rate, and partly because hexadecimal encoding doubles the data size and the packet headers add yet more data.

So, here's what we do. For binary packets, we just do it: dump 64 bytes at a time into a packet with 8 bytes of header overhead.
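As a rough illustration of the binary case, here is a minimal Python sketch that slices an image into 64-byte payloads with an 8-byte header each. The header layout (a 4-byte offset plus a 4-byte length) and the names `HEADER` and `binary_dump_packets` are assumptions made up for this example; the real header format is not described in this post.

```python
import struct

PAYLOAD_SIZE = 64
# Hypothetical 8-byte header: big-endian offset and payload length.
HEADER = struct.Struct(">II")

def binary_dump_packets(image: bytes):
    """Yield header + payload for each 64-byte slice of the image."""
    for offset in range(0, len(image), PAYLOAD_SIZE):
        chunk = image[offset:offset + PAYLOAD_SIZE]
        yield HEADER.pack(offset, len(chunk)) + chunk
```

A full packet is then 72 bytes for 64 bytes of payload, or 12.5% overhead, compared with the 100% overhead of hexadecimal.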

For NMEA packets, we do something smart: we use Base85 encoding. This is a method of encoding 4 bytes of arbitrary binary data, with no restrictions, into 5 characters carefully chosen to all be printable. Base 85 is the smallest base in which 5 symbols can encode the 4Gi (2^32) combinations possible from 4 bytes: 85^5 = 4,437,053,125 is just over 2^32 = 4,294,967,296, while 84^5 = 4,182,119,424 falls short. There are 94 printable ASCII characters (not counting space), so Base85 will work. Base84 won't encode 4 bytes, and Base86 doesn't gain anything.
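The arithmetic behind the choice of 85 can be checked with a short Python sketch. The function names `smallest_base` and `to_base85_digits` are just for illustration; the second one shows the core step of any Base85 scheme, which is treating 4 bytes as one big-endian 32-bit number and writing it as 5 base-85 digits.

```python
def smallest_base(n_bytes: int = 4, n_digits: int = 5) -> int:
    """Smallest base whose n_digits-digit numbers cover every n_bytes value."""
    base = 2
    while base ** n_digits < 256 ** n_bytes:
        base += 1
    return base

def to_base85_digits(group: bytes) -> list:
    """Split exactly 4 bytes (big-endian) into 5 base-85 digits, high digit first."""
    if len(group) != 4:
        raise ValueError("expected exactly 4 bytes")
    value = int.from_bytes(group, "big")
    digits = []
    for _ in range(5):
        value, digit = divmod(value, 85)
        digits.append(digit)
    return digits[::-1]
```

Here `smallest_base()` returns 85, and `to_base85_digits(b"\xff\xff\xff\xff")` gives `[82, 23, 54, 12, 0]`. Five characters for four bytes is 25% overhead before packet headers.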

Postscript-standard Ascii85 has a couple of cool features but also a couple of drawbacks. It just uses the calculated digit plus 33, so as to use characters from "!" to "u". If four consecutive zero bytes are transmitted, these are compressed to a single "z". In the btoa variant (not the Postscript one), four consecutive spaces also go as a "y". There are several more printable characters left over that can be used, but one of the drawbacks is that "," and "*" are among the ones already in use. Comma is not a killer, but star is, since it marks the end of the packet data (the checksum follows it).
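To make the problem concrete, here is a minimal Python sketch of the plain Ascii85 rule described above (digit plus 33, with "z" for four zero bytes). The name `ascii85_encode` is made up for this example, the "y" shortcut is left out, and the handling of a short final group (pad with zeros, drop the same number of trailing characters) follows the usual Ascii85 convention rather than anything stated in this post.

```python
def ascii85_encode(data: bytes) -> str:
    """Encode bytes as Ascii85 characters "!".."u", with "z" for 4 zero bytes."""
    out = []
    for i in range(0, len(data), 4):
        group = data[i:i + 4]
        pad = 4 - len(group)  # nonzero only for a short final group
        value = int.from_bytes(group + b"\x00" * pad, "big")
        if value == 0 and pad == 0:
            out.append("z")  # shortcut for a full group of zeros
            continue
        chars = []
        for _ in range(5):
            value, digit = divmod(value, 85)
            chars.append(chr(digit + 33))
        out.append("".join(reversed(chars))[:5 - pad])
    return "".join(out)
```

With this, `ascii85_encode(b"\x00\x00\x00\x09")` gives `!!!!*` and `ascii85_encode(b"\x00\x00\x00\x0b")` gives `!!!!,`, so perfectly ordinary data can put a star or a comma into the middle of an NMEA sentence.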


Since there are 94 printable characters, we can decline to use certain characters, like comma and star, to fit the NMEA structure. So, one thing to do is to switch out star, and we might as well do comma too. Star turns into "v" and comma turns into "w".

The compression mentioned above is nice, but since erased flash on the LPC reads back as 0xFF, "empty" areas are actually filled with 0xFF, not 0x00. So, we will steal "x" for encoding 4 consecutive 0xFF bytes.
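Putting the two changes together (star becomes "v", comma becomes "w", and "x" stands for four 0xFF bytes), a minimal Python sketch of the modified scheme might look like this. The names `nmea85_encode` and `nmea85_decode` are invented for the example, "z" is kept for four zero bytes, the "y" shortcut is dropped, and the short-final-group handling is the usual Ascii85 convention, all of which are assumptions rather than the firmware's actual code.

```python
# Standard "!".."u" alphabet, with "*" swapped for "v" and "," for "w".
ALPHABET = "".join(chr(c) for c in range(33, 118)).replace("*", "v").replace(",", "w")
DIGIT = {ch: i for i, ch in enumerate(ALPHABET)}

def nmea85_encode(data: bytes) -> str:
    out = []
    for i in range(0, len(data), 4):
        group = data[i:i + 4]
        pad = 4 - len(group)  # nonzero only for a short final group
        if group == b"\x00" * 4:
            out.append("z")  # four zero bytes
        elif group == b"\xff" * 4:
            out.append("x")  # four 0xFF bytes (erased flash)
        else:
            value = int.from_bytes(group + b"\x00" * pad, "big")
            chars = []
            for _ in range(5):
                value, digit = divmod(value, 85)
                chars.append(ALPHABET[digit])
            out.append("".join(reversed(chars))[:5 - pad])
    return "".join(out)

def nmea85_decode(text: str) -> bytes:
    out = bytearray()
    i = 0
    while i < len(text):
        if text[i] == "z":
            out += b"\x00" * 4
            i += 1
        elif text[i] == "x":
            out += b"\xff" * 4
            i += 1
        else:
            chunk = text[i:i + 5]
            pad = 5 - len(chunk)
            chunk += ALPHABET[84] * pad  # pad a short group with the top digit
            value = 0
            for ch in chunk:
                value = value * 85 + DIGIT[ch]
            out += value.to_bytes(4, "big")[:4 - pad]
            i += 5
    return bytes(out)
```

Now `nmea85_encode(b"\x00\x00\x00\x09")` gives `!!!!v` instead of `!!!!*`, and a 4-byte run of erased flash collapses to a single `x`.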

Another alternative is to just make an array of 85 characters in any order that we want. We have already given up Postscript compatibility, so why be constrained by it? The Wikipedia article on Ascii85 mentions RFC 1924, which uses 0-9, A-Z, a-z and !#$%&()*+-;<=>?@^_`{|}~ in that order. It refrains from using ("',./:[]\) since those are harder to escape. We could choose one of those in place of star, and several others for compression.
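A table-driven version is easy to sketch in Python. The RFC 1924 alphabet below is as quoted above; the particular substitutions ("." in place of star, "[" and "]" as the compression marks for all-zero and all-0xFF words) and the names `NMEA_TABLE` and `encode_word` are arbitrary choices made for this example, not a decision from this post.

```python
import string

# RFC 1924 order: 0-9, A-Z, a-z, then 23 punctuation characters.
RFC1924 = (string.digits + string.ascii_uppercase + string.ascii_lowercase
           + "!#$%&()*+-;<=>?@^_`{|}~")
UNUSED = "\"',./:[]\\"  # the 9 printable characters RFC 1924 leaves out

# Hypothetical choices: "." replaces "*", "[" and "]" mark compressed words.
NMEA_TABLE = RFC1924.replace("*", ".")
ZERO_MARK, FF_MARK = "[", "]"

def encode_word(word: int, table: str = NMEA_TABLE) -> str:
    """Encode one 32-bit word as 5 table characters, or 1 compression mark."""
    if word == 0:
        return ZERO_MARK
    if word == 0xFFFFFFFF:
        return FF_MARK
    chars = []
    for _ in range(5):
        word, digit = divmod(word, 85)
        chars.append(table[digit])
    return "".join(reversed(chars))
```

Since comma is already absent from the RFC 1924 set, only star needs replacing. For example, `encode_word(69)` gives `0000.` where the unmodified table would give `0000*`.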
