Tuesday, February 1, 2022

Babbage's Dream

 I just got a pair of 512GB MicroSD cards. Of course, any such card should be tested, since these are about the easiest things in the world to counterfeit and it can be done in a way which normally wouldn't be detected for a while.

For instance, imagine you wanted to fake a 512GB card. If it was inert plastic, it would immediately fail and the user would dispute the transaction. So instead, you make a card with less capacity (say 32GB) and reprogram it to do the following:

  • Report 512GB of capacity
  • Whenever doing a read or write, take the block address given and just mod it with the actual capacity. 
So if you write data to block 0, then to the block at 32GB, the later block would overwrite the first block. This is less than ideal for a storage device... It would work fine until you wrote 33GB on it, which might take a while.

Having said this, there are very many MicroSD cards which are counterfeit in this manner. It behooves us then to test each card immediately.

You can't just test it by writing zeros, as the card may already have zeros on it. You can't just write a fixed pattern (say 0xAA) because a smart enough counterfeit (and note that MicroSD cards have a full-blown microcontroller with its own firmware) would notice this and read back the pattern. You can't just test a small amount of it, because the card (or host) may have a cache. 

So, I have put together a program which writes the output of a cryptographically-secure random number generator to stdout. This can be directed to a file on the SD card to be tested. The chosen CSPRNG is the Keccak-1600 sponge function. We absorb any arbitrary string as the seed, then keep squeezing it forever, or until the device fills up and throws an error message. 

There is no way to beat this, since a CSPRNG by its nature is unpredictable -- there is no detectable pattern unless you know the arbitrary string used as the seed. The internal microcontroller probably could calculate it, since Keccak is a well-known standard alogrithm, but it doesn't know the seed. It's not reasonable for the counterfeiters to guess that I am going to be checking the chip this way, and to try to guess the seed (hint -- it's close to a substring of the title of this blog).

There is no way to derive the seed from the output of the CSPRNG -- that's one of the properties that makes the PRNG cryptographically secure. The stream might have a period of 2^1600 bits, but I haven't seen a proof or even any good reason to believe that Keccak is full-period when used this way. Even if it is much less, it is almost certainly much greater than the 2^42 bits it takes to fill this device. So there is no way to store or cache this stream without actually having a functional amount of storage equal to the advertised amount.

In my case, I couldn't find the exact routine I wanted, so I wrote my own based on the XKCP package (since it isn't wise to implement a cryptographic primitive yourself).

#include "KeccakSponge.h"

#include "stdio.h"


#define r 576

#define c (1600-r)


int main(int argc, const unsigned char** argv) {

  KeccakWidth1600_SpongeInstance s;

  KeccakWidth1600_SpongeInitialize(&s,r,c);

  char buf[r/8];

  KeccakWidth1600_SpongeAbsorb(&s,argv[1],strlen(argv[1]));

  if(argc==2) {

    for(;;) {

      KeccakWidth1600_SpongeSqueeze(&s,buf,sizeof(buf));

      fwrite(buf,1,sizeof(buf),stdout);

    }

  } else {

    FILE* inf=fopen(argv[2],"rb");

    char inbuf[r/8];

    size_t last_match=0;

    while(!feof(inf)) {

      KeccakWidth1600_SpongeSqueeze(&s,buf,sizeof(buf));

      size_t incount=fread(inbuf,1,sizeof(inbuf),inf);

      for(size_t i=0;i<incount;i++) {

        if(buf[i]!=inbuf[i]) {

          fclose(inf);

          printf("Different at byte %ld\n",last_match);

          return 1;

        }

        last_match++;

        if ((last_match%(1024*1024*32))==0) {

          printf(".");

          if((last_match%(1024*1024*1024))==0) {

    printf("%ld\n",last_match/(1024*1024*1024));

  }

          fflush(stdout);

        }

      }

    }

    fclose(inf);

    printf("All bytes up to %ld matched",last_match);

    return 0;

  }

}

This program takes one or two strings as command-line arguments. The first is the seed. If there is only one argument, it uses this seed as mentioned above to absorb, then squeezes the sponge an unlimited amount of times, limited only by the device filling up. It writes to stdout, so the way I use it is to pipe it through pv to see how fast it is going, and then pipe it to a file on the device under test. If the card is genuine, then this test is non-destructive. It also doesn't disturb the original exFAT filesystem formatting that the card came with -- I have heard that there is a non-obvious optimum way to format these cards, and that they come formatted optimally.

If there are two arguments, the first is still the seed, and the second is a file to check to see if it matches. It does this in the simplest way possible, by absorbing the seed, then going into a loop: squeezing once, reading the same amount from the file, and byte-for-byte checking the blocks. It prints a period every 32MiB and a number every 1024MiB. It prints the byte offset of the first non-match (and then exits) or prints the total number of bytes it checked.

As it happens, my chip worked properly. This means the system produces a stream of over 4 TRILLION bits, then perfectly reproduces those 4 TRILLION bits. That's amazing when you think about it, and is Babbage's dream. The legend goes that after being disillusioned by the number of errors in an almanac, he remarked that he wishes that the tables could be generated by steam power. One of his colleagues said "that is possible", which statement changed the course of Babbage's life. He originally wanted to use the difference engine and analytical engine to literally crank out mathematical and almanac tables, with no human intervention between the initial conditions and the printed page. The machine was intended to perform the calculation, then with the results, automatically make a plate to be used in a printing press. Many years later, the first and simplest such difference engine was finally constructed, and printed the first several integer multiples of the circle constant pi. It made a mistake in the last entry.