Friday, September 9, 2022

The One Picture that explains Phase Locked Loops

A phase-locked loop (PLL) has always been mysterious to me, until now. The following three pictures explain it all, and the third one is where the light goes on. Here are the first two, to build dramatic tension and also because they do the best job I have ever seen of explaining the block diagram of a PLL. They're from Shawn Hymel's series on FPGA programming.


First diagram:

The PLL consists of three sections:
  • The phase detector produces a signal based on whether the reference and fed-back signals are in phase. I'm not sure of the details, but it might be something as simple as comparing both signals to zero (returning 1 if positive and 0 if negative) and then XORing those comparisons (sketched in code below). If the signals are in phase, they will always be on the same side of zero, and the phase detector output will be constant. If they are out of phase, sometimes they will be on opposite sides and the output will not be constant.
  • The low-pass filter takes the phase detection signal, treats it as a PWM signal, and converts it to analog just by running it through a resistor-capacitor (RC) circuit. The output is then an analog signal that is a function of the average of the phase detection signal.
  • The voltage-controlled oscillator (VCO) then takes that signal as an error signal. I'm sure it does some fancy PID magic, which finds just the right output signal to keep the input error signal at zero. It feeds this to the oscillator, which then runs at the commanded frequency.
  • The output is fed back to the phase detector to produce a proper closed-loop control system.
In this diagram, the output of the VCO is significantly out of phase with the reference, *because* it is not the right frequency. It's impossible for two signals of different frequency to stay in phase.
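
Here's a quick software sketch of that guess (in C++ rather than hardware; the signal names and numbers are made up for illustration). Two in-phase signals XOR to a constant 0, while a phase offset makes the XOR output spend part of each cycle at 1:

#include <cmath>
#include <cstdio>

int main() {
  const double PI = 3.141592653589793;
  const double f = 1.0;            // both signals at the same frequency
  const double phase_offset = 0.5; // feedback lags the reference (radians)
  for(int i = 0; i < 8; i++) {
    double t = i * 0.1;
    bool ref = std::sin(2 * PI * f * t) >= 0;                 // sign of reference
    bool fb  = std::sin(2 * PI * f * t + phase_offset) >= 0;  // sign of feedback
    std::printf("t=%.1f ref=%d fb=%d xor=%d\n", t, (int)ref, (int)fb, (int)(ref ^ fb));
  }
}

The bigger the phase offset, the larger the fraction of time the XOR output spends at 1 -- which is exactly the duty cycle the low-pass filter averages into an error voltage.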

Second diagram:

In this diagram, the output phase has locked. The error signal from the phase detector and LPF is zero, and the controller in the VCO knows that whatever settings it is using now are correct, and keeps them there. (Note that in this diagram, the clock is an analog sine wave. Just pretend it's a digital square wave.)

Third diagram, and the critical part:


This shows a clock divider in the feedback path. Digital clock dividers are relatively easy to implement, requiring little more than a counter. To divide by N, make a counter big enough to count to N. On each input clock, increment the counter, but when the counter is about to reach N, reset it instead. If N is even, then it's pretty easy to set up some logic so that whenever the counter is in the first half of its run, a low signal is output, and vice versa (sketched in software below). Odd is a little bit trickier, but still doable.
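
To make the counter logic concrete, here is a software simulation of the even-N case (just a sketch with names of my own invention; the real thing would be a few lines of HDL):

#include <cstdio>

struct ClockDivider {
  int N;      // divide ratio (even)
  int count;  // counts 0 .. N-1
  bool tick() {                  // call once per input clock
    bool out = count >= N / 2;   // low for the first half of the count, high for the second
    count = (count + 1) % N;     // reset instead of ever reaching N
    return out;
  }
};

int main() {
  ClockDivider div{6, 0};        // divide by 6
  for(int i = 0; i < 12; i++) {
    std::printf("%d", div.tick() ? 1 : 0);
  }
  std::printf("\n");             // prints 000111000111: one output cycle per 6 input clocks
}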

Multipliers, on the other hand, are difficult, and in fact are why we need all this fancy PLL stuff to begin with. With a PLL and a *divider* in the feedback path, we can implement a *multiplier*: the loop only locks when the divided-down output matches the reference, so the output must be running at N times the reference frequency.

If you put a divider (by M) on the input reference signal as well, you can get frequency multiplication by any rational factor: f_out = f_ref * N / M.

Tuesday, August 16, 2022

Shipometer

Having given up on #SoME2, it's time to move on to the next project. Next week I will be going on a cruise. On the cruise, and also on the plane, I wish to record GPS signals. The pocketometer has all the right sensors for that, but unfortunately is not sufficiently reliable. The Raspberry Pi has already proven itself capable of recording GPS from one of the ZED-F9R breakout boards. Now the question is whether it can also record the sensors on I2C, and the time pulse.

I want to see if I can do this with just the parts that I already have on my desk.

I have a belt bag big enough to hold all the sensors, the Pi, and a 20Ah USB battery pack. That would be a lot less suspicious than stuffing stuff in my pocket.

It would also be great if the Pi could act as a wifi hotspot and serve SSH over it. That way I could look at it from my phone while (literally) in flight.

The last thing that would be awesome is timer capture on the GPIO, of at least the PPS signal and maybe others, like the interrupt lines from the sensors. If the Pi can't do that, maybe we could get a program on the Teensy that would do the timer capture and output the results over UART or act as an I2C slave.


#SoME2 post-mortem

I did not get a video out on time for #SoME2. Even for just the descoped "good part" video, I couldn't get it done in time this morning.


Thursday, August 11, 2022

Deadline pressure

I ended up going with the Kalman gain video. I will use this blog as "making of" documentation.

  • I am going to use PictureBox much more than Manim -- I have already forgotten how to use Manim. I think I will only need it for dancing equations, and I don't plan on using those much. Matplotlib knows how to use TeX, so it can make nice-looking math, but can't make it dance as well.
  • One video, two videos, N videos? I have about 9 minutes of narration so far, and have just covered up to measurement space. It might take 20 minutes to cover the stuff I want just for linear Kalman filtering. The title KalmanGain implies that the most important or interesting stuff is calculating the gain. That by itself might only take a minute, but all of the prerequisites might take even more than 20 minutes. For now, I am planning on one long video, covering transformation of uncertainty from state space to measurement space and back, and the hand-waving justification for why there even is an optimal gain matrix to transform innovations back into state-vector adjustments.
  • Pure video, or text plus video? This interacts with the above. The hard thing about videos is that the maker always has to leave stuff on the cutting-room floor. This leads to a dilemma: if I include everything I know, the video will be too long and boring, while if I don't, the pedants in the audience will use the omissions as an opportunity to show how much smarter they are than me. Text plus video would allow me to put the most important bits in videos, which could be run as a continuous playlist. The text part would then include the videos, plus footnotes with all the pedantry included.
  • Narration. I hate the sound of my recorded voice. Plus, I don't have a good audio setup yet. Therefore, I am using Amazon Polly to generate narration from a script. I am having neural British Amy read it. This is the first text-to-speech I have heard that I would say is good enough to be plausible as human-read narration. I would say it's 80% of the way there. I just wish I could adjust the emphasis on some words. American Joanna is also good enough, but as an American, I at least subconsciously buy into the "intelligent British accent" stereotype. I know that I write slightly differently for Amy than I would for myself, and a lot differently than I do on this blog.
Which brings us to deadline pressure. SoME2 was announced in the middle of June, but I didn't start on it in earnest until this past Monday (August 8). If I had all the time in the world, I would make a video covering time updating, nonlinear models, and maybe various filters beyond the Extended Kalman filter. It might take an hour, but I would break it up into roughly 20-minute chapters. With the deadline pressure, I am not going to be able to carry out this whole program. I might be limited to just the gain matrix.

I am *not* going to do things like publish one chapter before the #SoME2 deadline and extend a playlist afterwards. I am committing to publishing a complete thought, or nothing at all.

Why is #SoME2 even important? For one thing, I don't expect to win. This might be a winning concept in the right hands, but I am learning that my hands are not those. I am also learning why 3b1b publishes as infrequently as he does. That's the non-reason, so the reason is: I like having large numbers on my YouTube channel. My late entry for last year's #SoME1 got 104,000 views, even though it didn't make the official #SoME1 playlist.

Also, without deadline pressure, I may never actually make this. One of the issues with my projects is that whenever I am working on one, I wish I was working on another. I'm sure that's a common character flaw among humans in general, but I don't know what it's called or how to fight it.


Thursday, June 9, 2022

Decisions, Decisions

Summer of Math Exposition #2 was finally announced. The deadline is August 15, 2022, which fits in quite well with my summer plans -- it's right in between the Florida trip and the other Florida trip.

I could do the next logical step from Exponential beats All. That video was supposed to be a quick explainer for my real video idea: why a rocket doesn't need a heat shield.


Or, I could do a visualization of the Kalman filter. There are of course lots of interesting visualizations to do, but the one I am interested in is showing off the Kalman gain as the gain matrix that gives the best linear unbiased estimate, and demonstrating that picking any other value for the gain necessarily produces a larger covariance.
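
As a sanity check of that claim in code (scalar rather than matrix, with my own notation and made-up numbers): the posterior variance as a function of the gain K is P_post(K) = (1-KH)^2*P + K^2*R, and the Kalman gain K* = PH/(H^2*P + R) sits at its minimum.

#include <cstdio>

int main() {
  double P = 4.0;                          // prior state variance
  double H = 1.0;                          // measurement model
  double R = 1.0;                          // measurement noise variance
  double Kstar = P * H / (H * H * P + R);  // optimal (Kalman) gain = 0.8
  const double dKs[] = {-0.2, -0.1, 0.0, 0.1, 0.2};
  for(double dK : dKs) {
    double K = Kstar + dK;
    double P_post = (1 - K * H) * (1 - K * H) * P + K * K * R;
    std::printf("K=%4.2f  P_post=%.4f\n", K, P_post);
  }
  // P_post bottoms out at K=0.80 (0.8000); every other gain gives a larger variance.
}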

Eventually I plan on doing both, but I am leaning towards the "why doesn't a rocket need a heat shield" video. It's more physics and less math, but "math" is interpreted especially broadly in #SoME2.

Friday, April 29, 2022

A better C++ than C++?

tl;dr -- Is Rust the language that C++ promised but failed to deliver? It's still unclear.

Programmers are adults. We don't need our hands held, and we certainly don't need to be told "Don't do that!" The only kind of advice that is needed is "I see what you are *trying* to do. This might be a better way to do it..."

Case in point: reading data from external sensors. I am still working on the rollercoasterometer (now for 13 glorious years!) and am doing my usual fighting with C++ over getting formatted data out of a byte buffer. Back in the bad old days, we would cast the address of the buffer to a pointer to a struct, and read the data out directly, something like the sketch below. Then the compiler writers, with their fancy-schmancy alignment and such, said not to do that, because there is no guarantee that the structure will line up with what you think. The compiler is free to put any amount of padding between fields, order the fields however it likes, insert invisible fields like a vptr, etc.
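
Something like this, with a hypothetical sensor record (the struct and its fields are made up, not from any actual firmware):

#include <cstdint>

struct SensorReading {   // hoped-for wire layout: 2 bytes + 4 bytes, back to back
  uint16_t id;
  uint32_t value;        // but the compiler may insert 2 bytes of padding before this
};

uint32_t read_value(const uint8_t* buf) {
  // Overlay the struct on the raw buffer and hope the layouts agree.
  const SensorReading* r = reinterpret_cast<const SensorReading*>(buf);
  return r->value;       // padding, alignment, and aliasing rules all make this unportable
}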

That freedom by itself isn't bad. The bad part is when I ask how to do what I want, and the answer comes back: "Don't." There is no portable way to guarantee that any field in a struct lands anywhere. This way we can write C++ targeting the Apollo Guidance Computer, with its 15-bit words, no such concept as a "byte", ones-complement arithmetic with +0 and -0, etc. It doesn't matter that basically every machine in the last 40 years has used twos-complement arithmetic, 8-bit bytes, and a word size of a power of 2 bytes. Basically the only remaining disagreement is endianness.

But C++ won't let me take advantage of the fact that both machines in a transaction have the same native word format. No, I have to individually extract each byte, then shift and OR it together myself, to get the data out. If I am lucky, the compiler will see that I am translating from English to English, and optimize it out.
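
The byte-by-byte dance looks something like this (a sketch for one little-endian 32-bit field; the function name is mine):

#include <cstdint>

uint32_t read_u32_le(const uint8_t* p) {
  // Extract each byte, shift it into place, and OR the pieces together.
  // On a little-endian machine this is translating English to English,
  // and a good optimizer will usually collapse it into a single load.
  return static_cast<uint32_t>(p[0])
       | static_cast<uint32_t>(p[1]) << 8
       | static_cast<uint32_t>(p[2]) << 16
       | static_cast<uint32_t>(p[3]) << 24;
}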

Game designers figured out long ago to learn from their users. If all the Minecraft users are building farms, then support the building of farms. If a farm depends on a glitch, consider formalizing the glitch and making it an official feature. Don't just shower the players with whatever they are farming for "free", but don't take away their ability to farm either. For instance, Mojang has considered on several occasions changing the mechanics of how villagers and iron golems work to discourage farming iron. They got quite a bit of pushback from the community, and have therefore backed off. They don't "support" iron farming, but they haven't removed it either.

The C++ committee on the other hand seems to be driven by two factors:

  1. Backward compatibility indefinitely into the past
  2. Ability of compiler writers to game benchmarks
C++ has a lot of good ideas (some might say too many) but it is a language which has evolved for decades while at the same time hasn't been able to shed old, bad ideas or old, bad implementations of good ideas. 

C++ for whatever reason also has a burning hatred for the preprocessor. The preprocessor and the compiler are trapped in a loveless marriage, and the simmering resentment has boiled over to the point where the official C++ FAQ considers macros to be "evil". Now, the preprocessor does have some real minuses, mainly around type checking. So we were given constants and templates, and told that templates would basically eliminate the need for macros. However, when we tried to use them like that, we found that the promise has not quite been kept. Templates do some, but not all, of the compile-time processing that we want. Constexpr functions are helping, but aren't there yet.

My use case is that I want to make a self-documenting logger. The rollercoasterometer reads a bunch of sensors, formats the data into packets, then writes the packets to a file on the SD card. Since the data from the sensors doesn't naturally come in packets, and doesn't naturally have a timestamp, the main firmware timestamps the data and formats it into packets. Since I write the firmware, I also have to write the code on the host which interprets the packets. I came up with a clever idea: have the program emit a series of packet-documentation packets as it starts up. One way to do it is to have the program create a documentation packet the first time it emits each kind of packet, and I have done this. It looks something like this:

void start_packet(apid, packet_desc_str) {
  if apid is not yet documented:
     write a documentation packet for this packet, using the packet_desc_str
  start the packet, perhaps to a backup buffer if we are documenting the packet
}
void fill<type>(data_value, field_desc_str) {
  if the apid is not yet documented:
    write a documentation packet for this field, using the field_desc_str
  write the field to the packet, perhaps to the backup buffer
}
void finish(apid) {
  if the apid is being documented:
    take note that we have documented it and don't do it next time
  finish the packet and write it, perhaps from the backup buffer
}

Note that this scheme needs two buffers. It would be far better if we could do something like this:

void start_packet(apid,packet_desc_str) {
  compile_time_code {
    create a packet describing this packet in the packet description block. This block will be a read-only blob as far as the runtime code is concerned
  }
  start the packet, no need for backup buffer
}
void fill<type>(data_value,field_desc_str) {
  compile_time_code {
    Add a packet describing this field to the packet description block
  }
  add the field to the packet
}
void finish_packet() {
  finish the packet and write it
}
void setup() {
   open sd card
   write packet description block
   set up sensors etc
}
void loop() {
   read sensor
   start_packet(0x1234,"sensor packet");
   fillu16(value_from_sensor,"value from sensor");
   ...
   finish_packet()
}

Preprocessor macros might be able to do it, but the preprocessor is Evil. Templates might be able to do it, but it might require template metaprogramming, which is actually evil. It seems like there isn't a way to do it in C++, certainly not a clean way. Therefore we are forced to either use the official preprocessor, write our own preprocessor (which has its own headaches), or do it at runtime.

The language which might be a better C++ than C++ is Rust. This is a statically-typed language which is designed to be compiled into good machine code (the reference compiler uses LLVM as its back end) but with some features added and some taken away. I'm not sure yet whether I like the mutability and ownership rules, and I haven't gotten used to the concept of borrowing, but it does look like there is support for forcing a struct to land on certain bytes, and it does look like (with procedural macros) there might be enough compiler support for compile-time computation.

To test this out, I am going to work on three projects:
  1. A conversion of kwantrace to Rust, to experiment with plain application-domain programming.
  2. A packet parser for reading rollercoasterometer logs
  3. Firmware for the rollercoasterometer
The last one is probably not going to be ready in time for my next expedition to a roller coaster.

Tuesday, February 1, 2022

Babbage's Dream

I just got a pair of 512GB MicroSD cards. Of course, any such card should be tested, since these are about the easiest things in the world to counterfeit, and it can be done in a way which normally wouldn't be detected for a while.

For instance, imagine you wanted to fake a 512GB card. If it was inert plastic, it would immediately fail and the user would dispute the transaction. So instead, you make a card with less capacity (say 32GB) and reprogram it to do the following:

  • Report 512GB of capacity
  • Whenever doing a read or write, take the block address given and just mod it with the actual capacity. 
So if you write data to block 0, then to the block at 32GB, the second write would overwrite the first block. This is less than ideal for a storage device... It would work fine until you wrote more than 32GB to it, which might take a while.
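
The firmware-side trick is only a couple of lines (a hypothetical sketch; the names and sizes are made up):

#include <cstdint>

const uint64_t BLOCK_SIZE  = 512;
const uint64_t REAL_BLOCKS = 32ull * 1024 * 1024 * 1024 / BLOCK_SIZE;  // the 32GB that actually exists

// Advertise 512GB, but wrap every block address into the real flash,
// so the block at 32GB silently aliases block 0.
uint64_t map_block(uint64_t requested_block) {
  return requested_block % REAL_BLOCKS;
}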

Having said this, a great many MicroSD cards are counterfeited in this manner. It behooves us, then, to test each card immediately.

You can't just test it by writing zeros, as the card may already have zeros on it. You can't just write a fixed pattern (say 0xAA), because a smart enough counterfeit (and note that MicroSD cards have a full-blown microcontroller with its own firmware) would notice the pattern and just play it back on read without storing it. And you can't just test a small amount of the card, because the card (or host) may have a cache.

So, I have put together a program which writes the output of a cryptographically-secure random number generator (CSPRNG) to stdout. This can be redirected to a file on the SD card to be tested. The chosen CSPRNG is the Keccak-1600 sponge function. We absorb an arbitrary string as the seed, then keep squeezing it forever, or until the device fills up and the write returns an error.

There is no way to beat this, since a CSPRNG by its nature is unpredictable -- there is no detectable pattern unless you know the arbitrary string used as the seed. The internal microcontroller probably could calculate the stream, since Keccak is a well-known standard algorithm, but it doesn't know the seed. It's not reasonable for the counterfeiters to guess that I am going to be checking the chip this way, and to try to guess the seed (hint -- it's close to a substring of the title of this blog).

There is no way to derive the seed from the output of the CSPRNG -- that's one of the properties that makes the PRNG cryptographically secure. The stream might have a period of 2^1600 bits, but I haven't seen a proof or even any good reason to believe that Keccak is full-period when used this way. Even if it is much less, it is almost certainly much greater than the 2^42 bits it takes to fill this device. So there is no way to store or cache this stream without actually having a functional amount of storage equal to the advertised amount.

In my case, I couldn't find the exact routine I wanted, so I wrote my own based on the XKCP package (since it isn't wise to implement a cryptographic primitive yourself).

#include "KeccakSponge.h"

#include "stdio.h"


#define r 576

#define c (1600-r)


int main(int argc, const unsigned char** argv) {

  KeccakWidth1600_SpongeInstance s;

  KeccakWidth1600_SpongeInitialize(&s,r,c);

  char buf[r/8];

  KeccakWidth1600_SpongeAbsorb(&s,argv[1],strlen(argv[1]));

  if(argc==2) {

    for(;;) {

      KeccakWidth1600_SpongeSqueeze(&s,buf,sizeof(buf));

      fwrite(buf,1,sizeof(buf),stdout);

    }

  } else {

    FILE* inf=fopen(argv[2],"rb");

    char inbuf[r/8];

    size_t last_match=0;

    while(!feof(inf)) {

      KeccakWidth1600_SpongeSqueeze(&s,buf,sizeof(buf));

      size_t incount=fread(inbuf,1,sizeof(inbuf),inf);

      for(size_t i=0;i<incount;i++) {

        if(buf[i]!=inbuf[i]) {

          fclose(inf);

          printf("Different at byte %ld\n",last_match);

          return 1;

        }

        last_match++;

        if ((last_match%(1024*1024*32))==0) {

          printf(".");

          if((last_match%(1024*1024*1024))==0) {

    printf("%ld\n",last_match/(1024*1024*1024));

  }

          fflush(stdout);

        }

      }

    }

    fclose(inf);

    printf("All bytes up to %ld matched",last_match);

    return 0;

  }

}

This program takes one or two strings as command-line arguments. The first is the seed. If there is only one argument, the program absorbs the seed as mentioned above, then squeezes the sponge an unlimited number of times, limited only by the device filling up. It writes to stdout, so the way I use it is to pipe it through pv to see how fast it is going, and then redirect it to a file on the device under test. If the card is genuine, this test is non-destructive. It also doesn't disturb the original exFAT filesystem formatting that the card came with -- I have heard that there is a non-obvious optimum way to format these cards, and that they come formatted optimally.

If there are two arguments, the first is still the seed, and the second is a file to check to see if it matches. It does this in the simplest way possible, by absorbing the seed, then going into a loop: squeezing once, reading the same amount from the file, and byte-for-byte checking the blocks. It prints a period every 32MiB and a number every 1024MiB. It prints the byte offset of the first non-match (and then exits) or prints the total number of bytes it checked.

As it happens, my chip worked properly. This means the system produced a stream of over 4 TRILLION bits, then perfectly reproduced those 4 TRILLION bits. That's amazing when you think about it, and it is Babbage's dream. The legend goes that after being disillusioned by the number of errors in an almanac, he remarked that he wished the tables could be generated by steam power. One of his colleagues said "that is possible", which statement changed the course of Babbage's life. He originally wanted to use the difference engine and analytical engine to literally crank out mathematical and almanac tables, with no human intervention between the initial conditions and the printed page. The machine was intended to perform the calculation, then with the results, automatically make a plate to be used in a printing press. Many years later, the first and simplest such difference engine was finally constructed, and printed the first several integer multiples of the circle constant pi. It made a mistake in the last entry.