## Wednesday, April 27, 2011

### Further AVC post-mortem and future plans

"Steve is one of the brightest guys I've ever worked with - brilliant; but when we decided to do a microprocessor on our own, I made two great decisions - I gave them [Steve Furber and Sophie Wilson] two things which National, Intel and Motorola had never given their design teams: the first was no money; the second was no people. The only way they could do it was to keep it really simple." -- Hermann Hauser, from the Gnuarm web site
In a one-man organization, we replace people with time. I got the no-people (time) part right, but not the no money. Towards the end I was burning out (or so I thought) and replacing $50 parts at an alarming rate. So, for next year, a budget of $100 plus the parts I already have. Since those parts include almost an entire robot, I should be fine. Once the car works, another $100 for the helicopter, but not before. I am not sure how to count things like a Dremel mini-drill-press or other hardware like that with purposes beyond building robots, but gotten for building robots.

## Tuesday, April 26, 2011

### Source Code Embedding

I am quite proud of this, and it took me several iterations to get right. Once upon a time I wanted to know how the In-Application Programmer (IAP) worked. This is a bit of code loaded onto the LPC214x by its manufacturer. I wanted to see what special thing it was doing that my program couldn't. So, I wrote a little doodad into Logomatic Kwan which caused it to go through flash memory and dump its contents into the normal log file. This would include the Sparkfun bootloader at the bottom of memory, my own code starting at 0x10000, and the IAP up at 0x7D000. I used $PKWNH packets and simple hex encoding.

I didn't end up doing any serious poking around with the IAP (it still is a mystery) but I was interested in dumping the firmware itself as part of the metadata for any particular run. The problem is that it takes a megabyte and at least a minute of run time before the dump is completed and a normal log run commences. So, I put a switch into the config file to turn it on or off.

Also, the actual bits of the program are interesting but hard to interpret. Better would be the original source code. So, I tweaked the makefile to tack onto the end of FW.SFE a listing (ls -l) of the files used to build it, then all the files concatenated together. Now all the source code is there, but pretty hard to get to, being encoded in hex and quite difficult to parse out. Plus, it was in danger of not fitting into memory.

So, the next idea was to use that same file listing, but instead of doing a simple concatenation, to append a gzipped tar file. This is exactly what tar exists for, and it does a better job than anything I could roll on my own. Mostly it's better because tar/untar utilities already exist in Linux, while I would have to write one for my roll-my-own solution. Plus, the compression saved something like 80% of the space.

Then I (mis)remembered something we did with the AIM/CIPS flight software. We wanted to check whether the software was ever corrupted by things like radiation hits, so it was told to take a CRC of a series of blocks of memory which should have been constant, and transmit the CRC down. The misremembering is that originally I thought it actually transmitted down the whole firmware one packet at a time. Since they are much better at configuration control at NASA than I am at home, they always know what software should be on the spacecraft, so they don't need to send the firmware down. But I do. So, I just had it do the old firmware dump routine, but one packet at a time, once every time a GPS update comes in. (Update: I finally re-remembered. We did have a memory dump command which would act pretty much as described. We said dump this memory bank from this address to that address, and it sent down multiple packets, one per second, with that data until it was finished. We could even dump RAM, so we could dump a chunk of memory that was changing as we dumped it. We could even dump the FPGA registers.)

Finally, I wrote a perl script which extracted the flash dump from a logging run, translated it from hex to binary, and then wrote out four files: Bootloader, User firmware, User firmware source tar file, and IAP.

Unfortunately, at 5 packets per second, the firmware dump now takes 54 minutes to complete.

Linkers are pretty smart, but they need some help to tell them what to do. That is where the linker script comes in. Mine originally descended from the Sparkfun Logomatic firmware (like my whole program), where it is called main_memory_block.ld, but it turned out to have a flaw.

I remember back in the 90s when I was first learning a real language, Turbo Pascal. Don't laugh. One of the features of its compiler/linker was what it called "smart linking" where it would only include code and/or data in the final executable if it had a chance to be used. It eliminated dead code.

GNU ld by default doesn't do this. The linker is a lot simpler and dumber than we think. For instance, the objects that it links are composed of sections and labels. It uses the labels to do the linking, but the sections are trackless expanses of bytes. They might be code, they might be data, they might be unlabeled static functions or read-only data. The linker doesn't know; it can't know.

In order to do a semblance of smart linking, GCC has a pair of features called -ffunction-sections and -fdata-sections. These tell GCC that when it is building an ELF object, it should put each function and each variable into its own section. The code goes into .text.function_name, while zeroed data goes into .bss.variable_name and initialized data goes into .data.variable_name.

The complementary feature in ld is called --gc-sections. The linker script tells the linker where to start, that is, where the true program entry point is. All labels used in the section containing the entry point are live. All labels used in the sections containing those live labels are also live, and so on. Any section which has no live labels is dead, and the linker doesn't put the dead sections into the final executable.

With the much smaller sections provided by -ffunction-sections, the .text section is no longer a trackless waste of bytes. It's probably empty, in fact. All the functions live in their own sections, so the linker can know what is what, and can remove dead code. These complementary features are the GCC implementation of smart linking.

However, when the linker is done garbage-collecting the dead code, the linker script might tell it to bundle all the sections whose names match a pattern together into one section in the output. No one is going to try to link the output into some other program, so this is ok.

The Sparkfun linker script bundled together all the .text.whatever sections into a single .text section in the output, and all the .data.whatever, but not the .bss.whatever. This is important, because the linker creates a label at the beginning and end of the .bss section, and a block of code in Startup.S, the true entry point, fills the memory between the labels with zeros. With all these unbundled .bss sections, the final .bss was very small and did not include all the variables, so some of the variables I expected to be zeroed were not. This is a Bad Thing. Among other things, it meant that my circular buffer of 1024 bytes had its head pointer at byte 1734787890. As the wise man Strong Bad once said, "That is not a small number. That is a big number!"

It turns out this does not have anything to do with C++. I turned on -ffunction-sections to try and reduce the bloat from the system libraries needed in C++, but if I had turned it on in C, the same thing would have happened.

The fix:

Open up main_memory_block.ld. Inside is a section like this:

    /* .bss section which is used for uninitialized (zeroed) data */
    .bss (NOLOAD) :
    {
        __bss_start = . ;
        __bss_start__ = . ;
        *(.bss)
        *(.gnu.linkonce.b*)
        *(COMMON)
        . = ALIGN(4);
    } > RAM

    . = ALIGN(4);
    __bss_end__ = . ;
    PROVIDE (__bss_end = .);

This says, among other things, that in the output file, the .bss section (which gets a new label __bss_start at its beginning) includes all .bss sections from all the input object files. It also creates a new label at the end of the section called, naturally enough, __bss_end. The startup code zeroes out all the RAM between these two labels.

The problem is that *(.bss) only includes the sections named exactly .bss from each object, not the .bss.variable_name sections. So, change the

 *(.bss) 

line to:

 *(.bss .bss.*) 

This says to bundle in not only all the .bss sections from all the inputs, but also all the sections which start with .bss. in their names. Now with them all bundled, the bss zero-er will do the right thing, and the program will work.

## Monday, April 25, 2011

### C++ for Java programmers - part 1

1. You need a constructor, and it needs to initialize every field

When this was in C, I had a nice circular buffer structure. It lived in the .bss (zeroed-out) data section, which was great, because all the pointers were supposed to start at zero.

When this was converted to a C++ class, it became obvious that the fields were uninitialized, and not zeroed out.

I actually suspect that it's not C++, but -fdata-sections. This breaks out every global variable into its own section. Zeroed variables go into sections called .bss._Zmangled_variable_name. This is great for removing dead code, but it means that the startup code is not smart enough to zero all the .bss.* sections.

But: with a constructor which zeroes all the fields, it works.

## Sunday, April 24, 2011

### ARM C++ libraries

Add -fno-exceptions -ffunction-sections -fdata-sections to the compiler command line.

So now we have our main program translated into C++. This mostly involved just cleaning up some warnings and using extern "C" where we needed. Now it's C++, but still not object oriented.

So, we take our four separate but similar sensor handling code blocks and make them all inherit from a common sensor ancestor.

By the way, if you want to learn object-oriented philosophy, I found it much easier to learn in Java. This is mostly because everything is an object, and there are no other ways to do things like there are in C with pointers and especially function pointers. Once you have learned the philosophy, it is easy to apply to any object-oriented language. You just have to learn how to say in your new language what you wanted to say in Java.

So, we write our new sensor base class, our specific accelerometer, gyro, compass, etc, derived classes, get it all to compile, and now it won't link. The linker is whining about not being able to find various vtables.

It turns out that when you start actually using objects, you have to include a new library, libstdc++. This usually means changing your makefile. In my case, my makefile is derived from the one which came from the Sparkfun Logomatic firmware, and the needed switches were already there, just commented out.

CPLUSPLUS_LIB = -lstdc++

This makes the linker pull in everything it needs to support inheritance and polymorphism. Now the vtables exist, and it links! Yay!

But now the final firmware is almost twice the size. Yukari is the code intended to go on the AVC, while Candace is the code intended for the rocketometer.

| Program | Size in hex | Size in decimal |
| --- | --- | --- |
| Yukari - C code only | 0x1a1f8 | 107000 |
| Candace - C++ code | 0x19xxx[1] | ~107000 |
| Candace - Objects | 0x2a764 | 173924 |

[1] I didn't get an exact measurement when it was in C++ with no objects, but it was similar in size to Yukari

Where did all that extra 80k come from? On a Logomatic, it doesn't matter quite so much, but it's still a pain. So, what is all this extra code (and especially data)? Exception handling. In particular I think it is the unwind tables, which include among other things the name of each and every function in your program, along with demangler code, error-message printing code, and stack backtrace code. This is a feature which is wound all through Java, but kind of bolted onto the side of C++. You don't need to use it, and I imagine in an embedded system, you mostly can't use it. I know that I don't.

So we can just turn off the exception system, right? Well, yes and no. First, let's just add that -fno-exceptions flag. It turns out that there was such a flag in my makefile all along. It also has -fno-rtti. I don't know what that does, but we will take these two flags as a unit.

| Flags | Size in hex | Size in decimal |
| --- | --- | --- |
| -fno-exceptions -fno-rtti | 0x2a6c0 | 173760 |

A measly couple of hundred bytes. What if we told it to not generate unwind tables specifically?

| Flags | Size in hex | Size in decimal |
| --- | --- | --- |
| above plus -fno-unwind-tables | 0x2a6c0 | 173760 |

Exactly the same size.

But Arduino uses C++ and doesn't fill memory to overflowing with these tables. What is it doing? Let's see. For compiling, it has -ffunction-sections and -fdata-sections, and to link it has -Wl,--gc-sections. This tells the compiler to put each and every function into its own section in the object file, instead of bundling everything into the .text section, and to put the data for each function, and each global variable, into its own section. Then it tells the linker to garbage-collect sections, that is, to remove sections from the final executable which are never used. Does this work?

| Flags | Size in hex | Size in decimal |
| --- | --- | --- |
| above plus -ffunction-sections -fdata-sections | 0x2aae0 | 174816 |
| above plus -Wl,--gc-sections | 0x17384 | 95108 |

That's got it! Yes, I admit the code is rather heavy, but it includes, among other things, a 3600-entry sine table. It fits, and it doesn't have the gross bloat that the unwind tables added. Since this is smaller than Yukari, there must have been some dead code there that the section garbage collection removed.

The linker provides a convenient map file which tells, among other things, which sections were not included at all. Some of these are things I know were dead code, and some others I know were used. What seems to be happening is that these functions are used only in one module, and the compiler decided to inline every actual use of the function. It also emitted a standalone copy, but the linker saw that nothing used it.

Some strangeness: I had a function with static local variables. That function pulled in symbols called "guard" and "unguard" (these appear to be the __cxa_guard_acquire/__cxa_guard_release helpers GCC emits for thread-safe initialization of local statics), and those pulled in tons of unwind stuff. I pulled the variables out into static constants outside the function, then pulled them back in as static local variables, and then there was no guard and no code bloat.

### C++ really is a different language

When I write code, I tend to think of C++ as a superset of C. Everything I write in C is basically portable to C++ without too much trouble. However, mostly because of name mangling, C and C++ really are different under the hood. Without some help, C and C++ code could not be linked. The C++ compiler mangles every name. It mangles the names of methods so that classes have separate name spaces. It mangles the names of functions so that they can be overloaded. But, since it can't know in advance when a name will be overloaded, it mangles every single name.

So let's say we have a low-level library written in C, say for the filesystem. It has a body in a .c file somewhere, and a .h file describing the interface. This .h file is valid C and valid C++, but it means different things in the two languages.

When you compile the library itself in C, it creates a bunch of functions with names like fat16_open_file, fat16_write, etc. Those names are written into the machine language .o file which is the final output of the compiler.

Now your higher-level program in C++ includes that same header. When it sees a function like:

int fat16_write(struct fat16_root*, char* data, int size)

It quite naturally assumes that the function is really called __QQNRUG_hhqnfat16_write_struct_fat16_rootP_cP_I_sghjc. How could it be otherwise? The C++ compiler then happily writes code which calls this mangled function. When the linker comes along, well, you know the rest.

So how do we fix this? One way is to recompile the lower-level library in C++, but for various reasons that may not be possible or a good idea.

So, we use an interesting capability in C++ to handle code in other languages. I don't know how general this mechanism is, and I think it is used about 99.99% of the time for this purpose.

extern "C" {
#include "fat16.h"
}

What this extern statement does is say that everything inside the braces uses a different language's linkage, not C++'s. In this case, the header is to be treated as C code, and the names are not mangled. The mechanism sounds general, but in practice "C" is about the only other language supported, and what it mostly does is turn off name mangling. It works, and it is needed to allow the adoption of C++ in the presence of so much existing C code.

### ARM C++

Actually, the first thing is to translate it to C++. The sensors are crying out for inheritance. A little of this, a bit of that, ok, it compiles in C++. There are quite a few things that were warnings in C that are errors in C++, and some errors are caught by the linker instead of the compiler. These latter things are mostly caused by function overloading and its companion mechanism, name mangling.

With a clean toolchain, like Java, you can do overloading easily and cleanly. All the tools are designed to support it. However, with C++, only the compiler is C++. It has to translate C++ code with overloaded function names into something that the assembler and linker can understand. If you try to make an object with two symbols with the same name, the next step in the toolchain (in this case, the assembler) will complain. If you try to trick the assembler by putting the two functions in two different objects, then the linker gets confused and complains. So, it is up to the compiler. What happens is that the compiler, when it generates its assembly language output, gives the two overloaded functions different names. It tacks on extra stuff to the name depending on the types of the parameters the function takes (notably, not the return type). Then, when the function is called, the compiler figures out the types of the arguments, and tacks on the appropriate extra stuff to call the correct function. This is called name mangling. It's something that took me a long time to appreciate, and maybe I still don't. I don't like it, but at least I understand it.

The thing is, then, that when translating from C to C++, things like mismatched function prototypes between the header and the body used to be caught by the compiler. Now, the C++ compiler thinks the header describes one function and the mismatched body describes a completely different function. So, it happily compiles the one in the body, for which there is now no declared interface, and when it compiles another file that uses this header, it happily generates code to call a function which doesn't exist. So, you compile all 572 .cpp files and they all go through without an error, but then the linker says something totally opaque like:

cannot find symbol: __ZZZ__KQNNfoo_I_I_JJnncxa

and you are all like: The function is called foo(int,int)! Where did all that other stuff come from? That is the power and mystery of name mangling. The error will be reported in the object that tried to call foo(int,int), but the problem is in the header that defined foo(int,int) and the body which defined foo(int, int, struct bar*). Hopefully you can see the original name under the mangling and know or can find where that foo() function is. Then make the headers and bodies match.

## Saturday, April 23, 2011

### AVC report

And my report on the AVC is that... my robot didn't make it to AVC. If St. Kwan's were an old style Soviet dictatorship, I would deny ever building a robot in the first place, erase this blog, then mock all those who wasted their time in this silly competition in the first place.

What I learned from trying to build a robot in a week is that... it is not possible for me to build a robot in a week. Maybe someone else could, but not me.

Instead, I am going to finish work on the rocketometer, and include a couple of interesting things I thought of but didn't have time to implement during this last hectic couple of weeks.

First is interrupt-driven I2C.

## Friday, April 22, 2011

### It's a long shot - I'm running out of ideas.

If I had another two weeks, I could do this. As is...

I am working on control by GPS course only. I just couldn't get the compass to work. It worked perfectly in the basement when the main wheels were held off the ground, but failed utterly at the park.

GPS has its own flaws, but I am down to my last option. I had to have my friend talk me out of giving up. It may not matter soon anyway.

Once I get control by GPS working, then I need guidance by GPS waypoint and bluetooth. Fortunately I have had a bluetooth attached to this before, all I need is a control protocol based on NMEA, an interpreter in the robot, and a sender in a laptop.

## Tuesday, April 19, 2011

### The hurrieder I go, the behinder I get

On the plus side: The Logomatic is commanding the Arduino which is commanding the ESC and steering. The compass works and the thing knows how to go to magnetic north.

On the minus side: I burned up the gyro. I might not need it, but I wanted this device to be a rocketometer also.

Also: the thing is Too Fast. Odd that, for a car which is supposed to be in a race. I won't be able to keep up with it on foot.

## Saturday, April 16, 2011

### Yukari Chassis - Warranty Voided!

Yukari is built on an Electrix RC 1/10 stadium truck (whatever that is) with a rather simple and hackable electrical system.

The receiver there in the middle is the brain of the operation. It takes 5V power from the ESC battery eliminator, and controls the two motors with two PWM signals running at 71Hz. Note carefully that the receiver draws current from the ESC and supplies it to the servo.

Neutral for steering is 1.54ms, full right is 2.0ms, and full left is 1.1ms.

Neutral for the drive motor is 1.50ms, full forward is 1.0ms, and full reverse is 2.0ms.

The steering signal is around 3.3Vp-p, while the drive is a sudden rise to 3.4V followed by what appears to be an exponential decay from 3.4V to 5V.

My clever plan is to put an engine room board where the receiver used to go.

### "Whoa, fire, fire!"

That was the sound of Test 15, which was intended to be the first free flight. It ended up being the last flight. I was holding the tail as usual, and could see the engine room electronics inside start to spark and smolder. I let the magic smoke out :( Arisa is done for the year.
 H-bridge chip with the magic smoke let out
But I'm not. I am going to see this through to the end. So, with a small change in design, I proudly introduce Yukari!
 Arisa is dead, long live Yukari!

## Friday, April 15, 2011

### Efficiency of the Kalman Filter

So as I said before, my single-loop design doesn't have any hard scheduling limits or 1201 sensitivity like the Apollo computers. However, since my filter is run so much more frequently, it is more important for it to be quick.

So, at first, I just did a textbook implementation of the filter, with no particular optimization. I follow the twin laws of optimization:

Law 1: Don't do it.
Law 2 (for advanced programmers only!): Don't do it yet.

This worked acceptably well with just the 7-element orientation filter. It seemed to run at about 15ms, sometimes hitting its 10ms target, and sometimes missing. There was still about 45% idle time, suggesting that it was really close to 10ms.

But, features gradually accumulated, especially a whole spiderweb of pointers in an attempt to use object-oriented features in a non-object-oriented language. Also, I actually added some control code. This started raising the average time to closer to 20ms.

Finally, I added the position Kalman filter. Without even updating the acceleration sensors at all, the time used jumped to well over 100ms, clearly excessive.

So, I have done an operation count analysis of the filter. Since the filter depends so heavily on matrix multiplication, and since matrix multiplication is an O(m^3) task, we would expect that doubling the size of the state vector would multiply the time by 8, and this seems to be the case. However, 55% of the time in the un-optimized case, and a staggering 81% of the time in a slightly optimized case, is spent on the time update of the covariance. Here's the first form, with just the orientation state vector:

    Operation       mul   add  div  % of mul  % of add  % of div  time usec  % of time
    A=1               0     0         0.00%     0.00%     0.00%      0         0.00%
    A+=A*Phi*dt     392   343        30.27%    30.22%     0.00%   1374.705    30.21%
    x+=F*dt           7     7         0.54%     0.62%     0.00%     26.4096    0.58%
    P=APA'+Q        686   637        52.97%    56.12%     0.00%   2483.908    54.59%
    H=H               0     0         0.00%     0.00%     0.00%      0         0.00%
    HP=H*P           49    42         3.78%     3.70%     0.00%    169.9768    3.74%
    Gamma=HP*H'+R     7     7         0.54%     0.62%     0.00%     26.4096    0.58%
    K=(P/Gamma)H'    98    42    1    7.57%     3.70%   100.00%    255.1912    5.61%
    Z=z-g             0     1         0.00%     0.09%     0.00%      2.1272    0.05%
    X=KZ              7     0         0.54%     0.00%     0.00%     11.5192    0.25%
    x=x+X             0     7         0.00%     0.62%     0.00%     14.8904    0.33%
    P=P-KHP          49    49         3.78%     4.32%     0.00%    184.8672    4.06%
    Total          1295  1135    1  100.00%   100.00%   100.00%   4550.004   100.00%

    time cost     2.057     2.659     5.725
    MHz factor    1.6456    2.1272    4.58
    time usec     2131.052  2414.372  4.58      4550.004

There are a couple of blazingly obvious and not-quite-so-obvious optimizations we can do here. First, in A+=A*Phi*dt, I am multiplying by an identity matrix. That's kinda silly, but still costs m^3 operations. So, we optimize that bit.

Secondly, something has been bugging me for a while and I finally solved it. For Gamma, we need to calculate HPH'. We use HP as an intermediate result later, but we also use PH', and it bugged me to calculate both. Finally, I worked it out. If I keep HP, I am left with K=PH'/Gamma. But since P is symmetric, HP=HP'=(PH')', so we can say that K'=(PH')'/Gamma=HP/Gamma. All we need to do is transpose HP and copy it to the storage for K, multiplying each element by 1/Gamma as we do so.

This brings us to here:

    Operation          mul   add  div  % of mul  % of add  % of div  time usec  % of time
    A=Phi*dt+diag(1)    49     7         5.69%     0.98%     0.00%     95.5248    3.25%
    x+=F*dt              7     7         0.81%     0.98%     0.00%     26.4096    0.90%
    P=APA'+diag(Q)     686   595        79.67%    83.22%     0.00%   2394.566    81.38%
    H=H                  0     0         0.00%     0.00%     0.00%      0         0.00%
    HP=H*P              49    42         5.69%     5.87%     0.00%    169.9768    5.78%
    Gamma=HP*H'+R        7     7         0.81%     0.98%     0.00%     26.4096    0.90%
    K=(HP)'/Gamma        7     0    1    0.81%     0.00%   100.00%     16.0992    0.55%
    Z=z-g                0     1         0.00%     0.14%     0.00%      2.1272    0.07%
    X=KZ                 7     0         0.81%     0.00%     0.00%     11.5192    0.39%
    x=x+X                0     7         0.00%     0.98%     0.00%     14.8904    0.51%
    P=P-KHP             49    49         5.69%     6.85%     0.00%    184.8672    6.28%
    Total              861   715    1  100.00%   100.00%   100.00%   2942.39    100.00%

    time cost     2.057     2.659     5.725
    MHz factor    1.6456    2.1272    4.58
    time usec     1416.862  1520.948  4.58      2942.39

This represents a 35% improvement already. Woohoo.

That time update of covariance is still a problem. What if we just don't do it? You may think I am joking, but this can actually be done in some problems, and is called offline gain calculation. In some problems, the value of P converges on a final answer. When it does so, and if H is also constant, K also converges. What is done in this case is to calculate the final values of P and K from the problem offline, and not compute them in the loop at all.

Unfortunately, this is usually only possible in a linear problem, which I don't think the IMU filter is. I'm certainly not treating it as linear. But hopefully P changes slowly enough that we don't need to do a full update every time. We accumulate A over several time updates, and only use it to update P once in a while. We are already skipping the time update when the time difference is zero, so that 80% chunk is not run as often as one might think anyway. This change just runs the expensive half of the time update even less often.

Also, it seems like there must be a way to optimize the APA' operation. It just has so much symmetry. Maybe there is a way to calculate only one triangle of this symmetric matrix and mirror it into the other.

### Major and Minor Loops

...and why I don't use them.

Back in the days of Apollo, they used a Kalman filter and control loop similar in principle to what I am doing. However, due to the g l a c i a l  s l o w n e s s of the computers of the era, they had to structure their code quite differently. It took about 1 second to update the Kalman filter and calculate all the control coefficients, and it took a bit less than 50% of the available time to run the control loop. This is with an IMU sensor which automatically mechanically handled orientation and integration of acceleration into velocity.

So what they did was split the code into a major and minor loop. The concepts were different, but we can imagine it like this. In the background, a loop running at non-interrupt priority repeatedly ran the Kalman filter. This is the major loop. It took about 1 second to do an update, and was required to finish and start on the next update every two seconds. This sounds generous, but in the mean time there is a timer interrupt going off at something like 20Hz which runs the control loop code. This is the minor loop. If the interrupt fires every 50ms, maybe it takes 20ms to run the control code. There's also lots of other interrupts coming in from various sensors. In one notorious case, the encoder on the rendezvous radar mechanism was jiggling quickly between two bits and sending an interrupt each time, at something like 6400Hz. This used up all the remaining time, and then some, which caused the background loop to be late.

The way their code was structured, they fired off the major loop every 2 seconds, expecting the previous iteration to be done by the time the next one started. With all this extra stuff going on, the background loop never finished, and effectively got left on a stack when another run of the major loop started. Eventually the stack overflowed, the system detected this, reset itself, and cleared all the old major loop iterations off the stack. It's a pretty good example of exception handling, but it generated the infamous 1201 alarm.

The root cause is that the Kalman filter loop had to run in 2 seconds or less, because several constants, such as delta-t, the time between updates, were hard-coded. There was a plan to remove this limitation, so that a new iteration would be kicked off when the old one finished, instead of every two seconds. This new design was implemented, but never flew.

Returning to 2011, we are using the Logomatic, which runs at 703 times the clock frequency and perhaps 1406 times the instruction rate, since the Apollo computers took a minimum of two cycles per instruction, and I suspect most instructions in an ARM7TDMI run at around one per cycle, or maybe better.

Because this system is so much faster, we have the luxury of a simpler design. The system has a main loop, executed continuously (it's inside a for(;;) loop). There is a timer interrupt running at 100Hz, but all it does is set a flag. When the main loop comes back around, it checks if the flag is set, and if so, reads the sensors, runs the Kalman filter, calculates the control values, and sets the motor throttles. All this is done inside of one iteration of the loop. It may happen that all this stuff takes longer than 10ms, in which case the interrupt fires during the main calculation part. But, all this does is set that flag, so that the main loop knows to read the sensors the next time it comes around. All the Kalman code knows how to deal with variable-time loops, so it doesn't matter if the loop takes 10, 20, or 50ms. Of course the system degrades and becomes more susceptible to noise as it slows, but this is a gradual degradation. There is no hard limit like in Apollo, and no split between major and minor loops, either.

## Thursday, April 14, 2011

### Parallel Filters

First: Matrix multiplication is O(n^3), or more particularly O(mnp) when multiplying an mxn by an nxp matrix. This means that doubling the length of the state vector increases the amount of work for the filter by about 8 (a little less, since some of the products are 1xm by mx1 and such). In any case, it is large. Maybe we don't have to pay it.

The only reason I wanted to combine the filters in the first place is so that the rotation part of the filter can take advantage of the acceleration part. Since the acceleration part is not symmetrical (gravity is there also) I figured that the filter would use the acceleration to adjust the orientation, but as it turns out, it doesn't. The acceleration measurement has no effect whatsoever on the orientation. So, no point in keeping them in the same filter.

Actually the two filters do have an effect on each other, and it's not a good one. I put together the two filters into one 16-element superfilter at about 10x the cost of the old orientation-only filter. I didn't feed it any acceleration updates, and yet the acceleration did update. Based on no information whatsoever, the thing thought it was 100m away from the starting point after 1 minute. I have no idea how the gyro updated the acceleration.

So, break the filter in half.

### More crazy ideas

This time for during/after the competition. First, if I think that I deserve a "victory lap" I will program the thing to go hover and spin over the center of the pond during the lap.

Secondly, the "Viking Funeral". When I am done with the robot, I go out to the middle of a big field, and program it to fly straight up. As soon as it sees that it is no longer going up even with full throttle, it cuts all power, and tumbles back to Earth. Very Antimov, if I do say so.

Also, I am now putting acceleration into my IMU Kalman filter. I could split the filter into two pieces, one for the gyro and one for the accelerometer. But, I still have this sneaking suspicion that knowing the current acceleration (including gravity) will help out with the orientation. If I prove it doesn't, I think I save a factor of about 4 by splitting it. The order of the problem goes from O(16^3)=~4096 to O(7^3)+O(9^3)=~(343+729)=~1072.

## Wednesday, April 13, 2011

### Literate Programming

Nothing I am doing on this project is pioneering. Nothing is original. It is just applying known solutions, figured out by other people, to my particular problem. I don't have an issue with this.

However, the reason I am doing it is partly to learn about the problem and its solution, and partly because I really want the rollercoasterometer. When I am done, my solution will exist in the form of a robot, all the code for it, this blog, and my other notes in other places. Somehow this should all be unified.

In Garden of Rama, Arthur C. Clarke mentions the artificial intelligence which controls Rama, making a report to its masters. He describes the report as being in a language which is not really translatable into English, or any human language, given that it is highly precise and has all the data necessary to support each conclusion attached and linked. In order to unify my report on my project, I need something like this language. However, Clarke is quite wrong, and was wrong when he wrote it, as we have such a language. We call it HTML.

So, what I want is some document form that I can run through one filter and get compilable code, and another and get a net of HTML pages. I want to write in wiki markup, to allow inclusion of math, pictures, videos, etc. I want to be able to attach Eagle files and generate schematics and board printouts. I also want a pony.

What are the closest existing solutions? The title gives this away. I want what Knuth described as Literate Programming. I want to be able to read my report and program like I read a good novel or textbook. But, I don't want to change the language. In a program, I want the document to be compilable directly without changes.

I don't want Javadoc or something like it adapted to C. This is really just for documenting at a low level. I want higher level stuff. I want to describe my design without immediate reference to any attached code. I want to have a section explaining the derivation of the various flavors of the Kalman filter, which is totally irrelevant to the program, since the filter is already derived.

I want something like a wiki, but with a blog. Every time I build the project, I want the documentation form directly uploaded to the wiki of my choice.

### This thing might actually work...

As I said before, a minor protocol change in the I2C link between the bridge and engine room fixed the little "no B rotor spinning" problem. The bad news is that I got all the parts I needed to build a new engine room, when I don't need them.

I have completed decorating the vehicle (except for signing it, that comes when it's done, probably on AVC day) by putting the official Kwan Systems roundel on the tail fin. I also glued the onboard camera there. I kept looking for a place to stick the camera in the cockpit, and nothing really worked well. So, I attached it to the tail fin. This is a full two feet behind the main body, to give a chase plane feel without a chase plane.

The video below shows the first onboard video taken from this unique viewpoint, during a test of the yaw control on the aircraft.

I need a control loop with a large derivative term on my mood, to control these oscillations. When it works, I feel great; every time I start it up, I scare myself; and when there is any tiny bug, my mood crashes.

## Tuesday, April 12, 2011

### Control Implementation

1) Do a bunch of refactoring to get the Kalman filter nice and portable.
2) Write a control loop structure which holds two Kalman filters and the Kp, Ki, and Kd.
3) Ignore the Kalman stuff to begin with. Just do a P controller on yaw.
4) Integrate the motor control stuff into Arisa.
5) Put Kyoko on the engine room board
6) Test the motor control stuff open-loop.

After quite some effort, I finally got Arisa and Kyoko talking to each other again. I had to cut all the traces on the engine room board for the SPI bus, turn I2C down to 200kHz (it's not really running that fast anyway) and put a couple of software patches into both programs, but Kyoko is now spinning the tail rotor on command from Arisa.

The bad news is that it is only spinning main rotor A. I think I probably damaged the connection to main rotor B when I was cutting traces. If this is it, I am going to build another engine room board. I am also planning on building a safety circuit to cut power to the engine room without an explicit command from the Arduino.

But, first things first. Get rotor B spinning, and the yaw control loop working. AVC looms.

Update: Good news! It is fixable in software. The problem is that the Arduino doesn't respond properly to two consecutive commands. So, we will just put all the motor commands into one packet.

## Monday, April 11, 2011

### Control Practice

To do this control loop, we will have a function that is passed in the commanded value of the state and the current value. It will maintain two Kalman filters, one for the commanded value and one for the state error. The one for the control value is to calculate the rate of change of the commanded value, and the one for the state error will track the current error as well as its integral and derivative. It will also have a vector of Kp, Ki, and Kd, the proportionality coefficients for all the control terms.

With the command Kalman filter, the captain is free to suddenly change the command in a step manner from one value to another. Imagine what would happen in the landing routine when the commanded pitch is set to zero and the commanded altitude is suddenly set from five feet to zero. The proportional error would become very large instantly, and the controller would likely cut the power completely, leaving the aircraft to fall out of the sky. The controller plans to correct this once the vehicle has passed through zero altitude, but I think that the parking lot may have something to say about that as well.

With the Kalman filter on the commanded altitude, the command received by the control loop is not allowed to change instantly. The command will change smoothly, allowing the controller time to track it, and the derivative term will not act on the rate of change of the state variable, but the error between that and the commanded rate.

There are two Kalman filters because they are independent, and we don't want to pay the n^3 penalty. We will use the extended form of the velocity filter, because we already have code to drive that. For the state variable error, we will use the following state:

x=<P,I,D>

with a physics function

F(x)=<D,P,0>

and a predicted observation function

g(x)=P

with the actual observation being actual minus commanded. The other filter is just a plain Kalman velocity filter driven by the commanded value.

So, the control function is handed a commanded value, an actual value, runs these Kalman filters, then figures the control value u=Kp*P+Ki*I+Kd*D. This control value can then be used to drive the motors. The altitude control loop generates a main rotor average throttle Ma, and the yaw loop generates a value Md. The tail rotor generates Mt. The super-controller which calls the control functions then commands M1=Ma+Md, M2=Ma-Md, and MT=Mt.
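A hypothetical sketch of that last step, with the Kalman bookkeeping elided (all names here -- `ControlLoop`, `control_step`, `mix` -- are illustrative, not the actual Arisa code):

```c
/* Per-loop state: the gains, plus the P, I, D values which in the real
   design come out of the state-error Kalman filter. */
typedef struct {
    double Kp, Ki, Kd;   /* proportionality coefficients */
    double P, I, D;      /* current error, its integral, its derivative */
} ControlLoop;

/* The control value u = Kp*P + Ki*I + Kd*D described above. */
double control_step(const ControlLoop *c) {
    return c->Kp * c->P + c->Ki * c->I + c->Kd * c->D;
}

/* The super-controller mixes the three loop outputs into motor commands:
   altitude -> main rotor average Ma, yaw -> main rotor differential Md,
   pitch -> tail rotor Mt. */
void mix(double Ma, double Md, double Mt,
         double *M1, double *M2, double *MT) {
    *M1 = Ma + Md;
    *M2 = Ma - Md;
    *MT = Mt;
}
```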

If it's so simple, why haven't I implemented it yet?

### Control Theory

Error
Anyway, this control loop is given by navigation a current state element, and by guidance a desired value for that state element. The control loop then figures out the correct setting for the control that acts on that state element. To make it concrete, let's think about pitch. The guidance program knows that to maintain a certain speed, a certain pitch is required. It calculates that pitch (actually the sine of that pitch) and passes it as a commanded value to the control loop. The navigation program knows the current quaternion. It transforms a vector pointing towards the nose in the body frame into the inertial frame. The vertical component of this vector is the sine of the pitch. It will be positive if the nose is up, and negative conversely. This is the actual value. The job of the control system is to make the commanded and actual values match. When the values match, the error value is zero. When they are different, the error value is the difference between the two. So, to make them match, the control system tries to drive the error to zero.

Stability
Some systems are naturally stable. The classic stable system is a marble in a bowl. "An airplane wants to fly." Left to itself, it will largely fly straight and level. Any small disturbance like a gust of wind may knock the plane off of straight and level, but it will naturally correct itself. The aerodynamics designers work long and hard to give an airframe this stability, but once it's there, the plane is stable largely without any effort by the pilot, who is then free to navigate and guide the plane. The airframe itself acts as the control system.

Some systems are unstable. The classic unstable system is a marble on a dome or a pencil balanced on its point. Any small disturbance in this system will be amplified, with the system falling farther and farther off balance. The Wright Flyer I was unstable, intentionally, and required a well-trained pilot as the control system to keep it nose into the wind. Anyone who has balanced a broomstick on end, or ridden a bicycle, has been a control system for an unstable system.

Some systems are neutrally stable. The classic example is a marble on a flat table top. If it is disturbed, it will not depart from its original condition, but will not return either. Steering in a car is neutrally stable.

One way to think about a control system is as a modification to a system to make it stable, and in particular stable around the desired value. You can think of it this way, that the control system is like a bunch of springs attached to the marble surrounding the desired point. These springs are strong enough to push the marble back to the desired point and overwhelm any instability in the system. The new system is stable, and stable at the point you choose.

The PID controller

The helicopter I have is stable in roll, neutrally stable in yaw and altitude, and unstable in pitch. There is no control for roll, and since it is stable in roll, none is necessary. The system will then run three independent control loops for main rotor average thrust to control altitude, main rotor differential torque to control yaw and tail rotor thrust to control pitch. These three variables are largely decoupled, and so can be handled independently.

The controller will be a PID controller, standing for Proportional, Integral, and Derivative. The control loop will look at the difference between the actual and commanded state, and generate a control signal proportional to the error. It will look at the total integrated error over the past, and generate a control signal proportional to this integral. It will look at the current rate of change of the error, and generate a signal proportional to this differential. It will then add them all up and use that control signal to drive the correct control.

The proportional part is obvious. If there is an error, you want to correct it, and drive the error back to zero. So, the proportional part is just the current value of the error multiplied by a coefficient. If the error is positive, you generally want to push negative, to push the error down, and conversely for negative error, so this proportional coefficient is negative.

The derivative part is there to keep the system from oscillating. If you have ever worked with differential equations, you have probably learned all about simple and damped oscillators. A simple oscillator is like a weight on a spring, or a pendulum. The restoring force is always proportional to the distance from equilibrium, so it is like the proportional control term. However, a system with just a proportional term will just keep bouncing around, always oscillating around the desired point with the same amplitude and frequency forever. To prevent this, we have a damping term. We drive the control system opposite to the direction of motion of the system, proportional to the speed. With enough of this term, any oscillations, either natural or driven by the proportional system, are damped out to zero.

The integral part is there for two reasons. The first is this: The proportional and derivative terms are somewhat at crossed purposes. The proportional part is trying to move the system back to the control value, while the derivative part is trying to stop the system completely. These two parts could balance out, leaving the system in equilibrium but not at the chosen point. The integral term is proportional to the integrated error in the system. Even if the error is constant, because the proportional and derivative terms have balanced out, the integral of the error will get larger and larger. So we attach a control to this integral error, proportional and opposite as usual. This way, a constant error will lead to a larger and larger integral term, which will eventually break the deadlock between P and D.

Secondly, if a system is not just unstable but actively pushed, a pure proportional controller will not be able to drive the system to zero error. Imagine a ball on an inclined plane. There is a disturbing force proportional to gravity on the ball. When the ball is in position, the P and D terms are zero, there is no control force, and the disturbing force pushes the ball away. The system will come to rest when the proportional term and disturbing term are equal, but this can only happen when the error is nonzero, since the proportional term is proportional to the error. The system will come to rest at a nonzero error. Wikipedia calls this phenomenon droop. Droop is bad.

In both cases, the integral term exists to destroy nonzero equilibrium error. It will grow larger and larger as needed. It can't be bargained with. It can't be reasoned with. It doesn't feel pity, or remorse, or fear. And it absolutely will not stop, ever, until the error is dead.
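The droop argument can be checked with a toy simulation: a unit mass pushed by a constant disturbance, controlled by a discrete PID loop. With Ki=0 the mass settles at error disturb/Kp; with Ki>0 the integral winds up until the error is gone. All numbers here are invented for illustration; this is a sketch, not the flight code:

```c
/* Simulate a unit mass on a slope: a constant disturbing acceleration
   pushes it away from the commanded position (zero), and a PID controller
   pushes back. Returns the position (= steady-state error) after a long
   settling time. The derivative of the error is -vel, since the command
   is constant. Gains, disturbance, and step count are made up. */
double settle_error(double Kp, double Ki, double Kd) {
    const double disturb = 1.0;        /* constant push, like gravity along the slope */
    const double dt = 0.01;
    double pos = 0.0, vel = 0.0, integral = 0.0;
    for (int i = 0; i < 200000; i++) {
        double err = 0.0 - pos;        /* commanded position is 0 */
        integral += err * dt;
        double u = Kp * err + Ki * integral + Kd * (-vel);
        double acc = disturb + u;      /* disturbance plus control force */
        vel += acc * dt;               /* semi-implicit Euler integration */
        pos += vel * dt;
    }
    return pos;
}
```

With Kp=10, Kd=5, Ki=0 the mass comes to rest at pos = disturb/Kp = 0.1, which is exactly the droop; adding Ki=2 drives the rest position to zero.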

In our case, the pitch is unstable, because the center of gravity of the vehicle is not under the main rotors. If it were, pitch would be neutrally stable. This could in principle be achieved by shifting the components, or by adding ballast weight. But it would have to be totally accurate, and I don't think I can achieve that. So, PID it is.

### Great New Ideas! (Or not...)

So I had this great idea to unify all the sensor code into one great whole. The ideal way to do it is with C++, but I didn't want to convert everything at this late date, so my great idea was to use function pointers and fake objects.

Ten hours later, it still isn't working. :(

When I get it back, I need to quit screwing around and actually write a control loop. Yaw control at less than full thrust would be great. I also need to put the IMU as a plain passenger on the helicopter and see how noisy the chopper is.

## Sunday, April 10, 2011

### Pictures!

In Denver, all public projects are required to spend "1% for the arts" or in other words 1% of the budget is required to be spent by artists on art.

I believe in a similar principle for my projects: putting a little bit of polish, a little bit of art, into everything. One of the explicit design goals is to have the polish of an iPod. Not a wire out of place. I won't quite achieve that, but I think I have done well. So, I have decorated the hull of my helicopter thusly:

For comparison, here is a before image:

## Friday, April 8, 2011

### Matrices

Idea from Van Sickle's Matrix page but re-thought to make sure the convention matches.

Our IMU filter tracks a quaternion. This quaternion is interpreted as one which transforms a vector from body to inertial coordinates. For instance, suppose the quaternion is <e> and we have a measurement <Ai> of the direction from the vehicle to some waypoint, in inertial coordinates. To convert that vector to body coordinates <Ab>, we use the quaternion from the state vector like this:

<Ab>=<e~><Ai><e>

Or, you could use a matrix

<Ab>=[E]<Ai>.

To convert back the other way, we do either
<Ai>=<e><Ab><e~>

or

<Ai>=[E']<Ab>

To form matrix [E], we follow the derivation in Aircraft.pdf. The vector part of quaternion <?> we will call <?.v> and it will naturally be a column vector. The row vector form will be the transpose <?.v'> . The scalar part we will call ?.w . All of these operations are conventional matrix multiplication, addition, and scaling a matrix. The cross-product matrix [?.c] is [0,-?.z,?.y;?.z,0,-?.x;-?.y,?.x,0] and is the matrix such that <?> cross <@> is always equal to [?.c]<@> for any vector <@>.

[E]=2*<e.v>*<e.v'>+(e.w^2-<e.v'>*<e.v>)*[1]-2*e.w*[e.c]

Anyway, this is all just convention stuff. If we know the quaternion, we can find the matrix.
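Written out as code, the formula for [E] becomes a small routine. This is a sketch: `quat_to_matrix` is a made-up name, and the quaternion is assumed to be unit length so that [E] is a pure rotation.

```c
/* Build the 3x3 matrix [E] from the quaternion <e> = (w, x, y, z) using
   [E] = 2<e.v><e.v'> + (e.w^2 - <e.v'><e.v>)[1] - 2 e.w [e.c].
   [E] then transforms inertial coordinates to body: <Ab> = [E]<Ai>. */
void quat_to_matrix(double w, double x, double y, double z, double E[3][3]) {
    double v[3] = {x, y, z};
    double vv = x * x + y * y + z * z;    /* <e.v'><e.v>, a scalar */
    /* cross-product matrix [e.c] */
    double c[3][3] = {{ 0, -z,  y},
                      { z,  0, -x},
                      {-y,  x,  0}};
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            E[i][j] = 2.0 * v[i] * v[j]               /* 2<e.v><e.v'> */
                    + (i == j ? w * w - vv : 0.0)      /* (e.w^2 - v'v)[1] */
                    - 2.0 * w * c[i][j];               /* -2 e.w [e.c] */
}
```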

Now the best way I have heard to visualize a matrix is like this. The axes of the reference frame point east (x), north (y), and up (z). From the point of view of the vehicle, in the body reference frame, east is in the direction indicated by the first column, north by the second column, and up by the third. Conversely, in the inertial frame, the first row is the body x axis, the second row is y, and the third row is z.

From the calibration experiments, we know the axes of the wooden box in the gyroscope sensor frame. Since we were rotating the box around an axis parallel to one of the box axes (that's what we used the bubble level to verify) we know the rotation axis, and therefore each box axis, in the sensor body frame - it is directly the vector reported by the gyroscope readout. For instance, when we rotate around +z, we get the lion's share of rotation in the z sensor, but also a bit in x and y. If you put those measurements together into a vector, that vector points along the rotation axis, and its length is the speed of rotation.

So, put this together with what we said before. The reference Z axis is measured in the body frame, so that is the third column of the matrix from the gyro sensor axis to the body axis. Likewise what we measured around X and Y are the first and second columns respectively. So from the calibration measurements, we can measure the orientation of the gyro sensor relative to the box. We can easily do the same thing with the acceleration measurements. The box vectors, in sensor space, are the columns of the matrix which transforms from box to sensor. So, the inverse (not necessarily the transpose, because the sensors might not be orthogonal) transforms from sensor to box.

So the calibration process is to stimulate each box axis, by spinning it for the gyro or setting it facing up for the accelerometer. Suck the data into a spreadsheet, and average and normalize things for each axis, then build the matrix. In my case, in the spreadsheet, I have the reference axes on rows, so suck those into matlab and transpose it, then take the inverse and use those for coefficients in the calibration onboard.
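A sketch of the last step: the measured box-axis responses form the columns of the box-to-sensor matrix, and the sensor-to-box matrix is its inverse. Since the matrix is only 3x3, the adjugate formula is enough; function names are illustrative, and the real numbers come from the spreadsheet averages.

```c
#include <math.h>

/* Determinant of a 3x3 matrix by cofactor expansion along the first row. */
double det3(const double M[3][3]) {
    return M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
         - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
         + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]);
}

/* General 3x3 inverse via the adjugate: Inv = adj(M)/det(M).
   Needed because the sensor axes may not be orthogonal, so the
   transpose is not good enough. Returns 0 on success, -1 if singular. */
int inv3(const double M[3][3], double Inv[3][3]) {
    double d = det3(M);
    if (fabs(d) < 1e-12) return -1;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++) {
            /* Inv[i][j] is the cofactor of M[j][i] (transposed for the
               adjugate); the cyclic index trick bakes in the sign. */
            int r0 = (j + 1) % 3, r1 = (j + 2) % 3;
            int c0 = (i + 1) % 3, c1 = (i + 2) % 3;
            Inv[i][j] = (M[r0][c0] * M[r1][c1] - M[r0][c1] * M[r1][c0]) / d;
        }
    return 0;
}
```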

## Thursday, April 7, 2011

### Floating Point Math

The old AGC, the computer used in the Apollo program to land the Lunar Module, ran at a blazing 80kHz and was able to run a Kalman filter. It had no floating point support at all, so it used extensive fixed point math, and an interpreter to do things like matrix multiplication. I instead use an LPC2148, which runs at a mere 60MHz, or about 700 times faster. I should be able to run a guidance program with that.

For a long time, nothing I wrote on the Logomatic used floating point numbers. There was no need - everything was integers and addresses, integers and addresses all day long. Then I translated the Kalman filter to it, and everything changed. I figured I would need some fixed point math, and all the bookkeeping headaches that go with that. However, as a first attempt, I just used float to see what would happen. What happened was that my processor utilization climbed from 30% just reading and recording the sensors to about 50% with the gyro filter in place. This might work.

I designed my own matrix code, with just enough functionality to do the filter. No matrix inverse, especially. No in-place multiplication, so lots of scratch space needed. Inline code where I thought it could help. Operations which handle cases where I want to do a matrix times a transpose of a matrix, and operations where the result is known to be scalar.

So, I now have gobs of floating point. In the gyro routine alone, I have a temperature offset corrector (three polynomials, one for each axis). I have the filter itself, all running full-bore floating point matrix math.

Then I come across this. It is the timings for a "fast" math library, that apparently handles transcendentals well, but is worse than GCC for just adding and multiplying. These benchmarks are for a 48MHz ARM7TDMI, the AT91SAM7. All these timings are in microseconds, so smaller is better.

ARM7: AT91SAM7X256-EK, 48 MHz, Internal Flash, GCC v4.4.1
| Function | GoFast (double) | GNU (double) | GoFast (single) | GNU (single) |
|---|---|---|---|---|
| add | 4.822 | 3.806 | 3.806 | 2.659 |
| subtract | 5.074 | 3.799 | 3.779 | 2.814 |
| multiply | 4.674 | 3.334 | 3.008 | 2.057 |
| divide | 32.438 | 22.356 | 16.650 | 5.725 |
| sqrt | 63.384 | 50.835 | 33.136 | 17.603 |
| cmp | 2.843 | 1.821 | 2.152 | 1.533 |
| fp to long | 1.949 | 1.418 | 1.528 | 1.294 |
| fp to ulong | 1.892 | 1.184 | 1.470 | 1.090 |
| long to fp | 2.725 | 2.742 | 2.454 | 2.188 |
| ulong to fp | 2.329 | 2.704 | 1.941 | 2.264 |

I can expect a 60MHz core of the same design to go 25% faster (take 80% of the time) as shown here. First, the "fast" math library is slower than GCC for the operations that the Kalman filter actually uses. Second, sqrt doesn't take very long either. Third, multiply takes a lot less time than divide, especially for double precision. Fourth, double precision takes about 30% longer on average, except for divide.

Fifth and most important, double precision is an option if I feel like I need it. This is the first project I've worked on where I explicitly chose float as my own design decision, rather than to maintain compatibility with another piece of code. On any desktop built since the 486, floating point hardware has been standard, and all math is done at greater than double precision once it is on the x87, so you only pay in space, not time, for using double precision.

With a state vector of 7 for orientation only, the scratch space alone for the Kalman filter takes 1344 bytes. No extra data space is needed for more observations, just code. Cranking the state vector up to 13 (6 for translation and 7 for orientation) and double precision uses up 8736 bytes, almost 1/4 of the 32kiB available.

Have I mentioned how much I love the LPC today? I am at 187 of 500kiB of flash and 19 of 40 kiB (32kiB easily usable), so I am still using under 50% of the available space.

## Tuesday, April 5, 2011

### The ITG-3200

I have finally gotten some excellent calibration data out, and have determined one important fact. The ITG-3200 is a fine, fine sensor. Once the sensor is calibrated, with no noise filtering at all, it has an almost un-measurable drift over 17 minutes. It is good enough that you can just straight up integrate the body rates.

The ITG-3200 is available from Sparkfun, of course. Highly recommended for all your IMU needs.

The next task is to determine the relative orientation of the sensors to each other and to the frame of the Bridge board. This actually seems like it is quite doable. First, set up the turntable very carefully, using a circular bubble level to verify the spin axis is parallel to local gravity. I used about 10 sheets of paper to shim it.

Also, we are going to use the sides of the box on the turntable to make sure that the IMU box is always in the same orientation relative to the turntable.

Next, we do a calibration dance. The accelerometer measurement will be along local gravity, which we have made sure to make parallel to the IMU box frame. Any misalignment between local gravity and the accelerometer axes is attributable to misorientation of the sensor. When spinning, the vector body rotation will also be parallel to local gravity and the IMU frame, so we can identify its misorientation as well. Finally, I don't think that compass orientation matters that much, and I don't think that it is observable. Maybe by looking at the plane the measurement describes as the box is spun...

## Monday, April 4, 2011

### The Importance of Bookkeeping

I may have mentioned this before, but there are about 1000 w's, x's, y's, and z's in the formulas for the Kalman filter, along with all their +'s and -'s. I found a feature of Matlab: symbolic computation (sorry, it doesn't seem to be available in Octave). Actually I have known about it for years, I just didn't want to use it for this project.

Anyway, I was having trouble with the filter responding to real data. I was trying to reduce the data from the calibration dance, and at the same time debug my implementation of the filter. The nasty thing is that there isn't much in the way of debugging that can be done. All you can do is re-double-check all the signs you have already re-double-checked before. In this case I was doing it with the Matlab symbolic routines, to check against the semi-manual, semi-Alpha way I had done it before. I found one (only!) problem in the signs, but it still wasn't working.

Then I found it in the measurement covariance. To follow proper form, I needed a square matrix of the sensor covariance with itself. However, I am treating the components of the measurement error as uncorrelated with each other, so it is just a diagonal matrix. So, I did something like

R=diag([4,4,4]).^2;

which makes perfect sense, until I start pulling elements out. When I called the code to get element 1, I did

R(1)

which works fine, but then for element 2 and 3 I did something like

R(2)

Do you see the problem? It took me all day to see it. It should be

R(2,2)

to get the element on the diagonal I wanted. Instead, it got one of the zero off-diagonal elements, which said that the noise on this measurement was zero. I accidentally told the filter that the measurement was perfect, and that it was to believe the measurement with all its heart and soul, and do whatever was necessary with the state to match the measurement.

What I did instead was take out diag() and therefore leave R as a vector representing the diagonal of the matrix.

With this, my IMU sitting still thinks that it is sitting still. Yay! Now let's see about rotating...

## Saturday, April 2, 2011

### Sensor Fusion

It's time to stop dithering around and actually do something: write up a Sensor Fusion module.

As discussed before, sensor fusion is using two different kinds of sensors to make a measurement. Perhaps both kinds can do it independently and we just want to cross-check. Perhaps neither can do it by itself but the combination can. Perhaps one sensor can do it in theory, but any slight perturbation will screw it up in practice.

Let's get down to concrete details. My IMU has a three-axis rotation sensor, commonly called a gyro even though there are no spinning parts in it. It also contains a three-axis magnetic sensor, which I will call a compass. The compass is great for absolute measurements, but by itself cannot determine the pointing of the IMU. To completely determine the orientation of an object from outside references like the compass, you need two different reference vectors. I have a daydream about using something like an FM radio with three orthogonal loop antennas as another reference vector, but this is not practical. So, only one vector. You can tell that the IMU is pointing some part towards that vector, but it could be rolled at any angle around that vector.

The gyro by itself can in principle completely determine the orientation, if the initial orientation is known. However, because it integrates, if there is any tiny offset in the zero setting, the orientation will degrade at a linear rate, proportional to the zero offset. This is why I very carefully calibrated the gyros against temperature, but I still don't think it's enough.

However, the two together back up each other's weak points. The gyro is accurate in a relative sense, but has no absolute means to make sure it doesn't go wandering off. The compass is incomplete, but is nailed down to the external frame. Together, they can conquer the world! Or at least orient the IMU.

Skip the explanation of quaternion math. Go look up on the net for that. I may eventually write it myself, but today I am building.

State Vector:

The state vector is the four components of the quaternion <e> (equivalent to a position vector) and the three components of the body rate sensor measurement <ω> (equivalent to velocity).

<x>=<e.w,e.x,e.y,e.z,ω.x,ω.y,ω.z>

Physics function:

The physics function here is actually less physics than kinematics. You will notice no mention of moment of inertia or anything like that. Just how you integrate a body rate measurement into a quaternion.

A note on notation and convention first. A quaternion <e> is shown as a vector because it is four related components. The quaternion conjugate is shown as <?~> for any quaternion <?>. This quaternion, when used properly, transforms a point in the inertial reference frame into one in the body reference frame:

<v_b>=<e~><v_i><e>

Conversely, we can transform a vector in the body frame to one in the inertial frame with:

<v_i>=<e><v_b><e~>

where these multiplications are conventional quaternion multiplications, and the scalar part of the pure vectors <v_?> are zero.

If you know the rotation rate of a body over time, you can integrate this to get the orientation over time, starting with some initial condition.

d<e>/dt=<e><ω>/2

where this multiplication is just as shown, not a vector transform, just a single quaternion multiplication. The vector <ω> is the body-frame rotation speed, measured in radians/sec. By components, we get:

F(<x>)=<F.ew,F.ex,F.ey,F.ez,0,0,0>
F.ew=de.w/dt=(-e.x ω.x-e.y ω.y -e.z ω.z)/2
F.ex=de.x/dt=(e.w ω.x-e.z ω.y +e.y ω.z)/2
F.ey=de.y/dt=(e.z ω.x+e.w ω.y -e.x ω.z)/2
F.ez=de.z/dt=(-e.y ω.x+e.x ω.y +e.w ω.z)/2
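The four component equations translate directly into code. A sketch; `quat_rate` and the argument layout (e = {w, x, y, z}, w = body rates in rad/s) are my own made-up convention, not names from the actual filter:

```c
/* The kinematic step d<e>/dt = <e><ω>/2, written out by components
   exactly as in the four lines above. */
void quat_rate(const double e[4], const double w[3], double edot[4]) {
    edot[0] = (-e[1] * w[0] - e[2] * w[1] - e[3] * w[2]) / 2.0;  /* F.ew */
    edot[1] = ( e[0] * w[0] - e[3] * w[1] + e[2] * w[2]) / 2.0;  /* F.ex */
    edot[2] = ( e[3] * w[0] + e[0] * w[1] - e[1] * w[2]) / 2.0;  /* F.ey */
    edot[3] = (-e[2] * w[0] + e[1] * w[1] + e[0] * w[2]) / 2.0;  /* F.ez */
}
```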

The physics matrix [Φ] is 7x7, but since the last three elements of F are zero, so are the last three rows of the matrix. Alpha reminded me that these are easy; it's just bookkeeping:

Phi[0,0]=dF.ew/de.w =0
Phi[0,1]=dF.ew/de.x =-ω.x/2
Phi[0,2]=dF.ew/de.y =-ω.y/2
Phi[0,3]=dF.ew/de.z =-ω.z/2
Phi[0,4]=dF.ew/dω.x =-e.x/2
Phi[0,5]=dF.ew/dω.y =-e.y/2
Phi[0,6]=dF.ew/dω.z =-e.z/2

Phi[1,0]=dF.ex/de.w =+ω.x/2
Phi[1,1]=dF.ex/de.x =0
Phi[1,2]=dF.ex/de.y =+ω.z/2
Phi[1,3]=dF.ex/de.z =-ω.y/2
Phi[1,4]=dF.ex/dω.x =+e.w/2
Phi[1,5]=dF.ex/dω.y =-e.z/2
Phi[1,6]=dF.ex/dω.z =+e.y/2

Phi[2,0]=dF.ey/de.w =+ω.y/2
Phi[2,1]=dF.ey/de.x =-ω.z/2
Phi[2,2]=dF.ey/de.y =0
Phi[2,3]=dF.ey/de.z =+ω.x/2
Phi[2,4]=dF.ey/dω.x =+e.z/2
Phi[2,5]=dF.ey/dω.y =+e.w/2
Phi[2,6]=dF.ey/dω.z =-e.x/2

Phi[3,0]=dF.ez/de.w =+ω.z/2
Phi[3,1]=dF.ez/de.x =+ω.y/2
Phi[3,2]=dF.ez/de.y =-ω.x/2
Phi[3,3]=dF.ez/de.z =0
Phi[3,4]=dF.ez/dω.x =-e.y/2
Phi[3,5]=dF.ez/dω.y =+e.x/2
Phi[3,6]=dF.ez/dω.z =+e.w/2
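The bookkeeping above can be put into code directly (Python for clarity; the name is mine):

```python
def physics_jacobian(x):
    """7x7 Jacobian [Phi] of the physics function, evaluated at state x.

    Rows 4-6 are all zero because the body rates are modeled as constant.
    """
    ew, ex, ey, ez, wx, wy, wz = x
    h = 0.5  # every entry carries the factor of 1/2 from d<e>/dt = <e><w>/2
    Phi = [[0.0]*7 for _ in range(7)]
    Phi[0] = [0.0,  -wx*h, -wy*h, -wz*h, -ex*h, -ey*h, -ez*h]
    Phi[1] = [ wx*h, 0.0,   wz*h, -wy*h,  ew*h, -ez*h,  ey*h]
    Phi[2] = [ wy*h, -wz*h, 0.0,   wx*h,  ez*h,  ew*h, -ex*h]
    Phi[3] = [ wz*h,  wy*h, -wx*h, 0.0,  -ey*h,  ex*h,  ew*h]
    return Phi
```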

The observation function g(<x>) is as follows:

g(<x>)=<G.x,G.y,G.z,B_b.x,B_b.y,B_b.z>, where G stands for the gyro reading and B stands for the B-field reading (the magnetic field is usually represented by <B> in most textbooks).

<G> is just the gyro reading transformed into radians per second, which presumably is already done, so we have

<G>=<ω>

Since the magnetic sensor is nailed to the body (we hope the parts haven't diverged yet), we measure <B_b>, the magnetic field in body coordinates. This is just the external magnetic field <B_i> transformed into body coordinates, so we use

<B_b>=<e~><B_i><e>

When this is expressed in component form, we effectively transform this quaternion into a matrix and then matrix-multiply the external vector by this matrix:

B_b.x=(e.w^2+e.x^2-e.y^2-e.z^2)B_i.x+2(e.x e.y+e.w e.z)B_i.y+2(e.x e.z-e.w e.y)B_i.z
B_b.y=2(e.x e.y-e.w e.z)B_i.x+(e.w^2-e.x^2+e.y^2-e.z^2)B_i.y+2(e.y e.z+e.w e.x)B_i.z
B_b.z=2(e.z e.x+e.w e.y)B_i.x+2(e.y e.z-e.w e.x)B_i.y+(e.w^2-e.x^2-e.y^2+e.z^2)B_i.z
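That transform, as a sketch in code (Python for clarity; the name is mine):

```python
def field_body(e, Bi):
    """Rotate the inertial field Bi into body coordinates, <B_b>=<e~><B_i><e>,
    written out as the equivalent 3x3 rotation-matrix product."""
    ew, ex, ey, ez = e
    Bx, By, Bz = Bi
    return [(ew*ew+ex*ex-ey*ey-ez*ez)*Bx + 2*(ex*ey+ew*ez)*By + 2*(ex*ez-ew*ey)*Bz,
            2*(ex*ey-ew*ez)*Bx + (ew*ew-ex*ex+ey*ey-ez*ez)*By + 2*(ey*ez+ew*ex)*Bz,
            2*(ez*ex+ew*ey)*Bx + 2*(ey*ez-ew*ex)*By + (ew*ew-ex*ex-ey*ey+ez*ez)*Bz]
```

A quick sanity check: the identity quaternion <1,0,0,0> leaves the field untouched, and a 90° yaw moves the inertial x axis onto the body -y axis.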

The observation matrix [H] will be six rows, one for each element of the observation, by seven columns, one for each element of the state vector. First, the three rows for the gyro:

H[0,0]=dg.Gx/de.w=0
H[0,1]=dg.Gx/de.x=0
H[0,2]=dg.Gx/de.y=0
H[0,3]=dg.Gx/de.z=0
H[0,4]=dg.Gx/dω.x=1
H[0,5]=dg.Gx/dω.y=0
H[0,6]=dg.Gx/dω.z=0

H[1,0]=dg.Gy/de.w=0
H[1,1]=dg.Gy/de.x=0
H[1,2]=dg.Gy/de.y=0
H[1,3]=dg.Gy/de.z=0
H[1,4]=dg.Gy/dω.x=0
H[1,5]=dg.Gy/dω.y=1
H[1,6]=dg.Gy/dω.z=0

H[2,0]=dg.Gz/de.w=0
H[2,1]=dg.Gz/de.x=0
H[2,2]=dg.Gz/de.y=0
H[2,3]=dg.Gz/de.z=0
H[2,4]=dg.Gz/dω.x=0
H[2,5]=dg.Gz/dω.y=0
H[2,6]=dg.Gz/dω.z=1

Now the three elements with the compass. This will be a bit more complicated, but still manageable.

B_b.x=(e.w^2+e.x^2-e.y^2-e.z^2)B_i.x+2(e.x e.y+e.w e.z)B_i.y+2(e.x e.z-e.w e.y)B_i.z
B_b.y=2(e.x e.y-e.w e.z)B_i.x+(e.w^2-e.x^2+e.y^2-e.z^2)B_i.y+2(e.y e.z+e.w e.x)B_i.z
B_b.z=2(e.z e.x+e.w e.y)B_i.x+2(e.y e.z-e.w e.x)B_i.y+(e.w^2-e.x^2-e.y^2+e.z^2)B_i.z

H[3,0]=dg.Bx/de.w=2(+e.w B_i.x+e.z B_i.y-e.y B_i.z)
H[3,1]=dg.Bx/de.x=2(+e.x B_i.x+e.y B_i.y+e.z B_i.z)
H[3,2]=dg.Bx/de.y=2(-e.y B_i.x+e.x B_i.y-e.w B_i.z)
H[3,3]=dg.Bx/de.z=2(-e.z B_i.x+e.w B_i.y+e.x B_i.z)
H[3,4]=dg.Bx/dω.x=0
H[3,5]=dg.Bx/dω.y=0
H[3,6]=dg.Bx/dω.z=0

H[4,0]=dg.By/de.w=2(-e.z B_i.x+e.w B_i.y+e.x B_i.z)
H[4,1]=dg.By/de.x=2(+e.y B_i.x-e.x B_i.y+e.w B_i.z)
H[4,2]=dg.By/de.y=2(+e.x B_i.x+e.y B_i.y+e.z B_i.z)
H[4,3]=dg.By/de.z=2(-e.w B_i.x-e.z B_i.y+e.y B_i.z)
H[4,4]=dg.By/dω.x=0
H[4,5]=dg.By/dω.y=0
H[4,6]=dg.By/dω.z=0

H[5,0]=dg.Bz/de.w=2(+e.y B_i.x-e.x B_i.y+e.w B_i.z)
H[5,1]=dg.Bz/de.x=2(+e.z B_i.x-e.w B_i.y-e.x B_i.z)
H[5,2]=dg.Bz/de.y=2(+e.w B_i.x+e.z B_i.y-e.y B_i.z)
H[5,3]=dg.Bz/de.z=2(+e.x B_i.x+e.y B_i.y+e.z B_i.z)
H[5,4]=dg.Bz/dω.x=0
H[5,5]=dg.Bz/dω.y=0
H[5,6]=dg.Bz/dω.z=0

See? None of these is complicated; it's just a matter of bookkeeping.
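As a sanity check on that bookkeeping, here are the compass rows of [H], differentiating the B_b expressions with respect to the quaternion components (Python for clarity; the function name and layout are mine):

```python
def compass_rows(e, Bi):
    """Rows 3-5 of [H]: partial derivatives of the body-frame field with
    respect to the quaternion components. The body-rate columns are zero,
    since B_b does not depend on the gyro part of the state."""
    ew, ex, ey, ez = e
    Bx, By, Bz = Bi
    return [[2*( ew*Bx + ez*By - ey*Bz), 2*( ex*Bx + ey*By + ez*Bz),
             2*(-ey*Bx + ex*By - ew*Bz), 2*(-ez*Bx + ew*By + ex*Bz), 0, 0, 0],
            [2*(-ez*Bx + ew*By + ex*Bz), 2*( ey*Bx - ex*By + ew*Bz),
             2*( ex*Bx + ey*By + ez*Bz), 2*(-ew*Bx - ez*By + ey*Bz), 0, 0, 0],
            [2*( ey*Bx - ex*By + ew*Bz), 2*( ez*Bx - ew*By - ex*Bz),
             2*( ew*Bx + ez*By - ey*Bz), 2*( ex*Bx + ey*By + ez*Bz), 0, 0, 0]]
```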

Now for the fun part. If the elements of the observation vector are uncorrelated with each other (and they are, if the measurement covariance is diagonal), then we can process one element of one observation at a time. Further, we only need the one row of [H] relevant to that element, so the bit inside the matrix inverse in the filter becomes a 1x1 matrix, a scalar, and the inverse is just division. Score! We don't have to write a matrix inverter to run on our controller!
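Here is a sketch of that scalar-at-a-time measurement update (Python for clarity, plain lists instead of a matrix library; all names are mine, not from the flight code):

```python
def scalar_update(x, P, h_row, z, g, r):
    """Kalman measurement update for a single scalar observation element.

    x: state estimate (length n); P: symmetric covariance (n x n lists);
    h_row: the one row of [H] for this element; z: the measurement;
    g: the predicted measurement g(x); r: the measurement variance.
    Returns the updated (x, P).
    """
    n = len(x)
    # Ph = P * h^T, a column vector (P is assumed symmetric)
    Ph = [sum(P[i][j]*h_row[j] for j in range(n)) for i in range(n)]
    # Innovation variance s = h*P*h^T + r is a scalar, so "inversion" is division
    s = sum(h_row[i]*Ph[i] for i in range(n)) + r
    K = [Ph_i/s for Ph_i in Ph]                      # Kalman gain, n x 1
    x = [x[i] + K[i]*(z - g) for i in range(n)]      # state correction
    P = [[P[i][j] - K[i]*Ph[j] for j in range(n)]    # P = (I - K h) P
         for i in range(n)]
    return x, P
```

To process a full observation, call this once per element, feeding each call the state and covariance the previous call returned.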

### Compass Calibration

I took a walk over to the park next to my place to get as far from extraneous magnetic fields as I could. There is a cement curb between a patch of sand and grass there, which happens to run along true north (as far as Google Maps can tell). I went to the field, stood exactly in-line with this curb, turned on the IMU, aligned it with the curb for a couple of seconds, then spun the thing in my hands.

Now I know that there are extraneous fields in the Logomatic itself, probably especially in the battery, but maybe also in the mounting hardware. No matter: these rotate with the compass, so it is an easy matter to subtract them off. I didn't intentionally rotate the IMU in any particular direction, so I can't just average the whole data set, but we can take the midpoint between the minimum and maximum of each axis.

Now let's see about the magnitude of the vector from the center to the surface of this ball of yarn.

It shows about a 10% variation, but more interestingly, there is a signal matching the rotation I did. The sensor magnitudes don't quite match each other. I haven't figured out what this signal is, but it's not sensor scale.

So, what we will do for our measurement is subtract out the offsets, then divide by the length of the resulting vector to get a unit vector. Or, divide by the expected length, about 640.
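The whole calibration step, sketched in code (Python for clarity; the function names are mine, and the min/max values would come from the spin data described above):

```python
def hard_iron_offset(raw_min, raw_max):
    """Per-axis offset: the midpoint of the extremes seen while spinning
    the sensor through all orientations (the center of the ball of yarn)."""
    return [(lo + hi)/2 for lo, hi in zip(raw_min, raw_max)]

def unit_field(raw, offset):
    """Subtract the offsets, then scale the result to a unit vector."""
    v = [r - o for r, o in zip(raw, offset)]
    mag = (v[0]**2 + v[1]**2 + v[2]**2)**0.5
    return [c/mag for c in v]
```

Dividing by the measured length (as here) tolerates the ~10% magnitude variation; dividing by the expected length of 640 instead is cheaper, at the cost of leaving that variation in.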