Saturday, April 2, 2011

Sensor Fusion

It's time to stop dithering around and actually do something: write up a Sensor Fusion module.

As discussed before, sensor fusion is using two different kinds of sensors to make a measurement. Perhaps both kinds can do it independently and we just want to cross-check. Perhaps neither can do it by itself but the combination can. Perhaps one sensor can do it in theory, but any slight perturbation will screw it up in practice.

Let's get down to concrete. My IMU has a three-axis rotation sensor, commonly called a gyro even though there are no spinning parts in it. It also contains a three-axis magnetic sensor, which I will call a compass. The compass by itself is great for absolute measurements, but by itself cannot determine the pointing of the IMU. To completely determine the orientation of an object from the outside like with a compass, you need two different reference vectors. I have a daydream about using something like an FM radio with three orthogonal loop antennas as another reference vector, but this is not practical. So, only one vector. You can tell that the IMU is pointing some part towards that vector, but it could be rolled at any angle around that vector.

The gyro by itself can in principle completely determine the orientation, if the initial orientation is known. However, because it integrates, if there is any tiny offset in the zero setting, the orientation will degrade at a linear rate, proportional to the zero offset. This is why I very carefully calibrated the gyros against temperature, but I still don't think it's enough.

However, the two together back up each other's weak points. The gyro is accurate in a relative sense, but has no absolute means to make sure it doesn't go wandering off. The compass is incomplete, but is nailed down to the external frame. Together, they can conquer the world! Or at least orient the IMU.

Skip the explanation of quaternion math. Go look up on the net for that. I may eventually write it myself, but today I am building.

State Vector:

The state vector is the four components of the quaternion <e> (equivalent to position vector), the three components of the body rate sensor measurement <ω> (equivalent to velocity).

<x>=<e.w,e.x,e.y,e.z,ω.x, ω.y,ω.z>

Physics function:

The physics function here is actually less physics than kinematics. You will notice no mention of moment of inertia or anything like that. Just how you integrate a body rate measurement into a quaternion.

A note on notation and convention first. A quaternion <e>, is shown as a vector because it is four related components. The quaternion conjugate is shown as <?~> for any quaternion <?>. This quaternion when used properly, transforms a point in the inertial reference frame into one in the body reference frame:

<v_ b>=<e~><v_i><e>

Conversely, we can transform a vector in the body frame to one in the inertial frame with:


where these multiplications are conventional quaternion multiplications, and the scalar part of the pure vectors <v_?> are zero.

If you know the rotation rate of a body over time, you can integrate this to get the orientation over time, starting with some initial condition.


where this multiplication is just as shown, not a vector transform, just a single quaternion multiplication. The vector <ω> is the body-frame rotation speed, measured in radians/sec. By components, we get:

F.ew=de.w/dt=(-e.x ω.x-e.y ω.y -e.z ω.z)/2
F.ex=de.x/dt=(e.w ω.x-e.z ω.y +e.y ω.z)/2
F.ey=de.y/dt=(e.z ω.x+e.w ω.y -e.x ω.z)/2
F.ez=de.z/dt=(-e.y ω.x+e.x ω.y +e.w ω.z)/2

The physics matrix [&Phi;] is 7x7, but since the last three elements are zero, so are the last three rows of the matrix. Alpha reminded me that these are easy, it's just bookkeeping

Phi[0,0]=dF.ew/de.w =0
Phi[0,1]=dF.ew/de.x =-ω.x/2
Phi[0,2]=dF.ew/de.y =-ω.y/2
Phi[0,3]=dF.ew/de.z =-ω.z/2
Phi[0,4]=dF.ew/dω.x =-e.x/2
Phi[0,5]=dF.ew/dω.y =-e.y/2
Phi[0,6]=dF.ew/dω.z =-e.z/2

Phi[1,0]=dF.ex/de.w =+ω.x/2
Phi[1,1]=dF.ex/de.x =0
Phi[1,2]=dF.ex/de.y =+ω.z/2
Phi[1,3]=dF.ex/de.z =-ω.y/2
Phi[1,4]=dF.ex/dω.x =+e.w/2
Phi[1,5]=dF.ex/dω.y =-e.z/2
Phi[1,6]=dF.ex/dω.z =+e.y/2

Phi[2,0]=dF.ey/de.w =+ω.y/2
Phi[2,1]=dF.ey/de.x =-ω.z/2
Phi[2,2]=dF.ey/de.y =0
Phi[2,3]=dF.ey/de.z =+ω.x/2
Phi[2,4]=dF.ey/dω.x =+e.z/2
Phi[2,5]=dF.ey/dω.y =+e.w/2
Phi[2,6]=dF.ey/dω.z =-e.x/2

Phi[3,0]=dF.ez/de.w =+ω.z/2
Phi[3,1]=dF.ez/de.x =+ω.y/2
Phi[3,2]=dF.ez/de.y =-ω.x/2
Phi[3,3]=dF.ez/de.z =0
Phi[3,4]=dF.ez/dω.x =-e.y/2
Phi[3,5]=dF.ez/dω.y =+e.x/2
Phi[3,6]=dF.ez/dω.z =+e.w/2

The observation g(<x>) is as follows:

g(x)=<G.x,G.y,G.z,B_b.x,B_b.y,B_b.z> where G stands for gyro reading and B stands for B-field reading (the magnetic field is usually represented by <B> in most textbooks).

<G> is just the gyro reading transformed into radians per second, which presumably is already done, so we have


Since the magnetic sensor is nailed to the body (we hope the parts haven't diverged yet) we measure <B_b> , the magnetic field in body coordinates. This is just the exterior magnetic field <B_i> transformed into body coordinates, so we use

<B_ b>=<e~><B_i><e>

When this is expressed in component form, we effectively transform this quaternion into a matrix and then matrix-multiply the external vector by this matrix:

B_b.x=(e_w^2+e.x^2-e.y^2-e.z^2)B_i.x+2(e.x e.y+e.w e.z)B_i.y+2(e.x e.z-e.w e.y)B_i.z
B_b.y=2(e.x e.y-e.w e.z)*B_i.x+(e.w^2-e.x^2+e.y^2-e.z^2)B_i.y+2(e.y e.z+e.w e.x)B_i.z
B_b.z=2(e.z e.x+e.w e.y)*B_i.x+2(e.y e.z-e.w e.x)B_i.y+(e.w^2-e.x^2-e.y^2+e.z^2)B_i.z

The observation matrix [H] will be six rows, one for each element of the observation, by 7 columns, one for each element of the state vector. First, the three rows with the gyro:




Now the three elements with the compass. This will be a bit more complicated, but still manageable.

B_b.x=(e_w^2+e.x^2-e.y^2-e.z^2)B_i.x+2(e.x e.y+e.w e.z)B_i.y+2(e.x e.z-e.w e.y)B_i.z
B_b.y=2(e.x e.y-e.w e.z)*B_i.x+(e.w^2-e.x^2+e.y^2-e.z^2)B_i.y+2(e.y e.z+e.w e.x)B_i.z
B_b.z=2(e.z e.x+e.w e.y)*B_i.x+2(e.y e.z-e.w e.x)B_i.y+(e.w^2-e.x^2-e.y^2+e.z^2)B_i.z

H[3,0]=dg.Bx/de.w=2(+e.w B_i.x+e.z B_i.y-e.y B_i.z)
H[3,1]=dg.Bx/de.x=2(+e.x B_i.x+e.y B_i.y+e.z B_i.z)
H[3,2]=dg.Bx/de.y=2(-e.y B_i.x+e.x B_i.y-e.w B_i.z)
H[3,3]=dg.Bx/de.z=2(-e.z B_i.x+e.w B_i.y+e.x B_i.z)

H[4,0]=dg.By/de.w=2(-e.z B_i.x+e.w B_i.y-e.x B_i.z)
H[4,1]=dg.By/de.x=2(+e.y B_i.x-e.x B_i.y+e.w B_i.z)
H[4,2]=dg.By/de.y=2(+e.x B_i.x+e.y B_i.y+e.z B_i.z);
H[4,3]=dg.By/de.z=2(-e.w B_i.x-e.z B_i.y+e.y B_i.z);

H[5,0]=dg.Bz/de.w=2(+e.y B_i.x-e.x B_i.y+e.w B_i.z);
H[5,1]=dg.Bz/de.x=2(+e.z B_i.x-e.w B_i.y-e.x B_i.z);
H[5,2]=dg.Bz/de.y=2(+e.w B_i.x+e.z B_i.y-e.y B_i.z);
H[5,3]=dg.Bz/de.z=2(+e.x B_i.x+e.y B_i.y+e.z B_i.z);

See, none of these are complicated, it's just a matter of bookkeeping.

Now for the fun part. If the observation vector elements are uncorrelated with each other, and they are if the measurement covariance is diagonal, then we can do things one element of one observation at a time. Further, we only have to use the one row of H relevant to that element, and with this, the bit inside of the matrix inverse in the filter is just a 1x1 matrix, or a scalar, and the inverse is just division. Score! We don't have to write a matrix inverter to run on our controller!


  1. Why are you keeping track of the quaternion magnitude in the state vector? SO(3) is associated with unit quaternions.

  2. I'm not keeping track of the magnitude |e|, but the scalar part e.w . Yes, I know you could throw away any one component and reconstitute it whenever you want, but:

    1) Everyone else does it this way. All the satellites I work with transmit all four elements when they send a quaternion
    2) I am more time-bound than memory-bound. I need all four elements of the quaternion to run the vector transformation and derivative functions. I could reconstitute the fourth element each time, but I think it's more efficient to keep the fourth element in memory rather than recalculate it each time

    I am putting in a step to force the quaternion to unit length after each measurement update.

  3. Oops, I meant the scalar part.

    As long as you're enforcing unit magnitude, you should be OK.

    I'm not worried about memory usage as much as execution time you'd save by throwing out a redundant term out of the state vector and reducing the size of all your matrices.

    BTW, augmenting your state vector with gyro bias is the way to go if you can add three more states

  4. Are you sure you don't have to do a matrix inverse? H P H' doesn't look diagonal to me.

  5. See, that's the beauty of the whole thing. When you do a measurement, the [H] matrix is m row by n columns, where m is the number of elements in the measurement vector and n is the number of elements in the state vector. If you process the elements of your measurement one at a time, you treat it as m separate measurements, with no time update in between. In this case, each time you do a measurement update, [H] is only one row by n columns, so the product [H][P] has one row and n columns, and since [H]' has only one column, the product of [H][P] and [H]' has only one row and one column. Not only is it diagonal, it's scalar.

    If you try to process more than one element of the vector at a time, you are correct that [H][P][H]', now no longer scalar, is not diagonal, and you have to do a full matrix inverse.

    So don't do that.

  6. But do you get the same answer as the matrix inverse approach? It sounds like you're essentially running seven one-measurement Kalman filters side by side on the same state vector? If you don't get the same answer, are you getting the full benefit of the Kalman filter?

    How do the relative op-counts of the two approaches compare?

    I just do the matrix inverse with no optimizations on a 16 MHz Arduino Mega and I can easily run a six-state, six measurement filter at 20 Hz. I do, however, play games with my H since not all measurements come in at 20 Hz. Sometimes H is full and sometimes H has two rows for GPS deleted (GPS runs slower).

    Fun stuff! I need to get back to my car. Testing this weekend.

  7. It's not seven parallel filters, since F and Phi cover the whole state.

    I am basically doing the same thing as you, with different H vectors at different times. The thing I am adding is that I am treating the vector sensors as three scalar sensors which happen to read simultaneously. Each element gets its own H matrix, which is unique to that element and 1xn. It does work out to be equivalent to a 3xn H matrix and vector observation - with the easy to meet constraint of having a diagonal measurement noise covariance [R]. Since all of the times are the same, there is no time update between the measurement.

    So, I run the filter with the first component of the vector, say gyro x. I have an H matrix function, but I only use the first row of it. I do a time update to the current time, then a measurement update using the scalar gyro x, the scalar measurement variance for gyro x (the first diagonal element of [R]) and the first row of [H].

    Now when I do gyro y immediately, it is the same time, so no time update. Score that to time saved. Use gyro y as the measurement, the second row of [H] and the second diagonal element of [R]. Same for gyro Z with the third row of [H[ and third diagonal of [R].

    It really does work out, and I can't even take credit for the idea. It was in the textbook I am working out of. That book goes so far as to recommend that if you don't have a diagonal [R], to run some fancy diagonalization on the filter to force [R] diagonal, then run it as I have described to avoid the numerical instability of taking a matrix inverse.

    I am working out of Kalman Filtering: Theory and Practice, 2nd ed. by Grewal and Andrews. I found a PDF of their book somewhere on the 'net. They talk about this idea in section 4.2.2, p124.

  8. By the way, good luck with your car! We may be competing, but I want to see all of us succeed
    (just me succeeding faster).