Wednesday, March 30, 2011

Beginners Kalman Filtering Theory, Part 5

Pay attention here, because we are finally going to develop something practical - The Kalman Velocity Model

This is about the only problem I have found which both has a practical use, and actually fits the linear model.

Let's imagine a rail car on a level frictionless rail. The car is unpowered and uncontrollable, but is subject to unpredictable accelerations (say from gusts of wind). We want to know the position and velocity of the car, but our only measuring device is a tape measure laid out along the track. Once again we can measure the position of the car to arbitrary position, but only finite accuracy.
First, let's look at the process model again:

<x_i>=[A]<x_{i-1}>+<w_i>
<z_i>=[H]<x_i>+<v_i>

Now that we are considering motion, we need to consider time. The spacing between two measurements i-1 and i we will call Δt (Delta-t). For now we will consider Δt to be constant, meaning that we have equally-spaced measurements.

Now I am beginning to curse Blogspot and it's lack of math typography. To show a matrix, I will do the following. If the matrix is

[a b]
[c d]

I will write it as [[a,b],[c,d]] or in other words, rows are the inside brackets and the whole matrix is a bunch of stacked rows. If I have a vector, I will say

[a]
[b]
[c]

is the same as <a,b,c>, which is the same as [[a],[b],[c]]. All vectors are column vectors unless otherwise stated. Even a vector written <a,b,c> is a column vector.

The state of the vehicle is completely determined by its current position p, and velocity v, so its state vector is:

x=<p,v>

The state transition matrix is two-by-two, since the state is two-element. In simple linear equation language, we have this:

p_{i}= p_{i-1} +Δt v_{i-1}
v_{i}=                   v_{i-1}

Converting this to a matrix equation gets us this:

<p,v>=[[1,Δt],[0,1]]<p_{i-1},v_{i-1}>
<x_i>=[A]<x_{i-1}>

so we have a state transition matrix:

[A]=[[1,Δt],[0,1]]

For the observation matrix, we are taking the two-component state and converting it to one measurement, so the matrix is one-by-two.

<z_i>=(1)p_i+(0)v_i
<z_i>=[[1,0]]<p_i,v_i>

so the observation matrix is:

[H]=[[1,0]]

Process Noise Model
We represent the unpredictable accleration component with the process noise covariance matrix. The process noise is a random vector variable with two elements, so its covariance is a two-by-two matrix. I will borrow the process noise model from Wikipedia. In between any two measurements, the unknown process noise acceleration is presumed to be constant a_i. By integrating this over the time step, we see that the velocity is changed by a_iΔt and by integrating twice we see that the position is changed by 0.5a_iΔt2.

So in mathematical terms, we get

<x_i>=[A]<x_{i-1}>+<w_i>[/itex]
<w_i>=<0.5a_iΔt2,a_iΔt>
<w_i>=<0.5Δt2,Δt>a_i
<g>=<0.5Δt2,Δt>
<x_i>=[A]<x_{i-1}>+<g>a_i

This comes straight from Wikipedia underived. The process noise covariance matrix is

[Q]=<g><g'>σ_a2=[[0.25Δt4, 0.5Δt3],[0.5Δt3,Δt2]]σ_a2

Here I use <?'> for the row vector form of column vector <?>. I don't know the exact derivation, but basically it is saying that since the standard deviation is known, and the variance is required, you just square the standard deviation. Since you can't directly square a vector, the idiom <g>2=<g><g'> is used.

As we look through the filter equations, it turns out that [Q] is always added to the projected value of [P], so it doesn't matter that [Q] is singular by itself. The parameter σ_a has units of acceleration, and represents the expected magnitude of the random velocity change per unit time, while a_i represents the unknown actual acceleration.

In this case the process noise is a real part of the model, and not an ugly hack or tuning parameter. We defined the model with this random acceleration, and this process noise matrix describes it in math. For real systems such as EVE, the process noise is still an ugly hack.

This form of the process noise is preferred, because the process noise parameter then is decoupled from the timestep size.

The measurement is a scalar, so its covariance is scalar also.

R=σ_z2

We can then apply these matrices directly to the Kalman filter form.

Sensor fusion and sensor anti-fusion
In this  simple model, we have only one kind of sensor, for position, but we are able to recover both position and velocity. This is possible because our physics model [A], simple as it is, is complete and captures the whole state vector. This concept is important. You don't necessarily need a sensor for every component of your state vector. This is sensor anti-fusion, I guess. It is a complementary concept to sensor fusion.

With sensor fusion, sensors of two different types are used to measure the same parts of the state vector. We'll get into exactly how to do this in a later post, but let's just talk about goals for a moment. Suppose your robot has a 6DOF IMU and a GPS. You use the IMU for accuracy over short distances and times, and the GPS to keep things aligned with the real world, and keep the IMU from wandering off too far.

Now, you are probably not going to measure the GPS and IMU at the same time. So, for some steps, perhaps most steps, the IMU measurement is available and the GPS is not. Other steps, the GPS is available and the IMU is not. Perhaps there are rare times when both measurements are taken simultaneously.

What you do in this case is use a different observation model, and therefore a different [H] depending on what combinations of sensors are available at a particular instant. When you use the IMU, you have an [H] (or actually the nonlinear equivalent of [H]) which only uses the IMU. You also have the measurement uncertainty of the IMU in its own [R]. When you use the GPS, you use a different [H]. And, when both are available simultaneously, you could use an [H] that incorporates both. We'll see why we don't do that in a later post.

With the right [H] for each measurement and the right [A] (or its nonlinear equivalent as we shall see) for the physics, combined with the correct measurement noise for each measurement, the filter does all the bookkeeping and weighting automatically to incorporate multiple different kinds of sensors into the same estimate. It automatically weights each measurement to use it as best as possible. If you don't lie with [R], it is able to weigh things appropriately without any further intervention. This is what is meant by sensor fusion.

EVE example
Now this process noise we have applied is just a model, just a fudge factor to keep the filter from getting too satisfied with itself. The filter thinks the process has noise, but in reality that noise is the process not following the model. I am not prepared to tell the sun what to do.

Here's an example using the velocity process above, applied to the EVE instrument. The filter uses this process noise model, but the real data does not. During a measurement sequence of 1000 measurements, the Sun generates a rock solid constant 20 units of flux for the first half. At the exact halfway point, we see a flare begin. This flare has a maximum amplitude of 20 units, and then decays away again.

The EVE instrument is modeled as having a Gaussian uncorrelated measurement noise with a standard deviation of one unit, with the following exception: On a randomly selected 1% of the measurements, the standard deviation is five units. This represents an actual phenomenon seen on EVE.

The heavily marked points are the 1% "popcorn" (There might not be exactly 10, since they are thrown in at random). The red line represents reality again, and the blue line represents the filter output. We notice immediately that the filter has less noise than the original data, so the filter is working. We also notice that while the filter output follows the popcorn, it is not nearly as bad as the original data itself.

Here is a plot of the estimate difference from the real measurement. The heavy line is the filter residual, and the light lines are the ±3-sigma uncertainty lines.

The problem here is that the filter takes a while to believe that the process really changed, so it lags a bit behind the actual process. This is that large downward spike we see at the half way point. Other than that, the filter has a standard deviation of about a third that of the original data, and the 3-sigma lines include the actual value almost all the time, as it should.

The process includes a notion of velocity, the rate of change of the flux, but this is not a measurement that EVE actually makes. Even so, the filter provides a velocity estimate automatically. We could estimate it, but no one has asked for it. Perhaps if the scientists know that it is available, they will want it.

This is very educational as to what the problem is. The real flux (literally) explodes, with a step jump in velocity. The filter takes a while to follow, and by the time it has followed, the flux velocity is dropping rapidly, and once again the filter has to chase it.

Unless you are a much better programmer than me, and I don't think you are, when you originally code this up, it is going to have bugs. Now the normal way to debug is to trace the program flow, and at every line, or at least every critical line, predict what the program is going to do then test it by letting it execute. However, if you don't understand the algorithm, how can you do this prediction?

This is why we are going step by step, with gradually more elaborate filters, each built off the last.I gave IDL code for a scalar Kalman filter in a previous post. This simplest filter was devolved to the point that all the magic in equations 3 and 5 has been distilled out, leaving only an obvious averager. Then, as you add refinements one at a time, test frequently. When it breaks (and it will) you know what part is broken: the last one you added, or at least an interaction between that part and the rest of the code.

All of these charts were generated by that script, gradually evolved to do the more and more elaborate filters. I could (well actually I can't, the code has evolved away from those points) give you code for each filter, but I won't, because that's not a favor to you. You need to work with the code to understand it, and the best way is to evolve the code a step at a time as I have. You are guided by the examples, but you put in the effort yourself to follow them. It's all very Socratic.

Even if you don't have IDL, you can program the simpler models even in a spreadsheet, and the more complicated one in your favorite data-handling language, or even general purpose language like C, Java, or Perl. If your language has no native charting abilities, have your filter write its output into a CSV, then suck it into a spreadsheet. There's no excuse, as C, Java, Perl, and OpenOffice are all free for the taking and available on almost all platforms (even your robot, if you write in C).

As you progress, you will come to the point where you will write a general Kalman filter function, which can handle any process model thrown at it. At this point, you will then be debugging your models. Before you try to debug on your robot, try using [A] and some initial conditions to simulate a robot trajectory and using [H] to generate measurements at frequent intervals. This way, the normally unknowable True State is in fact known, and you can compare your estimate with the Truth, in a way which is almost impossible otherwise.

The usefulness of this model
By itself, we can use this model to estimate the rate of change of any variable that we can measure. If you have a thermometer, you can use the filter to not only reduce the measurement noise, but also create out of thin air a measurement of the rate of change. Likewise the voltage and charge state of your battery, the altitude and climb/sink rate measured by an ultrasonic pinger used as an altimeter, or any number of things.

Further, when I describe the nonlinear filter and IMU model, you will see this model embedded in that one. The filter will integrate the acceleration to get velocity and position, but the acceleration it integrates will not be the direct measurement of the sensor, but an estimate based on the linear velocity model.

Code
Since this is a useful result in itself, code for it is posted here. Partly to check your work, but mostly so I have a home for it.

 function [xh,P]=kalman_vel(xh0,P0,t,z,sigma_a,R)   xh=zeros(numel(xh0),numel(t));   xh(:,1)=xh0;   P=zeros(numel(xh0),numel(xh0),numel(t));   P(:,:,1)=P0;   H=[1,0];     for I=2:numel(t)     i=I-1;     xh_im1=xh(:,I-1);     P_im1=P(:,:,I-1);     dt=t(I)-t(I-1);     A=[1,dt;0,1];     Q=[dt.^4/4,dt.^3/2;dt.^3/2,dt.^2]*sigma_a;         xh_im=A*xh_im1;     P_im=A*P_im1*A'+Q;     K=P_im*H'*inv(H*P_im*H'+R);     xh_i=xh_im+K*(z(I)-H*xh_im);     P_i=(eye(size(P0))-K*H)*P_im;         xh(:,I)=xh_i;     P(:,:,I)=P_i;   end end     

Input parameters:

• xh0 - Initial guess of state vector
• P0 - Initial guess of state vector estimate covariance
• t - list of times - 1D array
• z - list of measurements of position - 1D array same size as t
• sigma_a - 1-sigma estimated process noise magnitude - scalar
• R - 1-sigma measurement noise - scalar

Output parameters:

• xh - List of state vector estimates. First row is "position" and second row is rate of change.
• P - List of covariance matrices.