Thursday, March 31, 2011

Intermdiate Kalman Filtering Theory, Part 4

Worked example

We can hang this model on the same scaffold we built last time. Mostly we just change the model functions, but we are also going to change the simulator at the top, since this model is more complicated than last time.

We are going to orbit planet B612, a perfectly spherical, homogeneous, radar-transparent planet which has a radius of 10m but sufficient gravity to live on conveniently. In units of meters and seconds, it has a μ of 1000, which means that a ball thrown at 10m/s at the surface will stay in a circular orbit right at the surface, and circle the planet in 6.283 seconds. The planet has a surface gravity of about 1g, exactly 10m/s^2.

As an aside, B612 is one weird planet. It needs to weigh 15 billion tons to exert this much gravity, and has a density of 3.58 million kg/m^3, which about as dense as white dwarf degenerate matter. And yet it's radar-transparent...

Our program is pretty heavily modified here, so let's look at it fresh again. This first part is the driver, and as we can see, most of the code has been farmed out to various subfunctions.


function nonlinear_B612
  %Simulated actual trajectory. Unperturbed orbit around B612 at slightly
  %higher than circlular velocity.

  t=0:0.1:10; %Timestamp of each measurement
  xt0=[11;0;0;10]; %Position <11,0> meters from center, velocity <0,10>
  xt=trueState(t,xt0,@F_B612);
  
  R=0.5^2; %Measurement noise covariance, 50cm on position
  Q=diag([0,0,0.01,0.01]);  %Process noise covariance, 0 position and 10cm/s^2 on velocity
 
  zt=trueObs(xt,@g_B612,0);
  z=trueObs(xt,@g_B612,R);
    
  xh_0=[12;0;0;9]; %Initial state vector guess, slightly wrong
  P_0=diag([1,1,1,1]); %Initial state vector uncertainty, 1m in position and 1m/s in velocity
  
  [xh,P]=ekf_loop(xh_0,P_0,z,t,@F_B612,@g_B612,@Phi_B612,@H_B612,Q,R);
  
  %Plot some interesting results
  figure(1)
  plot(10*cos([0:0.01:2*pi]),10*sin(0:0.01:2*pi),'k-',... 
       xt(1,:),xt(2,:),'r-', ...  
       xh(1,:),xh(2,:),'b-'); 
  legend('Planet B612 surface','True position','Estimated position');
  axis equal;

  %measurement residual plot
  zh=trueObs(xh,@g_B612,0);
  figure(2)
  plot(t,zt,'r-',t,zh,'b-');
  legend('Actual radar distance','Radar distance from estimate');

end

In this first bit, we call a function called trueState() which we pass in the initial conditions, the timestamps we care about, and the physics function. It then does all the integrating and returns a list of true state vectors at all the requested times.

This @f bit is incredibly important. We are passing a function name as a parameter. trueState() will then call that function by name. In this way, we can use the same code to call our own model functions back, so the same framework can support any physics model. The same thing can be done in C with function pointers, in Java with interfaces, and in most truly useful languages in general.


function xt=trueState(t,xt0,F)
  xt=zeros(numel(xt0),numel(t)); %True state vector (to be calculated)
  xt(:,1)=xt0;
  %Do a numerical integrator to get the positions
  nstep=100;
  for i=1:numel(t)-1
    I=i+1;
    xt_im1=xt(:,I-1); %Previous true state
    xt_i=xt_im1;      %Will contain current true state
    dt_major=t(I)-t(I-1);
    dt=dt_major/nstep;         %Do 100 steps between measurements
    for j=1:nstep
      %Just use our physics function - that's what it is there for.
      xt_i=xt_i+dt*F(xt_i);
    end
    xt(:,I)=xt_i;
  end
end


Next, we call trueObs(), which takes a list of state vectors and the observation function, and produces a list of observations, with noise sprinkled on it if requested. In this case, we call it twice, once with noise and once without, so we can compare the Truth to our estimate.


function z=trueObs(xt,g,R)
  %Size the problem
  xt0=xt(:,1);
  s=size(xt)
  z=zeros(numel(g(xt0)),s(2));
  
  %Clean observations
  for i=1:numel(z)
    z(:,i)=g(xt(:,i));
  end
  
  %We want random noise, but the *same* random noise each time we run
  randn('seed',3217);
  pn=randn(size(z))*sqrt(R); %Measurement noise

  z=z+pn; %Noisy Observations
end


Next, we set up the initial guess and noise estimates, then call ekf_loop() to actually run the Extended Kalman filter. It takes a lot of parameters, but none of them are complicated. The hardest part is just passing them in the correct order.


function [xh,P]=ekf_loop(xh_0,P_0,z,t,F,g,Phi,H,Q,R)
  s=size(z);
  n=s(2); %Number of observations
  xh=zeros(numel(xh_0),n); %History of all estimates (starts as initial estimate)
  xh(:,1)=xh_0;
  P=zeros(numel(xh_0),numel(xh_0),n); %History of all estimate covariances
  P(:,:,1)=P_0;   
  for i=1:n-1
    %Stupid matlab, not using zero-based indexing. Our i here is zero based,
    %but we will make an I which is one-based, and use it to index the
    %history arrays.
    I=i+1;
    
    %Pull previous estimate from histories
    xh_im1=xh(:,I-1); %Previous extimate <x^_{i-1}>
    P_im1=P(:,:,I-1); %Previous covariance <P^_{i-1}>
    
    dt_major=t(I)-t(I-1);
    [xh_i,P_i]=ekf_step(xh_im1,P_im1,z(I),dt_major,@F_B612,@g_B612,@Phi_B612,@H_B612,Q,R);
    
    %Record the current estimates
    xh(:,I)=xh_i;
    P(:,:,I)=P_i;
  end
end


The function ekf_loop takes the following inputs:

  • xh_0 - initial guess of the state vector
  • P_0 - initial guess of the estimate covariance
  • z - list of measurement vectors
  • t - array of times, same length as number of measurements
  • F - Name of physics function
  • g - Name of observation function
  • Phi - Name of Physics matrix function
  • H - Name of observation matrix function
  • Q - Process noise covariance matrix, square n x n where n is number of elements of state vector
  • R - Measurement noise covariance matrix, square m x m where m is number of elements of measurement vector

It produces the following outputs:

  • xh - list of estimated states
  • P - list of estimate covariance matrices

If you have a whole bunch of measurements in a block that you want to do, this is the way to go. This function calls ekf_step() to actually run the equations we have derived. If you get your measurements one at a time and don't care so much about history, perhaps because you are running a robot control loop, you can call ekf_step() directly.


function [xh_i,P_i]=ekf_step(xh_im1,P_im1,z_i,dt_major,F,g,Phi,H,Q,R)
  %One numerical integrator for both 1A and 1B, so that 1B can use the
  %intermediate results of 1A.
  xh_im=xh_im1; %To become updated state estimate <x^_i->
  A=eye(size(P_im1)); %The thing we integrate in 1b, will be come [A]
  nstep=100;
  dt=dt_major/nstep;
  for j=1:nstep
    %1a. Use numerical integration to update state vector through time from
    %<x^_{i-1}> to <x^_i->
    xh_im=xh_im+dt*F(xh_im);
    %1b. Use numerical integration to update [A]
    A=A+dt*(Phi(xh_im)*A);
  end
  %1c, state deviation estimate
  Xh_im=zeros(size(xh_im));
  %2. Projected covariance estimate
  P_im=A*P_im1*A'+Q;
  %3a. Linearized observation matrix
  H_i=H(xh_im);
  %3b. Kalman gain
  K=P_im*H_i'*inv(H_i*P_im*H_i'+R);
  %4a. Measurement deviation
  Z_i=z_i-g(xh_im);
  %4b. Measurement update of state deviation
  Xh_i=Xh_im+K*(Z_i-H_i*Xh_im);
  %4c. Update of reference state vector
  xh_i=xh_im+Xh_i;
  %5. Measurement update of state covariance
  P_i=(eye(size(P_im))-K*H_i)*P_im;
end    


Its inputs are:
  • xh_im1 - Previous estimate of the state vector
  • P_im1 - Previous estimate covariance
  • z_i - current measurement vector
  • dt_major - time between last and current measurement
  • F - Name of physics function
  • g - Name of observation function
  • Phi - Name of Physics matrix function
  • H - Name of observation matrix function
  • Q - Process noise covariance matrix, square n x n where n is number of elements of state vector
  • R - Measurement noise covariance matrix, square m x m where m is number of elements of measurement vector
and its outputs are:

  • xh_i - New estimated state vector
  • P_i - New estimate covariance matrix
Finally, the model functions for B612:


%Physics function F(<x>). Takes a state vector, returns the derivative of
%the state vector. Primary physics model definition
function F=F_B612(x)
  rx=x(1);
  ry=x(2);
  vx=x(3);
  vy=x(4);
  r=[rx;ry];
  r3=sum(r.^2)^1.5;
  mu=1000;
  a=-mu*r/r3;
  F=[vx;vy;a(1);a(2)];
end

%Physics matrix [Phi(<x>)]. Takes the state vector and returns the jacobian
%matrix. Secondary physics model definition, as it is derived from F() (on
%paper).
function Phi=Phi_B612(x)
  mu=1000;
  rx=x(1);
  ry=x(2);
  r=[rx;ry];
  r5=sum(r.^2)^2.5;
  Phi=[0,0,1,0;
       0,0,0,1;
       [ry^2-2*rx^2,3*rx*ry,0,0]*mu/r5;
       [3*rx*ry,rx^2-2*ry^2,0,0]*mu/r5];
end

%Observation function g(<x>). Takes a state vector, returns an observation
%vector. Primary observation model definition
function g=g_B612(x)
  rx=x(1);
  ry=x(2);
  r=[rx;ry];
  mx=10; %Radar location
  my=0;
  g=sqrt((rx-mx)^2+(ry-my)^2);
end

%Observation matrix [H(<x>)]. Takes the state vector and returns the jacobian
%matrix. Secondary observation model definition, as it is derived from g() (on
%paper).
function H=H_B612(x)
  rho=g_B612(x);
  rx=x(1);
  ry=x(2);
  r=[rx;ry];
  mx=10; %Radar location
  my=0;
  H=[(rx-mx)/rho,(ry-my)/rho,0,0];
end


Let's see how we did:

Hmm, not so good. We have diverged from reality. Exercise for the reader: Why? Is our initial guess too far off? Is it too noisy? Is there a bug? Can we just not do the estimation with this kind of measurement? I have some ideas, what about yours? Try messing with the initial condition and noise variables and see if you can get it to converge. I have my own ideas, which I will share next post.

Edit: There was a bug, in the Phi function. I had factored some stuff out, the mu/r^5 term, but when I realized we had the 1 terms in the first couple of rows, those were being multiplied by mu/r^5 also, which is not correct. With the correct Phi, we get the following:
This is better, in that the path is not wandering off into deep space. However, it's still not great. What is amazing is how well this meandering path fits the actual data:
What's going on here is that you just can't completely observe the orbit based on only range measurements. That's the answer I expected above.

New fun toy discovered!

Wolfram Alpha. It's like the Integrator, but solves more problems, including derivatives. It is well known that any elementary function has a derivative expressible as an elementary function (for an appropriate value of "elementary") and that the chain rule, product rule, sum rule, and power rule are sufficient to build an algorithm around to find any derivative. In fact, it was considered to be trivial, which is why Wolfram originally only published the Integrator.

Well, now it does derivatives. derivative of sqrt((x-a)^2+(y-b)^2) . It even shows its work, in the form that a person would do it on paper. So, no excuses now about derivatives. Just learn what one is, and let Alpha actually do them.

Intermediate Kalman Filtering theory, Part 3

Problem definition
This is the simplest form of the orbit determination problem I learned this filter on back in school. A satellite is in orbit around a point mass. Newton's gravitation applies. The satellite is tracked from a radar at a fixed point above the point mass (or on the surface of a perfectly smooth, homogeneous, radar-transparent planet) in the plane of the satellite orbit. The point mass is so much heavier than the satellite that we may treat the satellite as a test mass. This means that the gravity of the satellite does not significantly move the central mass, and therefore the mass of the satellite cancels right out of the problem.

State Vector
This is like the cart, but now we have two dimensions. It is provable that the trajectory of any test mass around a point mass is confined to a plane containing the point mass, as long as it is only under the influence of gravity, or if all other forces are in the plane also.

So, our state vector has four components, two for position and two for velocity, for a total of four.

<x>=<rx,ry,vx,vy> 

We are using r instead of p for position, mostly for historical reasons. Think of it as the "ray" or "radius" from the center of mass to the satellite.

Physics
This time we use Newton's Universal Law of Gravitation (in title caps, because it is important.)

F=GMm/r2

This is combined with Newton's second law

F=ma

to get

ma=GMm/r2

where
  • F is the force on the test mass
  • G is the universal gravitational constant
  • M is the mass of the central mass
  • m is the mass of the test mass
  • r=sqrt(rx2+ry2) is the distance from the central mass to the test mass.
  • <r>=<rx,ry> is the actual vector from the center to the test mass
  • a is the acceleration imposed on the test mass

We see that m appears on both sides, so we will cancel it out and get

a=GM/r2

Also, this is a vector acceleration, directed towards the center of mass, so multiply this by a unit vector pointing from the test mass to the center -<r>/r to get

a=-GM<r>/r3

It is very difficult to measure G by itself, as this involves measuring gravity exerted by objects small enough to fit in a laboratory. It is also very difficult to measure M of a planet by itself, because it's a whole planet, and won't fit on a scale. However, by observing the motions of satellites, it turns out that it is easy to accurately measure the product GM much more accurately than either G or M. So, we call this product μ (Greek small mu) and use it instead.

F(<x>)=a=-μ<r>/r3

And there's our physics function F(). If we want, we can expand it by element

F.rx=0
F.ry=0
F.vx=-μx.rx/sqrt(x.rx2+x.ry2)3
F.vy=-μx.ry/sqrt(x.rx2+x.ry2)3

Observation
Our radar is able to measure the distance (only) from itself to the satellite. Therefore the measurement vector at any time is just this distance, only one component again. The station is located at <m>=<mx,my>, so the distance from the radar to the satellite is

g(<x>)=|<r>-<m>|=sqrt((rx-mx)2+(ry-my)2)

Jacobian matrices
First, [Φ] This one has a lot of blank space because two of its components are simple variables, and the other two are functions of only two elements of the state vector, so right off we have

[Φ]=[0,0,1,0]
       [0,0,0,1]
       [dF.vx/dx.rx,dF.vx/dx.ry,0,0]
       [dF.vy/dx.rx,dF.vy/dx.ry,0,0]

Only six components out of 16 are nonzero, and only four are nontrivial. So, what about the ones that are?

F.vx=-μx.rx/sqrt(x.rx2+x.ry2)3

To the derivinator! derivative of mu*x/(sqrt(x^2+y^2))^3 I don't think that Alpha is smart enough to substitute r, so I just put it in explicitly, and we can sub it back out when we get the result. Alpha (now I want to call it "Ziggy" for some reason) says that the result is

(d)/(dx)((mu x)/sqrt(x^2+y^2)^3) = (mu (y^2-2 x^2))/(x^2+y^2)^(5/2)

or cleaning up and putting back in r

dF.vx/dx.rx= μ (x.ry2-2 x.rx2)/r5

We continue to get the other three elements:

dF.vx/dx.ry= -μ(3 x.rx x.ry)/r5
dF.vy/dx.rx= -μ(3 x.rx x.ry)/r5
dF.vy/dx.ry= μ (x.rx2-2 x.ry2)/r5

Now for [H]. This has four elements, one row for the one element of the observation by four columns for the elements of the state, but two of them are zero since the measurement doesn't involve the velocity elements of the state. We will define ρ (Greek small rho) as the measurement itself, so ρ=g(<x>)=sqrt((x.rx-mx)2+(x.ry-my)2)
dg/dx.rx= (x.rx-mx)/ρ
dg/dx.ry= (x.ry-my)/ρ
So we have the two Jacobian matrices,

[Φ]=[0,0,1,0]μ/r5
        [0,0,0,1]
        [x.ry2-2 x.rx2,3 x.rx x.ry,0,0]
        [3 x.rx x.ry,x.rx2-2 x.ry2,0,0]
(that extra μ/r5 hanging out in front is some common terms factored out) and


[H]=[[(x.rx-mx)/ρ, (x.ry-my)/ρ,0,0]]

Intermediate Kalman Filtering theory, Part 2

Time for an example. We'll start really really simple, and just redo the linear velocity model.

First off, state vector. Basically the state vector is all the things that the system "remembers" about itself. Think about it as a simulation. Once we have a program, we run the simulation and freeze it at a certain point. What do we need to save so that when we restore, we have enough data to start the simulation back up? We'll use the cart on rails model as an example. We are working with a massive cart on smooth, level, perfectly frictionless rails. Therefore the cart is free to move in one dimension only. The very meaning of one dimension is that it takes only one number to describe the position, and we will call this number p. If we think of it as a function of time, we might say p(t). We need p in our state vector, but this does not yet completely describe the state. We also need to know the velocity. A cart at 4m down the track stopped is very different from a cart at 4m moving at 50m/s. As it turns out, this is all we need to know. This is all the cart "remembers" - everything else, such as our random gusts of wind, are instantaneous only. Any effect they have is only through changing the position and velocity. Therefore, <x>=<p,v>. Our state vector has two elements, p (in units of meters from the origin, positive on one side and negative on the other) and v (in units of meters per second, positive if moving one way and negative if the other).

Next, physics.We actually use Newton's first law here (normally problems start with the second law). The cart has no forces applied to it, so it continues in a straight line at constant speed. The derivative of the speed is zero (this is what constant speed means) and the derivative of the position is the speed, so we end up with this:

dv/dt=0
dp/dt=v

d<x>/dt=<x.v,0>

(An aside on vector notation. If the components of a vector <?> are named, I will specify the element named @ by ?.@ as if the vector were a structure in C. If the components are ordered, and they always are, I will call the first one ?[0], followed by ?[1] etc. Zero-based, again like C. So, for our state vector <x> above, we can talk about x.p, x.v, and note that x[0]=x.p . Vector components always have an order, but sometimes it is more clear to talk about the components by name if any, so I will freely use both notations. When I get to it, I will number matrix cells with two indexes, row first, then column, starting at 0. So, the upper-left cell of any matrix is ?[0,0].)

Believe it or not, we are done with the physics function.

F(<x>)=d<x>/dt=<x.v,0>.

Now for the observation function. The observation vector is a single number, the position along the track. So, we get

g(<x>)=<x.p>

Again, that's it.

Now for jacobians. We need a function that can calculate [Φ] given any <x>, and to do this we need derivatives. We are going to do the derivatives once on paper, code them up in a function, and then that function will provide the actual numerical derivative whenever we need it.

There are only four elements, so we can go nice and slow. In the upper left cell of the matrix, we take the derivative of the 0 component of the force function F[0](<x>)=x.v with respect to the zero component of the state vector x.p . Since this variable doesn't even occur in the formula, the derivative is zero and we have Φ[0,0]=0. Next, in the upper right [0,1] cell, we have component 0 of the force function F[0](<x>)=x.v still, but this time we are going with respect to the 1 component of the state vector x.v . The derivative of any variable with respect to itself is 1, so we get Φ[0,1]=1. Bottom left cell [1,0] uses F[1](<x>)=0 and x[0]=x.p so we have Φ[1,0]=0. To finish off, we have bottom right cell [1,1] which uses F[1](<x>)=0 and x[1]=x.v so Φ[1,1]=0. As it happens this time, all the cells are constant, so our physics matrix function can just return a constant in response to any state vector [Φ(<x>)]=[[0,1],[0,0]] for any <x>. As we shall see in our next example, things are not always so easy.

The jacobian for g() is even easier, but it is still a matrix. Any jacobian of any vector parameter function with a vector return value has as many columns as the input vector has elements, and as many rows as the output vector has elements. So, in this case, 1 row by 2 columns. For H[0,0], we use g[0](<x>)=x.p and x[0]=x.p, so H[0,0]=1. For H[0,1] we have g[0](<x>)=x.p and x[1]=x.v, so H[0,1]=0

To sum up, our problem is described as follows:

State vector <x>=<p,v>
Physics function F(<x>)=<x.v,0>
Observation function g(<x>)=x.p
Linearized physics matrix [Φ]=[[0,1],[0,0]]
Linearized observation matrix [H]=[[1,0]]

Also, a word about a priori initial guesses. You will note that we always need an initial guess of our state vector at step 0 <x^_0> as well as an initial guess of our uncertainty of our initial guess, [P_0]. For some problems, it doesn't matter so much what the initial guess is, since the filter converges rapidly. Sometimes, though, it doesn't, and when this happens, the filter gets farther and farther from the correct answer instead of closer, and we say that the filter diverges. If this happens in a real vehicle, the thing may diverge from its intended course and converge with a wall, causing the parts of the vehicle to diverge from each other. Let's agree right now not to diverge. Your filter can handle the truth, as much truth as you can give it. Give it as good of an initial guess as possible.

Now just because the notation is a little neater, we are going to use the Matlab language to write this example. You can use Gnu Octave to run this if you don't have and can't afford Matlab.

The program is structured as follows. The driver function starts with a piece to simulate the cart. It will create one data point every 100ms for 10s, for a total of 100 points. Next we will add some noise to create the measurements. Then, we create the a priori initial guess.

Finally, we run the filter loop. Each iteration of the filter loop will process one observation and create one state estimate. The time stamp of this estimate is exactly the time of the measurement. The numerical integrators will be in line, instead of in a separate function to start with. Just to see how it's done, we will do a numerical integration time step of 1ms, so we need 100 loops to get from one measurement to the next.
All the other equations will be pretty much as written.

After we have run the filter loop, we will compare the estimates with the true noise-free state we started with.


function nonlinear_velocity
  %Simulated actual trajectory. Stopped at the origin for first five seconds,
  %then constant acceleration such that position reaches 25m after 10
  %seconds.
 
  t=0:0.1:10; %Timestamp of each measurement
  size(t)
  pt=[zeros(1,50),(0:0.1:5).^2]; %True position
  size(pt)
 
  %We want random noise, but the *same* random noise each time we run
  randn('seed',3217);

  R=0.5^2; %Measurement noise covariance, 50cm on position
  pn=randn(size(t))*sqrt(R); %Position noise, standard deviation of 10cm.
  size(pn)
 
  Q=[0.01,0;0,0.01];  %Process noise covariance, 10cm/s on position and 10cm/s^2 on velocity
 
  z=pt+pn; %Observations
 
  xh_0=[0;0]; %Initial state vector guess
  xh=zeros(numel(xh_0),numel(t)); %History of all estimates (starts as initial estimate)
  xh(:,1)=xh_0;
 
  P_0=[1,0;0,1]; %Initial state vector uncertainty, 1m in position and 1m/s in velocity
  P=zeros(numel(xh_0),numel(xh_0),numel(t)); %History of all estimate covariances
  P(:,:,1)=P_0;  
 
  for i=1:100
    %Stupid matlab, not using zero-based indexing. Our i here is zero based,
    %but we will make an I which is one-based, and use it to index the
    %history arrays.
    I=i+1;
   
    %Pull previous estimate from histories
    xh_im1=xh(:,I-1); %Previous extimate <x^_{i-1}>
    P_im1=P(:,:,I-1); %Previous covariance <P^_{i-1}>
   
    %One numerical integrator for both 1A and 1B, so that 1B can use the
    %intermediate results of 1A.
    dt=0.001;
    xh_im=xh_im1; %To become updated state estimate <x^_i->
    A=eye(size(P_im1)); %The thing we integrate in 1b, will be come [A]
    for j=1:100
      %1a. Use numerical integration to update state vector through time from
      %<x^_{i-1}> to <x^_i->
      xh_im=xh_im+dt*F(xh_im);
      %1b. Use numerical integration to update [A]
      A=A+dt*(Phi(xh_im)*A);
    end
    %1c, state deviation estimate
    Xh_im=zeros(size(xh_im));
    %2. Projected covariance estimate
    P_im=A*P_im1*A'+Q;
    %3a. Linearized observation matrix
    H_i=H(xh_im);
    %3b. Kalman gain
    K=P_im*H_i'*inv(H_i*P_im*H_i'+R);
    %4a. Measurement deviation
    Z_i=z(I)-g(xh_im);
    %4b. Measurement update of state deviation
    Xh_i=Xh_im+K*(Z_i-H_i*Xh_im);
    %4c. Update of reference state vector
    xh_i=xh_im+Xh_i;
    %5. Measurement update of state covariance
    P_i=(eye(size(P_im))-K*H_i)*P_im;
   
    %Record the current estimates
    xh(:,I)=xh_i;
    P(:,:,I)=P_i;
  end
 
  %Plot some interesting results
  figure(1)
  plot(t,pt,'r-',t,z,'k*',t,xh(1,:),'b-');
  figure(2)
  plot(t,pt-pt,'r-',t,pt-z,'k*',t,pt-xh(1,:),'b-');
end

%Physics function F(<x>). Takes a state vector, returns the derivative of
%the state vector. Primary physics model definition
function F=F(x)
  F=[x(2);0];
end

%Physics matrix [Phi(<x>)]. Takes the state vector and returns the jacobian
%matrix. Secondary physics model definition, as it is derived from F() (on
%paper).
function Phi=Phi(x)
  Phi=[0,1;0,0];
end

%Observation function g(<x>). Takes a state vector, returns an observation
%vector. Primary observation model definition
function g=g(x)
  g=x(1);
end

%Observation matrix [Phi(<x>)]. Takes the state vector and returns the
%jacobian matrix. Secondary observation model definition, as it is derived
%from g() (on paper).
function H=H(x)
  H=[1,0];
end



And now for some results:






Hmm, not so good. The estimate starts lagging behind the truth, and never catches up. This is more because the model doesn't fit the actual cart (which in this case has a rocket on it which lights at t=5sec). If the model could not be improved, then we will have to sacrifice smoothness and increase the process noise. Try different (larger) values on the diagonal elements of [Q].

It doesn't really matter, this just supplies us with a good scaffold to hang a real nonlinear problem on.

Wednesday, March 30, 2011

IMU Calibration Plan

Step 1: Put the IMU into a square frame.

The bridge board is not precisely square, and the gyros are not precisely aligned with the accelerometers, and neither is precisely aligned with the compass. I don't think that this matters. I may be able to measure the orientation of the devices to each other, at least the accelerometers and gyros.


What I do need is a square frame. I have a wooden cube box which is square to my ability to measure it, with a circular bubble level. I have glued the bridge board and its battery into place, such that the switches, buttons, and USB port are all accessible.

Step 2: Warm and Cold.
It is known that the gyro has a temperature-dependent zero offset, and may have a temperature dependent scale as well. This is fine, as the device has a thermometer as well, which I read out with the rotation data. The accelerometer has a thermometer also, but I don't know how temperature sensitive it is.

So, one test will be done in my basement, as close as possible to my computer to make it nice and toasty warm. I'll probably render something during the test to keep things toasty.

Another identical test will be done in my garage with the doors open. This will keep it chillin' out.

Step 3: Accelerometers.
The ideal test site for an accelerometer doesn't exist. The surface of the Earth is great for a 1G signal, but to get a zero-G signal, I need a vacuum column drop chamber with a pillow at the bottom.

What I will do instead is record data with each axis pointing up and down. The surface I put it on won't be completely level, but the box is square, so when it is laid down one way it will be exactly opposing when it is laid down the other way. I will get a +1G and -1G signal on each axis, then average to get the 0G signal.

Step 4: Gyroscopes.
A record turntable would be ideal, but I don't have one. I should check Goodwill tomorrow. What I have instead is a non-driven food turntable. I will mark these such that it is evident when it is exactly at home position. Then I will spin the table up, let it coast a while, and when it slows down, move it to exactly home position. I will spin it both directions around each axis.

Step 5 (for the adventurous): Drop test.
Build a big pile of pillows in the basement. Climb as high as I can, then drop the IMU from the ceiling to the pillows. Try to have it as spin-free as possible, because the accelerometers are not at the center of rotation. The first fraction of a second after drop, before the package has built up any speed and therefore drag, will be the best. All axes will simultaneously go to zero. Repeat in the garage.

Intermediate Kalman Filtering theory, Part 1

Real programmers are Not Afraid of Math
If you know calculus, great. If not, stop now, and run, don't walk, to your library or bookstore and beg, borrow, or buy a good calculus book. I recommend Calculus the Easy Way by Douglas Downing. Come back when you understand what a derivative is, and know how to calculate some easy ones. Also at least know what an integral is, even if you don't know how to calculate any, and read about the fundamental theorem, which basically states that a derivative can be reversed and that it is useful to do so. Knowing at least a smidgen of calculus will serve you long after your robot has decided to take a swim in the pond. It opens up a whole new world of problems and solutions, and that is useful for you to have a feel for whether a problem is solvable even in principle, even if you don't know how to solve it.

I'll wait.

OK, are we ready to proceed? Let's move on then.


Kalman filter processes like to use the following model:

<x_i>=[A]<x_{i-1}>+<w_{i-1}>
<z_i>=[H]<x_i>+<v_i>

Nonlinear Kalman Filter processes tend to use a different but related form:

<x_i>=f(<x_{i-1}>)+<w_{i-1}>
<z_i>=g(<x_i>)+<v_i></math>

where

  • f(...) is the state transition function, which serves the same purpose as [A]. Given a previous state vector, the state transition function is used to transform it to the current state vector. So, this function takes a vector and returns a vector. This is more general than a pure matrix multiplication, and allows any kind of relation between this state and the next.
  • g(...) is the observation function, which serves the same purpose as [H]. Given a state, the observation function is used to transform it to the observations. Likewise, a vector function with vector parameter.

HOWEVER

Most physical laws are in the form of vector differential equations, so we have:

d<x>/dt=F(<x>,t)

where

  • F(...) is the physical law function, which is used to calculate the derivative of the state. This is then integrated to get a new function which describes the state in a continuous manner from the initial state and the laws. This function of state versus time is called the trajectory, since one of the basic uses of this is describing the actual trajectory of something, like a planet, moon, or robot. Remember not to confuse f() with F(). We almost never use or even have access to f().

Measurements are still taken at discrete instants. What we want is a way to shoehorn this continuous problem into a form suitable for the Kalman filter.
With sufficiently simple physics functions, you can crank out the integration as it shows in your textbooks and get the trajectory function directly. This is what Newton invented calculus for in the first place. However, for down-to-earth things like robots, the physics are usually too complicated to integrate directly. One of the things our filter will do for us is integrate numerically, so that we end up with (in principle) a list of times and the state vector for each time. This will be our trajectory function.


Linearization
If you just want the recipe, skip a couple of sections.


Nonlinear problems are hard. In fact, they can't be done. So instead, we do a modified problem and hope that it is close enough. This is the part where everyone goes off on Taylor series and such. I will just talk somewhat informally.
First, most physical processes are smooth. No discontinuities and no corners. Just nice sweeping curves. If you look at any of these curves closely enough, the relative radius of curvature gets bigger and bigger, and any small section eventually appears linear if you zoom in on it enough.

In order to take advantage of this, you have to look at the right part of a curve. We need a reasonably accurate initial condition and a correct dynamics and kinematics model. Now we can use the derivative of the curve to find the general direction of motion. We use the initial condition estimate and model to create a reference trajectory. We end up not estimating the trajectory directly, but the divergence from the reference.

So, we need a replacement for [A] and a way to update <x^_i> and <P_i>. The state vector is the easiest, so we'll talk about that. We know the physics model (If we don't, go find it!) so we can use it directly on our old estimate with numerical integration. I'll talk in detail about numerical integration in another post, but if you are a programmer, there is a good chance you have already written one. The simplest one is this simple:

  • Find the rate of change of all the elements of the state vector at this point. That is, run F(<x^_{i-1}>), run the physics function on the old estimate.
  • Find the time step size. This is t_i-t_{i-1}.
  • Multiply the rate of change vector by the time step size. This is how much the vector has changed over the step.
  • Add the change to the old estimate to get the new estimate.

All numerical integrators are just improvements and refinements on this. If the time step is short enough, you can just do this. One simple refinement is to break the time step into smaller sub-steps and do the integration repeatedly.

Replacing [A] is a bit harder. Skipping all the justifications and derivations, you need a physics matrix [Φ]. Finding it will require cranking out a bunch of derivatives by hand or computer, or writing a numerical differentiator. Then with [Φ] in hand, you can numerically integrate to get [A]. This is the "linearization" process talked about in textbooks.

[Φ] is a Jacobian matrix, a kind of vector derivative. Instead of taking the derivative with respect to time as you may be accustomed to, you take the derivative of each component of the physics function with respect to each of the other components. You end up with a matrix, where for each cell, you have the derivative of the component that is this cell's row number, with respect to the component that is this cell's column number. If you want to, you can think of it as fitting tangent hyperplane to the physics function, in the same way that an ordinary derivative fits a tangent line to a 1D function. This hyperplane is flat, so this is why we call it "linearized". We then project this hyperplane out as far as we need it, and use it to transform an identity matrix from one place on the trajectory to [A] on another. The math is pretty straight forward, it's just hard to imagine, especially with more than about 1 element in your state vector.

Once you have [A] in hand, you can easily find [P_i-] using the same formula as before.

We end up playing a similar Jacobian trick with the nonlinear observation. The Jacobian matrix of the observation function g(<x>) is dg/d<x>=[H]. No confusion with linear [H] because we use it the same way ever after.

Extended Kalman Filter algorithm

To remember, the five equations of the linear Kalman filter are

  1. <x^_i->=[A]<x^_{i-1}> 
  2. [P_i-]=[A][P_{i-1}][A']+[Q]
  3. [K_i]=[P_i-][H']([H][P_i-][H']+[R])-1 
  4. <x_i>=<x^_i->+[K_i](<z_i>-[H]<x^_i->)
  5. [P_i]=([1]-[K_i][H])[P_i-]

    Equations 1 and 2 are called the time update while equations 3 through 5 are called the measurement update.

    Plugging in what we have above into this form, we get the following. Don't panic too much, it will all make sense with a concrete example.

    Time update

    1a. <x^_i->=∫ F(<x(t)>,t) dt from t=t_{i-1} to t_i with initial condition <x(t_{i-1})>=<x_{i-1}>
    1b. [A(t_i,t_{i-1})]=∫ [Φ(<x(t)>,t)][A(t,t_{i-1})] dt from t=t_{i-1} to t_i with initial condition [A(t_{i-1},t_{i-1})]=[1]
    Basically these integrations cover the time from one step to the next, generate our [A] for us, and turn this from a continuous time problem back into a step problem like we had before. We do these integrations numerically and in lockstep, so that we can use the intermediate result <x(t)> from 1a where we need it in 1b.

    1c. <X^_i->=<0>
    2. [P_i-]=[A(t_i,t_{i-1})][P_{i-1}][A(t_i,t_{i-1})']+[Q]

    Equation 1c is where things start going different. We are estimating the deviation from the reference trajectory, but our update follows the reference trajectory perfectly, so our initial estimate of the deviation is zero. Equation 2 is exactly like the old form.

    Measurement update
     

    3a. [H_i]=dg(<x^_i->,t_i)/d<x^_i-> Linearized observation matrix
    3b. [K_i]=[P_i-][H_i']([H_i][P_i-][H_i']+[R])-1 Kalman gain
    4a. <Z_i>=<z_i>-G(<x^_i->,t_i) Measurement deviation
    4b. <X^_i>=<X^_i->+[K_i](<Z_i>-[H_i]<X^_i-> Measurement update of state deviation
    4c. <x^_i>=<x^_i->+<X^_i> Update of reference state vector
    5. [P_i]=([1]-[K_i][H_i])[P_i-] Measurement update of state covariance

    My teacher calls this is the extended Kalman filter, in distinction from the linearized Kalman filter, because the reference trajectory is updated every time around. To do the linearized filter, don't reset the state deviation to zero each time (just at the start) and don't add the state deviation to the reference state at the end. If the state deviation gets too big, you will need to do a rectification every once in a while, but not at every measurement, where you do these things. The extended filter basically does rectification every time.

    Beginners Kalman Filtering Theory, Part 5

    Pay attention here, because we are finally going to develop something practical - The Kalman Velocity Model

    This is about the only problem I have found which both has a practical use, and actually fits the linear model.

    Let's imagine a rail car on a level frictionless rail. The car is unpowered and uncontrollable, but is subject to unpredictable accelerations (say from gusts of wind). We want to know the position and velocity of the car, but our only measuring device is a tape measure laid out along the track. Once again we can measure the position of the car to arbitrary position, but only finite accuracy.
    First, let's look at the process model again:

    <x_i>=[A]<x_{i-1}>+<w_i>
    <z_i>=[H]<x_i>+<v_i>

    Now that we are considering motion, we need to consider time. The spacing between two measurements i-1 and i we will call Δt (Delta-t). For now we will consider Δt to be constant, meaning that we have equally-spaced measurements.

    Now I am beginning to curse Blogspot and it's lack of math typography. To show a matrix, I will do the following. If the matrix is

    [a b]
    [c d]

    I will write it as [[a,b],[c,d]] or in other words, rows are the inside brackets and the whole matrix is a bunch of stacked rows. If I have a vector, I will say

    [a]
    [b]
    [c]

    is the same as <a,b,c>, which is the same as [[a],[b],[c]]. All vectors are column vectors unless otherwise stated. Even a vector written <a,b,c> is a column vector.

    The state of the vehicle is completely determined by its current position p, and velocity v, so its state vector is:

    x=<p,v>

    The state transition matrix is two-by-two, since the state is two-element. In simple linear equation language, we have this:

    p_{i}= p_{i-1} +Δt v_{i-1}
    v_{i}=                   v_{i-1}

    Converting this to a matrix equation gets us this:

    <p,v>=[[1,Δt],[0,1]]<p_{i-1},v_{i-1}>
    <x_i>=[A]<x_{i-1}>

    so we have a state transition matrix:

    [A]=[[1,Δt],[0,1]]

    For the observation matrix, we are taking the two-component state and converting it to one measurement, so the matrix is one-by-two.

    <z_i>=(1)p_i+(0)v_i
    <z_i>=[[1,0]]<p_i,v_i>

    so the observation matrix is:

    [H]=[[1,0]]

    Process Noise Model
    We represent the unpredictable accleration component with the process noise covariance matrix. The process noise is a random vector variable with two elements, so its covariance is a two-by-two matrix. I will borrow the process noise model from Wikipedia. In between any two measurements, the unknown process noise acceleration is presumed to be constant a_i. By integrating this over the time step, we see that the velocity is changed by a_iΔt and by integrating twice we see that the position is changed by 0.5a_iΔt2.

    So in mathematical terms, we get

    <x_i>=[A]<x_{i-1}>+<w_i></math>
    <w_i>=<0.5a_iΔt2,a_iΔt>
    <w_i>=<0.5Δt2,Δt>a_i
    <g>=<0.5Δt2,Δt>
    <x_i>=[A]<x_{i-1}>+<g>a_i

    This comes straight from Wikipedia underived. The process noise covariance matrix is

    [Q]=<g><g'>σ_a2=[[0.25Δt4, 0.5Δt3],[0.5Δt3,Δt2]]σ_a2

    Here I use <?'> for the row vector form of column vector <?>. I don't know the exact derivation, but basically it is saying that since the standard deviation is known, and the variance is required, you just square the standard deviation. Since you can't directly square a vector, the idiom <g>2=<g><g'> is used.

    As we look through the filter equations, it turns out that [Q] is always added to the projected value of [P], so it doesn't matter that [Q] is singular by itself. The parameter σ_a has units of acceleration, and represents the expected magnitude of the random velocity change per unit time, while a_i represents the unknown actual acceleration.

    In this case the process noise is a real part of the model, and not an ugly hack or tuning parameter. We defined the model with this random acceleration, and this process noise matrix describes it in math. For real systems such as EVE, the process noise is still an ugly hack.

    This form of the process noise is preferred, because the process noise parameter then is decoupled from the timestep size.

    The measurement is a scalar, so its covariance is scalar also.

    R=σ_z2

    We can then apply these matrices directly to the Kalman filter form.

    Sensor fusion and sensor anti-fusion
    In this  simple model, we have only one kind of sensor, for position, but we are able to recover both position and velocity. This is possible because our physics model [A], simple as it is, is complete and captures the whole state vector. This concept is important. You don't necessarily need a sensor for every component of your state vector. This is sensor anti-fusion, I guess. It is a complementary concept to sensor fusion.

    With sensor fusion, sensors of two different types are used to measure the same parts of the state vector. We'll get into exactly how to do this in a later post, but let's just talk about goals for a moment. Suppose your robot has a 6DOF IMU and a GPS. You use the IMU for accuracy over short distances and times, and the GPS to keep things aligned with the real world, and keep the IMU from wandering off too far.

    Now, you are probably not going to measure the GPS and IMU at the same time. So, for some steps, perhaps most steps, the IMU measurement is available and the GPS is not. Other steps, the GPS is available and the IMU is not. Perhaps there are rare times when both measurements are taken simultaneously.

    What you do in this case is use a different observation model, and therefore a different [H] depending on what combinations of sensors are available at a particular instant. When you use the IMU, you have an [H] (or actually the nonlinear equivalent of [H]) which only uses the IMU. You also have the measurement uncertainty of the IMU in its own [R]. When you use the GPS, you use a different [H]. And, when both are available simultaneously, you could use an [H] that incorporates both. We'll see why we don't do that in a later post.

    With the right [H] for each measurement and the right [A] (or its nonlinear equivalent as we shall see) for the physics, combined with the correct measurement noise for each measurement, the filter does all the bookkeeping and weighting automatically to incorporate multiple different kinds of sensors into the same estimate. It automatically weights each measurement to use it as best as possible. If you don't lie with [R], it is able to weigh things appropriately without any further intervention. This is what is meant by sensor fusion.

    EVE example
    Now this process noise we have applied is just a model, just a fudge factor to keep the filter from getting too satisfied with itself. The filter thinks the process has noise, but in reality that noise is the process not following the model. I am not prepared to tell the sun what to do.

    Here's an example using the velocity process above, applied to the EVE instrument. The filter uses this process noise model, but the real data does not. During a measurement sequence of 1000 measurements, the Sun generates a rock solid constant 20 units of flux for the first half. At the exact halfway point, we see a flare begin. This flare has a maximum amplitude of 20 units, and then decays away again.

    The EVE instrument is modeled as having a Gaussian uncorrelated measurement noise with a standard deviation of one unit, with the following exception: On a randomly selected 1% of the measurements, the standard deviation is five units. This represents an actual phenomenon seen on EVE.



    The heavily marked points are the 1% "popcorn" (There might not be exactly 10, since they are thrown in at random). The red line represents reality again, and the blue line represents the filter output. We notice immediately that the filter has less noise than the original data, so the filter is working. We also notice that while the filter output follows the popcorn, it is not nearly as bad as the original data itself.

    Here is a plot of the estimate difference from the real measurement. The heavy line is the filter residual, and the light lines are the ±3-sigma uncertainty lines.



    The problem here is that the filter takes a while to believe that the process really changed, so it lags a bit behind the actual process. This is that large downward spike we see at the half way point. Other than that, the filter has a standard deviation of about a third that of the original data, and the 3-sigma lines include the actual value almost all the time, as it should.

    The process includes a notion of velocity, the rate of change of the flux, but this is not a measurement that EVE actually makes. Even so, the filter provides a velocity estimate automatically. We could estimate it, but no one has asked for it. Perhaps if the scientists know that it is available, they will want it.



    This is very educational as to what the problem is. The real flux (literally) explodes, with a step jump in velocity. The filter takes a while to follow, and by the time it has followed, the flux velocity is dropping rapidly, and once again the filter has to chase it.


    Testing your filter
    Unless you are a much better programmer than me, and I don't think you are, when you originally code this up, it is going to have bugs. Now the normal way to debug is to trace the program flow, and at every line, or at least every critical line, predict what the program is going to do then test it by letting it execute. However, if you don't understand the algorithm, how can you do this prediction?

    This is why we are going step by step, with gradually more elaborate filters, each built off the last.I gave IDL code for a scalar Kalman filter in a previous post. This simplest filter was devolved to the point that all the magic in equations 3 and 5 has been distilled out, leaving only an obvious averager. Then, as you add refinements one at a time, test frequently. When it breaks (and it will) you know what part is broken: the last one you added, or at least an interaction between that part and the rest of the code.

    All of these charts were generated by that script, gradually evolved to do the more and more elaborate filters. I could (well actually I can't, the code has evolved away from those points) give you code for each filter, but I won't, because that's not a favor to you. You need to work with the code to understand it, and the best way is to evolve the code a step at a time as I have. You are guided by the examples, but you put in the effort yourself to follow them. It's all very Socratic.

    Even if you don't have IDL, you can program the simpler models even in a spreadsheet, and the more complicated one in your favorite data-handling language, or even general purpose language like C, Java, or Perl. If your language has no native charting abilities, have your filter write its output into a CSV, then suck it into a spreadsheet. There's no excuse, as C, Java, Perl, and OpenOffice are all free for the taking and available on almost all platforms (even your robot, if you write in C).

    As you progress, you will come to the point where you will write a general Kalman filter function, which can handle any process model thrown at it. At this point, you will then be debugging your models. Before you try to debug on your robot, try using [A] and some initial conditions to simulate a robot trajectory and using [H] to generate measurements at frequent intervals. This way, the normally unknowable True State is in fact known, and you can compare your estimate with the Truth, in a way which is almost impossible otherwise.

    The usefulness of this model
    By itself, we can use this model to estimate the rate of change of any variable that we can measure. If you have a thermometer, you can use the filter to not only reduce the measurement noise, but also create out of thin air a measurement of the rate of change. Likewise the voltage and charge state of your battery, the altitude and climb/sink rate measured by an ultrasonic pinger used as an altimeter, or any number of things.

    Further, when I describe the nonlinear filter and IMU model, you will see this model embedded in that one. The filter will integrate the acceleration to get velocity and position, but the acceleration it integrates will not be the direct measurement of the sensor, but an estimate based on the linear velocity model.

    Code
    Since this is a useful result in itself, code for it is posted here. Partly to check your work, but mostly so I have a home for it.


    function [xh,P]=kalman_vel(xh0,P0,t,z,sigma_a,R)
      xh=zeros(numel(xh0),numel(t));
      xh(:,1)=xh0;
      P=zeros(numel(xh0),numel(xh0),numel(t));
      P(:,:,1)=P0;
      H=[1,0];
     
      for I=2:numel(t)
        i=I-1;
        xh_im1=xh(:,I-1);
        P_im1=P(:,:,I-1);
        dt=t(I)-t(I-1);
        A=[1,dt;0,1];
        Q=[dt.^4/4,dt.^3/2;dt.^3/2,dt.^2]*sigma_a;
       
        xh_im=A*xh_im1;
        P_im=A*P_im1*A'+Q;
        K=P_im*H'*inv(H*P_im*H'+R);
        xh_i=xh_im+K*(z(I)-H*xh_im);
        P_i=(eye(size(P0))-K*H)*P_im;
       
        xh(:,I)=xh_i;
        P(:,:,I)=P_i;
      end
    end
       


    Input parameters:

    • xh0 - Initial guess of state vector
    • P0 - Initial guess of state vector estimate covariance
    • t - list of times - 1D array
    • z - list of measurements of position - 1D array same size as t
    • sigma_a - 1-sigma estimated process noise magnitude - scalar
    • R - 1-sigma measurement noise - scalar

    Output parameters:

    • xh - List of state vector estimates. First row is "position" and second row is rate of change.
    • P - List of covariance matrices.

    Beginners Kalman Filtering Theory, Part 4

    Process Noise

    If we know that the constant really is a constant, then what we did before is just fine, but we might as well just do an average. Suppose however, that our constant isn't really constant. Suppose it has a step in it. If you think about the filter as an averager, an averager never forgets. The filter result will end up the weighted average between the measurements before the step and the measurements after.



    So, what we can do is give the filter a little uncertainty, a little nagging doubt, about its previous results. We tell it that its estimate of the estimate error covariance is a little bit wrong. We create a little bit of doubt that the system really is a rock solid constant. We add process noise.

    Reiterating the Kalman filter again in all its glory, let's reduce it again, but this time include the process noise term.

    1. <x^_i->=[A]<x_{i-1}>
    2. [P_i-]=[A][P_{i-1}][A']+[Q]
    3. [K_i]=[P_i-][H']([H][P_i-][H']+[R])-1
    4. <x_i^>=<x_i^->+[K_i](<z_i>-[H]<x_i^->)
    5. [P_i]=([1]-[K_i][H])[P_i-]

    Once again, a cancelfest:

    1. x^_i-=x_{i-1}
    2. P_i-=P_{i-1}+Q
    3. K_i=P_i-/(P_i-+R)
    4. x^_i=x^_i-+K_i(z_i-x^_i-)
    5. P_i=(1-K_i)P_i^-

    Now the estimate covariance never collapses to zero. In fact, it never decreases below the process covariance Q, which is what we would expect. If the process really is uncertain, our estimate can never be less than that uncertainty.

    Lets sic this on the true constant first, to see what we give up:


    The final estimate is 0.5055V±0.0520V. Naturally our uncertainty is a lot worse, since the filter thinks that the process really is uncertain about what its state is. It weighs the previous estimate quite a bit less than it did when the process was rock solid. As it turns out, once the filter converges, it weighs the previous estimate about three times heavier than the current measurement, so in a way its memory only extends for three measurements. The filter in this case is a moving average of the recent past, where older measurements' weight decays exponentially.

    Let's see how it does now against a step in the constant:


    Since it only remembers back about three measurements, it only takes about three measurements to start tracking the new value.

    Process noise represents our uncertainty about what the process really is doing, uncertainty about our model. But, how can you assign a standard deviation to "maybe the constant has a step in it"? It certainly doesn't have a normal distribution. The right answer is to fix the process model, but maybe we can't. In this case, we use the process noise level as just a tuning parameter. We can tune it to get the right balance between noise filtering and responsiveness to change. In the above examples, the process noise is rather large, only 1/sqrt(10) of the measurement noise (Q=0.001, R=0.01). Let's see what happens as we turn the process noise down. All of these have the same measurement covariance, only the process covariance changes.


    Q=1e-4, memory about 10 steps

    Q=1e-5, memory about 30 steps
    Q=1e-6, memory about 100 steps
    So we see that as we turn the process noise down, the estimate in the flat part becomes smoother, in exchange for the response to the new data becoming longer.

    This kind of tuning is a judgement call. It's actually an ugly hack, and should be avoided if possible, but it may not be possible.

    Beginners Kalman Filtering Theory, Part 3

    Examples
    As noted above, the hard part is adapting your problem to the filter's form. To do that requires practice, so let's work through a few examples.

    Let's just measure something, like the voltage across a resistor, with an analog-to-digital converter. We'll ignore the discrete nature of an ADC for now and say that our ADC converts a voltage into an unlimited-precision real number. However, like all real devices, it is imperfect, and while it gives unlimited precision, it only has finite accuracy.

    We presume that the voltage is constant.

    So, let's reiterate our model, and see how this problem matches up with it.

    <x_i>=[A]<x_{i-1}>+<w_i>
    <z_i>=[H]<x_i>+<v_i>

    The state <x_i> is the actual voltage. It is a one-dimensional vector, or in other words just a scalar. This means that n=1.

    The state transition [A] describes how the state changes with time. Since the state is one dimensional, the state transition is scalar also. Since the voltage is constant, the constant is 1.0 exactly.

    The process noise is theoretically an uncertainty in the state, but is mostly just used for tuning. For now, we will say that the covariance (scalar) [Q] on it is zero.

    The measurement <z_i> is supposed to just be the voltage, so the observation (scalar) [H] is also 1.0 exactly.

    We will say that the measurement uncertainty is normal with a 0.1 standard deviation. This makes the measurement noise has a covariance (scalar) [R]=0.01 .

    Now we re-iterate the filter equations, just so you don't have to skip around so much to read things.

    1. <x^_i->=[A]<x_{i-1}>
    2. [P_i-]=[A][P_{i-1}][A']+[Q]
    3. [K_i]=[P_i-][H']([H][P_i-][H']+[R])-1
    4. <x_i^>=<x_i^->+[K_i](<z_i>-[H]<x_i^->)
    5. [P_i]=([1]-[K_i][H])[P_i-]

    So, let's plug all that into our Kalman filter form, turn the crank, have a cancelfest, and see what we get:

    1. x^_i-=x_{i-1}
    2. P_i-=P_{i-1}
    3. K_i=P_i-/(P_i-+R)
    4. x^_i=x^_i-+K_i(z_i-x^_i-)
    5. P_i=(1-K_i)P_i^-

    Well, that cleans up quite nicely. Let's have a closer look at it. First off, our projected state estimate is just our old state estimate, since we don't expect it to change. Our projected estimate covariance likewise doesn't change.

    Since the measurement uncertainty and the estimate covariance are always positive, we see that the denominator of the Kalman gain term is always larger than the numerator, and thus that the gain is always between zero and one. It can only reach those extremes if either the measurement uncertainty or estimate uncertainty is zero, and neither is.

    If the Kalman gain were exactly one, the fourth equation says that our new estimate would be exactly our measurement. If it were zero, we would ignore the measurement.

    We notice that the estimate uncertainty does not depend on the measurement, only the measurement uncertainty. Likewise with the Kalman gain. So, knowing the ''a priori'' estimate uncertainty and measurement uncertainty, we can predict their future history without regard to the measurements themselves. We see from the last equation that the next uncertainty is always smaller than the current one, and therefore the Kalman gain will continually decrease. Sounds like an exponential decay to me.

    By combining equations 3 and 5, we get

    P=(RP-)/(R+P-)

    If we look for convergence, the P on the left side is the same as the P on the right. Taking R as a constant, we get

    P^2+RP=RP
    P^2=0
    P=0

    meaning that the only fixed point is at zero. Combining this with the knowledge that P continually decreases, we can predict with certainty that P collapses to zero. This actually makes sense. If you think about how intuitively we know that the uncertainty of a large number of measurements is proportional to 1/sqrt(N), in the limit the uncertainty will collapse to zero as we take an infinite number of measurements.

    Once P collapses like this, K will also, as P is in the numerator. When K collapses, this means that the measurement will be ignored, and the estimate will match the previous estimate.

    It's pretty easy to code all that up in IDL. Let's say that the true measurement is 0.5V, the ''a priori'' estimate is zero with a variance of 1. Then we can do this:


     pro kalman_constant
       ;True value of state
       val=0.5
       ;Measurement noise
       noise=randomn(0,1000,/double)*0.1
       ;State at all measurement times
       x=noise*0+val
       ;All measurements
       z=x+noise
       R=0.01d; //Measurement noise

       ;Record of estimate over time
       xh=dblarr(n_elements(z))
       P=dblarr(n_elements(z))
       K=dblarr(n_elements(z))
       P[0]=1d
       xh[0]=0d
       for i=1,n_elements(z)-1 do begin
     
         xhm=xh[i-1]
         Pm=P[i-1]
     
         K[i]=Pm/(Pm+R)
         xh[i]=xhm+K[i]*(z[i]-xhm)
         P[i]=(1-K[i])*Pm
       end
       window,0
       device,decompose=0
       loadct,39
       plot,z,psym=1
       oplot,x,color=254
       oplot,xh,color=192
       window,1
       plot,P,/ylog
       window,2
       plot,K,/ylog
     end


    By running this, we get this:

    The black crosses are 1000 measurements, the red line is the true state value, and the blue line is the Kalman estimate at each measurement, each point only considering the points to the left.

    We see that even though the filter started with a very inaccurate state estimate, and the measurements themselves are very inaccurate, the filter converged to near the correct answer very quickly, for appropriate values of "very".

    With these particular measurements, we get at our last estimate x=0.50241482V±0.0031638442V. This uncertainty is very near that predicted by the 1/sqrt(N) model. Looking at a plot of the inverse of the Kalman gain (effectively the weighting on the previous estimated state) it is just linear. So, this is just a fancy averager.

    Disclaimer
    It is a great disservice to Kalman and all those who developed the filter into what it is today, to call it a simple averager. Yes, it reduces to an averager in the simple case, but the filter also works in the complicated case. It has all the statistical properties that you want, and if you tell the truth about all the filter inputs, the filter output really is optimum. I am just trying to understand the filter, and running it on simple cases helps.

    Beginners Kalman Filtering Theory, Part 2

    Filter mathematical definition
    This mostly follows Welch & Bishop, except uses \(i\) instead of \(k\) so as to not conflict in IDL naming with Kalman gain coefficient \(\M K\).

    Our filter tries its best to come up with an estimate of what the real state is. We know that this estimate is imperfect, but it is the best the filter can do. For a problem which actually matches this model, and the proper definition of "best", it is provable that the Kalman filter really does compute the best possible estimate given the noise in the measurements.

    What follows is a recipe, light on derivations and proofs. Just follow it, without trying to understand it too much. I understand 1,2, and 4 below, but not 3 and 5. I have learned through sad experience that it is difficult to test a program step-by-step if you don't know what the intermediate steps are or how to calculate them. We will discuss testing in a later post.

    Symbol definitions
    \(\hat{?}\) Estimated value of unknown variable ?. In code, we will usually say ?hat. In general, an estimated variable is named the same as the unknown true variable, with the addition of "hat". If the variable is a vector, we will use the compound symbol \(\hat{\vec{?}}\). Notice that even if it is a vector, it is not necessarily one of unit length.
    \(?^-\) Updated value. Some variable ? is updated from the previous step to the current step, but the new measurement is not used in this process, so no new information is available from it. It's just the old value projected to the current step. This 'minus' is usually a superscript in textbooks, but we will almost always use it with vectors and matrices, and draw it as the last symbol inside the brackets of the matrix in question, so we will never confuse it with subtraction.
    \(\M{?}^T\) Transpose of matrix \(\M ?\)
    \(\M{?}^{-1}\) Inverse of square matrix \(\M ?\)

    Time update
    First, we project the state ahead, to get an estimate of what the state is at the current measurement, based on our best estimate at the previous measurement:

    \[\hat{\vec{x}}^-_i=\M A\hat{\vec{x}}_{i-1}\]

    where

    • \(\hat{\vec{x}}^-_i\) is the updated previous estimate of the state

    Next, we project the state covariance matrix forward:

    \[\MM{P}{^-_i}=\M{A}\MM{P}{^-_{i-1}}\M A^T+\M Q\]

    where

    • \(\MM{P}{_i}\) is the estimate error covariance, the covariance of the uncertainty in our estimate of the state. This value is an estimate, but customarily doesn't have a hat, because there is no corresponding "real" value for it. The state has a real value with zero uncertainty at each point, but we can't see it, except imperfectly through the measurements. This matrix is our estimate of how good our estimate is.
    • \(\MM{P}{^-_i}\) is the updated previous estimate error covariance.

    The idiom $\M A \M P \M A^T$ is how any covariance matrix $\M P$ is propagated with a transition matrix $\M A$. It isn't just $\M A\M P$ since $\M P$ is a covariance, wich matches what we think of in scalar form as variance $\sigma^2$ and not standard-deviation $\sigma$, as it is in the square of the units of $\vec{x}$. If we think about a scalar, we really want $(A\sigma)^2=A^2\sigma^2$ or in matrix form, $\M A^2\M P$. In order to come out with a matrix in the right shape, we do $\M A^2 \M P$ as $\M A \M P\M A^T$. Some textbooks refer to this as the similarity transformation.

    So this error projection makes sense. The projected covariance is simply the old covariance transformed by the state transition matrix, plus the process noise we think is there.

    Once done, we think we know where the state is at this measurement, based on the state at the last measurement.

    Measurement
    We next calculate the the difference between what we did measure and what we should have measured, based on our updated state estimate. This is called is called the residual (or measurement innovation)

    $$\vec{y}=\vec{z}_i-\M H\hat{\vec{x}}^-_i$$

    The matrix $\M H$ transforms a state into a measurement. As mentioned in the previous post, you provide this matrix, depending on the form of your problem and the sensors your robot has. We'll see more about this in the examples. As we will see, we never need to solve the much harder problem of transforming a measurement into a state, because that is part of what the filter is there to do for us. The problem is already solved.

    The residual is an $m$-dimensional vector (same size as the observation vector)

    Next, we figure the covariance of the residual (an m by m matrix)

    $$\M{\Gamma}=\M H\MM{P}{^-_i}\M H^T+\M R$$

    $\M \Gamma$ (Greek capital gamma) is a matrix which we can think of as the estimate covariance transformed into observation space, and then with the measurement uncertainty added. This time the transformation is from one space to another, instead of forward in time, but it's still the same thing, so the similarity transform still applies.

    Measurement update
    We define the cross-covariance matrix between the error in the state estimate and the residual (No, I don't know what this means, but some papers call out this matrix) as:

    $$\M S=\MM{P}{^-_i}\M H^T$$

    Now we incorporate the information from our current measurement. We calculate the Kalman Gain, which is the weighting function used to determine how much we believe the updated estimate, and how much we believe the measurement.

    $$\M K=\M S\M \Gamma^{-1}$$

    where

    • $\M K$ is the Kalman gain for this measurement. By working through the matrix operations, we see that this must be an n by m matrix. The derivation for the gain includes some kind of justification for this in this produces a maximum-likelyhood estimate in some Bayesian sense, but I don't understand it yet.

    Note that the Kalman gain is used both to weight that residual, and to transform it back from observation space to state space. The weighting of the residual is done with the $\M \Gamma^{-1}$ matrix. If you think about it in scalar terms, it says that the larger the residual covariance is, the less you weight it. The rest just transforms the measurement residual back into state space.

    Using this gain, we calculate the new state estimate

    $$\hat{\vec{x}}_i=\hat{\vec{x}}^-_i+\M K\vec{y}$$

    Finally we calculate the new estimate error covariance:

    $$\MM{P}{_i}=\left(\M 1-\M K\M H\right)\MM{P}{^-_i}$$

    where [1] is the n by n identity matrix which is compatible with the rest of the formula.

    This matrix is useful in itself when publishing formal uncertainties, but even if you don't care, the filter cares, because it needs this uncertainty to balance the old estimate with the new measurement.

    The Five-Line Form
    Customarily, several of these equations are combined, to give the five-line form of the Kalman filter. This is the form which you will probably use to write your program.

    1. $$\hat{\vec{x}}^-_i=\M A\hat{\vec{x}}_{i-1}$$
    2. $$\MM{P}{^-_i}=\M A\MM{P}{_i-1}\M A^T+\M Q$$
    3. $$\M K=\MM{P}{^-_i}\M H^T\left(\M H \MM{P}{^-_i}\M H^T+\M R\right)^{-1}$$
    4. $$\hat{\vec{x}}_i=\hat{\vec{x}^-_i}+\M K\left(\vec{z}_i-\M H\hat{\vec{x}^-_i}\right)$$
    5. $$\MM P{_i}=\left(\M 1-\M K\M H\right)\MM{P}{^-_i}$$

    Beginners Kalman Filtering Theory, Part 1

    This is all basically copied from my notes wiki. This one is a long one, but worth it I think.


    The best collection of papers on the net I have found is http://www.cs.unc.edu/~welch/kalman/. From there, the best understandable explanation (from which much of this is derived) is SIGGRAPH2001_CoursePack_08.pdf by Welch and Bishop. The theory part mostly follows the references, especially CoursePack 08, but is in my own words, and describes my own understanding, insights, and sometimes lack of understanding. The example parts at the end are my own work.

    Problem form
    We will work with an uncontrolled system, one in which you have no influence on, and you are just trying to observe. For example, take my day job. I work with the EVE instrument on the SDO spacecraft. It observes the current flux of ultraviolet in a particular wavelength from the Sun. The EVE instrument is attempting to measure this, but nothing the SDO satellite (or mankind as a whole) does has any influence at all on what the Sun is actually doing. We can control how well EVE works, but in this case, EVE is just an observer, passively trying to measure something it has no influence over.

    The system is presumed to have an internal ''state'', an ordered set of numbers (a vector) which completely describes the part of the system we care about. We want to know this state, but it is hidden from us, and cannot be known directly. In the EVE example, the state is the actual UV flux on the Sun at a particular instant. On a robot, it is the current position and velocity of the vehicle.

    We make a certain set of observations at one particular instant. Each of these observations is also a vector, but there may be more, less, or the same number of components as in the state. The observations help us learn about the state, but the observations are corrupted by measurement noise, and so do not perfectly describe the state. In the EVE example, the observation is the number of DNs (Data Numbers, the units of the raw value reported by an Analog-to-Digital converter) that the instrument records at a particular instant. A robot may use accelerometers, gyros, GPS, compasses, odometers, etc. This number may be calibrated into some physical unit, but still has an unavoidable amount of noise on it.

    We take a vector of measurements at many different times. In the EVE example, the system measures the UV flux four times per second continually. A robot reads some or all of its sensors simultaneouslely, and each measurement is a component of the measurement vector for this time. The units of the components of the vector don't have to match, and the vector doesn't have to be the same measuremente each time. For instance, sometimes we will read our accelerometers and gyros and get a 6-element vector of measurements. Much less frequently we will read our compass and get a 1-element vector. Perhaps even less frequently we will read our GPS and get a 6-element vector, but different from that produced by our IMU.

    We use a Kalman filter to combine what we have estimated about state from previous observations, with the data we get from the current observation, to provide a better estimate of the state at the current observation time.
    In math terms, the system is described as a set of matrix equations

    \[\vec{x}_i=\M{A}\vec{x}_{i-1}+\vec{w}_i\]
    \[\vec{z}_i=\M{H}\vec{x}_i+\vec{v}_i\]

    where

    • \(\vec{?}\) is any vector
    • \(\M{?}\) is any matrix
    • \(i\) is the step index. As far as the problem is concerned, we have an initial state at \(i=0\), then measurements subsequently at \(i=1\), \(i=2\), \(i=3\)... and so on. It's kind of like time, but not measured in seconds. In this particular form of the problem, the system exists in some pre-determined state at some \(i\), and then instanteaneously jumps to a new state at \(i+1\), where we measure it again. In any real system, the physics probably evolve over time, but as we will see, the filter math just doesn't care, and it is surprising how flexible this kind of "time" is. When we translate this all to code, we will find that $i$ is usually just a name, and we don't even have a variable representing $i$. The filter is mostly just concerned with values $?_i$ from this step and $?_{i-1}$ for the previous step.
    • $\vec{x}_i$ is the (unknown) actual state of the system at step $i$. This is an $n$-dimensional vector, or one with $n$ components.
    • $\M{A}$ is the state transition matrix. Given a previous state, the state transition matrix is used to transform it to the current state. Since it operates on an $n$-dimensional vector and returns an $n$-dimensional vector (the number of components in the state never changes) it must be an $n \times n$ square matrix. This matrix might change between measurements, but the filter presumes that it is constant.
    • $\vec{w}_i$ is the (unknown) process noise, an $n$-dimensional Gaussian random vector with a zero mean and covariance matrix $\M Q$. This is uncertainty in the process itself, unrelated to measurement. It implies that the next state is not perfectly determined by the previous state. Since the process noise is compatible with addition to the state vector, it must be a vector with $n$ components, and its covariance must be a matrix with $n \times n$ components.
    • $\vec{z}_i$ is the measurement vector for measurement $i$. This is an $m$-dimensional vector, where m may be greater than, less than, or equal to $n$. Unlike $n$, $m$ may change from step to step, if different kinds of measurements are taken at different steps.
    • $\M H$ is the observation matrix. Given a state, the observation matrix is used to transform it to an observation vector. Since it operates on an $n$-dimensional vector and returns an $m$-dimensional vector (the number of components of the observation may be different than the number of components of the state) it must be an $m \times n$ matrix. This matrix might change between measurements, but the filter presumes that it is constant. As above, the size of the matrix, in particular the number of rows, may change from step to step but must always be compatible with the current measurement.
    • $\vec{v}_i$ is the (unknown) measurement noise, an m-dimensional Gaussian random vector with a zero mean and covariance matrix $\M R$. Since the measurement noise is compatible with addition to the measurement vector, it must be a vector with $m$ components, and its covariance must be a matrix with $m \times m$ components.

    The dimensions $m$ and $n$ are independent, but once chosen, all vectors and matrices must use the correct values to keep all the matrices and vectors compatible with the operations they are used with.

    The state transition matrix and observation matrix are designed by you, the problem solver. You need to look at the dynamics of your problem, and see if they fit in this model, and if so, what are $\M A$ and $\M H$. This is how the problem is described to the filter. If you think of the filter as a function, then $\M A$ and $M H$ are two parameters passed to the function. You also need to provide process noise $\M Q$ and measurement noise $\M R$. Together, these four matrices completely describe the problem model. This is the hardest part of the problem. If you can't shoehorn your problem into this form (and most of the time, you probably can't) you need a nonlinear filter, which is the subject of another post. Keep reading this post, as you need to understand a linear filter as a pre-requisite to understanding a nonlinear filter.

    A word on history
    Normally I aggressively ignore history. In this case I can't, because a man's name is built into the name of the topic. If I just called it the "sequential filter" (as I have seen in textbooks occasionally) no one would know what I was talking about, and I wouldn't get any Google hits.

    Why Kalman?
    Many people reached the New World before Columbus. The Vikings, the Nephites, the Jaredites, perhaps others, etc. However, Columbus is properly said to have discovered the New World, because his discoveries made it into modern main-stream thought, and all further exploitation of the New World is as a result of him and his journeys. He discovered the New World so well that no one ever needed to discover it again, and no educated person could credibly claim to do so.

    Likewise, through a phenomenon I have heard called ''Jungian simultaneity'', many people derived the formulas we now call the Kalman filter almost simultaneously. However, Rudolph Kalman's paper directly influenced the work of the guidance guys at MIT for the Apollo mission and perhaps other secret missions. Therefore, the filter is named after him by those who put it into practice.

    Rumor has it that Kalman met at a conference and discussed ideas with other people who wrote papers which also described the sequential filter. Who is to say whether the idea was original or not? I don't want to get into this controversy, which is why I would like to call it the sequential filter rather than Kalman filter, but for reasons described above, I can't.

    Why filter?
    Because it filters out noise from a measurement stream. The estimate stream produced by the filter can have much less uncertainty than the measurement stream coming in.


    One of the amazing things about this filter, and estimation in general, is that it also filters out quantization noise. Under certain conditions, it is possible to get an estimate of the true state with less noise and uncertainty than any of the measurements that went into it. Ponder that deeply. The filter makes it possible to talk about fractions of a DN, about the space between the symbols produced by your ADC.