Wednesday, March 30, 2011

Intermediate Kalman Filtering theory, Part 1

Real programmers are Not Afraid of Math
If you know calculus, great. If not, stop now, and run, don't walk, to your library or bookstore and beg, borrow, or buy a good calculus book. I recommend Calculus the Easy Way by Douglas Downing. Come back when you understand what a derivative is, and know how to calculate some easy ones. Also at least know what an integral is, even if you don't know how to calculate any, and read about the fundamental theorem, which basically states that a derivative can be reversed and that it is useful to do so. Knowing at least a smidgen of calculus will serve you long after your robot has decided to take a swim in the pond. It opens up a whole new world of problems and solutions, and that is useful for you to have a feel for whether a problem is solvable even in principle, even if you don't know how to solve it.

I'll wait.

OK, are we ready to proceed? Let's move on then.


Kalman filter processes like to use the following model:

<x_i>=[A]<x_{i-1}>+<w_{i-1}>
<z_i>=[H]<x_i>+<v_i>

Nonlinear Kalman Filter processes tend to use a different but related form:

<x_i>=f(<x_{i-1}>)+<w_{i-1}>
<z_i>=g(<x_i>)+<v_i></math>

where

  • f(...) is the state transition function, which serves the same purpose as [A]. Given a previous state vector, the state transition function is used to transform it to the current state vector. So, this function takes a vector and returns a vector. This is more general than a pure matrix multiplication, and allows any kind of relation between this state and the next.
  • g(...) is the observation function, which serves the same purpose as [H]. Given a state, the observation function is used to transform it to the observations. Likewise, a vector function with vector parameter.

HOWEVER

Most physical laws are in the form of vector differential equations, so we have:

d<x>/dt=F(<x>,t)

where

  • F(...) is the physical law function, which is used to calculate the derivative of the state. This is then integrated to get a new function which describes the state in a continuous manner from the initial state and the laws. This function of state versus time is called the trajectory, since one of the basic uses of this is describing the actual trajectory of something, like a planet, moon, or robot. Remember not to confuse f() with F(). We almost never use or even have access to f().

Measurements are still taken at discrete instants. What we want is a way to shoehorn this continuous problem into a form suitable for the Kalman filter.
With sufficiently simple physics functions, you can crank out the integration as it shows in your textbooks and get the trajectory function directly. This is what Newton invented calculus for in the first place. However, for down-to-earth things like robots, the physics are usually too complicated to integrate directly. One of the things our filter will do for us is integrate numerically, so that we end up with (in principle) a list of times and the state vector for each time. This will be our trajectory function.


Linearization
If you just want the recipe, skip a couple of sections.


Nonlinear problems are hard. In fact, they can't be done. So instead, we do a modified problem and hope that it is close enough. This is the part where everyone goes off on Taylor series and such. I will just talk somewhat informally.
First, most physical processes are smooth. No discontinuities and no corners. Just nice sweeping curves. If you look at any of these curves closely enough, the relative radius of curvature gets bigger and bigger, and any small section eventually appears linear if you zoom in on it enough.

In order to take advantage of this, you have to look at the right part of a curve. We need a reasonably accurate initial condition and a correct dynamics and kinematics model. Now we can use the derivative of the curve to find the general direction of motion. We use the initial condition estimate and model to create a reference trajectory. We end up not estimating the trajectory directly, but the divergence from the reference.

So, we need a replacement for [A] and a way to update <x^_i> and <P_i>. The state vector is the easiest, so we'll talk about that. We know the physics model (If we don't, go find it!) so we can use it directly on our old estimate with numerical integration. I'll talk in detail about numerical integration in another post, but if you are a programmer, there is a good chance you have already written one. The simplest one is this simple:

  • Find the rate of change of all the elements of the state vector at this point. That is, run F(<x^_{i-1}>), run the physics function on the old estimate.
  • Find the time step size. This is t_i-t_{i-1}.
  • Multiply the rate of change vector by the time step size. This is how much the vector has changed over the step.
  • Add the change to the old estimate to get the new estimate.

All numerical integrators are just improvements and refinements on this. If the time step is short enough, you can just do this. One simple refinement is to break the time step into smaller sub-steps and do the integration repeatedly.

Replacing [A] is a bit harder. Skipping all the justifications and derivations, you need a physics matrix [Φ]. Finding it will require cranking out a bunch of derivatives by hand or computer, or writing a numerical differentiator. Then with [Φ] in hand, you can numerically integrate to get [A]. This is the "linearization" process talked about in textbooks.

[Φ] is a Jacobian matrix, a kind of vector derivative. Instead of taking the derivative with respect to time as you may be accustomed to, you take the derivative of each component of the physics function with respect to each of the other components. You end up with a matrix, where for each cell, you have the derivative of the component that is this cell's row number, with respect to the component that is this cell's column number. If you want to, you can think of it as fitting tangent hyperplane to the physics function, in the same way that an ordinary derivative fits a tangent line to a 1D function. This hyperplane is flat, so this is why we call it "linearized". We then project this hyperplane out as far as we need it, and use it to transform an identity matrix from one place on the trajectory to [A] on another. The math is pretty straight forward, it's just hard to imagine, especially with more than about 1 element in your state vector.

Once you have [A] in hand, you can easily find [P_i-] using the same formula as before.

We end up playing a similar Jacobian trick with the nonlinear observation. The Jacobian matrix of the observation function g(<x>) is dg/d<x>=[H]. No confusion with linear [H] because we use it the same way ever after.

Extended Kalman Filter algorithm

To remember, the five equations of the linear Kalman filter are

  1. <x^_i->=[A]<x^_{i-1}> 
  2. [P_i-]=[A][P_{i-1}][A']+[Q]
  3. [K_i]=[P_i-][H']([H][P_i-][H']+[R])-1 
  4. <x_i>=<x^_i->+[K_i](<z_i>-[H]<x^_i->)
  5. [P_i]=([1]-[K_i][H])[P_i-]

    Equations 1 and 2 are called the time update while equations 3 through 5 are called the measurement update.

    Plugging in what we have above into this form, we get the following. Don't panic too much, it will all make sense with a concrete example.

    Time update

    1a. <x^_i->=∫ F(<x(t)>,t) dt from t=t_{i-1} to t_i with initial condition <x(t_{i-1})>=<x_{i-1}>
    1b. [A(t_i,t_{i-1})]=∫ [Φ(<x(t)>,t)][A(t,t_{i-1})] dt from t=t_{i-1} to t_i with initial condition [A(t_{i-1},t_{i-1})]=[1]
    Basically these integrations cover the time from one step to the next, generate our [A] for us, and turn this from a continuous time problem back into a step problem like we had before. We do these integrations numerically and in lockstep, so that we can use the intermediate result <x(t)> from 1a where we need it in 1b.

    1c. <X^_i->=<0>
    2. [P_i-]=[A(t_i,t_{i-1})][P_{i-1}][A(t_i,t_{i-1})']+[Q]

    Equation 1c is where things start going different. We are estimating the deviation from the reference trajectory, but our update follows the reference trajectory perfectly, so our initial estimate of the deviation is zero. Equation 2 is exactly like the old form.

    Measurement update
     

    3a. [H_i]=dg(<x^_i->,t_i)/d<x^_i-> Linearized observation matrix
    3b. [K_i]=[P_i-][H_i']([H_i][P_i-][H_i']+[R])-1 Kalman gain
    4a. <Z_i>=<z_i>-G(<x^_i->,t_i) Measurement deviation
    4b. <X^_i>=<X^_i->+[K_i](<Z_i>-[H_i]<X^_i-> Measurement update of state deviation
    4c. <x^_i>=<x^_i->+<X^_i> Update of reference state vector
    5. [P_i]=([1]-[K_i][H_i])[P_i-] Measurement update of state covariance

    My teacher calls this is the extended Kalman filter, in distinction from the linearized Kalman filter, because the reference trajectory is updated every time around. To do the linearized filter, don't reset the state deviation to zero each time (just at the start) and don't add the state deviation to the reference state at the end. If the state deviation gets too big, you will need to do a rectification every once in a while, but not at every measurement, where you do these things. The extended filter basically does rectification every time.

    No comments:

    Post a Comment