Wednesday, March 30, 2011

Beginners Kalman Filtering Theory, Part 2

Filter mathematical definition
This mostly follows Welch & Bishop, except uses \(i\) instead of \(k\) so as to not conflict in IDL naming with Kalman gain coefficient \(\M K\).

Our filter tries its best to come up with an estimate of what the real state is. We know that this estimate is imperfect, but it is the best the filter can do. For a problem which actually matches this model, and the proper definition of "best", it is provable that the Kalman filter really does compute the best possible estimate given the noise in the measurements.

What follows is a recipe, light on derivations and proofs. Just follow it, without trying to understand it too much. I understand 1,2, and 4 below, but not 3 and 5. I have learned through sad experience that it is difficult to test a program step-by-step if you don't know what the intermediate steps are or how to calculate them. We will discuss testing in a later post.

Symbol definitions
\(\hat{?}\) Estimated value of unknown variable ?. In code, we will usually say ?hat. In general, an estimated variable is named the same as the unknown true variable, with the addition of "hat". If the variable is a vector, we will use the compound symbol \(\hat{\vec{?}}\). Notice that even if it is a vector, it is not necessarily one of unit length.
\(?^-\) Updated value. Some variable ? is updated from the previous step to the current step, but the new measurement is not used in this process, so no new information is available from it. It's just the old value projected to the current step. This 'minus' is usually a superscript in textbooks, but we will almost always use it with vectors and matrices, and draw it as the last symbol inside the brackets of the matrix in question, so we will never confuse it with subtraction.
\(\M{?}^T\) Transpose of matrix \(\M ?\)
\(\M{?}^{-1}\) Inverse of square matrix \(\M ?\)

Time update
First, we project the state ahead, to get an estimate of what the state is at the current measurement, based on our best estimate at the previous measurement:

\[\hat{\vec{x}}^-_i=\M A\hat{\vec{x}}_{i-1}\]

where

  • \(\hat{\vec{x}}^-_i\) is the updated previous estimate of the state

Next, we project the state covariance matrix forward:

\[\MM{P}{^-_i}=\M{A}\MM{P}{^-_{i-1}}\M A^T+\M Q\]

where

  • \(\MM{P}{_i}\) is the estimate error covariance, the covariance of the uncertainty in our estimate of the state. This value is an estimate, but customarily doesn't have a hat, because there is no corresponding "real" value for it. The state has a real value with zero uncertainty at each point, but we can't see it, except imperfectly through the measurements. This matrix is our estimate of how good our estimate is.
  • \(\MM{P}{^-_i}\) is the updated previous estimate error covariance.

The idiom $\M A \M P \M A^T$ is how any covariance matrix $\M P$ is propagated with a transition matrix $\M A$. It isn't just $\M A\M P$ since $\M P$ is a covariance, wich matches what we think of in scalar form as variance $\sigma^2$ and not standard-deviation $\sigma$, as it is in the square of the units of $\vec{x}$. If we think about a scalar, we really want $(A\sigma)^2=A^2\sigma^2$ or in matrix form, $\M A^2\M P$. In order to come out with a matrix in the right shape, we do $\M A^2 \M P$ as $\M A \M P\M A^T$. Some textbooks refer to this as the similarity transformation.

So this error projection makes sense. The projected covariance is simply the old covariance transformed by the state transition matrix, plus the process noise we think is there.

Once done, we think we know where the state is at this measurement, based on the state at the last measurement.

Measurement
We next calculate the the difference between what we did measure and what we should have measured, based on our updated state estimate. This is called is called the residual (or measurement innovation)

$$\vec{y}=\vec{z}_i-\M H\hat{\vec{x}}^-_i$$

The matrix $\M H$ transforms a state into a measurement. As mentioned in the previous post, you provide this matrix, depending on the form of your problem and the sensors your robot has. We'll see more about this in the examples. As we will see, we never need to solve the much harder problem of transforming a measurement into a state, because that is part of what the filter is there to do for us. The problem is already solved.

The residual is an $m$-dimensional vector (same size as the observation vector)

Next, we figure the covariance of the residual (an m by m matrix)

$$\M{\Gamma}=\M H\MM{P}{^-_i}\M H^T+\M R$$

$\M \Gamma$ (Greek capital gamma) is a matrix which we can think of as the estimate covariance transformed into observation space, and then with the measurement uncertainty added. This time the transformation is from one space to another, instead of forward in time, but it's still the same thing, so the similarity transform still applies.

Measurement update
We define the cross-covariance matrix between the error in the state estimate and the residual (No, I don't know what this means, but some papers call out this matrix) as:

$$\M S=\MM{P}{^-_i}\M H^T$$

Now we incorporate the information from our current measurement. We calculate the Kalman Gain, which is the weighting function used to determine how much we believe the updated estimate, and how much we believe the measurement.

$$\M K=\M S\M \Gamma^{-1}$$

where

  • $\M K$ is the Kalman gain for this measurement. By working through the matrix operations, we see that this must be an n by m matrix. The derivation for the gain includes some kind of justification for this in this produces a maximum-likelyhood estimate in some Bayesian sense, but I don't understand it yet.

Note that the Kalman gain is used both to weight that residual, and to transform it back from observation space to state space. The weighting of the residual is done with the $\M \Gamma^{-1}$ matrix. If you think about it in scalar terms, it says that the larger the residual covariance is, the less you weight it. The rest just transforms the measurement residual back into state space.

Using this gain, we calculate the new state estimate

$$\hat{\vec{x}}_i=\hat{\vec{x}}^-_i+\M K\vec{y}$$

Finally we calculate the new estimate error covariance:

$$\MM{P}{_i}=\left(\M 1-\M K\M H\right)\MM{P}{^-_i}$$

where [1] is the n by n identity matrix which is compatible with the rest of the formula.

This matrix is useful in itself when publishing formal uncertainties, but even if you don't care, the filter cares, because it needs this uncertainty to balance the old estimate with the new measurement.

The Five-Line Form
Customarily, several of these equations are combined, to give the five-line form of the Kalman filter. This is the form which you will probably use to write your program.

  1. $$\hat{\vec{x}}^-_i=\M A\hat{\vec{x}}_{i-1}$$
  2. $$\MM{P}{^-_i}=\M A\MM{P}{_i-1}\M A^T+\M Q$$
  3. $$\M K=\MM{P}{^-_i}\M H^T\left(\M H \MM{P}{^-_i}\M H^T+\M R\right)^{-1}$$
  4. $$\hat{\vec{x}}_i=\hat{\vec{x}^-_i}+\M K\left(\vec{z}_i-\M H\hat{\vec{x}^-_i}\right)$$
  5. $$\MM P{_i}=\left(\M 1-\M K\M H\right)\MM{P}{^-_i}$$

No comments:

Post a Comment