The best collection of papers on the net I have found is http://www.cs.unc.edu/~welch/kalman/. From there, the best understandable explanation (from which much of this is derived) is SIGGRAPH2001_CoursePack_08.pdf by Welch and Bishop. The theory part mostly follows the references, especially CoursePack 08, but is in my own words, and describes my own understanding, insights, and sometimes lack of understanding. The example parts at the end are my own work.

**Problem form**

We will work with an uncontrolled system: one you have no influence on, and are just trying to observe. For example, take my day job. I work with the EVE instrument on the SDO spacecraft. It observes the current ultraviolet flux from the Sun in a particular wavelength. The EVE instrument attempts to measure this, but nothing the SDO satellite (or mankind as a whole) does has any influence at all on what the Sun is actually doing. We can control how well EVE works, but in this case EVE is just an observer, passively trying to measure something it has no influence over.

The system is presumed to have an internal ''state'', an ordered set of numbers (a vector) which completely describes the part of the system we care about. We want to know this state, but it is hidden from us, and cannot be known directly. In the EVE example, the state is the actual UV flux on the Sun at a particular instant. On a robot, it is the current position and velocity of the vehicle.

We make a certain set of observations at one particular instant. Each of these observations is also a vector, but there may be more, less, or the same number of components as in the state. The observations help us learn about the state, but the observations are corrupted by measurement noise, and so do not perfectly describe the state. In the EVE example, the observation is the number of DNs (Data Numbers, the units of the raw value reported by an Analog-to-Digital converter) that the instrument records at a particular instant. A robot may use accelerometers, gyros, GPS, compasses, odometers, etc. This number may be calibrated into some physical unit, but still has an unavoidable amount of noise on it.

We take a vector of measurements at many different times. In the EVE example, the system measures the UV flux four times per second, continually. A robot reads some or all of its sensors simultaneously, and each measurement is a component of the measurement vector for this time. The units of the components of the vector don't have to match, and the vector doesn't have to be the same measurement each time. For instance, sometimes we will read our accelerometers and gyros and get a 6-element vector of measurements. Much less frequently we will read our compass and get a 1-element vector. Perhaps even less frequently we will read our GPS and get a 6-element vector, but different from that produced by our IMU.

We use a **Kalman filter** *to combine what we have estimated about state from previous observations, with the data we get from the current observation, to provide a better estimate of the state at the current observation time.*

In math terms, the system is described as a set of matrix equations

\[\vec{x}_i=\M{A}\vec{x}_{i-1}+\vec{w}_i\]

\[\vec{z}_i=\M{H}\vec{x}_i+\vec{v}_i\]

where

- \(\vec{?}\) is any vector
- \(\M{?}\) is any matrix
- \(i\) is the step index. As far as the problem is concerned, we have an initial state at \(i=0\), then measurements subsequently at \(i=1\), \(i=2\), \(i=3\), and so on. It's kind of like time, but not measured in seconds. In this particular form of the problem, the system exists in some pre-determined state at some \(i\), and then instantaneously jumps to a new state at \(i+1\), where we measure it again. In any real system, the physics probably evolve over time, but as we will see, the filter math just doesn't care, and it is surprising how flexible this kind of "time" is. When we translate this all to code, we will find that $i$ is usually just a name, and we don't even have a variable representing $i$. The filter is mostly just concerned with values $?_i$ from this step and $?_{i-1}$ from the previous step.
- $\vec{x}_i$ is the (unknown) actual state of the system at step $i$. This is an $n$-dimensional vector, or one with $n$ components.
- $\M{A}$ is the state transition matrix. Given a previous state, the state transition matrix is used to transform it to the current state. Since it operates on an $n$-dimensional vector and returns an $n$-dimensional vector (the number of components in the state never changes) it must be an $n \times n$ square matrix. This matrix might change between measurements, but the filter presumes that it is constant.
- $\vec{w}_i$ is the (unknown) process noise, an $n$-dimensional Gaussian random vector with a zero mean and covariance matrix $\M Q$. This is uncertainty in the process itself, unrelated to measurement. It implies that the next state is not perfectly determined by the previous state. Since the process noise is compatible with addition to the state vector, it must be a vector with $n$ components, and its covariance must be a matrix with $n \times n$ components.
- $\vec{z}_i$ is the measurement vector for measurement $i$. This is an $m$-dimensional vector, where $m$ may be greater than, less than, or equal to $n$. Unlike $n$, $m$ may change from step to step, if different kinds of measurements are taken at different steps.
- $\M H$ is the observation matrix. Given a state, the observation matrix is used to transform it to an observation vector. Since it operates on an $n$-dimensional vector and returns an $m$-dimensional vector (the number of components of the observation may be different than the number of components of the state) it must be an $m \times n$ matrix. This matrix might change between measurements, but the filter presumes that it is constant. As above, the size of the matrix, in particular the number of rows, may change from step to step but must always be compatible with the current measurement.
- $\vec{v}_i$ is the (unknown) measurement noise, an m-dimensional Gaussian random vector with a zero mean and covariance matrix $\M R$. Since the measurement noise is compatible with addition to the measurement vector, it must be a vector with $m$ components, and its covariance must be a matrix with $m \times m$ components.

The dimensions $m$ and $n$ are independent, but once chosen, all vectors and matrices must use the correct values to keep all the matrices and vectors compatible with the operations they are used with.
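As a concrete illustration of the two equations above, here is a minimal simulation of such a system in Python with NumPy. The particular matrices (a 2-state, 1-measurement system) are my own illustrative choices, not anything from the references:

```python
import numpy as np

rng = np.random.default_rng(0)

n, m = 2, 1                       # state and measurement dimensions

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # n x n state transition matrix
H = np.array([[1.0, 0.0]])        # m x n observation matrix
Q = 0.01 * np.eye(n)              # n x n process noise covariance
R = np.array([[0.25]])            # m x m measurement noise covariance

x = np.array([0.0, 1.0])          # true (normally hidden) state at i = 0
for _ in range(5):
    w = rng.multivariate_normal(np.zeros(n), Q)   # process noise w_i
    x = A @ x + w                                 # x_i = A x_{i-1} + w_i
    v = rng.multivariate_normal(np.zeros(m), R)   # measurement noise v_i
    z = H @ x + v                                 # z_i = H x_i + v_i

# Shapes stay compatible: x is n-dimensional, z is m-dimensional.
assert x.shape == (n,) and z.shape == (m,)
```

Note that the loop never exposes $\vec{x}_i$ to the observer; only $\vec{z}_i$ would ever be available to a filter.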

The state transition matrix and observation matrix are designed by you, the problem solver. You need to look at the dynamics of your problem, and see if they fit in this model, and if so, what $\M A$ and $\M H$ are. This is how the problem is described to the filter. If you think of the filter as a function, then $\M A$ and $\M H$ are two parameters passed to the function. You also need to provide the process noise covariance $\M Q$ and measurement noise covariance $\M R$. Together, these four matrices completely describe the problem model. This is the hardest part of the problem. If you can't shoehorn your problem into this form (and most of the time, you probably can't) you need a nonlinear filter, which is the subject of another post. Keep reading this post, as you need to understand a linear filter as a prerequisite to understanding a nonlinear filter.
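As a sketch of this design step, consider one hypothetical problem: a robot moving in one dimension at roughly constant velocity, with a sensor that reads position only. The state is position and velocity ($n=2$), the measurement is position ($m=1$), and the noise covariances below are placeholder tuning values, not derived from anything:

```python
import numpy as np

dt = 0.1    # hypothetical time between steps, in seconds

# State: [position, velocity]. Constant-velocity dynamics give:
#   position_i = position_{i-1} + dt * velocity_{i-1}
#   velocity_i = velocity_{i-1}
A = np.array([[1.0,  dt],
              [0.0, 1.0]])

# A sensor that reads position only, so m = 1:
H = np.array([[1.0, 0.0]])

# The noise covariances are tuning knobs; these values are placeholders.
Q = np.diag([1e-4, 1e-3])   # how much we distrust the dynamics model
R = np.array([[0.5 ** 2]])  # sensor noise with standard deviation 0.5 units

x = np.array([2.0, 3.0])    # position 2.0, velocity 3.0
assert np.isclose((A @ x)[0], 2.0 + dt * 3.0)   # position advances by dt * velocity
assert np.isclose((H @ x)[0], 2.0)              # the sensor sees position only
```

The point of the exercise is that all the physics you know about the problem has to be expressed through these four matrices; once they are written down, the filter itself is generic.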

**A word on history**

Normally I aggressively ignore history. In this case I can't, because a man's name is built into the name of the topic. If I just called it the "sequential filter" (as I have seen in textbooks occasionally) no one would know what I was talking about, and I wouldn't get any Google hits.

**Why Kalman?**

Many people reached the New World before Columbus: the Vikings, the Nephites, the Jaredites, and perhaps others. However, Columbus is properly said to have discovered the New World, because his discoveries made it into modern mainstream thought, and all further exploitation of the New World is a result of him and his journeys. He discovered the New World so well that no one ever needed to discover it again, and no educated person could credibly claim to do so.

Likewise, through a phenomenon I have heard called ''Jungian simultaneity'', many people derived the formulas we now call the Kalman filter almost simultaneously. However, Rudolf Kalman's paper directly influenced the work of the guidance guys at MIT for the Apollo mission and perhaps other secret missions. Therefore, the filter is named after him by those who put it into practice.

Rumor has it that Kalman met and discussed ideas at a conference with other people who had written papers also describing the sequential filter. Who is to say whether the idea was original or not? I don't want to get into this controversy, which is why I would like to call it the sequential filter rather than the Kalman filter, but for the reasons described above, I can't.

**Why filter?**

Because it filters out noise from a measurement stream. The estimate stream produced by the filter can have much less uncertainty than the measurement stream coming in.

One of the amazing things about this filter, and estimation in general, is that it also filters out quantization noise. Under certain conditions, it is possible to get an estimate of the true state with less noise and uncertainty than any of the measurements that went into it. Ponder that deeply. The filter makes it possible to talk about fractions of a DN, about the space between the symbols produced by your ADC.
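A quick way to convince yourself of this, short of building a full filter: average many quantized measurements of a value that sits between two DN levels. Assuming the measurement noise spans at least about one DN, the fractional part comes right back out. This toy uses a plain average rather than a Kalman filter, and the numbers are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

true_flux = 100.4   # hypothetical true value in DN, between two ADC symbols

# Each raw measurement is the true value plus Gaussian noise, quantized to a
# whole DN by rounding (a crude model of the ADC). Every single sample is an
# integer, yet the average recovers the fraction.
measurements = np.round(true_flux + rng.normal(0.0, 1.0, size=10_000))

estimate = measurements.mean()
assert abs(estimate - true_flux) < 0.05
```

Every individual measurement is a whole number of DN, but the estimate lands between the symbols, which is exactly the "fractions of a DN" claim above.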
