This page is meant to be a brief introduction to Kalman filters
with a few simple examples to illustrate the use and implementation
of such filters. The goal is to apply the Kalman filter technique
to kinematic fitting in MINOS data analysis.
The Kalman filter is a recursive technique for obtaining the
solution to a least squares fit. This recursive method has
the advantage of computational efficiency. Given N hits, with
2 measurements (x,y) per hit,
a traditional least squares fit involves
a matrix inversion whose dimensions are 2N x 2N. In addition, the
hits to be included in the fit must be known prior to the fit.
The Kalman filter approach performs a calculation as each hit
is added to the fit. This has the advantage of a much simpler
matrix inversion (2 x 2 versus 2N x 2N). In addition, the Kalman
filter is able to extrapolate the fitted track to the next hit point,
where actual hits can then be searched for within some window around
the extrapolated position. For references on the Kalman filter, see
[1], [2], [3], [4], [5], [6].
The basic Kalman filter deals with linear systems (non-linear systems
are treated by a linear approximation using the extended Kalman filter,
to be discussed later). We use the notation of Fruhwirth [4],
and make the following definitions:
- $x_k$ = the filtered state vector at point k
- $x_{k+1}^{k}$ = the extrapolated state vector from point k to point k+1
- $C_{k+1}^{k}$ = the covariance matrix of $x_{k+1}^{k} - x_{k+1}^{t}$, where $x_{k+1}^{t}$ is the true value of the state vector at point k+1
- $C_k$ = the filtered covariance matrix at point k
- $F_k$ = the propagator of the state vector from point k to point k+1
- $w_k$ = the process noise (random disturbance) to the state vector at point k
- $Q_k$ = the covariance matrix of $w_k$
- $m_k$ = the measurement vector at point k
- $e_k$ = the measurement noise at point k
- $V_k$ = the covariance matrix of $e_k$
The two basic equations are
- $x_k = F_{k-1} x_{k-1} + w_{k-1}$
- $m_k = H_k x_k + e_k$
Typically, the state vector would be a vector of length 5, with
two transverse positions and three momentum components.
The measurement vector might be of length 3, consisting of
one hit scintillator strip, the time of the hit, and the integrated
charge. These are merely examples to illustrate the two vector
types.
To begin the Kalman filter, the following must first be defined:
- $x_0$ = an initial estimate of the state vector
- $F_k$ = the state propagator matrices for all points
- $Q_k$ = the process noise covariance matrices for all points
- $V_k$ = the measurement noise covariance matrices for all points
- $H_k$ = the matrices which convert a state vector at point k into a measurement vector
The initial state covariance matrix $C_0$ can be set
to the identity matrix multiplied by a large scale factor. The smaller
this factor is, the more weight is put on the initial state vector, so in
general we would like to make $C_0$ as large as possible.
However, because of round-off errors, we need to restrict this matrix
to a reasonable value, which depends on the particular fit under
consideration.
We are now ready to apply the Kalman filter. At each point k, a four
step process is applied:
1. $C_k^{k-1} = F_{k-1} C_{k-1} F_{k-1}^T + Q_{k-1}$
2. $K_k = C_k^{k-1} H_k^T ( V_k + H_k C_k^{k-1} H_k^T )^{-1}$
3. $x_k = F_{k-1} x_{k-1} + K_k ( m_k - H_k F_{k-1} x_{k-1} )$
4. $C_k = (1 - K_k H_k) C_k^{k-1}$
I have combined the extrapolation and filtering of the state vector
into one step (3). In most references, these two processes are
presented separately. At the end of the iterative process, the
final state vector $x_k$ represents the fit values
using all data points. To obtain the fit values at any earlier point k,
the user can then fit backwards, starting with the last point.
The fit value at some point k is then the average of the
state vectors $x_k$ from the two fits.
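The four steps can be sketched in a few lines of plain Python. This is only an illustration and is not the C++/ROOT code described on this page: the names `matmul`, `transpose`, `kalman_step` and `fit_line` are made up here, the measurement is assumed to be a single number so that the matrix inversion in step 2 reduces to a division, and the straight-line track with noise-free hits is hypothetical test data.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def kalman_step(x_prev, C_prev, m, F, Q, H, V):
    """Apply steps 1-4 once for a scalar measurement m with variance V.

    x_prev is an n x 1 column; C_prev, F and Q are n x n; H is 1 x n.
    """
    n = len(x_prev)
    # Step 1: extrapolate the covariance matrix.
    FCFt = matmul(matmul(F, C_prev), transpose(F))
    C_pred = [[FCFt[i][j] + Q[i][j] for j in range(n)] for i in range(n)]
    # Step 2: Kalman gain. V + H C H^T is 1 x 1 here, so the
    # "matrix inversion" is a single division.
    CHt = matmul(C_pred, transpose(H))
    s = V + matmul(H, CHt)[0][0]
    K = [[CHt[i][0] / s] for i in range(n)]
    # Step 3: extrapolate the state and correct it with the residual.
    x_pred = matmul(F, x_prev)
    residual = m - matmul(H, x_pred)[0][0]
    x_new = [[x_pred[i][0] + K[i][0] * residual] for i in range(n)]
    # Step 4: filtered covariance matrix, (1 - K H) C_pred.
    KH = matmul(K, H)
    one_minus_KH = [[(1.0 if i == j else 0.0) - KH[i][j] for j in range(n)]
                    for i in range(n)]
    C_new = matmul(one_minus_KH, C_pred)
    return x_new, C_new


def fit_line(measurements, sigma, dz=1.0, scale=1.0e4):
    """Filter equally spaced position measurements of a straight track."""
    F = [[1.0, dz], [0.0, 1.0]]        # position advances by slope * dz
    Q = [[0.0, 0.0], [0.0, 0.0]]       # no process noise
    H = [[1.0, 0.0]]                   # only the position is measured
    x = [[0.0], [0.0]]                 # initial estimate x_0 (one step before the first hit)
    C = [[scale, 0.0], [0.0, scale]]   # large initial covariance C_0
    for m in measurements:
        x, C = kalman_step(x, C, m, F, Q, H, sigma ** 2)
    return x, C


# Hypothetical noise-free hits on the line y = 2 + 0.5 z at z = 1..6.
hits = [2.0 + 0.5 * z for z in range(1, 7)]
x_fit, C_fit = fit_line(hits, sigma=0.1)
print("position at last hit:", x_fit[0][0], "slope:", x_fit[1][0])
```

With these inputs the filtered state at the last hit should come out very close to a position of 5 and a slope of 0.5. The small residual pull toward the initial estimate shrinks as the scale factor on $C_0$ is increased.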
I have written some code in C++ to help me understand the Kalman
filter. This code uses the TMatrix and TVector classes from ROOT,
so to compile and run this code, one needs the ROOT libraries on their
system.
Example: Average
The first program presented here takes the weighted average
of a set of values. In this example, state and measurement vectors
have length 1. Vectors and matrices are then equivalent to scalars.
I set the measurements $m_k$ to the input data points
to be averaged, and I initialize the matrix (scalar)
$V_k$ to their variances (squares of the errors).
Because the state vector is the measurement vector, the matrix
$H_k$ becomes unity, as does the propagator matrix
$F_k$. There is no process noise, so
$Q_k$ vanishes.
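With H = F = 1 and Q = 0, the filter reduces to scalar arithmetic, as the following Python sketch shows. It is not the average.cc program referred to on this page: the function names, the starting values `x0` and `c0`, and the five data points are all assumptions chosen for illustration.

```python
def kalman_average(values, errors, x0=0.0, c0=1.0e6):
    """Scalar Kalman filter with H = F = 1 and Q = 0.

    Returns the filtered value and its uncertainty after each point.
    """
    x, c = x0, c0
    history = []
    for m, err in zip(values, errors):
        v = err * err              # measurement variance V_k
        c_pred = c                 # F = 1 and Q = 0: covariance is unchanged
        k = c_pred / (v + c_pred)  # Kalman gain
        x = x + k * (m - x)        # filtered state
        c = (1.0 - k) * c_pred     # filtered covariance
        history.append((x, c ** 0.5))
    return history


def weighted_average(values, errors):
    """Direct inverse-variance weighted average, for comparison."""
    weights = [1.0 / (err * err) for err in errors]
    total = sum(weights)
    mean = sum(w * m for w, m in zip(weights, values)) / total
    return mean, (1.0 / total) ** 0.5


# Hypothetical data points and their errors.
values = [4.1, 4.5, 3.9, 4.4, 4.2]
errors = [0.5, 0.4, 0.6, 0.5, 0.3]

history = kalman_average(values, errors)
direct = weighted_average(values, errors)
print("Kalman:", history[-1])
print("direct:", direct)
```

The two results differ only by the small weight that the finite initial covariance `c0` gives to the initial estimate `x0`.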
Figure: The Kalman filter applied to a weighted average.
The black points represent the measured data; the blue points represent
the filtered state vector as calculated by the Kalman filter at point
k. The uncertainty on the filtered state vector is obtained from the
covariance matrix $C_k$ and is seen to grow smaller with
increasing k, as expected. The final value of the Kalman fit is
4.24835 ± 0.20320, whereas the true value from a least squares calculation
(shown as a horizontal red line in the plot) is
4.24838 ± 0.20320, a fractional agreement in the weighted average
of about $10^{-5}$.
The main program is in average.cc.
The Kalman class is defined in
kalman.h and is the heart of the
Kalman filtering routine. Other header files needed to compile
and run this example are misc.h,
definitions.h, and
simplearray.h.
Example: Charged particle in a dipole field
In this example, several x,y coordinates are measured in regions of
no magnetic field at fixed z positions both upstream and downstream of
a dipole magnet (with primary component transverse to the beam axis).
A total of 20 tracking stations with a uniform spacing of 1 meter
is simulated; each tracking station provides a measurement of both
transverse positions with a resolution of 5 mm in each coordinate.
Each tracking station represents 1% of a radiation length. No energy
loss is simulated. The magnetic field is located after the 10th tracking
station and before the 11th. The field has only one component, the magnitude
of which is a Gaussian function of the longitudinal position, with a
maximum value of 5 Tesla and a spatial RMS of 0.1 meter, corresponding to
a momentum kick of 0.376 GeV/c.
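The quoted momentum kick can be checked with a short calculation. The sketch below assumes a unit-charge particle and uses the standard conversion of 0.2998 GeV/c per Tesla-meter of integrated field; the function names are made up for illustration.

```python
import math


def field_integral(b_max, sigma_z):
    """Integral over z of a Gaussian field profile, in Tesla-meters."""
    return b_max * sigma_z * math.sqrt(2.0 * math.pi)


def momentum_kick(b_max, sigma_z):
    """Transverse momentum kick in GeV/c for a unit-charge particle."""
    return 0.2998 * field_integral(b_max, sigma_z)


kick = momentum_kick(5.0, 0.1)   # 5 Tesla peak, 0.1 meter RMS
print(round(kick, 3))            # prints 0.376
```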
We choose our state vector to be of dimension 5, representing
the two transverse positions, the transverse slopes, and the charge to
momentum ratio. At every iteration of the filter except for the one
which traverses the magnetic field, the equations of motion are linear.
The only complication involves multiple scattering, which is taken into
account through the process noise covariance matrix.
We use the calculation of Wolin and Ho [5].
The propagation of the state vector between the 10th and 11th planes is
almost linear, but not quite. We linearize the system by calculating
first derivatives of the state vector with respect to itself. In the
Kalman filter, the propagation is done not by taking the product of the
propagator matrix with the state vector, but rather by swimming the state
vector through the magnetic field. The linearized propagator matrix
is used in calculating the state covariance matrix and the Kalman gain
matrix.
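One way to obtain such a linearized propagator is to take numerical derivatives of the swimming routine, as in the Python sketch below. The `swim` function is only a toy stand-in (a drift, a thin-lens kick at the midpoint, then another drift) and is not the field integration used in the code described on this page. The step size and the 10 GeV/c test state are likewise assumptions.

```python
import math

KICK = 0.376   # GeV/c, the momentum kick quoted in the text
DZ = 1.0       # meters between the 10th and 11th planes


def swim(state):
    """Toy propagation across the magnet: drift, thin-lens kick, drift.

    state = (x, y, tx, ty, qop): two positions, two slopes and the
    charge-to-momentum ratio. Hypothetical model, for illustration only.
    """
    x, y, tx, ty, qop = state
    x_mid = x + tx * DZ / 2.0
    tx_out = math.tan(math.atan(tx) + KICK * qop)   # nonlinear in tx and qop
    x_out = x_mid + tx_out * DZ / 2.0
    y_out = y + ty * DZ
    return (x_out, y_out, tx_out, ty, qop)


def jacobian(func, state, step=1.0e-6):
    """Central-difference estimate of d func_i / d state_j."""
    n = len(state)
    columns = []
    for j in range(n):
        up = list(state)
        down = list(state)
        up[j] += step
        down[j] -= step
        f_up = func(tuple(up))
        f_down = func(tuple(down))
        columns.append([(a - b) / (2.0 * step) for a, b in zip(f_up, f_down)])
    # columns[j][i] holds d func_i / d state_j; reorder into rows.
    return [[columns[j][i] for j in range(n)] for i in range(n)]


state = (0.0, 0.0, 0.0, 0.0, 0.1)   # unit charge at 10 GeV/c
x_next = swim(state)                # the state itself is swum through the field
F = jacobian(swim, state)           # F is used only for C and the gain K
for row in F:
    print(["%.4f" % value for value in row])
```

The state vector is advanced with `swim` itself, while the matrix `F` enters only the covariance extrapolation and the gain, which mirrors the split described above.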
Figure: The Kalman filter applied to a charged particle in a dipole field. The
momentum distribution is plotted in (a). The difference between the
fit and true momenta divided by the true momentum is shown in (b).
The mean value of the fit momentum ratio is shown as a function of
momentum in (c). The square of the RMS of the fit momentum ratio is
shown as a function of momentum squared in (d).
The magnetic field represents a simple $p_T$ kick in the x direction, so that a track of momentum p is bent through an angle $\theta = p_T / p$ (for small angles).
From this we calculate that the fractional uncertainty in the momentum
is related to the uncertainty in the angle theta by
- $e_{dp/p}^2 = (p^2 / p_T^2) \, e_\theta^2$
The uncertainty in theta has two components: one from the spatial resolution
in the measurement and another from multiple scattering. The spatial
resolution component is independent of momentum, whereas that for multiple
scattering is proportional to the inverse of momentum. Given N planes,
the angular uncertainty due to spatial resolution is
- $e_\theta = \frac{e_x}{L} \left[ \frac{12}{N (N^2 - 1)} \right]^{1/2}$
where $e_x$ = 5 mm is the spatial resolution and L = 1 m is the
distance between planes. For half of the detector (N = 10), we find the
angular uncertainty to be 0.00055. Because theta is the difference between
the angles measured in the two detector halves, its uncertainty is
$\sqrt{2}$ times this value, or 0.00078.
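These two numbers follow directly from the formula, as this short Python check shows. The function name is made up for illustration.

```python
import math


def slope_resolution(sigma_x, spacing, n_planes):
    """Angular uncertainty of a straight-line fit to n equally spaced planes."""
    return (sigma_x / spacing) * math.sqrt(
        12.0 / (n_planes * (n_planes ** 2 - 1)))


half_detector = slope_resolution(0.005, 1.0, 10)   # 5 mm, 1 m, N = 10
theta_resolution = math.sqrt(2.0) * half_detector  # two independent halves
print(round(half_detector, 5), round(theta_resolution, 5))
```

The printed values round to 0.00055 and 0.00078.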
The angular uncertainty due to multiple scattering in one plane is
- $e_\theta = \frac{0.0136}{p} (x/x_0)^{1/2} \left[ 1 + 0.038 \ln(x/x_0) \right]$
where p is in GeV/c and $x/x_0$ = 0.01 is the thickness of one plane in radiation lengths.
Multiple scattering in the first eight or last eight planes does not contribute
to the uncertainty in theta, which is the angular difference between the two
detector halves. Therefore, the uncertainty in theta is equal to twice
($\sqrt{4}$) the single plane angular uncertainty, and is equal to
0.0022/p.
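The coefficient 0.0022 can be reproduced as follows, taking the logarithm in the formula to be the natural logarithm and counting four contributing planes as described above. The function name is made up for illustration.

```python
import math


def scattering_angle(p, thickness):
    """Single-plane multiple scattering angle for momentum p in GeV/c.

    thickness is the plane thickness in radiation lengths.
    """
    return (0.0136 / p) * math.sqrt(thickness) * (
        1.0 + 0.038 * math.log(thickness))


single_plane = scattering_angle(1.0, 0.01)       # at p = 1 GeV/c
theta_scattering = math.sqrt(4.0) * single_plane  # four planes in quadrature
print(round(single_plane, 5), round(theta_scattering, 4))
```

At p = 1 GeV/c the single plane value is about 0.0011, and twice that gives the 0.0022 quoted above.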
Adding the two components of angular uncertainty in quadrature, we find that
the fractional uncertainty in momentum has two components: one from multiple
scattering, which is constant, and another from spatial resolution, which
depends on momentum:
- $e_{dp/p}^2 = 0.34 \times 10^{-4} + 0.43 \times 10^{-5} \, p^2$
This agrees quite well with the observed values in (d). Thus, we are
confident that in this example of a charged particle in a dipole field
with multiple scattering included, the Kalman filter is giving sensible
results.
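As a cross-check of the two coefficients in the expected momentum resolution, the Python sketch below combines the rounded values quoted in the text (a 0.376 GeV/c kick, 0.00078 from spatial resolution and 0.0022/p from multiple scattering). The constant and function names are made up for illustration.

```python
PT_KICK = 0.376              # GeV/c
THETA_RESOLUTION = 0.00078   # from spatial resolution, independent of p
THETA_SCATTERING = 0.0022    # from multiple scattering, at p = 1 GeV/c


def fractional_momentum_variance(p):
    """Square of the fractional momentum uncertainty at momentum p (GeV/c)."""
    theta_variance = THETA_RESOLUTION ** 2 + (THETA_SCATTERING / p) ** 2
    return (p / PT_KICK) ** 2 * theta_variance


# The p-dependence separates into a constant plus a term linear in p^2.
constant_term = (THETA_SCATTERING / PT_KICK) ** 2
p_squared_term = (THETA_RESOLUTION / PT_KICK) ** 2
print("%.2e + %.2e p^2" % (constant_term, p_squared_term))
```

The two terms come out at about 0.34e-4 and 0.43e-5, matching the coefficients above.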
[1] http://www.cs.unc.edu/~welch/kalmanIntro.html
[2] P. Maybeck, Stochastic Models, Estimation, and Control. Academic Press, New York, 1979.
[3] P. Billoir and S. Qian, "Simultaneous Pattern Recognition and Track Fitting by the Kalman Filtering Method," Nucl. Inst. Meth. A294, 219 (1990).
[4] R. Fruhwirth, "Application of Kalman Filtering to Track and Vertex Fitting," Nucl. Inst. Meth. A262, 444 (1987).
[5] E.J. Wolin and L.L. Ho, "Covariance matrices for track fitting with the Kalman filter," Nucl. Inst. Meth. A329, 493 (1993).
[6] R. Kutschke and A. Ryd, "Billoir fitter for CLEO II," CLEO internal memo CBX 96-20, 1996 (unpublished).
Last modified: 21 April 1999
Roy Lee