MINOS: Kalman filter

Kalman filter


This page is meant to be a brief introduction to Kalman filters with a few simple examples to illustrate the use and implementation of such filters. The goal is to apply the Kalman filter technique to kinematic fitting in MINOS data analysis.

The Kalman filter is a recursive technique for obtaining the solution to a least squares fit, and its recursive form makes it computationally efficient. Given N hits with 2 measurements (x, y) per hit, a traditional least squares fit involves inverting a matrix of dimension 2N x 2N, and the hits to be included in the fit must be known before the fit begins. The Kalman filter instead updates the fit as each hit is added, which requires only a much smaller matrix inversion (2 x 2 instead of 2N x 2N). In addition, the Kalman filter can extrapolate the fitted track to the next measurement point, where actual hits can then be searched for in a window around the extrapolation. For references on the Kalman filter, see [1] [2] [3] [4] [5] [6].

The basic Kalman filter deals with linear systems (non-linear systems are treated by a linear approximation using the extended Kalman filter, to be discussed later). We use the notation of Fruhwirth [4], and make the following definitions:

  • $x_k$ = the filtered state vector at point $k$
  • $x_{k+1}^{k}$ = the state vector extrapolated from point $k$ to point $k+1$
  • $C_{k+1}^{k}$ = the covariance matrix of $x_{k+1}^{k} - x_{k+1}^{t}$, where $x_{k+1}^{t}$ is the true value of the state vector at point $k+1$
  • $C_k$ = the filtered covariance matrix at point $k$
  • $F_k$ = the propagator of the state vector from point $k$ to point $k+1$
  • $w_k$ = the process noise (random disturbance) of the state vector at point $k$
  • $Q_k$ = the covariance matrix of $w_k$
  • $m_k$ = the measurement vector at point $k$
  • $e_k$ = the measurement noise at point $k$
  • $V_k$ = the covariance matrix of $e_k$
  • $H_k$ = the matrix that converts the state vector at point $k$ into a measurement vector

The two basic equations are

  • $x_k = F_{k-1} x_{k-1} + w_{k-1}$
  • $m_k = H_k x_k + e_k$
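As an illustration of these two equations, here is a minimal C++ sketch (standard library only, not the author's ROOT-based code) that generates a toy event under assumed conditions: a straight track in one projection with a two-component state (position, slope), equally spaced planes, Gaussian process noise on the slope only, and Gaussian measurement noise on the position. The names `simulateTrack` and `ToyEvent` and all parameters are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical toy model (not the MINOS geometry): state x = (position, slope),
// planes spaced dz apart.
//   x_k = F x_{k-1} + w_{k-1},  F = [[1, dz], [0, 1]], w = (0, sigmaW * g)
//   m_k = H x_k + e_k,          H = [1, 0],            e = sigmaE * g
struct ToyEvent {
    std::vector<std::array<double, 2>> truth;  // true state vectors x_k^t
    std::vector<double> meas;                  // measurements m_k
};

ToyEvent simulateTrack(std::array<double, 2> x0, int nPlanes, double dz,
                       double sigmaW, double sigmaE, unsigned seed) {
    assert(nPlanes > 0 && dz > 0.0);
    std::mt19937 rng(seed);
    std::normal_distribution<double> gauss(0.0, 1.0);
    ToyEvent ev;
    std::array<double, 2> x = x0;
    for (int k = 0; k < nPlanes; ++k) {
        if (k > 0) {
            // F x_{k-1}: drift the position with the old slope ...
            x[0] = x[0] + dz * x[1];
            // ... then add the process noise w_{k-1} to the slope.
            x[1] = x[1] + sigmaW * gauss(rng);
        }
        ev.truth.push_back(x);
        // H x_k + e_k: only the position is measured.
        ev.meas.push_back(x[0] + sigmaE * gauss(rng));
    }
    return ev;
}
```

Setting both noise widths to zero reproduces an exact straight line, which is a convenient check before adding noise.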

Typically, the state vector has length 5, with the two transverse positions and the three momentum components as its elements. The measurement vector might have length 3, consisting of the hit scintillator strip, the time of the hit, and the integrated charge. These are merely examples to illustrate the two vector types.

To begin the Kalman filter, the following must first be defined:

  • $x_0$ = an initial estimate of the state vector
  • $F_k$ = the state propagator matrices for all points
  • $Q_k$ = the process noise covariance matrices for all points
  • $V_k$ = the measurement noise covariance matrices for all points
  • $H_k$ = the matrices that convert a state vector at point $k$ into a measurement vector

The initial state covariance matrix $C_0$ can be set to the identity matrix multiplied by a large scale factor. The smaller this factor, the more weight is given to the initial state vector, so in general we would like to make $C_0$ as large as possible. However, because of round-off errors, $C_0$ must be restricted to a reasonable value, which depends on the particular fit under consideration.
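To see why a large $C_0$ is desirable, consider the simplest case: a scalar state with $F_k = H_k = 1$ and $Q_k = 0$ (the setting of the averaging example below). Iterating the filter equations that follow then gives, after $n$ measurements, the closed form below. This is a standard result that can be checked by induction; it is not taken from the original code.

```latex
x_n = \frac{x_0 / C_0 + \sum_{k=1}^{n} m_k / V_k}{1 / C_0 + \sum_{k=1}^{n} 1 / V_k},
\qquad
C_n = \left( \frac{1}{C_0} + \sum_{k=1}^{n} \frac{1}{V_k} \right)^{-1}
```

The initial estimate $x_0$ acts like one extra measurement with variance $C_0$, so its influence vanishes as $C_0 \to \infty$; in practice, round-off limits how large $C_0$ can be made.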

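We are now ready to apply the Kalman filter. At each point $k$, a four-step process is applied:

  1. $C_k^{k-1} = F_{k-1} C_{k-1} F_{k-1}^T + Q_{k-1}$
  2. $K_k = C_k^{k-1} H_k^T \left( V_k + H_k C_k^{k-1} H_k^T \right)^{-1}$
  3. $x_k = F_{k-1} x_{k-1} + K_k \left( m_k - H_k F_{k-1} x_{k-1} \right)$
  4. $C_k = \left( I - K_k H_k \right) C_k^{k-1}$

where $K_k$ is the Kalman gain matrix and $I$ is the identity matrix.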

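I have combined the extrapolation and filtering of the state vector into one step (3); in most references, these two processes are presented separately. At the end of the iterative process, the final state vector represents the fit values using all data points. To obtain the fit values at an arbitrary point $k$, the user can then run the filter backwards, starting with the last point. The fit value at point $k$ is then the weighted average of the forward and backward estimates of the state vector at point $k$, each weighted by its inverse covariance matrix. To avoid counting the measurement at point $k$ twice, one of the two estimates should be the extrapolated one that does not yet include $m_k$.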
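For a single state-vector component, combining the forward and backward estimates reduces to an inverse-variance weighted mean. The sketch below assumes the two estimates are independent, which requires that one of them not include the measurement $m_k$; the names `Estimate` and `combine` are hypothetical.

```cpp
#include <cassert>

// An estimate of one state-vector component and its variance.
struct Estimate {
    double x;
    double C;
};

// Inverse-variance weighted average of two independent estimates, e.g. the
// forward-filtered estimate at point k and the backward-extrapolated estimate
// at point k. The matrix version is
//   C = (Cf^-1 + Cb^-1)^-1,  x = C (Cf^-1 xf + Cb^-1 xb).
Estimate combine(const Estimate& fwd, const Estimate& bwd) {
    assert(fwd.C > 0.0 && bwd.C > 0.0);
    const double wf = 1.0 / fwd.C;
    const double wb = 1.0 / bwd.C;
    const double C = 1.0 / (wf + wb);
    return {C * (wf * fwd.x + wb * bwd.x), C};
}
```

Two estimates with equal variances give their plain mean with half the variance; a less precise estimate pulls the result proportionally less.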

I have written some code in C++ to help me understand the Kalman filter. This code uses the TMatrix and TVector classes from ROOT, so to compile and run this code, one needs the ROOT libraries on their system.
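As a self-contained alternative to the ROOT-based code, here is a minimal sketch in standard C++ of the four steps for an assumed two-parameter straight-line model (position and slope, planes spaced `dz` apart, one position measurement per plane), in which the matrix inversion in step 2 reduces to a division. All names (`Filtered`, `kalmanStep`, `kalmanFit`) are hypothetical, and `kalmanFit` sets the process noise to zero.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<std::array<double, 2>, 2>;

// Filtered state vector x_k and covariance matrix C_k.
struct Filtered {
    Vec2 x;
    Mat2 C;
};

// Steps 1-4 for a hypothetical straight-line model: state = (position, slope),
// F = [[1, dz], [0, 1]], H = [1, 0], scalar measurement m with variance V.
Filtered kalmanStep(const Filtered& prev, double dz, const Mat2& Q,
                    double m, double V) {
    const Mat2 F = {{{1.0, dz}, {0.0, 1.0}}};
    // Extrapolated state F_{k-1} x_{k-1}.
    const Vec2 xp = {F[0][0] * prev.x[0] + F[0][1] * prev.x[1],
                     F[1][0] * prev.x[0] + F[1][1] * prev.x[1]};
    // Step 1: C_k^{k-1} = F C_{k-1} F^T + Q.
    Mat2 FC{}, Cp = Q;
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int l = 0; l < 2; ++l) FC[i][j] += F[i][l] * prev.C[l][j];
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j)
            for (int l = 0; l < 2; ++l) Cp[i][j] += FC[i][l] * F[j][l];
    // Step 2: K = C H^T (V + H C H^T)^{-1}; with H = [1, 0] the bracket
    // is the scalar V + C[0][0].
    const double S = V + Cp[0][0];
    assert(S > 0.0);
    const Vec2 K = {Cp[0][0] / S, Cp[1][0] / S};
    // Step 3: x_k = F x_{k-1} + K (m - H F x_{k-1}).
    const double residual = m - xp[0];
    Filtered out;
    out.x = {xp[0] + K[0] * residual, xp[1] + K[1] * residual};
    // Step 4: C_k = (I - K H) C_k^{k-1}; K H has a nonzero first column only.
    for (int i = 0; i < 2; ++i)
        for (int j = 0; j < 2; ++j) out.C[i][j] = Cp[i][j] - K[i] * Cp[0][j];
    return out;
}

// Runs the filter over measurements at planes spaced dz apart, starting from
// x_0 = (0, 0) and C_0 = scale * identity (large scale: little weight on x_0).
Filtered kalmanFit(const std::vector<double>& meas, double dz, double V,
                   double scale) {
    assert(!meas.empty() && scale > 0.0);
    Filtered state{{0.0, 0.0}, {{{scale, 0.0}, {0.0, scale}}}};
    const Mat2 Q{};  // no process noise in this sketch
    for (double m : meas) state = kalmanStep(state, dz, Q, m, V);
    return state;
}
```

For example, `kalmanFit({1.0, 1.5, 2.0, 2.5, 3.0}, 1.0, 0.01, 1.0e6)` should return a state close to position 3.0 and slope 0.5 at the last plane, i.e. the least squares line through those points. The first call propagates the initial guess by one step before filtering, which is harmless because $C_0$ is large.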

Example: Average
The first program presented here takes the weighted average of a set of values. In this example, the state and measurement vectors have length 1, so vectors and matrices reduce to scalars. I set the measurements $m_k$ to the input data points to be averaged and initialize the matrix (scalar) $V_k$ to their variances (squares of the errors). Because the state vector is the measurement vector, the matrix $H_k$ is unity, as is the propagator matrix $F_k$. There is no process noise, so $Q_k$ vanishes.
The Kalman filter applied to a weighted average.

The black points represent the measured data; the blue points represent the filtered state vector calculated by the Kalman filter at point $k$. The uncertainty on the filtered state vector is obtained from the covariance matrix $C_k$ and grows smaller with increasing $k$, as expected. The final value of the Kalman fit is 4.24835 ± 0.20320, whereas the value from a direct least squares calculation (shown as a horizontal red line in the plot) is 4.24838 ± 0.20320, a fractional agreement in the weighted average of about $10^{-5}$.

The main program is in average.cc. The Kalman class is defined in kalman.h and is the heart of the Kalman filtering routine. Other header files needed to compile and run this example are misc.h, definitions.h, and simplearray.h.
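A standalone sketch of the same scalar filter in standard C++ is shown below, together with the direct weighted average it should reproduce when $C_0$ is large. It is not the code in average.cc or kalman.h; the function names and the choice of initial values are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Value and variance of an estimate.
struct Result {
    double value;
    double variance;
};

// Scalar Kalman filter with F = H = 1 and Q = 0. x0 and C0 are the initial
// estimate and its variance; C0 should be large so x0 carries little weight.
Result kalmanAverage(const std::vector<double>& m,
                     const std::vector<double>& sigma, double x0, double C0) {
    assert(m.size() == sigma.size() && C0 > 0.0);
    double x = x0, C = C0;
    for (std::size_t k = 0; k < m.size(); ++k) {
        const double V = sigma[k] * sigma[k];  // V_k: variance of m_k
        const double Cp = C;                   // step 1: F C F^T + Q = C
        const double K = Cp / (V + Cp);        // step 2: Kalman gain
        x = x + K * (m[k] - x);                // step 3 (F x = x, H = 1)
        C = (1.0 - K) * Cp;                    // step 4
    }
    return {x, C};
}

// Direct weighted average with weights 1/sigma^2, for comparison.
Result weightedMean(const std::vector<double>& m,
                    const std::vector<double>& sigma) {
    assert(m.size() == sigma.size() && !m.empty());
    double sumW = 0.0, sumWx = 0.0;
    for (std::size_t k = 0; k < m.size(); ++k) {
        const double w = 1.0 / (sigma[k] * sigma[k]);
        sumW += w;
        sumWx += w * m[k];
    }
    return {sumWx / sumW, 1.0 / sumW};
}
```

The residual difference between the two results comes from the finite weight $1/C_0$ of the initial estimate and shrinks as $C_0$ grows, until round-off takes over.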

Example: Charged particle in a dipole field
In this example, x,y coordinates are measured at fixed z positions in field-free regions both upstream and downstream of a dipole magnet (with its field transverse to the beam axis). A total of 20 tracking stations with a uniform spacing of 1 meter are simulated; each tracking station measures both transverse positions with a resolution of 5 mm in each coordinate. Each tracking station has a thickness of 1% of a radiation length. No energy loss is simulated. The magnetic field is located between the 10th and 11th tracking stations. It has only one component, whose magnitude is a Gaussian function of the longitudinal position with a maximum value of 5 Tesla and a spatial RMS of 0.1 meter, corresponding to a momentum kick of 0.376 GeV/c.

We choose a state vector of dimension 5: the two transverse positions, the two transverse slopes, and the charge-to-momentum ratio. At every iteration of the filter except the one that traverses the magnetic field, the equations of motion are linear. The only complication is multiple scattering, which is taken into account through the process noise covariance matrix; we use the calculation of Wolin and Ho [5].

The propagation of the state vector between the 10th and 11th planes is almost, but not quite, linear. We linearize it by calculating the first derivatives of the propagated state vector with respect to the initial state vector. In the Kalman filter, the state vector is propagated not by multiplying it by the propagator matrix, but by swimming it through the magnetic field. The linearized propagator matrix is used in calculating the state covariance matrix and the Kalman gain matrix.
The Kalman filter applied to a charged particle in a dipole field. The momentum distribution is plotted in (a). The difference between the fit and true momenta divided by the true momentum is shown in (b). The mean value of the fit momentum ratio is shown as a function of momentum in (c). The square of the RMS of the fit momentum ratio is shown as a function of momentum squared in (d).
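The linearization step can be sketched generically. Assuming the routine that swims the state vector through the field is available as a function (here any `std::function`), its first derivatives can be estimated numerically by central differences and then used in step 1. This is one possible approach, not necessarily how the original code computes the derivatives; `numericalJacobian`, `propagateCovariance`, and the step size `h` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;

// Numerical Jacobian J_ij = d f_i / d x_j of a propagation function f
// (standing in for a routine that swims the state through the field),
// evaluated at x by central differences with step h.
Mat numericalJacobian(const std::function<Vec(const Vec&)>& f, const Vec& x,
                      double h) {
    assert(h > 0.0);
    const std::size_t n = x.size();
    const std::size_t nOut = f(x).size();
    Mat J(nOut, Vec(n, 0.0));
    for (std::size_t j = 0; j < n; ++j) {
        Vec xPlus = x, xMinus = x;
        xPlus[j] += h;
        xMinus[j] -= h;
        const Vec fPlus = f(xPlus), fMinus = f(xMinus);
        for (std::size_t i = 0; i < nOut; ++i)
            J[i][j] = (fPlus[i] - fMinus[i]) / (2.0 * h);
    }
    return J;
}

// Step 1 with the linearized propagator: C_k^{k-1} = J C_{k-1} J^T + Q.
Mat propagateCovariance(const Mat& J, const Mat& C, const Mat& Q) {
    const std::size_t n = J.size(), m = C.size();
    assert(Q.size() == n);
    Mat out = Q;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            for (std::size_t a = 0; a < m; ++a)
                for (std::size_t b = 0; b < m; ++b)
                    out[i][j] += J[i][a] * C[a][b] * J[j][b];
    return out;
}
```

In the filter, the extrapolated state itself still comes from evaluating the swim function, while the Jacobian enters the covariance and gain calculations. The step `h` should be small compared with the scale over which the propagation is nonlinear, but large enough to avoid round-off.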


The magnetic field imparts a simple $p_T$ kick in the x direction:

  • $p\,\theta = p_{x2} - p_{x1} = p_T$
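Since $p = p_T/\theta$, a small change in $\theta$ gives $\delta p / p = -\delta\theta / \theta = -(p/p_T)\,\delta\theta$, so the fractional uncertainty in the momentum is related to the uncertainty in the angle $\theta$ by
  • $\epsilon_{\delta p/p}^2 = (p^2 / p_T^2)\, \epsilon_\theta^2$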
The uncertainty in $\theta$ has two components: one from the spatial resolution of the measurements and another from multiple scattering. The spatial resolution component is independent of momentum, whereas the multiple scattering component is inversely proportional to momentum. Given N equally spaced planes, the angular uncertainty due to spatial resolution is
  • $\epsilon_\theta = \dfrac{\epsilon_x}{L} \left[ \dfrac{12}{N (N^2 - 1)} \right]^{1/2}$
where $\epsilon_x$ = 5 mm is the spatial resolution and L = 1 m is the distance between planes. For half of the detector (N = 10), we find the angular uncertainty to be 0.00055 rad. Since $\theta$ is the difference between the angles measured in the two halves, its uncertainty is $\sqrt{2}$ times this value, or 0.00078 rad. The angular uncertainty due to multiple scattering in one plane is
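The closed form for the resolution term follows from the least squares slope uncertainty, $\epsilon_x^2 / \sum_i (z_i - \bar z)^2$, with $\sum_i (z_i - \bar z)^2 = L^2 N (N^2 - 1)/12$ for equally spaced planes. The sketch below (hypothetical function names) evaluates both forms so the numbers in the text can be checked.

```cpp
#include <cassert>
#include <cmath>

// Least squares slope uncertainty for N planes at z_i = i * L, each with
// position resolution ex: ex / sqrt(sum_i (z_i - zbar)^2).
double slopeErrorDirect(int N, double ex, double L) {
    assert(N >= 2 && L > 0.0);
    const double zbar = 0.5 * (N - 1) * L;
    double s = 0.0;
    for (int i = 0; i < N; ++i) {
        const double d = i * L - zbar;
        s += d * d;
    }
    return ex / std::sqrt(s);
}

// Closed form: ex / L * [12 / (N (N^2 - 1))]^{1/2}.
double slopeErrorClosed(int N, double ex, double L) {
    assert(N >= 2 && L > 0.0);
    const double n = static_cast<double>(N);
    return ex / L * std::sqrt(12.0 / (n * (n * n - 1.0)));
}
```

For N = 10, ex = 0.005 m and L = 1 m, both functions give about 0.00055, and multiplying by √2 gives about 0.00078.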
  • $\epsilon_\theta = \dfrac{0.0136}{p} \left( x/X_0 \right)^{1/2} \left[ 1 + 0.038 \ln(x/X_0) \right]$
where $x/X_0$ = 0.01 is the thickness of one plane in radiation lengths and p is in GeV/c. Multiple scattering in the first eight or last eight planes does not contribute to the uncertainty in $\theta$, which is the angular difference between the two detector halves. Therefore, the uncertainty in $\theta$ is twice ($\sqrt{4}$ times) the single-plane angular uncertainty, or 0.0022/p. Adding the two components of angular uncertainty in quadrature, we find that the fractional uncertainty in momentum has two components: a constant one due to multiple scattering, and a momentum-dependent one due to spatial resolution:
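The multiple scattering estimate can be evaluated the same way. This sketch codes the Highland formula as quoted above (unit charge, β = 1, p in GeV/c, natural logarithm) and the √4 factor from the argument in the text; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Single-plane scattering angle: 0.0136/p * sqrt(x/X0) * [1 + 0.038 ln(x/X0)],
// with p in GeV/c (unit charge, beta = 1 assumed).
double highlandTheta0(double p, double xOverX0) {
    assert(p > 0.0 && xOverX0 > 0.0);
    return 0.0136 / p * std::sqrt(xOverX0) *
           (1.0 + 0.038 * std::log(xOverX0));
}

// Uncertainty on theta when nContributing planes scatter independently:
// sqrt(nContributing) times the single-plane value (the text uses 4 planes).
double scatteringThetaError(double p, double xOverX0, int nContributing) {
    assert(nContributing > 0);
    return std::sqrt(static_cast<double>(nContributing)) *
           highlandTheta0(p, xOverX0);
}
```

For x/X0 = 0.01 the single-plane value is about 0.00112/p, so the four contributing planes give about 0.0022/p.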
  • $\epsilon_{\delta p/p}^2 = 3.4 \times 10^{-5} + 4.3 \times 10^{-6}\, p^2$ (p in GeV/c)
This agrees quite well with the values observed in panel (d). We are therefore confident that, in this example of a charged particle in a dipole field with multiple scattering included, the Kalman filter gives sensible results.
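The coefficients of the resolution formula and the 0.376 GeV/c kick can also be cross-checked numerically. The sketch assumes the standard relation $p_T$ [GeV/c] = 0.2998 $\int B\,dl$ [T m] for a unit charge and a Gaussian field profile, so that $\int B\,dl = B_{max}\,\sigma\sqrt{2\pi}$; the angular uncertainties 0.0022/p and 0.00078 are taken from the text, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Momentum kick (GeV/c) from a Gaussian field profile, assuming
// pT = 0.2998 * integral(B dl) for unit charge (B in tesla, lengths in meters);
// the Gaussian integrates to Bmax * sigma * sqrt(2 pi).
double kickFromGaussianField(double bMaxTesla, double sigmaMeters) {
    const double pi = std::acos(-1.0);
    return 0.2998 * bMaxTesla * sigmaMeters * std::sqrt(2.0 * pi);
}

// Coefficients of eps_{dp/p}^2 = constant + quadratic * p^2.
struct ResolutionTerms {
    double constant;   // multiple scattering term
    double quadratic;  // spatial resolution term, multiplies p^2
};

// eps_{dp/p}^2 = (p^2 / pT^2) eps_theta^2, with
// eps_theta^2 = (msTimesP / p)^2 + thetaRes^2 (added in quadrature).
ResolutionTerms momentumResolution(double pT, double msTimesP,
                                   double thetaRes) {
    assert(pT > 0.0);
    return {msTimesP * msTimesP / (pT * pT), thetaRes * thetaRes / (pT * pT)};
}
```

With $B_{max}$ = 5 T and σ = 0.1 m this gives $p_T \approx 0.3757$ GeV/c, and the two coefficients come out near $3.4 \times 10^{-5}$ and $4.3 \times 10^{-6}$ (GeV/c)$^{-2}$.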




[1] http://www.cs.unc.edu/~welch/kalmanIntro.html
[2] P. Maybeck, Stochastic Models, Estimation, and Control. Academic Press, New York, 1979.
[3] P. Billoir and S. Qian, "Simultaneous Pattern Recognition and Track Fitting by the Kalman Filtering Method," Nucl. Inst. Meth. A294, 219 (1990).
[4] R. Fruhwirth, "Application of Kalman Filtering to Track and Vertex Fitting," Nucl. Inst. Meth. A262, 444 (1987).
[5] E.J. Wolin and L.L. Ho, "Covariance matrices for track fitting with the Kalman filter," Nucl. Inst. Meth. A329, 493 (1993).
[6] R. Kutschke and A. Ryd, "Billoir fitter for CLEO II," CLEO internal memo CBX 96-20, 1996 (unpublished).

Last modified: 21 April 1999
Roy Lee