Kalman Filtering: Intro

2023/03/10

This is the first in a series of posts on the Kalman Filter! Over the series, we’ll first derive a filter with the same properties as the KF, then the KF itself, and finally more advanced variants such as EKF and UKF.

I’ll assume for this series that you have some knowledge of linear algebra and probability theory. However, a quick refresher never hurt anyone, so the remainder of this post will cover that. It will take the form of sparse notes, since it’s intended as an information-dense reference sheet rather than a beginner’s introduction to linear algebra.

Linear Algebra

Linear Systems

The quadratic form (also: the squared weighted two-norm) looks like: $$ \alpha = x^T A x $$

Where $x$ is a vector and $A$ is a matrix, so that $\alpha$ is a scalar. If $A$ is positive definite, then $\alpha > 0$ for every nonzero $x$.
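As a quick numerical sanity check (the particular vector and matrix below are arbitrary choices of mine), the product collapses to a single positive scalar when $A$ is positive definite:

```python
# Minimal sketch of the quadratic form x^T A x with numpy.
import numpy as np

x = np.array([1.0, -2.0, 3.0])
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])  # symmetric and positive definite

alpha = x @ A @ x  # x^T A x is a scalar
print(alpha)       # 18.0, positive since A is positive definite and x != 0
```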

The inverse of a product reverses the order of its factors: $$ (AB)^{-1} = B^{-1}A^{-1} $$
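This is easy to confirm numerically; the random matrices below are just for illustration (and are almost surely invertible):

```python
# Sketch: verify (AB)^{-1} = B^{-1} A^{-1} on random matrices.
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
B = rng.normal(size=(3, 3))

lhs = np.linalg.inv(A @ B)
rhs = np.linalg.inv(B) @ np.linalg.inv(A)
print(np.allclose(lhs, rhs))  # True, up to floating-point error
```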

Gradients and Derivatives

Derivative of a vector: if $x(t)$ is a vector-valued function of a scalar $t$, its derivative $\dot{x}(t)$ is taken elementwise and is itself a vector.

Gradient: if $\alpha(x)$ is a scalar-valued function of a vector $x$, its gradient is the row vector of partial derivatives $$\frac{\partial \alpha}{\partial x} = \begin{bmatrix} \dfrac{\partial \alpha}{\partial x_1} & \cdots & \dfrac{\partial \alpha}{\partial x_n} \end{bmatrix}$$

If $y$ is a vector-valued function of a vector $x$, then $\frac{\partial y}{\partial x}$ is a “partials matrix” (the Jacobian) whose $n$th row is the gradient of $y_n$ with respect to $x$.

If $z$ is a $1 \times m$ row vector and $y$ is an $m \times 1$ column vector, both functions of $x$, then the product rule is $$\frac{\partial zy}{\partial x} = z \frac{\partial y}{\partial x} + y^T \frac{\partial z^T}{\partial x}$$
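Here is a small finite-difference check of that product rule; the functions $z(x)$ and $y(x)$ below are made up purely for illustration:

```python
# Sketch: check d(zy)/dx = z dy/dx + y^T dz^T/dx with finite differences.
import numpy as np

def z(x):  # 1 x 2 "row vector" as a function of x
    return np.array([x[0] * x[1], x[2]])

def y(x):  # 2 x 1 "column vector" as a function of x
    return np.array([x[0] + x[2], x[1] ** 2])

def jac(f, x, eps=1e-6):
    """Numerical Jacobian of f at x (central differences)."""
    cols = []
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        cols.append((f(x + dx) - f(x - dx)) / (2 * eps))
    return np.stack(cols, axis=1)

x0 = np.array([0.5, -1.0, 2.0])
analytic = z(x0) @ jac(y, x0) + y(x0) @ jac(z, x0)        # right-hand side of the rule
numeric = jac(lambda x: np.atleast_1d(z(x) @ y(x)), x0)   # gradient of the scalar z y
print(np.allclose(analytic, numeric.ravel()))             # True
```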

Linalg

Rank is the number of linearly independent rows of a matrix, which always equals the number of linearly independent columns. Given a nonzero vector $x$, $$\operatorname{rank}(x x^T) = 1$$

The Hermitian transpose of a matrix is the complex conjugate of its transpose. A matrix is Hermitian if $A = A^H$.

Determinants have two important properties: first, $|AB| = |A||B|$, and second, the determinant is the product of the eigenvalues:

$$|A| = \prod_{i=1}^n \lambda_i$$
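Both properties are easy to check numerically (again with arbitrary random matrices):

```python
# Sketch: |AB| = |A||B| and |A| = product of eigenvalues.
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # True
print(np.isclose(np.linalg.det(A), np.prod(np.linalg.eigvals(A)).real))       # True
```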

A few equivalent ways to state that a square matrix is noninvertible (singular): its determinant is zero, at least one eigenvalue is zero, its rank is less than its dimension, and its rows (or columns) are linearly dependent.

The trace of a matrix (only defined for a square matrix) is the sum of its diagonal elements.

A symmetric $n \times n$ matrix can be characterized as positive definite, positive semidefinite, negative definite, negative semidefinite, or indefinite. The classification depends on the sign of $x^T A x$ over all nonzero $x$: positive definite if $x^T A x > 0$, positive semidefinite if $x^T A x \geq 0$, and so on. It also tells us about the eigenvalues of the matrix: a positive definite matrix has only positive, real eigenvalues; a positive semidefinite matrix has only nonnegative, real eigenvalues; and so on. Indefinite means that some eigenvalues are positive and some are negative.
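A small sketch that classifies a symmetric matrix from the signs of its eigenvalues (the helper name and tolerance below are my own choices):

```python
# Sketch: definiteness of a symmetric matrix via its eigenvalues.
import numpy as np

def classify(A, tol=1e-10):
    """Classify a symmetric matrix by the signs of its eigenvalues."""
    lam = np.linalg.eigvalsh(A)  # real eigenvalues, since A is symmetric
    if np.all(lam > tol):
        return "positive definite"
    if np.all(lam >= -tol):
        return "positive semidefinite"
    if np.all(lam < -tol):
        return "negative definite"
    if np.all(lam <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 3.0]])))   # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite
```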

The singular values of a matrix satisfy $\sigma^2(A) = \lambda(AA^T) = \lambda(A^T A)$; that is, they are the nonnegative square roots of the eigenvalues of $A^T A$ (which shares its nonzero eigenvalues with $AA^T$). An $n \times m$ matrix always has $\min(n, m)$ singular values.
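For example, on an arbitrary $3 \times 2$ matrix the squared singular values match the eigenvalues of $A^T A$:

```python
# Sketch: sigma^2(A) = lambda(A^T A) for a 3 x 2 matrix (2 singular values).
import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])

sigma = np.linalg.svd(A, compute_uv=False)          # singular values, descending
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]    # eigenvalues of A^T A, descending
print(np.allclose(sigma ** 2, lam))                 # True
```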

Matrix Exponential

Finding the matrix exponential can be done in a variety of ways, including by finding the Jordan form of $A$.

The definition of $e^{At}$ is: $$e^{At} = \sum_{j = 0}^{\infty} \dfrac{(At)^j}{j!}$$

Another way to calculate $e^{At}$ is with the inverse Laplace transform:

$$e^{At} = \mathcal{L}^{-1}[(sI - A)^{-1}]$$
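As a sketch, a truncated version of the Taylor series above can be compared against `scipy.linalg.expm` (the matrix, time, and truncation length here are arbitrary):

```python
# Sketch: truncated Taylor series for e^{At} vs. scipy.linalg.expm.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
t = 0.5

series = np.eye(2)   # j = 0 term: (At)^0 / 0! = I
term = np.eye(2)
for j in range(1, 30):
    term = term @ (A * t) / j   # build (At)^j / j! incrementally
    series = series + term

print(np.allclose(series, expm(A * t)))  # True
```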

Diagonalization

If $A$, an $n \times n$ matrix, has $n$ linearly independent eigenvectors, then it can be fully diagonalized as $A = X D X^{-1}$, where $D$ is a diagonal matrix of the eigenvalues and $X$ is a full-rank matrix whose columns are the eigenvectors. Remember that you can find the eigenvectors by solving $(A - \lambda I)x = 0$ for each eigenvalue $\lambda$.

Using the Taylor series definition of the matrix exponential, we can see that $e^{At} = X e^{Dt} X^{-1}$. Since $Dt$ is diagonal, its matrix exponential is also diagonal, with each diagonal entry exponentiated.
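A numerical sketch of the diagonalization route, using a matrix with distinct real eigenvalues so it is guaranteed to be diagonalizable:

```python
# Sketch: e^{At} = X e^{Dt} X^{-1} when A = X D X^{-1}.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])   # eigenvalues -1 and -2
t = 0.5

lam, X = np.linalg.eig(A)            # columns of X are eigenvectors
eDt = np.diag(np.exp(lam * t))       # exponentiate the diagonal entries of Dt
print(np.allclose(X @ eDt @ np.linalg.inv(X), expm(A * t)))  # True
```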

Linear System Dynamics

Continuous-time linear systems are described as:

$$ \dot{x} = Ax + Bu, y = Cx$$

If $A$, $B$, and $C$ are constant matrices, then the solution to the system is:

$$x(t) = e^{A(t - t_0)}x(t_0) + \int_{t_0}^t e^{A(t - \tau)}Bu(\tau) d\tau$$ $$y(t) = Cx(t)$$

If $u = 0$, then $x(t) = e^{A(t - t_0)}x(t_0)$. For this reason, $e^{A(t - t_0)}$ is called the state-transition matrix of the system. This holds even when $x$ is a vector and $A$ is a matrix, just as in the scalar case.
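A minimal sketch of propagating an unforced system with the state-transition matrix, checked against a numerical integrator (the system and initial state are arbitrary):

```python
# Sketch: x(t) = e^{At} x(0) for u = 0, compared against an ODE solver.
import numpy as np
from scipy.linalg import expm
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
x0 = np.array([1.0, 0.0])
t = 1.5

x_exact = expm(A * t) @ x0                                                 # state-transition matrix
sol = solve_ivp(lambda _, x: A @ x, (0.0, t), x0, rtol=1e-10, atol=1e-12)
print(np.allclose(x_exact, sol.y[:, -1]))                                  # True, to solver tolerance
```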

Linearizing a System

The idea of linearizing a system is to choose some point $\bar{x}$ (along with a nominal input $\bar{u}$ and noise $\bar{w}$) around which to linearize, then define the deviations $\tilde{x} = x - \bar{x}$, $\tilde{u} = u - \bar{u}$, and $\tilde{w} = w - \bar{w}$. We can then expand to first order:

$$\dot{x} = f(x, u, w) \approx f(\bar{x}, \bar{u}, \bar{w}) + \dfrac{\partial f}{\partial x}\rvert_{\bar{x}} \tilde{x} + \dfrac{\partial f}{\partial u}\rvert_{\bar{u}} \tilde{u} + \dfrac{\partial f}{\partial w}\rvert_{\bar{w}} \tilde{w}$$

Where $w$ is a noise parameter.

So that, when the linearization point is an equilibrium (so $f(\bar{x}, \bar{u}, \bar{w}) = 0$), we recover the familiar linear form in the deviation variables:

$$ \dot{\tilde{x}} \approx A\tilde{x} + B\tilde{u} + L\tilde{w} $$

In practice, we evaluate the first term to a constant, find analytical expressions for the Jacobians in the remaining terms, and then plug in $\bar{x}$ (or the other relevant nominal variable) to obtain the constant matrices $A$, $B$, and $L$.
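As an end-to-end sketch, here is a pendulum model (my own illustrative example, not something we will use later in the series) linearized about its downward equilibrium, with the Jacobians approximated by finite differences rather than derived analytically:

```python
# Sketch: linearize xdot = f(x, u, w) about (x_bar, u_bar, w_bar) to get A, B, L.
import numpy as np

g, length = 9.81, 1.0  # illustrative pendulum constants

def f(x, u, w):
    """Nonlinear pendulum dynamics: state is [angle, angular rate]."""
    theta, omega = x
    return np.array([omega, -(g / length) * np.sin(theta) + u + w])

def jac(fun, v0, eps=1e-6):
    """Numerical Jacobian of fun with respect to its vector argument, at v0."""
    v0 = np.atleast_1d(np.asarray(v0, dtype=float))
    cols = []
    for i in range(v0.size):
        dv = np.zeros_like(v0)
        dv[i] = eps
        cols.append((fun(v0 + dv) - fun(v0 - dv)) / (2 * eps))
    return np.stack(cols, axis=1)

x_bar, u_bar, w_bar = np.array([0.0, 0.0]), 0.0, 0.0  # linearization point

A = jac(lambda x: f(x, u_bar, w_bar), x_bar)       # df/dx, approx [[0, 1], [-g/length, 0]]
B = jac(lambda u: f(x_bar, u[0], w_bar), [u_bar])  # df/du, approx [[0], [1]]
L = jac(lambda w: f(x_bar, u_bar, w[0]), [w_bar])  # df/dw, approx [[0], [1]]
print(A, B, L, sep="\n")
```

In practice you would usually derive these Jacobians analytically, as described above; the finite-difference version is just a convenient way to check them.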