The 12 variables that we need to estimate are: \(x\), \(y\), \(z\), \(\dot x\), \(\dot y\), \(\dot z\), \(\phi\), \(\theta\), \(\psi\), \(\dot \phi\), \(\dot \theta\), \(\dot \psi\).

There are cases when you don’t need to estimate all of them and can get away with only estimating a few of them, such as when controlling the drone via radio control (RC), but I’ll leave that discussion for the “full architecture example” post. For now, let’s assume that we want to measure all of the state variables.

I’ll give 3 examples of varying complexity for measuring the state variables depending on the onboard sensors and available compute power. The first example assumes that you have a gyroscope, accelerometer, magnetometer, barometer or altimeter, and a GPS but have heavy computational constraints. The second example uses the same set of sensors but assumes that you have enough onboard processing power to perform more powerful filtering techniques such as an Extended Kalman Filter (EKF) or an Unscented Kalman Filter (UKF). The last example assumes that you are in a GPS denied environment, have onboard Lidar or stereo camera pairs, and enough compute power and implementation skills to perform ICP or visual odometry in real time.

The simplest state estimation pipeline I’ll introduce requires having a GPS with steady signal, an altimeter or barometer, and an IMU with a gyroscope, accelerometer, and magnetometer.

In order to estimate \(\phi\), \(\theta\), \(\dot \phi\), \(\dot \theta\) we will be fusing the measurements of the gyroscope and accelerometer using a complementary filter. However, it's important to realize that when we read from the gyroscope we are getting \(p\) and \(q\), not \(\dot \phi\) and \(\dot \theta\). If this notation is unfamiliar to you, see here. We can convert the gyroscope readings at time \(p_t\) and \(q_t\) and \(r_t\) to \(\dot \phi^g_t\) and \(\dot \theta^g_t\) as follows, $$ \begin{bmatrix} \dot \phi^g_t \\ \dot \theta^g_t \\ \dot \psi^g_t \end{bmatrix} = \begin{bmatrix}1 & S_\phi T_\theta & C_\phi T_\theta \\ 0 & C_\phi & -S_\phi \\ 0 & \frac{S_\phi}{C_\theta} & \frac{C_\phi}{C_\theta} \end{bmatrix} \begin{bmatrix} p_t \\ q_t \\ r_t \end{bmatrix} $$ where \(C_x\) denotes \(\cos(x)\), \(S_x\) denotes \(\sin(x)\), and \(T_x\) denotes \(\tan(x)\).

We'll also need to convert the accelerometer readings \(A_x\), \(A_y\), and \(A_z\) to get \(\phi^a_t\) and \(\theta^a_t\) as follows, $$ \begin{bmatrix} \phi^a_t \\ \theta^a_t \end{bmatrix} = \begin{bmatrix} \tan^{-1}(\frac{A_y}{\sqrt{A^2_x + A^2_z}}) \cdot \frac{180}{\pi} \\ \tan^{-1}(\frac{A_x}{\sqrt{A^2_y + A^2_z}}) \cdot \frac{180}{\pi} \end{bmatrix} $$ the multiplication by \(\frac{180}{\pi}\) is needed to convert from radians to degrees.

Finally, \(\phi_t\) and \(\theta_t\) can be estimated as follows using a complementary filter: $$ \begin{bmatrix} \phi_t \\ \theta_t \end{bmatrix} = \begin{bmatrix} (1 - \alpha)(\phi_{t-1} + \dot \phi^g_t \varDelta t) + \alpha \phi^a_t \\ (1 - \alpha)(\theta_{t-1} + \dot \theta^g_t \varDelta t) + \alpha \theta^a_t \end{bmatrix} $$ where \(\alpha\) is a parameter that can be tuned but will typically be around \(0.95\). See the next paragraph for a guide on tuning the alpha parameter.

The complementary filter looks like a naive weighting between the gyroscope and accelerometer but it is performing a low pass filter on the gyroscope measurement and a high pass filter on the accelerometer measurement. A great visualization along with a derivation of these equations can be seen here and a visualization of differing values for the complementary filter that may be useful for tuning can be found here.

If you are operating under the “near hover” condition as mentioned in the model post, you can make the following assumption: $$ \begin{bmatrix}1 & S_\phi T_\theta & C_\phi T_\theta \\ 0 & C_\phi & -S_\phi \\ 0 & \frac{S_\phi}{C_\theta} & \frac{C_\phi}{C_\theta} \end{bmatrix} \approx \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} $$ which allows you to make the approximation that, $$ \begin{bmatrix} \dot \phi \\ \dot \theta \end{bmatrix} \approx \begin{bmatrix} p \\ q \end{bmatrix} $$ Remember that these simplifications are only valid if the drone is near hover.

The psi value can be found using the magnetometer. As shown here[https://cdn-shop.adafruit.com/datasheets/AN203_Compass_Heading_Using_Magnetometers.pdf], the main equation here is: arctan(val_y, val_x) where val_y and val_x are the readings given from the magnetometer. In order to find \(\dot \psi\) you can take subsequent values of \(\psi\) and find \(\dot \psi\) numerically as follows: \( \dot \psi = \frac{\psi_t - \psi_{t - 1}}{\varDelta t}\), where \(\varDelta t\) is the sample rate of your magnetometer.

Magnetometers can be quite noisy so trying to accurately obtain yaw with a magnetometer may work well in an open field but may fail unexpectedly when close to metallic structures. Keep this in mind during your design and when flying.

X and y can be found via your gps. Depending on your application you may need to transform the output of your GPS to a local coordinate frame. The datasheet of your sensor should give you plenty of information on this. \(\dot x\) and \(\dot y\) can be found with differential measurements of \(x\) and \(y\) as follows: \(\dot x = \frac{x_t - x_{t-1}}{\varDelta t}\) and \(\dot y = \frac{y_t - y_{t-1}}{\varDelta t}\).

Z can be found with an altimeter or a barometer. If you have a barometer and are getting raw pressure measurements you should be able to look into the datasheet of your sensor to find a conversion between the output and the change in height. There are also many off the shelf altimeters sold on places like sparkfun, adafruit, and others which do this pressure conversion for you and will save you a couple of minutes of looking into the data sheet. \(\dot z\) can be found by taking differential measurements of \(z\) as follows: \(\dot z = \frac{z_t - z_{t-1}}{\varDelta t}\) where \(\varDelta t\) is the sample rate of your sensor.

The biggest difference of this approach when compared to the last one is that this one assumes that you have a model of your system. We’ll be using the model from a previous post [INSERT LINK TO MODEL POST]

Filtering approaches usually adhere to the “Prediction-Correction” (or “Prediction-Update”) framework made famous by the Kalman Filter [INSERT REFERENCE TO ORIGINAL KALMAN FILTER PAPER]. The main idea here is that at every time step you use your model of the system to make a prediction of the state of the system, then use the onboard sensors to measure the state of the system and fuse these two states together in order to obtain a more accurate state of the system. A fun fact is that the complementary filter that I introduced in the last section also adheres to this framework, though the simplified equations make it less clear. If you’re interested in investigating that, see here:[http://www.olliw.eu/2013/imu-data-fusing/#chapter41]

The Kalman Filter [insert link to kalman filter wikipedia and kalman filter original paper] is a technique that can be used to estimate the state of a linear dynamical system. Two popular extensions, the extended kalman filter and the unscented kalman filter, have been created in order to extend the original Kalman Filter to problems with non-linear dynamics. The Extended Kalman Filter (EKF) is a popular approach that came about in the 60s after Dr. Kalman proposed the Kalman Filter for linear systems. The main idea is to linearize the dynamics of your system and use similar equations to those proposed by the Kalman Filter to propagate your state prediction and fuse your sensor predictions at each time step.

EKF intuition and equations. Give example code.

The Unscented Kalman Filter came about due to problems with the Extended Kalman Filter such as how difficult it was to tune and the computational overhead of generating the jacobians at every time step.

UKF intuition and equations. Give example code.

For those who want a more thorough introduction to the topic see here [https://github.com/rlabbe/Kalman-and-Bayesian-Filters-in-Python] I recommend using the UKF when possible since it is typically more robust to noise and easier to tune. Additionally, you don’t have the computational overhead of calculating jacobians.

This last example assumes that your onboard sensors are either stereo cameras and an IMU, or Lidar. A word of warning: the algorithms in this section are much more complicated than those in previous sections and will likely require a small team of engineers to implement robustly.

Lidar sensors give phenomenal range and accuracy but they come with a large price tag and design constraints that require the drone to be significantly larger. On the other hand, small consumer drones with autonomous capabilities use data from stereo cameras and an IMU to implement Visual-Inertial Odometry (VIO) algorithms and perform state estimation.

The most popular algorithm for state estimation using Lidar is Iterative Closest Point (ICP) of which there are two variants: point-to-point and point-to-plane. Point to point ICP takes as input a Lidar scan at time “t” and a Lidar scan at time “t-1.” It then solves the following optimization problem: [INSERT POINT TO POINT COST FUNCTIONS AND SOLUTIONS]. Intuitively, the algorithm tries to find the best transformation that rotates the points in the scan “t-1” to be on top of the points at time “t” by minimizing the distance between each point in scan “t-1” and it’s closest point in the scan at time “t”. see here for a visualization of point to point ICP [https://manipulation.csail.mit.edu/pose.html#section4]. Point to plane ICP finds corresponding points in the same way (by nearest neighbor) but instead tries to minimize the distance between every point in time scan t-1 and the projection onto the normal vector of each corresponding point in time t. [INSERT POINT TO PLANE EQUATIONS AND SOLUTIONS] The intuition behind this is that the point to plane metric assumes that the corresponding points come from the same surface whereas the point to point metric makes the unlikely assumption that they are in fact the same point. See here for ICP reference ICP_explained. ICP algorithms will typically use KD-Trees [https://en.wikipedia.org/wiki/K-d_tree ] to speed up nearest neighbor searches.

Visual-Inertial Odometry is an alternative to Lidar when an IMU and stereo cameras (or a monocular camera) are present. Visual-Inertial Odometry itself requires a whole post to explain so I’ll give a high level overview of one example pipeline here, then in a later post I’ll go into depth on the details and walk you through several different implementations.

As shown in [https://ieeexplore.ieee.org/document/6096039 ] and [http://rpg.ifi.uzh.ch/visual_odometry_tutorial.html ], at each time step the typical pipeline goes as follows: 1) Capture new stereo image pair I_l_k and I_R_k 2) Extract and match features between I_l_k and I_l_(k-1). 3) Triangulate matched features for each stereo pair 4) Compute T_k from 3D features 5) Concatenate transformation 6) Repeat from step 1 Where a subscript k denotes the time step, a subscript “l” denotes the image is from the left camera, subscript r denotes the image is from the right camera, and “I” denotes an image.

The features that are extracted in step 2 may be SIFT, ORB, etc. If you’re unfamiliar with these, see here [https://docs.opencv.org/4.5.2/db/d27/tutorial_py_table_of_contents_feature2d.html ]. The triangulation in step 3 assumes that you have the camera matrix [https://en.wikipedia.org/wiki/Camera_matrix ] of your camera which maps 3D points in the world to 2D coordinates in your image which will allow you to use any triangulation method shown in [http://web.stanford.edu/class/cs231a/course_notes/04-stereo-systems.pdf ]. Step 4 will require you to solve an optimization problem to find the best rotation and translation that makes up your transformation T_k. Step 5 is just a matrix multiplication of your previous pose along with the newly formed transformation to create your current pose.

Again, later I’ll write a post describing VO in detail since there is a lot more to be said, but if you want to dig deeper right now see [https://ieeexplore.ieee.org/document/6096039 ].

The time and expertise required to implement a simple state estimation technique is significantly less than some state of the art methods. However, it’s not always clear when a system will respond well with a more advanced approach. I believe that simpler is better in most engineering projects and this especially holds true for choosing a state estimation technique. Most quadcopters will get great results using the first approach shown above even though it’s quite simple. Moreover, the EKF is infamously hard to tune and many papers show that for quadcopters a complementary filter will give comparable results [https://aip.scitation.org/doi/pdf/10.1063/1.5018520 ] The goal of any project should be to get the best results in the amount of time you have, and sometimes going for the most advanced approach will cause more headaches with minimal gain. If you intend to implement the techniques shown in this post I suggest starting with the simple techniques and only going to the more advanced after tuning and not obtaining sufficient results.

For those that want a different look at these topics, a great write up can be found here: [http://philsal.co.uk/projects/imu-attitude-estimation]. In the next post we’ll start talking about motion planning by introducing a special property of quadcopters called differential flatness which will make our lives a lot easier when we start thinking about planning trajectories.