
Visual Odometry and Visual Algorithms [Part 15] - Visual inertial fusion

Chapter 15 - Visual inertial fusion

Lecture 13

Slides 1 - 50

An IMU, also called Inertial Measurement Unit, is a combination of six sensors: Three gyroscopes measure the angular velocity (angle change) around each axis, while three accelerometers measure linear acceleration, also along each axis.

Inertial Measurement Unit Figure 1: Inertial Measurement Unit. source

Micro Electro Mechanical System (MEMS) IMUs are cheap, power efficient and lightweight, and are used in virtually all smartphones and drones today. Their optical counterparts, however, range in price from 20’000 to 100’000 dollars, and mechanical IMUs can even cost up to 1’000’000 dollars. For our applications, only MEMS IMUs are relevant, so we only cover their mechanics in this chapter.

MEMS Accelerometer

Accelerometers measure acceleration along the direction of their construction. The acceleration is measured by a capacitive divider with a certain mass which is attached at both ends to two springs. When the mass is accelerated in the direction of the springs, one spring will extend and the other will compress, displacing the capacitive divider and creating an electric signal at one of its ends. The mass is damped by gas which is sealed inside the IMU. Note that this device can only measure the acceleration along the springs, not perpendicular to them. We therefore need three perpendicular accelerometers to measure 3D movement.

Accelerometer Figure 2: Accelerometer. source

The measured data we get out of an accelerometer is the real acceleration minus gravity, multiplied by the rotation from the world frame to the body frame, plus IMU bias and noise.

Measurement of Acceleration Figure 3: Measurement of Acceleration. source

The formula states that the measured acceleration of frame B(ody) with respect to frame W(orld), expressed in frame B, equals the true acceleration from B to W, expressed in frame W, minus gravity, expressed in frame W. To put this in frame B, we have to apply the rotation from frame W to frame B. On top of all this, we of course have to add a bias and noise.
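Written out, the measurement model would read roughly as follows; the exact symbols live in the figure above, so the notation here is our own:

\(\begin{align*} \tilde{a}^B_{WB}(t) = R_{BW}(t)\left(a^W_{WB}(t) - g^W\right) + b^{acc}(t) + n^{acc}(t) \end{align*}\)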

MEMS Gyroscopes

Gyroscopes measure the angular velocity of the sensor around the axis of their construction by means of a vibrating structure. When a rotation is applied, the Coriolis force will move the vibrating mass perpendicular to its vibration. The distance by which the mass is moved corresponds to the speed of angular rotation.

In the image below, the gyroscope vibrates along the x axis and a rotation is applied around the z axis. The Coriolis force is perpendicular to both, so it acts along the y axis; the sign of the force indicates positive or negative rotation, and its magnitude indicates the rotational speed. Note that we again need three perpendicular gyroscopes to measure rotation around all three axes.

Gyroscope Figure 4: Gyroscope. source

The measured angular velocity from the IMU at time $t$ is equal to the real angular velocity + IMU bias + noise.

Measurement of Angular Velocity Figure 5: Measurement of Angular Velocity. source

The formula states that the measured angular velocity of frame B with respect to frame W, measured in frame B, equals the true angular velocity from frame W to frame B, measured in frame B. On top of this, we again have to add a bias and noise.
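In our own (assumed) notation, mirroring the accelerometer model above:

\(\begin{align*} \tilde{\omega}^B_{WB}(t) = \omega^B_{WB}(t) + b^{gyro}(t) + n^{gyro}(t) \end{align*}\)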

Camera-IMU System

IMUs can help VO in situations with low texture, high dynamic range and high-speed motion, since they rely on intrinsic data and not on extrinsic vision information. However, using an IMU without vision will often not work, since pure IMU integration drifts drastically over time. IMUs measure acceleration, so we need to integrate once to get the velocity, and integrate again to get the position, meaning that the error grows proportionally to $t^2$. After 1 minute of measuring, cheap IMUs can accumulate a positional error of over 2.2 kilometers.
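To get a feeling for this drift, here is a minimal numpy sketch; the bias and noise values are purely illustrative, not taken from any datasheet:

```python
import numpy as np

# Double-integrating noisy, biased accelerometer readings while the
# sensor actually stands still. All noise figures are illustrative.
rng = np.random.default_rng(42)
dt = 0.01                          # 100 Hz IMU
t = np.arange(0, 60, dt)           # one minute
bias = 0.05                        # m/s^2, uncorrected constant bias
noise = 0.02                       # m/s^2, white noise

a_measured = bias + noise * rng.standard_normal(t.size)  # true accel is 0
v = np.cumsum(a_measured) * dt     # first integration: velocity
p = np.cumsum(v) * dt              # second integration: position

# A constant bias b alone yields an error of 0.5 * b * t^2 -- here ~90 m
# after just one minute, although the sensor never moved.
print(f"position error after 60 s: {abs(p[-1]):.1f} m")
```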

However, the combination of IMU and camera VO is a perfect match, since their characteristics are complementary. Cameras are precise during slow motion and provide rich information about the outer world, but they have a limited output rate, are scale-ambiguous and lack robustness in many scenarios. IMUs have their own disadvantages, like drift, large relative uncertainty at low acceleration/angular velocity, and gravity/acceleration ambiguity. But that’s no problem: VO is precise in low acceleration/rotation scenarios, and gravity has no influence on it. Conversely, IMUs are robust, have a high output rate, are accurate in high acceleration scenarios and can predict the future position, which are all characteristics that compensate for VO’s weaknesses.

In order to use the IMU effectively, it has to be in the same coordinate system as our camera. We therefore need to transform the IMU world measurements W into the body frame B, such that the calculated IMU position can be used as an estimate for the camera position as well. Then we transform all cameras into the body frame, too. This way we can use multiple cameras with a single IMU to get the cameras’ new position & rotation estimates.

Noise and Bias

We have seen that both IMU sensor types introduce noise and bias. The noise is just additive Gaussian noise for both sensors. Looking at the bias, we observe that the derivative of the bias is again white Gaussian noise, which is good since it makes the bias estimable. However, the exact bias can change every time the IMU is started, due to temperature and pressure.
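Since the bias derivative is white Gaussian noise, the bias itself performs a random walk over time. A tiny illustrative simulation (the sigma value is made up, not a real sensor specification):

```python
import numpy as np

# The bias derivative is white Gaussian noise, so the bias itself is a
# random walk. Sigma is illustrative, not a real sensor specification.
rng = np.random.default_rng(0)
dt, n = 0.01, 6000                 # 100 Hz over one minute
sigma = 0.001                      # bias diffusion per sqrt(s), made up
bias = np.cumsum(sigma * np.sqrt(dt) * rng.standard_normal(n))
print(f"bias drifted by {bias[-1]:+.4f} after one minute")
```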

Position estimate

To estimate the position $p^W_{t_2}$ at timestamp $t_2$ in the world frame W using an IMU only, we need the position from the previous frame at time $t_1$, $p^W_{t_1}$. To this, we add the velocity $v^W_{t_1}$ at frame 1 multiplied by the time passed, $(t_2 - t_1)$. The velocity in the previous frame has to be known, since our IMU only measures change in velocity. To this model, we can now add the sensory data from the IMU: we take the double integral (acceleration -> velocity -> position) of the rotation $R^W_t$ (given by the gyroscope) multiplied with the measured acceleration, from which we subtract our bias estimate, and add gravity. Gravity is added since it was already subtracted in the measurement of the IMU itself, so to zero it out we have to add it back now.

IMU Model integration Figure 6: IMU Model integration. source
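Spelled out in our (assumed) notation, the model shown in the figure would read roughly:

\(\begin{align*} p^W_{t_2} = p^W_{t_1} + v^W_{t_1}(t_2 - t_1) + \iint_{t_1}^{t_2} \left( R^W_t \left( \tilde{a}_t - b^{acc}_t \right) + g^W \right) \, dt^2 \end{align*}\)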

Visual Inertial Fusion (VIO-Fusion)

Now that we have defined how to read data from an IMU and the camera and bring it into the same body frame, we need to talk about how to make our VO system more robust using the IMU data. There are two fundamental approaches. The first is having the two data systems loosely coupled, meaning we treat the VO and IMU as separate, independent black boxes. Each estimates pose and velocity independently. In the end, both estimates are combined to refine the position, orientation and velocity. We could take a weighted average, for example, as sketched below the figure.

Loosely coupled approach Figure 7: Loosely coupled approach. source
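As a toy illustration of the loosely coupled idea, two independent position estimates could be fused with an inverse-variance weighted average; all numbers here are made up:

```python
import numpy as np

# Toy loosely-coupled fusion: combine two independent position estimates
# by inverse-variance weighting. All values are made up.
p_vo,  var_vo  = np.array([1.02, 0.48, 0.00]), 0.01   # vision estimate
p_imu, var_imu = np.array([1.10, 0.45, 0.02]), 0.09   # IMU estimate

w_vo, w_imu = 1.0 / var_vo, 1.0 / var_imu
p_fused = (w_vo * p_vo + w_imu * p_imu) / (w_vo + w_imu)
print(p_fused)   # ends up closer to the lower-variance vision estimate
```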

But since the two systems are indeed coupled and we know that they should, in theory, output the same data, we are better off using a tightly coupled system, where one system makes use of the other’s measurements. This means we fuse the raw data: before we do VO on the tracked features, we use the raw measurements of the IMU. The fusion part is therefore much bigger and more complex to implement, but it deals with the raw data from both sensors to get a fitting position, orientation & velocity estimate. In this chapter, we are only going to focus on the tightly coupled system.

Tightly coupled approach Figure 8: Tightly coupled approach. source

We are now going to examine different fusion methods.

Closed-form solution

From the vision algorithm we get a pose estimate that is ambiguous with respect to a scaling factor but otherwise reliable. The absolute pose $x$ can therefore be derived from the measured pose $\tilde{x}$ by just applying a scaling factor $s$: $x=s\tilde{x}$.

From the IMU we get the pose estimate as before: \(\begin{align*} x = s\tilde{x} = x_0 + v_0(t_1-t_0)+\iint_{t_0}^{t_1} a(t) \, dt \end{align*}\) When the camera and IMU are moving in one dimension, we can take the measurement and estimate at three different points in time and therefore get two equations. If we move $v_0(t_{i+1}-t_i)$ to the left-hand side, we can also write this as a matrix multiplication. Note that after moving the term to the left, the order of $t_{i+1}$ and $t_i$ is swapped instead of placing a minus sign in front of the term.

Closed-Form Solution 1D Figure 9: Closed-Form Solution 1D. source

However, in general we do not want to assume a one-dimensional movement. Therefore we have to assume 6 degrees of freedom (6DOF), as usual three for translation and three for rotation. Let’s also assume that we have $N$ features to work with. This general case is much more complex to derive than the 1D case, but at its core it is still a linear system of equations. We can write this as the matrix equation \(\begin{align*} AX=S \end{align*}\) where $A$ is the general equivalent of \(\begin{bmatrix} \tilde{x_1} & (t_0-t_1)\\ \tilde{x_2} & (t_0-t_2) \end{bmatrix}\) and contains the 2D feature coordinates as well as the acceleration and angular velocity measurements. It looks as follows:

Matrix A Figure 10: Matrix A. source

$X$ is equivalent to \(\begin{bmatrix} s\\ v_0 \end{bmatrix}\), containing the unknowns, which are the absolute scale $s$, the initial velocity $v_0$, the 3D distances with respect to the first camera, the direction of gravity and the biases.

And $S$ is equivalent to \(\begin{bmatrix} \iint_{t_0}^{t_1} a(t) \, dt\\ \iint_{t_0}^{t_2} a(t) \, dt \end{bmatrix}\), containing the acceleration and angular velocity measurements.

With this closed-form solution we can initialize filters and smoothers, since they need a starting point and we now have a good estimate.
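To make the 1D case concrete, here is a small numpy sketch that recovers $s$ and $v_0$ from a simulated constant-acceleration trajectory; all numbers are illustrative:

```python
import numpy as np

# Sketch of the 1D closed-form solution: recover the scale s and the
# initial velocity v0 from unscaled VO positions and the doubly
# integrated accelerations. The trajectory is simulated.
s_true, v0_true, a_true = 2.0, 0.5, 0.3   # ground truth, unknown to the solver
t0, t1, t2 = 0.0, 1.0, 2.0

def pos(t):
    # true position under constant acceleration, starting at x0 = 0
    return v0_true * t + 0.5 * a_true * t**2

x1_tilde, x2_tilde = pos(t1) / s_true, pos(t2) / s_true  # unscaled VO output
S = np.array([0.5 * a_true * t1**2,   # double integral of a over [t0, t1]
              0.5 * a_true * t2**2])  # double integral of a over [t0, t2]

A = np.array([[x1_tilde, t0 - t1],
              [x2_tilde, t0 - t2]])
s_est, v0_est = np.linalg.solve(A, S)
print(s_est, v0_est)   # recovers ~2.0 and ~0.5
```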

Filtering

When using a filter, the algorithm usually uses a series of measurements which were observed over time. All these data points contain noise. When basing the estimate on these data points, it is therefore better to include many of them instead of basing it on a single measurement.

The state-of-the-art filtering mechanism is the Kalman filter, also called linear quadratic estimation, or the extended Kalman filter (EKF). When working with the Kalman filter, one often hears the word gain, which describes the weights that are given to the different measurements and the current state when estimating the future measurement or state. In our VIO process, the IMU states are used for the calculation of the gain.
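As a sketch of how prediction, update and gain interact, here is a minimal 1D Kalman filter where the IMU acceleration drives the prediction and a vision position fix drives the update. The state, noise values and measurements are all illustrative; a real VIO filter tracks a much larger state:

```python
import numpy as np

# Minimal Kalman filter sketch: IMU acceleration drives the prediction,
# a vision position fix drives the update. Values are illustrative.
dt = 0.01
F = np.array([[1, dt], [0, 1]])          # state: [position, velocity]
B = np.array([0.5 * dt**2, dt])          # control input: acceleration
H = np.array([[1.0, 0.0]])               # vision measures position only
Q = 1e-4 * np.eye(2)                     # process (IMU) noise, made up
R = np.array([[1e-2]])                   # measurement (vision) noise

x, P = np.zeros(2), np.eye(2)

def predict(x, P, a_imu):
    x = F @ x + B * a_imu                # propagate state with the IMU
    P = F @ P @ F.T + Q                  # propagate uncertainty
    return x, P

def update(x, P, z_vision):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)       # the Kalman "gain"
    x = x + (K @ (z_vision - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = predict(x, P, a_imu=0.3)
x, P = update(x, P, z_vision=np.array([0.001]))
print(x)
```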

However, filtering has various drawbacks. One is that the linearization is based only on the current estimate; if this estimate is erroneous, the error will be introduced into all further estimates. Also, the complexity of the EKF grows quadratically with the number of estimated features, so for real-time applications only about 20 landmarks are tracked.

An alternative to the EKF is the MSCKF, which uses a sliding window approach, where not only the current state is kept for the upcoming estimations but rather a whole window of states, which are then updated using the EKF. It also incorporates the visual observations without including the point positions in the states.

Smoothing methods

Another method of improving the estimates using the IMU data is smoothing. With smoothing, we are trying to solve VIO as a graph optimization problem.

Smoothing Method Figure 11: Smoothing Method. source

Let’s say $X = \{x_1,…,x_N\}$ are the states at frame times $1,…,N$, defining position, velocity and orientation. Further, define $L = \{l_1,…,l_M\}$ to be the 3D landmark positions. $f$ is the function that integrates the IMU measurements, i.e. our motion model, so that $x_k= f(x_{k-1},u)$, where $u=(a,\omega)$ is the IMU measurement. We also define $z_{i_k}=\pi(x_k,l_i)$ as the projection of the landmark $l_i$ onto $I_k$, the $k$-th camera frame. Then we can define the estimation as:

Smoothing Formula Figure 12: Smoothing Formula. source
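In formula form, the optimization would look roughly like the following; the covariance-weighted norms $\Sigma_u$, $\Sigma_z$ are our own (assumed) notation:

\(\begin{align*} X^*, L^* = \arg\min_{X,L} \sum_{k=1}^{N} \lVert x_k - f(x_{k-1},u_k) \rVert^2_{\Sigma_u} + \sum_{i,k} \lVert z_{i_k} - \pi(x_k,l_i) \rVert^2_{\Sigma_z} \end{align*}\)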

Fixed-lag vs. Full Smoothing

Like in filtering, we can apply this method by only considering the last few frames for the estimation of the next one; then we are using the fixed-lag / sliding window method. But we can also keep all the frames since the start of the trajectory in memory and use them for the estimation. This method is then called full smoothing. For full smoothing it is beneficial to not keep all frames but rather just the keyframes in memory to increase efficiency, since that high a resolution is not needed. We can also pre-integrate the IMU data between the keyframes; the next section describes what this pre-integration is. The optimization step of the full method can be done using a factor graph (GTSAM), which is very fast because it only optimizes the poses that are affected by the new observation rather than all of them.

IMU (Pre-)Integration

The problem with the standard integration of the IMU data is that the integration from $k$ to $k+1$ is related to the state estimate at time $k$. Therefore, if the estimate of state $k$ retrospectively changes, it is costly to change $k+1$ accordingly. With pre-integration we solve this problem by not using a global frame, but rather expressing the state at $k+1$ directly in a frame based on the state at $k$.

IMU Pre-Integration Figure 13: IMU Pre-Integration. source
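In the style of the standard pre-integration formulation (Forster et al.), the relative motion increments between frames $k$ and $k+1$ would read roughly as follows; they depend only on the raw measurements and bias estimates, not on the global state at $k$:

\(\begin{align*} \Delta R_{k,k+1} &= \prod_{j} \exp\left((\tilde{\omega}_j - b^{gyro}_j)\,\Delta t\right) \\ \Delta v_{k,k+1} &= \sum_{j} \Delta R_{k,j}\left(\tilde{a}_j - b^{acc}_j\right)\Delta t \\ \Delta p_{k,k+1} &= \sum_{j} \left( \Delta v_{k,j}\,\Delta t + \tfrac{1}{2}\,\Delta R_{k,j}\left(\tilde{a}_j - b^{acc}_j\right)\Delta t^2 \right) \end{align*}\)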

Camera-IMU calibration

When we use a device which has both an IMU and a camera integrated, the camera and the IMU are most often located at different places and therefore do not have the same origin in their reference frames. As a result, we have a transformation between the body frame, which is the one based on the IMU, and the camera frame. Using the calibration method, we want to estimate this rigid-body transformation $T_{BC}$. We also want to find the delay $t_d$ between the camera frames and the IMU measurements, since the IMU has a much higher frequency than the frames of the camera. Note, however, that the camera itself is already calibrated. As input data for the calibration we have a set of features, often from a calibration pattern like a checkerboard. We also have the IMU measurements, namely the acceleration ($a_k$) and the gyroscope’s angular velocity ($\omega_k$).

IMU-Camera Calibration Figure 14: IMU-Camera Calibration. source

The goal of the calibration is to minimize the cost function shown below.

Cost Function Figure 15: Cost Function. source

The unknowns we want to find are: gravity $g_w$, the transformation $T_{WB}(t)$ at the times $t_k$, and the biases $b_{acc}(t)$ and $b_{gyro}(t)$.

This problem can be solved using continuous-time modeling with splines, and as a numerical solver we can use the Levenberg-Marquardt algorithm.
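The full continuous-time calibration is beyond the scope of a snippet, but the time delay $t_d$ alone can be initialized by cross-correlating the rotation rate observed by the camera with the gyroscope signal. A rough numpy sketch with simulated signals (a real pipeline would derive `cam_rate` from relative camera rotations):

```python
import numpy as np

# Rough initialization of the camera-IMU time delay t_d by
# cross-correlating rotation-rate magnitudes. Signals are simulated.
rng = np.random.default_rng(1)
dt = 0.005                                   # common resampled rate, 200 Hz
t = np.arange(0, 10, dt)
true_delay = 0.037                           # seconds (made up)

gyro_rate = np.sin(2 * np.pi * 0.8 * t) + 0.05 * rng.standard_normal(t.size)
cam_rate = np.sin(2 * np.pi * 0.8 * (t - true_delay))  # camera lags the IMU

corr = np.correlate(cam_rate - cam_rate.mean(),
                    gyro_rate - gyro_rate.mean(), mode="full")
lag = np.argmax(corr) - (t.size - 1)
print(f"estimated t_d ~ {lag * dt:.3f} s")   # close to 0.037
```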

This post is licensed under CC BY 4.0 by the author.