Chapter 10 - RANSAC & Bundle Adjustment
Slides 44 - 88
Slides 1 - 7
In this chapter we are looking into how we can construct a robust SFM. Since when working with SFM it can often happen that features are wrongly matched. If we determine the motion from the wrongly matched features we get totally wrong solution therefore we have to filter out such outliers first.
Robust Estimation
When matching features there are usually some outliers, meaning point correspondences which are matched wrongly. Such outliers are caused by changing view points and changes is illumination. Also image noise, occlisions or blur can cause outliers.
Figure 1: Matching Example. source
When working with Motion estimation such faulty estimation can cause our estimation to drift far away from the truth since all these small deviation do add up.
Figure 2: Outliers on Motion Estimation. source
RANdom SAmple Consensus (RANSAC)
A solution to having many outliers is the RANSAC method which is the standard method of fitting models in presence of having outliers. Since we have a normal fitting problem this solution can be applied whenever the goal is to estimate the parameters of a model. Therefore this is not exclusive to motion estimation and can allso be applied for example to SFM, Calibration or PnP and more.
The basic concept of RANSAC is that it samples randomly points and the tries to fit a model that describe these points. Then for all remaining points the error is calculated and the number of points inside e current region around our model/line are counted. Then the process is repeated. In the end we hselect the model with most points inside a certain region as the model.
So here is a example in more detail. Let go through the images from left to right. Given a set of points we select two points randomly. Then We fit a model throught these points. In this example we just draw a line. Then we calculate the error of all other points to that model. In our case this is just the shortest distance from our model/line to the point. Then we select all the data that support the hypothsis of our current model. This means we select the points that are inside a certain threshold around the model.
Figure 3: Sampling and Fitting. source
This process is denn repeated for many different random samples.
Figure 4: Repeating. source
After k iterations we select the modle that hast most datapoint inside its support region.
So how can we make sure that the produced results are reliable. As seen in the example when we only iterate 3 times and each time only select 2 points for fitting the model we will no reach a meaning full result. In the example above only at the forth iteration we reach a reasonable model. So what what is the optimal amount of point to use for fitting and what it the optmal number of iterations? So idelly we would always select 2 points for fitting and check all possible combinations of them. This however is computationallya very expensive since then there are $N(N-1)/2$ combinations possible for $N$ points. Even with only 1000 points this leadt to 500’000 combinations. So we would like to reduce the cost by not checking all possibilites and stop the RANSAC before. We can find out the optimal number of iterations in a probabilistic way.
Optimal k
First we have to define our prefered support range for this we have to set the inlierratio, so what ratio of all points have to be inside the support range.
\(\begin{align*} \textrm{number of points} &= \textbf{N}\\ \textrm{inlier ratio} &= \textbf{w} := \textrm{number of inliers}/N \\ \textrm{probaility of selecting inliers for both points} &= \mathbf{w^2} := P(p_1 == \textrm{inlier} \land p_2 == \textrm{inlier})\\ \textrm{probaility of selecting at least one outlier} &= \mathbf{1-w^2} := P(p_1 == \textrm{outlier} \lor p_2 == \textrm{outlier})\\ \textrm{RANSAC iteration} &= \textbf{k}\\ \textrm{probability that RANSAC never selcted both points as inlier} &= \mathbf{(1-w^2)^k} := (P(p_1 == \textrm{outlier} \lor p_2 == \textrm{outlier}))^k\\ \textrm{probaility of success} &= \textbf{p} := P(\textrm{both points inlier and therefore model reasonable})\\ \textrm{probaility of failure} &= \mathbf{1-p} := P(\textrm{never both points intlier, none of the models reasonable}) \end{align*}\) \(\begin{align*} &\implies 1-p = (1-w^2)^k\\ &\implies k = \dfrac{\log(1-p)}{\log(1-w^2)}\\ \end{align*}\) So if we know the ratio of inliers $w$ and the probaility of success $p$ we can determine the number of iterations. Usually the user can choose these two parameters and therefor can decide the probaility of success and the inlier ratio. For example if we set the probaility of success ($p$) to 99% and the inlier ration $w$ to 50% we get $k=16$ iterations. This are significanlty fewer iterations than all combinations. Also notices that this derivation does not directly depend on the total number of points. Only $w$ depends on the total number of points.
Now we have derived a formula for getting the optimal number of iteration when we sample two points in each iteration. How does this change we we sample $s$ points in each iteration? Well the only thing that changes it that we don’t have to take $w$ to the power of two but rather power $s$. \(\begin{align*} &\implies k = \dfrac{\log(1-p)}{\log(1-w^\mathbf{s})} \end{align*}\)
RANSAC in SFM
No we show how to use RANSAC for structure from motion. First we need to define model and error measures as well as the number of pointss to sample. Well for SFM the model is the essential / fundamental matrix. So in general the model is defined by the rotation $R$ and the translation $T$. As seen in chapter 9 we need at least 5 points to estimate the model, however more practical are 8 points for then using the 8-point algorithm. And we need to know which error measure we want to use.
So as already described in the general case we start with sampling the point, this time we sample 8 of them. then we model them using the 8-point algorithm to recieve the rotation and translation. We check how many point correspondences support this hypothesis for the transformation. Then we repeate the process for k iteration. In the end we select the essential matrix which encompasses the rotation and translation which has most point corresponences supporting it.
For the 8-point RANSAC case we need $k = \dfrac{\log(1-p)}{\log(1-w^\mathbf{8})}$ iterations. For a probaility $p=99%$, inlier-ration of $w=50%$ this results in 1177 iterations.
For now we only considered the inlier ratio, but of course we can also express all above in terms of outlier ratio. For this we only have to set the outlier ration $\epsilon = 1-w$.
When we plot the number of RANSAC-iterations again the outlier-ratio we see that with increasing outlier ratio the number of itertions needed rises rapidly. We also see that reducing the number of point to sample reduces the number of iterations. So therefore it would be beneficial to not use 8 points but 5.
Figure 5: Iteratons Needed. source
Indeed 5 points are enought for a good estimation if we incluse some motion constraints.
Planar Motion
For many SFM application we can define clearmotion constrainst. For example for cars we have only planar motion as the car itself can not move into the vertical direction. Or Trains we can constraint the motion by the turning radius.
We will now forcus on the constraint of planar motion. For such motions we can describe the transformation in a different format. The translation of the origin of the reference frame of the car is described in polar coordinates which consists of the angular coordinate $\varphi$ and the radial coordinate $\rho$. Additionaly the rotation of the vehicle-refference frame is described with the angle $\theta$.
Figure 6: Planar Motion. source
Fro these definitions of rotation and translation we can construct the new essential matrix.
Figure 7: Epipolar Constraint. source
It is important to notice that $E$ only has two degrees of freedom. These are $\varphi$ and $\theta$. $\rho$ however is not a DoF since it is just a scale factor. As a result 2 correspondences are enought to find a solution. $4n \geq 2+3n$
So is it possible to use even less (than 2) corresponences? The surbrising answer is yes. This is when we use the exploit that wheeled vehicles underly the non-holonomic constraint meaning that they follow a circular motion.
Planar and Circular Motion
For Circular motion we only have one degree of freedom. This is because the circular motion is around the Instantaneous Center of Rotation (ICR). The motion angle around the ICR is the same as the rotation angle of the vihicle around its onw refernece frame origin. Therefore the only degree of freedom is $\theta$. Therefore only 1 point correspondence is needed. Since less points are obviously not possible this is the smalles parametrization possible and therefore produces the most efficient algorithm with only one iteration.
Figure 8: Circular Motion. source
When applying the above notedd essential matrix for the circular motion to the correspondences $p_1$ and $p_2$ we get the following equation: \(\begin{align*} {p_2}^T E p_1 = 0 &\implies sin(\dfrac{\theta}{2}) (u_2 + u_1) + cos(\dfrac{\theta}{2}) (v_2 - v_1) = 0\\ &\implies \theta = -2 tan^{-1}(\dfrac{v_2 - v_1}{u_2 + u_1})\\ \end{align*}\) the 1-Point RANSAC is ONLY used to find the inliers, the motion is then estimated from them in the usual 6 Degrees of freedom using triangulation.
Bundle Adjustment
Bundle adjustments is a method for minimizing the reprojection error between ovserved and reprojected image. So the goal is to refine the structure of P and the motion in a non-linar fashon. Usually the BA is used after the ersimation of the transformation (R and T). BA refines motion and points at the same time the by minimizing the Sum of Squared Reprojection Errors. \(\begin{align*} (P^i,C_2) = argmin_{P^i,C_1,C_2} \sum_{i=1}^{N} \lVert {p_1}^i - \pi_1(P^i,C_1)\rVert^2 + \lVert {p_2}^i - \pi_2(P^i,C_2)\rVert^2 \end{align*}\) In the above formula we used $C_i$ as the pose of the camera in the world frame. For the actual minimization one can use the Levenberg-Marquadt algorithm. As for most optimization methods the initialisation is important to not get stuck and therefore the method should be initialized colse to the minimum.
Bundle Adjustments can also be applied to more than just two views. In that case the equations looks as follows for K views. \(\begin{align*} (P^i,C_k) = argmin_{P^i,C_i} \sum_{k=1}^{K} \sum_{i=1}^{N} \lVert {p_k}^i - \pi_k(P^i,C_k)\rVert^2 \end{align*}\) Figure 9: Multiple View Bundle Adjustment. source
Sometimes large reprojection errors have a bad influence for the optimization algorithm therefore the squared error is not advisable since the error get large quickly then. Some more advisable norm functions $\rho()$ are: \(\begin{align*} (P^i,C_k) = argmin_{P^i,C_i} \sum_{k=1}^{K} \sum_{i=1}^{N} \rho({p_k}^i - \pi_k(P^i,C_k)) \end{align*}\) Figure 10: Hubner and Tukey Norms. source
Summary
What is the trend of RANSAC vs. iterations, vs. the fraction of outliers, vs. the number of points to estimate the model? How do we apply RANSAC to the 8-point algorithm, DLT, P3P?
- We need RANSAC to remove outliers or oîn other words we need it to fit the model based on the inliers.
- the theoretical number of combination for a two point sampling with N points is 𝑁(𝑁−1)/2
- RANSAC can be stopped after k = log(1-p)\log(1-w^2) iterations for probability p and inlier rate w
- the number of needed iteration increases with number of points in sample and with the outlier-rate.
- We can apply the RANSAC to the 8-point algorithm by sampling random 8 points and then estimate the model / essential matrix. For other algorithm we use the triangulation or other methods to the sampled points. We count number of inlliers that fulfill the model and repeate te process. in the end we select the model with the most inliers.
- we can reduce the number of RANSAC iteration by adding some motion constraints. Planarmotion can reduce to two point-RANSAC and planar&circular motion reduces to 1-point-RANSAC
- Bundle Adjustment tries to correct and minimize the error by reprojecting the position of the estimated 3D point onto the cameras plane and then measuring the distance of observed image and reprojected image. This distance is the error that is tried to beminimized. In a application where the camera positions are known but the 3D point position ist not. Then the 3d point is perturbed to minimize the error based in the reprojection with static camera pose. In application where we want to find the pose of the cameras we betrub them ang get the reprojection from a static 3D point.