
Visual Odometry and Visual Algorithms [Part 2] - Optics

Chapter 2 - Optics

Lecture 2

Slides 1 - 48

In this article, we are going to find out how an image is formed on an image plane.

The overall principle is quite basic: an object reflects light that falls onto an image sensor, which captures the light's intensity and thereby forms an image. To ensure that every point of the scene falls onto the sensor at exactly one spot, we can introduce an optical barrier with a hole in it, so that for each point in the scene only light rays with a particular angle reach the image plane. This creates an upside-down copy of the scene on the optical sensor. The smaller the barrier's hole, the more angle-selective the camera becomes and the sharper the image appears. The hole is also known as the aperture, or pinhole.

Pinhole Model, p. 7 Figure 1: Concept of a Pinhole camera. source

An ideal pinhole camera would, mathematically, have an infinitely small pinhole to produce an infinitely sharp image. In practice, this brings two problems: the smaller the pinhole, the less light we can capture on our sensor. Furthermore, a pinhole smaller than about 0.3 mm causes light wave interference, making the image blurry again due to diffraction effects.

To combat these issues, lenses are used. A lens bundles the light rays coming from a single point in the scene into (ideally) a single spot on the optical sensor. To serve as a camera lens, a lens must fulfill two properties:

    1. Light rays that travel through the optical center of the lens are not deviated.
    2. Light rays that enter the lens parallel to the optical axis are focused in a so-called “focal point” at distance f behind the lens.

Lens, p. 16 Figure 2: Lens. source

By combining these two properties, we can derive the thin lens equation. For a single object in our scene at distance z and height A in front of the lens, we can construct two light rays: one passing through the optical center, the other entering the lens parallel to the optical axis.

To make the object appear sharp, we have to ensure that both light rays fall onto the same point on the optical sensor. Since the focal point is given by the lens properties, we cannot vary the focal length f (the distance between the focal point and the lens). What we can change is the position of the optical sensor: we can bring it closer to or move it farther from the lens. We call the distance from the lens to the optical sensor e.

Looking back at the principle of similar triangles, we can see that the ratio between the object's height A and the height B at which the rays cross on the sensor must equal the ratio between the object distance z and the lens-to-sensor distance e. We therefore conclude that B/A = e/z. As a second equation, from the ray passing through the focal point, B/A must also equal the ratio (e-f)/f. To simplify, we can also write B/A = e/f - 1.

Thin Lens equation, p. 17 Figure 3: Thin lens equation model visualization. source

By combining these two equations, we get e/f - 1 = e/z, and finally 1/f = 1/z + 1/e. Therefore, for a given distance z, the object only appears sharp if the optical sensor is placed at distance e from the lens.
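Written out, the two similar-triangle relations combine as follows:

\[\begin{align*} \dfrac{B}{A}=\dfrac{e}{z}, \quad \dfrac{B}{A}=\dfrac{e-f}{f} \implies \dfrac{e}{z}=\dfrac{e}{f}-1 \implies \dfrac{1}{f}=\dfrac{1}{z}+\dfrac{1}{e} \end{align*}\]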

It becomes clear that if we move an object farther away (increasing z), the distance e must be changed to make the object appear sharp again. When the thin lens equation is not satisfied, the light rays do not intersect at the optical sensor but form a blur circle, which is perceived as “unsharp”. Only a blur circle with a radius of less than one pixel gives a sharp image.

The distance between the focal plane (where the light rays from the object actually meet) and the image plane (the optical sensor) is called δ, and the diameter of the aperture is referred to as L. This gives us a blur circle radius of:

\[\begin{align*} \dfrac{L/2}{R}=\dfrac{e}{\delta} \implies R = \dfrac{L \cdot \delta}{2 \cdot e} \end{align*}\]

Blur circle, p. 19 Figure 4: Blur circle. source
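To make the relation concrete, here is a minimal Python sketch (the function name and the example numbers are illustrative assumptions, not values from the lecture) that first uses the thin lens equation to find the in-focus sensor distance and then evaluates the blur circle radius:

```python
def blur_circle_radius(f, L, z, e):
    """Blur circle radius R on the sensor for a thin lens.

    f -- focal length [m]
    L -- aperture (lens) diameter [m]
    z -- distance from the lens to the object [m]
    e -- distance from the lens to the image sensor [m]
    """
    # Thin lens equation 1/f = 1/z + 1/e_sharp gives the sensor distance
    # at which this object would be perfectly in focus.
    e_sharp = 1.0 / (1.0 / f - 1.0 / z)
    # delta is the gap between the focal plane and the actual image plane.
    delta = abs(e - e_sharp)
    # Similar triangles: (L/2) / R = e / delta  ->  R = L * delta / (2 * e)
    return L * delta / (2.0 * e)

# Example (assumed values): smartphone-like lens focused at infinity (e = f),
# object at 1 m distance -> the blur radius stays in the micrometer range.
R = blur_circle_radius(f=1.7e-3, L=2.0e-3, z=1.0, e=1.7e-3)
print(f"blur circle radius: {R * 1e6:.2f} micrometers")
```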

Why is this relevant? For large object distances z we can approximate our lens-based camera with a pinhole camera, since the distance to any world object is much larger than the focal length or the lens size. Typically, smartphone cameras have focal lengths of about 1.7 mm and lens sizes of < 1 cm. We can therefore focus on objects at infinity. This implies that the focal plane for objects that are infinitely far away moves to the focal point.

We can therefore safely approximate that the optical sensor distance e equals the focal length f: e ≈ f. This makes the relation between the object's height A and the point B on the image plane even simpler: we no longer have to consider two light rays, but just one passing straight through the pinhole. This also leaves us with a simpler equation for the point B at which an object at distance z and height A falls onto the image sensor: B/A = f/z, or B = f/z ∙ A. Therefore, objects twice as far away appear half as large on the optical sensor.

Pinhole approximation, p. 22 Figure 5: Pinhole approximation. source
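As a small illustration of this inverse scaling with distance, a few lines of Python (the helper name and the numbers are made up for this example):

```python
def project_pinhole(A, z, f):
    """Pinhole projection: an object of height A at distance z maps to
    an image height B = f / z * A on the sensor."""
    return f / z * A

f = 1.7e-3  # focal length [m], e.g. a smartphone lens
A = 1.8     # object height [m]
for z in (5.0, 10.0, 20.0):
    B = project_pinhole(A, z, f)
    print(f"z = {z:4.0f} m  ->  B = {B * 1e3:.3f} mm")
# Doubling the distance halves the projected size on the sensor.
```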

The distance range in which an object appears sharp (R < 1 pixel) is called the Depth of Field (DoF). The smaller the aperture, the larger the Depth of Field, but the less light is left for the image sensor. Furthermore, for a fixed focal length, the sensor size defines the largest angle the camera can perceive, the Field of View (FoV). We can also increase or decrease the FoV by changing the focal length: larger focal lengths intuitively result in a narrower viewing angle. The relation between the focal length f, the sensor width W and the FoV angle θ can be expressed via a simple tangent relation: tan(θ/2) = W/(2∙f)

FoV-Focal length ratio, p. 22 Figure 6: Ratio between Field of View and Focal length. source
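A short Python sketch of the same relation, assuming W is the sensor width and using example values that are not from the lecture:

```python
import math

def field_of_view_deg(W, f):
    """Field of view in degrees from tan(theta / 2) = W / (2 * f),
    with W the sensor width and f the focal length."""
    return math.degrees(2.0 * math.atan(W / (2.0 * f)))

# Example (assumed) sensor width of 5.6 mm:
print(field_of_view_deg(W=5.6e-3, f=1.7e-3))  # ~118 deg, wide angle
print(field_of_view_deg(W=5.6e-3, f=4.0e-3))  # ~70 deg, narrower FoV
```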

An interesting consequence of perspective projection is that parallel lines in the 3D world are no longer perceived as parallel on the 2D image plane; neither are angles preserved. Instead, parallel lines in an image appear to cross at some point, the so-called vanishing point. By collecting these vanishing points, we can fit a vanishing line through them: a line on which all the vanishing points land. In the example below, we observe two vanishing lines: one for the horizontal and one for the vertical parallel lines.

Vanishing Lines and Points Figure 7: Vanishing Lines and Points. source

We can mathematically prove that parallel lines will intersect at a vanishing point (xVP, yVP) in the camera frame. Note that this vanishing point need not lie within the visible part of the image frame. Let's consider the perspective equations in camera metric coordinates: we know that x = f * (X/Z) and y = f * (Y/Z), which follows from the similar-triangle principles seen previously.

We can now define two parallel lines in parametric form. A line is described by a base point, a direction and a scalar parameter that can be varied to reach any point on the line. To make two lines parallel, we choose different base points but the same direction.

Two parallel lines Figure 8: Two parallel lines. source

To show that the lines cross at a specific point (xVP, yVP) at infinity, we look at the x and y dimensions separately and take the limit as s goes to infinity. The contribution of the base point becomes negligibly small compared to the terms growing with s, so it can be neglected. We are left with the term f * sl/(sn), from which we can cancel out s. We finally get a point xVP that depends only on the direction of the line. The same holds true for yVP. We have found the image coordinates of the vanishing point.
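Writing the base point of a line as (X0, Y0, Z0) and its shared direction as (l, m, n) (the symbol names follow the figures; l and n are the components used in the text, m is the corresponding y-component), the limit reads:

\[\begin{align*} x_{VP} = \lim_{s \to \infty} f \cdot \dfrac{X_0 + s \cdot l}{Z_0 + s \cdot n} = f \cdot \dfrac{l}{n}, \qquad y_{VP} = \lim_{s \to \infty} f \cdot \dfrac{Y_0 + s \cdot m}{Z_0 + s \cdot n} = f \cdot \dfrac{m}{n} \end{align*}\]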

Taking the limit Figure 9: Taking the limit to find the vanishing point. source
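A quick numeric sanity check of this result in Python, with arbitrary example line parameters:

```python
def project(X, Y, Z, f=1.0):
    """Perspective projection x = f * X / Z, y = f * Y / Z."""
    return f * X / Z, f * Y / Z

# Two parallel 3D lines: different base points, identical direction (l, m, n).
base_points = [(0.0, 0.0, 1.0), (2.0, -1.0, 1.0)]
l, m, n = 1.0, 0.5, 2.0

for X0, Y0, Z0 in base_points:
    for s in (10.0, 100.0, 10000.0):
        x, y = project(X0 + s * l, Y0 + s * m, Z0 + s * n)
        print(f"s = {s:8.0f}  ->  ({x:.4f}, {y:.4f})")
# Both lines converge to the same image point (f*l/n, f*m/n) = (0.5, 0.25).
```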
