Chapter 10 - Variational Multiview Reconstruction
Shape optimization
Shape optimization: find 3d shape that best corresponds to the given images.
Explicit shape representations
For example, splines: Parametric models (e.g. of curve or surfaces). Here, a linear combination of basis functions $\sum_i C_i B_i(s)$
- interpolating vs. approximating spline: must the control points be met?
intuition: basis function specify how much impact control points have in certain regions
- for surfaces: open vs. closed surfaces (with/without boundaries) corresponds to cyclical/non-cyclical basis functions. Formula then $\sum_{ij} C_{ij} B_i(s) B_j(t)$
Implicit shape representations
In optimization: people have moved towards implicit representations.
Advantages:
- union/difference easy to compute
- not depending on reparametrization (e.g. in explicit polygons, you could renumber the vertices; here we get rid of these degrees of freedom)
- arbitrary topology (i.e. arbitrary number of holes)
- many shape optimization problems wrt. implicit representations lead to a convex cost function
Drawbacks:
- more memory intensive
- Updating an implicit representations over time is less efficient.
Indicator function
Implicit repr. of a closed surface $S$: $u(x) = 1$ if $x \in \text{int}(S)$, otherwise $u(x) = 0$.
Signed distance function
$\varphi: V \to \mathbb{R}$ s.t. $\varphi(x) = d(x, S)$ if $x \in \text{int}(S)$, otherwise $\varphi(x) = -d(x, S)$.
Example: Matlab function bwdist
.
Multiview Reconstruction as Shape Optimization
Goal again: several images; reconstruct geometry. Importantly, we assume that camera orientations are given.
Idea: For a voxel on the surface, projecting this into different images should give same color. => Assign each voxel $x \in V$ a value via the photoconsistency function $\rho(x) \in [0, 1]$. $\rho(x)$ is small if projected voxels have similar colors, large otherwise. (Actually, this measures non-photoconsistency)
Underlying assumptions:
- visibility in all images
- Lambertian (non-reflecting) surface
- textured surface
Weighted minimal surface approach [Faugeras & Keriven 1998]
Cost function of a surface $S$:
\[\int_S \rho(S) \, ds\]A good surface is a surface which has good photoconsistency.
Problem: global minimizer is $\emptyset$ (this has cost 0) => there is a shrinking bias! One “solution” is to just optimize locally (this probably eliminates $\emptyset$, but not the bias itself)
Imposing silhouette consistency [Cremers & Kolev, PAMI 2011]
Trying to alleviate the bias problem of [[#Weighted minimal surface approach Faugeras Keriven 1998|previous approach]].
Idea: impose our knowledge that there actually is an object; not the empty object. Do that via a silhouette and impose that projections of the shape should match the silhouettes seen in the images.
\[\min_S \int_S \rho(s) \,ds ~ \text{ s.t.}~ \pi_i(S) = S_i\]Compute a photoconsistent surface that is also silhouette-consistent.
Formulation wrt. indicator functions
Written wrt. the indicator function:
\[\min_{u: V \to \{0,1\}} \int_V \rho(x) |\nabla u(x)| \,dx\]s.t.
\[\int_{R_{ij}} u(x) \,d R_{ij} \geq 1 \Leftrightarrow j \in S_i\]First equation: rewrite above cost function (using a weighted total variation; apparently it’s a well-known fact in optimization that it can be rewritten like this).
Second equation: The ray $R_{ij}$ should hit the object if and only if it intersects the silhouette in the image area.
Convexity
Problem: set of indicator functions $u: V \to {0, 1}$ is not convex.
Instead use \(\mathcal{D} = \bigg\{u: V \to [0,1] \bigg\lvert \int_{R_{ij}} u(x) \,d R_{ij} \geq 1 \Leftrightarrow j \in S_i \bigg\}\)
This is called the set of silhouette-consistent configurations. It is a convex set. Interpretation: $u(x) \in (0, 1)$ represents some uncertainty (soft constraint). Like this, we avoid a difficult combinatorial problem and use an easier continuous problem.
Thresholding: You get an energy $E_{\text{thresh}}$ from the relaxed problem $E_{\text{relaxed}}$. For the actual optimal binary solution $E_{\text{optim}}$, it holds that \(E_{\text{thresh}} \geq E_{\text{optim}} \geq E_{\text{relaxed}}\)
The paper shows: thresholding can be performed in a way s.t. silhouette consistency is preserved.
Multi-view Texture Reconstruction [Goldlücke & Cremers, ICCV 2009]
Motivation: The things we saw above could also (and more precisely) be done with a laser scanner. What laser scanners cannot capture, but cameras can, are colors.
- Very simple approach: just backprojeçt from image to 3d shape to find colors. But: need multiple views to cover whole object
- => averaging multi-view values leads to blurring.
- Stitching instead of averaging instead leads to seams.
Alternative: Variational approach [Goldlücke & Cremers, ICCV 2009]. Intuition: find a sharp texture s.t. after blurring and downsampling (i.e. what a real digital camera does), it matches the observations.
Cost function for textures
Solve:
\[\min_{T: S \to \mathbb{R}^3} \sum_i^n \int_{\Omega_i} \bigg(b \ast (T \circ \pi_i^{-1}) - \mathcal{I}\bigg)^2 \,dx + \lambda \int_S ||\nabla_s T|| \,ds\]Regularization constant $\lambda$ is typically very small. $b$ represents a linear operator including blurring and downsampling ($\ast$ is a little misleading notation). The $\nabla$ is taken along the two degrees of freedom of the surface.
Advantage: this cost function is convex!
Important: the color includes the shading effects.
Super-resolution textures
The blurring + downsampling can be undone, we get the actual sharp texture (super-resolution textures).
How can we hallucinate details that aren’t there in the input image?
- we know a bit how the degradation (downsample + blurring) works (we also have to know how exactly the camera blurs, else it won’t work!)
- we have many images => the more, the sharper we can get
Space-Time Reconstruction from MV Video [Oswald & Cremers, 4DMOD 2013]
Another advantage of cameras vs. laser scanners: reconstruct actions over time filmed with multiple cameras.
Some interesting applications:
- video conferencing with full 3D model of speaker
- sports analysis
- free-viewpoint television