
ML for 3D Geometry - Part 4

Shape Generation

Goal: generate shapes automatically. Use cases: for example, allow amateurs to create quality 3D models and professionals to reduce repetitive work, or complete 3D structures that were only partially observed.

Example use cases

  • Modeling by Example [Funkhouser et al. ’04]: the user selects a part of one model and a part of another model to replace it with; the two are then fused automatically. For example, combine a chair back, chair legs, and a seat from different chair models.
  • Part suggestions to support creativity [Chaudhuri and Koltun ‘10]
  • Semantic part suggestion [Chaudhuri et al. ‘13] (make plane more aerodynamic or animal more scary)

Shape Reconstruction

3D-EPN for Shape Completion [Dai et al. ’17]

First classify the partial shape; then: class label + input scan -> encoder-predictor network (32^3 voxel grid) -> database prior + multi-resolution 3D shape synthesis -> output distance field

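A minimal sketch of the encoder-predictor part in PyTorch (layer sizes are hypothetical; the actual 3D-EPN additionally uses skip connections, and the database prior / multi-resolution synthesis is a separate stage):

```python
import torch
import torch.nn as nn

class EncoderPredictor3D(nn.Module):
    """Simplified 3D-EPN-style encoder-predictor (hypothetical layer sizes)."""
    def __init__(self, num_classes: int, in_channels: int = 2):
        super().__init__()
        # Encoder: 32^3 input volume -> 4^3 feature volume
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16^3
            nn.Conv3d(16, 32, 4, stride=2, padding=1), nn.ReLU(),           # 8^3
            nn.Conv3d(32, 64, 4, stride=2, padding=1), nn.ReLU(),           # 4^3
        )
        # The class label is injected into the bottleneck as extra channels
        self.bottleneck = nn.Conv3d(64 + num_classes, 64, 1)
        # Predictor/decoder: 4^3 feature volume -> 32^3 distance field
        self.decoder = nn.Sequential(
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 8^3
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16^3
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),              # 32^3
        )

    def forward(self, partial_df, class_onehot):
        z = self.encoder(partial_df)                      # (B, 64, 4, 4, 4)
        c = class_onehot[:, :, None, None, None].expand(-1, -1, *z.shape[2:])
        z = torch.relu(self.bottleneck(torch.cat([z, c], dim=1)))
        return self.decoder(z)                            # predicted distance field

# Usage: 2 input channels = truncated distance field + known-space mask
model = EncoderPredictor3D(num_classes=8)
out = model(torch.randn(1, 2, 32, 32, 32), torch.eye(8)[:1])
print(out.shape)  # torch.Size([1, 1, 32, 32, 32])
```
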
Fig.1

3D-R2N2: 3D Reconstruction from Multi-View Images [Choy et al. ‘16]

Recurrent approach: encode each image with a 2D CNN, run the per-view feature vectors through a convolutional LSTM, and decode its final state into a 3D occupancy grid with a 3D (de)convolutional decoder.

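A rough sketch of this pipeline (the paper uses a 3D convolutional LSTM/GRU whose hidden state is itself a feature grid; here a plain LSTM over per-view feature vectors stands in for it, and all layer sizes are made up):

```python
import torch
import torch.nn as nn

class MultiViewTo3D(nn.Module):
    """Simplified 3D-R2N2-style pipeline: per-view 2D CNN -> LSTM -> 3D decoder."""
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        self.image_encoder = nn.Sequential(        # 3x64x64 image -> feature vector
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 32x32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 16x16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 8x8
            nn.Flatten(), nn.Linear(128 * 8 * 8, feat_dim),
        )
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.decoder = nn.Sequential(              # feature vector -> 32^3 occupancy logits
            nn.Linear(feat_dim, 64 * 4 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (64, 4, 4, 4)),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),  # 8^3
            nn.ConvTranspose3d(32, 16, 4, stride=2, padding=1), nn.ReLU(),  # 16^3
            nn.ConvTranspose3d(16, 1, 4, stride=2, padding=1),              # 32^3
        )

    def forward(self, views):                      # views: (B, num_views, 3, 64, 64)
        b, v = views.shape[:2]
        feats = self.image_encoder(views.flatten(0, 1)).view(b, v, -1)
        _, (h, _) = self.lstm(feats)               # final hidden state aggregates all views
        return self.decoder(h[-1])                 # (B, 1, 32, 32, 32) occupancy logits

occ_logits = MultiViewTo3D()(torch.randn(2, 5, 3, 64, 64))
```
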
Fig.2

DeepSDF: Implicit 3D Reconstruction [Park et al. ‘19]

Auto-decoder architecture: like an auto-encoder, but without the encoder. Instead, the decoder operates on per-shape latent codes drawn from an artificial latent space, and these codes are optimized jointly with the decoder weights.

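A minimal sketch of the joint optimization (dimensions and the shallow decoder are placeholders; the real DeepSDF decoder is a deeper MLP):

```python
import torch
import torch.nn as nn

latent_dim, num_shapes = 256, 1000

# One learnable latent code per training shape -- there is no encoder.
latent_codes = nn.Embedding(num_shapes, latent_dim)
nn.init.normal_(latent_codes.weight, std=0.01)

# Decoder maps (latent code, query point) -> signed distance.
decoder = nn.Sequential(
    nn.Linear(latent_dim + 3, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1),
)

# Codes and decoder weights are optimized jointly.
opt = torch.optim.Adam([*decoder.parameters(), *latent_codes.parameters()], lr=1e-4)

def training_step(shape_ids, points, sdf_gt):
    """shape_ids: (B,) long tensor, points: (B, N, 3), sdf_gt: (B, N)"""
    z = latent_codes(shape_ids)                        # (B, latent_dim)
    z = z[:, None, :].expand(-1, points.shape[1], -1)  # broadcast code to every point
    pred = decoder(torch.cat([z, points], dim=-1)).squeeze(-1)
    loss = (pred - sdf_gt).abs().mean()                # L1 loss on the SDF values
    opt.zero_grad(); loss.backward(); opt.step()
    return loss
```

At test time, the decoder is frozen and only a fresh latent code is optimized to fit the observed SDF samples, which is what enables the completion use case below.
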
Allows

  • shape completion from a single depth image (at test time, the decoder is kept fixed and only a latent code is optimized to fit the observed partial SDF samples)
  • shape interpolation via interpolating the corresponding latent codes

Occupancy Networks [Mescheder et al. ‘19]

Also implicit 3D reconstruction: instead of an SDF, represent the occupancy function “p $\mapsto$ is p inside the object?” implicitly.

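Essentially the same decoder structure as in the DeepSDF sketch above, but with a binary target (a toy sketch, not the paper’s conditional architecture):

```python
import torch
import torch.nn as nn

# Decoder maps (shape code, query point) -> probability of being inside the object.
occ_decoder = nn.Sequential(
    nn.Linear(256 + 3, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 1),                      # logit; sigmoid gives occupancy probability
)

def inside(code, points):
    """code: (256,), points: (N, 3) -> boolean inside/outside decision per point."""
    z = code[None, :].expand(points.shape[0], -1)
    logits = occ_decoder(torch.cat([z, points], dim=-1)).squeeze(-1)
    return torch.sigmoid(logits) > 0.5      # trained with binary cross-entropy
```

The surface is the 0.5 level set of the predicted occupancy.
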
Point Cloud Generation

“Just predict 512 points” or similar; fewer resolution <-> space tradeoff problems than with voxels.

PSGN (Point Set Generation) [Fan et al. ‘17]

Input: segmented image. Output: point cloud that represents the segmented object.

Input image -> 2D CNN -> MLP generates points.

  • Chamfer loss: for each point in the target point set, the distance to the closest point in the predicted set, and vice versa; both sums are added (see the sketch after this list)
    • both directions are needed, since otherwise the loss could be “cheated” by predicting a sub- or superset

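A minimal, non-optimized Chamfer distance using dense pairwise distances (fine for a few thousand points; practical implementations use nearest-neighbor structures instead):

```python
import torch

def chamfer_distance(pred, target):
    """pred: (B, N, 3), target: (B, M, 3) -> scalar symmetric Chamfer distance."""
    d = torch.cdist(pred, target)             # (B, N, M) pairwise Euclidean distances
    pred_to_target = d.min(dim=2).values      # for each predicted point: nearest target
    target_to_pred = d.min(dim=1).values      # for each target point: nearest prediction
    # Both directions are needed: dropping one lets a subset/superset "cheat" the loss.
    return pred_to_target.mean(dim=1).mean() + target_to_pred.mean(dim=1).mean()

loss = chamfer_distance(torch.rand(4, 512, 3), torch.rand(4, 1024, 3))
```
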
Fig.3

Parametric 3D model generation [Smirnov et al. ‘21]

Sketch-based task: from a 2D sketch, generate 3D shape.

Reconstruction via Coons patches (evaluation sketch below):

  • parametric representation of a surface in computer graphics (smoothly joins surfaces together)
  • specified: four curves that bound the patch
    • $P(\mu, 0), P(\mu, 1), P(0, \nu), P(1, \nu)$
  • Linearly interpolate between these curves: \(P(\mu, \nu) = P(\mu, 0)(1 - \nu) + P(\mu, 1)\nu + P(0, \nu)(1-\mu) + P(1, \nu)\mu - P(0, 0)(1 - \mu)(1-\nu) - P(0, 1)(1-\mu)\nu - P(1,0)\mu(1-\nu) - P(1,1)\mu\nu\)

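A direct NumPy transcription of the formula (with u, v for $\mu, \nu$), assuming the four boundary curves are given as callables that agree at the corners:

```python
import numpy as np

def coons_patch(c_u0, c_u1, c_0v, c_1v, u, v):
    """Evaluate a bilinearly blended Coons patch at (u, v) in [0, 1]^2.

    c_u0(u) = P(u, 0), c_u1(u) = P(u, 1), c_0v(v) = P(0, v), c_1v(v) = P(1, v).
    """
    ruled_u = (1 - v) * c_u0(u) + v * c_u1(u)           # blend the u-direction curves
    ruled_v = (1 - u) * c_0v(v) + u * c_1v(v)           # blend the v-direction curves
    corners = ((1 - u) * (1 - v) * c_u0(0) + u * (1 - v) * c_u0(1)
               + (1 - u) * v * c_u1(0) + u * v * c_u1(1))
    return ruled_u + ruled_v - corners                  # subtract doubly counted corners

# Example: four straight edges of the unit square lifted to z = 0
p00, p10, p01, p11 = (np.array(p) for p in [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)])
patch_point = coons_patch(
    lambda u: (1 - u) * p00 + u * p10,   # P(u, 0)
    lambda u: (1 - u) * p01 + u * p11,   # P(u, 1)
    lambda v: (1 - v) * p00 + v * p01,   # P(0, v)
    lambda v: (1 - v) * p10 + v * p11,   # P(1, v)
    0.5, 0.5)                            # -> [0.5, 0.5, 0.0]
```
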
Per category, templates of part decomposition are generated (details?). For each category, a generator is trained:

  • Resnet encodes sketch
  • FC layer predicts control points

Trained with a Chamfer distance loss + additional losses (normal alignment, collision penalization, patch flatness regularizer)

Fig.4

Reconstructing Explicit 3D Meshes

(indirectly possible via the previous methods, e.g. predict an SDF -> apply marching cubes)
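
For example, with scikit-image’s marching cubes on an SDF sampled over a regular grid (toy sphere SDF here in place of a network prediction):

```python
import numpy as np
from skimage import measure

# Toy "predicted" SDF on a 64^3 grid: a sphere of radius 0.5 centered at the origin
coords = np.linspace(-1.0, 1.0, 64)
x, y, z = np.meshgrid(coords, coords, coords, indexing="ij")
sdf = np.sqrt(x**2 + y**2 + z**2) - 0.5

# Extract the zero level set as a triangle mesh
verts, faces, normals, values = measure.marching_cubes(
    sdf, level=0.0, spacing=(coords[1] - coords[0],) * 3)
print(verts.shape, faces.shape)  # vertices are relative to the grid origin, scaled by spacing
```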

Direct prediction: the loss can be defined on the mesh itself. Also, the resulting meshes can be more compact than those produced by marching cubes.

Pixel2Mesh: Deforming template mesh [Wang et al. ‘18]

Start with an ellipsoid mesh; deform it into the target shape, e.g. an airplane.

Architecture: Two pipelines, one convolving the input image, another deforming the mesh.

  • Image undergoes several convolutions
  • Graph NN predicts vertex displacements based on features from the CNN pipeline (see the sketch below)

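A bare-bones version of such a deformation block (a sketch only: Pixel2Mesh uses deeper graph-residual blocks, perceptual feature pooling via the camera projection, and coarse-to-fine unpooling; the mean-aggregation graph convolution and sizes here are simplifications):

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """x_i' = W0 x_i + W1 * mean_{j in N(i)} x_j  (simple mean aggregation)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.w_self = nn.Linear(in_dim, out_dim)
        self.w_neigh = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):                 # x: (V, in_dim), adj: (V, V) 0/1 matrix
        neigh = adj @ x / adj.sum(dim=1, keepdim=True).clamp(min=1)
        return self.w_self(x) + self.w_neigh(neigh)

class DeformBlock(nn.Module):
    """Predicts a displacement for every mesh vertex."""
    def __init__(self, img_feat_dim=128, hidden=64):
        super().__init__()
        self.gc1 = GraphConv(3 + img_feat_dim, hidden)
        self.gc2 = GraphConv(hidden, 3)

    def forward(self, verts, adj, img_feat):   # verts: (V, 3), img_feat: (img_feat_dim,)
        feat = torch.cat([verts, img_feat.expand(verts.shape[0], -1)], dim=-1)
        offsets = self.gc2(torch.relu(self.gc1(feat, adj)), adj)
        return verts + offsets                 # deformed vertex positions

# Usage with a toy fully connected 4-vertex graph and random image features
verts = torch.rand(4, 3)
adj = torch.ones(4, 4) - torch.eye(4)
new_verts = DeformBlock()(verts, adj, torch.rand(128))
```
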
Fig.5

Disadvantage: the output topology is fixed to that of the template mesh; shapes with a different topology cannot be represented.

Mesh R-CNN [Gkioxari et al. ‘19]

  • Start with a coarse occupancy grid obtained from the image -> create a template mesh from it -> refine with a deformation approach (so the topology is not fixed by a single template).

(not many details)

Freeform Mesh Generation: Scan2Mesh [Dai and Niessner ‘19]

“Cut out the template intermediary” Details: see [[06 - Learning on Different 3D Representations#Scan2Mesh Dai et al ‘19|here in Lecture 6]].

Retrieval-based Object Representation [Li et al. ‘15]

Retrieve the mesh of a similar-looking object from a database (say, ShapeNet). Enables real-time 3D reconstruction!

  • Retrieving similar objects: matching constellations of keypoints and descriptors.

Joint embedding space: a space containing both real images and shapes, s.t. semantically similar things are close; the shape side is constructed from multi-view features, and images are mapped into the space via a CNN.

Joint Embedding for Retrieval

…means that shapes and images are embedded in a joint embedding space (via “CNN image purification”, in the paper’s terms). Used in the [[#Retrieval-based Object Representation Li et al ‘15|previous method]].

Joint embedding of shapes/images [Li et al. ‘15]

  • shapes: construct embedding space based on multi-view features
  • images: train CNN to learn to map images into the embedding space

Fig.6

Joint embedding of 3D scans and CAD objects [Dahnert et al. ‘19]

Construct embedding space end-to-end; use triplet loss for metric learning. This means that an instance is compared with a known positive correspondence (should be close) and a known negative correspondence (should be far away).
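
The triplet loss in its standard form (PyTorch’s built-in `TripletMarginLoss`; the paper’s exact formulation and margin may differ):

```python
import torch
import torch.nn as nn

triplet = nn.TripletMarginLoss(margin=1.0)

# Embeddings of a scan (anchor), its matching CAD model (positive),
# and a non-matching CAD model (negative), e.g. produced by two encoder networks.
anchor = torch.randn(16, 128, requires_grad=True)
positive = torch.randn(16, 128)
negative = torch.randn(16, 128)

# Pushes d(anchor, positive) below d(anchor, negative) by at least the margin.
loss = triplet(anchor, positive, negative)
loss.backward()
```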

Mask2CAD [Kuo et al. ‘20]

Start from an image. Segment it into instances -> compute an embedding per instance -> retrieve the shape with the closest embedding. Additionally, classify the object pose and refine it. Combine the retrieved shapes and refined poses into the reconstruction.
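
Retrieval itself then reduces to a nearest-neighbor lookup in the embedding space; a sketch with cosine similarity, assuming precomputed CAD embeddings:

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb, cad_embs, k=3):
    """query_emb: (D,), cad_embs: (num_models, D) -> indices of the k closest CAD models."""
    sims = F.cosine_similarity(query_emb[None, :], cad_embs, dim=1)
    return sims.topk(k).indices

# Toy database of 1000 CAD embeddings and one detected-instance embedding
cad_embs = torch.randn(1000, 128)
best = retrieve(torch.randn(128), cad_embs)
```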

Fig.7

This post is licensed under CC BY 4.0 by the author.