ML for 3D Geometry - Part 4

Shape Generation

Goal: generate shapes automatically. Use cases include letting amateurs create quality 3D models, reducing repetitive work for professionals, and completing 3D structures that were only partially observed.

Example use cases

  • Modeling by Example [Funkhouser et al. ’04]: the user selects a part of one model and a part of another model to replace it with; the two are then fused automatically. For example, combine a chair back, chair legs, and a seating surface taken from different chair models.
  • Part suggestions to support creativity [Chaudhuri and Koltun ’10]
  • Semantic part suggestion [Chaudhuri et al. ’13] (e.g. make a plane more aerodynamic or an animal more scary)

Shape Reconstruction

3D-EPN for Shape Completion [Dai et al. ’17]

Pipeline: classification of the partial shape; class label + input scan -> encoder-predictor network ($32^3$ voxel grid) -> database prior, multi-resolution 3D shape synthesis -> output distance field


3D-R2N2: 3D Reconstruction from Multi-View Images [Choy et al. ‘16]

Recurrent approach: Encode each image with 2D CNN, then run feature vectors through convolutional LSTM and decode into a 3D occupancy grid with a CNN.


DeepSDF: Implicit 3D Reconstruction [Park et al. ‘19]

Auto-decoder architecture: like an auto-encoder without the encoder part. Instead, each training shape is assigned a code in an artificial latent space, and these codes are optimized jointly with the decoder.


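  • shape completion from a single depth image: keep the trained decoder fixed and optimize a latent code so that the decoded SDF agrees with the samples observed in the depth image; the decoded shape then fills in the unobserved parts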
  • shape interpolation via interpolating the corresponding latent codes

Occupancy Networks [Mescheder et al. ’19]

Also implicit 3D reconstruction: instead of an SDF, the network represents the occupancy function $p \mapsto$ “is p inside the object?”. The surface is the decision boundary of that function.
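To make the difference between the two implicit representations concrete, here is a minimal sketch. An analytic sphere stands in for the trained network (an assumption for illustration only), and the sign convention "negative inside" is the one assumed here. The names `sphere_sdf` and `occupancy` are hypothetical.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside.
    Stand-in for a learned SDF decoder (assumption for illustration)."""
    return math.dist(p, center) - radius

def occupancy(p, sdf=sphere_sdf):
    """Occupancy function: 1 if p is inside the object, 0 otherwise.
    A learned occupancy network would output a probability instead;
    here the value is derived from the sign of the SDF."""
    return 1 if sdf(p) < 0.0 else 0

print(sphere_sdf((2.0, 0.0, 0.0)))   # distance from the surface, outside
print(occupancy((0.0, 0.0, 0.0)))    # inside
print(occupancy((2.0, 0.0, 0.0)))    # outside
```

The SDF carries more information (how far a point is from the surface), while the occupancy function only answers the inside/outside question. In both cases the surface is never stored explicitly; it is wherever the function changes sign or class.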

Point Cloud Generation

“Just predict 512 points” or similar. Point clouds suffer less from the resolution vs. memory trade-off than voxel grids do.

PSGN (Point Set Generation) [Fan et al. ‘17]

Input: segmented image. Output: point cloud that represents the segmented object.

Input image -> 2D CNN -> MLP generates points.

  • Chamfer loss: for each point in target point set, distance to closest point in prediction point set; and vice versa; summed together
    • both directions, since else the loss could be “cheated” by predicting a sub- or superset
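The Chamfer loss described above can be sketched in a few lines of dependency-free Python. This version sums plain Euclidean nearest-neighbour distances as in the description; implementations often use squared distances or means instead, so treat the exact form as an assumption. The function names are hypothetical.

```python
import math

def one_sided_chamfer(src, dst):
    """For each point in src, the distance to its closest point in dst, summed."""
    return sum(min(math.dist(p, q) for q in dst) for p in src)

def chamfer_distance(pred, target):
    """Symmetric Chamfer distance: both directions, summed together."""
    return one_sided_chamfer(target, pred) + one_sided_chamfer(pred, target)

target = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
subset = [(0.0, 0.0, 0.0)]  # prediction that covers only part of the target

print(one_sided_chamfer(subset, target))  # 0.0: this direction alone is "cheated"
print(chamfer_distance(subset, target))   # 1.0: the other direction penalizes the missing point
```

The subset prediction has zero loss in the prediction-to-target direction, which is why both directions are needed. A batched array implementation would be used in practice; the nested loops here cost O(|pred| · |target|).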


Parametric 3D model generation [Smirnov et al. ‘21]

Sketch-based task: from a 2D sketch, generate 3D shape.

Reconstruction via Coons patches:

  • parametric representation of a surface in computer graphics (smoothly joins surfaces together)
  • specified: four curves that bound the patch
    • $P(\mu, 0), P(\mu, 1), P(0, \nu), P(1, \nu)$
  • Linearly interpolate between these curves: $P(\mu, \nu) = P(\mu, 0)(1 - \nu) + P(\mu, 1)\nu + P(0, \nu)(1-\mu) + P(1, \nu)\mu - P(0, 0)(1 - \mu)(1-\nu) - P(0, 1)(1-\mu)\nu - P(1,0)\mu(1-\nu) - P(1,1)\mu\nu$
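The interpolation formula above can be evaluated directly. The sketch below is not taken from the paper: the four boundary curves are made-up examples (a unit square whose bottom edge is arched upwards), and `coons_patch` is a hypothetical helper name. The curves must agree at the four corners.

```python
def coons_patch(bottom, top, left, right, mu, nu):
    """Evaluate P(mu, nu) from the four boundary curves:
    bottom(mu) = P(mu, 0), top(mu) = P(mu, 1), left(nu) = P(0, nu), right(nu) = P(1, nu).
    Each curve returns a coordinate tuple."""
    c00, c01 = bottom(0.0), top(0.0)   # corners P(0,0), P(0,1)
    c10, c11 = bottom(1.0), top(1.0)   # corners P(1,0), P(1,1)
    b, t, l, r = bottom(mu), top(mu), left(nu), right(nu)
    return tuple(
        b[i] * (1 - nu) + t[i] * nu + l[i] * (1 - mu) + r[i] * mu
        - c00[i] * (1 - mu) * (1 - nu) - c01[i] * (1 - mu) * nu
        - c10[i] * mu * (1 - nu) - c11[i] * mu * nu
        for i in range(len(b))
    )

# Example boundary curves (illustrative only).
bottom = lambda mu: (mu, 0.0, 4.0 * mu * (1.0 - mu))  # arched edge, z = 0 at both ends
top    = lambda mu: (mu, 1.0, 0.0)
left   = lambda nu: (0.0, nu, 0.0)
right  = lambda nu: (1.0, nu, 0.0)

print(coons_patch(bottom, top, left, right, 0.5, 0.5))  # (0.5, 0.5, 0.5)
```

The subtracted corner terms remove the bilinear part that the two linear interpolations would otherwise count twice, so the patch reproduces each boundary curve exactly: for instance, evaluating at $\nu = 0$ returns `bottom(mu)`.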

Per category, templates of part decomposition are generated (details?). For each category, a generator is trained:

  • ResNet encodes sketch
  • FC layer predicts control points

Trained with Chamfer distance loss + additional losses (normal alignment, collision penalization, patch flatness regularizer)


Reconstructing Explicit 3D Meshes

(indirectly possible via previous methods, e.g. predict an SDF -> apply marching cubes)

Directly: the loss can be defined on the mesh itself. The output mesh may also be more efficient than one extracted with marching cubes.

Pixel2Mesh: Deforming template mesh [Wang et al. ‘18]

Start with ellipsoid mesh; deform to e.g. airplane

Architecture: Two pipelines, one convolving the input image, another deforming the mesh.

  • Image undergoes several convolutions
  • Graph NN predicts vertex displacements based on features from CNN pipeline


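Disadvantage: the topology is fixed by the template. Deforming an ellipsoid cannot produce a shape with a different topology, e.g. one with holes.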

Mesh R-CNN [Gkioxari et al. ’19]

  • Start with coarse occupancy grid obtained from image -> create template mesh from it -> refine with deformation approach.

(not many details)

Freeform Mesh Generation: Scan2Mesh [Dai and Niessner ‘19]

“Cut out the template intermediary”. Details: see the Scan2Mesh section in Lecture 6 (Learning on Different 3D Representations).

Retrieval-based Object Representation [Li et al. ‘15]

Retrieve a similar looking object’s mesh from a database (say, ShapeNet). Enables real-time 3D reconstruction!

  • Retrieving similar objects: matching constellations of keypoints and descriptors.

Joint embedding space: a space containing both real images and shapes, such that semantically similar items are close together. It is constructed from multi-view features of the shapes; images are mapped into it via a CNN.

Joint Embedding for Retrieval

…means that shapes and images are embedded in a joint embedding space (CNN image purification). Used in the previous method (Retrieval-based Object Representation).

Joint embedding of shapes/images [Li et al. ’15]

  • shapes: construct embedding space based on multi-view features
  • images: train CNN to learn to map images into the embedding space
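Once images and shapes live in the same space, retrieval reduces to a nearest-neighbour lookup. The sketch below assumes the embeddings already exist; the 2D vectors, the shape names, and the `retrieve` helper are all hypothetical, and Euclidean distance is an assumed choice of metric.

```python
import math

def retrieve(query_embedding, shape_embeddings):
    """Return the name of the database shape whose embedding is closest to the query."""
    return min(
        shape_embeddings,
        key=lambda name: math.dist(query_embedding, shape_embeddings[name]),
    )

# Hypothetical shape database: name -> embedding (made-up 2D vectors).
shape_embeddings = {
    "chair_01": (0.9, 0.1),
    "table_07": (0.1, 0.8),
}

# Hypothetical embedding of a query image, as a CNN might produce it.
image_embedding = (0.8, 0.2)
print(retrieve(image_embedding, shape_embeddings))  # chair_01
```

A linear scan like this is fine for a sketch; a large shape database would typically use an approximate nearest-neighbour index instead.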


Joint embedding of 3D scans and CAD objects [Dahnert et al. ‘19]

Construct embedding space end-to-end; use triplet loss for metric learning. This means that an instance is compared with a known positive correspondence (should be close) and a known negative correspondence (should be far away).
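The triplet comparison described above can be written as a small loss function. This is a generic margin-based triplet loss, not necessarily the exact formulation of the paper: the Euclidean distance, the margin value, and the example embeddings are assumptions.

```python
import math

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Margin-based triplet loss: zero once the negative is at least
    `margin` farther from the anchor than the positive is."""
    d_pos = math.dist(anchor, positive)  # should be small
    d_neg = math.dist(anchor, negative)  # should be large
    return max(0.0, d_pos - d_neg + margin)

anchor   = (0.0, 0.0)   # e.g. embedding of a scanned object (made-up values)
positive = (0.1, 0.0)   # embedding of its matching CAD model
far_neg  = (3.0, 0.0)   # non-matching CAD model, already far away
near_neg = (0.5, 0.0)   # non-matching CAD model that is still too close

print(triplet_loss(anchor, positive, far_neg))   # 0.0: constraint satisfied
print(triplet_loss(anchor, positive, near_neg))  # positive: negative must be pushed away
```

Minimizing this loss over many triplets pulls matching scan/CAD pairs together and pushes non-matching pairs apart, which is what makes distances in the embedding space meaningful for retrieval.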

Mask2CAD [Kuo et al. ‘20]

Start with image input. Segment into instances -> calculate embedding -> retrieve shape with close embedding. Also, classify the object pose. Combine retrieved shapes with refined poses into reconstruction.


This post is licensed under CC BY 4.0 by the author.