3D Object Detection for Autonomous Driving: A Comprehensive Survey

1. Data Source for 3D Object Detection

1.1. Datasets for 3D Object Detection

2022

DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection (CVPR 2022)

2021

One Million Scenes for Autonomous Driving: ONCE Dataset (NeurIPS 21)
Argoverse 2: Next Generation Datasets for Self-driving Perception and Forecasting (NeurIPS 21)
Cirrus: A Long-range Bi-pattern LiDAR Dataset (ICRA 21)
RADIATE: A Radar Dataset for Automotive Perception in Bad Weather (ICRA 21)
PandaSet: Advanced Sensor Suite Dataset for Autonomous Driving (ITSC 21)
KITTI-360: A Novel Dataset and Benchmarks for Urban Scene Understanding in 2D and 3D (arXiv 21)
All-In-One Drive: A Large-Scale Comprehensive Perception Dataset with High-Density Long-Range Point Clouds (arXiv 21)

2020

nuScenes: A Multimodal Dataset for Autonomous Driving (CVPR 20)
Scalability in Perception for Autonomous Driving: Waymo Open Dataset (CVPR 20)
Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather (CVPR 20)
Cityscapes 3D: Dataset and Benchmark for 9 DoF Vehicle Detection (CVPRW 20)
The ApolloScape Open Dataset for Autonomous Driving and its Application (T-PAMI 20)
EU Long-term Dataset with Multiple Sensors for Autonomous Driving (IROS 20)
LIBRE: The Multiple 3D LiDAR Dataset (IV 20)
A2D2: Audi Autonomous Driving Dataset (arXiv 20)
Canadian Adverse Driving Conditions Dataset (arXiv 20)

2019

TrafficPredict: Trajectory Prediction for Heterogeneous Traffic-Agents (AAAI 19)
Lyft Level 5 AV Dataset (Website)
The H3D Dataset for Full-Surround 3D Multi-Object Detection and Tracking in Crowded Urban Scenes (ICRA 19)
Precise Synthetic Image and LiDAR (PreSIL) Dataset for Autonomous Vehicle Perception (IV 19)
A*3D Dataset: Towards Autonomous Driving in Challenging Environments (arXiv 19)

2017 or earlier

Vision meets Robotics: The KITTI Dataset (IJRR 13)
Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite (CVPR 12)

1.2. Evaluation Metrics

2021

Revisiting 3D Object Detection From an Egocentric Perspective (NeurIPS 21)

2020

Learning to Evaluate Perception Models using Planner-Centric Metrics (CVPR 20)
The efficacy of Neural Planning Metrics: A meta-analysis of PKL on nuScenes (IROSW 20)

1.3. Loss Functions

2021

Object DGCNN: 3D Object Detection using Dynamic Graphs (NeurIPS 21)
Center-based 3D Object Detection and Tracking (CVPR 21)
Accurate 3D Object Detection using Energy-Based Models (CVPRW 21)

2020

Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots (ECCV 20)
Rotation-robust Intersection over Union for 3D Object Detection (ECCV 20)
Improving 3D Object Detection via Joint Attribute-oriented 3D Loss (IV 20)

2019

IoU Loss for 2D/3D Object Detection (3DV 19)
Focal Loss in 3D Object Detection (RA-L 19)

2. Sensor-based 3D Object Detection

2.1. LiDAR-based 3D Object Detection

A chronological overview of the most prestigious LiDAR-based 3D object detection methods.

2.1.1. Point-based 3D Object Detection

A general point-based detection framework contains a point-based backbone network and a prediction head. The point-based backbone consists of several blocks for point cloud sampling and feature learning, and the prediction head directly estimates 3D bounding boxes from the candidate points.
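
The sampling step of a point-based backbone can be sketched as follows. The description above does not name a specific sampler; farthest point sampling is assumed here as one common choice, and `farthest_point_sampling` is an illustrative name rather than an API from any listed paper.

```python
import math

def farthest_point_sampling(points, num_samples):
    """Return indices of num_samples points that are spread far apart.

    points: list of (x, y, z) tuples. Sampling starts from index 0,
    which is an arbitrary choice made for this sketch.
    """
    if not points or num_samples <= 0:
        return []
    num_samples = min(num_samples, len(points))
    selected = [0]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[0]) for p in points]
    while len(selected) < num_samples:
        # The next sample is the point farthest from everything selected so far.
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        for i, p in enumerate(points):
            d = math.dist(p, points[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected

points = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (10.0, 0.0, 0.0), (5.0, 0.0, 0.0)]
keep = farthest_point_sampling(points, 3)
```

In this toy input the near-duplicate point at index 1 is the one left out, which is the behaviour that makes this kind of sampler attractive for keeping coverage of sparse regions.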

2022

SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection (AAAI 22)

2021

3D Object Detection with Pointformer (CVPR 21)
Relation Graph Network for 3D Object Detection in Point Clouds (T-IP 21)
3D-CenterNet: 3D object detection network for point clouds with center estimation priority (PR 21)

2020

3DSSD: Point-based 3D Single Stage Object Detector (CVPR 20)
Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud (CVPR 20)
Joint 3D Instance Segmentation and Object Detection for Autonomous Driving (CVPR 20)
Improving 3D Object Detection through Progressive Population Based Augmentation (ECCV 20)
False Positive Removal for 3D Vehicle Detection with Penetrated Point Classifier (ICIP 20)

2019

PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud (CVPR 19)
Attentional PointNet for 3D-Object Detection in Point Clouds (CVPRW 19)
STD: Sparse-to-Dense 3D Object Detector for Point Cloud (ICCV 19)
StarNet: Targeted Computation for Object Detection in Point Clouds (arXiv 19)
PointRGCN: Graph Convolution Networks for 3D Vehicles Detection Refinement (arXiv 19)

2018

IPOD: Intensive Point-based Object Detector for Point Cloud (arXiv 18)

2.1.2. Grid-based 3D Object Detection (Voxel and Pillars)

Grid-based approaches rasterize point clouds into three grid representations: voxels, pillars, and bird’s-eye view (BEV) feature maps. 2D convolutional neural networks or 3D sparse neural networks are then applied to the grids for feature extraction, and 3D objects are finally predicted from the BEV grid cells.
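
The rasterization step can be illustrated with a minimal sketch. The function names, the 1 m cell size, and the point-count feature are assumptions chosen for illustration; real detectors learn per-cell features instead of counting points.

```python
from collections import defaultdict

def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Group (x, y, z) points by the integer index of the voxel containing them."""
    voxels = defaultdict(list)
    for x, y, z in points:
        key = (
            int((x - origin[0]) // voxel_size[0]),
            int((y - origin[1]) // voxel_size[1]),
            int((z - origin[2]) // voxel_size[2]),
        )
        voxels[key].append((x, y, z))
    return dict(voxels)

def to_pillars(voxels):
    """Collapse the height axis: voxels sharing (ix, iy) form one pillar."""
    pillars = defaultdict(list)
    for (ix, iy, _iz), pts in voxels.items():
        pillars[(ix, iy)].extend(pts)
    return dict(pillars)

def bev_occupancy(pillars):
    """Simplest possible BEV map: number of points per BEV cell."""
    return {cell: len(pts) for cell, pts in pillars.items()}

points = [(0.2, 0.2, 0.1), (0.3, 0.1, 1.5), (1.2, 0.4, 0.3), (-0.1, 0.0, 0.0)]
voxels = voxelize(points, voxel_size=(1.0, 1.0, 1.0))
bev = bev_occupancy(to_pillars(voxels))
```

The two points that differ only in height fall into different voxels but the same pillar, which is the practical difference between the voxel and pillar representations.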

2021

Object DGCNN: 3D Object Detection using Dynamic Graphs (NeurIPS 21)
Center-based 3D Object Detection and Tracking (CVPR 21)
Voxel Transformer for 3D Object Detection (ICCV 21)
LiDAR-Aug: A General Rendering-based Augmentation Framework for 3D Object Detection (CVPR 21)
RAD: Realtime and Accurate 3D Object Detection on Embedded Systems (CVPRW 21)
AGO-Net: Association-Guided 3D Point Cloud Object Detection Network (T-PAMI 21)
CIA-SSD: Confident IoU-Aware Single-Stage Object Detector From Point Cloud (AAAI 21)
Voxel R-CNN: Towards High Performance Voxel-based 3D Object Detection (AAAI 21)
Anchor-free 3D Single Stage Detector with Mask-Guided Attention for Point Cloud (ACM MM 21)
Integration of Coordinate and Geometric Surface Normal for 3D Point Cloud Object Detection (IJCNN 21)
PSANet: Pyramid Splitting and Aggregation Network for 3D Object Detection in Point Cloud (Sensors 21)

2020

Every View Counts: Cross-View Consistency in 3D Object Detection with Hybrid-Cylindrical-Spherical Voxelization (NeurIPS 20)
HVNet: Hybrid Voxel Network for LiDAR Based 3D Object Detection (CVPR 20)
Associate-3Ddet: Perceptual-to-Conceptual Association for 3D Point Cloud Object Detection (CVPR 20)
DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes (CVPR 20)
Object as Hotspots: An Anchor-Free 3D Object Detection Approach via Firing of Hotspots (ECCV 20)
SSN: Shape Signature Networks for Multi-class Object Detection from Point Clouds (ECCV 20)
Pillar-based Object Detection for Autonomous Driving (ECCV 20)
From Points to Parts: 3D Object Detection From Point Cloud With Part-Aware and Part-Aggregation Network (T-PAMI 20)
Reconfigurable Voxels: A New Representation for LiDAR-Based Point Clouds (CoRL 20)
SegVoxelNet: Exploring Semantic Context and Depth-aware Features for 3D Vehicle Detection from Point Cloud (ICRA 20)
TANet: Robust 3D Object Detection from Point Clouds with Triple Attention (AAAI 20)
SARPNET: Shape attention regional proposal network for liDAR-based 3D object detection (NeuroComputing 20)
Voxel-FPN: Multi-Scale Voxel Feature Aggregation for 3D Object Detection from LIDAR Point Clouds (Sensors 20)
BirdNet+: End-to-End 3D Object Detection in LiDAR Bird’s Eye View (ITSC 20)
1st Place Solution for Waymo Open Dataset Challenge - 3D Detection and Domain Adaptation (arXiv 20)
AFDet: Anchor Free One Stage 3D Object Detection (arXiv 20)

2019

PointPillars: Fast Encoders for Object Detection from Point Clouds (CVPR 19)
End-to-End Multi-View Fusion for 3D Object Detection in LiDAR Point Clouds (CoRL 19)
IoU Loss for 2D/3D Object Detection (3DV 19)
Accurate and Real-time Object Detection based on Bird’s Eye View on 3D Point Clouds (3DV 19)
Focal Loss in 3D Object Detection (RA-L 19)
3D-GIoU: 3D Generalized Intersection over Union for Object Detection in Point Cloud (Sensors 19)
FVNet: 3D Front-View Proposal Generation for Real-Time Object Detection from Point Clouds (CISP 19)
Class-balanced Grouping and Sampling for Point Cloud 3D Object Detection (arXiv 19)
Patch Refinement - Localized 3D Object Detection (arXiv 19)

2018

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection (CVPR 18)
PIXOR: Real-time 3D Object Detection from Point Clouds (CVPR 18)
SECOND: Sparsely Embedded Convolutional Detection (Sensors 18)
RT3D: Real-Time 3-D Vehicle Detection in LiDAR Point Cloud for Autonomous Driving (RA-L 18)
BirdNet: a 3D Object Detection Framework from LiDAR Information (ITSC 18)
YOLO3D: End-to-end real-time 3D Oriented Object Bounding Box Detection from LiDAR Point Cloud (ECCVW 18)
Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds (ECCVW 18)

2017 or earlier

3D Fully Convolutional Network for Vehicle Detection in Point Cloud (IROS 17)
Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks (ICRA 17)
Vehicle Detection from 3D Lidar Using Fully Convolutional Network (RSS 16)
Voting for Voting in Online Point Cloud Object Detection (RSS 15)

2.1.3. Point-voxel based 3D Object Detection

A single-stage point-voxel detection framework fuses point and voxel features in the backbone network. A two-stage point-voxel detection framework first generates 3D object proposals with a voxel-based 3D detector, and then refines these proposals using keypoints sampled from the point cloud.

2022

Behind the Curtain: Learning Occluded Shapes for 3D Object Detection (AAAI 22)

2021

LiDAR R-CNN: An Efficient and Universal 3D Object Detector (CVPR 21)
PVGNet: A Bottom-Up One-Stage 3D Object Detector with Integrated Multi-Level Features (CVPR 21)
HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection (CVPR 21)
Pyramid R-CNN: Towards Better Performance and Adaptability for 3D Object Detection (ICCV 21)
Improving 3D Object Detection with Channel-wise Transformer (ICCV 21)
SA-Det3D: Self-Attention Based Context-Aware 3D Object Detection (ICCVW 21)
From Voxel to Point: IoU-guided 3D Object Detection for Point Cloud with Voxel-to-Point Decoder (ACM MM 21)
RV-FuseNet: Range View Based Fusion of Time-Series LiDAR Data for Joint 3D Object Detection and Motion Forecasting (IROS 21)
Pattern-Aware Data Augmentation for LiDAR 3D Object Detection (ITSC 21)
From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection (T-CSVT 21)
Pseudo-Image and Sparse Points: Vehicle Detection With 2D LiDAR Revisited by Deep Learning-Based Methods (T-ITS 21)
Dual-Branch CNNs for Vehicle Detection and Tracking on LiDAR Data (T-ITS 21)
Improved Point-Voxel Region Convolutional Neural Network: 3D Object Detectors for Autonomous Driving (T-ITS 21)
DSP-Net: Dense-to-Sparse Proposal Generation Approach for 3D Object Detection on Point Cloud (IJCNN 21)
P2V-RCNN: Point to Voxel Feature Learning for 3D Object Detection From Point Clouds (IEEE Access 21)
PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection (arXiv 21)
M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers (arXiv 21)

2020

PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection (CVPR 20)
Structure Aware Single-stage 3D Object Detection from Point Cloud (CVPR 20)
Searching Efficient 3D Architectures with Sparse Point-Voxel Convolution (ECCV 20)
InfoFocus: 3D Object Detection for Autonomous Driving with Dynamic Information Modeling (ECCV 20)
SVGA-Net: Sparse Voxel-Graph Attention Network for 3D Object Detection from Point Clouds (arXiv 20)

2019

Point-Voxel CNN for Efficient 3D Deep Learning (NeurIPS 19)
Fast Point R-CNN (ICCV 19)

2018

LMNet: Real-time Multiclass Object Detection on CPU Using 3D LiDAR (ACIRS 18)

2.1.4. Range-based 3D Object Detection

The first category of range-based approaches directly predicts 3D objects from pixels in range images, using standard 2D convolutions or specialized convolutional/graph operators for feature extraction. The second category transforms features from the range view into the bird’s-eye view or point view, and then detects 3D objects from the transformed view.
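
For readers unfamiliar with range images, the sketch below shows one way to build one from a point cloud by spherical projection. The image size and the vertical field of view are assumed values for illustration, and `to_range_image` is a hypothetical helper; real sensors usually provide the range image natively.

```python
import math

def to_range_image(points, height=4, width=8, fov_up_deg=15.0, fov_down_deg=-15.0):
    """Project (x, y, z) points onto a height x width range image.

    Each pixel stores the range of the closest point that falls in it,
    or None when no point lands there.
    """
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    image = [[None] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        azimuth = math.atan2(y, x)      # horizontal angle in [-pi, pi]
        elevation = math.asin(z / r)    # vertical angle
        if not fov_down <= elevation <= fov_up:
            continue                    # outside the assumed vertical FOV
        # Azimuth -pi..pi maps to columns 0..width; row 0 is the top of the FOV.
        col = min(int((azimuth + math.pi) / (2 * math.pi) * width), width - 1)
        row = min(int((fov_up - elevation) / (fov_up - fov_down) * height), height - 1)
        if image[row][col] is None or r < image[row][col]:
            image[row][col] = r
    return image

image = to_range_image([(10.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 0.0, 10.0)])
```

The two points along the x-axis compete for the same pixel and the closer one wins, while the point directly overhead is dropped because it lies outside the assumed field of view.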

2021

RSN: Range Sparse Net for Efficient, Accurate LiDAR 3D Object Detection (CVPR 21)
RangeIoUDet: Range Image based Real-Time 3D Object Detector Optimized by Intersection over Union (CVPR 21)
To the Point: Efficient 3D Object Detection in the Range Image with Graph Convolution Kernels (CVPR 21)
RangeDet: In Defense of Range View for LiDAR-based 3D Object Detection (ICCV 21)
It’s All Around You: Range-Guided Cylindrical Network for 3D Object Detection (ICCVW 21)
LaserFlow: Efficient and Probabilistic Object Detection and Motion Forecasting (RA-L 21)

2020

Range Conditioned Dilated Convolutions for Scale Invariant 3D Object Detection (arXiv 20)
RangeRCNN: Towards Fast and Accurate 3D Object Detection with Range Image Representation (arXiv 20)

2019

LaserNet: An Efficient Probabilistic 3D Object Detector for Autonomous Driving (CVPR 19)

2.1.5. Anchor-based 3D object detection

3D anchor boxes are placed at each BEV grid cell. Those anchors that have high IoUs with ground truths are selected as positives. The sizes and centers of 3D objects are regressed from the positive anchors, and the objects’ heading angles are predicted by bin-based classification and regression.
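
The bin-based heading target mentioned above can be sketched as an encode/decode pair. The choice of 12 bins and the convention of measuring the residual from the bin center are assumptions; individual detectors differ in both.

```python
import math

def encode_heading(angle, num_bins=12):
    """Split a heading angle into (bin index, residual from the bin center).

    The bin index is the classification target and the residual is the
    regression target.
    """
    two_pi = 2 * math.pi
    angle = angle % two_pi                          # wrap to [0, 2*pi)
    bin_size = two_pi / num_bins
    bin_idx = min(int(angle / bin_size), num_bins - 1)
    residual = angle - (bin_idx + 0.5) * bin_size   # within half a bin of zero
    return bin_idx, residual

def decode_heading(bin_idx, residual, num_bins=12):
    """Invert encode_heading: bin center plus residual."""
    bin_size = 2 * math.pi / num_bins
    return (bin_idx + 0.5) * bin_size + residual

bin_idx, residual = encode_heading(1.0)
recovered = decode_heading(bin_idx, residual)
```

Keeping the residual within half a bin of zero gives the regression branch a small, bounded target, which is the usual motivation for combining classification with regression here.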

2.1.6. Anchor-free 3D object detection

The anchor-free learning targets can be assigned to diverse views, including the bird’s-eye view, point view, and range view. Object parameters are predicted directly from the positive samples.

2.2. Camera-based 3D Object Detection

A chronological overview of the camera-based 3D object detection methods.

2.2.1. Monocular-based 3D Object Detection

Image only

Single-stage anchor-based approaches predict 3D object parameters by leveraging both image features and predefined 3D anchor boxes. Single-stage anchor-free methods directly predict 3D object parameters from image pixels. Two-stage approaches first generate 2D bounding boxes with a 2D detector, and then lift the 2D detections into 3D space by predicting 3D object parameters from the 2D RoI features.

Depth-assisted

Depth-image based approaches obtain depth-aware image features by fusing information from both the RGB image and the depth image. Pseudo-LiDAR based methods first transform the depth image into a 3D pseudo point cloud, and then apply a LiDAR-based 3D detector to the point cloud to detect 3D objects. Patch-based approaches transform the depth image into a 2D coordinate map, and then apply a 2D neural network to the coordinate map for detection.
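
The depth-to-point-cloud step of the pseudo-LiDAR pipeline can be sketched with the standard pinhole camera model. The intrinsics `fx`, `fy`, `cx`, `cy` and the helper name are assumptions for illustration; the resulting points are in the camera frame, and the camera-to-LiDAR transform is left out.

```python
def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a dense depth map into camera-frame 3D points.

    depth: rows of metric depths, depth[v][u] for pixel (u, v).
    Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    Pixels with missing or non-positive depth are skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None or z <= 0:
                continue
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points

depth = [[2.0, 0.0],
         [4.0, 2.0]]
pseudo_points = depth_to_points(depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```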

Prior-guided monocular

Prior-guided approaches leverage object shape priors, geometric priors, segmentation, and temporal constraints to help detect 3D objects.

2022

MonoDistill: Learning Spatial Features for Monocular 3D Object Detection (ICLR 22)
Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection (AAAI 22)
ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection (WACV 22)

2021

Progressive Coordinate Transforms for Monocular 3D Object Detection (NeurIPS 21)

lidar, progressive refine.

Delving into Localization Errors for Monocular 3D Object Detection (CVPR 21)

image, error analysis.

Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection (CVPR 21)

image+depth, depth-conditioned graph.

Monocular 3D Object Detection: An Extrinsic Parameter Free Approach (CVPR 21)

image, extrinsic free.

MonoRUn: Monocular 3D Object Detection by Reconstruction and Uncertainty Propagation (CVPR 21)

lidar, 2d det, shape reconstruction.

GrooMeD-NMS: Grouped Mathematically Differentiable NMS for Monocular 3D Object Detection (CVPR 21)

image, nms.

Categorical Depth Distribution Network for Monocular 3D Object Detection (CVPR 21)

lidar, categorical depth, frustum+voxel.

Objects are Different: Flexible Monocular 3D Object Detection (CVPR 21)

image+depth, edge fusion.

M3DSSD: Monocular 3D Single Stage Object Detector (CVPR 21)

image, anchor, attention.

Exploring Intermediate Representation for Monocular Vehicle Pose Estimation (CVPR 21)

pose only.

AutoShape: Real-Time Shape-Aware Monocular 3D Object Detection (ICCV 21)

image, shape constraint.

Is Pseudo-Lidar needed for Monocular 3D Object detection? (ICCV 21)

image+depth, end-to-end.

The Devil is in the Task: Exploiting Reciprocal Appearance-Localization Features for Monocular 3D Object Detection (ICCV 21)

image+depth, feature disentangle.

FCOS3D: Fully Convolutional One-Stage Monocular 3D Object Detection (ICCVW 21)

image, end-to-end, 2d boxes, centerness.

MonoCInIS: Camera Independent Monocular 3D Object Detection using Instance Segmentation (ICCVW 21)

image, camera-independent, instance segmentation.

Probabilistic and Geometric Depth: Detecting Objects in Perspective (CoRL 21)

image+depth, end-to-end, depth estimation with probability and geo-relation.

Monocular 3D Detection With Geometric Constraint Embedding and Semi-Supervised Training (RA-L 21)

image, geo-reasoning.

Ground-aware Monocular 3D Object Detection for Autonomous Driving (RA-L 21)

image+depth, 3D anchors, ground estimation.

Neighbor-Vote: Improving Monocular 3D Object Detection through Neighbor Distance Voting (ACM MM 21)
Point-Guided Contrastive Learning for Monocular 3-D Object Detection (T-Cybernetics 21)
Lidar Point Cloud Guided Monocular 3D Object Detection (arXiv 21)
OCM3D: Object-Centric Monocular 3D Object Detection (arXiv 21)

lidar, voxel car.

2020

MonoPair: Monocular 3D Object Detection Using Pairwise Spatial Relationships (CVPR 20)

image+depth, objects relations.

Autolabeling 3D Objects with Differentiable Rendering of SDF Shape Priors (CVPR 20)

image, SDF shape estimation.

Learning Depth-Guided Convolutions for Monocular 3D Object Detection (CVPRW 20)

image+depth, depth-guided conv.

SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation (CVPRW 20)

image, keypoints estimation.

RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving (ECCV 20)

image, 9 keypoints estimation.

Distance-Normalized Unified Representation for Monocular 3D Object Detection (ECCV 20)

image, keypoint, center distance.

Monocular 3D Object Detection via Feature Domain Adaptation (ECCV 20)

lidar, real-pseudo DA.

Monocular Differentiable Rendering for Self-Supervised 3D Object Detection (ECCV 20)

image, differentiable rendering, render and compare, depth+det+seg.

Rethinking Pseudo-LiDAR Representation (ECCV 20)

lidar, xyz map, seg.

Kinematic 3D Object Detection in Monocular Video (ECCV 20)

image, video, motion, kalman filter.

Towards Generalization Across Depth for Monocular 3D Object Detection (ECCV 20)

image, categorical distance.

Monocular 3D Object Detection with Decoupled Structured Polygon Estimation and Height-Guided Depth Estimation (AAAI 20)

lidar, polygon estimation, height constraint, 3D box pool on BEV.

Task-Aware Monocular Depth Estimation for 3D Object Detection (AAAI 20)

lidar, foreground-background depth estimation.

MoNet3D: Towards Accurate Monocular 3D Object Localization in Real Time (ICML 20)

image+depth, keypoints.

MonoFENet: Monocular 3D Object Detection With Feature Enhancement Networks (T-IP 20)

image+depth, 2-stage, point 2nd-stage.

Dynamic Depth Fusion and Transformation for Monocular 3D Object Detection (ACCV 20)
IAFA: Instance-aware Feature Aggregation for 3D Object Detection from a Single Image (ACCV 20)
PerMO: Perceiving More at Once from a Single Image for Autonomous Driving (arXiv 20)

2019

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving (CVPR 19)

lidar

GS3D: An Efficient 3D Object Detection Framework for Autonomous Driving (CVPR 19)

image, 2d box + heading, feature extraction on surfaces.

Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction (CVPR 19)

image, local point cloud reconstruction, 2-stage.

ROI-10D: Monocular Lifting of 2D Detection to 6D Pose and Metric Shape (CVPR 19)

image+depth, lifting 8 corners, fitting CAD models.

Deep Fitting Degree Scoring Network for Monocular 3D Object Detection (CVPR 19)

image, anchors, fitting IoU.

Joint Monocular 3D Vehicle Detection and Tracking (ICCV 19)

image, video, depth-order tracking.

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection (ICCV 19)

image, end-to-end, depth-aware convolution, anchors.

Accurate Monocular 3D Object Detection via Color-Embedded 3D Reconstruction for Autonomous Driving (ICCV 19)

lidar, rgb-aug, seg.

Deep Optics for Monocular Depth Estimation and 3D Object Detection (ICCV 19)

lidar, sensor depth.

MonoLoco: Monocular 3D Pedestrian Localization and Uncertainty Estimation (ICCV 19)

image, pedestrian, uncertainty.

Disentangling Monocular 3D Object Detection (ICCV 19)

image, disentangle 2D and 3D det. (paradigm)

Monocular 3D Object Detection with Pseudo-LiDAR Point Cloud (ICCVW 19)

lidar, instance mask, 2d-3d box consistency constraint.

Mono3D++: Monocular 3D Vehicle Detection with Two-Scale 3D Hypotheses and Task Priors (AAAI 19)

image+depth, morphable wireframe model.

MonoGRNet: A Geometric Reasoning Network for Monocular 3D Object Localization (AAAI 19)

image+depth, instance depth, delta, multi-stage.

Orthographic Feature Transform for Monocular 3D Object Detection (BMVC 19)

image, feature transformed to BEV.

Shift R-CNN: Deep Monocular 3D Object Detection with Closed-Form Geometric Constraints (ICIP 19)

image, optimize distance to max iou.

Beyond Bounding Boxes: Using Bounding Shapes for Real-Time 3D Vehicle Detection from Monocular RGB Images (IV 19)
Deep Learning based Vehicle Position and Orientation Estimation via Inverse Perspective Mapping Image (IV 19)
Objects as Points (arXiv 19)

image, centernet.

Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss (arXiv 19)

image, box fitting with optimization.

Monocular 3D Object Detection via Geometric Reasoning on Keypoints (arXiv 19)
RefinedMPL: Refined Monocular PseudoLiDAR for 3D Object Detection in Autonomous Driving (arXiv 19)
Learning 2D to 3D Lifting for Object Detection in 3D for Autonomous Vehicles (arXiv 19)

2018

Multi-Level Fusion based 3D Object Detection from Monocular Images (CVPR 18)

image+depth, 2-stage, point 2nd-stage.

3D-RCNN: Instance-level 3D Object Reconstruction via Render-and-Compare (CVPR 18)

image, shape reconstruction, TSDF, fitting CAD models.

3D Bounding Boxes for Road Vehicles: A One-Stage, Localization Prioritized Approach using Single Monocular Images (ECCVW 18)
The Earth ain’t Flat: Monocular Reconstruction of Vehicles on Steep and Graded Roads from a Moving Camera (IROS 18)
MB-Net: MergeBoxes for Real-Time 3D Vehicles Detection (IV 18)

2017 or earlier

Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image (CVPR 17)

image, parts, shape template.

3D Bounding Box Estimation Using Deep Learning and Geometry (CVPR 17)

image, basic

Subcategory-aware Convolutional Neural Networks for Object Proposals and Detection (WACV 17)

image, sub-category.

Monocular 3D Object Detection for Autonomous Driving (CVPR 16)

image, ground constraint, anchors, energy minimization, seg.

Data-Driven 3D Voxel Patterns for Object Category Recognition (CVPR 15)

image, seg, voxel pattern, shape reconstruction.

Are Cars Just 3D Boxes? – Jointly Estimating the 3D Shape of Multiple Objects (CVPR 14)

image, wireframe 3D shape, ground.

2.2.2. Stereo-based 3D Object Detection

2D-detection based methods first generate a pair of 2D proposals from the left and right images, respectively, and then estimate 3D object parameters from the paired proposals. Pseudo-LiDAR based approaches predict a disparity map by stereo matching, convert the estimated disparity into depth and then into a 3D point cloud, and finally apply a LiDAR-based detector for 3D detection. Volume-based methods construct a 3D feature volume by view transform, and then apply a grid-based 3D object detector to the 3D volume for detection.
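
The disparity-to-depth conversion used by pseudo-LiDAR stereo pipelines follows the rectified-stereo relation depth = focal length × baseline / disparity. The sketch below assumes a rectified pair, a focal length in pixels, and a baseline in meters; the function name and the example values are illustrative.

```python
def disparity_to_depth(disparity, focal_px, baseline_m, min_disparity=1e-6):
    """Convert a disparity map (pixels) to a depth map (meters).

    Pixels whose disparity is not above min_disparity have no valid
    depth and are returned as None.
    """
    return [
        [focal_px * baseline_m / d if d > min_disparity else None for d in row]
        for row in disparity
    ]

depth_map = disparity_to_depth([[36.0, 72.0, 0.0]], focal_px=720.0, baseline_m=0.5)
```

Because depth is inversely proportional to disparity, a fixed disparity error costs far more depth accuracy on distant objects than on nearby ones.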

2022

SIDE: Center-based Stereo 3D Detector with Structure-aware Instance Depth Estimation (WACV 22)

2021

LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector (ICCV 21)
YOLOStereo3D: A Step Back to 2D for Efficient Stereo 3D Detection (ICRA 21)
PLUMENet: Efficient 3D Object Detection from Stereo Images (IROS 21)
Shape Prior Guided Instance Disparity Estimation for 3D Object Detection (T-PAMI 21)

2020

End-to-End Pseudo-LiDAR for Image-Based 3D Object Detection (CVPR 20)
DSGN: Deep Stereo Geometry Network for 3D Object Detection (CVPR 20)
IDA-3D: Instance-Depth-Aware 3D Object Detection from Stereo Vision for Autonomous Driving (CVPR 20)
Disp R-CNN: Stereo 3D Object Detection via Shape Prior Guided Instance Disparity Estimation (CVPR 20)
Pseudo-LiDAR++: Accurate Depth for 3D Object Detection in Autonomous Driving (ICLR 20)
ZoomNet: Part-Aware Adaptive Zooming Neural Network for 3D Object Detection (AAAI 20)
Object-Centric Stereo Matching for 3D Object Detection (ICRA 20)
Confidence Guided Stereo 3D Object Detection with Split Depth Estimation (IROS 20)

2019

Pseudo-LiDAR from Visual Depth Estimation: Bridging the Gap in 3D Object Detection for Autonomous Driving (CVPR 19)
Stereo R-CNN based 3D Object Detection for Autonomous Driving (CVPR 19)
Triangulation Learning Network: From Monocular to Stereo 3D Object Detection (CVPR 19)
Realtime 3D Object Detection for Automated Driving Using Stereo Vision and Semantic Information (ITSC 19)

2017 or earlier

3D Object Proposals using Stereo Imagery for Accurate Object Class Detection (T-PAMI 17)
3D Object Proposals for Accurate Object Class Detection (NIPS 15)

2.2.3. Multi-view-based 3D Object Detection

2022

ImVoxelNet: Image to Voxels Projection for Monocular and Multi-View General-Purpose 3D Object Detection (WACV 22)

2021

DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries (CoRL 21)

2020

siaNMS: Non-Maximum Suppression with Siamese Networks for Multi-Camera 3D Object Detection (IV 20)

2017

3D Object Localisation from Multi-View Image Detections (T-PAMI 17)

2.3. Multi-modal 3D Object Detection

A chronological overview of the most prestigious multi-modal 3D object detection methods.

2.3.1. LiDAR & Camera Fusion for 3D Object Detection

Early Fusion

Early-fusion approaches enhance point cloud features with image information before they are passed through a LiDAR-based 3D object detector. In region-level knowledge fusion, a 2D detector is first applied to images to generate 2D bounding boxes. The 2D boxes are then extruded into viewing frustums to select the point cloud regions used for the subsequent LiDAR-based 3D object detection. In point-level knowledge fusion, semantic segmentation is first applied to images, and the segmentation results are then transferred from image pixels to points and attached to each point as an additional feature. The augmented point cloud is finally passed through a LiDAR detector for 3D object detection.
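
Point-level fusion can be sketched as projecting each point into the image and appending the class scores found at that pixel. The sketch assumes the points are already in the camera frame and uses a pinhole projection with hypothetical intrinsics; dropping points that fall outside the image is a simplification, since a real pipeline might keep them with zero scores.

```python
def paint_points(points, seg_scores, fx, fy, cx, cy):
    """Append per-pixel class scores to each point that projects into the image.

    points: (x, y, z) tuples in the camera frame, z pointing forward.
    seg_scores: seg_scores[v][u] is a list of class scores for pixel (u, v).
    """
    height, width = len(seg_scores), len(seg_scores[0])
    painted = []
    for x, y, z in points:
        if z <= 0:
            continue  # behind the camera
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            painted.append((x, y, z, *seg_scores[v][u]))
    return painted

# Toy 2x2 segmentation output with two classes: [background, car].
seg_scores = [[[0.9, 0.1], [0.8, 0.2]],
              [[0.3, 0.7], [0.1, 0.9]]]
points = [(0.0, 0.0, 5.0), (5.0, 5.0, 5.0), (0.0, 0.0, -1.0), (50.0, 0.0, 5.0)]
painted = paint_points(points, seg_scores, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```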

Intermediate Fusion

Intermediate-fusion approaches conduct multi-modal fusion at the intermediate steps of a 3D object detection pipeline. In backbone networks, pixel-to-point correspondences are first established by the camera-to-LiDAR transform, and LiDAR features are then fused with image features through diverse fusion operators using these correspondences. The fusion can be conducted either at the intermediate layers or only at the output feature maps. In the proposal generation and refinement stage, 3D object proposals are first generated and then projected into the camera and LiDAR views to crop features of different modalities. The multi-view features are finally fused to refine the 3D object proposals for detection.

Late Fusion

Late-fusion approaches operate on the outputs, i.e., the 3D and 2D bounding boxes generated by a LiDAR-based 3D object detector and an image-based 2D object detector, respectively. The 3D and 2D boxes are fused to obtain the final detection results.
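
A minimal sketch of output-level fusion is given below. The fusion rule is a deliberately simple assumption made for illustration (keep a 3D detection only when a 2D detection overlaps its image projection, then average the two scores); it is not the method of any listed paper, several of which learn the fusion instead. The dictionary keys and the precomputed `proj` box are likewise hypothetical.

```python
def iou_2d(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def late_fuse(dets_3d, dets_2d, iou_thresh=0.5):
    """Fuse detector outputs without touching either detector's features.

    dets_3d: dicts with 'box3d', 'proj' (the 3D box projected to a 2D image
             box, assumed precomputed), and 'score'.
    dets_2d: dicts with 'box' and 'score'.
    """
    fused = []
    for d3 in dets_3d:
        # Find the 2D detection that best overlaps this projection.
        best = max(dets_2d, key=lambda d2: iou_2d(d3["proj"], d2["box"]), default=None)
        if best is not None and iou_2d(d3["proj"], best["box"]) >= iou_thresh:
            fused.append({"box3d": d3["box3d"],
                          "score": 0.5 * (d3["score"] + best["score"])})
    return fused

dets_3d = [
    {"box3d": "car_a", "proj": (0.0, 0.0, 10.0, 10.0), "score": 0.6},
    {"box3d": "car_b", "proj": (100.0, 100.0, 110.0, 110.0), "score": 0.9},
]
dets_2d = [{"box": (0.0, 0.0, 10.0, 10.0), "score": 0.8}]
fused = late_fuse(dets_3d, dets_2d)
```

Here the second 3D detection is discarded despite its high score because no 2D detection supports it, which shows how late fusion can suppress false positives from a single modality.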

2022

AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (arXiv 22)
Fast-CLOCs: Fast Camera-LiDAR Object Candidates Fusion for 3D Object Detection (WACV 22)

2021

Multimodal Virtual Point 3D Detection (NeurIPS 21)
PointAugmenting: Cross-Modal Augmentation for 3D Object Detection (CVPR 21)
Frustum-PointPillars: A Multi-Stage Approach for 3D Object Detection using RGB Camera and LiDAR (ICCVW 21)
Multi-Stage Fusion for Multi-Class 3D Lidar Detection (ICCVW 21)
Cross-Modality 3D Object Detection (WACV 21)
Sparse-PointNet: See Further in Autonomous Vehicles (RA-L 21)
FusionPainting: Multimodal Fusion with Adaptive Attention for 3D Object Detection (ITSC 21)
MF-Net: Meta Fusion Network for 3D object detection (IJCNN 21)
Multi-Scale Spatial Transformer Network for LiDAR-Camera 3D Object Detection (IJCNN 21)
Boost 3-D Object Detection via Point Clouds Segmentation and Fused 3-D GIoU-L1 Loss (T-NNLS 21)
RangeLVDet: Boosting 3D Object Detection in LIDAR with Range Image and RGB Image (Sensors Journal 21)
LiDAR Cluster First and Camera Inference Later: A New Perspective Towards Autonomous Driving (arXiv 21)
Exploring Data Augmentation for Multi-Modality 3D Object Detection (arXiv 21)

2020

PointPainting: Sequential Fusion for 3D Object Detection (CVPR 20)
3D-CVF: Generating Joint Camera and LiDAR Features Using Cross-View Spatial Feature Fusion for 3D Object Detection (ECCV 20)
EPNet: Enhancing Point Features with Image Semantics for 3D Object Detection (ECCV 20)
PI-RCNN: An Efficient Multi-Sensor 3D Object Detector with Point-Based Attentive Cont-Conv Fusion Module (AAAI 20)
CLOCs: Camera-LiDAR Object Candidates Fusion for 3D Object Detection (IROS 20)
LRPD: Long Range 3D Pedestrian Detection Leveraging Specific Strengths of LiDAR and RGB (ITSC 20)
Fusion of 3D LIDAR and Camera Data for Object Detection in Autonomous Vehicle Applications (Sensors Journal 20)
SemanticVoxels: Sequential Fusion for 3D Pedestrian Detection using LiDAR Point Cloud and Semantic Segmentation (MFI 20)

2019

Multi-Task Multi-Sensor Fusion for 3D Object Detection (CVPR 19)
Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds (CVPRW 19)
Sensor Fusion for Joint 3D Object Detection and Semantic Segmentation (CVPRW 19)
MVX-Net: Multimodal VoxelNet for 3D Object Detection (ICRA 19)
SEG-VoxelNet for 3D Vehicle Detection from RGB and LiDAR Data (ICRA 19)
3D Object Detection Using Scale Invariant and Feature Reweighting Networks (AAAI 19)
Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection (IROS 19)
Deep End-to-end 3D Person Detection from Camera and Lidar (ITSC 19)
RoarNet: A Robust 3D Object Detection based on RegiOn Approximation Refinement (IV 19)
SCANet: Spatial-channel attention network for 3D object detection (ICASSP 19)
One-Stage Multi-Sensor Data Fusion Convolutional Neural Network for 3D Object Detection (Sensors 19)

2018

Frustum PointNets for 3D Object Detection from RGB-D Data (CVPR 18)
PointFusion: Deep Sensor Fusion for 3D Bounding Box Estimation (CVPR 18)
Deep Continuous Fusion for Multi-Sensor 3D Object Detection (ECCV 18)
Joint 3D Proposal Generation and Object Detection from View Aggregation (IROS 18)
A General Pipeline for 3D Detection of Vehicles (ICRA 18)
Fusing Bird’s Eye View LIDAR Point Cloud and Front View Camera Image for 3D Object Detection (IV 18)
Robust Camera Lidar Sensor Fusion Via Deep Gated Information Fusion Network (IV 18)

2017 or earlier

Multi-View 3D Object Detection Network for Autonomous Driving (CVPR 17)

2.3.2. LiDAR & Other Sensors Fusion for 3D Object Detection

2021

Robust Multimodal Vehicle Detection in Foggy Weather Using Complementary Lidar and Radar Signals (CVPR 21)
CenterFusion: Center-based Radar and Camera Fusion for 3D Object Detection (WACV 21)
Graph Convolutional Networks for 3D Object Detection on Radar Data (ICCVW 21)
3D for Free: Crossmodal Transfer Learning using HD Maps (arXiv 21)
MapFusion: A General Framework for 3D Object Detection with HDMaps (arXiv 21)
Monocular 3D Vehicle Detection Using Uncalibrated Traffic Cameras through Homography (arXiv 21)

2020

What You See is What You Get: Exploiting Visibility for 3D Object Detection (CVPR 20)
RadarNet: Exploiting Radar for Robust Perception of Dynamic Objects (ECCV 20)
High Dimensional Frustum PointNet for 3D Object Detection from Camera, LiDAR, and Radar (IV 20)
Radar-Camera Sensor Fusion for Joint Object Detection and Distance Estimation in Autonomous Vehicles (IROSW 20)

2019

Vehicle Detection With Automotive Radar Using Deep Learning on Range-Azimuth-Doppler Tensors (ICCVW 19)

2018

HDNET: Exploiting HD Maps for 3D Object Detection (CoRL 18)
Sensors and Sensor Fusion in Autonomous Vehicles (TELFOR 18)

2017 or earlier

Deep Learning Based 3D Object Detection for Automotive Radar and Camera (ERC 16)

3. Temporal 3D Object Detection

A chronological overview of the most prestigious temporal 3D object detection methods.

3D object detection from LiDAR sequences

In temporal 3D object detection from LiDAR sequences, diverse temporal aggregation modules are employed to fuse features and object proposals from multi-frame point clouds.
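As a rough illustration of multi-frame input, the sketch below shows the simplest input-level form of temporal fusion: past sweeps are moved into the current sensor frame and each point is tagged with its time offset. This is not one of the learned aggregation modules from the papers listed here. The planar pose format `(tx, ty, yaw)` and the function names are assumptions made for this example.

```python
import math

def to_world(point, pose):
    """Map an (x, y, z) point from a sensor frame into the world frame.
    pose = (tx, ty, yaw) is an assumed planar ego pose."""
    x, y, z = point
    tx, ty, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (c * x - s * y + tx, s * x + c * y + ty, z)

def to_sensor(point, pose):
    """Inverse of to_world: map a world point into the sensor frame at pose."""
    x, y, z = point
    tx, ty, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = x - tx, y - ty
    return (c * dx + s * dy, -s * dx + c * dy, z)

def aggregate_sweeps(sweeps, current_pose, current_time):
    """Merge several sweeps into the current sensor frame.

    sweeps: list of (points, pose, timestamp) tuples (hypothetical layout).
    Returns a list of (x, y, z, dt) where dt is the age of the point.
    """
    merged = []
    for points, pose, timestamp in sweeps:
        dt = current_time - timestamp
        for p in points:
            x, y, z = to_sensor(to_world(p, pose), current_pose)
            merged.append((x, y, z, dt))
    return merged
```

With ego-motion compensation applied this way, a static object observed from two ego positions lands on the same coordinates in the merged cloud, and the extra `dt` channel lets a detector tell old points from new ones.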

3D object detection from streaming data

Detection from streaming data is conducted on each LiDAR packet before the scanner produces a complete sweep.
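The per-packet idea can be sketched as follows. Real packets are defined by the sensor's firing sequence; here equal azimuth sectors stand in for packets, and `detect_fn` is a hypothetical placeholder for any detector that accepts a partial point cloud. Both are assumptions of this example, not the design of any listed method.

```python
import math

def split_into_packets(points, num_packets=4):
    """Bucket (x, y, z) points into equal azimuth sectors (stand-in packets)."""
    packets = [[] for _ in range(num_packets)]
    sector = 2.0 * math.pi / num_packets
    for x, y, z in points:
        azimuth = math.atan2(y, x) % (2.0 * math.pi)
        # Clamp guards against a rounding result equal to num_packets.
        index = min(int(azimuth / sector), num_packets - 1)
        packets[index].append((x, y, z))
    return packets

def stream_detect(points, detect_fn, num_packets=4):
    """Yield (packet_index, detections) as soon as each packet is available,
    instead of waiting for the full 360-degree sweep."""
    for index, packet in enumerate(split_into_packets(points, num_packets)):
        yield index, detect_fn(packet)
```

Because results are yielded packet by packet, downstream consumers can react to the first sector before the last one has been scanned, which is the latency benefit that streaming detectors aim for.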

2022

Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds (AAAI 22)

2021

PolarStream: Streaming Lidar Object Detection and Segmentation with Polar Pillars (NeurIPS 21)
Offboard 3D Object Detection from Point Cloud Sequences (CVPR 21)
3D-MAN: 3D Multi-frame Attention Network for Object Detection (CVPR 21)
4D-Net for Learned Multi-Modal Alignment (ICCV 21)
Graph Neural Network and Spatiotemporal Transformer Attention for 3D Video Object Detection from Point Clouds (T-PAMI 21)
LaserFlow: Efficient and Probabilistic Object Detection and Motion Forecasting (RA-L 21)
VelocityNet: Motion-Driven Feature Aggregation for 3D Object Detection in Point Cloud Sequences (ICRA 21)
RV-FuseNet: Range View Based Fusion of Time-Series LiDAR Data for Joint 3D Object Detection and Motion Forecasting (IROS 21)
LiDAR-based 3D Video Object Detection with Foreground Context Modeling and Spatiotemporal Graph Reasoning (ITSC 21)
Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection for Autonomous Driving (T-CSVT 21)
Auto4D: Learning to Label 4D Objects from Sequential Point Clouds (arXiv 21)

2020

STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction (CVPR 20)
LiDAR-based Online 3D Video Object Detection with Graph-based Message Passing and Spatiotemporal Transformer Attention (CVPR 20)
An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds (ECCV 20)
Streaming Object Detection for 3-D Point Clouds (ECCV 20)
Kinematic 3D Object Detection in Monocular Video (ECCV 20)
STROBE: Streaming Object Detection from LiDAR Packets (CoRL 20)
3D Object Detection and Tracking Based on Streaming Data (ICRA 20)
3D Object Detection For Autonomous Driving Using Temporal Lidar Data (ICIP 20)
Deep SCNN-Based Real-Time Object Detection for Self-Driving Vehicles Using LiDAR Temporal Data (IEEE Access 20)

2019

4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks (CVPR 19)
Joint Monocular 3D Vehicle Detection and Tracking (ICCV 19)

2018

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net (CVPR 18)
YOLO4D: A Spatio-temporal Approach for Real-time Multi-object Detection and Classification from LiDAR Point Clouds (NIPSW 18)

4. Label-Efficient 3D Object Detection

4.1. Domain Adaptation for 3D Object Detection

In real-world applications, 3D object detectors suffer from severe domain gaps across different datasets, sensors, and weather conditions.

2021

Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training (NeurIPS 21)
ST3D: Self-training for Unsupervised Domain Adaptation on 3D Object Detection (CVPR 21)
SRDAN: Scale-aware and Range-aware Domain Adaptation Network for Cross-dataset 3D Object Detection (CVPR 21)
SPG: Unsupervised Domain Adaptation for 3D Object Detection via Semantic Point Generation (ICCV 21)
Unsupervised Domain Adaptive 3D Detection with Multi-Level Consistency (ICCV 21)
PIT: Position-Invariant Transform for Cross-FoV Domain Adaptation (ICCV 21)
FAST3D: Flow-Aware Self-Training for 3D Object Detectors (BMVC 21)
What My Motion tells me about Your Pose: A Self-Supervised Monocular 3D Vehicle Detector (ICRA 21)
Adversarial Training on Point Clouds for Sim-to-Real 3D Object Detection (RA-L 21)
3D for Free: Crossmodal Transfer Learning using HD Maps (arXiv 21)
Uncertainty-aware Mean Teacher for Source-free Unsupervised Domain Adaptive 3D Object Detection (arXiv 21)
Exploiting Playbacks in Unsupervised Domain Adaptation for 3D Object Detection (arXiv 21)
See Eye to Eye: A Lidar-Agnostic 3D Detection Framework for Unsupervised Multi-Target Domain Adaptation (arXiv 21)
Attentive Prototypes for Source-free Unsupervised Domain Adaptive 3D Object Detection (arXiv 21)
Cycle and Semantic Consistent Adversarial Domain Adaptation for Reducing Simulation-to-Real Domain Shift in LiDAR Bird’s Eye View (arXiv 21)

2020

Train in Germany, Test in The USA: Making 3D Object Detectors Generalize (CVPR 20)
SF-UDA3D: Source-Free Unsupervised Domain Adaptation for LiDAR-Based 3D Object Detection (3DV 20)

2019

Transferable Semi-Supervised 3D Object Detection From RGB-D Data (ICCV 19)
Range Adaptation for 3D Object Detection in LiDAR (ICCVW 19)
Domain Adaptation for Vehicle Detection from Bird’s Eye View LiDAR Point Cloud Data (ICCVW 19)
Cross-Sensor Deep Domain Adaptation for LiDAR Detection and Segmentation (IV 19)

4.2. Weakly-supervised 3D Object Detection

Weakly-supervised approaches learn to detect 3D objects with weak supervisory signals.

2022

WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection (ICLR 22)

2021

Towards A Weakly Supervised Framework for 3D Point Cloud Object Detection and Annotation (T-PAMI 21)
FGR: Frustum-Aware Geometric Reasoning for Weakly Supervised 3D Vehicle Detection (ICRA 21)
Open-set 3D Object Detection (arXiv 21)
Lifting 2D Object Locations to 3D by Discounting LiDAR Outliers across Objects and Views (arXiv 21)

2020

Weakly Supervised 3D Object Detection from Lidar Point Cloud (ECCV 20)
Weakly Supervised 3D Object Detection from Point Clouds (ACM MM 20)

2019

Deep Active Learning for Efficient Training of a LiDAR 3D Object Detector (IV 19)
LATTE: Accelerating LiDAR Point Cloud Annotation via Sensor Fusion, One-Click Annotation, and Tracking (ITSC 19)

2018

Leveraging Pre-Trained 3D Object Detection Models For Fast Ground Truth Generation (ITSC 18)

4.3. Semi-supervised 3D Object Detection

Semi-supervised approaches first pretrain a 3D detector on the labeled data, and then use the pre-trained detector to produce pseudo labels or leverage teacher-student models for training on the unlabeled data to further boost the detection performance.
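The two ingredients named above, pseudo-label selection and a teacher-student update, can be sketched in a few lines. The score threshold, the momentum value, and the dictionary layouts are illustrative assumptions, not settings taken from any listed paper.

```python
def select_pseudo_labels(predictions, score_threshold=0.7):
    """Keep only confident detections on unlabeled data as pseudo labels.

    predictions: list of dicts with at least a "score" key (assumed layout).
    """
    return [p for p in predictions if p["score"] >= score_threshold]

def ema_update(teacher_weights, student_weights, momentum=0.99):
    """Exponential moving average update of a teacher from its student.

    Both arguments map parameter names to floats (a toy stand-in for
    real tensors); the teacher drifts slowly toward the student.
    """
    return {
        name: momentum * teacher_weights[name]
        + (1.0 - momentum) * student_weights[name]
        for name in teacher_weights
    }
```

In a typical loop, the teacher predicts on unlabeled scenes, `select_pseudo_labels` filters its output, the student trains on labeled plus pseudo-labeled data, and `ema_update` refreshes the teacher after each step.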

2021

3DIoUMatch: Leveraging IoU Prediction for Semi-Supervised 3D Object Detection (CVPR 21)
SE-SSD: Self-Ensembling Single-Stage Object Detector From Point Cloud (CVPR 21)
Semi-supervised 3D Object Detection via Adaptive Pseudo-Labeling (ICIP 21)
Pseudo-labeling for Scalable 3D Object Detection (arXiv 21)

4.4. Self-supervised 3D Object Detection

Self-supervised approaches first pre-train a 3D detector on the unlabeled data in a self-supervised manner, and then fine-tune the detector on the labeled data.

2022

SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations (AAAI 22)

2021

Exploring Geometry-aware Contrast and Clustering Harmonization for Self-supervised 3D Object Detection (ICCV 21)
Self-Supervised Pretraining of 3D Features on any Point-Cloud (ICCV 21)

2020

PointContrast: Unsupervised Pre-training for 3D Point Cloud Understanding (ECCV 20)

5. 3D Object Detection in Driving Systems

5.1. End-to-end Learning for Autonomous Driving

End-to-end autonomous driving aims to integrate all tasks in autonomous driving, e.g. perception, prediction, planning, control, mapping, localization, into a unified framework and learn these tasks in an end-to-end manner.

2022

Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception (ICLR 22)

2021

MP3: A Unified Model to Map, Perceive, Predict and Plan (CVPR 21)
Deep Multi-Task Learning for Joint Localization, Perception, and Prediction (CVPR 21)
LookOut: Diverse Multi-Future Prediction and Planning for Self-Driving (ICCV 21)
LaserFlow: Efficient and Probabilistic Object Detection and Motion Forecasting (RA-L 21)
Perceive, Attend, and Drive: Learning Spatial Attention for Safe Self-Driving (ICRA 21)

2020

PnPNet: End-to-End Perception and Prediction with Tracking in the Loop (CVPR 20)
MotionNet: Joint Perception and Motion Prediction for Autonomous Driving Based on Bird’s Eye View Maps (CVPR 20)
STINet: Spatio-Temporal-Interactive Network for Pedestrian Detection and Trajectory Prediction (CVPR 20)
DSDNet: Deep Structured self-Driving Network (ECCV 20)
Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction (ECCV 20)
Perceive, Predict, and Plan: Safe Motion Planning Through Interpretable Semantic Representations (ECCV 20)
End-to-end Contextual Perception and Prediction with Interaction Transformer (IROS 20)
PointTrackNet: An End-to-End Network For 3-D Object Detection and Tracking From Point Clouds (RA-L 20)
Multimodal End-to-End Autonomous Driving (T-ITS 20)
Tracking to Improve Detection Quality in Lidar For Autonomous Driving (ICASSP 20)

2019

Monocular Plan View Networks for Autonomous Driving (IROS 19)

2018

Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net (CVPR 18)
IntentNet: Learning to Predict Intention from Raw Sensor Data (CoRL 18)
End-to-end Driving via Conditional Imitation Learning (IROS 18)
Learning to Drive in a Day (arXiv 18)

2017 or earlier

End to End Learning for Self-Driving Cars (arXiv 16)

5.2. Simulation for Autonomous Driving

2021

GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving (CVPR 21)
TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors (CVPR 21)
AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles (CVPR 21)
DriveGAN: Towards a Controllable High-Quality Neural Simulation (CVPR 21)
SceneGen: Learning to Generate Realistic Traffic Scenes (CVPR 21)
LiDAR-Aug: A General Rendering-based Augmentation Framework for 3D Object Detection (CVPR 21)
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather (ICCV 21)
There and Back Again: Learning to Simulate Radar Data for Real-World Applications (ICRA 21)
Learning to Drop Points for LiDAR Scan Synthesis (IROS 21)
VISTA 2.0: An Open, Data-driven Simulator for Multimodal Sensing and Policy Learning for Autonomous Vehicles (arXiv 21)
Generating Useful Accident-Prone Driving Scenarios via a Learned Traffic Prior (arXiv 21)

2020

LiDARsim: Realistic LiDAR Simulation by Leveraging the Real World (CVPR 20)
SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving (CVPR 20)
Testing the Safety of Self-driving Vehicles by Simulating Perception and Prediction (ECCV 20)
Learning Robust Control Policies for End-to-End Autonomous Driving from Data-Driven Simulation (RA-L 20)
Augmented LiDAR Simulator for Autonomous Driving (RA-L 20)

2019

Deep Generative Modeling of LiDAR Data (IROS 19)
AADS: Augmented Autonomous Driving Simulation using Data-driven Algorithms (Science Robotics 19)
Precise Synthetic Image and LiDAR (PreSIL) Dataset for Autonomous Vehicle Perception (IV 19)

2018

Gibson Env: Real-World Perception for Embodied Agents (CVPR 18)
Off-Road Lidar Simulation with Data-Driven Terrain Primitives (ICRA 18)
Interaction-Aware Probabilistic Behavior Prediction in Urban Environments (IROS 18)

2017 or earlier

CARLA: An Open Urban Driving Simulator (CoRL 17)
AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles (FSR 17)
The SYNTHIA Dataset: A Large Collection of Synthetic Images for Semantic Segmentation of Urban Scenes (CVPR 16)
Augmented Reality Meets Computer Vision : Efficient Data Generation for Urban Driving Scenes (arXiv 17)

5.3. Reliability & Robustness for 3D Object Detection

2021

AdvSim: Generating Safety-Critical Scenarios for Self-Driving Vehicles (CVPR 21)
Fooling LiDAR Perception via Adversarial Trajectory Perturbation (ICCV 21)
Fog Simulation on Real LiDAR Point Clouds for 3D Object Detection in Adverse Weather (ICCV 21)
Invisible for both Camera and LiDAR: Security of Multi-Sensor Fusion based Perception in Autonomous Driving Under Physical-World Attacks (S&P 21)
Can We Use Arbitrary Objects to Attack LiDAR Perception in Autonomous Driving? (CCS 21)
Exploring Adversarial Robustness of Multi-sensor Perception Systems in Self Driving (CoRL 21)
Lidar Light Scattering Augmentation (LISA): Physics-based Simulation of Adverse Weather Conditions for 3D Object Detection (arXiv 21)
3D-VField: Learning to Adversarially Deform Point Clouds for Robust 3D Object Detection (arXiv 21)

2020

Physically Realizable Adversarial Examples for LiDAR Object Detection (CVPR 20)
Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather (CVPR 20)
Learning an Uncertainty-Aware Object Detector for Autonomous Driving (IROS 20)
Inferring Spatial Uncertainty in Object Detection (IROS 20)
Towards Better Performance and More Explainable Uncertainty for 3D Object Detection of Autonomous Vehicles (ITSC 20)
Towards Robust LiDAR-based Perception in Autonomous Driving: General Black-box Adversarial Sensor Attack and Countermeasures (USENIX Security 20)

2019

Robustness of 3D Deep Learning in an Adversarial Setting (CVPR 19)
Identifying Unknown Instances for Autonomous Driving (CoRL 19)
Leveraging Heteroscedastic Aleatoric Uncertainties for Robust Real-Time LiDAR 3D Object Detection (IV 19)
LiDAR Data Integrity Verification for Autonomous Vehicle (IEEE Access 19)

2018

Towards Safe Autonomous Driving: Capture Uncertainty in the Deep Neural Network For Lidar 3D Vehicle Detection (ITSC 18)

5.4. Cooperative 3D Object Detection

In collaborative 3D object detection, different vehicles can communicate with each other to obtain more reliable detection results.
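One simple way to combine information across vehicles is late fusion of detections: a sender shares box centers with scores, the ego vehicle maps them into its own frame, and near-duplicates are merged by keeping the higher score. The sketch below assumes planar poses `(tx, ty, yaw)`, 2D centers, and a fixed merge radius, all chosen for illustration. Methods in this section may instead share raw points or intermediate features.

```python
import math

def transform_center(center, sender_pose, ego_pose):
    """Map a 2D center from the sender frame to the ego frame via the world
    frame. Poses are (tx, ty, yaw) and are assumed to be known exactly."""
    x, y = center
    tx, ty, yaw = sender_pose
    wx = math.cos(yaw) * x - math.sin(yaw) * y + tx
    wy = math.sin(yaw) * x + math.cos(yaw) * y + ty
    ex, ey, eyaw = ego_pose
    dx, dy = wx - ex, wy - ey
    return (math.cos(eyaw) * dx + math.sin(eyaw) * dy,
            -math.sin(eyaw) * dx + math.cos(eyaw) * dy)

def fuse_detections(ego_dets, shared_dets, sender_pose, ego_pose,
                    merge_radius=1.0):
    """Late fusion: pool detections, then greedily keep the highest-scoring
    one within merge_radius. Each detection is (center, score)."""
    candidates = list(ego_dets)
    for center, score in shared_dets:
        moved = transform_center(center, sender_pose, ego_pose)
        candidates.append((moved, score))
    candidates.sort(key=lambda d: d[1], reverse=True)
    kept = []
    for center, score in candidates:
        if all(math.dist(center, k[0]) > merge_radius for k in kept):
            kept.append((center, score))
    return kept
```

The fused list can contain objects the ego vehicle never saw, for example ones occluded from its viewpoint but visible to the sender. The sketch ignores pose error and communication delay, which several of the papers below address explicitly.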

2021

Learning to Communicate and Correct Pose Errors (CoRL 21)
Learning Distilled Collaboration Graph for Multi-Agent Perception (NeurIPS 21)
Data Fusion with Split Covariance Intersection for Cooperative Perception (ITSC 21)
CoFF: Cooperative Spatial Feature Fusion for 3-D Object Detection on Autonomous Vehicles (IoT-J 21)
EMP: Edge-assisted Multi-vehicle Perception (MobiCom 21)
OPV2V: An Open Benchmark Dataset and Fusion Pipeline for Perception with Vehicle-to-Vehicle Communication (arXiv 21)

2020

When2com: Multi-Agent Perception via Communication Graph Grouping (CVPR 20)
V2VNet: Vehicle-to-Vehicle Communication for Joint Perception and Prediction (ECCV 20)
Who2com: Collaborative Perception via Learnable Handshake Communication (ICRA 20)
MLOD: Awareness of Extrinsic Perturbation in Multi-LiDAR 3D Object Detection for Autonomous Driving (IROS 20)
Cooperative Perception for 3D Object Detection in Driving Scenarios Using Infrastructure Sensors (T-ITS 20)

2019

Cooper: Cooperative Perception for Connected Autonomous Vehicles based on 3D Point Clouds (ICDCS 19)
F-Cooper: Feature based Cooperative Perception for Autonomous Vehicle Edge Computing System Using 3D Point Clouds (SEC 19)
Automatic Vehicle Tracking With Roadside LiDAR Data for the Connected-Vehicles System (IS 19)
Detection and tracking of pedestrians and vehicles using roadside LiDAR sensors (Transport 19)

2018

Collaborative Automated Driving: A Machine Learning-based Method to Enhance the Accuracy of Shared Information (ITSC 18)

2017 or earlier

Multivehicle Cooperative Driving Using Cooperative Perception: Design and Experimental Validation (T-ITS 15)
Car2X-Based Perception in a High-Level Fusion Architecture for Cooperative Perception Systems (IV 12)
V2V Communications in Automotive Multi-sensor Multi-target Tracking (VTC 08)

6. Continue Reading

CVPR 2022

Point2Seq: Detecting 3D Objects as Sequences (CVPR 22)
HyperDet3D: Learning a Scene-Conditioned 3D Object Detector (CVPR 22)
Exploring Geometry Consistency for monocular 3D object detection (CVPR 22)
MonoJSG: Joint Semantic and Geometric Cost Volume for Monocular 3D Object Detection (CVPR 22)
LiDAR Snowfall Simulation for Robust 3D Object Detection (CVPR 22)
Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection (CVPR 22)
Leveraging Object-Level Rotation Equivariance for 3D Object Detection (CVPR 22)
Rope3D: Take A New Look from the 3D Roadside Perception Dataset for Autonomous Driving and Monocular 3D Object Detection Task (CVPR 22)
Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection (CVPR 22)
OccAM’s Laser: Occlusion-based Attribution Maps for 3D Object Detectors on LiDAR Data (CVPR 22)
RBGNet: Ray-based Grouping for 3D Object Detection (CVPR 22)
MonoGround: Detecting Monocular 3D Objects from the Ground (CVPR 22)
Voxel Field Fusion for 3D Object Detection (CVPR 22)
Dimension Embeddings for Monocular 3D Object Detection (CVPR 22)
Embracing Single Stride 3D Object Detector with Sparse Transformer (CVPR 22)
Focal Sparse Convolutional Networks for 3D Object Detection (CVPR 22)
TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers (CVPR 22)
VISTA: Boosting 3D Object Detection via Dual Cross-VIew SpaTial Attention (CVPR 22)
MonoDTR: Monocular 3D Object Detection with Depth-Aware Transformer (CVPR 22)
Homography Loss for Monocular 3D Object Detection (CVPR 22)
Point2Cyl: Reverse Engineering 3D Objects – from Point Clouds to Extrusion Cylinders (CVPR 22)
SS3D: Sparsely-Supervised 3D Object Detection from Point Cloud (CVPR 22)
LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection (CVPR 22)
Pseudo-Stereo for Monocular 3D Object Detection in Autonomous Driving (CVPR 22)
Bridged Transformer for Vision and Point Cloud 3D Object Detection (CVPR 22)
Voxel Set Transformer: A Set-to-Set Approach to 3D Object Detection from Point Clouds (CVPR 22)
Boosting 3D Object Detection by Simulating Multimodality on Point Clouds (CVPR 22)
Time3D: End-to-End Joint Monocular 3D Object Detection and Tracking for Autonomous Driving (CVPR 22)
A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation (CVPR 22)
3D-VField: Learning to Adversarially Deform Point Clouds for Robust 3D Object Detection (CVPR 22)
Point Density-Aware Voxels for LiDAR 3D Object Detection (CVPR 22)
DAIR-V2X: A Large-Scale Dataset for Vehicle-Infrastructure Cooperative 3D Object Detection (CVPR 22)
CAT-Det: Contrastively Augmented Transformer for Multi-modal 3D Object Detection (CVPR 22)

2021

WeakM3D: Towards Weakly Supervised Monocular 3D Object Detection (ICLR 22)
Hindsight is 20/20: Leveraging Past Traversals to Aid 3D Perception (ICLR 22)
MonoDistill: Learning Spatial Features for Monocular 3D Object Detection (ICLR 22)
Behind the Curtain: Learning Occluded Shapes for 3D Object Detection (AAAI 22)
SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object Detection (AAAI 22)
Joint 3D Object Detection and Tracking Using Spatio-Temporal Representation of Camera Image and LiDAR Point Clouds (AAAI 22)
SimIPU: Simple 2D Image and 3D Point Cloud Unsupervised Pre-Training for Spatial-Aware Visual Representations (AAAI 22)
Embracing Single Stride 3D Object Detector with Sparse Transformer (arXiv 21)
AFDetV2: Rethinking the Necessity of the Second Stage for Object Detection from Point Clouds (arXiv 21)
AutoAlign: Pixel-Instance Feature Aggregation for Multi-Modal 3D Object Detection (arXiv 22)
Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection (AAAI 22)
