3D Perception and Motion Prediction with Point Cloud Learning in Autonomous Driving

The Hong Kong University of Science and Technology
Department of Computer Science and Engineering


PhD Thesis Defence


Title: "3D Perception and Motion Prediction with Point Cloud Learning in 
Autonomous Driving"

By

Mr. Maosheng YE


Abstract:

A 3D perception system is an essential component of robotics, especially of 
autonomous driving systems. 3D segmentation and motion prediction are crucial 
subtasks of the perception system, providing fine-grained scene understanding 
and forecasting. Point clouds are the primary data structure for 3D 
segmentation and 3D object detection in perception, and many point cloud 
processing algorithms have been proposed for fine-grained LiDAR segmentation 
based on different representations. However, each representation has its own 
pros and cons. Multi-representation learning is therefore a common framework 
for fusing the merits of multiple representations and striking a balance among 
performance, efficiency, and memory usage. While the goal is clear, designing 
a better and more efficient multi-representation framework remains 
challenging, since it depends on point cloud properties such as sparsity, 
irregularity, and the large number of points in autonomous driving scenarios.
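As a rough illustration of the two representations involved, the toy snippet 
below (PyTorch; the function name and interface are hypothetical and chosen 
for illustration only) converts a point cloud into a sparse voxel grid by 
averaging the features of the points that fall into each cell. It is a 
minimal sketch of voxelization, not the implementation used in the thesis.

    import torch

    def voxelize_mean(points, voxel_size=0.2):
        """Toy point-to-voxel conversion: average the features of all points
        that fall into the same voxel cell (hypothetical helper)."""
        coords = torch.floor(points[:, :3] / voxel_size).long()   # (N, 3) cell indices
        uniq, inverse = torch.unique(coords, dim=0, return_inverse=True)
        feats = torch.zeros(uniq.shape[0], points.shape[1])
        counts = torch.zeros(uniq.shape[0], 1)
        feats.index_add_(0, inverse, points)                      # sum features per cell
        counts.index_add_(0, inverse, torch.ones(points.shape[0], 1))
        return uniq, feats / counts.clamp(min=1)                  # voxel coords, mean features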

This thesis studies multi-representation point cloud learning in the 3D 
perception system to design efficient network structures for demanding 
applications. For LiDAR segmentation, we utilize the point representation and 
the voxel representation in a unified and efficient manner, and propose 
hierarchical learning in both the pointwise and voxelwise branches. 
Furthermore, we propose the voxel-as-point principle to better exploit the 
sparsity and scale invariance of the point cloud and to reduce the memory 
cost incurred by the point representation. We also design an attentive 
scale-selection layer based on an attention mechanism that fuses multi-scale 
information.
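The attentive scale-selection idea can be sketched as follows (PyTorch; the 
class name, feature shapes, and the small scoring MLP are assumptions for 
illustration): per-point features extracted at several voxel scales are 
weighted by a learned softmax score and summed. The actual layer in the 
thesis may differ in its details.

    import torch
    import torch.nn as nn

    class AttentiveScaleSelection(nn.Module):
        """Fuse per-point features from several voxel scales with a learned
        attention weight per scale (illustrative sketch)."""

        def __init__(self, channels: int, num_scales: int):
            super().__init__()
            # Small MLP that scores each scale from the concatenated features.
            self.score = nn.Sequential(
                nn.Linear(channels * num_scales, channels),
                nn.ReLU(inplace=True),
                nn.Linear(channels, num_scales),
            )

        def forward(self, scale_feats):
            # scale_feats: list of (N, C) tensors, one per voxel scale.
            stacked = torch.stack(scale_feats, dim=1)               # (N, S, C)
            weights = self.score(stacked.flatten(1))                # (N, S)
            weights = torch.softmax(weights, dim=1).unsqueeze(-1)   # (N, S, 1)
            return (weights * stacked).sum(dim=1)                   # (N, C)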

In addition, we extend these networks to the downstream task of motion 
prediction, which also processes sparse, structured input that can be viewed 
as a special kind of temporal point cloud; we name the resulting temporal 
point cloud network TPCN. Ours is the first work to combine point cloud 
learning with motion forecasting. To enhance spatial-temporal robustness 
under slight disturbances, we propose Dual Consistency Constraints, which 
regularize the predicted trajectories under perturbation during training. We 
extensively study the efficacy of the Dual Consistency Constraints when 
applied to other state-of-the-art methods and demonstrate their effectiveness 
as a plug-in component.
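A minimal sketch of such a consistency regularizer is given below (PyTorch; 
the model interface, noise level, and equal weighting of the two terms are 
assumptions): the trajectories predicted for a slightly perturbed input are 
encouraged to stay close to the original predictions, both in absolute 
positions and in per-step displacements. The exact formulation of the Dual 
Consistency Constraints in the thesis may differ.

    import torch
    import torch.nn.functional as F

    def dual_consistency_loss(model, inputs, spatial_noise=0.1):
        """Illustrative consistency regularizer: predictions for a slightly
        perturbed scene should stay close to the original predictions."""
        pred = model(inputs)                                  # (B, T, 2) future waypoints
        noisy = inputs + spatial_noise * torch.randn_like(inputs)
        pred_noisy = model(noisy)
        # Spatial consistency: positions should not drift under perturbation.
        spatial_term = F.smooth_l1_loss(pred_noisy, pred.detach())
        # Temporal consistency: per-step displacements should match as well.
        temporal_term = F.smooth_l1_loss(
            pred_noisy[:, 1:] - pred_noisy[:, :-1],
            (pred[:, 1:] - pred[:, :-1]).detach(),
        )
        return spatial_term + temporal_term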


Date:                   Thursday, 11 April 2024

Time:                   4:30pm - 6:30pm

Venue:                  Room 5501
                        Lifts 25/26

Chairman:               Prof. Chik Patrick YUE (ECE)

Committee Members:      Prof. Qifeng CHEN (Supervisor)
                        Prof. Dan XU
                        Prof. Pedro SANDER
                        Prof. Ling SHI (ECE)
                        Prof. Dong XU (HKU)