Neighboring feature pooling | PointNet[11] | 2017 | Using a single symmetric function, max pooling | 3D object classification, part segmentation, scene semantic parsing | ModelNet40 | 89.2 | 86.2 | — |
ShapeNet | — | — | 83.7 |
S3DIS | 78.62 | — | 47.71 |
PointNet++[60] | 2017 | Processing a set of points sampled in a metric space in a hierarchical fashion | Processing point sets sampled in a metric space | ModelNet40 | 91.9 | — | — |
ShapeNet | — | — | 85.1 |
PointSIFT[61] | 2018 | Stacking several orientation-encoding units to achieve multi-scale representation | Improving 3D shape representation | S3DIS | 88.72 | — | 70.23 |
SO-Net[62] | 2018 | Building a self-organizing map (SOM) to model the spatial distribution of point clouds | Point cloud reconstruction, classification, object part segmentation and shape retrieval | ModelNet40 | 93.4 | — | — |
ShapeNet | — | — | 84.6 |
3DMAX-Net[63] | 2018 | Multi-scale contextual feature learning, local and global feature aggregation | 3D semantic segmentation on large-scale point clouds | S3DIS | 79.5 | — | 47.5 |
PointWeb[64] | 2019 | Using an adaptive feature adjustment (AFA) module to find the interaction between points | 3D point cloud segmentation and classification | ModelNet40 | 92.3 | 89.4 | — |
S3DIS | 86.97 | 66.64 | 60.28 |
In Ref. [65] | 2020 | Learning local point features from point clouds at different resolutions | Point cloud analysis | ModelNet40 | 93.1 | — | — |
ShapeNet | — | — | 86.4 |
ScanObjectNN | 80.3 | — | — |
RandLA-Net[66] | 2020 | Using random point sampling instead of more complex point selection approaches | 3D semantic segmentation on large-scale point clouds | Semantic3D | 94.8 | — | 77.4 |
KITTI | — | — | 53.9 |
Graph-based methods | In Ref. [70] | 2017 | Performing convolutions over local graph neighborhoods exploiting edge labels | Graph classification | ModelNet40 | — | 87.4 | — |
ModelNet10 | — | 90.8 | — |
SPG[71] | 2018 | Capturing the organization of 3D point clouds by a superpoint graph (SPG) | 3D semantic segmentation on large-scale point clouds | Semantic3D | 92.9 | — | 76.2 |
S3DIS | 85.5 | 73.0 | 62.1 |
Category | Method | Year | Key idea | Application scenario | Dataset | OA /% | MA /% | mIoU /% |
Graph-based methods | SpecGCN[12] | 2018 | Leveraging the power of spectral graph CNNs in the PointNet++ framework while adopting a different pooling strategy | 3D point cloud segmentation and classification | ModelNet40 | 91.5 | — | — |
ShapeNet | — | — | 84.6 |
RGCNN[74] | 2018 | Adding a graph-signal smoothness prior to the loss function | 3D point cloud segmentation and classification | ModelNet40 | 90.5 | 87.3 | — |
ShapeNet | — | — | 84.3 |
DGCNN[75] | 2018 | Using EdgeConv to capture and exploit fine-grained geometric properties of point clouds | 3D point cloud segmentation and classification | ModelNet40 | 92.2 | 90.2 | — |
ShapeNet | — | — | 85.1 |
LDGCNN[76] | 2019 | Removing the transformation network; linking hierarchical features from different dynamic graphs | 3D point cloud segmentation and classification | ModelNet40 | 92.9 | 90.3 | — |
ShapeNet | — | — | 85.1 |
In Ref. [72] | 2019 | Using a simple point embedding network and a new graph-structured loss function | 3D semantic segmentation on large-scale point clouds | S3DIS | 87.9 | 78.3 | 68.4 |
vKITTI | 84.3 | 67.3 | 52.0 |
In Ref. [77] | 2019 | Stacking DPAM modules to gradually agglomerate points | 3D point cloud segmentation and classification | ModelNet40 | 91.9 | — | — |
ModelNet10 | 94.6 | — | — |
ShapeNet | — | — | 86.1 |
S3DIS | — | — | 64.5 |
HDGCN[79] | 2019 | Combining the hierarchical structure and the DGConv block to extract both local and global features of point clouds hierarchically | 3D semantic segmentation (indoor and outdoor scenes) | S3DIS | — | 76.11 | 66.85 |
Paris-Lille-3D | — | — | 68.30 |
PointNGCNN[80] | 2020 | Using the Chebyshev polynomials as the graph filters to extract features in the neighborhood of each point | Capturing the potential geometric information of 3D objects | ModelNet40 | 92.8 | — | — |
ShapeNet | — | — | 85.6 |
S3DIS | 87.3 | — | — |
ScanNet | 84.9 | — | — |
LKPO-GNN[81] | 2020 | Using LKPO-GNN to select multi-directional k-NNs to form the local topological structure of a centroid | Obtaining deeper feature representation | ModelNet40 | 91.4 | 88.9 | — |
ShapeNet | — | — | 85.6 |
S3DIS | 85.8 | — | 64.6 |
ScanNet | 85.3 | — | 58.4 |
CPL-Net[82] | 2020 | Using a critical points layer (CPL) to reduce the number of points in an unordered point cloud and retain the important (critical) ones | 3D object classification | ModelNet40 | 92.41 | 90.53 | — |
Kernel-based convolution | Pointwise CNN[13] | 2017 | Pointwise convolution which can be applied at each point in a point cloud to learn point-wise features | 3D semantic segmentation and object recognition | S3DIS | — | 74.1 | — |
PointCNN[83] | 2018 | X-Conv; weighting and permuting input points and features before they are processed by a typical convolution | Leveraging spatially-local correlation from data represented in point cloud | ModelNet40 | 92.5 | 88.8 | — |
S3DIS | — | — | 65.39 |
ShapeNet | — | — | 84.6 |
ScanNet | 79.7 | 55.7 | — |
PCCN[91] | 2018 | Exploiting parameterized kernel functions which span the full continuous vector space | Point cloud segmentation (indoor and outdoor scenes), lidar motion estimation of driving scenes | Stanford Large-Scale 3D Indoor Scene Dataset | — | 67.01 | 58.27 |
Driving Scenes Dataset | 95.45 | — | 58.06 |
SpiderCNN[92] | 2018 | SpiderConv; extending convolutional operations from regular grids to irregular point sets | 3D point cloud segmentation and classification | ModelNet40 | 92.4 | — | — |
ShapeNet | — | — | 85.3 |
GeoCNN[84] | 2019 | GeoConv; modeling the geometric structure of points by a decomposition and aggregation method based on vector decomposition | 3D shape classification, segmentation and object detection | ModelNet40 | 93.9 | 91.6 | — |
InterpCNN[85] | 2019 | InterpConv; using discrete convolutional kernels and an interpolation function to explicitly measure geometric relations between input point clouds and kernel-weight coordinates | 3D shape classification, object part segmentation and indoor scene semantic parsing | ModelNet40 | 93.0 | — | — |
S3DIS | 88.7 | — | 66.7 |
ShapeNet | — | — | 86.3 |
A-CNN[86] | 2019 | Capturing the local neighborhood geometry of each point by specifying the (regular and dilated) ring-shaped structures and directions in the computation | Object classification, part segmentation, and semantic segmentation in large-scale scenes | ModelNet40 | 92.6 | 90.3 | — |
ModelNet10 | 95.5 | 95.3 | — |
S3DIS | 87.3 | — | — |
ShapeNet | — | — | 86.1 |
ScanNet | 85.4 | — | — |
In Ref. [88] | 2019 | RIConv; using low-level rotation invariant geometric features such as distances and angles to design a convolution operator for point cloud learning | 3D object classification and segmentation | ModelNet40 | 86.5 | — | — |
ShapeNet | — | — | 75.5 |
Kernel-based convolution | In Ref. [93] | 2019 | PointConv; taking the positions of point clouds as input and learning an MLP to approximate a weight function, then applying an inverse density scale on the learned weights to compensate for non-uniform sampling | 3D semantic segmentation; convolutional networks in 2D images of a similar structure | ModelNet40 | 92.5 | — | — |
ShapeNet | — | — | 85.7 |
ScanNet | — | — | 55.6 |
In Ref. [95] | 2019 | KPConv; using a set of kernel points to define the area where each kernel weight is applied | Adapting to the geometry of the scene objects | ModelNet40 | 92.9 | — | — |
ShapeNet | — | — | 86.4 |
SPHNet[99] | 2019 | SPHConv; employing a spherical harmonics based kernel at different layers of the network | 3D shape deep learning tasks | ModelNet40 | 87.7 | — | — |
ConvPoint[100] | 2019 | Using continuous convolution and a hierarchical data representation structure based on a search tree | Large scale indoor and outdoor semantic segmentation | ModelNet40 | 92.5 | 89.6 | — |
ShapeNet | — | — | 85.8 |
RS-CNN[96] | 2020 | RS-Conv; learning from the geometric topology constraint among points | Encoding meaningful shape information in 3D point cloud | ModelNet40 | 93.6 | — | — |
ShapeNet | — | — | 86.2 |
Attention-based methods | A-SCN[113] | 2018 | Adopting shape context as the basic building block acting like convolution in CNN | 3D point cloud classification and segmentation | ShapeNet | — | — | 84.6 |
PAN[109] | 2018 | Combining Feature Pyramid Attention (FPA) module and Global Attention Upsample (GAU) | 3D point cloud semantic segmentation (urban scenes) | PASCAL VOC 2012 | 95.7 | — | 84.0 |
Cityscapes | — | — | 78.6 |
PyramNet[110] | 2019 | Combining Graph Embedding Module (GEM) and Pyramid Attention Network (PAN) | 3D object classification and semantic segmentation | ModelNet40 | 91.5 | 88.3 | — |
S3DIS | 85.6 | — | 55.6 |
ShapeNet | — | — | 83.9 |
GAPNet[107] | 2019 | GAPLayer; embedding graph attention mechanism within stacked Multi-Layer-Perceptron (MLP) layers to learn local geometric representations | 3D shape classification and part segmentation | ModelNet40 | 92.4 | 89.7 | — |
ShapeNet | — | — | 84.7 |
Attention-based methods | PAT[108] | 2019 | Using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention; Gumbel Subset Sampling (GSS) | Hierarchical multiple instance learning | ModelNet40 | 91.7 | — | — |
S3DIS | — | — | 64.28 |
LSANet[111] | 2019 | Generating Spatial Distribution Weights (SDWs) hierarchically based on the spatial relationship in local region for spatial independent operations | 3D object classification, part segmentation, and semantic segmentation | ModelNet40 | 92.3 | 89.2 | — |
S3DIS | 86.8 | — | 62.2 |
ShapeNet | — | — | 83.2 |
ScanNet | 85.1 | — | — |
In Ref. [114] | 2019 | Attention-based score refinement (ASR) module | Improving the segmentation accuracy | ShapeNet | — | — | 85.6 |
GACNet[106] | 2020 | Assigning proper attentional weights to different neighboring points | 3D point cloud semantic segmentation | Semantic3D | 91.9 | — | 70.8 |
S3DIS | 87.79 | — | 62.85 |
In Ref. [115] | 2020 | Local Attention-Edge Convolution (LAE-Conv); constructing a local graph based on the neighborhood points searched in multi-directions | Predicting dense labels for 3D point cloud segmentation | S3DIS | 88.95 | 66.3 | |
ShapeNet | — | — | 85.9 |
ScanNet | 88.6 | — | 46.9 |
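
The three accuracy columns above (OA, MA, mIoU) are not defined in the table itself. As a minimal sketch, assuming the standard definitions (overall accuracy, mean per-class accuracy, and mean intersection over union), they can be computed from a confusion matrix as follows. The function name `segmentation_metrics` and the toy matrix are hypothetical and only illustrative; individual papers may differ in detail, for example in how mIoU is averaged over ShapeNet categories or instances.

```python
def segmentation_metrics(conf):
    """Return (OA, MA, mIoU) as fractions in [0, 1].

    conf[i][j] is the number of samples whose true class is i and whose
    predicted class is j. This layout is an assumption of the sketch.
    """
    n = len(conf)
    total = sum(sum(row) for row in conf)
    correct = sum(conf[i][i] for i in range(n))
    oa = correct / total  # overall accuracy: correct samples / all samples

    class_acc, ious = [], []
    for i in range(n):
        tp = conf[i][i]
        n_true = sum(conf[i])                       # samples of true class i
        n_pred = sum(conf[j][i] for j in range(n))  # samples predicted as i
        if n_true > 0:                              # skip classes absent from the data
            class_acc.append(tp / n_true)
        union = n_true + n_pred - tp
        if union > 0:
            ious.append(tp / union)

    ma = sum(class_acc) / len(class_acc)  # mean per-class accuracy
    miou = sum(ious) / len(ious)          # mean intersection over union
    return oa, ma, miou


# Toy two-class example with imbalanced classes.
oa, ma, miou = segmentation_metrics([[18, 2], [1, 4]])
print(f"OA={100 * oa:.1f}%  MA={100 * ma:.1f}%  mIoU={100 * miou:.1f}%")
```

On imbalanced data the three numbers diverge: OA is dominated by the frequent classes, whereas MA and mIoU weight every class equally. This is one reason a method can report a high OA and a much lower mIoU on the same dataset.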