• Chinese Journal of Lasers
  • Vol. 51, Issue 5, 0509001 (2024)
Yiquan Wu*, Huixian Chen, and Yao Zhang
Author Affiliations
  • College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
    DOI: 10.3788/CJL230924
    Citation: Yiquan Wu, Huixian Chen, Yao Zhang. Review of 3D Point Cloud Processing Methods Based on Deep Learning[J]. Chinese Journal of Lasers, 2024, 51(5): 0509001
    Fig. 1. PointCleanNet framework[35]
    Fig. 2. Development route of deep learning methods commonly used in four point cloud processing tasks
    | Performance | Structured light camera | Binocular vision camera | Time-of-flight camera |
    | --- | --- | --- | --- |
    | Principle | Projects special structured patterns onto the object | Calculates depth information from two RGB images | Direct measurement based on the time of flight of light |
    | Accuracy | High precision of 0.01‒1.00 mm at short distance | Up to millimeter precision at short distance | Up to centimeter-level accuracy |
    | Range | Within 10 m | Within 2 m (baseline 10 mm) | Within 100 m |
    | Resolution | Up to 1080 pixel × 720 pixel | Up to 2000 pixel | Less than 640 pixel × 480 pixel |
    | Frame rate | 30 frame/s | From high to low | Higher, up to hundreds of frames per second |
    | Influencing factor | Reflection | Illumination changes and object textures; unavailable at night | Illumination changes and object textures; multiple reflections |
    | Software complexity | Medium | High | Low |
    | Representative | Kinect v1, Pickit, PrimeSense | PointGrey Bumblebee, ZED | Kinect v2, Terabee, Basler |
    Table 1. Comparison of performance parameters of three depth cameras
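    The time-of-flight principle in Table 1 reduces to one line of arithmetic: the sensor times the round trip of emitted light, and depth is half the round-trip distance. A minimal Python sketch of this relation (the constant and function names are illustrative, not taken from any camera SDK):

    ```python
    # Time-of-flight depth: emitted light travels to the object and back,
    # so the one-way distance is (speed of light x round-trip time) / 2.
    C = 299_792_458.0  # speed of light /(m/s)

    def tof_depth(round_trip_time_s: float) -> float:
        """Depth in meters from a measured round-trip time in seconds."""
        return C * round_trip_time_s / 2.0

    # A round trip of about 66.7 ns corresponds to a depth of roughly 10 m,
    # near the top of the structured-light range in Table 1.
    print(tof_depth(66.7e-9))  # ~10.0
    ```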
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | CNN-based | [18] | Fully differentiable CNN | Height-map denoising network | Poor denoising effect on larger holes |
    | | [19] | GCN | Robust to high levels of noise | Neighborhood size can affect performance |
    | | [20] | Geometric dual-domain graph convolutional networks | Real and virtual normals are defined | Longer training time |
    | | [21] | Feature-preserving normal estimation | Automatically estimates normals and updates point locations | Unsuitable for severe noise and large outliers |
    | Upsampling-based | [25] | Denoiser and upsampler combined | Effectively resists attacks from other point cloud datasets | Unsuitable for defending against black-box attacks |
    | | [27] | Networks based on discrete differential geometry | Preserves features and geometric details | Incomplete datasets are not considered |
    | | [29] | Patch correlation unit and position correction unit | Considers noise and outliers in practical applications | The patch selection strategy affects the stability of the algorithm |
    | | [30] | Graph attention convolution and edge-aware node caching | Fine-grained edge detail is preserved with high quality | GAC modules increase computational complexity |
    | Filter-based | [31] | Edge-aware integrated network | Suitable for dense point clouds with structure-invariant scale | Long training time |
    | | [32] | Projection denoising method based on a neural network | Direct point cloud denoising using deep learning techniques | Needs enough training samples |
    | | [37] | Adds repulsion and data terms to the objective function | Capable of handling fine-scale and sharp features | Depends on the quality of the input normals |
    | | [38] | Outlier recognizer and denoiser | Identifies and removes points that are far from the surface | Runtime can still be optimized |
    | Gradient-based | [39] | Score estimation network | More robust to outliers | The gradient is discontinuous |
    | | [41] | Momentum gradient ascent | The gradient field is continuous | Needs an effective global gradient field |
    | | [42] | GPCD++ network framework | Lightweight UniNet network | Cannot handle large holes |
    | Other methods | [43] | Channel attention module | Stitches local features of point clouds at multiple scales | The capture of neighborhood feature information is biased |
    | | [44] | Hybrid self-attention network | Enhances local information through a Transformer | Longer training time |
    | | [48] | Unsupervised machine learning | Detects outliers with isolation forests and elliptic envelopes | High time complexity |
    | | [49] | Transformer-based | Extracts multi-scale local features | High computational complexity |
    Table 2. Comparison of point cloud denoising and filtering methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | Octree-based | [50] | Octree encoding | Trains an entropy model with the network | Neighborhood information not used |
    | | [51] | Multi-context deep learning | Uses the features of sibling nodes | Decoding speed can be accelerated |
    | Hybrid representation | [52] | Voxel-context compression of the octree structure | Suitable for static and dynamic point cloud compression | Higher-resolution features are ignored |
    | | [53] | Deep autoregressive generative models | Applies autoregressive generative models to 3D | Long encoding and decoding time |
    | | [54] | Multiscale deep context model | Parallel voxel prediction | Poor performance on sparse point clouds |
    | Other methods | [55] | Learned conditional probability model | Captures point cloud features and relationships with sparse tensors | Runtime depends strongly on the number of occupied blocks |
    | | [56] | Combination of multi-scale and sparse convolutional networks | Uses cross-scale, cross-group, and cross-color correlations to approximate attribute probabilities | Algorithm complexity grows as prediction modules are added |
    Table 3. Comparison of point cloud lossless compression methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | Octree-based | [57] | Neural-network-based learned approximation model | Uses octree partitioning to divide the point cloud into equal-size patches | Long training time |
    | | [58] | Multiscale end-to-end network | Learns point cloud features by sparse convolution | Noise can affect performance |
    | Voxel-based | [59] | Neural-network-based variational autoencoders | Applies stacked 3D convolutions in a variational autoencoder structure | Convolution efficiency needs improvement |
    | Autoencoder | [62] | CNN-based encoding method | Extends deep learning coding methods | Long encoding and decoding time |
    | | [63] | Deep autoencoders with hierarchical structure | Multi-scale layered encoder obtains features at each level | Can only handle small, fixed-size point clouds |
    | | [66] | Convolutional autoencoders | More robust encoding and more flexible decoding | Rate distortion |
    | | [67] | Compression exploiting spatial and temporal redundancy | Increased compression ratio and compression speed | High computational cost |
    | Other methods | [69] | Folding-based network | Folds the 3D manifold onto the image | Unsuitable for point clouds with complex geometry |
    | | [73] | End-to-end TransPCC framework | Learns complex relationships between points via a self-attention structure | Computational efficiency needs improvement |
    | | [74] | Multi-scale local self-attention mechanism | Captures high-level features in dynamic local neighborhoods | Model running speed still needs optimization |
    | | [75] | Transformer network model based on an attention mechanism | Uses the Transformer to enhance spatial feature perception | Long encoding and decoding time |
    Table 4. Comparison of point cloud lossy compression methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | CNN-based | [76] | Multi-level feature aggregation | Good anti-noise performance | Cannot fill large holes |
    | | [77] | Point-cloud-density-enhanced convolutional network | Enhances point cloud density with SRCNN | Small increase in point cloud density |
    | | [78] | Based on a single LiDAR | Eliminates the dependency on a camera | Sensitive to outliers |
    | | [80] | Channel-based attention network | Uses circular padding to solve edge recovery issues | Needs more reasonable evaluation indicators |
    | GCN-based | [82] | Graph convolutional network | Fewer network parameters | Increased computational cost |
    | | [83] | Dynamic residual graph convolutional networks | Learns local geometric features by multilayer graph convolution | Sensitive to rotated point clouds |
    | | [84] | Double-channel graph convolutional network | Uses feature similarity to construct local graphs of point clouds | Increased computational complexity |
    | GAN-based | [85] | Based on GAN | Robust to noise and sparse point clouds | Unsuitable for filling large gaps |
    | | [86] | Adversarial residual graph network | Obtains features through a graph adversarial loss function | Cannot repair large holes or missing parts |
    | | [87] | "Zero-shot" point cloud upsampling network | Reduced training time | Complex regions are still mismapped |
    | Other structures | [88] | Progressive point set upsampling network | Generated point cloud is smoother and more complete | Difficult to handle sparse, low-quality point clouds |
    | | [89] | Face point cloud super-resolution network | Predicts high-resolution data from low-resolution data | Preprocessing stage lies outside the super-resolution network |
    | | [90] | Transformer-based | Can upsample different types of data | Consumes more network parameters |
    Table 5. Comparison of point cloud super-resolution methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | Image-based | [94] | Point cloud deformation network | Invariant to unordered point clouds | Lacks some details |
    | | [95] | CNN | Efficient and scalable | Lacks projection information |
    | Sampling-based | [88] | Multi-step upsampling network | Robust to noisy and sparse inputs | Unsuitable for sparse point clouds |
    | | [97] | Data-driven | More accurate upsampling with lower chamfer loss | Sampling of unknown features degrades |
    | | [98] | Feature reshaping | Generated point cloud is smoother and more complete | Difficult to handle sparse input |
    | Completion-based | [100] | Learning-based shape completion | Robust to occlusion and noise | Unclear whether the output preserves the input points |
    | | [103] | Multi-scale generative network based on feature points | Preserves the spatial arrangement of the point cloud | Only part of the missing region is predicted |
    | | [104] | Cascaded refinement network | Retains more details | Occlusion leads to large errors |
    | | [105] | Skip-attention network | High-quality point cloud restoration | Computational efficiency still needs optimization |
    | | [111] | Normalized matrix attention Transformer | Integrates features from different channels and neighborhoods | High computational complexity |
    Table 6. Comparison of point cloud restoration, completion and reconstruction methods based on deep learning
    | Dataset | Year | Website |
    | --- | --- | --- |
    | KITTI [113] | 2012 | http://www.cvlibs.net/datasets/kitti |
    | Paris-rue-Madame [114] | 2014 | https://people.cmm.minesparis.psl.eu/users/serna/rueMadameDataset.html |
    | SHREC15 [115] | 2015 | https://www.icst.pku.edu.cn/zlian/representa/3d15/index.htm |
    | ModelNet [116] | 2015 | http://modelnet.cs.princeton.edu/ |
    | ShapeNet [117] | 2015 | https://shapenet.org/ |
    | vKITTI [118] | 2016 | https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds/ |
    | ShapeNet Part [119] | 2016 | https://cs.stanford.edu/~ericyi/project_page/part_annotation/ |
    | S3DIS [120] | 2016 | http://buildingparser.stanford.edu/dataset.html |
    | MVUB | 2016 | http://plenodb.jpeg.org/pc/microsoft/ |
    | 8iVFB | 2017 | http://plenodb.jpeg.org/pc/8ilabs/ |
    | 3DMatch [121] | 2017 | http://3Dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets |
    | ScanNet [122] | 2017 | http://www.scan-net.org/ |
    | Matterport3D [123] | 2017 | https://niessner.github.io/Matterport/ |
    | PU-Net [76] | 2018 | https://drive.google.com/file/d/1R21MD1O6q8E7ANui8FR0MaABkKc30PG4/view |
    | PCN [100] | 2018 | https://drive.google.com/drive/folders/1M_lJN14Ac1RtPtEQxNlCV9e8pom3U6Pa |
    | PU-GAN [85] | 2020 | https://drive.google.com/file/d/1BNqjidBVWP0_MUdMTeGy1wZiR6fqyGmC/view?pli=1 |
    | SemanticKITTI [124] | 2019 | http://semantic-kitti.org/ |
    | MPEG PCC [125] | 2018 | https://mpeg-pcc.org/ |
    | nuScenes [126] | 2020 | https://nuscenes.org/ |
    | Waymo [127] | 2020 | https://waymo.com/open/ |
    | PCNet [35] | 2020 | https://nuage.lix.polytechnique.fr/index.php/s/xSRrTNmtgqgeLGa |
    | PU1K [82] | 2021 | https://drive.google.com/file/d/1oTAx34YNbL6GDwHYL2qqvjmYtTVWcELg/view |
    Table 7. Common datasets for point cloud processing tasks based on deep learning
    | Task | Accuracy | Distance | Similarity | Others |
    | --- | --- | --- | --- | --- |
    | PCD | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | P2M |
    | PCC | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | BPP, time |
    | PCSR | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | SSIM, PSNR | P2F, NUC |
    | PCR | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | — |
    Table 8. Common evaluation indicators for point cloud processing tasks
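    Among the indicators in Table 8, Chamfer distance (CD) and Hausdorff distance (HD) recur throughout Tables 9–14. The NumPy sketch below is a minimal reference implementation, assuming the common symmetric definitions (mean nearest-neighbor distance for CD, worst-case nearest-neighbor distance for HD); published implementations differ in details such as squared versus unsquared distances and sum versus mean reduction, so this should not be read as the exact metric behind any particular table entry.

    ```python
    # Hedged sketch of two point-set distance metrics from Table 8.
    import numpy as np

    def pairwise_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # a: (N, 3), b: (M, 3) -> (N, M) matrix of Euclidean distances
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
        # Average nearest-neighbor distance, symmetrized over both directions
        d = pairwise_dists(a, b)
        return d.min(axis=1).mean() + d.min(axis=0).mean()

    def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
        # Worst-case nearest-neighbor distance, symmetrized
        d = pairwise_dists(a, b)
        return max(d.min(axis=1).max(), d.min(axis=0).max())

    # Toy check: a clean cloud versus a copy with 1% Gaussian perturbation,
    # mimicking the "1% noise" setting of Table 9.
    rng = np.random.default_rng(0)
    clean = rng.uniform(size=(1000, 3))
    noisy = clean + rng.normal(scale=0.01, size=clean.shape)
    print(chamfer_distance(clean, noisy), hausdorff_distance(clean, noisy))
    ```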
    Points at a resolution of 10000 (sparse):

    | Dataset | Method | CD (1% noise) | CD (2% noise) | CD (3% noise) | P2M (1% noise) | P2M (2% noise) | P2M (3% noise) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net | PCNet [35] | 3.515 | 7.467 | 13.067 | 1.148 | 3.965 | 8.737 |
    | | GPDNet [19] | 3.78 | 8.007 | 13.482 | 1.337 | 4.426 | 9.114 |
    | | DMR [46] | 4.482 | 4.982 | 5.892 | 1.722 | 2.115 | 2.846 |
    | | Score-based [39] | 2.521 | 3.686 | 4.708 | 0.463 | 1.074 | 1.942 |
    | | PSR [40] | 2.353 | 3.35 | 4.075 | 0.306 | 0.734 | 1.242 |
    | | GPCD++ [42] | 1.881 | 2.728 | 3.433 | 0.251 | 0.654 | 1.161 |
    | PCNet | PCNet [35] | 3.847 | 8.752 | 14.525 | 1.221 | 3.043 | 5.873 |
    | | GPDNet [19] | 5.47 | 10.006 | 15.521 | 1.973 | 3.65 | 6.353 |
    | | DMR [46] | 6.602 | 7.145 | 8.087 | 2.152 | 2.237 | 2.487 |
    | | Score-based [39] | 3.369 | 5.132 | 6.776 | 0.83 | 1.195 | 1.941 |
    | | PSR [40] | 2.873 | 4.757 | 6.031 | 0.783 | 1.118 | 1.619 |
    | | GPCD++ [42] | 2.813 | 4.195 | 5.385 | 0.759 | 0.893 | 1.333 |

    Points at a resolution of 50000 (dense):

    | Dataset | Method | CD (1% noise) | CD (2% noise) | CD (3% noise) | P2M (1% noise) | P2M (2% noise) | P2M (3% noise) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net | PCNet [35] | 1.049 | 1.447 | 2.289 | 0.346 | 0.608 | 1.285 |
    | | GPDNet [19] | 1.913 | 5.021 | 9.705 | 1.037 | 3.736 | 7.998 |
    | | DMR [46] | 1.162 | 1.566 | 2.432 | 0.469 | 0.8 | 1.528 |
    | | Score-based [39] | 0.716 | 1.288 | 1.928 | 0.15 | 0.566 | 1.041 |
    | | PSR [40] | 0.649 | 0.997 | 1.344 | 0.076 | 0.296 | 0.531 |
    | | GPCD++ [42] | 0.505 | 0.852 | 1.198 | 0.073 | 0.303 | 0.534 |
    | PCNet | PCNet [35] | 1.293 | 1.913 | 3.249 | 0.289 | 0.505 | 1.076 |
    | | GPDNet [19] | 5.31 | 7.709 | 11.941 | 1.716 | 2.859 | 5.13 |
    | | DMR [46] | 1.566 | 2.009 | 2.933 | 0.35 | 0.485 | 0.859 |
    | | Score-based [39] | 1.066 | 1.659 | 2.494 | 0.177 | 0.354 | 0.657 |
    | | PSR [40] | 1.01 | 1.515 | 2.093 | 0.146 | 0.34 | 0.573 |
    | | GPCD++ [42] | 0.857 | 1.344 | 1.92 | 0.132 | 0.331 | 0.53 |
    Table 9. Performance comparison of classic point cloud denoising methods on PU-Net and PCNet datasets
    Microsoft Voxelized Upper Bodies (MVUB) dataset:

    | Method | Phil9 (245 frames) | Phil10 (245 frames) | Ricardo9 (216 frames) | Ricardo10 (216 frames) | Average |
    | --- | --- | --- | --- | --- | --- |
    | G-PCC [128] | 1.23 | 1.07 | 1.04 | 1.07 | 0.95 |
    | VoxelDNN [53] | 0.92 | 0.83 | 0.72 | 0.75 | 0.81 |
    | MSVoxelDNN [54] | — | 1.02 | — | 0.95 | 0.99 |
    | OctAttention [51] | 0.83 | 0.79 | 0.72 | 0.72 | 0.76 |

    8i Voxelized Full Bodies (8iVFB) dataset:

    | Method | Loot10 (300 frames) | Redandblack10 (300 frames) | Boxer9/10 (1 frame) | Thaidancer9/10 (1 frame) | Average |
    | --- | --- | --- | --- | --- | --- |
    | G-PCC [128] | 0.95 | 1.09 | 0.96/0.94 | 0.99/0.99 | 0.99 |
    | VoxelDNN [53] | 0.64 | 0.73 | 0.76/— | 0.81/— | 0.73 |
    | MSVoxelDNN [54] | 0.73 | 0.87 | —/0.70 | —/0.85 | 0.79 |
    Table 10. Average bits per point (bpp) results of classic point cloud lossless compression methods
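    Bits per point (bpp), the figure of merit in Table 10, is plain arithmetic: the size of the compressed bitstream in bits divided by the number of points in the frame. A one-function sketch (the function name and example figures are ours, chosen only to be on the order of the G-PCC results above):

    ```python
    # bpp = compressed size in bits / number of points in the frame.
    def bits_per_point(compressed_bytes: int, num_points: int) -> float:
        return compressed_bytes * 8 / num_points

    # Example: a 100 kB bitstream for an 800000-point frame gives 1.0 bpp.
    print(bits_per_point(100_000, 800_000))  # 1.0
    ```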
    | Method | 8iVFB enc. time /s | 8iVFB dec. time /s | KITTI enc. time /s | KITTI dec. time /s | MVUB enc. time /s | MVUB dec. time /s |
    | --- | --- | --- | --- | --- | --- | --- |
    | G-PCC (octree) [128] | 1.6 | 0.6 | 0.73 | 0.07 | — | — |
    | G-PCC (trisoup) [128] | 8.1 | 6.6 | 2.06 | 1.10 | — | — |
    | G-PCC v8 [128] | 1.30 | 0.55 | — | — | — | — |
    | Learned-PCGC [59] | 9.3 | 9.5 | — | — | — | — |
    | PCGCv2 [58] | 1.6 | 5.4 | 0.53 | 0.18 | — | — |
    | SparsePCGC [72] | 1.44 | 1.32 | — | — | — | — |
    | PCGFormer [74] | 0.87 | 0.51 | — | — | — | — |
    Table 11. Comparison of average encoding and decoding time for different point cloud lossy compression methods
    | Method | CD /10⁻³ | HD /10⁻³ | P2F μ /10⁻³ | P2F σ /10⁻³ | NUC (0.4%) /10⁻³ | Epoch | Time | Parameter quantity /10³ |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net [76] | 0.38 | 3.67 | 8.19 | 6.65 | 6.36 | 120 | 4.5 h | 814 |
    | AR-GCN [86] | 0.23 | 1.78 | 3.02 | 3.52 | 1.29 | 120 | 6.2 h | 822 |
    | MPU [88] | 0.21 | 1.90 | 1.72 | 2.21 | 1.32 | 400 | 27 h | 304 |
    | PU-GAN [85] | 0.17 | 1.76 | 1.05 | 1.92 | 0.55 | 100 | 25 h | 684 |
    | PU-GCN [82] | 0.26 | 2.62 | 2.15 | 3.01 | 1.75 | 100 | 9 h | 542 |
    | ZSPU [87] | 0.19 | 1.11 | 2.12 | 2.21 | 2.24 | 50 | 96 s | 310 |
    Table 12. Performance comparison of different point cloud super-resolution methods on PU-Net dataset
    | Method | CD /10⁻³ | HD /10⁻³ | P2F /10⁻³ | Epoch | Time /(10⁻³ s) | Parameter quantity /10³ | Model size /MB |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net [76] | 1.155 | 15.170 | 4.834 | 100 | 8.4 | 812.0 | 10.1 |
    | MPU [88] | 0.935 | 13.327 | 3.551 | 100 | 8.3 | 76.2 | 6.2 |
    | PU-GCN [82] | 0.585 | 7.577 | 2.499 | 100 | 8.0 | 76.0 | 1.8 |
    | PU-Transformer [90] | 0.451 | 3.843 | 1.277 | 100 | 9.9 | 969.9 | 18.4 |
    Table 13. Performance comparison of different point cloud super-resolution methods on PU1K dataset
    Mean chamfer distance per point on the PCN dataset /10⁻³:

    | Method | Average | Airplane | Cabinet | Car | Chair | Lamp | Sofa | Table | Vessel |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | PCN [100] | 9.64 | 5.50 | 10.63 | 8.70 | 11.00 | 11.34 | 11.68 | 8.59 | 9.67 |
    | TopNet [101] | 9.89 | 6.24 | 11.63 | 9.83 | 11.50 | 9.37 | 12.35 | 9.36 | 8.85 |
    | CRN [104] | 8.51 | 4.79 | 9.97 | 8.31 | 9.49 | 8.94 | 10.69 | 7.81 | 8.05 |
    | AGFA-Net [109] | 6.76 | 3.89 | 9.03 | 7.68 | 7.18 | 5.52 | 8.72 | 6.18 | 5.91 |

    Mean chamfer distance per point on the ShapeNet dataset /10⁻⁴:

    | Method | Average | Airplane | Cabinet | Car | Chair | Lamp | Sofa | Table | Vessel |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | PCN [100] | 14.72 | 8.09 | 18.32 | 10.53 | 19.33 | 18.52 | 16.44 | 16.34 | 10.21 |
    | TopNet [101] | 9.72 | 5.50 | 12.02 | 8.90 | 12.56 | 9.54 | 12.20 | 9.57 | 7.51 |
    | SA-Net [105] | 7.74 | 2.18 | 9.11 | 5.56 | 8.94 | 9.98 | 7.83 | 9.94 | 7.23 |
    Table 14. Performance comparison of different point cloud restoration, completion and reconstruction methods