• Chinese Journal of Lasers
  • Vol. 51, Issue 5, 0509001 (2024)
Yiquan Wu*, Huixian Chen, and Yao Zhang
Author Affiliations
  • College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, Jiangsu, China
    DOI: 10.3788/CJL230924
    Citation: Yiquan Wu, Huixian Chen, Yao Zhang. Review of 3D Point Cloud Processing Methods Based on Deep Learning[J]. Chinese Journal of Lasers, 2024, 51(5): 0509001
    Fig. 1. PointCleanNet framework[35]
    Fig. 2. Development route of deep learning methods commonly used in four point cloud processing tasks
    | Performance | Structured light camera | Binocular vision camera | Time-of-flight camera |
    | --- | --- | --- | --- |
    | Principle | Projects special structured patterns onto the object | Calculates depth information from two RGB images | Direct measurement based on the time of flight of light |
    | Accuracy | High precision of 0.01‒1.00 mm at short distance | Up to millimeter precision at short distance | Up to centimeter-level accuracy |
    | Range | Within 10 m | Within 2 m (baseline 10 mm) | Within 100 m |
    | Resolution | Up to 1080 pixel × 720 pixel | Up to 2000 pixel | Less than 640 pixel × 480 pixel |
    | Frame rate | 30 frame/s | From high to low | Higher, up to hundreds of frames per second |
    | Influencing factor | Reflection | Illumination changes and object textures; unavailable at night | Illumination changes and object textures; multiple reflections |
    | Software complexity | Medium | High | Low |
    | Representative | Kinect v1, Pickit, PrimeSense | PointGrey Bumblebee, ZED | Kinect v2, Terabee, Basler |
    Table 1. Comparison of performance parameters of three depth cameras
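    The time-of-flight principle in Table 1 reduces to one line of arithmetic: the sensor times the round trip of emitted light, and depth is half the round-trip distance. A minimal Python sketch of this relation (the constant and function names are illustrative, not taken from any camera SDK):

    ```python
    # Time-of-flight depth: emitted light travels to the object and back,
    # so the one-way distance is (speed of light x round-trip time) / 2.
    C = 299_792_458.0  # speed of light /(m/s)

    def tof_depth(round_trip_time_s: float) -> float:
        """Depth in meters from a measured round-trip time in seconds."""
        return C * round_trip_time_s / 2.0

    # A round trip of about 66.7 ns corresponds to a depth of roughly 10 m,
    # near the top of the structured-light range in Table 1.
    print(tof_depth(66.7e-9))  # ~10.0
    ```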
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | CNN-based | [18] | Fully differentiable CNN | Height-map denoising network | Poor denoising effect on larger holes |
    | | [19] | GCN | Robust to high levels of noise | Neighborhood size can affect performance |
    | | [20] | Geometric dual-domain graph convolutional networks | Real and virtual normals are defined | Longer training time |
    | | [21] | Feature-preserving normal estimation | Automatically estimates normals and updates point locations | Unsuitable for severe noise and large outliers |
    | Upsampling-based | [25] | Denoiser and upsampler combined | Effectively resists attacks from other point cloud datasets | Unsuitable for defending against black-box attacks |
    | | [27] | Networks based on discrete differential geometry | Preserves features and geometric details | Incomplete datasets are not considered |
    | | [29] | Patch correlation unit and position correction unit | Considers noise and outliers in practical applications | The patch selection strategy affects the stability of the algorithm |
    | | [30] | Graph attention convolution and edge-aware node caching | Fine-grained edge detail is preserved with high quality | GAC modules increase computational complexity |
    | Filter-based | [31] | Edge-aware integrated network | Suitable for dense point clouds with structure-invariant scale | Long training time |
    | | [32] | Projection denoising method based on a neural network | Direct point cloud denoising using deep learning techniques | Needs enough training samples |
    | | [37] | Adds repulsion and data terms to the objective function | Capable of handling fine-scale and sharp features | Depends on the quality of the input normals |
    | | [38] | Outlier recognizer and denoiser | Identifies and removes points that are far from the surface | Runtime can still be optimized |
    | Gradient-based | [39] | Score estimation network | More robust to outliers | The gradient is discontinuous |
    | | [41] | Momentum gradient ascent | The gradient field is continuous | Needs an effective global gradient field |
    | | [42] | GPCD++ network framework | Lightweight UniNet network | Cannot handle large holes |
    | Other methods | [43] | Channel attention module | Stitches local features of point clouds at multiple scales | The capture of neighborhood feature information is biased |
    | | [44] | Hybrid self-attention network | Enhances local information through a Transformer | Longer training time |
    | | [48] | Unsupervised machine learning | Detects outliers with isolation forests and elliptic envelopes | High time complexity |
    | | [49] | Transformer-based | Extracts multi-scale local features | High computational complexity |
    Table 2. Comparison of point cloud denoising and filtering methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | Octree-based | [50] | Octree encoding | Trains an entropy model with the network | Neighborhood information not used |
    | | [51] | Multi-context deep learning | Uses the features of sibling nodes | Decoding speed can be accelerated |
    | Hybrid representation | [52] | Voxel-context compression of the octree structure | Suitable for static and dynamic point cloud compression | Higher-resolution features are ignored |
    | | [53] | Deep autoregressive generative models | Applies autoregressive generative models to 3D | Long encoding and decoding time |
    | | [54] | Multiscale deep context model | Parallel voxel prediction | Poor performance on sparse point clouds |
    | Other methods | [55] | Learned conditional probability model | Captures point cloud features and relationships with sparse tensors | Runtime depends strongly on the number of occupied blocks |
    | | [56] | Combination of multi-scale and sparse convolutional networks | Uses cross-scale, cross-group, and cross-color correlations to approximate attribute probabilities | Algorithm complexity grows as prediction modules are added |
    Table 3. Comparison of point cloud lossless compression methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | Octree-based | [57] | Neural-network-based learned approximation model | Uses octree partitioning to divide the point cloud into equal-size patches | Long training time |
    | | [58] | Multiscale end-to-end network | Learns point cloud features by sparse convolution | Noise can affect performance |
    | Voxel-based | [59] | Neural-network-based variational autoencoders | Applies stacked 3D convolutions in a variational autoencoder structure | Convolution efficiency needs improvement |
    | Autoencoder | [62] | CNN-based encoding method | Extends deep learning coding methods | Long encoding and decoding time |
    | | [63] | Deep autoencoders with hierarchical structure | Multi-scale layered encoder obtains features at each level | Can only handle small, fixed-size point clouds |
    | | [66] | Convolutional autoencoders | More robust encoding and more flexible decoding | Rate distortion |
    | | [67] | Compression exploiting spatial and temporal redundancy | Increased compression ratio and compression speed | High computational cost |
    | Other methods | [69] | Folding-based network | Folds the 3D manifold onto the image | Unsuitable for point clouds with complex geometry |
    | | [73] | End-to-end TransPCC framework | Learns complex relationships between points via a self-attention structure | Computational efficiency needs improvement |
    | | [74] | Multi-scale local self-attention mechanism | Captures high-level features in dynamic local neighborhoods | Model running speed still needs optimization |
    | | [75] | Transformer network model based on an attention mechanism | Uses the Transformer to enhance spatial feature perception | Long encoding and decoding time |
    Table 4. Comparison of point cloud lossy compression methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | CNN-based | [76] | Multi-level feature aggregation | Good anti-noise performance | Cannot fill large holes |
    | | [77] | Point-cloud-density-enhanced convolutional network | Enhances point cloud density with SRCNN | Small increase in point cloud density |
    | | [78] | Based on a single LiDAR | Eliminates the dependency on a camera | Sensitive to outliers |
    | | [80] | Channel-based attention network | Uses circular padding to solve edge recovery issues | Needs more reasonable evaluation indicators |
    | GCN-based | [82] | Graph convolutional network | Fewer network parameters | Increased computational cost |
    | | [83] | Dynamic residual graph convolutional networks | Learns local geometric features by multilayer graph convolution | Sensitive to rotated point clouds |
    | | [84] | Double-channel graph convolutional network | Uses feature similarity to construct local graphs of point clouds | Increased computational complexity |
    | GAN-based | [85] | Based on GAN | Robust to noise and sparse point clouds | Unsuitable for filling large gaps |
    | | [86] | Adversarial residual graph network | Obtains features through a graph adversarial loss function | Cannot repair large holes or missing parts |
    | | [87] | "Zero-shot" point cloud upsampling network | Reduced training time | Complex regions are still mismapped |
    | Other structures | [88] | Progressive point set upsampling network | Generated point cloud is smoother and more complete | Difficult to handle sparse, low-quality point clouds |
    | | [89] | Face point cloud super-resolution network | Predicts high-resolution data from low-resolution data | Preprocessing stage lies outside the super-resolution network |
    | | [90] | Transformer-based | Can upsample different types of data | Consumes more network parameters |
    Table 5. Comparison of point cloud super-resolution methods based on deep learning
    | Type | Ref. | Specific structure | Contribution | Limitation |
    | --- | --- | --- | --- | --- |
    | Image-based | [94] | Point cloud deformation network | Invariant to unordered point clouds | Lacks some details |
    | | [95] | CNN | Efficient and scalable | Lacks projection information |
    | Sampling-based | [88] | Multi-step upsampling network | Robust to noisy and sparse inputs | Unsuitable for sparse point clouds |
    | | [97] | Data-driven | More accurate upsampling with lower chamfer loss | Sampling of unknown features degrades |
    | | [98] | Feature reshaping | Generated point cloud is smoother and more complete | Difficult to handle sparse input |
    | Completion-based | [100] | Learning-based shape completion | Robust to occlusion and noise | Unclear whether the output preserves the input points |
    | | [103] | Multi-scale generative network based on feature points | Preserves the spatial arrangement of the point cloud | Only part of the missing region is predicted |
    | | [104] | Cascaded refinement network | Retains more details | Occlusion leads to large errors |
    | | [105] | Skip-attention network | High-quality point cloud restoration | Computational efficiency still needs optimization |
    | | [111] | Normalized matrix attention Transformer | Integrates features from different channels and neighborhoods | High computational complexity |
    Table 6. Comparison of point cloud restoration, completion and reconstruction methods based on deep learning
    | Dataset | Year | Website |
    | --- | --- | --- |
    | KITTI [113] | 2012 | http://www.cvlibs.net/datasets/kitti |
    | Paris-rue-Madame [114] | 2014 | https://people.cmm.minesparis.psl.eu/users/serna/rueMadameDataset.html |
    | SHREC15 [115] | 2015 | https://www.icst.pku.edu.cn/zlian/representa/3d15/index.htm |
    | ModelNet [116] | 2015 | http://modelnet.cs.princeton.edu/ |
    | ShapeNet [117] | 2015 | https://shapenet.org/ |
    | vKITTI [118] | 2016 | https://europe.naverlabs.com/Research/Computer-Vision/Proxy-Virtual-Worlds/ |
    | ShapeNet Part [119] | 2016 | https://cs.stanford.edu/~ericyi/project_page/part_annotation/ |
    | S3DIS [120] | 2016 | http://buildingparser.stanford.edu/dataset.html |
    | MVUB | 2016 | http://plenodb.jpeg.org/pc/microsoft/ |
    | 8iVFB | 2017 | http://plenodb.jpeg.org/pc/8ilabs/ |
    | 3DMatch [121] | 2017 | http://3Dmatch.cs.princeton.edu/#rgbd-reconstruction-datasets |
    | ScanNet [122] | 2017 | http://www.scan-net.org/ |
    | Matterport3D [123] | 2017 | https://niessner.github.io/Matterport/ |
    | PU-Net [76] | 2018 | https://drive.google.com/file/d/1R21MD1O6q8E7ANui8FR0MaABkKc30PG4/view |
    | PCN [100] | 2018 | https://drive.google.com/drive/folders/1M_lJN14Ac1RtPtEQxNlCV9e8pom3U6Pa |
    | PU-GAN [85] | 2020 | https://drive.google.com/file/d/1BNqjidBVWP0_MUdMTeGy1wZiR6fqyGmC/view?pli=1 |
    | SemanticKITTI [124] | 2019 | http://semantic-kitti.org/ |
    | MPEG PCC [125] | 2018 | https://mpeg-pcc.org/ |
    | nuScenes [126] | 2020 | https://nuscenes.org/ |
    | Waymo [127] | 2020 | https://waymo.com/open/ |
    | PCNet [35] | 2020 | https://nuage.lix.polytechnique.fr/index.php/s/xSRrTNmtgqgeLGa |
    | PU1K [82] | 2021 | https://drive.google.com/file/d/1oTAx34YNbL6GDwHYL2qqvjmYtTVWcELg/view |
    Table 7. Common datasets for point cloud processing tasks based on deep learning
    | Task | Accuracy | Distance | Similarity | Others |
    | --- | --- | --- | --- | --- |
    | PCD | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | P2M |
    | PCC | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | BPP, time |
    | PCSR | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | SSIM, PSNR | P2F, NUC |
    | PCR | Precision, recall, F-score, RMSE, MAE | CD, EMD, HD | PSNR | — |
    Table 8. Common evaluation indicators for point cloud processing tasks
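    Among the indicators in Table 8, Chamfer distance (CD) and Hausdorff distance (HD) recur throughout Tables 9–14. The NumPy sketch below is a minimal reference implementation, assuming the common symmetric definitions (mean nearest-neighbor distance for CD, worst-case nearest-neighbor distance for HD); published implementations differ in details such as squared versus unsquared distances and sum versus mean reduction, so this should not be read as the exact metric behind any particular table entry.

    ```python
    # Hedged sketch of two point-set distance metrics from Table 8.
    import numpy as np

    def pairwise_dists(a: np.ndarray, b: np.ndarray) -> np.ndarray:
        # a: (N, 3), b: (M, 3) -> (N, M) matrix of Euclidean distances
        return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

    def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
        # Average nearest-neighbor distance, symmetrized over both directions
        d = pairwise_dists(a, b)
        return d.min(axis=1).mean() + d.min(axis=0).mean()

    def hausdorff_distance(a: np.ndarray, b: np.ndarray) -> float:
        # Worst-case nearest-neighbor distance, symmetrized
        d = pairwise_dists(a, b)
        return max(d.min(axis=1).max(), d.min(axis=0).max())

    # Toy check: a clean cloud versus a copy with 1% Gaussian perturbation,
    # mimicking the "1% noise" setting of Table 9.
    rng = np.random.default_rng(0)
    clean = rng.uniform(size=(1000, 3))
    noisy = clean + rng.normal(scale=0.01, size=clean.shape)
    print(chamfer_distance(clean, noisy), hausdorff_distance(clean, noisy))
    ```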
    Points at a resolution of 10000 (sparse):

    | Dataset | Method | CD (1% noise) | CD (2% noise) | CD (3% noise) | P2M (1% noise) | P2M (2% noise) | P2M (3% noise) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net | PCNet [35] | 3.515 | 7.467 | 13.067 | 1.148 | 3.965 | 8.737 |
    | | GPDNet [19] | 3.78 | 8.007 | 13.482 | 1.337 | 4.426 | 9.114 |
    | | DMR [46] | 4.482 | 4.982 | 5.892 | 1.722 | 2.115 | 2.846 |
    | | Score-based [39] | 2.521 | 3.686 | 4.708 | 0.463 | 1.074 | 1.942 |
    | | PSR [40] | 2.353 | 3.35 | 4.075 | 0.306 | 0.734 | 1.242 |
    | | GPCD++ [42] | 1.881 | 2.728 | 3.433 | 0.251 | 0.654 | 1.161 |
    | PCNet | PCNet [35] | 3.847 | 8.752 | 14.525 | 1.221 | 3.043 | 5.873 |
    | | GPDNet [19] | 5.47 | 10.006 | 15.521 | 1.973 | 3.65 | 6.353 |
    | | DMR [46] | 6.602 | 7.145 | 8.087 | 2.152 | 2.237 | 2.487 |
    | | Score-based [39] | 3.369 | 5.132 | 6.776 | 0.83 | 1.195 | 1.941 |
    | | PSR [40] | 2.873 | 4.757 | 6.031 | 0.783 | 1.118 | 1.619 |
    | | GPCD++ [42] | 2.813 | 4.195 | 5.385 | 0.759 | 0.893 | 1.333 |

    Points at a resolution of 50000 (dense):

    | Dataset | Method | CD (1% noise) | CD (2% noise) | CD (3% noise) | P2M (1% noise) | P2M (2% noise) | P2M (3% noise) |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net | PCNet [35] | 1.049 | 1.447 | 2.289 | 0.346 | 0.608 | 1.285 |
    | | GPDNet [19] | 1.913 | 5.021 | 9.705 | 1.037 | 3.736 | 7.998 |
    | | DMR [46] | 1.162 | 1.566 | 2.432 | 0.469 | 0.8 | 1.528 |
    | | Score-based [39] | 0.716 | 1.288 | 1.928 | 0.15 | 0.566 | 1.041 |
    | | PSR [40] | 0.649 | 0.997 | 1.344 | 0.076 | 0.296 | 0.531 |
    | | GPCD++ [42] | 0.505 | 0.852 | 1.198 | 0.073 | 0.303 | 0.534 |
    | PCNet | PCNet [35] | 1.293 | 1.913 | 3.249 | 0.289 | 0.505 | 1.076 |
    | | GPDNet [19] | 5.31 | 7.709 | 11.941 | 1.716 | 2.859 | 5.13 |
    | | DMR [46] | 1.566 | 2.009 | 2.933 | 0.35 | 0.485 | 0.859 |
    | | Score-based [39] | 1.066 | 1.659 | 2.494 | 0.177 | 0.354 | 0.657 |
    | | PSR [40] | 1.01 | 1.515 | 2.093 | 0.146 | 0.34 | 0.573 |
    | | GPCD++ [42] | 0.857 | 1.344 | 1.92 | 0.132 | 0.331 | 0.53 |
    Table 9. Performance comparison of classic point cloud denoising methods on PU-Net and PCNet datasets
    Microsoft Voxelized Upper Bodies (MVUB) dataset:

    | Method | Phil9 (245 frames) | Phil10 (245 frames) | Ricardo9 (216 frames) | Ricardo10 (216 frames) | Average |
    | --- | --- | --- | --- | --- | --- |
    | G-PCC [128] | 1.23 | 1.07 | 1.04 | 1.07 | 0.95 |
    | VoxelDNN [53] | 0.92 | 0.83 | 0.72 | 0.75 | 0.81 |
    | MSVoxelDNN [54] | — | 1.02 | — | 0.95 | 0.99 |
    | OctAttention [51] | 0.83 | 0.79 | 0.72 | 0.72 | 0.76 |

    8i Voxelized Full Bodies (8iVFB) dataset:

    | Method | Loot10 (300 frames) | Redandblack10 (300 frames) | Boxer9/10 (1 frame) | Thaidancer9/10 (1 frame) | Average |
    | --- | --- | --- | --- | --- | --- |
    | G-PCC [128] | 0.95 | 1.09 | 0.96/0.94 | 0.99/0.99 | 0.99 |
    | VoxelDNN [53] | 0.64 | 0.73 | 0.76/— | 0.81/— | 0.73 |
    | MSVoxelDNN [54] | 0.73 | 0.87 | —/0.70 | —/0.85 | 0.79 |
    Table 10. Average bits per point (bpp) results of classic point cloud lossless compression methods
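    Bits per point (bpp), the figure of merit in Table 10, is plain arithmetic: the size of the compressed bitstream in bits divided by the number of points in the frame. A one-function sketch (the function name and example figures are ours, chosen only to be on the order of the G-PCC results above):

    ```python
    # bpp = compressed size in bits / number of points in the frame.
    def bits_per_point(compressed_bytes: int, num_points: int) -> float:
        return compressed_bytes * 8 / num_points

    # Example: a 100 kB bitstream for an 800000-point frame gives 1.0 bpp.
    print(bits_per_point(100_000, 800_000))  # 1.0
    ```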
    | Method | 8iVFB enc. time /s | 8iVFB dec. time /s | KITTI enc. time /s | KITTI dec. time /s | MVUB enc. time /s | MVUB dec. time /s |
    | --- | --- | --- | --- | --- | --- | --- |
    | G-PCC (octree) [128] | 1.6 | 0.6 | 0.73 | 0.07 | — | — |
    | G-PCC (trisoup) [128] | 8.1 | 6.6 | 2.06 | 1.10 | — | — |
    | G-PCC v8 [128] | 1.30 | 0.55 | — | — | — | — |
    | Learned-PCGC [59] | 9.3 | 9.5 | — | — | — | — |
    | PCGCv2 [58] | 1.6 | 5.4 | 0.53 | 0.18 | — | — |
    | SparsePCGC [72] | 1.44 | 1.32 | — | — | — | — |
    | PCGFormer [74] | 0.87 | 0.51 | — | — | — | — |
    Table 11. Comparison of average encoding and decoding time for different point cloud lossy compression methods
    | Method | CD /10⁻³ | HD /10⁻³ | P2F μ /10⁻³ | P2F σ /10⁻³ | NUC (0.4%) /10⁻³ | Epoch | Time | Parameter quantity /10³ |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net [76] | 0.38 | 3.67 | 8.19 | 6.65 | 6.36 | 120 | 4.5 h | 814 |
    | AR-GCN [86] | 0.23 | 1.78 | 3.02 | 3.52 | 1.29 | 120 | 6.2 h | 822 |
    | MPU [88] | 0.21 | 1.90 | 1.72 | 2.21 | 1.32 | 400 | 27 h | 304 |
    | PU-GAN [85] | 0.17 | 1.76 | 1.05 | 1.92 | 0.55 | 100 | 25 h | 684 |
    | PU-GCN [82] | 0.26 | 2.62 | 2.15 | 3.01 | 1.75 | 100 | 9 h | 542 |
    | ZSPU [87] | 0.19 | 1.11 | 2.12 | 2.21 | 2.24 | 50 | 96 s | 310 |
    Table 12. Performance comparison of different point cloud super-resolution methods on PU-Net dataset
    | Method | CD /10⁻³ | HD /10⁻³ | P2F /10⁻³ | Epoch | Time /(10⁻³ s) | Parameter quantity /10³ | Model size /MB |
    | --- | --- | --- | --- | --- | --- | --- | --- |
    | PU-Net [76] | 1.155 | 15.170 | 4.834 | 100 | 8.4 | 812.0 | 10.1 |
    | MPU [88] | 0.935 | 13.327 | 3.551 | 100 | 8.3 | 76.2 | 6.2 |
    | PU-GCN [82] | 0.585 | 7.577 | 2.499 | 100 | 8.0 | 76.0 | 1.8 |
    | PU-Transformer [90] | 0.451 | 3.843 | 1.277 | 100 | 9.9 | 969.9 | 18.4 |
    Table 13. Performance comparison of different point cloud super-resolution methods on PU1K dataset
    Mean chamfer distance per point on the PCN dataset /10⁻³:

    | Method | Average | Airplane | Cabinet | Car | Chair | Lamp | Sofa | Table | Vessel |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | PCN [100] | 9.64 | 5.50 | 10.63 | 8.70 | 11.00 | 11.34 | 11.68 | 8.59 | 9.67 |
    | TopNet [101] | 9.89 | 6.24 | 11.63 | 9.83 | 11.50 | 9.37 | 12.35 | 9.36 | 8.85 |
    | CRN [104] | 8.51 | 4.79 | 9.97 | 8.31 | 9.49 | 8.94 | 10.69 | 7.81 | 8.05 |
    | AGFA-Net [109] | 6.76 | 3.89 | 9.03 | 7.68 | 7.18 | 5.52 | 8.72 | 6.18 | 5.91 |

    Mean chamfer distance per point on the ShapeNet dataset /10⁻⁴:

    | Method | Average | Airplane | Cabinet | Car | Chair | Lamp | Sofa | Table | Vessel |
    | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
    | PCN [100] | 14.72 | 8.09 | 18.32 | 10.53 | 19.33 | 18.52 | 16.44 | 16.34 | 10.21 |
    | TopNet [101] | 9.72 | 5.50 | 12.02 | 8.90 | 12.56 | 9.54 | 12.20 | 9.57 | 7.51 |
    | SA-Net [105] | 7.74 | 2.18 | 9.11 | 5.56 | 8.94 | 9.98 | 7.83 | 9.94 | 7.23 |
    Table 14. Performance comparison of different point cloud restoration, completion and reconstruction methods