Point Cloud Classification Methods Based on Deep Learning: A Review

Pei Wen; Yinglei Cheng; Wangsheng Yu

doi:10.3788/LOP202158.1600003

Laser & Optoelectronics Progress
Vol. 58, Issue 16, 1600003 (2021)

Pei Wen¹,², Yinglei Cheng¹,*, and Wangsheng Yu¹

Author Affiliations

¹Information and Navigation College, Air Force Engineering University, Xi'an, Shaanxi 710077, China

²The 93575 Unit of PLA, Chengde, Hebei 067000, China

Pei Wen, Yinglei Cheng, Wangsheng Yu. Point Cloud Classification Methods Based on Deep Learning: A Review[J]. Laser & Optoelectronics Progress, 2021, 58(16): 1600003

Fig. 1. Architecture of MVCNN for point cloud classification and segmentation

Fig. 2. Architecture of GVCNN for point cloud classification and segmentation

Fig. 3. Architecture of MHBN for point cloud classification and segmentation

Fig. 4. Architecture of 3D ContextNet for point cloud classification and segmentation

Fig. 5. Architecture of SEGCloud for point cloud semantic segmentation^[57]

Fig. 6. Architecture of PointNet++ for point cloud classification and segmentation^[60]

Fig. 7. Architecture of dense-resolution network for point cloud classification and segmentation^[65]

Fig. 8. Architecture of RandLA-Net for point cloud semantic segmentation^[66]

Fig. 9. Schematic diagram of a graph-based network^[31]

Fig. 10. Architecture of SpecGCN for point cloud classification and segmentation^[12]

Fig. 11. Architecture of LKPO-GNN for point cloud classification and segmentation^[81]

Fig. 12. An illustration of the continuous and discrete convolutions for local neighborhoods of a point^[31]. (a) Local neighborhoods of a point; (b) 3D continuous convolution; (c) 3D discrete convolution

Fig. 13. Schematics of several typical convolutions. (a) Pointwise Conv^[13]; (b) GeoConv^[84]; (c) RIConv^[88]; (d) SPHConv^[99]; (e) convolutional layer of Convpoint^[100]; (f) RS-Conv^[96]

Fig. 14. Principle of attention coefficients generation

Fig. 15. Architecture of PATNet for point cloud classification and segmentation

| Category | Method | Year | Key idea | Application scenario | Dataset | OA /% | MA /% | mIoU /% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Multi-view based method | MVCNN^[37] | 2015 | Learning to recognize 3D shapes from a collection of their rendered views on 2D images | 3D shape recognition | ModelNet40 | 90.10 | — | — |
| | RCPCNN^[39] | 2017 | Introducing a view clustering and pooling layer based on dominant sets | 3D object recognition | ModelNet40 | 93.80 | — | — |
| | SnapNet^[42] | 2017 | Transferring the strong results of 2D deep segmentation networks to 3D | 3D semantic segmentation | Semantic3D | 88.60 | 70.80 | 59.10 |
| | | | | | SUN RGB-D | — | 67.40 | — |
| | SnapNet-R^[43] | 2017 | Using 3D-coherent synthesis of scene observations and mixing them in a multi-view framework for 3D labeling | Semantic labeling of the scene perceived by a robot | SUN RGB-D | 78.04 | — | 39.61 |
| | GVCNN^[38] | 2018 | Using a grouping strategy | 3D shape classification and retrieval | ModelNet40 | 93.10 | — | — |
| | MHBN^[40] | 2018 | Aggregating local convolutional features through bilinear pooling | 3D object recognition | ModelNet40 | 94.91 | — | — |
| | | | | | ModelNet10 | 92.23 | — | — |
| | Ref. [44] | 2019 | Combining CNNs with LSTM to exploit the correlative information from multiple views | 3D shape recognition and 3D shape retrieval | ModelNet40 | 91.05 | — | — |
| | | | | | ModelNet10 | 95.29 | — | — |
| | LU-Net^[45] | 2019 | Embedding 3D local features in 2D range images; using a U-Net | Treating 3D LiDAR point cloud processing as an image processing problem | KITTI | — | — | 55.40 |
| | 3D-MiniNet^[46] | 2020 | Combining 3D and 2D learning layers; learning the 2D representation through a novel projection | Fast and efficient processing of 3D LiDAR point clouds | Semantic-KITTI | — | — | 55.80 |
| | | | | | KITTI | — | — | 58.00 |
| Volumetric method | VoxNet^[47] | 2015 | Integrating a volumetric occupancy grid representation with a supervised 3D CNN | Real-time object recognition | ModelNet40 | 85.90 | 83.00 | — |
| | | | | | ModelNet10 | — | 92.00 | — |
| | 3D ShapeNets^[48] | 2015 | Using a convolutional deep belief network to represent a geometric 3D shape as a probability distribution of binary variables on a 3D voxel grid | Joint object recognition and shape completion from 2.5D depth maps | ModelNet40 | 84.70 | 77.30 | — |
| | | | | | ModelNet10 | — | 83.50 | — |
| | OctNet^[49] | 2016 | Using a set of unbalanced octrees to exploit the sparsity in the input data to hierarchically partition the space | 3D object classification, orientation estimation, and point cloud labeling | ModelNet40 | 86.50 | — | — |
| | | | | | ModelNet10 | 90.90 | — | — |
| | FPNN^[55] | 2016 | Representing 3D spaces as volumetric fields; using field probing filters to extract features | 3D object recognition | ModelNet40 | 87.50 | — | — |
| | FusionNet^[59] | 2016 | Using both voxel and pixel representations for training relatively weak classifiers | 3D CAD model classification | ModelNet40 | 90.80 | — | — |
| | | | | | ModelNet10 | 93.11 | — | — |
| | O-CNN^[50] | 2017 | Storing the octant information and CNN features in graphics memory and executing the entire O-CNN training and evaluation on the GPU | 3D object classification, shape retrieval, and shape segmentation | ModelNet40 | 90.60 | — | 85.90 |
| | Kd-Net^[52] | 2017 | Performing multiplicative transformations and sharing parameters according to the subdivisions of the point clouds imposed on them by k-d trees | 3D shape classification, shape retrieval, and shape part segmentation | ModelNet40 | 91.80 | — | — |
| | | | | | ModelNet10 | 94.00 | — | 77.20 |
| | 3D ContextNet^[53] | 2017 | Exploiting the local and global contextual cues imposed by the implicit space partition of the k-d tree for feature learning | 3D object classification and part segmentation | S3DIS | 84.90 | 74.50 | 55.60 |
| | SEGCloud^[57] | 2017 | Combining 3D-FCNN, trilinear interpolation (TI), and fully connected conditional random fields (FC-CRF) | 3D semantic segmentation (indoor and outdoor scenes) | Semantic3D | — | 73.08 | 60.30 |
| | | | | | S3DIS | — | 57.35 | 48.92 |
| | | | | | NYUv2 | 66.82 | 56.43 | 43.45 |
| | | | | | KITTI | — | 49.46 | 36.78 |
| | MSNet^[54] | 2018 | Multi-scale voxelization | Adaptive and robust point cloud classification | MLS | 83.18 | — | — |
| | | | | | TLS | 98.24 | — | — |
| | | | | | ALS | 97.02 | — | — |
| | PointGrid^[56] | 2018 | Incorporating a constant number of points within each grid cell | 3D visual recognition | ModelNet40 | 92.00 | 88.90 | — |
| | | | | | ShapeNet | 86.10 | 80.50 | 86.40 |
| | VV-net^[58] | 2018 | Using a kernel-based interpolated variational autoencoder (VAE) architecture to encode the local geometry within each voxel | Segmenting 3D objects into parts and scenes into individual objects; normal estimation | ShapeNet | — | — | 87.40 |
| | | | | | S3DIS | 87.78 | — | 78.22 |

Table 1. Comparison for methods based on regular representation

| Category | Method | Year | Key idea | Application scenario | Dataset | OA /% | MA /% | mIoU /% |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Neighboring feature pooling | PointNet^[11] | 2017 | Using a single symmetric function, max pooling | 3D object classification, part segmentation, scene semantic parsing | ModelNet40 | 89.2 | 86.2 | — |
| | | | | | ShapeNet | — | — | 83.7 |
| | | | | | S3DIS | 78.62 | — | 47.71 |
| | PointNet++^[60] | 2017 | Processing a set of points sampled in a metric space in a hierarchical fashion | Processing point sets sampled in a metric space | ModelNet40 | 91.9 | — | — |
| | | | | | ShapeNet | — | — | 85.1 |
| | PointSIFT^[61] | 2018 | Stacking several orientation-encoding units to achieve multi-scale representation | Improving 3D shape representation | S3DIS | 88.72 | — | 70.23 |
| | SO-Net^[62] | 2018 | Building a self-organizing map (SOM) to model the spatial distribution of the point cloud | Point cloud reconstruction, classification, object part segmentation, and shape retrieval | ModelNet40 | 93.4 | — | — |
| | | | | | ShapeNet | — | — | 84.6 |
| | 3DMAX-Net^[63] | 2018 | Multi-scale contextual feature learning; local and global feature aggregation | 3D semantic segmentation on large-scale point clouds | S3DIS | 79.5 | — | 47.5 |
| | PointWeb^[64] | 2019 | Using an adaptive feature adjustment (AFA) module to find the interaction between points | 3D point cloud segmentation and classification | ModelNet40 | 92.3 | 89.4 | — |
| | | | | | S3DIS | 86.97 | 66.64 | 60.28 |
| | Ref. [65] | 2020 | Learning local point features from the point cloud at different resolutions | Point cloud analysis | ModelNet40 | 93.1 | — | — |
| | | | | | ShapeNet | — | — | 86.4 |
| | | | | | ScanObjectNN | 80.3 | — | — |
| | RandLA-Net^[66] | 2020 | Using random point sampling instead of more complex point selection approaches | 3D semantic segmentation on large-scale point clouds | Semantic3D | 94.8 | — | 77.4 |
| | | | | | KITTI | — | — | 53.9 |
| Graph-based methods | Ref. [70] | 2017 | Performing convolutions over local graph neighborhoods, exploiting edge labels | Graph classification | ModelNet40 | — | 87.4 | — |
| | | | | | ModelNet10 | — | 90.8 | — |
| | SPG^[71] | 2018 | Capturing the organization of 3D point clouds with a superpoint graph (SPG) | 3D semantic segmentation on large-scale point clouds | Semantic3D | 92.9 | — | 76.2 |
| | | | | | S3DIS | 85.5 | 73.0 | 62.1 |
| | SpecGCN^[12] | 2018 | Leveraging the power of spectral graph CNNs in the PointNet++ framework while adopting a different pooling strategy | 3D point cloud segmentation and classification | ModelNet40 | 91.5 | — | — |
| | | | | | ShapeNet | — | — | 84.6 |
| | RGCNN^[74] | 2018 | Adding a graph-signal smoothness prior to the loss function | 3D point cloud segmentation and classification | ModelNet40 | 90.5 | 87.3 | — |
| | | | | | ShapeNet | — | — | 84.3 |
| | DGCNN^[75] | 2018 | Using EdgeConv to capture and exploit fine-grained geometric properties of point clouds | 3D point cloud segmentation and classification | ModelNet40 | 92.2 | 90.2 | — |
| | | | | | ShapeNet | — | — | 85.1 |
| | LDGCNN^[76] | 2019 | Removing the transformation network; linking hierarchical features from different dynamic graphs | 3D point cloud segmentation and classification | ModelNet40 | 92.9 | 90.3 | — |
| | | | | | ShapeNet | — | — | 85.1 |
| | Ref. [72] | 2019 | Using a simple point embedding network and a new graph-structured loss function | 3D semantic segmentation on large-scale point clouds | S3DIS | 87.9 | 78.3 | 68.4 |
| | | | | | vKITTI | 84.3 | 67.3 | 52.0 |
| | Ref. [77] | 2019 | Stacking DPAM modules to gradually agglomerate points | 3D point cloud segmentation and classification | ModelNet40 | 91.9 | — | — |
| | | | | | ModelNet10 | 94.6 | — | — |
| | | | | | ShapeNet | — | — | 86.1 |
| | | | | | S3DIS | — | — | 64.5 |
| | HDGCN^[79] | 2019 | Combining the hierarchical structure and the DGConv block to extract both local and global features of point clouds hierarchically | 3D semantic segmentation (indoor and outdoor scenes) | S3DIS | — | 76.11 | 66.85 |
| | | | | | Paris-Lille-3D | — | — | 68.30 |
| | PointNGCNN^[80] | 2020 | Using Chebyshev polynomials as the graph filters to extract features in the neighborhood of each point | Capturing the potential geometric information of 3D objects | ModelNet40 | 92.8 | — | — |
| | | | | | ShapeNet | — | — | 85.6 |
| | | | | | S3DIS | 87.3 | — | — |
| | | | | | ScanNet | 84.9 | — | — |
| | LKPO-GNN^[81] | 2020 | Using LKPO-GNN to select multi-directional k-NNs to form the local topological structure of a centroid | Obtaining deeper feature representation | ModelNet40 | 91.4 | 88.9 | — |
| | | | | | ShapeNet | — | — | 85.6 |
| | | | | | S3DIS | 85.8 | — | 64.6 |
| | | | | | ScanNet | 85.3 | — | 58.4 |
| | CPL-Net^[82] | 2020 | Using a critical points layer (CPL) to reduce the number of points in an unordered point cloud and retain the important (critical) ones | 3D object classification | ModelNet40 | 92.41 | 90.53 | — |
| Kernel-based convolution | Pointwise CNN^[13] | 2017 | Pointwise convolution, which can be applied at each point in a point cloud to learn point-wise features | 3D semantic segmentation and object recognition | S3DIS | — | 74.1 | — |
| | PointCNN^[83] | 2018 | X-Conv; weighting and permuting input points and features before they are processed by a typical convolution | Leveraging spatially local correlation from data represented as point clouds | ModelNet40 | 92.5 | 88.8 | — |
| | | | | | S3DIS | — | — | 65.39 |
| | | | | | ShapeNet | — | — | 84.6 |
| | | | | | ScanNet | 79.7 | 55.7 | — |
| | PCCN^[91] | 2018 | Exploiting parameterized kernel functions that span the full continuous vector space | Point cloud segmentation (indoor and outdoor scenes), LiDAR motion estimation of driving scenes | Stanford Large-Scale 3D Indoor Scene Dataset | — | 67.01 | 58.27 |
| | | | | | Driving Scenes Dataset | 95.45 | — | 58.06 |
| | SpiderCNN^[92] | 2018 | SpiderConv; extending convolutional operations from regular grids to irregular point sets | 3D point cloud segmentation and classification | ModelNet40 | 92.4 | — | — |
| | | | | | ShapeNet | — | — | 85.3 |
| | GeoCNN^[84] | 2019 | GeoConv; modeling the geometric structure of points by a decomposition and aggregation method based on vector decomposition | 3D shape classification, segmentation, and object detection | ModelNet40 | 93.9 | 91.6 | — |
| | InterpCNN^[85] | 2019 | InterpConv; using discrete convolutional kernels and an interpolation function to explicitly measure geometric relations between input point clouds and kernel-weight coordinates | 3D shape classification, object part segmentation, and indoor scene semantic parsing | ModelNet40 | 93.0 | — | — |
| | | | | | S3DIS | 88.7 | — | 66.7 |
| | | | | | ShapeNet | — | — | 86.3 |
| | A-CNN^[86] | 2019 | Capturing the local neighborhood geometry of each point by specifying the (regular and dilated) ring-shaped structures and directions in the computation | Object classification, part segmentation, and semantic segmentation in large-scale scenes | ModelNet40 | 92.6 | 90.3 | — |
| | | | | | ModelNet10 | 95.5 | 95.3 | — |
| | | | | | S3DIS | 87.3 | — | — |
| | | | | | ShapeNet | — | — | 86.1 |
| | | | | | ScanNet | 85.4 | — | — |
| | Ref. [88] | 2019 | RIConv; using low-level rotation-invariant geometric features such as distances and angles to design a convolution operator for point cloud learning | 3D object classification and segmentation | ModelNet40 | 86.5 | — | — |
| | | | | | ShapeNet | — | — | 75.5 |
| | | | | | Driving Scenes Dataset | 95.45 | — | 58.06 |
| | Ref. [93] | 2019 | PointConv; taking the positions of point clouds as input and learning an MLP to approximate a weight function, then applying an inverse density scale to the learned weights to compensate for the non-uniform sampling | 3D semantic segmentation; convolutional networks in 2D images of a similar structure | ModelNet40 | 92.5 | — | — |
| | | | | | ShapeNet | — | — | 85.7 |
| | | | | | ScanNet | — | — | 55.6 |
| | Ref. [95] | 2019 | KPConv; using a set of kernel points to define the area where each kernel weight is applied | Adapting to the geometry of the scene objects | ModelNet40 | 92.9 | — | — |
| | | | | | ShapeNet | — | — | 86.4 |
| | SPHNet^[99] | 2019 | SPHConv; employing a spherical-harmonics-based kernel at different layers of the network | 3D shape deep learning tasks | ModelNet40 | 87.7 | — | — |
| | ConvPoint^[100] | 2019 | Using continuous convolution and a hierarchical data representation structure based on a search tree | Large-scale indoor and outdoor semantic segmentation | ModelNet40 | 92.5 | 89.6 | — |
| | | | | | ShapeNet | — | — | 85.8 |
| | RS-CNN^[96] | 2020 | RS-Conv; learning from the geometric topology constraint among points | Encoding meaningful shape information in 3D point clouds | ModelNet40 | 93.6 | — | — |
| | | | | | ShapeNet | — | — | 86.2 |
| Attention-based methods | A-SCN^[113] | 2018 | Adopting shape context as the basic building block, acting like convolution in a CNN | 3D point cloud classification and segmentation | ShapeNet | — | — | 84.6 |
| | PAN^[109] | 2018 | Combining a Feature Pyramid Attention (FPA) module and Global Attention Upsample (GAU) | 3D point cloud semantic segmentation (urban scenes) | PASCAL VOC 2012 | 95.7 | — | 84.0 |
| | | | | | Cityscapes | — | — | 78.6 |
| | PyramNet^[110] | 2019 | Combining a Graph Embedding Module (GEM) and a Pyramid Attention Network (PAN) | 3D object classification and semantic segmentation | ModelNet40 | 91.5 | 88.3 | — |
| | | | | | S3DIS | 85.6 | — | 55.6 |
| | | | | | ShapeNet | — | — | 83.9 |
| | GAPNet^[107] | 2019 | GAPLayer; embedding a graph attention mechanism within stacked multi-layer perceptron (MLP) layers to learn local geometric representations | 3D shape classification and part segmentation | ModelNet40 | 92.4 | 89.7 | — |
| | | | | | ShapeNet | — | — | 84.7 |
| | PAT^[108] | 2019 | Using a parameter-efficient Group Shuffle Attention (GSA) to replace the costly Multi-Head Attention; Gumbel Subset Sampling (GSS) | Hierarchical multiple instance learning | ModelNet40 | 91.7 | — | — |
| | | | | | S3DIS | — | — | 64.28 |
| | LSANet^[111] | 2019 | Generating Spatial Distribution Weights (SDWs) hierarchically based on the spatial relationship in a local region for spatially independent operations | 3D object classification, part segmentation, and semantic segmentation | ModelNet40 | 92.3 | 89.2 | — |
| | | | | | S3DIS | 86.8 | — | 62.2 |
| | | | | | ShapeNet | — | — | 83.2 |
| | | | | | ScanNet | 85.1 | — | — |
| | Ref. [114] | 2019 | Attention-based score refinement (ASR) module | Improving the segmentation accuracy | ShapeNet | — | — | 85.6 |
| | GACNet^[106] | 2020 | Assigning proper attentional weights to different neighboring points | 3D point cloud semantic segmentation | Semantic3D | 91.9 | — | 70.8 |
| | | | | | S3DIS | 87.79 | — | 62.85 |
| | Ref. [115] | 2020 | Local Attention-Edge Convolution (LAE-Conv); constructing a local graph based on the neighborhood points searched in multiple directions | Predicting dense labels for 3D point cloud segmentation | S3DIS | 88.95 | 66.3 | |
| | | | | | ShapeNet | — | — | 85.9 |
| | | | | | ScanNet | 88.6 | — | 46.9 |

Table 2. Comparison for methods based on original point clouds
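
The accuracy columns in the comparison tables are not defined on this page. The sketch below assumes the usual reading: OA is overall accuracy (correctly labeled samples over all samples), MA is the mean of per-class accuracies, and mIoU is the mean per-class intersection over union. The function names and the toy labels are illustrative only, and individual papers may average differently (for example per shape instead of per category on ShapeNet), so the numbers it prints are not comparable to any table entry.

```python
from typing import List, Sequence, Tuple


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Count matrix: rows index the true class, columns the predicted class."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def summarize(cm: List[List[int]]) -> Tuple[float, float, float]:
    """Return (OA, MA, mIoU) in percent.

    Assumed definitions (not taken from the source page):
      OA   = correctly labeled samples / all samples
      MA   = mean over classes of TP / (samples whose label is that class)
      mIoU = mean over classes of TP / (TP + FP + FN)
    Classes that never occur are skipped in the respective mean.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    accs, ious = [], []
    for i in range(n):
        tp = cm[i][i]
        true_i = sum(cm[i])                       # samples labeled i
        pred_i = sum(cm[r][i] for r in range(n))  # samples predicted as i
        if true_i > 0:
            accs.append(tp / true_i)
        union = true_i + pred_i - tp              # TP + FN + FP
        if union > 0:
            ious.append(tp / union)
    oa = 100.0 * correct / total
    ma = 100.0 * sum(accs) / len(accs)
    miou = 100.0 * sum(ious) / len(ious)
    return oa, ma, miou


# Toy example with three classes and eight labeled points.
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]
oa, ma, miou = summarize(confusion_matrix(y_true, y_pred, 3))
print(f"OA={oa:.2f} MA={ma:.2f} mIoU={miou:.2f}")  # OA=75.00 MA=75.00 mIoU=58.89
```

In the toy example OA and MA happen to coincide; with imbalanced classes OA is dominated by the frequent classes while MA and mIoU weight every class equally, which is one reason the tables report them separately.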
