Review on RGB-D Image Classification

Tu Shuqin; Xue Yueju; Liang Yun; Huang Ning; Zhang Xiao

doi:10.3788/lop53.060003
