Review of Deep Learning-Based Semantic Segmentation

Xiangfu Zhang; Jian Liu; Zhangsong Shi; Zhonghong Wu; Zhi Wang

doi:10.3788/LOP56.150003

[1] He Y, Wang H, Zhang B. Color-based road detection in urban traffic scenes[J]. IEEE Transactions on Intelligent Transportation Systems, 5, 309-318(2004). http://ieeexplore.ieee.org/document/1364007

[2] An Z, Xu X P, Yang J H et al. Design of augmented reality head-up display system based on image semantic segmentation[J]. Acta Optica Sinica, 38, 0710004(2018).

[3] Ros G, Sellart L, Materzynska J et al. The SYNTHIA dataset: a large collection of synthetic images for semantic segmentation of urban scenes[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 26-July 1, 2016, Las Vegas, NV, USA. New York: IEEE, 3234-3243(2016).

[4] Yi Z, Criminisi A, Shotton J et al. Discriminative, semantic segmentation of brain tissue in MR images[M]. ∥Yang G Z, Hawkes D, Rueckert D, et al. Medical image computing and computer-assisted intervention -MICCAI 2009. Lecture notes in computer science. Berlin, Heidelberg: Springer, 5762, 558-565(2009).

[5] Liu H, Peng L, Wen J W. Multi-scale aware pedestrian detection algorithm based on improved full convolutional network[J]. Laser & Optoelectronics Progress, 55, 091504(2018).

[6] Simo-Serra E, Fidler S, Moreno-Noguer F et al. A high performance CRF model for clothes parsing[M]. ∥Cremers D, Reid I, Saito H, et al. Computer vision-ACCV 2014. Lecture notes in computer science. Cham: Springer, 9005, 64-81(2015).

[7] Dollár P, Appel R, Belongie S et al. Fast feature pyramids for object detection[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36, 1532-1545(2014). http://dl.acm.org/citation.cfm?id=2693405

[8] Girshick R, Donahue J, Darrell T et al. Region-based convolutional networks for accurate object detection and segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38, 142-158(2016).

[9] Krizhevsky A, Sutskever I, Hinton G E. ImageNet classification with deep convolutional neural networks[J]. Communications of the ACM, 60, 84-90(2017). http://dl.acm.org/citation.cfm?id=2999257

[10] Simonyan K, Zisserman A. Very deep convolutional networks for large-scale image recognition[J/OL]. (2015-04-10)[2019-01-05]. https://arxiv.org/abs/1409.1556.

[11] Szegedy C, Liu W, Jia Y Q et al. Going deeper with convolutions[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 1-9(2015).

[12] He K M, Zhang X Y, Ren S Q et al. Deep residual learning for image recognition[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 770-778(2016).

[13] Mohamed A R, Dahl G E, Hinton G. Acoustic modeling using deep belief networks[J]. IEEE Transactions on Audio, Speech, and Language Processing, 20, 14-22(2012). http://dl.acm.org/citation.cfm?id=2336011

[14] Cheng G J, Liu L T. Feasibility study of deep learning algorithm applied to rock image processing[J]. Software Guide, 15, 163-166(2016).

[15] Wang L, Liu Q. A multi-object image segmentation algorithm based on local features[J]. Laser & Optoelectronics Progress, 55, 061002(2018).

[16] Shi J B, Malik J. Normalized cuts and image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 888-905(2000). https://doi.org/10.1109/34.868688

[17] Wu S Q, Nakao M, Matsuda T. Automatic GrabCut based lung extraction from endoscopic images with an initial boundary[C]∥2016 IEEE 13th International Conference on Signal Processing (ICSP), November 6-10, 2016, Chengdu, China. New York: IEEE, 1374-1378(2017).

[18] Everingham M, Eslami S M A, van Gool L et al. The Pascal visual object classes challenge: a retrospective[J]. International Journal of Computer Vision, 111, 98-136(2015).

[19] Hariharan B, Arbelaez P, Bourdev L et al. Semantic contours from inverse detectors[C]∥2011 International Conference on Computer Vision, November 6-13, 2011, Barcelona, Spain. New York: IEEE, 991-998(2012).

[20] Mottaghi R, Chen X J, Liu X B et al. The role of context for object detection and semantic segmentation in the wild[C]∥2014 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 23-28, 2014, Columbus, OH, USA. New York: IEEE, 891-898(2014).

[21] Wang J Y, Yuille A. Semantic part segmentation using compositional model combining shape and appearance[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 1788-1797(2015).

[22] Garcia-Garcia A, Orts-Escolano S, Oprea S et al. A review on deep learning techniques applied to semantic segmentation[J/OL]. (2017-04-22)[2019-01-05]. https://arxiv.org/abs/1704.06857.

[23] Lin T Y, Maire M, Belongie S et al. Microsoft COCO: common objects in context[M]. ∥Fleet D, Pajdla T, Schiele B, et al. Computer vision-ECCV 2014. Lecture notes in computer science. Cham: Springer, 8693, 740-755(2014).

[24] Cordts M, Omran M, Ramos S et al. The cityscapes dataset for semantic urban scene understanding[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 3213-3223(2016).

[25] Brostow G J, Fauqueur J, Cipolla R. Semantic object classes in video: a high-definition ground truth database[J]. Pattern Recognition Letters, 30, 88-97(2009). http://dl.acm.org/citation.cfm?id=1465403

[26] Sturgess P, Alahari K, Ladicky L et al. Combining appearance and structure from motion features for road scene understanding[C]∥Proceedings of the British Machine Vision Conference 2009, September 7-10, 2009, London. Durham, England, 62(2009).

[27] Ros G, Ramos S, Granados M et al. Vision-based offline-online perception paradigm for autonomous driving[C]∥2015 IEEE Winter Conference on Applications of Computer Vision, January 5-9, 2015, Waikoloa, HI, USA. New York: IEEE, 231-238(2015).

[28] Zhang R, Candra S A, Vetter K et al. Sensor fusion for semantic segmentation of urban scenes[C]∥2015 IEEE International Conference on Robotics and Automation (ICRA), May 26-30, 2015, Seattle, WA, USA. New York: IEEE, 1850-1857(2015).

[29] Geiger A, Lenz P, Stiller C et al. Vision meets robotics: the KITTI dataset[J]. The International Journal of Robotics Research, 32, 1231-1237(2013). http://dl.acm.org/citation.cfm?id=2528333

[30] Alvarez J M, Gevers T, LeCun Y et al. Road scene segmentation from a single image[M]. ∥Fitzgibbon A, Lazebnik S, Perona P, et al. Computer vision-ECCV 2012. Lecture notes in computer science. Berlin, Heidelberg: Springer, 7578, 376-389(2012).

[31] Ros G, Alvarez J M. Unsupervised image transformation for outdoor semantic labelling[C]∥2015 IEEE Intelligent Vehicles Symposium (IV), June 28-July 1, 2015, Seoul, Korea. New York: IEEE, 537-542(2015).

[32] Shelhamer E, Long J, Darrell T. Fully convolutional networks for semantic segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 640-651(2017).

[33] Badrinarayanan V, Kendall A, Cipolla R. SegNet: a deep convolutional encoder-decoder architecture for image segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39, 2481-2495(2017). http://www.ncbi.nlm.nih.gov/pubmed/28060704

[34] Chen L C, Papandreou G, Kokkinos I et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs[J/OL]. (2016-06-07)[2019-01-05]. https://arxiv.org/abs/1412.7062.

[35] Chen L C, Papandreou G, Kokkinos I et al. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40, 834-848(2018). http://doi.ieeecomputersociety.org/10.1109/TPAMI.2017.2699184

[36] Chen L C, Papandreou G, Schroff F et al. Rethinking atrous convolution for semantic image segmentation[J/OL]. (2017-12-05)[2019-01-05]. https://arxiv.org/abs/1706.05587.

[37] Chen L C, Zhu Y K, Papandreou G et al. Encoder-decoder with atrous separable convolution for semantic image segmentation[M]. ∥Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018. Lecture notes in computer science. Cham: Springer, 11211, 833-851(2018).

[38] Zheng S, Jayasumana S, Romera-Paredes B et al. Conditional random fields as recurrent neural networks[C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 1529-1537(2015).

[39] Liu W, Rabinovich A, Berg A C. ParseNet: looking wider to see better[J/OL]. (2015-11-19)[2019-01-05]. https://arxiv.org/abs/1506.04579.

[40] Pinheiro P O, Lin T Y, Collobert R et al. Learning to refine object segments[M]. ∥Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science. Cham: Springer, 9905, 75-91(2016).

[41] Zhao H S, Shi J P, Qi X J et al. Pyramid scene parsing network[C]∥2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA. New York: IEEE, 6230-6239(2017).

[42] Raj A, Maturana D, Scherer S. Multi-scale convolutional architecture for semantic segmentation[R]. Pittsburgh, Pennsylvania: Carnegie Mellon University, CMU-RI-TR-15-21(2015).

[43] Roy A, Todorovic S. A multi-scale CNN for affordance segmentation in RGB images[M]. ∥Leibe B, Matas J, Sebe N, et al. Computer vision-ECCV 2016. Lecture notes in computer science. Cham: Springer, 9908, 186-201(2016).

[44] Eigen D, Fergus R. Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture[C]∥2015 IEEE International Conference on Computer Vision (ICCV), December 7-13, 2015, Santiago, Chile. New York: IEEE, 2650-2658(2016).

[45] Bian X, Lim S N, Zhou N. Multiscale fully convolutional network with application to industrial inspection[C]∥2016 IEEE Winter Conference on Applications of Computer Vision (WACV), March 7-10, 2016, Lake Placid, NY, USA. New York: IEEE, 16035894(2016).

[46] Visin F, Romero A, Cho K et al. ReSeg: a recurrent neural network-based model for semantic segmentation[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), June 26-July 1, 2016, Las Vegas, NV, USA. New York: IEEE, 426-433(2016).

[47] Li Z, Gan Y K, Liang X D et al. LSTM-CF: unifying context modeling and fusion with LSTMs for RGB-D scene labeling[J/OL]. (2016-07-26)[2019-01-05]. https://arxiv.org/abs/1604.05000v1.

[48] Pinheiro P H O, Collobert R. Recurrent convolutional neural networks for scene parsing[J/OL]. (2013-06-12)[2019-01-05]. https://arxiv.org/abs/1306.2795.

[49] Byeon W, Breuel T M, Raue F et al. Scene labeling with LSTM recurrent neural networks[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 3547-3555(2015).

[50] Shuai B, Zuo Z, Wang B et al. DAG-recurrent neural networks for scene labeling[C]∥2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 27-30, 2016, Las Vegas, NV, USA. New York: IEEE, 3620-3629(2016).

[51] Bell S, Upchurch P, Snavely N et al. Material recognition in the wild with the Materials in Context Database[C]∥2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 7-12, 2015, Boston, MA, USA. New York: IEEE, 3479-3487(2015).

[52] Pinheiro P O, Collobert R, Dollár P. Learning to segment object candidates[C]∥Proceedings of the 28th International Conference on Neural Information Processing Systems, December 7-12, 2015, Montreal, Canada, 2, 1990-1998(2015).

[53] Visin F, Kastner K, Cho K et al. ReNet: a recurrent neural network based alternative to convolutional networks[J/OL]. (2015-07-23)[2019-01-05]. https://arxiv.org/abs/1505.00393.

[54] Hochreiter S, Schmidhuber J. Long short-term memory[J]. Neural Computation, 9, 1735-1780(1997).

[55] Wu Z Z, King S. Investigating gated recurrent networks for speech synthesis[C]∥2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), March 20-25, 2016, Shanghai, China. New York: IEEE, 5140-5144(2016).

[56] Li X, Jie Z Q, Wang W et al. FoveaNet: perspective-aware urban scene parsing[C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 784-792(2017).

[57] Yu C Q, Wang J B, Peng C et al. Learning a discriminative feature network for semantic segmentation[C]∥2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA. New York: IEEE, 1857-1866(2018).

[58] Souly N, Spampinato C, Shah M. Semi supervised semantic segmentation using generative adversarial network[C]∥2017 IEEE International Conference on Computer Vision (ICCV), October 22-29, 2017, Venice, Italy. New York: IEEE, 5689-5697(2017).

[59] Guo C C, Yu F Q, Chen Y. Image semantic segmentation based on convolutional neural network feature and improved superpixel matching[J]. Laser & Optoelectronics Progress, 55, 081005(2018).