[1] Dalal N, Triggs B. Histograms of oriented gradients for human detection[C]//2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005: 886–893.
[2] Felzenszwalb P, McAllester D, Ramanan D. A discriminatively trained, multiscale, deformable part model[C]//Proceedings of 2008 IEEE Conference on Computer Vision and Pattern Recognition, 2008: 1–8.
[3] Girshick R, Donahue J, Darrell T, et al. Rich feature hierarchies for accurate object detection and semantic segmentation[C]//Proceedings of 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014: 580–587.
[4] Girshick R. Fast R-CNN[C]//Proceedings of 2015 IEEE International Conference on Computer Vision, 2015: 1440–1448.
[5] Ren S Q, He K M, Girshick R, et al. Faster R-CNN: towards real-time object detection with region proposal networks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017, 39(6): 1137–1149.
[6] Redmon J, Divvala S, Girshick R, et al. You only look once: unified, real-time object detection[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 779–788.
[7] Redmon J, Farhadi A. YOLO9000: better, faster, stronger[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6517–6525.
[8] Liu W, Anguelov D, Erhan D, et al. SSD: single shot MultiBox detector[C]//The 14th European Conference on Computer Vision, 2016: 21–37.
[9] He K M, Gkioxari G, Dollár P, et al. Mask R-CNN[C]//Proceedings of 2017 IEEE International Conference on Computer Vision, 2017: 2980–2988.
[10] Lin T Y, Dollár P, Girshick R, et al. Feature pyramid networks for object detection[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 936–944.
[12] Yang D F, Sun F C, Wang S C, et al. Simultaneous estimation of ego-motion and vehicle distance by using a monocular camera[J]. Science China Information Sciences, 2014, 57(5): 1–10.
[13] Xu Y F, Wang Y, Guo L. Unsupervised ego-motion and dense depth estimation with monocular video[C]//Proceedings of 2018 IEEE 18th International Conference on Communication Technology, 2018: 1306–1310.
[14] Tateno K, Tombari F, Laina I, et al. CNN-SLAM: real-time dense monocular SLAM with learned depth prediction[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition, 2017: 6565–6574.
[15] Teichmann M, Weber M, Zöllner M, et al. MultiNet: real-time joint semantic reasoning for autonomous driving[C]//Proceedings of 2018 IEEE Intelligent Vehicles Symposium (IV), 2018: 1013–1020.
[16] Li B J, Liu S, Xu W C, et al. Real-time object detection and semantic segmentation for autonomous driving[J]. Proceedings of SPIE, 2017, 10608: 106080P.
[17] Chen L F, Yang Z, Ma J J, et al. Driving scene perception network: real-time joint detection, depth estimation and semantic segmentation[C]//Proceedings of 2018 IEEE Winter Conference on Applications of Computer Vision, 2018: 1283–1291.
[19] Kong H, Audibert J Y, Ponce J. Vanishing point detection for road detection[C]//Proceedings of 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009: 96–103.
[20] Moghadam P, Starzyk J A, Wijesoma W S. Fast vanishing-point detection in unstructured environments[J]. IEEE Transactions on Image Processing, 2012, 21(1): 425–430.
[21] He K M, Zhang X Y, Ren S Q, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770–778.
[22] Li Z M, Peng C, Yu G, et al. DetNet: design backbone for object detection[C]//The 15th European Conference on Computer Vision, 2018: 339–354.
[23] Yu F, Koltun V. Multi-scale context aggregation by dilated convolutions[EB/OL]. (2016-04-30). https://arxiv.org/abs/1511.07122v2.