A similarity-guided segmentation model for garbage detection under road scene

Caiyun Zheng; Danhua Cao; Cheng Hu

doi:10.1007/s12200-022-00004-9

[1] Min, H., Zhu, X., Yan, B.: Research on visual algorithm of road garbage based on intelligent control of road sweeper. J. Phys. Conf. Ser. 1302(3), 032024 (2019)

[2] Rad, M.S., Kaenel, A.V., Droux, A.: A computer vision system to localize and classify wastes on the streets. In: Proceedings of International Conference on Computer Vision Systems. pp. 195–204. Springer, Cham (2017)

[3] Mittal, G., Yagnik, K.B., Garg, M., Krishnan, N.C.: Spotgarbage: smartphone app to detect garbage using deep learning. In: Proceedings of 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing. pp. 940–945. ACM, Heidelberg (2016)

[4] Balchandani, C., Hatwar, R.K., Makkar, P., Shah, Y., Eirinaki, M.: A deep learning framework for smart street cleaning. In: Proceedings of IEEE Third International Conference on Big Data Computing Service and Applications. pp. 112–117. CA, San Francisco (2017)

[5] Zeng, D., Zhang, S., Chen, F., Wang, Y.: Multi-scale CNN based garbage detection of airborne hyperspectral data. IEEE Access Pract. Innov. Open Solut. 7, 104514–104527 (2019)

[6] Wang, T., Cai, Y., Liang, L., Ye, D.: A multi-level approach to waste object segmentation. Sensors (Basel) 20(14), 3816 (2020)

[7] Proenca, P.F., Simoes, P.: TACO: trash annotations in context for litter detection. arXiv preprint arXiv:2003.06975 (2020)

[8] Ping, P., Xu, G., Kumala, E., Gao, J.: Smart street litter detection and classification based on Faster R-CNN and edge computing. Int. J. Softw. Eng. Knowl. Eng. 30(04), 537–553 (2020)

[9] Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: Proceedings of International Conference on Learning Representations. pp. 2–4. ICLR, San Juan (2016)

[10] Romera, E., Alvarez, J.M., Bergasa, L.M., Arroyo, R.: ERFNet: efficient residual factorized convnet for real-time semantic segmentation. IEEE Trans. Intell. Transp. Syst. 1, 1–10 (2017)

[11] Badrinarayanan, V., Kendall, A., Cipolla, R.: SegNet: a deep convolutional encoder-decoder architecture for image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017)

[12] Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Proceedings of International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham (2015)

[13] Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. pp. 2881–2890. IEEE, Honolulu (2017)

[14] Chen, L.C., Zhu, Y., Papandreou, G., Schroff, F., Adam, H.: Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of European Conference on Computer Vision (ECCV). pp. 801–818. Springer, Munich (2018)

[15] Li, H., Xiong, P., An, J., Wang, L.: Pyramid attention network for semantic segmentation. arXiv preprint arXiv:1805.10180 (2018)

[16] Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. pp. 3431–3440. IEEE, Boston (2015)

[17] Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: Proceedings of IEEE 2016 4th International Conference on 3D Vision. pp. 565–571. IEEE, Stanford (2016)

[18] Sudre, C.H., Li, W., Vercauteren, T., Ourselin, S., Cardoso, M.J.: Generalised Dice overlap as a deep learning loss function for highly unbalanced segmentations. In: Proceedings of Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support. pp. 240–248. Springer, Québec (2017)

[19] Salehi, S.S.M., Erdogmus, D., Gholipour, A.: Tversky loss function for image segmentation using 3D fully convolutional deep networks. In: Proceedings of International Workshop on Machine Learning in Medical Imaging. pp. 379–387. Springer, Québec (2017)

[20] Shrivastava, A., Gupta, A., Girshick, R.: Training region-based object detectors with online hard example mining. In: Proceedings of IEEE Conference on Computer Vision & Pattern Recognition. pp. 761–769. IEEE, Las Vegas (2016)

[21] Wu, Z., Shen, C., Hengel, A.V.D.: High-performance semantic segmentation using very deep fully convolutional networks. arXiv preprint arXiv:1604.04339 (2016)

[22] Snell, J., Swersky, K., Zemel, R.: Prototypical networks for few-shot learning. In: Proceedings of Advances in Neural Information Processing Systems. pp. 4077–4087. MIT Press, Long Beach (2017)

[23] Zhang, X., Wei, Y., Yang, Y., Huang, T.S.: SG-One: similarity guidance network for one-shot semantic segmentation. IEEE Trans. Cybern. 50(9), 3855–3865 (2020)

[24] Wang, K., Liew, J.H., Zou, Y., Zhou, D., Feng, J.: PANet: few-shot image semantic segmentation with prototype alignment. In: Proceedings of International Conference on Computer Vision (ICCV). pp. 9196–9205. IEEE, Seoul (2019)

[25] Paszke, A., Chaurasia, A., Kim, S., Culurciello, E.: ENet: a deep neural network architecture for real-time semantic segmentation. arXiv preprint arXiv:1606.02147 (2016)

[26] Shi, W., Caballero, J., Huszár, F., Totz, J., Wang, Z.: Real-time single image and video super-resolution using an efficient subpixel convolutional neural network. In: Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. pp. 1874–188. IEEE, Las Vegas (2016)

[27] Zhang, Z., Zhang, X., Peng, C., Xue, X., Sun, J.: Exfuse: enhancing feature fusion for semantic segmentation. In: Proceedings of European Conference on Computer Vision. pp. 269–284. Springer, Munich (2018)

[28] Huang, G., Liu, Z., Maaten, L.V.D., Weinberger, K.Q.: Densely connected convolutional networks. In: Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. pp. 2261–2269. IEEE, Honolulu (2017)

[29] Ma, N., Zhang, X., Zheng, H.T., Sun, J.: ShuffleNet V2: practical guidelines for efficient CNN architecture design. arXiv preprint arXiv:1807.11164 (2018)

[30] Khoreva, A., Benenson, R., Ilg, E., Brox, T., Schiele, B.: Lucid data dreaming for video object segmentation. arXiv preprint arXiv:1703.09554 (2017)

[31] Bochkovskiy, A., Wang, C.Y., Liao, H.Y.M.: YOLOv4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)