• Laser & Optoelectronics Progress
  • Vol. 58, Issue 16, 1615001 (2021)
Jie Jin1, Kaiyan Liu1, and Shunkao Zha2,*
Author Affiliations
  • 1School of Electrical and Information Engineering, Tianjin University, Tianjin 300072, China
  • 2School of Software Engineering, University of Science and Technology of China, Suzhou, Jiangsu 215123, China
    DOI: 10.3788/LOP202158.1615001
    Jie Jin, Kaiyan Liu, Shunkao Zha. Vision-Language Navigation Algorithm Based on Cosine Similarity[J]. Laser & Optoelectronics Progress, 2021, 58(16): 1615001
    References

    [1] Brahmbhatt S, Hays J. DeepNav: learning to navigate large cities[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 3087-3096(2017).

    [2] Gupta S, Davidson J, Levine S et al. Cognitive mapping and planning for visual navigation[C]. //2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI, USA, 7272-7281(2017).

    [3] Chen J H, Jiang H H. Multi-scale segmentation for ridge row in vision navigation[J]. Laser & Optoelectronics Progress, 57, 081017(2020).

    [4] Parisotto E, Salakhutdinov R. Neural map: structured memory for deep reinforcement learning[C]. //Proceedings of 2018 International Conference on Learning Representations (ICLR), April 30-May 3, 2018, Vancouver, BC, Canada(2018).

    [5] Wu B, Wang X R. Inertial navigation aided image feature matching method[J]. Laser & Optoelectronics Progress, 57, 101509(2020).

    [6] Savinov N, Dosovitskiy A, Koltun V. Semi-parametric topological memory for navigation[C]. //Proceedings of 2018 International Conference on Learning Representations (ICLR), April 30-May 3, 2018, Vancouver, BC, Canada(2018).

    [7] Zhu Y K, Mottaghi R, Kolve E et al. Target-driven visual navigation in indoor scenes using deep reinforcement learning[C]. //2017 IEEE International Conference on Robotics and Automation (ICRA), May 29-June 3, 2017, Singapore, 3357-3364(2017).

    [8] Anderson P, Wu Q, Teney D et al. Vision-and-language navigation: interpreting visually-grounded navigation instructions in real environments[C]. //2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, June 18-23, 2018, Salt Lake City, UT, USA, 3674-3683(2018).

    [9] Wang X, Xiong W H, Wang H M et al. Look before you leap: bridging model-free and model-based reinforcement learning for planned-ahead vision-and-language navigation[M]. //Ferrari V, Hebert M, Sminchisescu C, et al. Computer vision-ECCV 2018. Lecture notes in computer science, 11220, 38-55(2018).

    [10] Fried D, Hu R, Cirik V et al. Speaker-follower models for vision-and-language navigation[C]. //Proceedings of the 32nd Conference on Neural Information Processing Systems (NeurIPS), December 3-8, 2018, Montreal, Canada, 3314-3325(2018).

    [11] Ma C Y, Lu J, Wu Z et al. Self-monitoring navigation agent via auxiliary progress estimation[C]. //Proceedings of 2019 International Conference on Learning Representations (ICLR), May 6-9, 2019, New Orleans, LA, USA(2019).

    [12] Xu Y, Fern A, Yoon S. Discriminative learning of beam-search heuristics for planning[C]. //Proceedings of the 20th International Joint Conference on Artificial Intelligence (IJCAI), January 6-12, 2007, Hyderabad, India, 2041-2046(2007).

    [13] Ma C Y, Wu Z X, AlRegib G et al. The regretful agent: heuristic-aided navigation through progress estimation[C]. //2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA, 6725-6733(2019).

    [14] Zhu F D, Zhu Y, Chang X J et al. Vision-language navigation with self-supervised auxiliary reasoning tasks[C]. //2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 13-19, 2020, Seattle, WA, USA, 10009-10019(2020).

    [15] Majumdar A, Shrivastava A, Lee S et al. Improving vision-and-language navigation with image-text pairs from the web[M]. //Vedaldi A, Bischof H, Brox T, et al. Computer vision-ECCV 2020. Lecture notes in computer science, 12351, 259-274(2020).

    [16] Wang X, Huang Q Y, Celikyilmaz A et al. Reinforced cross-modal matching and self-supervised imitation learning for vision-language navigation[C]. //2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 15-20, 2019, Long Beach, CA, USA, 6622-6631(2019).

    [17] Vaswani A, Shazeer N, Parmar N et al. Attention is all you need[C]. //Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), December 4-9, 2017, Long Beach, CA, USA, 5998-6008(2017).

    [18] Luong T, Pham H, Manning C D. Effective approaches to attention-based neural machine translation[C]. //Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, September 17-21, 2015, Lisbon, Portugal, 1412-1421(2015).

    [19] Chang A, Dai A, Funkhouser T et al. Matterport3D: learning from RGB-D data in indoor environments[C]. //2017 International Conference on 3D Vision (3DV), October 10-12, 2017, Qingdao, China, 667-676(2017).

    [20] Wu H H, Su H S, Liu G H et al. Facial expression recognition algorithm based on cosine distance loss function[J]. Laser & Optoelectronics Progress, 56, 241502(2019).

    [21] Anderson P, Chang A, Chaplot D S et al. On evaluation of embodied navigation agents[EB/OL]. (2018-07-18)[2020-10-19]. https://arxiv.org/abs/1807.06757
