• Journal of Terahertz Science and Electronic Information Technology
  • Vol. 19, Issue 1, 156 (2021)
XIAN Rong*, HE Xiaohai, WU Xiaohong, and QING Linbo
    DOI: 10.11805/tkyda2020172
    XIAN Rong, HE Xiaohai, WU Xiaohong, QING Linbo. Visual Question Answering based on multimodal bidirectional guided attention[J]. Journal of Terahertz Science and Electronic Information Technology, 2021, 19(1): 156
    References

    [1] ANTOL S,AGRAWAL A,LU J,et al. VQA:Visual Question Answering[J]. International Journal of Computer Vision, 2017, 123(1):4-31.

    [2] WANG L,LI Y,HUANG J,et al. Learning two-branch neural networks for image-text matching tasks[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019,41(2):394-407.

    [3] ANDERSON P,HE X,BUEHLER C,et al. Bottom-up and top-down attention for image captioning and visual question answering[C]// IEEE/CVF Conference on Computer Vision and Pattern Recognition. Salt Lake City,USA:IEEE, 2018: 6077-6086.

    [4] WU Q,SHEN C,WANG P,et al. Image captioning and visual question answering based on attributes and external knowledge[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018,40(6):1367-1381.

    [5] SHIH K J,SINGH S,HOIEM D,et al. Where to look:focus regions for visual question answering[C]// Computer Vision and Pattern Recognition. Las Vegas,US:IEEE, 2016:4613-4621.

    [6] PENG L,YANG Y,BIN Y,et al. Word-to-region attention network for visual question answering[J]. Multimedia Tools and Applications, 2019,78(3):3843-3858.

    [7] LU J,YANG J,BATRA D,et al. Hierarchical question-image co-attention for visual question answering[C]// Proceedings of the 30th International Conference on Neural Information Processing Systems. Barcelona,Spain:Curran Associates, 2016: 289-297.

    [8] YANG C,JIANG M,JIANG B,et al. Co-attention network with question type for visual question answering[J]. IEEE Access, 2019(7):40771-40781.

    [9] KIM J,JUN J,ZHANG B,et al. Bilinear attention networks[C]// Neural Information Processing Systems. Montreal,Canada: Curran Associates, 2018:1564-1574.

    [10] YU Z,YU J,CUI Y,et al. Deep modular co-attention networks for visual question answering[C]// Computer Vision and Pattern Recognition. Long Beach,CA,US:IEEE, 2019:6281-6290.

    [11] ZHANG Y,HARE J,PRUGEL-BENNETT A,et al. Learning to count objects in natural images for visual question answering[C]// International Conference on Learning Representations. Vancouver,Canada:[s.n.], 2018.

    [12] VASWANI A,SHAZEER N,PARMAR N,et al. Attention is all you need[C]// Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach,USA:Curran Associates, 2017:5998-6008.

    [14] TENEY D,ANDERSON P,HE X,et al. Tips and tricks for visual question answering:learnings from the 2017 challenge[C]// IEEE/CVF Computer Vision and Pattern Recognition. Salt Lake City,USA:IEEE, 2018:4223-4232.

    [15] GAO P,YOU H,ZHANG Z,et al. Multi-modality latent interaction network for visual question answering[C]// International Conference on Computer Vision. Seoul,Korea(South):IEEE, 2019:5825-5835.

    [16] GAO P,JIANG Z,YOU H,et al. Dynamic fusion with intra-and inter-modality attention flow for visual question answering[C]// Computer Vision and Pattern Recognition. Long Beach,CA,USA:IEEE, 2019:6639-6648.
