[7] A Vaswani, N Shazeer, N Parmar et al. Attention is all you need, 6000-6010(2017).
[11] L C Chen, G Papandreou, I Kokkinos et al. Semantic image segmentation with deep convolutional nets and fully connected CRFs(2015).
[15] A Dosovitskiy, L Beyer, A Kolesnikov et al. An image is worth 16x16 words: transformers for image recognition at scale(2021).
[21] A Krizhevsky, I Sutskever, G E Hinton. ImageNet classification with deep convolutional neural networks, 1106-1114(2012).