[15] P Velickovic, G Cucurull, A Casanova et al. Graph attention networks, 1-12(2018).
[17] T N Kipf, M Welling. Semi-supervised classification with graph convolutional networks, 1-14(2017).
[24] I Loshchilov, F Hutter. Decoupled weight decay regularization, 1-8(2019).
[26] M Ilse, J Tomczak, M Welling. Attention-based deep multiple instance learning, 2127-2136(2018).