[3] Q. Chen et al. Enhanced LSTM for natural language inference, 1657-1668(2017).
[4] J. Devlin et al. BERT: pre-training of deep bidirectional transformers for language understanding, 4171-4186(2019).
[5] A. Grover, J. Leskovec. Node2vec: scalable feature learning for networks, 855-864(2016).
[9] M. A. Aceves-Fernandez, L. Huang, L. Xu, A. E. Miroshnichenko. Deep learning enabled nanophotonics. Advances and Applications in Deep Learning(2020).
[19] Z. Wu et al. Neuromorphic metasurface. Photonics Res., 8, 46-50(2020).
[24] H. Zheng et al. Meta-optic accelerators for object classifiers. Sci. Adv., 8, eabo6410(2022).
[28] J. Schulman et al. Proximal policy optimization algorithms(2017).
[29] Y. Bengio, D. P. Kingma, J. Ba, Y. LeCun. Adam: a method for stochastic optimization(2015).
[30] K. He et al. Deep residual learning for image recognition, 770-778(2016).
[33] M. Ranzato, I. O. Tolstikhin et al. MLP-mixer: an all-MLP architecture for vision. Advances in Neural Information Processing Systems, 34, 24261-24272(2021).
[36] L. Li et al. Machine-learning reprogrammable metasurface imager. Nat. Commun., 10, 1082(2019).
[38] W. J. Padilla, R. D. Averitt. Imaging with metamaterials. Nat. Rev. Phys., 4, 85-100(2022).