[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[2] WATTER M, SPRINGENBERG J T, BOEDECKER J, et al. Embed to control: a locally linear latent dynamics model for control from raw images[C]//Proceedings of the 28th International Conference on Neural Information Processing Systems. Cambridge: MIT Press, 2015: 2746-2754.
[3] ZHANG F Y, LEITNER J, MILFORD M, et al. Towards vision-based deep reinforcement learning for robotic motion control[C]//Australasian Conference on Robotics and Automation. Canberra: Australian Robotics and Automation Association, 2015: 1-8.
[4] SILVER D, HUANG A, MADDISON C J, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
[5] TAN M. Multi-agent reinforcement learning: independent vs. cooperative agents[C]//Proceedings of the Tenth International Conference on Machine Learning. Amherst: Elsevier Inc., 1993: 330-337.
[6] FOERSTER J N, ASSAEL Y M, DE FREITAS N, et al. Learning to communicate to solve riddles with deep distributed recurrent Q-networks[EB/OL]. (2016-02-08)[2022-03-10]. https://arxiv.org/abs/1602.02672.
[7] SUNEHAG P, LEVER G, GRUSLYS A, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward[C]//Proceedings of the 17th International Conference on Autonomous Agents and Multiagent Systems. Richland: International Foundation for Autonomous Agents and Multiagent Systems, 2018: 2085-2087.