An Improved QMIX Network Based on Gradient Entropy Regularization

LU Rui; PENG Pengfei

doi:10.3969/j.issn.1671-637x.2023.04.015
