[6] HUNG S M,GIVIGI S N.A Q-learning approach to flocking with UAVs in a stochastic environment[J].IEEE Transactions on Cybernetics,2017,47(1):186-197.
[7] WANG C,WANG J,ZHANG X.A deep reinforcement learning approach to flocking and navigation of UAVs in large-scale complex environment[C]//IEEE Global Conference on Signal and Information Processing.Anaheim,CA:IEEE,2018:1228-1232.
[8] WANG C,YAN C,XIANG X,et al.A continuous actor-critic reinforcement learning approach to flocking with fixed-wing UAVs[C]//The Eleventh Asian Conference on Machine Learning.[S.l.]:ACML,2019:64-79.
[9] HUNG S M,GIVIGI S N,NOURELDIN A.A Dyna-Q(λ) approach to flocking with fixed-wing UAVs in a stochastic environment[C]//IEEE International Conference on Systems,Man,and Cybernetics.Hong Kong:IEEE,2015:1918-1923.
[11] PACHTER M,D’AZZO J J,DARGAN J L.Automatic formation flight control[J].Journal of Guidance,Control,and Dynamics,1994,17(6):1380-1383.
[13] VAN HASSELT H,GUEZ A,SILVER D.Deep reinforcement learning with double Q-learning[C]//Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.Phoenix:AAAI,2016:2094-2100.
[14] KRÖSE B J A.Learning from delayed rewards[J].Robotics and Autonomous Systems,1995,15(4):233-235.
[15] MNIH V,KAVUKCUOGLU K,SILVER D,et al.Human-level control through deep reinforcement learning[J].Nature,2015,518(7540):529-533.