[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518: 529-533.
[2] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2015. doi: 10.1016/S1098-3015(10)67722-4.
[3] MOGHADAM R, LEWIS F L. Output-feedback H∞ quadratic tracking control of linear systems using reinforcement learning[J]. International Journal of Adaptive Control & Signal Processing, 2019, 33(2): 628-640.
[5] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge, Massachusetts: MIT Press, 1998.
[8] REN H, ZHANG H, WEN Y, et al. Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator[J]. Neurocomputing, 2019, 335(28): 96-104.
[9] ARAGON-GÓMEZ R, CLEMPNER J B. Traffic-signal control reinforcement learning approach for continuous-time Markov games[J]. Engineering Applications of Artificial Intelligence, 2020, 89: 103415.
[13] JIANG Y, JIANG Z P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics[J]. Automatica, 2012, 48(10): 2699-2704.