An Online Q-Learning Algorithm for a Model-Free Infinite Horizon System

DAI Xiaoqing; ZHAO Xu

doi:10.3969/j.issn.1671-637x.2022.02.012

[1] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning [J]. Nature, 2015, 518: 529-533.

[2] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning [J]. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2015. doi: 10.1016/S1098-3015(10)67722-4.

[3] MOGHADAM R, LEWIS F L. Output-feedback H∞ quadratic tracking control of linear systems using reinforcement learning [J]. International Journal of Adaptive Control & Signal Processing, 2019, 33(2): 628-640.

[5] SUTTON R S, BARTO A G. Reinforcement learning: an introduction [M]. Cambridge, Massachusetts: MIT Press, 1998.

[8] REN H, ZHANG H, WEN Y, et al. Integral reinforcement learning off-policy method for solving nonlinear multi-player nonzero-sum games with saturated actuator [J]. Neurocomputing, 2019, 335(28): 96-104.

[9] ARAGON-GMEZ R, CLEMPNER J B.Traffic-signal control reinforcement learning approach for continuous-time Markov games［J］.Engineering Applications of Artificial Intelligence, 2020, 89: 103415.

[13] JIANG Y, JIANG Z P. Computational adaptive optimal control for continuous-time linear systems with completely unknown dynamics [J]. Automatica, 2012, 48(10): 2699-2704.