[6] REN H G, YIN R J, LI F J, et al. Research on Q-ELM algorithm in robot path planning[C]//Control and Decision Conference, IEEE, 2016: 5975-5979.
[7] TAI L, LIU M. A robot exploration strategy based on Q-Learning network[C]//International Conference on RealTime Computing and Robotics, IEEE, 2016: 57-62.
[10] ZHANG Y F, LI W L, DE SILVA C W. RSMDP-based robust Q-Learning for optimal path planning in a dynamic environment[J]. IAES International Journal of Robotics & Automation, 2014, 31(4): 290-300.
[11] KLIDBARY S H, SHOURAKI S B, KOURABBASLOU S S. Path planning of modular robots on various terrains using Q-Learning versus optimization algorithms[J]. Intelligent Service Robotics, 2017, 10(2): 121-136.
[12] GASKETT C. Q-Learning for robot control [D]. Canberra: The Australian National University, 2002.
[13] DUNG L T, KOMEDA T, TAKAGI M. Reinforcement learning for POMDP using state classification [J]. Applied Artificial Intelligence, 2008, 22(7/8): 761-779.
[15] FRAMLING K. Guiding exploration by pre-existing know ledge without modifying reward[J]. Neural Networks, 2007, 20(6): 736-747.
[16] OH C H, NAKASHIMA T, ISHIBUCHI H.Initialization of Q-values by fuzzy rules for accelerating Q-Learning[C]//Proceedings of the IEEE World Congress on Computational Intelligence, 2002: 2051-2056.
[17] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[J]. Computer Science, 2015, 8(6): 1-14.
[18] WATKINS C, CHRISTDPHER J, DAYAN P. Q-Learning[J].Machine Learning, 1992, 8(1): 279-292.
[19] SUTTON R S, BARTO A G. Reinforcement learning: an introduction[M]. Cambridge: The MIT Press, 1998.
[20] LIN L X, XIE H B, ZHANG D B, et al. Supervised neural Q-Learning based motion control for bionic underwater robots[J ] . Journal of Bionic Engineering, 2010, 7(s): 177-184.