UAV Path Planning Based on Reverse Reinforcement Learning

YANG Xiuxia; WANG Chenlei; ZHANG Yi; YU Hao; JIANG Zijie

doi:10.3969/j.issn.1671-637x.2023.08.001

Abstract

In planning safe, collision-free paths for UAVs, the Deep Deterministic Policy Gradient (DDPG) algorithm suffers from slow convergence and from the difficulty of designing a reward function. To address these problems, a UAV path planning algorithm that integrates expert demonstration trajectories is proposed, based on inverse (reverse) reinforcement learning. First, a dataset of demonstration trajectories, in which an expert manipulates the UAV to avoid obstacles, is collected using simulator software. Second, a hybrid sampling mechanism mixes high-quality expert demonstration data into the self-exploration data when updating the network parameters, which reduces the cost of exploration. Finally, the maximum entropy inverse reinforcement learning algorithm is used to recover the optimal reward function implicit in the expert's experience, which addresses the difficulty of designing a reward function for complex tasks. Comparative experiments show that the improved algorithm trains more efficiently and achieves better obstacle avoidance performance.
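The hybrid sampling step can be illustrated with a minimal sketch. The abstract does not give the mixing ratio, the buffer layout, or the batch size, so the function name `hybrid_sample`, the `expert_ratio` parameter, and its default of 0.25 are assumptions made for illustration only, not the authors' implementation.

```python
import random


def hybrid_sample(expert_buffer, agent_buffer, batch_size,
                  expert_ratio=0.25, rng=random):
    """Draw one mini-batch mixing expert and self-exploration transitions.

    expert_ratio is a hypothetical hyperparameter: the fraction of the
    batch taken from expert demonstrations (not specified in the abstract).
    """
    # Number of expert transitions, capped by what the buffer holds.
    n_expert = min(int(round(batch_size * expert_ratio)), len(expert_buffer))
    # Fill the rest from the agent's own exploration data, if available.
    n_agent = min(batch_size - n_expert, len(agent_buffer))
    batch = rng.sample(expert_buffer, n_expert) + rng.sample(agent_buffer, n_agent)
    # Shuffle so expert and agent transitions are interleaved in the batch.
    rng.shuffle(batch)
    return batch


# Toy transitions tagged by origin; real entries would hold
# (state, action, reward, next_state, done) tuples.
expert = [("expert", i) for i in range(10)]
agent = [("agent", i) for i in range(100)]
batch = hybrid_sample(expert, agent, batch_size=8, rng=random.Random(0))
```

In a DDPG-style training loop, a batch drawn this way would replace the usual uniform draw from a single replay buffer before the critic and actor updates. A fixed ratio is the simplest choice; a schedule that lowers the expert share as training proceeds would be an equally plausible reading of the abstract.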