Summary: | In this paper, we focus on the study of UAV ground target tracking under obstacle environments using deep reinforcement learning, and an improved deep deterministic policy gradient (DDPG) algorithm is presented. A reward function based on line of sight and artificial potential field is constructed to guide the behavior of UAV to achieve target tracking, and a penalty term of action makes the trajectory smooth. In order to improve the exploration ability, multiple UAVs, which controlled by the same policy network, are used to perform tasks in each episode. Taking into account that the history observations have a great degree of correlation with the policy, long short-term memory networks are used to approximate the state of environments, which improve the approximation accuracy and the efficiency of data utilization. The simulation results show that the propose method can make the UAV keep target tracking and obstacle avoidance effectively.
|