MPR-RL: Multi-Prior Regularized Reinforcement Learning for Knowledge Transfer

Bibliographic Details
Main Authors: Stork, J.A. (Author), Stoyanov, T. (Author), Yang, Q. (Author)
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2022
Online Access: View Fulltext in Publisher
LEADER 02542nam a2200373Ia 4500
001 10.1109-LRA.2022.3184805
008 220718s2022 CNT 000 0 eng d
022 |a 2377-3766 (ISSN) 
245 1 0 |a MPR-RL: Multi-Prior Regularized Reinforcement Learning for Knowledge Transfer 
260 0 |b Institute of Electrical and Electronics Engineers Inc.  |c 2022 
856 |z View Fulltext in Publisher  |u https://doi.org/10.1109/LRA.2022.3184805 
520 3 |a In manufacturing, assembly tasks have been a challenge for learning algorithms because the dynamics vary across environments. Reinforcement learning (RL) is a promising framework for automatically learning these tasks, yet it is still not easy to apply a learned policy or skill, that is, the ability to solve a task, to a similar environment even if the deployment conditions differ only slightly. In this letter, we address the challenge of transferring knowledge within a family of similar tasks by leveraging multiple skill priors. We propose to learn a prior distribution over the specific skill required to accomplish each task and to compose this family of skill priors to guide learning of the policy for a new task, based on the similarity between the target task and the prior ones. Our method learns a latent action space representing the skill embedding from demonstrated trajectories for each prior task. We evaluate our method on a task in simulation and on a set of peg-in-hole insertion tasks, and demonstrate better generalization to new tasks that were never encountered during training. Our Multi-Prior Regularized RL (MPR-RL) method is deployed directly on a real-world Franka Panda arm, requiring only a set of demonstrated trajectories from similar, but crucially not identical, problem instances. © 2016 IEEE. 
650 0 4 |a Aerospace electronics 
650 0 4 |a E-learning 
650 0 4 |a Job analysis 
650 0 4 |a Knowledge management 
650 0 4 |a Knowledge transfer 
650 0 4 |a Learning algorithms 
650 0 4 |a Machine learning for robot control 
650 0 4 |a Machine learning 
650 0 4 |a Personnel training 
650 0 4 |a Reinforcement learning 
650 0 4 |a Robot control 
650 0 4 |a Task analysis 
650 0 4 |a Trajectories 
650 0 4 |a Transfer learning 
700 1 |a Stork, J.A.  |e author 
700 1 |a Stoyanov, T.  |e author 
700 1 |a Yang, Q.  |e author 
773 |t IEEE Robotics and Automation Letters
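
Note: the abstract above describes composing a family of skill priors, weighted by the similarity between the target task and each prior task, to regularize policy learning on a new task. The following is a minimal illustrative sketch of that idea, not the paper's actual formulation: it assumes diagonal-Gaussian skill priors over the latent action space, a softmax over hand-set similarity scores, and a KL-divergence penalty; all function names and parameters (kl_gaussian, multi_prior_penalty, beta, similarities) are hypothetical.

import numpy as np

def kl_gaussian(mu_q, std_q, mu_p, std_p):
    # KL( N(mu_q, diag(std_q^2)) || N(mu_p, diag(std_p^2)) ), summed over dims.
    return np.sum(np.log(std_p / std_q)
                  + (std_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * std_p ** 2)
                  - 0.5)

def multi_prior_penalty(mu_pi, std_pi, priors, similarities, beta=1.0):
    # Softmax turns raw task-similarity scores into mixture weights,
    # so priors from more similar tasks dominate the regularizer.
    w = np.exp(similarities - np.max(similarities))
    w = w / w.sum()
    # Similarity-weighted sum of KL terms between the policy's latent-action
    # distribution and each skill prior, scaled by a coefficient beta.
    return beta * sum(w_i * kl_gaussian(mu_pi, std_pi, mu_p, std_p)
                      for w_i, (mu_p, std_p) in zip(w, priors))

# Toy usage: two prior skills; the first is scored as more similar to the
# target task, so it contributes most of the regularization pressure.
priors = [(np.zeros(4), np.ones(4)), (np.full(4, 2.0), np.ones(4))]
penalty = multi_prior_penalty(mu_pi=np.full(4, 0.5), std_pi=np.full(4, 0.8),
                              priors=priors,
                              similarities=np.array([1.0, -1.0]))
print(f"prior-regularization penalty: {penalty:.3f}")  # added to the RL loss

In an actor-critic setup such a penalty would typically be added to the policy loss at each update; in the paper, the skill priors are learned from demonstrated trajectories per prior task rather than hand-set as in this toy example.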