Visual-based Parameterized Proximal Policy Optimization

Bibliographic Details
Main Authors: Huang, Ming-Xu, 黃明旭
Other Authors: Wu, I-Chen
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/c2d58n
id ndltd-TW-107NCTU5394059
record_format oai_dc
spelling ndltd-TW-107NCTU5394059 2019-06-27T05:42:50Z http://ndltd.ncl.edu.tw/handle/c2d58n Visual-based Parameterized Proximal Policy Optimization 基於影像之參數化近端策略優化 Huang, Ming-Xu 黃明旭 Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 107 === We propose a visual-based proximal policy optimization for parameterized (structured) action spaces, built on an actor-critic network. The method, named parameterized proximal policy optimization (P3O), is applied to RoboCup soccer simulation and to robotic arm grasping and pushing. Experiments show that P3O converges quickly and reliably. In the RoboCup soccer task, P3O needs only 2/3 of the iterations of previous works to reach the same goal rate. Moreover, P3O learns more stable policies by the end of training, and its policies are better: they reach the goal in fewer steps. On average, policies from P3O kick the ball into the goal in about 80 steps, versus more than 115 steps for previous works. For robotic arm grasping and pushing, we demonstrate that P3O can learn policies from high-dimensional image observations and successfully complete the tasks, with success rates of 100% for pushing and 99% for grasping after about 30,000 updates. To our knowledge, our approach is the first visual-based deep reinforcement learning approach for parameterized action spaces. Wu, I-Chen 吳毅成 2018 Degree thesis ; thesis 34 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Chiao Tung University === Institute of Computer Science and Engineering === 107 === We propose a visual-based proximal policy optimization for parameterized (structured) action spaces, built on an actor-critic network. The method, named parameterized proximal policy optimization (P3O), is applied to RoboCup soccer simulation and to robotic arm grasping and pushing. Experiments show that P3O converges quickly and reliably. In the RoboCup soccer task, P3O needs only 2/3 of the iterations of previous works to reach the same goal rate. Moreover, P3O learns more stable policies by the end of training, and its policies are better: they reach the goal in fewer steps. On average, policies from P3O kick the ball into the goal in about 80 steps, versus more than 115 steps for previous works. For robotic arm grasping and pushing, we demonstrate that P3O can learn policies from high-dimensional image observations and successfully complete the tasks, with success rates of 100% for pushing and 99% for grasping after about 30,000 updates. To our knowledge, our approach is the first visual-based deep reinforcement learning approach for parameterized action spaces.
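The abstract describes PPO extended to a parameterized action space: a discrete action type (e.g., kick, move) where each type carries its own continuous parameters. The record does not include the thesis's architecture or code, so the following is a minimal illustrative sketch only, assuming PyTorch; the class name P3ONet, all layer sizes, n_types/n_params, and the joint log-probability treatment are assumptions, not the thesis's actual design.

```python
# Minimal sketch of a PPO update over a parameterized action space.
# Illustrative only; all names, sizes, and design choices are assumed,
# not taken from the thesis.
import torch
import torch.nn as nn

class P3ONet(nn.Module):
    """Actor-critic over image observations: a discrete action type
    plus continuous parameters for each type."""
    def __init__(self, n_types=3, n_params=2):
        super().__init__()
        self.encoder = nn.Sequential(              # visual observation encoder
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(256), nn.ReLU())
        self.type_logits = nn.Linear(256, n_types)            # which action type
        self.param_mean = nn.Linear(256, n_types * n_params)  # its parameters
        self.log_std = nn.Parameter(torch.zeros(n_types * n_params))
        self.value = nn.Linear(256, 1)                        # critic head

    def dist_and_value(self, obs):
        h = self.encoder(obs)
        type_dist = torch.distributions.Categorical(logits=self.type_logits(h))
        param_dist = torch.distributions.Normal(
            self.param_mean(h), self.log_std.exp())
        return type_dist, param_dist, self.value(h).squeeze(-1)

def ppo_loss(net, obs, act_type, act_params, old_logp, adv, returns, clip=0.2):
    """Clipped PPO objective on the joint log-prob of the discrete type
    and the continuous parameters. For simplicity this scores all
    parameter dimensions; scoring only the chosen type's parameters
    is a common refinement in parameterized-action methods."""
    type_dist, param_dist, value = net.dist_and_value(obs)
    logp = type_dist.log_prob(act_type) + param_dist.log_prob(act_params).sum(-1)
    ratio = (logp - old_logp).exp()                 # importance ratio vs. old policy
    policy_loss = -torch.min(ratio * adv,
                             ratio.clamp(1 - clip, 1 + clip) * adv).mean()
    value_loss = (returns - value).pow(2).mean()    # critic regression target
    return policy_loss + 0.5 * value_loss
```

In practice one would collect rollouts, estimate advantages (e.g., with GAE), and run several epochs of this loss per batch; the thesis's actual P3O may differ in any or all of these details.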
author2 Wu, I-Chen
author_facet Wu, I-Chen
Huang, Ming-Xu
黃明旭
author Huang, Ming-Xu
黃明旭
spellingShingle Huang, Ming-Xu
黃明旭
Visual-based Parameterized Proximal Policy Optimization
author_sort Huang, Ming-Xu
title Visual-based Parameterized Proximal Policy Optimization
title_short Visual-based Parameterized Proximal Policy Optimization
title_full Visual-based Parameterized Proximal Policy Optimization
title_fullStr Visual-based Parameterized Proximal Policy Optimization
title_full_unstemmed Visual-based Parameterized Proximal Policy Optimization
title_sort visual-based parameterized proximal policy optimization
publishDate 2018
url http://ndltd.ncl.edu.tw/handle/c2d58n
work_keys_str_mv AT huangmingxu visualbasedparameterizedproximalpolicyoptimization
AT huángmíngxù visualbasedparameterizedproximalpolicyoptimization
AT huangmingxu jīyúyǐngxiàngzhīcānshùhuàjìnduāncèlüèyōuhuà
AT huángmíngxù jīyúyǐngxiàngzhīcānshùhuàjìnduāncèlüèyōuhuà
_version_ 1719213398939402240