Humanoids Learning to Walk: a Natural CPG-Actor-Critic Architecture

Bibliographic Details
Main Authors: Cai Li, Robert Lowe, Tom Ziemke
Format: Article
Language: English
Published: Frontiers Media S.A. 2013-04-01
Series: Frontiers in Neurorobotics
Subjects:
SI
DST
Online Access: http://journal.frontiersin.org/Journal/10.3389/fnbot.2013.00005/full
Description
Summary: The identification of learning mechanisms for locomotion has long been a subject of research, but many challenges remain. Dynamic systems theory (DST) offers a novel approach to humanoid learning through environmental interaction. Reinforcement learning (RL) has offered a promising method to adaptively link the dynamic system to the environment it interacts with via a reward-based value system. In this paper, we propose a model that integrates these perspectives and apply it to the case of a humanoid (NAO) robot learning to walk, an ability that emerges from its value-based interaction with the environment. In the model, a simplified central pattern generator (CPG) architecture inspired by neuroscientific research and DST is integrated with an actor-critic approach to RL (cpg-actor-critic). In the cpg-actor-critic architecture, least-squares temporal-difference (LSTD) based learning converges to the optimal solution quickly by using the natural gradient and balancing exploration and exploitation. Furthermore, rather than using a traditional (designer-specified) reward, it uses a dynamic value function as a stability indicator (SI) that adapts to the environment. The results obtained are analyzed and explained using a novel DST-based embodied cognition approach. Learning to walk, from this perspective, is a process of integrating sensorimotor levels and value.
ISSN:1662-5218