Model-Agnostic Metalearning-Based Text-Driven Visual Navigation Model for Unfamiliar Tasks
As vision and language processing techniques have made great progress, mapless visual navigation has become a central topic in domestic robotics. However, most current end-to-end navigation models are trained and tested on identical datasets with a fixed structure, which leads to severe performance degradation when dealing with unseen targets and environments. Since targets of the same category can have quite diverse appearances, the generalization ability of these models is further limited by their image-based task descriptions. In this article we propose a model-agnostic meta-learning (MAML) based, text-driven visual navigation model that generalizes to untrained tasks. Built on a meta-reinforcement-learning approach, the agent accumulates navigation experience from existing targets and environments; when asked to find a new object or explore a new scene, it quickly learns to fulfill the unfamiliar task through relatively few rounds of trial-based adaptation. To improve learning efficiency and accuracy, we introduce fully convolutional instance-aware semantic segmentation (FCIS) and Word2vec into our deep reinforcement learning (DRL) network to extract visual and semantic features by object class, creating a more direct and concise linkage between targets and their surroundings. Several experiments on the realistic Matterport3D dataset evaluate the model's target-driven navigation performance and generalization ability. The results demonstrate that our adaptive navigation model can navigate to text-defined targets and adapt quickly to untrained tasks, outperforming other state-of-the-art navigation approaches.
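The abstract describes two mechanisms: a MAML-style inner/outer loop that adapts a navigation policy to a new task with a few gradient steps, and an observation built from class-wise segmentation features fused with a Word2vec embedding of the target word. Below is a minimal, first-order MAML sketch in PyTorch of how such adaptation could look. Everything in it is an illustrative assumption, not the authors' implementation: the 80-class presence vector standing in for FCIS output, the 300-d Word2vec target embedding, the tiny two-layer policy, the REINFORCE surrogate, and the synthetic rollouts; the paper's actual DRL network, losses, and hyperparameters are not specified in this record.

```python
import torch
import torch.nn.functional as F

# Assumed feature sizes: an 80-class presence vector standing in for the
# segmentation branch, a 300-d Word2vec embedding of the target word, and
# 4 discrete actions. All of these are illustrative, not from the paper.
N_CLASSES, EMB_DIM, N_ACTIONS = 80, 300, 4
NUM_TASKS, INNER_STEPS, INNER_LR = 4, 3, 0.1

def init_params():
    """Two-layer policy over [segmentation features ; target embedding]."""
    def w(*shape):
        return (0.01 * torch.randn(*shape)).requires_grad_()
    return {"w1": w(N_CLASSES + EMB_DIM, 128),
            "b1": torch.zeros(128, requires_grad=True),
            "w2": w(128, N_ACTIONS),
            "b2": torch.zeros(N_ACTIONS, requires_grad=True)}

def policy_logits(params, obs):
    h = torch.tanh(obs @ params["w1"] + params["b1"])
    return h @ params["w2"] + params["b2"]

def reinforce_loss(params, obs, actions, returns):
    # REINFORCE surrogate: minimize -log pi(a|s) * return.
    logp = F.log_softmax(policy_logits(params, obs), dim=-1)
    chosen = logp.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(chosen * returns).mean()

def fake_rollout(batch=16):
    # Stand-in for real episodes in Matterport3D: random observations,
    # actions, and returns, only to keep the sketch self-contained.
    return (torch.randn(batch, N_CLASSES + EMB_DIM),
            torch.randint(0, N_ACTIONS, (batch,)),
            torch.randn(batch))

meta_params = init_params()
meta_opt = torch.optim.Adam(list(meta_params.values()), lr=1e-3)

for meta_iter in range(50):
    meta_opt.zero_grad()
    for _ in range(NUM_TASKS):              # sample a batch of navigation tasks
        fast = dict(meta_params)
        for _ in range(INNER_STEPS):        # inner loop: adapt on support rollouts
            loss = reinforce_loss(fast, *fake_rollout())
            grads = torch.autograd.grad(loss, list(fast.values()))
            fast = {k: p - INNER_LR * g
                    for (k, p), g in zip(fast.items(), grads)}
        # Outer loop: loss of the *adapted* policy on fresh rollouts from the
        # same task; backward() sends first-order gradients to meta_params.
        (reinforce_loss(fast, *fake_rollout()) / NUM_TASKS).backward()
    meta_opt.step()
```

The sketch uses the first-order MAML approximation (inner-loop gradients are treated as constants), which drops the second-order term of the original MAML formulation; it is chosen here only to keep the example short and cheap to run.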
Main Authors: | Tianfang Xue (ORCID: 0000-0002-6383-1448), Haibin Yu (ORCID: 0000-0002-1663-2956) |
---|---|
Affiliation: | Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, China |
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access, vol. 8, pp. 166742-166752 |
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2020.3023014 |
Subjects: | Mapless-visual navigation; semantic segmentation; text-driven; model-agnostic meta-learning |
Online Access: | https://ieeexplore.ieee.org/document/9189802/ |