Robot Concept Acquisition Based on Interaction Between Probabilistic and Deep Generative Models

We propose a method for multimodal concept formation. In this method, unsupervised multimodal clustering and cross-modal inference, as well as unsupervised representation learning, are performed by integrating multimodal latent Dirichlet allocation (MLDA)-based concept formation with variational autoencoder (VAE)-based feature extraction. Multimodal clustering, representation learning, and cross-modal inference are critical for robots to form multimodal concepts from sensory data. Various models have been proposed for concept formation. However, in previous studies, features were extracted using manually designed or pre-trained feature extractors, and representation learning was not performed simultaneously. Moreover, such models could predict the generative probabilities of the features extracted from the sensory data, but not the sensory data themselves, in cross-modal inference. Therefore, a method that can perform clustering, feature learning, and cross-modal inference over multimodal sensory data is required for concept formation. To realize such a method, we extend the VAE to the multinomial VAE (MNVAE), whose latent variables follow a multinomial distribution, and construct a model that integrates the MNVAE and MLDA. In our experiments, the multimodal information in the images and words acquired by a robot was classified using the integrated model. The results demonstrate that the integrated model classifies multimodal information as accurately as the previous model even though its feature extractor is learned in an unsupervised manner, that it learns image features suitable for clustering, and that cross-modal inference from words to images is possible.
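The MNVAE described in the abstract replaces the Gaussian latent of a standard VAE with a discrete, multinomial-distributed code, so that the learned features can be handed to MLDA in the bag-of-words-like form a topic model expects. The record does not include the authors' implementation, so the following is only a minimal sketch of one common way to train a VAE with a discrete latent, the Gumbel-softmax relaxation; the layer sizes, the number of categories k, and the 784-dimensional binary input are illustrative assumptions, not the authors' architecture.

```python
# Sketch of a VAE with a discrete latent code via the Gumbel-softmax
# relaxation. This is NOT the authors' MNVAE implementation; all
# dimensions and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DiscreteLatentVAE(nn.Module):
    def __init__(self, x_dim=784, hidden=256, k=50):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(x_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, k),        # unnormalized category logits
        )
        self.decoder = nn.Sequential(
            nn.Linear(k, hidden), nn.ReLU(),
            nn.Linear(hidden, x_dim),    # logits of Bernoulli pixels
        )

    def forward(self, x, tau=0.5):
        logits = self.encoder(x)
        # Differentiable sample from the relaxed categorical posterior;
        # at low temperature tau it approaches a one-hot draw.
        z = F.gumbel_softmax(logits, tau=tau, hard=False)
        return self.decoder(z), logits

def elbo_loss(x, x_hat_logits, logits):
    """Reconstruction term plus KL from q(z|x) to a uniform categorical prior."""
    recon = F.binary_cross_entropy_with_logits(x_hat_logits, x, reduction="sum")
    q = F.softmax(logits, dim=-1)
    log_q = F.log_softmax(logits, dim=-1)
    # KL(q || Uniform(K)) = sum_k q_k (log q_k + log K)
    kl = (q * (log_q + torch.log(torch.tensor(float(logits.size(-1)))))).sum()
    return recon + kl
```

Accumulating near-one-hot draws of z yields multinomial-style counts that a topic model such as MLDA can treat as an observed modality, and running the decoder on a latent inferred by MLDA from words corresponds, in spirit, to the word-to-image direction of cross-modal inference mentioned in the abstract.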


Bibliographic Details
Main Authors: Ryo Kuniyasu, Tomoaki Nakamura, Tadahiro Taniguchi, Takayuki Nagai
Affiliations: Ryo Kuniyasu and Tomoaki Nakamura (Department of Mechanical Engineering and Intelligent Systems, The University of Electro-Communications, Tokyo, Japan); Tadahiro Taniguchi (College of Information Science and Engineering, Ritsumeikan University, Shiga, Japan); Takayuki Nagai (Department of Systems Innovation, Graduate School of Engineering Science, Osaka University, Osaka, Japan; Artificial Intelligence EXploration Research Center, The University of Electro-Communications, Tokyo, Japan)
Format: Article
Language: English
Published: Frontiers Media S.A., 2021-09-01
Series: Frontiers in Computer Science
ISSN: 2624-9898
DOI: 10.3389/fcomp.2021.618069
Subjects: concept formation; symbol emergence in robotics; probabilistic generative model; deep generative model; unsupervised learning; representation learning
Online Access: https://www.frontiersin.org/articles/10.3389/fcomp.2021.618069/full