Toward a Needs-Based Architecture for ‘Intelligent’ Communicative Agents: Speaking with Intention

The past few years have seen considerable progress in the deployment of voice-enabled personal assistants, first on smartphones (such as Apple’s Siri) and most recently as standalone devices in people’s homes (such as Amazon’s Alexa). Such ‘intelligent’ communicative agents are distinguished from the previous generation of speech-based systems in that they claim to offer access to services and information via conversational interaction (rather than simple voice commands). In reality, conversations with such agents have limited depth and, after initial enthusiasm, users typically revert to more traditional ways of getting things done. It is argued here that one source of the problem is that the standard architecture for a contemporary spoken language interface fails to capture the fundamental teleological properties of human spoken language. As a consequence, users have difficulty engaging with such systems, primarily due to a gross mismatch in intentional priors. This paper presents an alternative needs-driven cognitive architecture which models speech-based interaction as an emergent property of coupled hierarchical feedback-control processes in which a speaker has in mind the needs of a listener and a listener has in mind the intentions of a speaker. The implications of this architecture for future spoken language systems are illustrated using results from a new type of ‘intentional speech synthesiser’ that is capable of optimising its pronunciation in unpredictable acoustic environments as a function of its perceived communicative success. It is concluded that such purposeful behavior is essential to the facilitation of meaningful and productive spoken language interaction between human beings and autonomous social agents (such as robots). However, it is also noted that persistent mismatched priors may ultimately impose a fundamental limit on the effectiveness of speech-based human–robot interaction.
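
The abstract characterises speech-based interaction as coupled hierarchical feedback-control processes in which a speaker adapts its output according to its perceived communicative success. As a rough illustration of that control idea only (not the authors’ implementation), the following minimal Python sketch closes a loop between a toy intelligibility estimate and an articulation-effort parameter; the function names, the logistic intelligibility proxy, and the proportional gain are all assumptions introduced for this example.

```python
import math
import random


def estimate_intelligibility(effort: float, noise_level: float) -> float:
    """Toy proxy for perceived communicative success: intelligibility rises
    with articulation effort and falls with ambient noise (both in [0, 1])."""
    return 1.0 / (1.0 + math.exp(-8.0 * (effort - noise_level)))


def intentional_synthesis_loop(target: float = 0.8, steps: int = 20) -> None:
    """Closed-loop control of speaking effort: the simulated speaker tries to
    keep the listener's estimated success near a target despite changing noise."""
    effort = 0.3  # initial, relaxed articulation effort
    gain = 0.5    # proportional feedback gain (an assumption of this sketch)
    for step in range(steps):
        noise = random.uniform(0.0, 0.9)                    # unpredictable acoustic environment
        success = estimate_intelligibility(effort, noise)   # perceived communicative success
        error = target - success                            # shortfall relative to the goal
        effort = min(1.0, max(0.0, effort + gain * error))  # adapt pronunciation effort
        print(f"step {step:2d}  noise={noise:.2f}  success={success:.2f}  effort={effort:.2f}")


if __name__ == "__main__":
    intentional_synthesis_loop()
```

Running the sketch shows effort rising in noisier steps and relaxing in quieter ones, which is the qualitative, goal-directed behaviour the abstract attributes to the ‘intentional speech synthesiser’.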

Bibliographic Details
Main Authors: Roger K. Moore, Mauro Nicolao (Speech and Hearing Research Group, Department of Computer Science, University of Sheffield, Sheffield, United Kingdom)
Format: Article
Language: English
Published: Frontiers Media S.A., 2017-12-01
Series: Frontiers in Robotics and AI
ISSN: 2296-9144
DOI: 10.3389/frobt.2017.00066
Source: DOAJ (Directory of Open Access Journals)
Subjects: communicative agents; spoken language processing; hierarchical control; intentional speech synthesis; autonomous social agents; mismatched priors
Online Access: http://journal.frontiersin.org/article/10.3389/frobt.2017.00066/full