Prediction of Public Trust in Politicians Using a Multimodal Fusion Approach


Bibliographic Details
Main Authors: Muhammad Shehram Shah Syed, Elena Pirogova, Margaret Lech
Format: Article
Language: English
Published: MDPI AG, 2021-05-01
Series: Electronics
Subjects:
Online Access: https://www.mdpi.com/2079-9292/10/11/1259
id doaj-807762e705d749b09324638b26b8307f
record_format Article
doi 10.3390/electronics10111259
affiliation School of Engineering, RMIT University, Melbourne, VIC 3000, Australia (all three authors)
collection DOAJ
language English
format Article
sources DOAJ
author Muhammad Shehram Shah Syed
Elena Pirogova
Margaret Lech
title Prediction of Public Trust in Politicians Using a Multimodal Fusion Approach
publisher MDPI AG
series Electronics
issn 2079-9292
publishDate 2021-05-01
description This paper explores the automatic prediction of public trust in politicians using speech, text, and visual modalities. It evaluates the effectiveness of each modality individually and investigates fusion approaches for integrating information from the modalities in a multimodal setting. A database was created consisting of speech recordings, Twitter messages, and images representing fifteen American politicians, labeled according to a publicly available trust ranking. The data were divided into three trust categories: low, mid, and high. First, unimodal prediction was performed with each of the three modalities individually; the outputs of these unimodal predictions were then used for multimodal prediction. Unimodal prediction was performed by training three independent logistic regression (LR) classifiers, one each for speech, text, and images. The prediction vectors from the individual modalities were then concatenated and used to train a multimodal decision-making LR classifier. The best performing single modality was speech, with a classification accuracy of 92.81%, followed by images at 77.96% and text at 72.26%. With the multimodal approach, the highest classification accuracy of 97.53% was obtained when all three modalities were used. Among the bimodal setups, the best performing combination was speech and images at 95.07%, followed by speech and text at 94.40%, whereas the text and image combination reached 83.20%.
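The decision-level fusion pipeline described above (per-modality LR classifiers whose prediction vectors are concatenated and fed to a fusion LR classifier) can be sketched as follows. This is a minimal illustration using scikit-learn with synthetic stand-in features and arbitrary feature dimensions, not the authors' actual features or hyperparameters:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, n_classes = 300, 3  # three trust categories: low, mid, high

# Synthetic stand-in features for the three modalities (the paper used
# speech acoustics, tweet text, and image features; dimensions here are made up).
X_speech = rng.normal(size=(n, 20))
X_text = rng.normal(size=(n, 50))
X_image = rng.normal(size=(n, 30))
y = rng.integers(0, n_classes, size=n)

# Step 1: train one independent LR classifier per modality.
modalities = (X_speech, X_text, X_image)
unimodal = [LogisticRegression(max_iter=1000).fit(X, y) for X in modalities]

# Step 2: concatenate the per-modality prediction (class-probability) vectors,
# giving 3 modalities x 3 classes = 9 fusion features per sample.
fusion_features = np.hstack(
    [clf.predict_proba(X) for clf, X in zip(unimodal, modalities)]
)

# Step 3: train the decision-level (fusion) LR classifier on the
# concatenated prediction vectors.
fusion_clf = LogisticRegression(max_iter=1000).fit(fusion_features, y)
pred = fusion_clf.predict(fusion_features)
print(fusion_features.shape)  # (300, 9)
```

In practice the fusion classifier would be trained on prediction vectors produced for held-out data (e.g. via cross-validation) rather than on the unimodal classifiers' training outputs, to avoid overfitting the fusion stage.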
topic trust classification
social signal processing
speech acoustics
computer vision
multimodal fusion
url https://www.mdpi.com/2079-9292/10/11/1259