Machine Learning for Dissimulating Reality

In the last decade, advances in statistical modeling and computer science have boosted the production of machine-generated content in many fields: from language to image generation, the quality of the generated output is remarkably high, sometimes higher than that of human-produced work. M...


Bibliographic Details
Main Author: Andrea Giussani
Format: Article
Language: English
Published: MDPI AG 2021-04-01
Series: Proceedings
Subjects: machine learning; natural language processing; supervised learning; classification task
Online Access: https://www.mdpi.com/2504-3900/77/1/17
spelling doaj-0e4626c0783146199a2aae94a8714a4d | 2021-04-27T23:01:10Z | eng | MDPI AG | Proceedings | 2504-3900 | 2021-04-01 | 77 | 1 | 17 | 17 | 10.3390/proceedings2021077017 | Machine Learning for Dissimulating Reality | Andrea Giussani | Department of Decision Sciences and Bocconi Institute for Data Science and Analytics, Bocconi University, 20136 Milan, Italy | https://www.mdpi.com/2504-3900/77/1/17 | machine learning | natural language processing | supervised learning | classification task
collection DOAJ
language English
format Article
sources DOAJ
author Andrea Giussani
title Machine Learning for Dissimulating Reality
publisher MDPI AG
series Proceedings
issn 2504-3900
publishDate 2021-04-01
description In the last decade, advances in statistical modeling and computer science have boosted the production of machine-generated content in many fields: from language to image generation, the quality of the generated output is remarkably high, sometimes higher than that of human-produced work. Modern systems such as OpenAI’s GPT-2 (and, more recently, GPT-3) allow automated systems to alter reality dramatically with synthetic outputs, to the point that humans cannot distinguish real content from its synthetic counterpart. One example is an article written entirely by GPT-2, but many others exist. In the field of computer vision, Nvidia’s Generative Adversarial Network, commonly known as StyleGAN (Karras et al. 2018), has become the de facto reference point for producing large numbers of fake human face portraits; additionally, recent algorithms have been developed to create both musical scores and mathematical formulas. This presentation aims to introduce participants to state-of-the-art results in this field: we will cover both GANs and language modeling, with recent applications. The novelty here is that we apply a transformer-based machine learning technique, namely RoBERTa (Liu et al. 2019), to distinguishing human-produced from machine-produced text in the context of fake news detection. RoBERTa is a recent model based on the well-known Bidirectional Encoder Representations from Transformers model, known as BERT (Devlin et al. 2018); this is a bidirectional transformer for natural language processing, developed by Google and pre-trained on a large amount of unlabeled textual data to learn embeddings. We will then use these representations as the input to our classifier to detect real vs. machine-produced text. The application is demonstrated in the presentation.
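The final step described above, passing fixed text representations to a classifier, can be sketched roughly as follows. This is a minimal illustration and not the author's implementation: `embed` is a hypothetical stand-in (hashed word counts) for the RoBERTa embeddings, logistic regression is an assumption because the abstract does not name the classifier, and the example sentences and their labels are toy placeholders rather than real data.

```python
import math
import zlib

DIM = 64  # size of the stand-in embedding (assumption, chosen for the demo)


def embed(text, dim=DIM):
    """Hypothetical stand-in for a RoBERTa sentence embedding.

    Hashes each lowercase token into one of `dim` buckets and L2-normalises
    the counts. A real pipeline would use the transformer's output instead.
    """
    vec = [0.0] * dim
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(x, weights, bias):
    """Probability that feature vector x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def log_loss(features, labels, weights, bias):
    """Mean binary cross-entropy, clamped to avoid log(0)."""
    total = 0.0
    for x, y in zip(features, labels):
        p = min(max(score(x, weights, bias), 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(features)


def train_logreg(features, labels, epochs=300, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    dim, n = len(features[0]), len(features)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = score(x, weights, bias) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Toy placeholder strings; label 1 = "machine-produced", 0 = "human-produced".
texts = [
    "the council voted on the budget after a long debate",
    "my train was late again so i missed the meeting",
    "she wrote that the harvest was poor this year",
    "scientists discovered a herd of unicorns living in a remote valley",
    "the unicorns spoke perfect english according to the researchers",
    "officials confirmed the valley was previously unexplored by humans",
]
labels = [0, 0, 0, 1, 1, 1]

features = [embed(t) for t in texts]
initial_loss = log_loss(features, labels, [0.0] * DIM, 0.0)
weights, bias = train_logreg(features, labels)
final_loss = log_loss(features, labels, weights, bias)
probs = [score(x, weights, bias) for x in features]

print(f"loss before training: {initial_loss:.3f}, after: {final_loss:.3f}")
for t, p in zip(texts, probs):
    print(f"P(machine) = {p:.2f}  {t}")
```

Keeping the representation step (`embed`) separate from the classifier means the stand-in features could be swapped for real transformer embeddings without changing the training code.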
topic machine learning
natural language processing
supervised learning
classification task
url https://www.mdpi.com/2504-3900/77/1/17