Event detection in finance using hierarchical clustering algorithms on news and tweets

In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natur...

Bibliographic Details
Main Authors: Salvatore Carta, Sergio Consoli, Luca Piras, Alessandro Sebastian Podda, Diego Reforgiato Recupero
Format: Article
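Language: English
Published: PeerJ Inc. 2021-05-01
Series: PeerJ Computer Science
Subjects: Natural language processing; Event detection; News analysis; Social media; Finance; Hierarchical clustering
Online Access: https://peerj.com/articles/cs-438.pdf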
id doaj-5cdc13baf5224032bafc5b0ecb04a8d5
record_format Article
spelling doaj-5cdc13baf5224032bafc5b0ecb04a8d5
2021-05-12T15:05:05Z
eng
PeerJ Inc.
PeerJ Computer Science
2376-5992
2021-05-01
7
e438
10.7717/peerj-cs.438
Event detection in finance using hierarchical clustering algorithms on news and tweets
Salvatore Carta [0]
Sergio Consoli [1]
Luca Piras [2]
Alessandro Sebastian Podda [3]
Diego Reforgiato Recupero [4]
[0] Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
[1] European Commission, Joint Research Centre (DG-JRC), Ispra, Varese, Italy
[2] Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
[3] Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
[4] Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time domain-specific clustering-based event-detection approach that integrates textual information coming, on one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated with the clusters in order to gain an assessment of the relevance of the event in the public opinion. To identify the events of a day d we create the lexicon by looking at news articles and stock data of previous days up to d−1. Although the approach can be extended to a variety of domains (e.g. politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones’ Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor’s 500 index time-series. The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, starting from the Brexit Referendum and reaching until the recent outbreak of the Covid-19 pandemic in early 2020.
https://peerj.com/articles/cs-438.pdf
Natural language processing
Event detection
News analysis
Social media
Finance
Hierarchical clustering
collection DOAJ
language English
format Article
sources DOAJ
author Salvatore Carta
Sergio Consoli
Luca Piras
Alessandro Sebastian Podda
Diego Reforgiato Recupero
spellingShingle Salvatore Carta
Sergio Consoli
Luca Piras
Alessandro Sebastian Podda
Diego Reforgiato Recupero
Event detection in finance using hierarchical clustering algorithms on news and tweets
PeerJ Computer Science
Natural language processing
Event detection
News analysis
Social media
Finance
Hierarchical clustering
author_facet Salvatore Carta
Sergio Consoli
Luca Piras
Alessandro Sebastian Podda
Diego Reforgiato Recupero
author_sort Salvatore Carta
title Event detection in finance using hierarchical clustering algorithms on news and tweets
title_short Event detection in finance using hierarchical clustering algorithms on news and tweets
title_full Event detection in finance using hierarchical clustering algorithms on news and tweets
title_fullStr Event detection in finance using hierarchical clustering algorithms on news and tweets
title_full_unstemmed Event detection in finance using hierarchical clustering algorithms on news and tweets
title_sort event detection in finance using hierarchical clustering algorithms on news and tweets
publisher PeerJ Inc.
series PeerJ Computer Science
issn 2376-5992
publishDate 2021-05-01
description In the current age of overwhelming information and massive production of textual data on the Web, Event Detection has become an increasingly important task in various application domains. Several research branches have been developed to tackle the problem from different perspectives, including Natural Language Processing and Big Data analysis, with the goal of providing valuable resources to support decision-making in a wide variety of fields. In this paper, we propose a real-time domain-specific clustering-based event-detection approach that integrates textual information coming, on one hand, from traditional newswires and, on the other hand, from microblogging platforms. The goal of the implemented pipeline is twofold: (i) providing insights to the user about the relevant events that are reported in the press on a daily basis; (ii) alerting the user about potentially important and impactful events, referred to as hot events, for some specific tasks or domains of interest. The algorithm identifies clusters of related news stories published by globally renowned press sources, which guarantee authoritative, noise-free information about current affairs; subsequently, the content extracted from microblogs is associated with the clusters in order to gain an assessment of the relevance of the event in the public opinion. To identify the events of a day d we create the lexicon by looking at news articles and stock data of previous days up to d−1. Although the approach can be extended to a variety of domains (e.g. politics, economy, sports), we hereby present a specific implementation in the financial sector. We validated our solution through a qualitative and quantitative evaluation, performed on the Dow Jones’ Data, News and Analytics dataset, on a stream of messages extracted from the microblogging platform Stocktwits, and on the Standard & Poor’s 500 index time-series. The experiments demonstrate the effectiveness of our proposal in extracting meaningful information from real-world events and in spotting hot events in the financial sphere. An added value of the evaluation is given by the visual inspection of a selected number of significant real-world events, starting from the Brexit Referendum and reaching until the recent outbreak of the Covid-19 pandemic in early 2020.
topic Natural language processing
Event detection
News analysis
Social media
Finance
Hierarchical clustering
url https://peerj.com/articles/cs-438.pdf
work_keys_str_mv AT salvatorecarta eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT sergioconsoli eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT lucapiras eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT alessandrosebastianpodda eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
AT diegoreforgiatorecupero eventdetectioninfinanceusinghierarchicalclusteringalgorithmsonnewsandtweets
_version_ 1721443004848275456
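
The abstract above outlines a three-step pipeline: cluster the day's news, attach microblog posts to the clusters, and flag heavily discussed clusters as hot events. The following is only a minimal sketch of that general idea and not the authors' implementation. The fixed lexicon, the bag-of-words vectors, the average-linkage rule, the thresholds `min_sim` and `hot_share`, and all function names are assumptions made for illustration; the record does not specify any of them, and in the paper the lexicon is derived from news and stock data of previous days.

```python
import math
from collections import Counter


def vectorize(text, lexicon):
    """Bag-of-words counts, keeping only terms found in the lexicon."""
    return Counter(w for w in text.lower().split() if w in lexicon)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_news(vectors, min_sim=0.5):
    """Average-linkage agglomerative clustering (assumed linkage rule).

    Merging stops once no pair of clusters has mean similarity >= min_sim.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (i, j) = max(
            (linkage(clusters[i], clusters[j]), (i, j))
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if best < min_sim:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def hot_events(clusters, news_vecs, tweet_vecs, min_sim=0.3, hot_share=0.5):
    """Assign each tweet to its most similar news cluster and return the
    indices of clusters that attract at least hot_share of all tweets."""
    if not clusters or not tweet_vecs:
        return []
    centroids = []
    for c in clusters:
        total = Counter()
        for i in c:
            total.update(news_vecs[i])
        centroids.append(total)
    counts = [0] * len(clusters)
    for tv in tweet_vecs:
        sims = [cosine(tv, c) for c in centroids]
        best = max(range(len(sims)), key=sims.__getitem__)
        if sims[best] >= min_sim:
            counts[best] += 1
    return [k for k, n in enumerate(counts) if n / len(tweet_vecs) >= hot_share]


# Hypothetical toy data; the lexicon here is hand-picked, not learned.
lexicon = {"brexit", "referendum", "pound", "vote", "oil", "opec", "output", "crude"}
news = [
    "Brexit referendum vote sends pound lower",
    "Pound slides after Brexit vote",
    "OPEC agrees to cut crude oil output",
]
tweets = ["brexit vote crushing the pound", "pound down on brexit", "crude oil rallies"]

news_vecs = [vectorize(t, lexicon) for t in news]
tweet_vecs = [vectorize(t, lexicon) for t in tweets]
clusters = cluster_news(news_vecs)
hot = hot_events(clusters, news_vecs, tweet_vecs)
print(clusters, hot)
```

On this toy input the two Brexit stories end up in one cluster and the OPEC story in another, and only the Brexit cluster collects enough of the posts to be flagged. A real system would likely replace the raw counts with weighted or embedding-based vectors and tune both thresholds on held-out data.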