Research on the Development and Application of the GDELT Event Database

Bibliographic Details
Published in: Data
Main Authors: Dengxi Hong, Zexin Fu, Xin Zhang, Yan Pan
Format: Article
Language: English
Published: MDPI AG 2025-10-01
Online Access: https://www.mdpi.com/2306-5729/10/10/158
Description
Summary: This study investigates the development and application of the GDELT (Global Database of Events, Language, and Tone) news database. Through experiments, we conducted a quantitative statistical analysis of the GDELT event database to evaluate its practical characteristics. The results indicate that although the database covers all countries and regions and includes most major global media outlets, the accuracy of its key fields is only approximately 55%, and data redundancy is as high as 20%. Thus, while GDELT data shows good coverage and data integrity, data correction and deduplication are recommended before its use in research and industrial applications. A survey of the existing literature then shows that current studies using GDELT primarily focus on event-related metrics, such as event quantity, tone, and GoldsteinScale, for international relations analysis, crisis event prediction, policy effectiveness testing, and public opinion impact analysis. Nevertheless, news is a fundamental channel of information dissemination in media networks, and the propagation of news events through these networks is a critical area of study for information recommendation, public opinion guidance, and crisis intervention. Existing research has employed the Event, GKG, and Mentions tables to construct cross-national news flow network models. However, the informational correlations across fields in different tables have not been fully leveraged in preliminary data selection, leading to substantial computational overhead. To advance research in this field, this study employs chained list queries on the Event and Mentions tables within GDELT. Using social network analysis, we construct a media co-occurrence network of event reports and use it to identify core hubs and associative relationships within the event dissemination network.
ISSN:2306-5729
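
Illustrative sketch (not from the article): the summary describes building a media co-occurrence network from event reports, after deduplication. The snippet below shows one plausible way this could look in plain Python. It assumes the input is a list of (GLOBALEVENTID, MentionSourceName) pairs taken from the Mentions table. The sample rows, the outlet names, the function names, and the use of weighted degree as a stand-in for "core hub" detection are all hypothetical; the article's actual queries and centrality measures are not given in this record.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical toy rows: (GLOBALEVENTID, MentionSourceName) pairs.
# Outlet names are placeholders, not real GDELT data.
mentions = [
    (1, "outlet-a.example"),
    (1, "outlet-b.example"),
    (1, "outlet-b.example"),  # redundant mention, removed by the set below
    (1, "outlet-c.example"),
    (2, "outlet-a.example"),
    (2, "outlet-b.example"),
    (3, "outlet-c.example"),
]


def build_cooccurrence(rows):
    """Return a Counter mapping (source_a, source_b) to shared-event count."""
    sources_by_event = defaultdict(set)
    for event_id, source in rows:
        # A set keeps each outlet once per event (simple deduplication).
        sources_by_event[event_id].add(source)

    edges = Counter()
    for sources in sources_by_event.values():
        # Two outlets co-occur when they report the same event.
        for a, b in combinations(sorted(sources), 2):
            edges[(a, b)] += 1
    return edges


def weighted_degree(edges):
    """Sum edge weights per outlet; high values suggest hub-like outlets."""
    degree = Counter()
    for (a, b), weight in edges.items():
        degree[a] += weight
        degree[b] += weight
    return degree


edges = build_cooccurrence(mentions)
degree = weighted_degree(edges)
```

Here `edges` holds the undirected, weighted co-occurrence edges and `degree.most_common()` ranks outlets by how often they share events with others. A real analysis would likely load the rows from a GDELT export and use a graph library for richer centrality measures.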