A Reproducible IT-Blog Corpus

The dataset comprises text and metadata extracted from several hundred IT blogs and websites, along with a method to duplicate the data by updating its contents and downloading it to the user’s local machine. The targets have been hand-picked with the intention of representing the discourse on blogs an...

Bibliographic Details
Main Authors: Adrien Barbaresi, Jens Pohlmann
Format: Article
Language: English
Published: Ubiquity Press 2021-07-01
Series: Journal of Open Humanities Data
Online Access: https://openhumanitiesdata.metajnl.com/articles/35