Robust clustering of languages across Wikipedia growth

Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 sma...


Bibliographic Details
Main Authors: Kristina Ban, Matjaž Perc, Zoran Levnajić
Format: Article
Language: English
Published: The Royal Society 2017-01-01
Series: Royal Society Open Science
Subjects: wikipedia; language; growth dynamics; data analysis; clustering
Online Access: https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.171217
id doaj-17ca5fc5ccda4626921d5ec98d587fed
record_format Article
spelling doaj-17ca5fc5ccda4626921d5ec98d587fed; 2020-11-25T04:00:47Z; eng; The Royal Society; Royal Society Open Science; 2054-5703; 2017-01-01; 4; 10; 10.1098/rsos.171217; 171217; Robust clustering of languages across Wikipedia growth; Kristina Ban; Matjaž Perc; Zoran Levnajić; https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.171217; wikipedia; language; growth dynamics; data analysis; clustering
collection DOAJ
language English
format Article
sources DOAJ
author Kristina Ban
Matjaž Perc
Zoran Levnajić
title Robust clustering of languages across Wikipedia growth
publisher The Royal Society
series Royal Society Open Science
issn 2054-5703
publishDate 2017-01-01
description Wikipedia is the largest existing knowledge repository that is growing on a genuine crowdsourcing support. While the English Wikipedia is the most extensive and the most researched one with over 5 million articles, comparatively little is known about the behaviour and growth of the remaining 283 smaller Wikipedias, the smallest of which, Afar, has only one article. Here, we use a subset of these data, consisting of 14 962 different articles, each of which exists in 26 different languages, from Arabic to Ukrainian. We study the growth of Wikipedias in these languages over a time span of 15 years. We show that, while an average article follows a random path from one language to another, there exist six well-defined clusters of Wikipedias that share common growth patterns. The make-up of these clusters is remarkably robust against the method used for their determination, as we verify via four different clustering methods. Interestingly, the identified Wikipedia clusters have little correlation with language families and groups. Rather, the growth of Wikipedia across different languages is governed by different factors, ranging from similarities in culture to information literacy.
topic wikipedia
language
growth dynamics
data analysis
clustering
url https://royalsocietypublishing.org/doi/pdf/10.1098/rsos.171217