A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database

Objective – To ascertain the coverage by discipline, publication date, publication language, and upload frequency of the scholarly articles found in Google Scholar. Design – Comparative content analyses. Setting – Electronic information resources accessible via the internet (both freely ac...

Bibliographic Details
Main Author: Virginia Wilson
Format: Article
Language: English
Published: University of Alberta 2007-03-01
Series: Evidence Based Library and Information Practice
Online Access: https://journals.library.ualberta.ca/eblip/index.php/EBLIP/article/view/150
id doaj-51e2ef65777d45fa9a738ce1a1efbd78
record_format Article
spelling doaj-51e2ef65777d45fa9a738ce1a1efbd78; 2020-11-25T01:35:14Z; eng; University of Alberta; Evidence Based Library and Information Practice; 1715-720X; 2007-03-01; 2; 1; 10.18438/B8DW26; A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database; Virginia Wilson; 0; University of Saskatchewan; https://journals.library.ualberta.ca/eblip/index.php/EBLIP/article/view/150
collection DOAJ
language English
format Article
sources DOAJ
author Virginia Wilson
spellingShingle Virginia Wilson
A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database
Evidence Based Library and Information Practice
author_facet Virginia Wilson
author_sort Virginia Wilson
title A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database
title_short A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database
title_full A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database
title_fullStr A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database
title_full_unstemmed A Content Analysis of Google Scholar: Coverage Varies by Discipline and by Database
title_sort content analysis of google scholar: coverage varies by discipline and by database
publisher University of Alberta
series Evidence Based Library and Information Practice
issn 1715-720X
publishDate 2007-03-01
description Objective – To ascertain the coverage by discipline, publication date, publication language, and upload frequency of the scholarly articles found in Google Scholar.
Design – Comparative content analyses.
Setting – Electronic information resources accessible via the internet (both freely accessible and for-fee databases).
Subjects – Forty-seven online databases and Google Scholar.
Methods – The study compared the content of 47 databases (21 Internet resources freely available to the general public; 26 restricted-access databases) covering a variety of subjects with the content of Google Scholar. Each database was assigned to one of the following discipline categories: business, education, humanities, science and medicine, social science, and multidisciplinary. From April through July 2005, researchers generated random samples of 50 article titles from each of the 47 databases and searched the titles on Google Scholar to determine inclusion. Related studies were conducted for publication date and publication language analysis, and for the Google Scholar upload frequency study. For the publication date study, random samples from one database (PsycINFO) with a high degree of variability in Google Scholar coverage were searched for 1990, 2000, and 2004. For the publication language study, Google Scholar coverage of PsycINFO articles in English was compared to coverage of PsycINFO articles published in non-English languages. For the upload frequency study, two databases chosen for their high degree of coverage (BioMed Central and PubMed) were monitored to determine how often the new content was uploaded to Google Scholar.
Main Results – This study revealed that content covered by Google Scholar varies greatly from database to database and from discipline to discipline. Of the 47 databases studied, coverage ranged from 6% to 100%. Mean and median values of coverage for all databases were both 60%. The mean discipline category scores varied from the humanities databases at 10% coverage, to the social sciences and education at 39% and 41% respectively, to science and medicine databases at 76% coverage. Mean coverage was 77% for the multidisciplinary databases. Mean coverage of open access journal databases was 95%, freely accessible databases had 84% mean coverage, and single publisher databases had 83% mean coverage. The publication language study found a bias towards English language publications. As well, a publication date bias was found – coverage of earlier dates was not as thorough as coverage of more recent publications. In the upload frequency study, for BioMed Central and PubMed there appears to be an approximately 15-week delay in the uploading of new material to Google Scholar.
Conclusions – The results of this study serve to alert researchers and information professionals that Google Scholar (in beta test mode at the time of the study) has poor coverage in certain areas. To those with access to commercial databases, this serves as a cautionary tale. To those with a dearth of commercial databases, Google Scholar is a welcome site and can provide at least some information. The researchers state that the search engine itself could make future content studies unnecessary if it decides to make its content collection methodology transparent to users. Upload frequency, Google Scholar’s linking services, the advanced search option, and the “cited by” feature could all be subjects of future studies. For its first year in operation, Google Scholar offers a broad range of discipline coverage with substantial depth in some areas. At the time of the study, Google Scholar was working with libraries and vendors to connect search results to library-licensed full text.
url https://journals.library.ualberta.ca/eblip/index.php/EBLIP/article/view/150
work_keys_str_mv AT virginiawilson acontentanalysisofgooglescholarcoveragevariesbydisciplineandbydatabase
AT virginiawilson contentanalysisofgooglescholarcoveragevariesbydisciplineandbydatabase
_version_ 1725067724132974592