Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync

Providing local access to locally produced content is a primary goal of the Institutional Repository (IR). Guidelines, requirements, and workflows are among the ways in which institutions attempt to ensure this content is deposited and preserved, but some content is always missed. At Los Alamo...

Bibliographic Details
Main Authors: James Powell, Martin Klein, Herbert Van de Sompel
Format: Article
Language: English
Published: Code4Lib 2017-04-01
Series: Code4Lib Journal
Online Access: http://journal.code4lib.org/articles/12427
id doaj-f8176b9ed0db48fb89498c7d74576e89
record_format Article
spelling doaj-f8176b9ed0db48fb89498c7d74576e89
2020-11-25T03:40:28Z
eng
Code4Lib
Code4Lib Journal
1940-5758
2017-04-01
36
12427
Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
James Powell
Martin Klein
Herbert Van de Sompel
http://journal.code4lib.org/articles/12427
collection DOAJ
language English
format Article
sources DOAJ
author James Powell
Martin Klein
Herbert Van de Sompel
spellingShingle James Powell
Martin Klein
Herbert Van de Sompel
Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
Code4Lib Journal
author_facet James Powell
Martin Klein
Herbert Van de Sompel
author_sort James Powell
title Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
title_short Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
title_full Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
title_fullStr Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
title_full_unstemmed Autoload: a pipeline for expanding the holdings of an Institutional Repository enabled by ResourceSync
title_sort autoload: a pipeline for expanding the holdings of an institutional repository enabled by resourcesync
publisher Code4Lib
series Code4Lib Journal
issn 1940-5758
publishDate 2017-04-01
description Providing local access to locally produced content is a primary goal of the Institutional Repository (IR). Guidelines, requirements, and workflows are among the ways in which institutions attempt to ensure this content is deposited and preserved, but some content is always missed. At Los Alamos National Laboratory, the library implemented a service called LANL Research Online (LARO), to provide public access to a collection of publicly shareable LANL researcher publications authored between 2006 and 2016. LARO exposed the fact that we have full text for only about 10% of eligible publications for this time period, despite a review and release requirement that ought to have resulted in a much higher deposition rate. This discovery motivated a new effort to discover and add more full text content to LARO. Autoload attempts to locate and harvest items that were not deposited locally, but for which archivable copies exist. Here we describe the Autoload pipeline prototype and how it aggregates and utilizes Web services including Crossref, SHERPA/RoMEO, and oaDOI as it attempts to retrieve archivable copies of resources. Autoload employs a bootstrapping mechanism based on the ResourceSync standard, a NISO standard for data replication and synchronization. We implemented support for ResourceSync atop the LARO Solr index, which exposes metadata contained in the local IR. This allowed us to utilize ResourceSync without modifying our IR. We close with a brief discussion of other uses we envision for our ResourceSync-Solr implementation, and describe how a new effort called Signposting can replace cumbersome screen scraping with a robust autodiscovery path to content which leverages the Web protocol.
url http://journal.code4lib.org/articles/12427
work_keys_str_mv AT jamespowell autoloadapipelineforexpandingtheholdingsofaninstitutionalrepositoryenabledbyresourcesync
AT martinklein autoloadapipelineforexpandingtheholdingsofaninstitutionalrepositoryenabledbyresourcesync
AT herbertvandesompel autoloadapipelineforexpandingtheholdingsofaninstitutionalrepositoryenabledbyresourcesync
_version_ 1724534648603672576