We would like to share an interesting achievement. CLARIN’s Virtual Language Observatory (VLO) now harvests the metadata records of datasets in Radboud University’s Data Repository (RDR, https://data.ru.nl/) that are published in the domain of Humanities. This was done using DataCite as a hinge. Here is a more detailed description of how this was done technically (thanks to Twan Goosen!):
- A query was defined that filters the RDR records out of the total set of RU records aggregated by DataCite.org.
- This query is used to retrieve records by means of the OAI-PMH protocol from DataCite’s dedicated endpoint according to a fixed schedule (once per week). We can use the query mentioned above thanks to the fact that DataCite offers dynamic sets, as a transparent extension of the OAI-PMH specifications.
- Directly after retrieving the records, they are converted from the DataCite metadata format to CMDI using a stylesheet.
- Once these and other metadata sets (from other endpoints) have been retrieved, the VLO (re)processes all of its metadata records and uses these to build the database containing the content that is visible and searchable in the VLO. Part of this process is mapping information from the properties within the original metadata onto the predefined fields and facets, leveraging the semantic interoperability features of CMDI.
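For readers who want a feel for the harvesting step, here is a minimal sketch in Python of how an OAI-PMH `ListRecords` request against a dynamic set could be built and how one page of results could be read. It is not the VLO's actual harvester. The endpoint URL, the `oai_datacite` metadata prefix, the set syntax (a client ID, a tilde, and a base64-encoded query string), the client ID `EXAMPLE.CLIENT` and the query itself are all assumptions for illustration; please check DataCite's documentation for the exact form.

```python
import base64
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: DataCite's public OAI-PMH endpoint.
BASE_URL = "https://oai.datacite.org/oai"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def dynamic_set(client_id, query):
    """Build a dynamic set spec (assumed form: CLIENT~base64(query))."""
    encoded = base64.b64encode(query.encode("utf-8")).decode("ascii")
    return f"{client_id}~{encoded}"


def list_records_url(set_spec, metadata_prefix="oai_datacite", resumption_token=None):
    """Build a ListRecords URL; a resumption token replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {
            "verb": "ListRecords",
            "metadataPrefix": metadata_prefix,
            "set": set_spec,
        }
    return f"{BASE_URL}?{urlencode(params)}"


def parse_page(xml_text):
    """Return the record identifiers on one page and the next token, if any."""
    root = ET.fromstring(xml_text)
    ids = [
        e.text
        for e in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = token.text if token is not None and token.text else None
    return ids, next_token


# Hypothetical client ID and query, for illustration only.
set_spec = dynamic_set("EXAMPLE.CLIENT", "query=humanities")
print(list_records_url(set_spec))
```

A weekly job would fetch the first URL, call `parse_page` on the response, and keep requesting pages with the returned resumption token until none is left. The conversion to CMDI would then run on each retrieved record.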
If your university also has a data repository with metadata records in DataCite, and you would like to increase the visibility of these datasets in CLARIN’s VLO, then we encourage you to contact vlo@clarin.eu.
Henk van den Heuvel
Datasteward Faculty of Arts, Radboud University, Nijmegen, the Netherlands
Member of CLARIN’s Board of Directors