University repository metadata harvested in VLO: Radboud University PoC

We would like to share an interesting achievement. CLARIN’s Virtual Language Observatory (VLO) now harvests the metadata records of datasets in Radboud University’s Data Repository (RDR, https://data.ru.nl/) that are published in the Humanities domain. This was done using DataCite as a hinge. Here is a more detailed description of how this was done technically (thanks to Twan Goosen!):

  1. A query was defined that selects the RDR records from the total set
    of RU records aggregated by DataCite.org
  2. This query is used to retrieve the records via the OAI-PMH protocol
    from DataCite’s dedicated endpoint on a fixed schedule (once per
    week). We can use the query mentioned above because DataCite offers
    dynamic sets, as a transparent extension of the OAI-PMH
    specification
  3. Directly after retrieval, the records are converted from the
    DataCite metadata format to CMDI using a stylesheet
  4. Once these and other metadata sets (from other endpoints) have been
    retrieved, the VLO (re)processes all of its metadata records and uses
    them to build the database holding the content that is visible and
    searchable in the VLO. Part of this process is mapping information
    from the properties within the original metadata onto the predefined
    fields and facets, leveraging the semantic interoperability features of
    CMDI.
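
As a rough illustration of step 2, the sketch below shows how a harvester might build an OAI-PMH ListRecords request and read one page of the response. It is not the VLO harvester's actual code: the endpoint URL, the metadata prefix `oai_datacite`, the set value and the sample response are placeholders chosen for illustration. Only the `ListRecords` verb, the `metadataPrefix`, `set` and `resumptionToken` arguments, and the XML namespace come from the OAI-PMH specification.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def build_list_records_url(base_url, metadata_prefix=None, set_spec=None,
                           resumption_token=None):
    """Build a ListRecords request URL.

    A resumptionToken is an exclusive argument in OAI-PMH: when it is
    given, metadataPrefix and set must not be sent again.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            # Treated as an opaque string; the real value that encodes the
            # query from step 1 is not given in this post.
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None) for one page."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    # An empty resumptionToken element marks the last page.
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical, heavily shortened response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    # Placeholder endpoint and set value, not the ones used by the VLO.
    print(build_list_records_url("https://example.org/oai",
                                 "oai_datacite", "EXAMPLE.SET"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A real harvester would fetch each URL, repeat the request with the returned resumption token until none is left, and then hand every record to the DataCite-to-CMDI stylesheet described in step 3.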

If your university also has a data repository with metadata records in DataCite, and you would like to increase the visibility of these datasets in CLARIN’s VLO, then we encourage you to contact vlo@clarin.eu.

Henk van den Heuvel

Data steward, Faculty of Arts, Radboud University, Nijmegen, the Netherlands

Member of CLARIN’s Board of Directors