FAIR Metadata for Specialised Corpora - How can CLARIN support the communities?

egon · 7 October 2025 11:59

Returning to our presentation at the Annual Conference 2025: “Towards FAIR Metadata for Specialised Corpora: A Community-Informed Empirical Study of Schema Development in Two Communities” (slides), I wanted to link relevant Topics (from this forum):

but also to emphasise that the usability of the annotated data remains unclear. For community-specific metadata, with its domain-specific complexity:

should it be integrated into existing infrastructures like the VLO?
(how) can it be integrated into/connected to the Resource Families?
how can it be made accessible (displayed/searched/browsed/…) to end users without reducing it to a common denominator?

and since this will (potentially) become relevant for a few more communities (~= K-centres, and likely C-centres that would host the metadata - or B-centres), the question still remains:

How can CLARIN (ERIC / B,C,K-centres) support each other and the community efforts [for community-specific metadata]?

matthies · 24 October 2025 07:35

Dear Egon, to my knowledge there is indeed “only” CMDI as the commonly agreed metastandard, but profiles (essentially XML schemas) vary. In Finland we use profiles derived from the META-SHARE schema in COMEDI. COMEDI currently exports them as clarin.eu:cr1:p_1361876010571. So for CLARIN you should use CMDI. But the world is bigger than CLARIN.

I understood your talk as touching on the question: if we have metadata at different levels of granularity, how do we make sure that we are dealing with variants of the same thing? Example (PID-1 is a placeholder for a Handle used in CLARIN):

Very detailed metadata of dataset with “PID-1”: Contains names of subjects, etc. Sensitive.
Pseudonymized version of the dataset with “PID-1” above. Less sensitive.
CMDI of dataset with PID-1: Describes the dataset, where and when and how it was created. Public, shown in VLO.
EOSC-compatible HTML metadata of PID-1 on the VLO landing page. Subset of the CMDI. Increases the FAIR score reported by FAIR tools.
Subset of dataset with PID-1 exported to national service, like etsin.fairdata.fi. (The Language Bank data is available there for search)
Reference to the dataset with PID-1 in an article. Has author, year, name, repository and PID-1 (a very small subset of the metadata).

My suggestion would be the following: PID-1 points to the CMDI descriptive metadata at the repository’s metadata service (COMEDI in our case). This is the “master metadata”; subsets and supersets must be in sync with the data provided there. So if the superset of very detailed metadata mentions the name of the dataset and it is not identical to the one in the CMDI, the CMDI is the authoritative source.

Supersets should therefore not copy too much of the authoritative metadata, since it can always be found behind the PID.

The same holds true for subsets, like reference instructions. All subsets and supersets need to contain the dataset PID (“PID-1”) as a clear link between them. The CMDI metadata points to the data (via resource proxy). These pointers are also authoritative.
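
To make the principle concrete, here is a minimal sketch in Python. It assumes that each metadata record has been flattened to a plain dictionary; the field names (`pid`, `name`, `year`, `subjects`), the example values and the function `check_against_master` are hypothetical and not part of CMDI or COMEDI.

```python
from typing import Dict, List

Record = Dict[str, str]


def check_against_master(master: Record, derived: Record) -> List[str]:
    """Return a list of problems found in a subset or superset record.

    The master record (standing in for the CMDI behind the PID) is treated
    as authoritative for every field that both records share.
    """
    problems: List[str] = []

    # Every derived record must carry the dataset PID as the link to the master.
    if derived.get("pid") != master.get("pid"):
        problems.append("missing or different PID")
        return problems

    # Fields present in both records must agree; the master value wins.
    for field in sorted(master.keys() & derived.keys()):
        if derived[field] != master[field]:
            problems.append(
                f"{field}: {derived[field]!r} differs from master {master[field]!r}"
            )
    return problems


# Hypothetical example values, for illustration only.
master = {"pid": "PID-1", "name": "Example Corpus", "year": "2025"}
citation = {"pid": "PID-1", "name": "Example Corpus", "year": "2025"}
detailed = {"pid": "PID-1", "name": "Example corpus (draft)", "subjects": "..."}

print(check_against_master(master, citation))
print(check_against_master(master, detailed))
```

Fields that exist only in the superset (here `subjects`) are not checked, since the master has nothing to say about them. Only the shared fields can drift out of sync, which is one reason to copy as few of them as possible.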

If we can agree on this principle, we can think about how to implement it. Descriptive metadata does not change very often, but it does change, for example when an error made at creation time is detected later. So mechanisms should be in place to deal with such changes.
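
One possible mechanism, given only as a sketch under assumptions: each derived record stores a fingerprint of the master metadata it was produced from, and a mismatch later signals that the copy needs to be refreshed. The flat dictionary representation and the names `fingerprint` and `is_stale` are hypothetical; a real implementation would have to decide how to canonicalise the CMDI XML itself.

```python
import hashlib
import json
from typing import Dict


def fingerprint(record: Dict[str, str]) -> str:
    """Return a stable SHA-256 digest of a flat metadata record."""
    # Sorting the keys makes the digest independent of field order.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_stale(stored_fingerprint: str, current_master: Dict[str, str]) -> bool:
    """Return True if the master changed after the derived copy was made."""
    return stored_fingerprint != fingerprint(current_master)


# Hypothetical example values, for illustration only.
master_v1 = {"pid": "PID-1", "name": "Example Corpus", "year": "2024"}
seen = fingerprint(master_v1)  # stored alongside the derived record

# Later, an incorrect year is corrected in the master metadata.
master_v2 = dict(master_v1, year="2025")

print(is_stale(seen, master_v1))  # False
print(is_stale(seen, master_v2))  # True
```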

What do you think?

Topic	Replies	Views
If there is no single metadata scheme, how should I describe my resources in order for them to be compatible with the CLARIN infrastructure? Metadata metadata	31	12 September 2024
What metadata scheme is used within CLARIN? Metadata metadata	12	12 September 2024
If I deliver CLARIN metadata today, can you make my metadata available to the broad public? Metadata metadata , vlo , harvesting	30	13 September 2024
I have a fedora repository. How can I offer my CMDI files over OAI-PMH? Metadata metadata , harvesting	48	12 September 2024
What versions of CMDI are there and which should I use? FAQ metadata	14	10 September 2024
