Can I get VLO visitor statistics for records that originate from my centre or national consortium?

Representatives of centres may want to understand and quantify the exposure that the VLO gives to their resources. CLARIN uses Matomo to collect visitor statistics for its web applications, including the VLO. These statistics offer limited insight into the quantity and quality of visits to the applications in general, and optionally also into predefined subsets of visits based on visitor or content characteristics.

Available statistics

The Segmentation functionality in Matomo can be used to obtain statistics within a predefined scope of visits. The URL of the record page of the VLO can be used to restrict the considered visits to those in which records from one or more specific metadata providers were viewed. Reports based on such segments can be generated automatically at set intervals, and the Matomo web interface can be used to inspect visitor characteristics, behaviours and acquisition data within any specific time range.
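
As a rough illustration of what such a URL-based segment might look like: Matomo's segment syntax has a pageUrl dimension, a "contains" operator written =@, and uses a comma to combine conditions with OR. The sketch below assembles such an expression for several record URL fragments. The helper name and the fragments are made up for illustration. The actual segments are configured by CLARIN on request, and any URL encoding needed when passing the expression to Matomo is left out here.

```python
def segment_expression(url_fragments):
    """Build a Matomo segment matching visits that viewed any of the given URL fragments.

    Hypothetical helper for illustration; not part of Matomo or the VLO.
    """
    if not url_fragments:
        raise ValueError("at least one URL fragment is required")
    for fragment in url_fragments:
        # "," (OR) and ";" (AND) are segment operators, so this sketch simply rejects them.
        if "," in fragment or ";" in fragment:
            raise ValueError(f"fragment contains a segment operator: {fragment!r}")
    # "=@" is Matomo's "contains" operator; "," joins the conditions with OR.
    return ",".join(f"pageUrl=@{fragment}" for fragment in url_fragments)


# Made-up fragments standing in for the record ID prefixes of two providers.
expression = segment_expression(
    ["/record/example_centre_a_", "/record/example_centre_b_"]
)
print(expression)
```

A visit falls inside this segment as soon as one of its pageviews has a URL containing either fragment.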

Some data and metrics that can be viewed or included in scheduled reports for the defined segment are:

  • Number of visits that include a visit to one or more matching records

  • Number of pageviews within those visits (including other pages)

  • Geographical characteristics of visitors

  • Technical characteristics of visitors (device, operating system, browser etc)

  • Referrers

  • Search terms

  • Page transitions

  • Some predefined actions such as facet interaction

Limitations / caveats

It is important to note that the information that can be derived from these statistics is limited by several factors. Most importantly:

  • Some visits may go untracked (as per CLARIN’s policy to respect so-called “Do Not Track” headers, see CE-2015-0528 and the Matomo FAQ on this feature)

  • Some tracked visits may represent requests from bots (crawlers, spiders etc.) rather than actual users

  • The workings and semantics of specific metrics may not be intuitive, so the Matomo documentation should be consulted before interpreting them

When using Segments in Matomo, it is important to be aware that Segmentation happens at the level of visits. A visit can include a series of actions by a user, some of which may fall outside the scope of the filter defined for the segment. For instance, the “pageview” count shown in reports and on the web interface does not necessarily represent the number of views of records by the targeted metadata provider(s) alone; it will be equal to or higher than that number. Therefore, some manual evaluation is needed to find the correctly scoped pageviews metric.
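
The difference can be made concrete with a small simulation. This is a minimal sketch with made-up visits and URLs, not a reproduction of how Matomo computes its metrics. It selects visits that contain at least one matching record page, then compares the pageview count of those visits with the number of pageviews that actually match.

```python
from dataclasses import dataclass


@dataclass
class Visit:
    """A tracked visit, modelled as the ordered list of page URLs viewed (hypothetical)."""
    page_urls: list[str]


def segment_metrics(visits, url_fragment):
    """Compare visit-level segment counts with correctly scoped pageview counts."""
    # A visit belongs to the segment if ANY of its pageviews matches the fragment.
    matching = [v for v in visits if any(url_fragment in u for u in v.page_urls)]
    return {
        "visits": len(matching),
        # All pageviews of the matching visits, including unrelated pages.
        "pageviews_in_segment": sum(len(v.page_urls) for v in matching),
        # Only the pageviews whose URL matches the fragment.
        "pageviews_of_matching_records": sum(
            1 for v in matching for u in v.page_urls if url_fragment in u
        ),
    }


# Made-up example data: three visits, two of which view records of the provider.
visits = [
    Visit([
        "https://vlo.clarin.eu/",
        "https://vlo.clarin.eu/record/example_provider_1",
        "https://vlo.clarin.eu/record/other_provider_7",
    ]),
    Visit(["https://vlo.clarin.eu/record/other_provider_8"]),
    Visit([
        "https://vlo.clarin.eu/record/example_provider_2",
        "https://vlo.clarin.eu/record/example_provider_3",
    ]),
]

metrics = segment_metrics(visits, "/record/example_provider_")
print(metrics)
```

In this example the segment contains 2 visits with 5 pageviews in total, but only 3 of those pageviews are views of the targeted provider's records.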

Concrete steps

  • Determine the URL pattern to narrow down

    • The URL for the record page in the VLO follows this pattern:

      https://vlo.clarin.eu/record/<recordId>

      Where <recordId> is a sanitised version of the CMDI record’s MdSelfLink header value. If there is no MdSelfLink, the record ID is based on the path and filename of the record after harvesting.

    • The supported approach is to rely on a PID-based MdSelfLink. Make sure that the provided records reliably contain such a header.

  • Send an e-mail to matomo@clarin.eu, requesting:

    • the configuration of a Segment for the determined record page URL pattern(s) (see the previous point) at https://stats.clarin.eu

    • the creation of a read-only Matomo account at https://stats.clarin.eu

    • scheduling an automated report, indicating:

      • to which e-mail address(es) the reports should be sent

      • how often to receive a report by e-mail

      • what information (which metrics) to have included in the report