SAW's LexFCS Endpoint with Solr

The Saxon Academy of Sciences and Humanities in Leipzig has released an easy-to-set-up LexFCS endpoint implementation. The FCS endpoint primarily targets lexical resources as described in the new LexFCS Specification and is intended to serve as a reference implementation. It comes bundled with the Apache Solr search engine to support search and retrieval of structured lexical data. The endpoint is ready to use in no time (quickstart ⚡).

Data format and query language

Lexical resources differ from the standard text resources found in the CLARIN Federated Content Search: search and results are centered on lexical entries, which are structured into various fields (lemma, pos, sentiment, …) holding values with optional attributes (language, references). A custom query language, LexCQL, has been introduced to support querying this structure.
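To make the entry/field/value/attribute structure more concrete, here is a minimal Python sketch. The classes Value and Entry and the matches method are hypothetical illustrations, not part of the LexFCS Specification or the endpoint's code. Matching here is plain string equality, whereas the real endpoint delegates matching to Solr and its tokenization.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    """One value of a field (e.g. a single translation) plus optional attributes."""
    text: str
    attrs: dict[str, str] = field(default_factory=dict)


@dataclass
class Entry:
    """A lexical entry: field name -> list of values (hypothetical model)."""
    fields: dict[str, list[Value]] = field(default_factory=dict)

    def matches(self, name: str, text: str, **attrs: str) -> bool:
        """Return True if some value of field `name` equals `text` and has all `attrs`.

        Roughly mirrors a clause like: translation =/lang=eng "member of parliament",
        where the /lang=eng modifier restricts the match to values with that attribute.
        """
        return any(
            v.text == text and all(v.attrs.get(k) == want for k, want in attrs.items())
            for v in self.fields.get(name, [])
        )


entry = Entry({
    "lemma": [Value("Abgeordneter")],
    "pos": [Value("NOUN")],
    "translation": [
        Value("member of parliament", {"lang": "eng"}),
        Value("député", {"lang": "fra"}),
    ],
})

print(entry.matches("translation", "member of parliament", lang="eng"))  # True
print(entry.matches("translation", "member of parliament", lang="fra"))  # False
```

The attribute check is what distinguishes a field-level match from a match restricted to, say, English values only.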

Apache Solr has been chosen to index this structured data, using nested documents with a generic field list. The FCS endpoint implementation rewrites incoming LexCQL queries into complex Solr Block Join Queries, which enable search requests such as lang = "deu" AND translation =/lang=eng "member of parliament" (search for German lexical entries that have a "translation" field with the English value "member of parliament"). The Solr configuration may need some adjustments for non-Latin languages, e.g. because of different tokenization rules.
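A rough idea of such a rewrite is sketched below in Python, assuming a hypothetical schema in which every entry is a parent document (doc_type:entry, entry_lang) and every field value is a child document (field_name, value, value_lang). It covers only this one query shape and is not the endpoint's actual rewriting logic. {!parent which=...} is Solr's block join parent query parser, and v=$childq dereferences the child query from a separate request parameter.

```python
def solr_phrase(text: str) -> str:
    """Quote a value as a Solr phrase, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rewrite(entry_lang: str, field_name: str, value: str, value_lang: str) -> dict[str, str]:
    """Build Solr params for: lang = <entry_lang> AND <field> =/lang=<value_lang> <value>.

    The child query matches field-value documents; the {!parent} block join
    parser maps those hits back to their parent entry documents.
    All schema field names used here are hypothetical.
    """
    child_q = " ".join([
        f"+field_name:{solr_phrase(field_name)}",
        f"+value_lang:{solr_phrase(value_lang)}",
        f"+value:{solr_phrase(value)}",
    ])
    return {
        # Entry-level condition AND'ed with the block join over child documents.
        "q": f'+entry_lang:{solr_phrase(entry_lang)} +{{!parent which="doc_type:entry" v=$childq}}',
        "childq": child_q,
    }


params = rewrite("deu", "translation", "member of parliament", "eng")
print(params["q"])       # +entry_lang:"deu" +{!parent which="doc_type:entry" v=$childq}
print(params["childq"])  # +field_name:"translation" +value_lang:"eng" +value:"member of parliament"
```

These parameters could be sent to a Solr /select handler. Note that the which filter should match every parent document and no child document, as Solr's block join documentation requires.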

Support your own lexical resources

The FCS endpoint implementation comes with a tiny example resource (based on the English Wiktionary) for quickly testing the endpoint. Adding new resources is fairly straightforward but generally requires converting them into the Solr data format used here.
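Since some conversion step is usually needed, the following Python sketch shows one possible shape of it: one parent document per entry and one child document per field value. _childDocuments_ is Solr's JSON key for anonymous nested child documents; every other field name, the id pattern, and the value_<attribute> naming are assumptions for illustration, not the Solr format the endpoint actually uses.

```python
import json


def to_solr_doc(entry_id: str, entry_lang: str,
                fields: dict[str, list[tuple[str, dict[str, str]]]]) -> dict:
    """Turn one lexical entry into a Solr parent doc with one child doc per field value.

    `fields` maps a field name to (value text, attributes) pairs.
    All field names except _childDocuments_ are hypothetical.
    """
    children = []
    for name, values in fields.items():
        for i, (text, attrs) in enumerate(values):
            child = {
                "id": f"{entry_id}/{name}/{i}",  # child ids must be unique in the index
                "doc_type": "value",
                "field_name": name,
                "value": text,
            }
            # Store each attribute as its own field, e.g. lang -> value_lang.
            for key, val in attrs.items():
                child[f"value_{key}"] = val
            children.append(child)
    return {"id": entry_id, "doc_type": "entry", "entry_lang": entry_lang,
            "_childDocuments_": children}


doc = to_solr_doc("wikt-en-1", "eng", {
    "lemma": [("dictionary", {})],
    "translation": [("Wörterbuch", {"lang": "deu"})],
})
print(json.dumps([doc], ensure_ascii=False, indent=2))
```

The resulting JSON array could be posted to a Solr /update handler; the endpoint's documentation describes the field layout it actually expects.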

Documentation and example configuration for custom deployments are available.
