Foroyskt is a Python-based lemmatizer for processing and analyzing Faroese language data. Built as a fork of Miðeind’s BinPackage, it incorporates data from Føroyski Bendingargrunnurin and adapts BinPackage’s functionality to Faroese morphology. The package supports word lookups, grammatical inflections, and compound word recognition.
The tool compresses large datasets, such as the KRISTINsnid data from Føroyski Bendingargrunnurin, into an optimized binary structure for fast lookups. It also handles Faroese-specific elements, including additional suffixes and prefixes and the Faroese character set, so that results reflect the language’s own morphology. Users can run both basic and advanced queries, which return detailed information about lemmas, inflectional forms, and grammatical features.
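The idea of looking up an inflected form and getting back its lemma and grammatical features can be illustrated with a minimal sketch. This is not Foroyskt's API or its actual binary format: the names `ENTRIES`, `build_index`, and `lookup`, the tag strings, and the four toy entries are all assumptions made for illustration. The sketch stores the forms in sorted order and finds them by binary search, which is one simple way to get fast lookups over a large static word list.

```python
from bisect import bisect_left

# Toy data for illustration only (hypothetical, not taken from
# Føroyski Bendingargrunnurin).
# Each entry: (inflected form, lemma, word class, grammatical tag).
ENTRIES = [
    ("hestur", "hestur", "noun", "nominative singular"),
    ("hest", "hestur", "noun", "accusative singular"),
    ("hesti", "hestur", "noun", "dative singular"),
    ("hestar", "hestur", "noun", "nominative plural"),
]


def build_index(entries):
    """Sort entries by inflected form and return (forms, ordered_entries)."""
    ordered = sorted(entries, key=lambda e: e[0])
    forms = [e[0] for e in ordered]
    return forms, ordered


def lookup(index, form):
    """Return all (lemma, word class, tag) tuples matching an inflected form."""
    forms, ordered = index
    i = bisect_left(forms, form)  # binary search for the first match
    results = []
    while i < len(forms) and forms[i] == form:
        results.append(ordered[i][1:])
        i += 1
    return results


INDEX = build_index(ENTRIES)
print(lookup(INDEX, "hesti"))
```

A form that is not in the index simply returns an empty list, so callers can distinguish unknown words from known ones without exception handling.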
Foroyskt is installed from GitHub; setup includes downloading and preparing the KRISTINsnid data. The package is available under the MIT license, with resources and documentation to guide users in integrating it into their projects.
Release: 2024
Contact: annika@hi.is





