This is the release of the Basic Language Resource Kit (BLARK 1.0) for Faroese made by the Project Group Ravnur under the Talutøkni Foundation (https://mtd.setur.fo).
- Licensed under CC BY 4.0
- Transcribed recordings, approx. 100 hours
- Full-form dictionary with phonetic transcriptions, 24,000 lemmas
- Guides in Faroese and English
- PAROLE part-of-speech tagset for Faroese
- Faroese SAMPA
- Background text corpus, approx. 25 million words
Recommended use: Automatic speech recognition (ASR), text-to-speech (TTS), natural language processing (NLP)
Audio: The audio was collected by recording speakers reading texts aloud. The participants are aged 18-83 and divided into three age groups. Recordings were made of 249 female and 184 male speakers, 433 speakers in total. The recordings were made on TASCAM DR-40 Linear PCM audio recorders using the built-in stereo microphones and saved as 16-bit WAVE files with a sample rate of 48 kHz.
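Because the recordings are described as 16-bit stereo WAVE at 48 kHz, it can help to check the format of a file before feeding it into an ASR or TTS pipeline. Below is a minimal Python sketch using the standard-library `wave` module. It assumes the distributed files keep the original recording format (they may have been converted, e.g. to mono or a lower sample rate), and the helper names `describe_wav` and `matches_recording_format` are hypothetical, not part of BLARK.

```python
import sys
import wave

# Format stated for the original recordings (TASCAM DR-40, built-in stereo mics).
# Assumption: the distributed files keep this format; they may have been converted.
EXPECTED_CHANNELS = 2
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16-bit PCM
EXPECTED_SAMPLE_RATE_HZ = 48_000


def describe_wav(path: str) -> dict:
    """Return channel count, sample width, sample rate and duration of a WAVE file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_recording_format(path: str) -> bool:
    """True if the file is 16-bit stereo PCM at 48 kHz (hypothetical helper)."""
    info = describe_wav(path)
    return (
        info["channels"] == EXPECTED_CHANNELS
        and info["sample_width_bytes"] == EXPECTED_SAMPLE_WIDTH_BYTES
        and info["sample_rate_hz"] == EXPECTED_SAMPLE_RATE_HZ
    )


if __name__ == "__main__":
    # Print the format of every WAVE file given on the command line.
    for p in sys.argv[1:]:
        print(p, describe_wav(p), matches_recording_format(p))
```

Saved as, say, a hypothetical `check_wav.py`, it can be run as `python check_wav.py recording.wav` to print each file's format and whether it matches the stated recording settings.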
Text: The texts read by the participants include material from news, blogs, Wikipedia, legal texts, etc. These texts were edited to fit our format. We also wrote our own texts for specific domains such as Faroese place names, numbers, licence plates and telling the time.
You can read more about the making of BLARK 1.0 here: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.495.pdf
Release: 01.07.2022
Contact: mtd@setur.fo