Ravnur BLARK 1.0

This project began gathering and creating language resources for Faroese in January 2019 and ends with the release of BLARK 1.0 in July 2022. The overall aim was to create open-source resources for Faroese language technology; the project group's main goal was to obtain resources that can be used for Faroese automatic speech recognition (ASR). Download size: 27 GB.

This is the release of the Basic Language Resource Kit (BLARK 1.0) for Faroese made by the Project Group Ravnur under the Talutøkni Foundation (https://mtd.setur.fo).

  • Licensed under CC BY 4.0
  • Transcribed recordings, approx. 100 hours
  • Full-form dictionary with phonetic transcription, 24,000 lemmas
  • Guides in Faroese and English
  • PAROLE (part of speech) for Faroese
  • Faroese SAMPA
  • Background text corpus, approx. 25 million words

Recommended use: Automatic speech recognition (ASR), text-to-speech (TTS), natural language processing (NLP)

Audio: The audio was collected by recording speakers reading texts. The participants are aged 18-83, divided into 3 age groups. The recordings cover 249 female speakers and 184 male speakers – 433 speakers in total. They were made on TASCAM DR-40 Linear PCM audio recorders using the built-in stereo microphones, in 16-bit WAVE format at a sample rate of 48 kHz.
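As a quick sanity check after downloading, the stated audio format can be compared against a file's WAVE header. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the expected channel count of 2 is an assumption based on the stereo microphones mentioned above; the released files may be stored differently. The helper `write_silence` exists only so the sketch runs without the dataset.

```python
import os
import tempfile
import wave

# Format taken from the audio description: 16-bit PCM at 48 kHz.
# The channel count (2) is an assumption, not confirmed by the release notes.
EXPECTED = {"channels": 2, "bits_per_sample": 16, "sample_rate_hz": 48000}


def describe_wav(path):
    """Read the WAVE header and return its basic format parameters."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


def matches_expected(info, expected=EXPECTED):
    """Return True if every expected parameter matches the header."""
    return all(info[key] == value for key, value in expected.items())


def write_silence(path, seconds=1, channels=2, sample_width=2, rate=48000):
    """Write a silent WAVE file (hypothetical stand-in for a real recording)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * (seconds * rate * channels * sample_width))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.wav")
        write_silence(sample)
        info = describe_wav(sample)
        print(info, "matches expected format:", matches_expected(info))
```

To check a real recording, pass its path to `describe_wav` instead of the generated file. ASR toolkits often expect 16 kHz mono input, so a mismatch report here can also signal that resampling is needed before training.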

Text: The texts read by the participants include material from the news, blogs, Wikipedia, law, etc. These texts were edited to fit our format. We also wrote texts ourselves for specific domains, such as Faroese place names, numbers, licence plates, and telling the time.

You can read more about the making of BLARK 1.0 here: http://www.lrec-conf.org/proceedings/lrec2022/pdf/2022.lrec-1.495.pdf

Release: 01.07.2022
Contact: mtd@setur.fo
