Nordic LangID

Resource Type:

A collection of texts in different Nordic languages that can be used for language identification.

Open

Nordic LangID can be used for automatic language identification. The texts in this dataset make it possible to discriminate between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese, and Icelandic.

The dataset has two parts, one with 10K samples per language and another with 50K per language.
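As a rough illustration of what language identification on such samples can look like, here is a minimal sketch of a character-trigram profile classifier. It is not the method from the paper and does not read the dataset's actual files. The function names, the labels "da" and "is", and the two hand-written training sentences are all assumptions made for this example.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples):
    """Build one n-gram frequency profile per label from (label, text) pairs."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def predict(profiles, text):
    """Pick the label whose profile overlaps most with the text's n-grams."""
    grams = Counter(char_ngrams(text))

    def score(profile):
        total = sum(profile.values())
        # Weight each shared n-gram by its relative frequency in the profile.
        return sum(count * profile[g] / total for g, count in grams.items())

    return max(profiles, key=lambda label: score(profiles[label]))


# Hypothetical toy data written for this sketch, NOT taken from Nordic LangID.
toy_samples = [
    ("da", "jeg hedder Anna og jeg bor i København"),
    ("is", "ég heiti Anna og ég bý í Reykjavík"),
]
profiles = train(toy_samples)

if __name__ == "__main__":
    print(predict(profiles, "jeg bor i København"))
    print(predict(profiles, "ég bý í Reykjavík"))
```

With the real dataset, the toy pairs would be replaced by the labelled samples for all six languages. Closely related pairs such as the two Norwegian varieties would likely need far more training text and a stronger model than this sketch.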

For more info, see the paper: Discriminating Between Similar Nordic Languages.

Release: April 2021
Contact: renha@itu.dk

Publisher

Uses

Format

Language(s)

Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese, Icelandic

License
