Nordic LangID is a dataset for automatic language identification. Its texts can be used to discriminate between six Nordic languages: Danish, Swedish, Norwegian (Nynorsk), Norwegian (Bokmål), Faroese, and Icelandic.
The dataset has two parts: one with 10K samples per language and another with 50K samples per language.
For more information, see the paper "Discriminating Between Similar Nordic Languages".
Release: April 2021
Contact: renha@itu.dk
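
To illustrate the kind of language identification task the dataset targets, here is a minimal sketch of a character trigram classifier. It is not the method from the paper and does not load the dataset itself: the file format is not described here, so the three training sentences, the ISO-style labels ("da", "sv", "is"), and the function names are all hypothetical stand-ins. Real training would use the 10K or 50K samples per language.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return overlapping character n-grams of a lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def train(samples):
    """Build one n-gram count profile per label from (text, label) pairs."""
    profiles = {}
    for text, label in samples:
        profiles.setdefault(label, Counter()).update(char_ngrams(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def identify(text, profiles):
    """Return the label whose profile is most similar to the text."""
    query = Counter(char_ngrams(text))
    return max(profiles, key=lambda label: cosine(query, profiles[label]))


# Hypothetical toy samples, standing in for rows of the real dataset.
toy_samples = [
    ("jeg hedder Anna og jeg bor i København", "da"),
    ("jag heter Anna och jag bor i Stockholm", "sv"),
    ("ég heiti Anna og ég bý í Reykjavík", "is"),
]
profiles = train(toy_samples)
prediction = identify("jag heter Erik", profiles)
```

With only one sentence per language this is a toy. Closely related pairs such as Nynorsk and Bokmål would likely need far more data and a stronger model to separate reliably.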





