Research Activities

Culture-Sensitive Assessment and Adjustment of Large Language Models – Adaptation to the Nordic-Baltic Societies
Project (2026–2028)
The Centre for Language Technology is a central partner in this cross-national project, which is dedicated to the culture-sensitive assessment and adjustment of Large Language Models (LLMs) specifically for the Nordic and Baltic societies. While current AI technology is often dominated by English-centric, monocultural perspectives that can unintentionally erode regional values, our work focuses on ensuring that these models are responsibly adapted to respect the linguistic and cultural diversity of our regions. This initiative is rooted in the belief that AI should be a transparent and inclusive tool that reinforces, rather than diminishes, deeply held societal values like equality, democracy, and trust.
As part of this mission, we will be responsible for the Faroese side of the project. This involvement will result in the creation of new, high-quality Faroese linguistic resources that will be invaluable not only for aligning AI but also for a wider range of other local research and digital applications. Beyond these datasets, the project will deliver advanced methods for explaining and assessing how LLMs handle regional nuances and will ultimately result in the direct alignment of several models to ensure they function responsibly within our specific societal contexts. By developing these open-source, multi-parallel datasets, we are making the unique cultural heritage of the Faroe Islands an explicit part of the technological landscape.
By sharing best practices and resources across borders, the consortium aims to advance the state-of-the-art in language technology while ensuring a more culturally aware perspective in AI development. This collaborative effort allows our respective countries to implement technologies that are authentically customized and ethically grounded in our shared regional norms.
The project is funded by Nordforsk.

Evaluating Faroese: Towards Developing a Benchmarking Evaluation Framework
In the spring of 2024, Iben Nyholm Debess started a Ph.D. project at the Centre for Language Technology.
The project investigates the assessment of Faroese language technology and develops an evaluation framework with specific tasks in Natural Language Understanding and Natural Language Generation. This evaluation framework will serve as a benchmark for Faroese language technology and will enable examination and evaluation of tools and language models.
Evaluation results are valuable and provide a basis for comparison. A consistent baseline for comparison fosters quality development and ensures transparency.
In the development process, various approaches will be explored that are particularly well suited for small languages.
The results should also be valuable to other small language communities facing challenges similar to those of Faroese.
The Faroese MegaWord Corpus
The aim of the Faroese MegaWord Corpus (Miklamálgrunnurin) project is to build a large, representative Faroese text corpus. The corpus will consist of texts from a broad variety of sources and will include morphosyntactic tags.
The texts will be processed to avoid copyright infringement.

The corpus will be published under an open license on a Faroese domain and will be accessible to anyone, either for download or for online queries.
The project is both a language technology project and a linguistic project. It will enhance research on Faroese in both fields while also benefiting the public. Being able to search for words and sentences in one’s mother tongue is valuable for private individuals, in education and, crucially, for language preservation.
The Faroese MegaWord Corpus will serve as a pillar in the linguistic resource pool when it comes to developing language technology and technological language aids, internationally and locally. The large size of the corpus will have a direct and positive impact on the quality of the models and tools developed. The categorised metadata will be of value in language technology work as developers are able to use different parts of the corpus for different purposes.
The project is a collaboration between the University of the Faroe Islands and the Árni Magnússon Institute for Icelandic Studies. The working framework and end product of the Faroese MegaWord Corpus project are inspired by the Icelandic GigaWord Corpus project, which was managed by the Árni Magnússon Institute.
The project is funded by a grant from Nordplus Nordic Languages and spans two years.
PhD project in Faroese Speech Recognition

On April 2, 2024, Dávid í Lág started a new PhD research position at the Faculty of Science and Technology at the University of the Faroe Islands. He holds a degree in computer science with a specialization in data science and machine learning from the IT University of Copenhagen.
The Ph.D. project focuses on automatic speech recognition (ASR) for smaller languages, with emphasis on Faroese. By utilizing the latest AI techniques in machine learning and using multilingual ASR models that include over 100 languages, the project will explore how it is now possible to create a Faroese speech recognizer that performs well, even though the data foundation is relatively small.
The project will also investigate the possibilities of generating new data from existing datasets, producing audio from text, and using knowledge transfer between languages.
The principal supervisor is Jón Guðnason, professor at the University of Reykjavík, with Barbara Scalvini at the Faculty of Science and Technology at the University of the Faroe Islands as the co-supervisor. Iben Nyholm Debess at the Centre for Language Technology at the University of the Faroe Islands and Annika Simonsen at the University of Iceland are also collaborating on the project as linguists.
Faroese Machine Translation and Data Augmentation

Barbara Scalvini, who works at the Faculty of Science and Technology and at the Centre for Language Technology at the University of the Faroe Islands, is developing machine translation between Faroese and English. The Research Council has granted funding for the project. Barbara is joined by Iben Nyholm Debess, a PhD Scholar at the Faculty of Faroese Language.
Large Language Models have proven to be effective at various downstream tasks, such as Machine Translation. Their few-shot learning capacity has opened up several opportunities for data augmentation, especially for low-resource languages. We exploit this capacity to create synthetic parallel sentences for the Faroese-English language pair, which we can then use to train a lightweight Neural Machine Translation model that is also easy to deploy in commercial applications.
We also explore how to optimize prompts and examples for few-shot learning to obtain the best translation performance.
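As a rough illustration of what a few-shot prompt for generating synthetic parallel sentences might look like, the sketch below assembles a prompt from a handful of example pairs. It is a minimal sketch, not the project's actual prompt format: the class ParallelExample, the function build_few_shot_prompt, the instruction wording, the English-to-Faroese direction and the example sentences are all assumptions made for illustration. The resulting string would be sent to whichever language model is used, and that call is deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ParallelExample:
    """One Faroese-English sentence pair used as an in-context example (hypothetical type)."""
    faroese: str
    english: str


def build_few_shot_prompt(
    examples: List[ParallelExample],
    source_sentence: str,
    instruction: str = "Translate each English sentence into Faroese.",
) -> str:
    """Assemble a plain-text few-shot prompt.

    The layout (instruction, example pairs, then the open slot) is an
    assumption for illustration, not the format used in the project.
    """
    lines = [instruction, ""]
    for example in examples:
        lines.append(f"English: {example.english}")
        lines.append(f"Faroese: {example.faroese}")
        lines.append("")
    # The final pair is left open so the model completes the Faroese side.
    lines.append(f"English: {source_sentence}")
    lines.append("Faroese:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative example pairs only.
    shots = [
        ParallelExample(faroese="Góðan morgun.", english="Good morning."),
        ParallelExample(faroese="Eg eri úr Føroyum.", english="I am from the Faroe Islands."),
    ]
    print(build_few_shot_prompt(shots, "Thank you."))
```

Varying which example pairs are included, and how many, is one straightforward way to compare prompt configurations against each other.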
An exciting aspect of the project is that it requires the participation of a wide range of Faroese language users. You and I can use the machine translator to translate texts between Faroese and English. Afterwards, it’s possible to submit an evaluation of the translation and to make any necessary revisions to the translated text.
A crucial part of the project is that everything will be open source. Translation models, high‑quality bilingual datasets, and evaluation frameworks will be made available to everyone—researchers, developers, companies, and the general public—so that development can continue even after the project has been completed.
The translation platform will be available to the public in October 2025, and the project will be completed in spring 2026.
TrustLLM
The TrustLLM project is a collaboration between various institutions and universities from European countries. The Centre for Language Technology at the University of the Faroe Islands functions as an external collaborator.

TrustLLM develops reliable multilingual large language models (LLMs), focusing on European languages and contexts. The main objective of TrustLLM is to develop an open, trustworthy, and factual LLM, initially targeting Germanic languages, including Faroese. The role of the Centre for Language Technology in this project involves collecting Faroese language data and consulting on linguistic issues regarding Faroese. Consequently, Faroese will be included in the final model, giving us in the Faroe Islands the novel opportunity to apply the model in a Faroese setting.
One of the core values of TrustLLM is trust. Globally, language models are being developed at a rapid pace. However, it is not always transparent which data are used for development and what biases the data might introduce. Therefore, trust is crucial in the development of large language models. It ensures that users can rely on the models to provide accurate, ethically sound, and unbiased information. Creating trustworthy LLMs promotes safe integration into various applications and has a positive societal impact.

Faroese researcher Annika Simonsen is a Ph.D. student at the University of Iceland, and her project is part of TrustLLM. Annika is working on data alignment, which involves adjusting data and models to better align with human values. This is a highly important area of work. Annika focuses on the alignment of all languages in the project, with a particular emphasis on low-resource languages such as Faroese and Icelandic.
Read more about TrustLLM here and about Annika's work here. On this page you can see all the collaborators.
Adaptation of Large Language Models to Faroese
Using a variety of techniques, we can adapt and align large multilingual models to Faroese. These techniques include few-shot prompting, low-rank adaptation, and fine-tuning.
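To give a sense of why low-rank adaptation is attractive when data and compute are limited, the sketch below compares the number of trainable parameters for one weight matrix under full fine-tuning and under low-rank adaptation. In the standard formulation, a frozen weight matrix of shape d_out × d_in is augmented by the product of two small trainable matrices of shapes d_out × r and r × d_in. The function names and the layer size and rank in the usage example are illustrative assumptions, not values from the project.

```python
def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight matrix is updated."""
    return d_out * d_in


def low_rank_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for a low-rank update B @ A.

    B has shape (d_out, rank) and A has shape (rank, d_in);
    the original weight matrix stays frozen.
    """
    if rank <= 0:
        raise ValueError("rank must be a positive integer")
    return rank * (d_out + d_in)


if __name__ == "__main__":
    # Illustrative sizes only: a 4096 x 4096 projection with rank 8.
    d_out, d_in, rank = 4096, 4096, 8
    full = full_finetune_params(d_out, d_in)
    low = low_rank_params(d_out, d_in, rank)
    print(f"full fine-tuning: {full} parameters")      # 16777216
    print(f"low-rank update:  {low} parameters")       # 65536
    print(f"fraction trained: {low / full:.4%}")       # 0.3906%
```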
Here, we ask ourselves the following questions:
- How much are hidden representations in Large Language Models shared between similar languages?
- Can we manipulate such hidden representations to provide a better match between concepts in different languages?
- Would such manipulation increase the performance of the large language model in the target language?
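One simple starting point for the first question is to compare sentence-level representations of translation-equivalent sentences in two languages. The sketch below averages per-token hidden vectors into one sentence vector and measures cosine similarity between the two. This is a minimal sketch of one possible measure, not the method used in the project: the function names are hypothetical, the toy vectors are made up, and real hidden states would have to be extracted from a model first.

```python
import math
from typing import List, Sequence


def mean_pool(token_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Average per-token hidden vectors into a single sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    return [
        sum(vec[i] for vec in token_vectors) / len(token_vectors)
        for i in range(dim)
    ]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy stand-ins for hidden states of the same sentence in two languages.
    faroese_tokens = [[0.2, 0.9, 0.1], [0.4, 0.7, 0.3]]
    icelandic_tokens = [[0.3, 0.8, 0.2], [0.3, 0.8, 0.2], [0.5, 0.6, 0.2]]
    similarity = cosine_similarity(
        mean_pool(faroese_tokens), mean_pool(icelandic_tokens)
    )
    print(f"sentence-level cosine similarity: {similarity:.3f}")
```

Computing such a score layer by layer, over many sentence pairs, would indicate where in a model the representations of related languages are closest.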
Barbara Scalvini at the Faculty of Science and Technology at the University of the Faroe Islands is in charge of this project, in collaboration with the Centre for Language Technology.
