Publication date: 19 March 2026

AMALIA: An Open Source Large Language Model for European Portuguese

Despite rapid progress in open large language models (LLMs), European Portuguese (pt-PT) remains underrepresented in both training data and native evaluation, with machine-translated benchmarks likely missing the variant’s linguistic and cultural nuances. We introduce AMALIA, a fully open LLM that prioritizes pt-PT by using more high-quality pt-PT data during both the mid- and post-training stages. To evaluate pt-PT more faithfully, we release a suite of pt-PT benchmarks that includes translated standard tasks and four new datasets targeting pt-PT generation, linguistic competence, and pt-PT/pt-BR bias. Experiments show that AMALIA matches strong baselines on translated benchmarks while substantially improving performance on pt-PT-specific evaluations, supporting the case for targeted training and native benchmarking for European Portuguese.

Presenter

João Magalhães (DI - NOVA FCT and NOVA LINCS)

URL http://meet.google.com/trr-aiyh-uyp
Date 25/03/2026 2:00 pm
Location DI Seminars Room and Google Meet
Host Bio Prof. João Miguel da Costa Magalhães is a Full Professor in the Department of Computer Science at Universidade NOVA de Lisboa and a senior researcher at NOVA LINCS. He serves as co-Director of the CMU Portugal Program and leads the NOVA LINCS Multimodal Systems Group. He holds a PhD from Imperial College London (2008), and his research focuses on vision-language models, multimodal learning, and AI systems for semantic multimedia. João has coordinated and contributed to international projects with leading R&D partners BBC, Amazon and Google. He is the Technical Program Committee Chair of the 2026 ACM International Conference on Multimedia and was its General Chair in 2022.