Publication date: 21 May 2024

GlórIA: the new Portuguese-European Large Language Model

  • GlórIA is the new Portuguese LLM, developed at the NOVA LINCS Multimodal Systems Group. It is a top-performing PT-PT (European Portuguese) LLM, capable of generating high-quality text on a multitude of topics, such as History, Environment, Cuisine, and many more. Led by Prof. David Semedo, together with Ricardo Lopes and Prof. João Magalhães, the team has released GlórIA, the first generative LLM trained on a large, high-quality Portuguese corpus of over 35 billion tokens. The corpus comprises a highly diverse set of sources (e.g. Wikipedia, news, dialogs, and web pages) and was created partially in collaboration with the Portuguese Web Archive. The model's quality and generative characteristics make it well suited to a wide range of Natural Language Processing tasks, such as dialog, summarization, and information extraction.

    GlórIA is the first open, peer-reviewed Portuguese-European Large Language Model (LLM). It is a step towards advancing Portuguese-focused Natural Language Processing and Artificial Intelligence systems, contributing to the democratization of high-performing LLMs.

    Having achieved this milestone, the team is now working on further extending GlórIA to domain-specific scenarios, and expanding it to even larger language models.

    More details are provided in GlórIA’s paper, presented at PROPOR 2024: "GlórIA: A Generative and Open Large Language Model for Portuguese". The pre-trained models, benchmark, and source code are publicly available here and here.

