MULTIMODAL SYSTEMS

PI:   João Magalhães


  • The Multimodal Systems (MS) group is an internationally recognized group in human vision and language AI, mixed reality environments, rehabilitation technologies and simulation. Our goal is to understand humans and to research tools and algorithms that close the gap between human needs and computational systems. We will continue to publish our scientific work at top international venues, e.g. MM, SIGCHI, ACL, CVPR, WSDM, ECIR, IUI, UIST.

    Bringing the new generation of LLMs and LVLMs closer to the way humans reason is the driving force behind many of the MS research questions: How can LVLMs be made more controllable and trustworthy? How can the factual consistency of generative LVLMs be improved? What is the role of retrieval-augmented LVLMs in foundation models? How can foundation models be steered to generate long-term, grounded videos? Can LVLMs enable the recognition of anomalous situations in video surveillance? Can LVLMs address theory-of-mind problems in the physical world in collaboration with humans? Solving these challenges will introduce a step change in V&L AI.
  • In the area of Mixed Reality and Cultural Heritage, we will deepen our understanding of the central challenge: how can virtual representations of cultural artifacts be widely used by both expert and non-expert users? This can be further specialized into: what new interaction and collaboration techniques does such an infrastructure afford? What is the impact of such a paradigm on other scientific areas and on access to culture?

    Leveraging our expertise in rehabilitation technologies, we will seek real-world impact through targeted research questions: How can motor and cognitive functions address real-world activities such as Instrumental Activities of Daily Living? Can simulation-based training tools improve the diagnostic capabilities and empathic communication skills of healthcare professionals? Through these efforts, our vision is to drive innovation in rehabilitation and healthcare.
