Bringing Context to Deep Neural Models for Media Understanding
The way we consume information is multimodal: different media, such as images and text, convey distinct but complementary perspectives. Naturally, many complex learning problems require joint modeling of these media, where the semantic gap between modalities must be addressed.
In this talk, I will walk you through my group's latest research and contributions towards this endeavour: context-enriched multimodal neural models for a variety of domains. These contributions span multiple facets of the problem, from data collection, cleaning, and indexing, through neural model design, loss functions, and optimization, to context-enriched representation learning, where extra dimensions (such as time) are brought into play.
I will showcase applications of these backbone models to a set of challenging multimedia tasks, such as the extraction of diachronic media insights from large-scale data (COGNITUS H2020 project), multimodal news understanding (NewsVisualSeek), and task-oriented multimodal agents (CMU-Portugal iFetch and Amazon AlexaPrize Taskbot Challenge projects).
David Semedo is an Assistant Professor at the NOVA School of Science and Technology and an integrated member of the NOVA LINCS research center. He holds a Ph.D. in Computer Science, and his research focuses on multimodal deep learning, machine learning, and data mining approaches for multimedia understanding.
In this research stream, he seeks models that exploit patterns in media data to solve real-world problems over large-scale collections (in space and time), by addressing the semantic gap between vision and language.
He has been involved in multiple national and international projects and industry collaborations, with applications to inherently multimodal data domains such as social media, data archives, and news, among others.