Deep neural architectures must be properly trained. Training a deep network inherently implies the curation of a carefully annotated dataset and the existence of a differentiable loss function. Two case studies will be presented in this talk. The first arises when collecting and annotating a large-scale dataset is too tedious, and an automatically annotated dataset is used instead. Such a dataset has the prominent advantage that it is collected without much human effort, but the drawback that it potentially contains many annotation errors and other kinds of outliers. Therefore, when training a deep network on such a dataset, one must design losses that are robust to the kind of contamination present in the dataset. In our case, we propose a probabilistic loss based on a Gaussian-Uniform mixture, and design an EM algorithm to separate outliers from inliers; only the inliers are then used to train the deep network. The second case study arises for tasks whose associated algorithms are evaluated with performance measures that are far from being differentiable. In those cases, the performance measure cannot be used directly as the training loss of the deep architecture, and one must find alternative proxy loss functions that approximate the behavior of the original evaluation metric. We will take the case of multi-object tracking, where the evaluation metrics need to go through an estimation-to-ground-truth assignment step, usually solved by means of the Hungarian algorithm. Since this step is not differentiable, such evaluation metrics cannot be used to train the tracking network. To overcome this problem, we propose to train an auxiliary network to approximate the behavior of the Hungarian algorithm, but with a differentiable mapping: the Deep Hungarian Network (DHN). Once trained, the DHN can be used to approximate the evaluation metrics with differentiable losses based on the assignment step performed by the DHN.
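As a rough illustration of the first case study, the sketch below runs EM on a Gaussian-Uniform mixture over scalar residuals and keeps the samples that are more likely Gaussian than uniform. It is a minimal toy version and not the implementation presented in the talk. The function name `gaussian_uniform_em`, the zero-mean one-dimensional Gaussian, the choice of the observed residual range as the uniform support, and the 0.5 responsibility threshold are all assumptions made for this example.

```python
import math
import random


def gaussian_uniform_em(residuals, n_iter=50):
    """EM for a zero-mean Gaussian (inliers) + uniform (outliers) mixture.

    Returns (pi, sigma2, resp), where resp[i] is the posterior probability
    that residuals[i] came from the Gaussian component, computed in the
    last E-step.
    """
    n = len(residuals)
    lo, hi = min(residuals), max(residuals)
    # Assumption: the uniform component spans the observed residual range.
    u = 1.0 / max(hi - lo, 1e-12)
    pi = 0.5  # initial prior probability of being an inlier
    sigma2 = max(sum(r * r for r in residuals) / n, 1e-12)  # initial variance
    resp = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each sample is an inlier.
        resp = []
        for r in residuals:
            g = pi * math.exp(-0.5 * r * r / sigma2) / math.sqrt(2.0 * math.pi * sigma2)
            resp.append(g / (g + (1.0 - pi) * u))
        # M-step: re-estimate the inlier prior and the Gaussian variance.
        total = max(sum(resp), 1e-12)
        pi = min(max(total / n, 1e-6), 1.0 - 1e-6)
        sigma2 = max(sum(w * r * r for w, r in zip(resp, residuals)) / total, 1e-12)
    return pi, sigma2, resp


# Synthetic residuals: 200 clean samples followed by 50 gross outliers.
random.seed(0)
residuals = [random.gauss(0.0, 0.1) for _ in range(200)]
residuals += [random.uniform(-5.0, 5.0) for _ in range(50)]

pi, sigma2, resp = gaussian_uniform_em(residuals)
# Samples judged more likely Gaussian than uniform are kept for training.
inlier_idx = [i for i, w in enumerate(resp) if w > 0.5]
```

In a training loop, the residuals would typically be the network's current prediction errors, with the EM step and the network update alternated; that alternation is left out here for brevity.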
Xavier Alameda-Pineda is a (tenured) Research Scientist at Inria, in the Perception Group. He obtained M.Sc. degrees in Mathematics in 2008, in Telecommunications in 2009 from BarcelonaTech, and in Computer Science in 2010 from Université Grenoble-Alpes (UGA). He then worked towards his Ph.D. in Mathematics and Computer Science, and obtained it in 2013 from UGA. After a two-year post-doc period at the Multimodal Human Understanding Group, at the University of Trento, he was appointed to his current position. Xavier is an active member of SIGMM, and a senior member of IEEE. He co-chairs the “Audio-visual machine perception and interaction for companion robots” chair of the Multidisciplinary Institute of Artificial Intelligence. Xavier has served as Area Chair in major computer vision and multimedia conferences, and he will be Program co-chair of ACM MM 2022. Xavier is also the coordinator of the H2020 ICT project SPRING, which aims to bring social robotics into gerontological healthcare. Xavier’s research interests are in combining machine learning, computer vision and audio processing for scene and behavior analysis and human-robot interaction.