A Tutorial at the IEEE International Conference on Image Processing (ICIP) 2025
This tutorial provides a comprehensive overview of robust multimodal learning, covering both foundational concepts and recent advances. Its primary aim is to share the presenters’ perspective on this broad field, with particular emphasis on the underlying architectures and models rather than on performance analysis alone. The material draws on a wide range of academic papers, blogs, and other resources. The tutorial begins with the fundamentals of multimodal learning, including data fusion, alignment, and representation learning. It then examines the challenges of ensuring robustness, especially when data are missing, unaligned, or noisy. Next, it discusses recent trends in multimodal learning and their applications. The session concludes with open questions and potential directions for future research.