Multimodal learning under imperfect data conditions: a survey / Liaqat, Muhammad Irzam; Abbas, Qaiser; Nawaz, Shah; Zaheer, Zaigham; Moscati, Marta; Hou, Yufang; Khan Muhammad, Haris; Khan, Salman; Andre, Elisabeth; Schedl, Markus. - (2025). [10.36227/techrxiv.176410566.65375877/v1]

Multimodal learning under imperfect data conditions: a survey

Liaqat, Muhammad Irzam
2025

Abstract

Multimodal learning leverages multiple, diverse modalities such as images, text, and audio to enable contextual understanding and reliable decision-making. By integrating these modalities, such methods have achieved state-of-the-art results in areas such as medical imaging, autonomous driving, and visual surveillance. However, the effectiveness of multimodal learning in real-world scenarios remains limited by practical challenges: data from some modalities may be missing, corrupted, or poorly aligned due to sensor failures, environmental noise, or bandwidth constraints. While prior surveys have proposed taxonomies of multimodal learning and of strategies for handling missing or corrupted data, these perspectives are often treated in isolation. This separation overlooks the fact that data imperfections are interconnected, and that effective multimodal learning requires a unified understanding spanning architectural design and modality reliability. To address this gap, we present comprehensive taxonomies that cover and integrate three major aspects: (1) architectural design for multimodal learning, (2) learning under missing modalities, and (3) learning under corrupted modalities. By framing these aspects together, our study highlights their interdependencies and facilitates a comprehensive understanding of multimodal learning. Furthermore, we discuss benchmark datasets and real-world applications through this taxonomic lens and outline open challenges and future directions for developing resilient methods.
Keywords: Representation learning, Missing modalities, Corrupted modalities, Taxonomy
Files in this record:
v1.pdf

Open access

Description: Multimodal Learning Under Imperfect Data Conditions: A Survey
Type: Preprint document
License: Creative Commons
Size: 684.72 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/40200