Multimodal learning leverages multiple and diverse modalities, such as images, text, or audio, to enable contextual understanding and reliable decision-making. These methods have achieved state-of-the-art results by integrating multiple modalities in areas such as medical imaging, autonomous driving, and visual surveillance. However, the effectiveness of multimodal learning in real-world scenarios remains limited by practical challenges: data from some modalities may be missing, corrupted, or poorly aligned due to sensor failures, environmental noise, or bandwidth constraints. While prior surveys have proposed taxonomies for multimodal learning and strategies for handling missing or corrupted data, these perspectives are often treated in isolation. This separation overlooks the fact that data imperfections are interconnected, and effective multimodal learning requires a unified understanding across architectural design and modality reliability. To address this gap, we present comprehensive taxonomies that cover and integrate three major aspects: (1) architectural design for multimodal learning, (2) learning under missing modalities, and (3) learning under corrupted modalities. By framing these aspects together, our study highlights their interdependencies and facilitates a comprehensive understanding of multimodal learning. Furthermore, we discuss benchmark datasets and real-world applications through this taxonomic lens and outline open challenges and future directions for developing resilient methods.
Liaqat, Muhammad Irzam; Abbas, Qaiser; Nawaz, Shah; Zaheer, Zaigham; Moscati, Marta; Hou, Yufang; Khan Muhammad, Haris; Khan, Salman; Andre, Elisabeth; Schedl, Markus. Multimodal learning under imperfect data conditions: a survey. 2025. DOI: 10.36227/techrxiv.176410566.65375877/v1
Multimodal learning under imperfect data conditions: a survey
Liaqat, Muhammad Irzam
2025
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| v1.pdf (open access) | Multimodal Learning Under Imperfect Data Conditions: A Survey | Preprint | Creative Commons | 684.72 kB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

