
Robustness of models addressing Information Disorder: a comprehensive review and benchmarking study / Fenza, Giuseppe; Loia, Vincenzo; Stanzione, Claudio; Di Gisi, Maria. - In: NEUROCOMPUTING. - ISSN 0925-2312. - 596:(2024). [10.1016/j.neucom.2024.127951]

Robustness of models addressing Information Disorder: a comprehensive review and benchmarking study

Di Gisi Maria
2024

Abstract

Machine learning and deep learning models are increasingly susceptible to adversarial attacks, particularly in critical areas like cybersecurity and Information Disorder. This study provides a comprehensive evaluation of model robustness against such attacks across key tasks well established in the Information Disorder literature: Toxic Speech Detection, Sentiment Analysis, Propaganda Detection, and Hate Speech Detection. Rigorous experiments conducted across 13 models and 12 diverse datasets highlight significant vulnerabilities. The methodological framework implements adversarial attacks that strategically manipulate model inputs based on keyword significance, identified using LIME, an advanced explainable AI technique. The evaluation measures robustness primarily through model accuracy and attack success rates. The experiments reveal that current models display inconsistent resistance to adversarial manipulations, underscoring an urgent need to develop more sophisticated defensive strategies. The study sheds light on critical weaknesses in existing models and charts a course for future research to fortify AI resilience against evolving cyber threats. The findings advocate a paradigm shift in model training and development to prioritize adversarial robustness, ensuring that AI systems are equipped to handle real-world adversarial scenarios effectively.
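The attack-and-evaluation loop the abstract describes (score word importance, perturb the most influential keyword, measure the attack success rate) can be sketched roughly as follows. This is a minimal toy, not the paper's actual setup: the keyword-lookup classifier, the trigger words, and the `[MASK]` substitution are illustrative assumptions, and leave-one-out deletion stands in for LIME's surrogate-model explanations.

```python
# Hypothetical toy classifier: flags a text as "toxic" (1) if it
# contains any trigger word — a stand-in for a real detection model.
TRIGGERS = {"awful", "stupid", "horrible"}

def classify(text):
    """Return 1 (toxic) if any trigger word appears, else 0."""
    return int(any(w in TRIGGERS for w in text.lower().split()))

def word_importance(text):
    """Leave-one-out importance: how much deleting each word changes
    the prediction (a simplified proxy for LIME's word weights)."""
    words = text.split()
    base = classify(text)
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((abs(base - classify(reduced)), i))
    return sorted(scores, reverse=True)  # most influential first

def attack(text):
    """Replace the single most influential word with a mask token."""
    words = text.split()
    scores = word_importance(text)
    if not scores or scores[0][0] == 0:
        return text  # no influential word found; leave input unchanged
    _, idx = scores[0]
    words[idx] = "[MASK]"
    return " ".join(words)

def attack_success_rate(texts):
    """Fraction of inputs whose predicted label flips after the attack."""
    flipped = sum(classify(t) != classify(attack(t)) for t in texts)
    return flipped / len(texts)

samples = ["you are awful", "what a stupid idea", "have a nice day"]
print(attack_success_rate(samples))  # 2 of 3 labels flip -> ~0.667
```

In the study itself the importance scores come from LIME explanations over real classifiers, and perturbations are crafted substitutions rather than a mask token, but the robustness metric (accuracy before/after attack, success rate of label flips) follows the same shape.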
Adversarial attacks
Explainable artificial intelligence
Information Disorder
Robustness
Files in this record:

NEUCOM-D-24-01718_R1-92-173.pdf

Embargoed until 30/08/2026

Type: Post-print document
License: Creative Commons
Size: 3.2 MB
Format: Adobe PDF
1-s2.0-S0925231224007227-main.pdf

Not available

Description: Robustness of Models Addressing Information Disorder: A Comprehensive Review and Benchmarking Study
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 6.66 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/39680
Citations
  • Scopus 9