
Robustness of models addressing Information Disorder: a comprehensive review and benchmarking study / Fenza, Giuseppe; Loia, Vincenzo; Stanzione, Claudio; Di Gisi, Maria. - In: NEUROCOMPUTING. - ISSN 0925-2312. - 596:(2024). [10.1016/j.neucom.2024.127951]

Robustness of models addressing Information Disorder: a comprehensive review and benchmarking study

Di Gisi Maria
2024

Abstract

Machine learning and deep learning models are increasingly susceptible to adversarial attacks, particularly in critical areas like cybersecurity and Information Disorder. This study provides a comprehensive evaluation of model robustness against such attacks across key tasks well established in the Information Disorder literature: Toxic Speech Detection, Sentiment Analysis, Propaganda Detection, and Hate Speech Detection. Rigorous experiments conducted across 13 models and 12 diverse datasets highlight significant vulnerabilities. The methodological framework implements adversarial attacks that strategically manipulate model inputs based on keyword significance, identified using LIME, an advanced explainable AI technique. The evaluation measures robustness primarily through model accuracy and attack success rates. The experiments reveal that current models display inconsistent resistance to adversarial manipulations, underscoring an urgent need to develop more sophisticated defensive strategies. The study sheds light on critical weaknesses in existing models and charts a course for future research to fortify AI resilience against evolving cyber threats. The findings advocate a paradigm shift in model training and development to prioritize adversarial robustness, ensuring that AI systems are equipped to handle real-world adversarial scenarios effectively.
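The attack-and-evaluation loop the abstract describes (score word importance, perturb the most influential keyword, measure the attack success rate) can be sketched roughly as follows. This is a minimal toy, not the paper's actual setup: the keyword-lookup classifier, the trigger words, and the `[MASK]` substitution are illustrative assumptions, and leave-one-out deletion stands in for LIME's surrogate-model explanations.

```python
# Hypothetical toy classifier: flags a text as "toxic" (1) if it
# contains any trigger word — a stand-in for a real detection model.
TRIGGERS = {"awful", "stupid", "horrible"}

def classify(text):
    """Return 1 (toxic) if any trigger word appears, else 0."""
    return int(any(w in TRIGGERS for w in text.lower().split()))

def word_importance(text):
    """Leave-one-out importance: how much deleting each word changes
    the prediction (a simplified proxy for LIME's word weights)."""
    words = text.split()
    base = classify(text)
    scores = []
    for i in range(len(words)):
        reduced = " ".join(words[:i] + words[i + 1:])
        scores.append((abs(base - classify(reduced)), i))
    return sorted(scores, reverse=True)  # most influential first

def attack(text):
    """Replace the single most influential word with a mask token."""
    words = text.split()
    scores = word_importance(text)
    if not scores or scores[0][0] == 0:
        return text  # no influential word found; leave input unchanged
    _, idx = scores[0]
    words[idx] = "[MASK]"
    return " ".join(words)

def attack_success_rate(texts):
    """Fraction of inputs whose predicted label flips after the attack."""
    flipped = sum(classify(t) != classify(attack(t)) for t in texts)
    return flipped / len(texts)

samples = ["you are awful", "what a stupid idea", "have a nice day"]
print(attack_success_rate(samples))  # 2 of 3 labels flip -> ~0.667
```

In the study itself the importance scores come from LIME explanations over real classifiers, and perturbations are crafted substitutions rather than a mask token, but the robustness metric (accuracy before/after attack, success rate of label flips) follows the same shape.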
Adversarial attacks
Explainable artificial intelligence
Information Disorder
Robustness
Files in this record:

NEUCOM-D-24-01718_R1-92-173.pdf

Embargoed until 30/08/2026

Type: Post-print document
License: Creative Commons
Size: 3.2 MB
Format: Adobe PDF
1-s2.0-S0925231224007227-main.pdf

Not available

Description: Robustness of Models Addressing Information Disorder: A Comprehensive Review and Benchmarking Study
Type: Publisher's version (PDF)
License: Publisher's copyright
Size: 6.66 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/39680
Citations
  • Scopus 9