
Contrastive siamese network for detecting AI-generated text across domains and models / Di Gisi, Maria; Fenza, Giuseppe; Gallo, Mariacristina; Loia, Vincenzo. - In: NEUROCOMPUTING. - ISSN 0925-2312. - 661:(2026). [10.1016/j.neucom.2025.131983]

Contrastive siamese network for detecting AI-generated text across domains and models

Di Gisi, Maria; Fenza, Giuseppe; Gallo, Mariacristina; Loia, Vincenzo
2026

Abstract

The rapid proliferation of large language models (LLMs) has raised growing concerns about distinguishing between human-written and AI-generated text. This work addresses the task of detecting AI-generated content by evaluating the latent similarity between a given input text and an alternative response generated for the same prompt, either known or inferred. Accordingly, CLAID (Contrastive Learning for AI Detection) is proposed as a Siamese Neural Network architecture utilizing BERT encoders and contrastive loss to capture semantic similarity between text pairs. Unlike prior approaches that rely on explicit classification or domain-specific features, our method focuses on modeling pairwise similarity, enabling a flexible and model-agnostic detection framework. To evaluate the generalization capabilities of the system, a comprehensive multi-domain and multi-model benchmark comprising three diverse datasets (i.e., HC3, DAIGT, and OUTFOX), encompassing a wide range of text genres, prompt structures, and generative models, has been constructed. Experimental results show that the proposed model achieves near-perfect classification accuracy across both single-domain and mixed-domain scenarios, demonstrating strong robustness to domain shifts, prompt variability, and authorship ambiguity. The model also exhibits strong data efficiency, attaining high performance with minimal supervision.
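The architecture described in the abstract — a Siamese network in which both texts pass through one shared encoder and a contrastive loss pulls similar pairs together — can be sketched as follows. This is a minimal illustration, not the authors' implementation: a tiny bag-of-embeddings encoder stands in for BERT so the sketch runs without pretrained weights, and all names and dimensions are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseEncoder(nn.Module):
    """Siamese network: both inputs pass through ONE shared encoder.
    A toy mean-pooled embedding encoder stands in for BERT here."""
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, dim)

    def encode(self, token_ids):
        # Mean-pool token embeddings, then project and L2-normalize
        # (a BERT encoder would yield a [CLS] vector instead).
        pooled = self.embed(token_ids).mean(dim=1)
        return F.normalize(self.proj(pooled), dim=-1)

    def forward(self, ids_a, ids_b):
        # Weight sharing: the same encode() is applied to both sides.
        return self.encode(ids_a), self.encode(ids_b)

def contrastive_loss(emb_a, emb_b, label, margin=1.0):
    """Classic contrastive loss: label = 1 for similar pairs,
    0 for dissimilar. Similar pairs are pulled together; dissimilar
    pairs are pushed apart until their distance exceeds the margin."""
    dist = F.pairwise_distance(emb_a, emb_b)
    pos = label * dist.pow(2)
    neg = (1 - label) * F.relu(margin - dist).pow(2)
    return (pos + neg).mean()

# Toy check: feeding the same token sequence to both branches gives
# (near-)zero distance, so the similar-pair loss is (near-)zero.
model = SiameseEncoder()
ids = torch.randint(0, 1000, (2, 8))   # batch of 2 token sequences
emb_a, emb_b = model(ids, ids)         # identical inputs on both sides
loss_similar = contrastive_loss(emb_a, emb_b, torch.ones(2))
```

In a BERT-based variant, `SiameseEncoder.encode` would wrap a pretrained encoder (shared between branches) and the same loss would be computed over its sentence embeddings.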
AI-generated text detection
Contrastive learning
Prompt inversion
Siamese neural networks
Files in this record:

claid.pdf
  Access: open access
  Description: Contrastive siamese network for detecting AI-generated text across domains and models
  Type: Published version (PDF)
  License: Creative Commons
  Size: 3.31 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this record: https://hdl.handle.net/20.500.11771/39682
Citations
  • Scopus: 1