Bridging continual learning of motion and self-supervised representations
Alessandro Betti
2024
Abstract
Efficiently learning unsupervised pixel-wise visual representations is crucial for training agents that can perceive their environment without relying on heavy human supervision or abundant annotated data. Motivated by recent work that promotes motion as a key source of information in representation learning, we propose a novel instance of contrastive criteria over time and space. In our architecture, the pixel-wise motion field and the representations are extracted by neural models trained from scratch in an integrated fashion. Learning proceeds online over time, also exploiting a momentum-based moving average to update the feature extractor, without replaying any large buffers of past data. Experiments on real-world videos and on a recently introduced benchmark, with photorealistic streams generated from a 3D environment, confirm that the proposed model can learn to estimate motion and jointly develop representations. Our model effectively encodes the variable appearance of the visual information in space and time, significantly outperforming a recent approach, and it also compares favourably with convolutional and Transformer-based networks pre-trained offline on large collections of supervised and unsupervised images.
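The abstract outlines a pixel-wise contrastive scheme in which motion links pixels across consecutive frames and the feature extractor is updated online through a momentum-based moving average rather than a replay buffer. Below is a minimal PyTorch sketch of that general idea, not the authors' code: the encoder architecture, the `warp` helper, the momentum value and the temperature are all illustrative assumptions.

```python
# Minimal sketch (PyTorch), assuming a toy fully-convolutional encoder and a
# precomputed motion field; all names and hyper-parameters are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelEncoder(nn.Module):
    """Toy fully-convolutional encoder producing per-pixel feature vectors."""
    def __init__(self, feat_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, feat_dim, 3, padding=1),
        )
    def forward(self, x):
        return F.normalize(self.net(x), dim=1)  # unit-norm pixel embeddings

online = PixelEncoder()
target = PixelEncoder()
target.load_state_dict(online.state_dict())
for p in target.parameters():
    p.requires_grad_(False)

def warp(feat, flow):
    """Backward-warp features with a motion field given in pixels (B, 2, H, W)."""
    B, _, H, W = feat.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    grid = torch.stack((xs, ys), dim=0).float().unsqueeze(0) + flow
    grid[:, 0] = 2 * grid[:, 0] / (W - 1) - 1  # normalize x to [-1, 1]
    grid[:, 1] = 2 * grid[:, 1] / (H - 1) - 1  # normalize y to [-1, 1]
    return F.grid_sample(feat, grid.permute(0, 2, 3, 1), align_corners=True)

def temporal_contrastive_loss(frame_t, frame_t1, flow, tau=0.1):
    """Pixels linked by motion are positives; all other pixels act as negatives."""
    q = online(frame_t)                      # (B, D, H, W)
    with torch.no_grad():
        k = warp(target(frame_t1), flow)     # motion-aligned target features
    B, D, H, W = q.shape
    q = q.flatten(2).transpose(1, 2).reshape(-1, D)   # (B*H*W, D)
    k = k.flatten(2).transpose(1, 2).reshape(-1, D)
    logits = q @ k.t() / tau                 # similarity between every pixel pair
    labels = torch.arange(q.shape[0])        # diagonal = motion-matched positives
    return F.cross_entropy(logits, labels)

@torch.no_grad()
def momentum_update(online, target, m=0.99):
    """EMA update of the target encoder, used here in place of a replay buffer."""
    for po, pt in zip(online.parameters(), target.parameters()):
        pt.mul_(m).add_(po, alpha=1 - m)
```

In an online loop, each incoming frame pair would contribute one gradient step on `temporal_contrastive_loss`, followed by a call to `momentum_update`, so past data never needs to be stored or replayed.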
| File | Description | Type | License | Size | Format |
|---|---|---|---|---|---|
| FAIA-392-FAIA240812.pdf (open access) | Bridging Continual Learning of Motion and Self-Supervised Representations | Publisher's version (PDF) | Creative Commons | 941.51 kB | Adobe PDF |