We investigate linear regression problems in which one is additionally able to control the conditional variance of the output given the input, by varying the computational time dedicated to supervising each example. For a given upper bound on the total computational time for supervision, we optimize the trade-off between the number of examples and their precision (the reciprocal of the conditional variance of the output), by formulating and solving suitable optimization problems based on large-sample approximations of the outputs of the classical ordinary least squares and weighted least squares regression algorithms. Considering a specific functional form for that precision, we prove that there are cases in which “many but bad” examples provide a smaller generalization error than “few but good” ones, but also that the converse can occur, depending on the “returns to scale” of the precision with respect to the computational time assigned to supervising each example. Hence, the results of this study highlight that increasing the size of the dataset is not always beneficial, when one has the possibility of collecting a smaller number of more reliable examples. We conclude by presenting numerical results that validate the theory, and by discussing extensions of the proposed framework to other optimization problems.
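The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's code: it assumes a hypothetical power-law precision λ(t) = t^α (one possible instance of the “specific functional form” the abstract mentions), splits a fixed supervision budget T evenly over n examples so that each output has noise variance 1/λ(T/n), and Monte Carlo estimates the squared error of a one-dimensional least-squares slope.

```python
import numpy as np

def mc_mse(n, T, alpha, trials=2000):
    """Monte Carlo estimate of the squared error of the OLS slope when a
    total supervision budget T is split evenly over n examples, so each
    output has noise variance 1 / t**alpha with t = T / n (power-law
    precision is an assumption made for this illustration)."""
    rng = np.random.default_rng(0)
    t = T / n
    sigma = (t ** alpha) ** -0.5      # noise std dev = precision**(-1/2)
    true_slope = 2.0
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0, n)
        y = true_slope * x + sigma * rng.normal(size=n)
        slope = np.sum(x * y) / np.sum(x * x)   # OLS through the origin
        errs.append((slope - true_slope) ** 2)
    return float(np.mean(errs))

T = 1000.0
# Decreasing returns to scale (alpha < 1): "many but bad" examples win.
print(mc_mse(10, T, alpha=0.5), mc_mse(1000, T, alpha=0.5))
# Increasing returns to scale (alpha > 1): "few but good" examples win.
print(mc_mse(10, T, alpha=2.0), mc_mse(1000, T, alpha=2.0))
```

Under this assumed form, the large-sample error of the slope scales like n^(α-1) · T^(-α), so which regime wins is governed entirely by whether α is below or above 1, matching the “returns to scale” dichotomy stated in the abstract.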
On the trade-off between number of examples and precision of supervision in machine learning problems / Gnecco, Giorgio Stefano; Nutarelli, Federico. In: Optimization Letters, ISSN 1862-4480, 15 (2021), pp. 1711–1733. DOI: 10.1007/s11590-019-01486-x
On the trade-off between number of examples and precision of supervision in machine learning problems
Gnecco Giorgio; Nutarelli Federico
2021
| File | Size | Format | Availability |
|---|---|---|---|
| Gnecco-Nutarelli2019_Article_OnTheTrade-offBetweenNumberOfE.pdf (publisher's version, PDF; license: publisher's copyright) | 561.39 kB | Adobe PDF | Not available; a copy can be requested |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

