On the trade-off between number of examples and precision of supervision in regression

We investigate regression problems in which one has the additional possibility of controlling the conditional variance of the output given the input, by varying the computational time dedicated to supervising each example. For a given upper bound on the total computational time, we optimize the trade-off between the number of examples and their precision, by formulating and solving a suitable optimization problem, based on a large-sample approximation of the output of the ordinary least squares algorithm. Considering a specific functional form for that precision, we prove that there are cases in which “many but bad” examples provide a smaller generalization error than “few but good” ones, but also that the converse can occur, depending on the “returns to scale” of the precision with respect to the computational time assigned to supervising each example. Hence, the results of this study highlight that increasing the size of the dataset is not always beneficial when one can instead collect a smaller number of more reliable examples.
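The trade-off described in the abstract can be illustrated with a minimal numerical sketch. This record does not state the functional form of the precision, so the code below assumes a power law for the noise variance, `sigma^2 = k * delta_t**(-alpha)`, together with the standard large-sample scaling of the ordinary least squares excess error, `p * sigma^2 / N`, with `N = T / delta_t` examples. The function names, the parameters `k`, `alpha`, `p`, and the bounds on `delta_t` are all hypothetical choices for illustration, not the authors' formulation.

```python
import numpy as np


def large_sample_excess_error(delta_t, total_time, p=5, k=1.0, alpha=0.5):
    """Approximate OLS excess generalization error, p * sigma^2 / N.

    Assumptions (not taken from the record):
      - noise variance follows a power law, sigma^2 = k * delta_t**(-alpha)
      - the whole time budget is spent, so N = total_time / delta_t
        (treated as a continuous quantity in the large-sample regime)
    """
    n_examples = total_time / delta_t
    noise_variance = k * delta_t ** (-alpha)
    return p * noise_variance / n_examples


def best_supervision_time(total_time, alpha, dt_min=1.0, dt_max=10.0, num=50):
    """Grid-search the per-example supervision time that minimizes the error."""
    grid = np.linspace(dt_min, dt_max, num)
    errors = large_sample_excess_error(grid, total_time, alpha=alpha)
    return grid[np.argmin(errors)]


if __name__ == "__main__":
    # alpha < 1: decreasing returns to scale of the precision
    print(best_supervision_time(1000.0, alpha=0.5))
    # alpha > 1: increasing returns to scale of the precision
    print(best_supervision_time(1000.0, alpha=2.0))
```

Under these assumptions the error is proportional to `delta_t**(1 - alpha)`, so the minimizer sits at the smallest admissible supervision time when `alpha < 1` ("many but bad" examples) and at the largest one when `alpha > 1` ("few but good" examples), which mirrors the dependence on returns to scale stated in the abstract.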

On the trade-off between number of examples and precision of supervision in regression / Gnecco, G.S., Nutarelli, F.. - 1:(2019). [10.1007/978-3-030-16841-4_1]

Gnecco, Giorgio; Nutarelli, Federico

2019

Year	2019
ISBN	978-3-030-16840-7
OpenAlex ID	W2940451302
Keywords	Ordinary least squares, Large sample approximation, Variance control
Appears in types:	4.1 Contribution in conference proceedings

Files in this item:

File	Size	Format
GneccoINNSBDDL2019.pdf (not available). Description: On the Trade-Off Between Number of Examples and Precision of Supervision in Regression. Type: Publisher's version (PDF). License: Publisher's copyright	1.68 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/12060

Warning: the data displayed have not been validated by the university.
