We investigate linear regression problems in which one is additionally able to control the conditional variance of the output given the input, by varying the computational time dedicated to supervising each example. For a given upper bound on the total computational time for supervision, we optimize the trade-off between the number of examples and their precision (the reciprocal of the conditional variance of the output), by formulating and solving suitable optimization problems based on large-sample approximations of the outputs of the classical ordinary least squares and weighted least squares regression algorithms. Considering a specific functional form for that precision, we prove that there are cases in which “many but bad” examples provide a smaller generalization error than “few but good” ones, but also that the converse can occur, depending on the “returns to scale” of the precision with respect to the computational time assigned to supervising each example. Hence, the results of this study highlight that increasing the size of the dataset is not always beneficial, when one has the possibility of collecting a smaller number of more reliable examples. We conclude by presenting numerical results that validate the theory, and by discussing extensions of the proposed framework to other optimization problems.
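The trade-off described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's code: it assumes a hypothetical power-law precision λ(t) = t^α (one possible instance of the “specific functional form” the abstract mentions), splits a fixed supervision budget T evenly over n examples so that each output has noise variance 1/λ(T/n), and Monte Carlo estimates the squared error of a one-dimensional least-squares slope.

```python
import numpy as np

def mc_mse(n, T, alpha, trials=2000):
    """Monte Carlo estimate of the squared error of the OLS slope when a
    total supervision budget T is split evenly over n examples, so each
    output has noise variance 1 / t**alpha with t = T / n (power-law
    precision is an assumption made for this illustration)."""
    rng = np.random.default_rng(0)
    t = T / n
    sigma = (t ** alpha) ** -0.5      # noise std dev = precision**(-1/2)
    true_slope = 2.0
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0, n)
        y = true_slope * x + sigma * rng.normal(size=n)
        slope = np.sum(x * y) / np.sum(x * x)   # OLS through the origin
        errs.append((slope - true_slope) ** 2)
    return float(np.mean(errs))

T = 1000.0
# Decreasing returns to scale (alpha < 1): "many but bad" examples win.
print(mc_mse(10, T, alpha=0.5), mc_mse(1000, T, alpha=0.5))
# Increasing returns to scale (alpha > 1): "few but good" examples win.
print(mc_mse(10, T, alpha=2.0), mc_mse(1000, T, alpha=2.0))
```

Under this assumed form, the large-sample error of the slope scales like n^(α-1) · T^(-α), so which regime wins is governed entirely by whether α is below or above 1, matching the “returns to scale” dichotomy stated in the abstract.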
On the trade-off between number of examples and precision of supervision in machine learning problems / Gnecco, Giorgio Stefano; Nutarelli, Federico. In: Optimization Letters, ISSN 1862-4480, 15 (2021), pp. 1711–1733. DOI: 10.1007/s11590-019-01486-x
On the trade-off between number of examples and precision of supervision in machine learning problems
Gnecco Giorgio; Nutarelli Federico
2021
| File | Size | Format | Availability |
|---|---|---|---|
| Gnecco-Nutarelli2019_Article_OnTheTrade-offBetweenNumberOfE.pdf (publisher's version, PDF; license: publisher's copyright) | 561.39 kB | Adobe PDF | Not available; a copy can be requested |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

