Variance-Based Exploration for Learning Model Predictive Control

The combination of model predictive control (MPC) and learning methods has been gaining attention as a tool for controlling systems that may be difficult to model. Using MPC as a function approximator in reinforcement learning (RL) is one approach to reducing the reliance on accurate models. RL depends on exploration to learn, and simple heuristics based on random perturbations are currently the most common. This paper considers variance-based exploration in RL geared towards using MPC as a function approximator. We propose to use a non-probabilistic measure of the uncertainty of the value function approximator in value-based RL methods. Uncertainty is measured by a variance estimate based on inverse distance weighting (IDW). The IDW framework is computationally cheap to evaluate and therefore well suited to an online setting, using already sampled state transitions and rewards. The gradient of the variance estimate is then used to perturb the policy parameters in a direction in which the variance of the value function estimate increases. The proposed method is verified on two simulation examples, covering both linear and nonlinear system dynamics, and compared with standard exploration methods that use random perturbations. © 2013 IEEE.
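The idea of an IDW variance estimate and a gradient-directed perturbation can be illustrated with a minimal sketch. It assumes the commonly used weighting w_i(x) = 1/||x - x_i||^2 and a finite-difference gradient; the abstract does not give the paper's exact weight function, gradient computation, or step rule, so the function names (`idw_variance`, `variance_gradient`, `explore_step`) and the `step` parameter below are hypothetical and chosen for illustration only.

```python
import numpy as np

def idw_variance(x, X, f, eps=1e-12):
    """IDW variance estimate at x from samples (X[i], f[i]).

    Assumes weights w_i = 1 / ||x - X[i]||^2 (an illustrative choice).
    """
    d2 = np.sum((X - x) ** 2, axis=1)   # squared distances to the samples
    if np.any(d2 < eps):                # at a sample the interpolant is exact
        return 0.0
    w = 1.0 / d2                        # inverse-distance weights
    v = w / w.sum()                     # normalised weights, summing to 1
    f_hat = v @ f                       # IDW interpolant at x
    return float(v @ (f - f_hat) ** 2)  # weighted spread around the interpolant

def variance_gradient(x, X, f, h=1e-5):
    """Central finite-difference gradient of the IDW variance (sketch only)."""
    g = np.zeros_like(x, dtype=float)
    for k in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[k] = h
        g[k] = (idw_variance(x + e, X, f) - idw_variance(x - e, X, f)) / (2 * h)
    return g

def explore_step(x, X, f, step=0.1):
    """Move x a fixed distance along the direction of increasing variance."""
    g = variance_gradient(x, X, f)
    n = np.linalg.norm(g)
    return x if n == 0.0 else x + step * g / n

# Toy data: three sampled points with associated value estimates.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
f = np.array([0.0, 1.0, 2.0])
x = np.array([0.2, 0.1])
x_new = explore_step(x, X, f, step=1e-3)
```

In this toy setting, `x` plays the role of the parameter vector being perturbed and `X`, `f` stand in for previously sampled points and their value estimates. The variance is zero at every sampled point and grows away from the data, so stepping along its gradient steers exploration towards regions where the estimate is less well supported by existing samples.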

K. Seel; A. Bemporad; S. Gros; J. T. Gravdahl

2023

Year: 2023
Journal: IEEE ACCESS
Keywords: Model-predictive control; Q-learning; Inverse distance weighting; Reinforcement learning
Appears in types: 1.1 Journal article

Files in this item:

File	Size	Format
Variance-Based_Exploration_for_Learning_Model_Predictive_Control.pdf (open access; type: publisher's version (PDF); license: Creative Commons)	1.85 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/27858
