
Reinforcement learning for mixed-integer problems based on MPC

Zanon M.
2020-01-01

Abstract

Model Predictive Control has recently been proposed as a policy approximation for Reinforcement Learning, offering a path towards safe and explainable Reinforcement Learning. This approach has been investigated for Q-learning and actor-critic methods, both in the context of nominal Economic MPC and Robust (N)MPC, showing very promising results. In that context, actor-critic methods appear to be the most reliable approach. Many applications involve a mixture of continuous and integer inputs, for which the classical actor-critic methods need to be adapted. In this paper, we present a policy approximation based on mixed-integer MPC schemes and propose a computationally inexpensive technique to generate exploration in the mixed-integer input space that ensures satisfaction of the constraints. We then propose a simple compatible advantage-function approximation for the proposed policy, which allows one to build the gradient of the mixed-integer MPC-based policy.
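
As background, the compatible advantage-function approximation used in standard deterministic policy gradient methods (Silver et al., 2014) takes the form sketched below; the paper adapts this machinery to a mixed-integer MPC-based policy, and its exact formulation (in particular, the treatment of the integer inputs) may differ from this continuous-input sketch.

    A_w(s, a) = (a - \pi_\theta(s))^\top \nabla_\theta \pi_\theta(s)^\top w,
    \nabla_\theta J(\pi_\theta) = \mathbb{E}_{s \sim \rho^\pi} \left[ \nabla_\theta \pi_\theta(s) \, \nabla_a A_w(s, a) \big|_{a = \pi_\theta(s)} \right],

where \pi_\theta is the parametric (here, MPC-based) policy, \theta its parameters, and w the weights of the advantage-function approximation.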
Keywords

Actor-critic methods
Deterministic policy gradient
Mixed-Integer Model Predictive Control
Reinforcement learning
Stochastic
Files in this item:

1-s2.0-S2405896320315913-main.pdf
  Access: not available (copy available on request)
  Type: Publisher's version (PDF)
  License: No license
  Size: 457.34 kB
  Format: Adobe PDF

MixedIntegerPolicies.pdf
  Access: open access
  Type: Post-print document
  License: Creative Commons
  Size: 456.94 kB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/18947
Citations
  • PMC: not available
  • Scopus: 10