Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method

IRIS

In this paper, we present a methodology to implement the stochastic policy gradient method using actor-critic techniques when the policy is approximated by an MPC scheme. The paper proposes a computationally inexpensive approach to building a stochastic policy whose samples are guaranteed to be feasible for the MPC constraints. For a continuous input space, imposing hard constraints on the policy creates technical difficulties in computing the score function of the policy, which is required for the policy gradient. We propose an approach that resolves this issue, and detail how the score function can be computed using parametric Nonlinear Programming and primal-dual interior-point methods. The approach is illustrated on a simple example.
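
As a rough illustration of the score function mentioned in the abstract, the following Python sketch estimates a policy gradient with the standard score-function (likelihood-ratio) estimator for a scalar Gaussian policy. It is not the method of the paper: the linear mean `theta * state` is a hypothetical stand-in for the MPC solution, the `advantage` callable is an assumed placeholder for a critic, and the hard input constraints that the paper addresses are ignored entirely.

```python
import random


def gaussian_score(action, mean, sigma, dmean_dtheta):
    """Score d/dtheta log pi_theta(action | state) of a Gaussian policy
    whose mean depends on theta and whose standard deviation is fixed."""
    return (action - mean) / sigma**2 * dmean_dtheta


def policy_gradient_estimate(theta, state, sigma, advantage,
                             n_samples=20000, seed=0):
    """Monte Carlo estimate of E[score * advantage] at a single state."""
    rng = random.Random(seed)
    # Assumption: a linear map replaces the MPC policy as the mean.
    mean = theta * state
    dmean_dtheta = state
    total = 0.0
    for _ in range(n_samples):
        # Unconstrained exploration: samples may violate input bounds.
        action = rng.gauss(mean, sigma)
        score = gaussian_score(action, mean, sigma, dmean_dtheta)
        total += score * advantage(state, action)
    return total / n_samples


if __name__ == "__main__":
    # Hypothetical advantage: penalize distance of the action from 3.0.
    estimate = policy_gradient_estimate(
        theta=0.5, state=2.0, sigma=0.5,
        advantage=lambda s, a: -(a - 3.0) ** 2,
    )
    print(estimate)
```

For this toy advantage the exact gradient is -2 * (mean - 3.0) * state, so the estimate can be checked against a closed form. Clipping or projecting the Gaussian samples onto a constraint set would invalidate this simple expression for the score, which is the kind of difficulty the abstract refers to.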

Gros S.; Zanon M.

2021

Year: 2021
ISBN: 978-1-6654-4197-1
Appears in publication types: 4.1 Contribution in conference proceedings

Files in this record:

File	Access	Type	License	Size	Format
Reinforcement_Learning_based_on_MPC_and_the_Stochastic_Policy_Gradient_Method.pdf	Not available	Publisher's version (PDF)	No license	439.17 kB	Adobe PDF
ExploAndPolicyGradient_Stochastic_2.pdf	Open access	Post-print	Creative Commons	377.93 kB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/18953
