Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method

Zanon M.
2021-01-01

Abstract

In this paper, we present a methodology for implementing the stochastic policy gradient method using actor-critic techniques when the policy is approximated by an MPC scheme. We propose a computationally inexpensive approach to building a stochastic policy whose samples are guaranteed to be feasible for the MPC constraints. For a continuous input space, imposing hard constraints on the policy creates technical difficulties in computing the score function of the policy, which is required in the policy gradient computation. We propose an approach that resolves this issue and detail how the score function can be computed using parametric Nonlinear Programming and primal-dual interior-point methods. The approach is illustrated on a simple example.
2021
978-1-6654-4197-1
Files in this item:
Reinforcement_Learning_based_on_MPC_and_the_Stochastic_Policy_Gradient_Method.pdf

not available

Type: Publisher's version (PDF)
License: No license
Size: 439.17 kB
Format: Adobe PDF
ExploAndPolicyGradient_Stochastic_2.pdf

open access

Type: Post-print document
License: Creative Commons
Size: 377.93 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/18953
Citations
  • Scopus 16