Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method
Zanon M.
2021-01-01
Abstract
In this paper, we present a methodology to implement the stochastic policy gradient method using actor-critic techniques, when the policy is approximated by an MPC scheme. The paper proposes a computationally inexpensive approach to build a stochastic policy whose samples are guaranteed to be feasible for the MPC constraints. For a continuous input space, imposing hard constraints on the policy poses technical difficulties in the computation of the score function of the policy, which is required in the policy gradient computation. We propose an approach that solves this issue, and detail how the score function can be computed based on parametric Nonlinear Programming and primal-dual interior-point methods. The approach is illustrated on a simple example.
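To make the abstract concrete, the following is a minimal sketch of the score-function (REINFORCE-style) policy gradient estimator the paper builds on. The MPC-based policy is replaced here by a simple linear-feedback stand-in, and the toy dynamics, cost weights, and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def policy_mean(theta, s):
    # Stand-in for the parametric MPC solution map u*(s; theta):
    # here simply a linear feedback law (an assumption for illustration).
    return theta * s

def sample_action(theta, s, sigma=0.1):
    # Stochastic policy: Gaussian exploration around the deterministic policy.
    return policy_mean(theta, s) + sigma * rng.normal()

def score(theta, s, a, sigma=0.1):
    # Score function d/dtheta log pi_theta(a|s) for the Gaussian policy above.
    return (a - policy_mean(theta, s)) * s / sigma**2

def rollout_grad(theta, s0=1.0, horizon=20, sigma=0.1):
    # One-sample policy-gradient estimate on a scalar LQR-like toy problem.
    s, ret, scores = s0, 0.0, []
    for _ in range(horizon):
        a = sample_action(theta, s, sigma)
        scores.append(score(theta, s, a, sigma))
        ret += -(s**2 + 0.1 * a**2)   # stage reward (negative quadratic cost)
        s = 0.9 * s + a               # toy linear dynamics
    return ret * sum(scores)          # REINFORCE estimator of the gradient

# Monte Carlo average of the gradient estimate at theta = -0.5
g = np.mean([rollout_grad(-0.5) for _ in range(2000)])
```

In the paper, the mean of the sampled action comes from an MPC scheme rather than a linear law, and computing the score function under the MPC's hard constraints is exactly the technical difficulty addressed via parametric NLP sensitivity and interior-point methods.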
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Reinforcement_Learning_based_on_MPC_and_the_Stochastic_Policy_Gradient_Method.pdf | Not available (copy on request) | Publisher's version (PDF) | No license | 439.17 kB | Adobe PDF
ExploAndPolicyGradient_Stochastic_2.pdf | Open access | Post-print | Creative Commons | 377.93 kB | Adobe PDF
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.