Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies

Zanon M.
2021-01-01

Abstract

In this paper, we discuss the implementation of the Deterministic Policy Gradient using the Actor-Critic technique based on linear compatible advantage function approximations in the context of constrained policies. We focus on MPC-based policies, though the discussion is general. We show that in that context, the classic linear compatible advantage function approximation fails to deliver a correct policy gradient due to the exploration becoming distorted by the constraints, and we propose a generalized linear compatible advantage function approximation that corrects the problem. We show that this correction requires an estimation of the mean and covariance of the constrained exploration. The validity of that generalization is formally established and demonstrated on a simple example.
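To make the distortion concrete, here is a minimal numerical sketch. It is not the paper's implementation. It assumes a scalar action, zero-mean Gaussian exploration around a nominal policy output, and a simple box constraint enforced by clipping as a stand-in for the constraint handling of an MPC scheme. The function name `constrained_exploration_stats` and all numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def constrained_exploration_stats(pi, sigma, lo, hi, n=100_000, seed=0):
    """Monte Carlo estimate of the mean and variance of the exploration
    deviation e = a - pi after the explored action a is forced into [lo, hi].

    Assumption: clipping stands in for whatever mechanism enforces the
    constraints in the actual policy (e.g. an MPC scheme).
    """
    rng = np.random.default_rng(seed)
    # Unconstrained exploration: zero-mean Gaussian around the policy output.
    a = pi + sigma * rng.standard_normal(n)
    # Constrained exploration: infeasible samples are pushed onto the bound.
    a_con = np.clip(a, lo, hi)
    e = a_con - pi
    return e.mean(), e.var()

# Policy output far from the bounds: exploration stays (almost) undistorted.
mean_free, var_free = constrained_exploration_stats(0.0, 0.1, -1.0, 1.0)

# Policy output close to the upper bound: exploration is truncated on one side.
mean_con, var_con = constrained_exploration_stats(0.95, 0.1, -1.0, 1.0)

print(f"interior:   mean={mean_free:+.4f}, var={var_free:.5f}")
print(f"near bound: mean={mean_con:+.4f}, var={var_con:.5f}")
```

Near the bound, the exploration deviation is no longer zero-mean and its variance shrinks. These are the two quantities (mean and covariance of the constrained exploration) that the abstract says must be estimated for the correction.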
Year: 2021
ISBN: 978-1-6654-4197-1
Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/18951