Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies
Zanon M.
2021-01-01
Abstract
In this paper, we discuss the implementation of the Deterministic Policy Gradient using the Actor-Critic technique based on linear compatible advantage function approximations in the context of constrained policies. We focus on MPC-based policies, though the discussion is general. We show that in that context, the classic linear compatible advantage function approximation fails to deliver a correct policy gradient due to the exploration becoming distorted by the constraints, and we propose a generalized linear compatible advantage function approximation that corrects the problem. We show that this correction requires an estimation of the mean and covariance of the constrained exploration. The validity of that generalization is formally established and demonstrated on a simple example.
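For reference, the classic linear compatible advantage function approximation mentioned in the abstract (in the spirit of the standard deterministic policy gradient framework) can be sketched as follows; the symbols $\pi_\theta$ for the parametric policy, $w$ for the compatible weights, and $A_w$ for the advantage model are introduced here for illustration and are not taken from the paper:

\[
A_w(s,a) = \big(a - \pi_\theta(s)\big)^\top \nabla_\theta \pi_\theta(s)^\top w,
\qquad
\nabla_\theta J(\theta) \approx \mathbb{E}\!\left[\nabla_\theta \pi_\theta(s)\, \nabla_\theta \pi_\theta(s)^\top\right] w,
\]

where $w$ is fitted by least squares on data gathered with exploratory inputs $a = \pi_\theta(s) + e$. This construction presumes that the exploration is applied as drawn; when the constraints of an MPC-based policy clip or project the exploratory input, the realized deviation $a - \pi_\theta(s)$ acquires a state-dependent mean $\bar e(s)$ and covariance $\Sigma(s)$, and the least-squares fit of $w$ no longer yields the true policy gradient. The generalization proposed in the paper corrects the approximation using estimates of $\bar e(s)$ and $\Sigma(s)$; its exact form is developed in the paper and is not reproduced in this sketch.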
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Bias_Correction_in_Reinforcement_Learning_via_the_Deterministic_Policy_Gradient_Method_for_MPC-Based_Policies.pdf | Not available (copy on request) | Published version (PDF) | No license | 557.57 kB | Adobe PDF
ExploAndPolicyGradient_Deterministic_v2.pdf | Open access | Post-print document | Creative Commons | 505.49 kB | Adobe PDF