Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies
Zanon M.
2021-01-01
Abstract
In this paper, we discuss the implementation of the Deterministic Policy Gradient using the Actor-Critic technique based on linear compatible advantage function approximations in the context of constrained policies. We focus on MPC-based policies, though the discussion is general. We show that in that context, the classic linear compatible advantage function approximation fails to deliver a correct policy gradient due to the exploration becoming distorted by the constraints, and we propose a generalized linear compatible advantage function approximation that corrects the problem. We show that this correction requires an estimation of the mean and covariance of the constrained exploration. The validity of that generalization is formally established and demonstrated on a simple example.
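For reference, the classic linear compatible advantage function approximation mentioned in the abstract (in the spirit of the standard deterministic policy gradient framework) can be sketched as follows; the symbols $\pi_\theta$ for the parametric policy, $w$ for the compatible weights, and $A_w$ for the advantage model are introduced here for illustration and are not taken from the paper:

\[
A_w(s,a) = \big(a - \pi_\theta(s)\big)^\top \nabla_\theta \pi_\theta(s)^\top w,
\qquad
\nabla_\theta J(\theta) \approx \mathbb{E}\!\left[\nabla_\theta \pi_\theta(s)\, \nabla_\theta \pi_\theta(s)^\top\right] w,
\]

where $w$ is fitted by least squares on data gathered with exploratory inputs $a = \pi_\theta(s) + e$. This construction presumes that the exploration is applied as drawn; when the constraints of an MPC-based policy clip or project the exploratory input, the realized deviation $a - \pi_\theta(s)$ acquires a state-dependent mean $\bar e(s)$ and covariance $\Sigma(s)$, and the least-squares fit of $w$ no longer yields the true policy gradient. The generalization proposed in the paper corrects the approximation using estimates of $\bar e(s)$ and $\Sigma(s)$; its exact form is developed in the paper and is not reproduced in this sketch.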
File | Access | Type | License | Size | Format
---|---|---|---|---|---
Bias_Correction_in_Reinforcement_Learning_via_the_Deterministic_Policy_Gradient_Method_for_MPC-Based_Policies.pdf | Not available (copy on request) | Published version (PDF) | No license | 557.57 kB | Adobe PDF
ExploAndPolicyGradient_Deterministic_v2.pdf | Open access | Post-print document | Creative Commons | 505.49 kB | Adobe PDF