This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), either discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even if based on an inexact model. This can be achieved by selecting a proper stage cost and terminal cost for the OCP. A very useful particular case of OCP is a Model Predictive Control (MPC) scheme where a deterministic (possibly nonlinear) model is used to reduce the computational complexity. This observation leads us to parameterize an MPC scheme fully, including the cost function. In practice, Reinforcement Learning algorithms can then be used to tune the parameterized MPC scheme. We verify the developed theorems analytically in an LQR case and we investigate some other nonlinear examples in simulations.

Equivalence of Optimality Criteria for Markov Decision Process and Model Predictive Control

Mario Zanon;
2024-01-01

Abstract

This paper shows that the optimal policy and value functions of a Markov Decision Process (MDP), either discounted or not, can be captured by a finite-horizon undiscounted Optimal Control Problem (OCP), even if based on an inexact model. This can be achieved by selecting a proper stage cost and terminal cost for the OCP. A very useful particular case of OCP is a Model Predictive Control (MPC) scheme where a deterministic (possibly nonlinear) model is used to reduce the computational complexity. This observation leads us to parameterize an MPC scheme fully, including the cost function. In practice, Reinforcement Learning algorithms can then be used to tune the parameterized MPC scheme. We verify the developed theorems analytically in an LQR case and we investigate some other nonlinear examples in simulations.
2024
Markov Decision Process, Model Predictive Control, Reinforcement Learning, Optimality
File in questo prodotto:
File Dimensione Formato  
EMDP.pdf

accesso aperto

Tipologia: Documento in Post-print
Licenza: Copyright dell'editore
Dimensione 1.08 MB
Formato Adobe PDF
1.08 MB Adobe PDF Visualizza/Apri
Equivalence_of_Optimality_Criteria_for_Markov_Decision_Process_and_Model_Predictive_Control.pdf

non disponibili

Tipologia: Versione Editoriale (PDF)
Licenza: Copyright dell'editore
Dimensione 1.02 MB
Formato Adobe PDF
1.02 MB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.11771/23638
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 1
social impact