
Data-driven Economic NMPC using Reinforcement Learning

Mario Zanon
2020-01-01

Abstract

Reinforcement Learning (RL) is a powerful tool to perform data-driven optimal control without relying on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and benefit from a rich theory to assess their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems having stochastic dynamics. This entails that ENMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central for ENMPC stability. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools on both a classical linear MPC setting and a standard nonlinear example from the ENMPC literature.
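The abstract mentions deploying classic RL tools to tune the parameters of an (E)NMPC scheme. A minimal, hypothetical sketch of such a tuning loop — a Q-learning-style temporal-difference update on a parameter vector theta, with a simple quadratic form standing in for the MPC-parameterized action-value function (in the paper's actual scheme, evaluating this function requires solving an (E)NMPC problem; all names and signatures below are illustrative only) — might look like:

```python
# Hypothetical sketch: Q-learning-style tuning of an MPC parameter vector.
# A quadratic form stands in for the MPC-parameterized action-value function
# Q_theta(s, a); in the paper's scheme, evaluating Q_theta means solving an
# (E)NMPC problem. Function names and signatures are illustrative only.

def q_value(theta, s, a):
    # Stand-in for the MPC-based action-value function Q_theta(s, a)
    return theta[0] * s * s + theta[1] * s * a + theta[2] * a * a

def q_grad(theta, s, a):
    # Gradient of q_value with respect to theta
    return [s * s, s * a, a * a]

def td_update(theta, s, a, r, s_next, actions,
              alpha=0.01, gamma=0.95, terminal=False):
    """One semi-gradient TD step: theta <- theta + alpha * delta * grad."""
    if terminal:
        target = r
    else:
        target = r + gamma * min(q_value(theta, s_next, b) for b in actions)
    delta = target - q_value(theta, s, a)  # temporal-difference error
    new_theta = [t + alpha * delta * g
                 for t, g in zip(theta, q_grad(theta, s, a))]
    return new_theta, delta
```

Repeating such updates along closed-loop data drives the TD error down, adapting the MPC parameters from measured transitions rather than from a model — which is how, per the abstract, the scheme can approach the optimal policy of the real system even when the underlying model is wrong.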
2020
cs.SY
Files in this record:

RLForNMPC_V3.pdf (open access)
  Type: Pre-print
  License: Creative Commons
  Size: 977.18 kB
  Format: Adobe PDF
  View/Open

08701462.pdf (not available)
  Type: Published version (PDF)
  License: No license
  Size: 2.05 MB
  Format: Adobe PDF
  View/Open · Request a copy

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/12559
Citations
  • Scopus: 143