Data-driven Economic NMPC using Reinforcement Learning
Mario Zanon
2020-01-01
Abstract
Reinforcement Learning (RL) is a powerful tool to perform data-driven optimal control without relying on a model of the system. However, RL struggles to provide hard guarantees on the behavior of the resulting control scheme. In contrast, Nonlinear Model Predictive Control (NMPC) and Economic NMPC (ENMPC) are standard tools for the closed-loop optimal control of complex systems with constraints and limitations, and benefit from a rich theory to assess their closed-loop behavior. Unfortunately, the performance of (E)NMPC hinges on the quality of the model underlying the control scheme. In this paper, we show that an (E)NMPC scheme can be tuned to deliver the optimal policy of the real system even when using a wrong model. This result also holds for real systems having stochastic dynamics. This entails that ENMPC can be used as a new type of function approximator within RL. Furthermore, we investigate our results in the context of ENMPC and formally connect them to the concept of dissipativity, which is central for ENMPC stability. Finally, we detail how these results can be used to deploy classic RL tools for tuning (E)NMPC schemes. We apply these tools on both a classical linear MPC setting and a standard nonlinear example from the ENMPC literature.
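The idea sketched in the abstract — tuning a model-based controller with RL so that it performs well on the real system despite a wrong model — can be illustrated with a deliberately minimal toy. The sketch below applies semi-gradient Q-learning to a one-step-lookahead quadratic controller whose internal model is wrong; the dynamics, costs, single tunable parameter, and all numerical values are invented for illustration and are not the parameterization or example used in the paper.

```python
import numpy as np

# Illustrative sketch only: a 1-D toy system, not the example from the paper.
# The controller uses a deliberately wrong model, and a single parameter of
# its cost-to-go is tuned by semi-gradient Q-learning on real-system data.
a_real, b_real = 0.9, 0.5        # true (unknown) dynamics: x+ = a x + b u + noise
a_model, b_model = 1.0, 0.4      # wrong model used inside the controller
gamma = 0.95                     # discount factor

def ell(x, u):
    """Stage cost (the performance criterion we actually care about)."""
    return x**2 + 0.1 * u**2

def q(x, u, th):
    """One-step-lookahead Q-function with tunable terminal weight th,
    a minimal stand-in for a parameterized (E)NMPC scheme."""
    return ell(x, u) + gamma * th * (a_model * x + b_model * u) ** 2

def greedy_u(x, th):
    """Closed-form minimizer of q(x, ., th), which is quadratic in u."""
    return -(gamma * th * a_model * b_model * x) / (0.1 + gamma * th * b_model**2)

rng = np.random.default_rng(0)
theta, alpha, x = 1.0, 1e-3, 1.0
for _ in range(20000):
    u = greedy_u(x, theta) + 0.1 * rng.standard_normal()          # exploration noise
    x_next = a_real * x + b_real * u + 0.01 * rng.standard_normal()
    # Temporal-difference error with a greedy next action (Q-learning)
    td = ell(x, u) + gamma * q(x_next, greedy_u(x_next, theta), theta) - q(x, u, theta)
    grad = gamma * (a_model * x + b_model * u) ** 2               # dq/dtheta
    theta = np.clip(theta + alpha * td * grad, 1e-3, 50.0)        # projected update
    x = x_next if abs(x_next) < 5.0 else float(rng.standard_normal())

print(f"tuned terminal weight: theta = {theta:.3f}")
```

The point of the sketch is that the data-driven update acts on the controller's parameter, not on the model: the model stays wrong, but the tunable weight lets RL shift the greedy policy toward closed-loop performance on the real system. In the paper this role is played by the full parameterized (E)NMPC scheme rather than a single scalar weight.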
File | Type | License | Size | Format | Access |
---|---|---|---|---|---|
RLForNMPC_V3.pdf | Pre-print | Creative Commons | 977.18 kB | Adobe PDF | Open access |
08701462.pdf | Publisher's version (PDF) | No license | 2.05 MB | Adobe PDF | Not available (request a copy) |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.