Learning nonlinear feedback controllers from data via optimal policy search and stochastic gradient descent

Ferrarotti L.; Bemporad A.
2020-01-01

Abstract

This paper proposes a technique for synthesizing smooth nonlinear controllers via optimal policy search and stochastic gradient descent. After choosing an appropriate parameterization of the control law, mini-batch stochastic gradient descent steps are used to iteratively optimize its parameters. The gradients of the expected future closed-loop performance required for the descent are approximated using simple local linear models, as introduced earlier by the authors for optimal policy search of linear feedback controllers. In this way, the method does not require a full nonlinear model of the process. The algorithm can be applied offline, on a previously collected dataset, or online, while controlling the plant with the most recently updated policy. We apply the method to a numerical example in which an output-tracking problem for a Continuously Stirred Tank Reactor (CSTR) is solved using a neural-network parameterization of the controller with differentiable activation functions. In the offline setting, the performance of the resulting neural controller is compared to that of a linear feedback controller trained on the same dataset. In the online setting, we show how the learning procedure can be designed, combining on-policy and off-policy learning, to increase safety and improve performance.
2020
978-1-7281-7447-1
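
Below is a minimal, hedged sketch of the kind of mini-batch policy-search step the abstract describes: a smooth neural control law whose parameters are updated by stochastic gradient descent, with the closed-loop cost and its gradient evaluated through simple local linear models rather than a full nonlinear model of the plant. The concrete choices here (a single-hidden-layer tanh controller, a quadratic tracking cost, a fixed rollout horizon, and a finite-difference gradient) are illustrative assumptions and are not taken from the paper.

# Minimal illustrative sketch of a mini-batch policy-search step.
# Assumptions (not from the paper): single-hidden-layer tanh controller,
# quadratic tracking cost, fixed rollout horizon, finite-difference gradients.
import numpy as np

rng = np.random.default_rng(0)

def init_policy(nx, nu, nh=8):
    # Controller u = W2 @ tanh(W1 @ [x; r] + b1) + b2, smooth in its parameters.
    return {"W1": 0.1 * rng.standard_normal((nh, nx + 1)),
            "b1": np.zeros(nh),
            "W2": 0.1 * rng.standard_normal((nu, nh)),
            "b2": np.zeros(nu)}

def policy(theta, x, r):
    z = np.concatenate([x, [r]])
    return theta["W2"] @ np.tanh(theta["W1"] @ z + theta["b1"]) + theta["b2"]

def rollout_cost(theta, x0, r, A, B, C, horizon=10, qy=1.0, qu=0.01):
    # Closed-loop tracking cost, propagated with a local linear surrogate
    # x+ = A x + B u of the (unknown) nonlinear plant dynamics.
    x, cost = x0.copy(), 0.0
    for _ in range(horizon):
        u = policy(theta, x, r)
        cost += qy * float(C @ x - r) ** 2 + qu * float(u @ u)
        x = A @ x + B @ u
    return cost

def sgd_step(theta, batch, lr=1e-3, eps=1e-5):
    # One mini-batch stochastic gradient descent step on the policy parameters;
    # the gradient is approximated by forward finite differences for brevity.
    grads = {k: np.zeros_like(v) for k, v in theta.items()}
    for x0, r, A, B, C in batch:
        base = rollout_cost(theta, x0, r, A, B, C)
        for k, v in theta.items():
            it = np.nditer(v, flags=["multi_index"])
            for _ in it:
                v[it.multi_index] += eps
                grads[k][it.multi_index] += (rollout_cost(theta, x0, r, A, B, C) - base) / eps
                v[it.multi_index] -= eps
    for k in theta:
        theta[k] -= lr * grads[k] / len(batch)
    return theta

# Example use on synthetic data: each sample carries a state, a reference and
# a local linear model (A, B, C) identified around that operating point.
nx, nu = 2, 1
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
C = np.array([1.0, 0.0])
theta = init_policy(nx, nu)
batch = [(rng.standard_normal(nx), 1.0, A, B, C) for _ in range(8)]
theta = sgd_step(theta, batch)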

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/20230