Learning nonlinear feedback controllers from data via optimal policy search and stochastic gradient descent

IRIS

This paper proposes a technique for synthesizing smooth nonlinear controllers by optimal policy search and stochastic gradient descent. After an appropriate parameterization of the control law is chosen, mini-batch stochastic gradient descent steps iteratively optimize its parameters. The gradients of the expected future closed-loop performance required for the descent are approximated by using simple local linear models, as introduced earlier by the authors for optimal policy search of linear feedback controllers. In this way, the method does not require a full nonlinear model of the process. The algorithm can be applied offline, on a previously collected dataset, or online, while controlling the plant itself with the most recently updated policy. We apply the method in a numerical example in which we solve an output-tracking problem for a Continuous Stirred Tank Reactor (CSTR), using a neural-network parameterization of the controller with a differentiable activation function. In the offline setting, the performance of the resulting neural controller is compared with that of a linear feedback controller trained on the same dataset. In the online setting, we show how the learning procedure can be designed, combining on-policy and off-policy learning, to increase safety and improve performance.
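As a rough illustration of the loop described in the abstract, the Python sketch below runs mini-batch stochastic gradient descent on the parameters of a smooth control law, with the cost gradient obtained from a local linear model fitted to nearby data rather than from the plant equations. Everything specific in it is an assumption made for illustration and is not taken from the paper: the scalar plant, the two-parameter tanh policy, the one-step cost `x_next^2 + RHO*u^2` (the paper optimizes expected future closed-loop performance), the nearest-neighbour fit in `fit_local_linear`, and all names and numeric values.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible
RHO = 0.1               # hypothetical weight on control effort


def plant(x, u):
    # Hypothetical scalar nonlinear plant. It is used only to generate data and
    # to evaluate the result, never inside the gradient computation.
    return 0.9 * x + 0.2 * math.sin(x) + 0.5 * u


def policy(theta, x):
    # Smooth control law (assumed form): a linear term plus one tanh unit.
    return theta[0] * x + theta[1] * math.tanh(x)


def fit_local_linear(data, x0, k=20):
    # Least-squares fit of x_next ~ a*x + b*u on the k samples whose state is
    # closest to x0, with a tiny ridge term to keep the 2x2 system invertible.
    near = sorted(data, key=lambda s: abs(s[0] - x0))[:k]
    sxx = sum(x * x for x, u, xn in near) + 1e-8
    suu = sum(u * u for x, u, xn in near) + 1e-8
    sxu = sum(x * u for x, u, xn in near)
    sxy = sum(x * xn for x, u, xn in near)
    suy = sum(u * xn for x, u, xn in near)
    det = sxx * suu - sxu * sxu
    return (sxy * suu - sxu * suy) / det, (sxx * suy - sxu * sxy) / det


def sgd_step(theta, data, batch_size=8, lr=0.1):
    # One mini-batch step on the one-step cost x_next^2 + RHO*u^2, where
    # x_next is predicted by the local linear model, not by the true plant.
    grad = [0.0, 0.0]
    for x, _, _ in rng.sample(data, batch_size):
        u = policy(theta, x)
        a, b = fit_local_linear(data, x)
        x_next = a * x + b * u
        dcost_du = 2.0 * b * x_next + 2.0 * RHO * u
        grad[0] += dcost_du * x / batch_size             # du/dtheta[0] = x
        grad[1] += dcost_du * math.tanh(x) / batch_size  # du/dtheta[1] = tanh(x)
    return [theta[0] - lr * grad[0], theta[1] - lr * grad[1]]


def closed_loop_cost(theta, x0=1.5, steps=30):
    # Evaluation only: simulate the true plant under the current policy.
    x, cost = x0, 0.0
    for _ in range(steps):
        u = policy(theta, x)
        cost += x * x + RHO * u * u
        x = plant(x, u)
    return cost


# Offline dataset of (x, u, x_next) transitions collected with random inputs.
data = []
for _ in range(300):
    x, u = rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)
    data.append((x, u, plant(x, u)))

theta = [0.0, 0.0]
cost_before = closed_loop_cost(theta)
for _ in range(200):
    theta = sgd_step(theta, data)
cost_after = closed_loop_cost(theta)
print(cost_before, cost_after)
```

This corresponds to the offline setting, since the dataset is collected once and reused. An online variant would, under the same assumptions, append fresh transitions generated by the current policy to `data` between calls to `sgd_step`.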

Ferrarotti L.; Bemporad A.

2020

Year: 2020
ISBN: 978-1-7281-7447-1
Appears in types: 4.1 Conference proceedings contribution

Files in this item:

File	Size	Format
Learning_nonlinear_feedback_controllers_from_data_via_optimal_policy_search_and_stochastic_gradient_descent.pdf (not available; type: publisher's version (PDF); license: no license)	1.4 MB	Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.11771/20230
