We propose a policy search method for synthesizing optimal feedback control laws for reference tracking directly from data. During the learning phase, the control law is optimized via stochastic gradient descent iterations and (optionally) applied to the plant while data are collected. Unlike model-based methods, in which a full model of the open-loop plant is first identified from data, here a simple linear model is recursively identified with a forgetting factor, solely to approximate the gradients required for the descent. We report examples showing that the method recovers the optimal feedback law when the underlying plant is linear, and outperforms the best control law obtained by first identifying an open-loop linear model when the underlying plant is nonlinear.
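The abstract combines two standard ingredients: recursive least squares (RLS) with a forgetting factor to track a local linear model, and stochastic gradient descent on the feedback gain using gradients computed from that model. The sketch below illustrates this combination on a hypothetical scalar plant; the plant parameters, proportional control structure, dither signal, and step sizes are all assumptions for the demo, not details from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): scalar plant
#   x[k+1] = a*x[k] + b*u[k] + noise,   (a, b) unknown to the controller.
# Feedback law u = K*(r - x); a small dither is added for identifiability.
# RLS with forgetting factor lam tracks (a_hat, b_hat); these estimates give
# an approximate gradient of the one-step tracking cost (x[k+1] - r)^2
# with respect to K, used for an SGD step.

rng = np.random.default_rng(0)
a_true, b_true = 0.8, 0.5      # assumed "unknown" plant parameters
r = 1.0                        # constant reference to track
lam = 0.99                     # forgetting factor
theta = np.zeros(2)            # RLS estimate [a_hat, b_hat]
P = 100.0 * np.eye(2)          # RLS covariance
K, eta = 0.0, 0.02             # feedback gain and SGD step size

x = 0.0
for k in range(1000):
    u = K * (r - x) + 0.1 * rng.standard_normal()   # control + dither
    x_next = a_true * x + b_true * u + 0.01 * rng.standard_normal()

    # --- recursive least squares with forgetting factor ---
    phi = np.array([x, u])
    g = P @ phi / (lam + phi @ P @ phi)
    theta = theta + g * (x_next - phi @ theta)
    P = (P - np.outer(g, phi @ P)) / lam

    # --- model-based gradient of the one-step cost, then SGD step on K ---
    # x[k+1] ~ a_hat*x + b_hat*K*(r - x), so dJ/dK = 2*(x_next - r)*b_hat*(r - x)
    a_hat, b_hat = theta
    grad = 2.0 * (x_next - r) * b_hat * (r - x)
    K -= eta * grad

    x = x_next

# theta should approach (a_true, b_true); K should grow to reduce tracking error
```

Note the role of the forgetting factor: it discounts old data so the identified linear model stays local to the current operating point, which is what makes the gradient approximation usable even when the true plant is nonlinear.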
|Title:||Synthesis of Feedback Controllers from Data via Optimal Policy Search and Stochastic Gradient Descent|
|Publication date:||2019|
|Appears in publication types:||4.1 Contribution in Conference Proceedings|