Reinforcement Learning based on MPC and the Stochastic Policy Gradient Method