Bias Correction in Reinforcement Learning via the Deterministic Policy Gradient Method for MPC-Based Policies