Safe reinforcement learning via projection on a safe set: How to achieve optimality?