This study investigates a reinforcement learning (RL) method for deriving control laws for a nonholonomic robot, taking into account the coupling and nonlinearity of the system. The controller is derived online from the interaction between the agent and an unknown environment through a Q-learning-based approach, which seeks the action that maximizes the reward accumulated over successive attempts to follow a trajectory. Experiments indicate that the learned controllers were able to follow diverse trajectories efficiently under different variations of the robot's translational and rotational speeds, and that the accumulated reward increased over iterations for two distinct learning-process configurations.
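As a rough illustration of the Q-learning idea mentioned above, the following minimal sketch shows the standard tabular update and an epsilon-greedy action choice. It is not the authors' implementation: the discrete integer states and actions, the dictionary-based table `q`, and the parameters `alpha`, `gamma`, and `epsilon` are all assumptions chosen for illustration, and the state, action, and reward definitions actually used for the robot are not given in this section.

```python
import random


def q_update(q, state, action, reward, next_state, n_actions,
             alpha=0.1, gamma=0.9):
    """Apply one standard tabular Q-learning step and return the new value.

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    Unseen (state, action) pairs are treated as having value 0.0.
    """
    best_next = max(q.get((next_state, a), 0.0) for a in range(n_actions))
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]


def choose_action(q, state, n_actions, epsilon=0.1, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q.get((state, a), 0.0))
```

In a trajectory-following setting, `state` might encode a discretized tracking error and `action` a discretized pair of translational and rotational speed commands; both encodings are hypothetical here.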


Authors: Mateus Sousa Franco, Sérgio R. Barros dos Santos, and Fabio Augusto Faria, Institute of Science and Technology, Federal University of Sao Paulo, Sao Jose dos Campos, SP, Brazil.