This paper presents a decentralized learning algorithm that coordinates an automated team of actuated parts to build several types of user-specified structures on a flat surface. The algorithm learns from environmental feedback and from the agents' behavior. The problem is formulated as a Markov decision process in which the agents (actuated parts) are modeled as small cube-shaped robots whose behavior is learned with Q-learning, a method based on the Bellman equation. The Q-learning algorithm incorporates communication and conflict-resolution models between the agents. These models lead to the emergence of intelligent global behavior in a non-stationary stochastic environment. The main contribution of this paper is a self-assembly approach that randomly generates the navigation routes of multiple agents while learning the structure's shape from the hazardous dispersion area that must be isolated in the environment. Simulation trials show that the multi-agent coordination process can be combined with an anti-collision strategy, and several case studies are analysed and discussed.
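The abstract names Q-learning and the Bellman equation without stating the update rule. The sketch below shows the standard tabular Q-learning update with epsilon-greedy action selection. It is a minimal illustration under several assumptions and is not the authors' implementation. The grid states, the action names in `ACTIONS`, the reward value, and the parameters `alpha`, `gamma`, and `epsilon` are all hypothetical. The sketch also assumes that each agent keeps its own Q-table. It omits the communication and conflict-resolution models described in the paper.

```python
import random
from collections import defaultdict

# Hypothetical action set for a cube-shaped agent moving on a planar grid.
ACTIONS = ["north", "south", "east", "west", "stay"]


def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step derived from the Bellman optimality equation:
    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    td_target = reward + gamma * best_next
    Q[(state, action)] += alpha * (td_target - Q[(state, action)])
    return Q[(state, action)]


def epsilon_greedy(Q, state, epsilon=0.1, rng=random):
    """Explore with probability epsilon, otherwise pick the greedy action
    (ties go to the first action in ACTIONS)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


if __name__ == "__main__":
    # Assumption: each agent owns a separate table (decentralized learning).
    Q = defaultdict(float)
    # Hypothetical transition: moving east from cell (0, 0) earns reward 1.
    value = q_update(Q, (0, 0), "east", 1.0, (1, 0))
    print(round(value, 3))  # 0.1
```

Keeping one table per agent is the simplest way to make learning decentralized. However, it treats the other agents as part of the environment. For that reason, each agent experiences the environment as non-stationary, which matches the setting the abstract describes.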

Authors: Marcos P. B. Magueta, Sérgio R. Barros dos Santos and Fabio A. M. Cappabianco from the Institute of Science and Technology of the Federal University of São Paulo, São José dos Campos, SP, Brazil.

External Author: Sidney N. Givigi from the School of Computing of Queen’s University, Kingston, Ontario, Canada.