This paper presents a decentralized learning algorithm for learning how to coordinate an automated team of actuated parts designed to build several types of structures specified by a user on a plane surface. The algorithm learns from the environment feedback and agent behavior. This problem is defined as a Markov decision process where agents (actuated parts) are modeled as small cube-shaped robots subject to the Bellman’s equation (Q-learning). The Q-learning algorithm considers the communication and conflict resolution models between the agents that lead to the emergence of intelligent global behavior (in a non-stationary stochastic environment). The main contribution of this paper is to propose a self-assembly approach capable of randomly generating the navigation routes of the multiple agents while learning the structure shape according to the hazardous dispersion area that must be isolated in the environment. Simulation trials show the feasibility of merging between the multi-agent coordination proces...