Digital video has become the medium of choice for a growing number of people communicating over the Internet and through mobile devices. This trend has increased the availability of video data, creating large digital video collections. As the volume of data in these collections grows, so does the interest in efficient systems to retrieve such information. One of the main challenges in developing effective content-based video retrieval systems is the automatic identification of semantic content. To address it, four barriers must be considered: (1) multimodal processing, (2) information fusion, (3) semantic learning, and (4) query resolution. Numerous techniques have been proposed to overcome these issues; however, most existing works rely on computationally expensive methods. The development of effective and efficient techniques is therefore an imperative need. In recent years, significant research efforts have been devoted by academia and industry to making such solutions available to a wide range of devices and platforms. This is the context in which this research proposal is situated. Its goal is to advance the state of the art in semantic retrieval of digital videos. Recently, we introduced in the literature a unimodal video retrieval system designed for mobile devices with low computational power. Building on the positive results of its application, we intend to extend the proposed system to take advantage of different data sources, i.e., to use multimodal information, thereby improving its effectiveness. To that end, we plan to exploit recent solutions in visual computing and machine intelligence, aiming to combine different data sources efficiently. Finally, we expect to contribute significantly to advances in this research field, since the results will be aggregated into a visual development interface, enabling the joint use of those solutions.