Multimodal systems have been designed to enable more natural interaction between humans and computers. The process by which such a system generates a multimodal presentation is called multimodal fission: suitable combinations of the available modalities and devices are selected to convey the output. The aim of this thesis is to create a reusable, extensible, and domain-independent framework for MultiModal Fission (MMF), with a special focus on collaborative human-robot interaction. As input, the framework receives a semantic predicate that provides the information to be presented in abstract form. From this it generates a plan containing the selected modalities and devices for each part of the output. During modality and device selection, the framework takes into account information about the respective user, previously generated output, and the current interaction context. It tackles these selections by solving two constraint optimization problems in which the different planning criteria are formulated as constraints to be optimized. The modalities available in the framework are classified into several categories according to their functionality; one category, for example, contains modalities that can generate object references. Referring multimodally to objects in the environment is an important ability for a robot solving tasks collaboratively with a human, so the framework also addresses the generation of suitable verbal and multimodal references. The usefulness of these references in particular, and of the generated multimodal output in general, has been verified in a user study.
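To make the selection step concrete, the following is a minimal sketch of how modality and device selection can be framed as a constraint optimization problem. All modality names, device names, compatibility rules, and cost values here are illustrative assumptions, not taken from the thesis; the thesis solves two such problems with richer, user- and context-dependent criteria.

```python
from itertools import product

# Illustrative assumptions only: hypothetical modalities, devices, and costs.
MODALITIES = ["speech", "text", "pointing_gesture"]
DEVICES = ["robot_speaker", "tablet_screen", "robot_arm"]

# Hard constraint (assumed): which device can realize which modality.
COMPATIBLE = {
    "speech": {"robot_speaker"},
    "text": {"tablet_screen"},
    "pointing_gesture": {"robot_arm"},
}

# Soft-constraint costs (assumed, lower is better): user preference per
# modality and context suitability per device.
USER_PREFERENCE = {"speech": 0.2, "text": 0.5, "pointing_gesture": 0.4}
CONTEXT_COST = {"robot_speaker": 0.1, "tablet_screen": 0.3, "robot_arm": 0.2}


def select(w_user=1.0, w_ctx=1.0):
    """Brute-force the best modality/device pair for one output segment.

    Hard constraints prune invalid pairs; the remaining soft criteria are
    combined into a single weighted cost that is minimized.
    """
    best, best_cost = None, float("inf")
    for modality, device in product(MODALITIES, DEVICES):
        if device not in COMPATIBLE[modality]:  # hard constraint violated
            continue
        cost = w_user * USER_PREFERENCE[modality] + w_ctx * CONTEXT_COST[device]
        if cost < best_cost:
            best, best_cost = (modality, device), cost
    return best, best_cost


if __name__ == "__main__":
    print(select())  # picks the feasible pair with the lowest combined cost
```

Weighting the criteria differently (e.g. raising `w_ctx` in a noisy environment) shifts which pair wins, which is the basic mechanism by which user and context information can influence the fission plan.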