Abstract

The Semiotic Machine: Technology and multimodal interaction in context.
Rebekah Wegener

Human interaction is inherently multimodal. If we want to integrate technology into human sense-making processes in a meaningful way, what kinds of theories, models, and methods for studying multimodal interaction do we need? Bateman (2012) points out that "most discussions of multimodal analyses and multimodal meaning-making still proceed without an explicit consideration of just what the 'mode' of multimodality is referring to". This may be because the answer seems obvious, or because development proceeds from different perspectives with different ultimate goals. When we want to put multimodality to work in technological development, however, this vagueness becomes problematic. It is particularly problematic where any attempt is made at multimodal alignment to form multimodal ensembles: two terms which are themselves understood in very different ways.

Here I take up the call of Bateman (2012) and Bateman, Wildfeuer and Hiippala (2017) for clarity on theoretical and methodological issues in multimodality. I first give an overview of our work towards an analytical model that separates three concerns: technologically mediated production and reception, human sensory-motor dispositions, and semiotic representations. In this model, I distinguish between modality, codality and mediality, and situate all three within context.

To demonstrate the purpose of such a model for representing multimodality, and why it is helpful for the machine learning and explicit knowledge representation tasks we employ, I draw on the example of CLAra, a multimodal smart listening system that we are building (Cassens and Wegener, 2018). CLAra is an active listening assistant that automatically extracts contextually important information from an interaction using multimodal ensembles (Hansen and Salamon, 1990) and a rich model of context. To preserve privacy and reduce the need for costly data as far as possible, we use privileged learning techniques: the system receives input from multiple modalities during training, learns the alignments between them, and relies on the learned associations at run-time without access to the full feature set used during learning (Vapnik and Vashist, 2009).

Finally, I demonstrate how the integration of rich theoretical models and access to costly, human-annotated data, in addition to data that machines can easily perceive, makes this an example of development following true 'smart data' principles, which use the strength of good modelling and context to reduce the amount of data needed to achieve good results.
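To make the separation of concerns concrete, the following is a minimal, hypothetical sketch (in Python; not CLAra's actual schema) of how an explicit knowledge representation might keep mediality, modality and codality apart while still grouping signals into a multimodal ensemble situated in context. All names and values here are illustrative assumptions:

```python
# Toy illustration of the three analytical concerns kept separate:
# mediality (the technological channel of production/reception),
# modality (the human sensory-motor disposition), and
# codality (the semiotic code or representation).
from dataclasses import dataclass, field

@dataclass
class Signal:
    mediality: str                      # e.g. "microphone", "camera"
    modality: str                       # e.g. "auditory", "visual"
    codality: str                       # e.g. "spoken English", "gesture"
    context: dict = field(default_factory=dict)

# Two signals from one lecture can be aligned into a multimodal
# ensemble while the three concerns remain distinct and queryable.
speech = Signal("microphone", "auditory", "spoken English",
                {"setting": "academic lecture"})
gesture = Signal("camera", "visual", "deictic gesture",
                 {"setting": "academic lecture"})
ensemble = [speech, gesture]
print([s.codality for s in ensemble])
```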
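The privileged-information setup can likewise be sketched. One common realisation of Vapnik and Vashist's (2009) paradigm is teacher-student distillation: a teacher is trained on the costly, privileged modality, and a run-time student learns to reproduce the teacher's soft predictions from the cheap modality alone. The sketch below is an illustrative assumption, not CLAra's implementation; the variable names and synthetic data are hypothetical:

```python
# Minimal sketch of learning using privileged information (LUPI)
# via teacher-student distillation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# x_base: features cheaply available at run-time (e.g. audio).
# x_priv: costly privileged features available only during training
#         (e.g. human annotations of importance, video, gesture).
n = 500
x_base = rng.normal(size=(n, 10))
x_priv = x_base[:, :3] + 0.1 * rng.normal(size=(n, 3))  # correlated view
y = (x_priv.sum(axis=1) > 0).astype(int)

# 1. Teacher: trained with access to the privileged modality.
teacher = LogisticRegression().fit(x_priv, y)
soft_labels = teacher.predict_proba(x_priv)[:, 1]

# 2. Student: sees only the base modality, but learns to mimic the
#    teacher's soft predictions (the "learned association").
student = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000,
                       random_state=0).fit(x_base, soft_labels)

# 3. Run-time: only the base modality is needed.
x_new = rng.normal(size=(1, 10))
print("importance score:", student.predict(x_new)[0])
```

At deployment only the student and the base features are required, which is what allows the costly, privacy-sensitive annotations to remain confined to the training phase.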

References:

  • Bateman, J. A. (2012). The decomposability of semiotic modes. In Multimodal studies (pp. 37-58). Routledge.
  • Bateman, J. A., Wildfeuer, J., and Hiippala, T. (2017). Multimodality: Foundations, Research and Analysis – A Problem-Oriented Introduction. Berlin: Mouton de Gruyter.
  • Cassens, J., and Wegener, R. (2018). Supporting students through notifications about importance in academic lectures. In Ambient Intelligence: 14th European Conference, AmI 2018, Larnaca, Cyprus, November 12-14, 2018, Proceedings (pp. 227-232). Springer International Publishing.
  • Hansen, L. K., and Salamon, P. (1990). Neural network ensembles. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(10), 993-1001.
  • Vapnik, V., and Vashist, A. (2009). A new learning paradigm: Learning using privileged information. Neural Networks, 22(5-6), 544-557.