Multimodality, Transcription and Educational Research: Learning Beyond Verbal Language

By Aline Frederico, Catholic University of São Paulo, Brazil

When investigating how learning takes place, observation is a common method, often accompanied by some form of recording, which is then transcribed. As a result, the rich and dynamic environment of a classroom, or of parents and children reading together, is often reduced to a series of verbal entries: a transcript that is then coded and analyzed.

In my doctoral research, I studied 4-year-olds’ meaning-making processes when reading digital literature with their parents, in which observation was the central method of data collection. So what could I learn about meaning-making (and learning in general) by looking at these reading events? The answer, I found, can differ significantly depending on the methods employed for recording and analyzing these observations.

Let’s consider three different scenarios of data collection and analysis. In the first, the data is audio-recorded and transcribed, two methods widely practiced in educational research. In this event, the child, her mother and her father read together one scene of Nosy Crow’s Little Red Riding Hood app. The app is an interactive narrative, told through verbal text (written on the screen and voiced aloud), partly animated illustrations, sound effects and a soundtrack:

This example is representative of a significant part of the data. Despite the considerable verbal exchange that took place in under 30 seconds, the transcript reveals little about the child’s meaning-making processes. The parents laugh, make suggestions and repeat the words from the text, but the child is silent.

This transcript implies that at a certain point someone interacted with the app and tickled the wolf, since he starts laughing and asks the reader to stop. But who interacted? The father seemed to be asking the child to do it, but did she? Did the mother decide to help and “tickle” the wolf herself? Or did the father perhaps decide to demonstrate how to do it? The verbal data and its transcript, it seems, raise more questions than they answer.

Now suppose the data had been video-recorded, so there are more details to include in the transcript:

This transcript begins to paint a fuller picture of what shared digital reading looks like in the early years. The significant number of entries in brackets, representing non-verbal utterances, indicates that much goes unsaid, or that there are moments in which words are only one part of the exchange. This is already what we can call a multimodal transcript: it renders in written verbal language utterances made in multiple modes of communication, namely spoken verbal language, gestures, touch and facial expressions.

Now, let’s take this one step further, to a format of transcript that I have developed in my research:

This is another, more complex example of multimodal transcription. It is multimodal in a double sense: it represents multiple modes of communication through multiple modes of communication. That is, it represents the multimodal data through the written mode (as both description and direct speech) and the visual mode (photographs, i.e. screenshots captured from the video-recorded observations, as well as icons and layout).

The transcript also borrows some conventions of visual narratives such as comic strips, combining them with conventions from traditional academic transcripts, such as time stamps. The verbal language at the top narrates the sequence of events and highlights the central element represented in each visual frame, helping fill the gaps between one image and the next.

This transcript not only represents the data more accurately (for instance, for an academic publication that aims to discuss this vignette), but also provides more, and different, information than the previous examples of transcription: it shows, for instance, what the participants look like (which involves added ethical considerations; as this post is published online, the participants’ faces have been blurred), how they are seated, their distance from the tablet, what the app looks like, where exactly the participants touched the screen, and so on.

This short vignette, through this particular form of representation, tells us, for instance, about the affective nature of this reading event (the playful and warm engagement of the mother, tickling the child); the embodied nature of digital reading (numerous interactive gestures and discussion of how and when to use the body to interact); the mother’s interest in the child and the child’s interest in the app; and the child’s and the father’s different levels of proficiency with touchscreen technology.

That is not to say that the transcript can replace the data or is as “truthful” as the data. Transcripts are always reductions, simplifications and interpretations of the data. This transcript does not include the sound effects or the soundtrack that suggests Little Red Riding Hood’s empowerment when facing the Wolf. It focuses on the participants and their actions rather than on the actions taking place in the app. It also highlights interactive gestures with icons, in addition to each gesture’s verbal description and the image of the participant performing it. Such emphasis was purposeful: the transcript was created to represent the data in a written report that discussed the use of interactive gestures in digital reading.

Underlying this choice of transcript is a multimodal perspective on communication and learning (see a list of essential readings in this field below). This perspective posits that communication and learning take place through a variety of semiotic modes in complex intersemiotic relations. Different modes do different communicative work, and verbal language always operates in relation to other modes, which are also crucial to learning.

However, traditional methods in educational research are often language-biased, based on a framework that values language to the detriment of other modes. The typical forms of research delivery, print dissertations and articles in which words are considered the main, if not the only, form of representing research results, contribute to this landscape. As a result, educational research often ignores or considerably diminishes the roles that non-linguistic communicative modes play in education, which, as seen above, can considerably affect what we learn from the data and compromise the research results.

Despite the growth of multimodality as a field of research over the past two decades, a multimodal perspective and multimodal transcription are still rarely practiced in many areas of educational research. So I end by inviting researchers to reflect: How would my results differ if I adopted a multimodal perspective? What aspects of learning and meaning-making are being concealed by my choice of methods? Could I learn something new about my data by transcribing it multimodally?

To learn more about multimodality:

Bezemer, J. and Kress, G., 2015. Multimodality, learning and communication. London: Routledge.

Jewitt, C., 2014. The Routledge handbook of multimodal analysis. 2nd ed. London; New York: Routledge.

Kress, G., 2010. Multimodality: A social semiotic approach to contemporary communication. New York; London: Routledge.

Kress, G., Ogborn, J., and Martins, I., 1998. A Satellite View of Language: Some Lessons from Science Classrooms. Language Awareness, 7 (2–3), 69–89.

Kress, G. and van Leeuwen, T., 2001. Multimodal discourse: The modes and media of contemporary communication. London: Arnold.

Resources on multimodal transcription:

Bezemer, J. and Mavers, D., 2011. Multimodal transcription as academic practice: A social semiotic perspective. International Journal of Social Research Methodology, 14 (3), 191–206.

Cowan, K., 2014. Multimodal transcription of video: Examining interaction in Early Years classrooms. Classroom Discourse, 5 (1), 6–21.

Mavers, D., 2012. Transcribing video. MODE Working Paper 05/12.

MODE Transcription bank, a database of formats of multimodal transcription.

Aline Frederico (@aline_frederico) is a post-doctoral fellow at the Catholic University of São Paulo, having completed her PhD in 2018 at the Faculty of Education, University of Cambridge, as a Cambridge Trust Scholar. Aline researches digital children’s literature and digital reading in the early years and has many years of work experience as an editor and designer. To learn more, check her personal website or contact her by email.

Posted by: fersacambridge
