Is communicating in person the “gold standard”? You’re asking the wrong question

by Elizabeth Stokoe, Pentti Haddington, Jon Hindmarsh, Laura Kohonen-Aho, Tuire Oittinen, Sean Rintel, and Lucas Seuren

Communication has been described as the “most important skill of the 21st century”. Yet myths circulate about what counts as ‘good’ communication, including what counts as the optimal circumstances for communicating. Myths arise partly because research in psychology and related disciplines rarely investigates naturally-occurring and moment-to-moment communication, preferring to explore how people communicate in the laboratory or report their communication retrospectively in surveys. Additionally, the fact that we all communicate means it can be easy to find anecdata from our own lives to support or challenge what someone else says about how communication works — which may be correct for them but not actually generalizable.

One assumption that is often shared by the public and researchers alike is that communication ‘in person’ is the ‘gold standard’ by which all other modes are judged. The current era of research on mediated communication still takes its primary cue from 1970s social psychological research, which judged communication modalities by ‘richness’. For instance, ‘social presence theory’ proposed that in-person communication is best because it involves all five senses in real time without mediation. This makes sense intuitively, and indeed many tasks in mediated modalities are harder or take longer when compared to interacting in person. But when we think about it a little more, it is not that simple. In fact, we can have rich and fulfilling communication experiences in text messaging. Critical help is delivered successfully on the telephone. We can form communities and do everything from telling jokes to getting married in text-based games and channels. In video calling, couples tease one another as part of being intimate, and grandparents facilitate moments of togetherness between migrant parents and their very young children.

Communicating in person can also be highly unsatisfactory. We have all been in a terrible meeting, a poor medical consultation, or a dire first date. Numerous books, articles, and courses are dedicated to how to chair, participate in, and lead an effective in-person meeting. Why? A meeting can ‘stink’ not because technology makes it hard for us to communicate, but because people are not trained in how to lead or participate in one, or are just poor communicators. Being co-present does not equate to or guarantee quality, inclusion, equality, satisfaction, productivity, interaction, engagement, or connection.

In this article, we unpack one of the most common myths about the ‘gold standard’: that communication is always impoverished if there is reduced access to ‘body language’.

‘Body language’ is a fundamentally flawed starting point

As a topic, communication suffers from assertions about how it works and the factors that shape its outcomes, including compelling, but not always evidence-based, communication myths. These often circulate in media reports of communication research and sometimes in research itself. Sometimes research evidence is limited to a particular scenario — or is retrospective, experimental, or hypothetical — but the finding is so intuitively plausible that the myth outgrows the original grain of truth. These myths then find themselves underpinning the way we describe communication in our own lives, and the cycle is reinforced.

The most prominent myth in communication research, and the starting point for many articles about online interaction via Zoom and other platforms (articles that have multiplied during the COVID-19 pandemic), is that communication is mostly ‘body language’ or ‘non-verbal’ and that “virtual meetings reduce our superpower of reading the subtle body language cues of the people in the meeting.” Like many others, the article making this claim starts by quoting classic research by Albert Mehrabian, arguing that

“[A]n online meeting reduces our chances of perceiving the micro-expressions and the full range of gestures, posture, movements, which in turn makes it more difficult for us to get the full picture of what the person intended to convey with their message, it also becomes harder for the speaker to get the message across.”

However, many authors (mis)quote Mehrabian without realizing that, with the sociologist Max Atkinson, he worked to try to stop people misinterpreting the results of his research. In 2002, he wrote:

“I am obviously uncomfortable about misquotes of my work. From the very beginning, I have tried to give people the correct limitations of my findings. Unfortunately, the field of self-styled ‘corporate image consultants’ or ‘leadership consultants’ has numerous practitioners with very little psychological expertise.”

And, as Max Atkinson notes, the idea that communication largely occurs through ‘body language’ is easy to debunk: if our bodies were so crucial to interaction, people across the planet would not need to learn any other languages, radio would not be popular, and we would not be able to talk on the telephone or “have perfectly good conversations in the dark”.

Despite the concerns raised by Mehrabian himself, his study is a great example of how communication myths are constructed and sustained. In “What’s Missing From Zoom Reminds Us What it Means to be Human”, the author begins by complaining that every video conferencing app “has ignored a half-century of research on how people communicate”, including that “researchers have known for at least fifty years that at least half of how we communicate is through non-verbal cues.” What is this research that has been ignored for 50 years? It’s Mehrabian’s study again. Even Zoom themselves misinterpret Mehrabian to say that “Words only contribute 7%” to a conversation.

Such claims ignore the fact that people have been talking on telephones for over a century. We manage conversational turn-taking effectively in audio-only modalities, at least if the number of participants is low. The turn-taking resources of congenitally blind people are largely the same as those of sighted people. It is also important to note that sign languages are largely excluded from wider public discourse about ‘body language’. Indeed, by focusing on ‘body language’ as something that transmits or leaks ‘subtle cues’ — rather than focusing on the precise way that hand movements, head gestures, gaze, lip positions, body torque, and so on, contribute to what we do, mean, and communicate in interaction — the discussion becomes impoverished and simplistic.

People are resourceful when communicating online

There are many problematic statements made about online communication. For instance, video-calling is said to involve “inevitable miscommunication” such as “those awkward nanoseconds of wondering who’s going to talk next, followed by four people saying “Oops, sorry — no, you — no, you go ahead” at the same time”. While such ‘miscommunication’ problems do occur in video calls, they also occur during in-person encounters — and are completely ordinary. In fact, people are constantly confronted with silence and simultaneous talk, but have ways of managing both. Conversation analytic research shows that, while people routinely ‘misproject’ who should be speaking next, occurrences of overlapping talk are rapidly resolved.

Because video calling lets us see and hear remote people and environments, our expectations are raised: it should be as easy as in-person communication. And, a longstanding assumption in telecommunications engineering is that if the audio and video reach fidelity with in-person communication, the ‘social stuff’ will take care of itself. However, this assumption is based on the idea that the only interactions worth having are in-person ones, and that technology merely restricts our opportunities for conversation and engagement. But people are hugely resourceful, and technology offers opportunities that we deploy in inventive ways.

Of course, there are challenges to overcome in video communication. We generally cannot see as much of everyone’s bodies, environments, documents, and tools. In encounters with more than two parties, it is difficult to see who or what people are looking at, making it harder to use eye-contact to select a next speaker, something which is done commonly in person — although in a large group, this is still no guarantee of success. Hybrid meetings, where some participants are co-present and others are connected via technology, are especially complex as they require us to attend simultaneously to what is happening in both physical and virtual spaces. Having half of a meeting’s participants join remotely while the rest are co-present generates the worst of all worlds because the participants have unevenly distributed or unequal access to the resources used to interact.

Continued improvements to technology will overcome some challenges. But people have already adapted. We are good at disregarding and letting pass odd views of people, video freezes, and distorted audio. We can treat seeing the top of someone’s head, the side of their face, or even an icon showing their name or initials, as enough of a representation to have a conversation. We use hand raise functions or type our names in the chat to organize speaker roles. We give thumbs up, wave, or provide overly emphatic facial expressions to communicate, pre-empting the need to use spoken language and take turns for brief contributions.

Video calling, with multiple channels for participation, can also reduce hierarchies and support participation and inclusion — and, for conferences, increase attendance while reducing the carbon footprint. We can use parallel chat to have side conversations but also to share resources — links, images, gifs, and so on, which is more difficult in person. We can turn video and audio on and off to signal different levels of attention, or even hide an angry response. More broadly, video calling already lets us achieve astonishing feats of global and local communication, from maintaining diasporic communities to reducing agitation in care home residents with dementia, and from augmenting close friendships among children to enabling friends and family to virtually visit loved ones during the COVID-19 pandemic.

Good communication depends on good communicators — regardless of modality

When it comes to the activities that comprise any encounter — online, on the telephone, in writing, or in person — our research as conversation analysts, and that of those with whom we work and on whose shoulders we stand, shows that we accomplish much the same things regardless of modality, across settings from intimate relationships, through government and enterprise, to healthcare. In other words, we do the same actions of greeting, closing, requesting, offering, complaining, agreeing, questioning, and answering (and more) in all languages and in all modalities. As Harvey Sacks, one of the founders of conversation analysis, stated many decades ago:

“…it’s the source of the failure of technocratic dreams that if only we introduced some fantastic new communication machine, the world will be transformed. Where what happens is that the object is made at home in the world that has whatever organization it already has.”

The richness of a communication technology should not be judged on its visual fidelity to in-person communication, but rather on how adaptable it is to our needs. For example, a video-call may allow you to see and hear another person, but it requires synchronous focused attention for a specific period, the ability to speak and be heard, and for each participant to face a camera. Text messaging, on the other hand, can be used anytime and anywhere, with any combination of text, emojis, images, gifs, audio clips, and video clips sent asynchronously when each person has time. While it is sensually ‘leaner’, many people rely on messaging to keep in touch throughout the day, without the effort of video-calling. Furthermore, online love (and online hate, for that matter) can be felt as deeply as if it occurred in person, as any social media feed attests. The issue is what we do with our words rather than the modality in which they occur.

So, when it comes to answering the question of whether communicating in person is the ‘gold standard’, we argue that the question itself is misguided, for these reasons:

1. ‘Gold standard’ for … what? Which modality is best depends on what we are trying to accomplish in our interactions; that is, the purpose of the encounter and its constituent actions. Some things are better suited to being done in person; others are not.

2. Good communication depends not on the modality or technology but on the communication skills of the people using it. We noted earlier that being in person is no guarantee of a high-quality interaction. Communication succeeds when everyone knows why they are talking and when there is parity of opportunity to participate.

3. It depends on how you ask the question. Asking people to report on their interactions (to recall them, reflect on them, and so on) doesn’t tell us about the real-time organization of talk. Surveys, interviews, and retrospective designs more generally fall foul of an error made across psychological science since its modern inception: overlooking that people have “little or no direct introspective access to higher order cognitive processes… their reports are based on a priori, implicit causal theories, or judgments.” We need more studies that compare how interactions happen in person and online if we are to say anything meaningful about the differences.

4. As well as how you ask the question, it depends when you ask it — and on matters of choice and agency. Choosing to video call is different to being forced to video call — especially without planning or training. A video call in which people have actively decided to be more intentional about their collaboration will be more effective than yet another pointless meeting in which the organizer hopes that just getting people into a room is enough to achieve anything.

In her review of research comparing online to in-person support, Aleks Krotoski concludes that “A mediating machine doesn’t mean the death of humanity. It just makes the relationship more, well, complicated. Every contrary finding gives us a glimpse at something that is obvious when you think about it.”

For us, video communication is not ‘all or nothing’. It does not need to work perfectly for everyone and for every aspect of our lives for it to have value. During the pandemic, many of us have discovered that we can use video in ways that we had never previously imagined. When we are again able to choose our modality freely (telephone, text/SMS, video, or in-person), we will hopefully look past the myths to what works best and what optimally supports our communication goals.

Authors: Pentti Haddington (Professor of English and interaction, University of Oulu), Jon Hindmarsh (Professor of Work and Interaction, King’s College London), Laura Kohonen-Aho (Postdoctoral Researcher, University of Helsinki), Tuire Oittinen (Postdoctoral Researcher, University of Oulu), Sean Rintel (Principal Researcher, Microsoft Research), Lucas Seuren (Health Services Researcher, University of Oxford), and Elizabeth Stokoe (Professor of Social Interaction, Loughborough University).

Elizabeth Stokoe, Professor of Social Interaction at Loughborough University, specializes in conversation analysis, communication training, and science communication.