The softness of hard data
by Elizabeth Stokoe, Charles Antaki, Mike Bracher, Leanne Chrisostomou, Elle Henderson, Danielle Jones, and Simon Stewart
In a recent discussion on Twitter, a professor of genetics raised concerns about a study of children and Long Covid, asking “Is there any data in there or is it just a list of anecdotes?” The author of the study responded: “It’s qualitative interview research: 111 children covering a range of topics related to effects of the pandemic.” The genetics professor wrote back, “Thanks, this sounds like a great piece of science to me. I’ve got three children myself … I’ll ask each of them to submit a little paragraph …”. Finally, when taken to task by another professor for being “ill-informed about qualitative research”, the genetics professor replied:
“I appreciate this was poorly worded and it doesn’t really reflect my views on the merits of qualitative vs. quantitative science. My point was more whether we should give that much weight to qualitative studies on long-covid in children before we have collected any hard data” (emphasis added).
The bifurcation of research data into ‘hard’ and ‘soft’ — with associated value judgments — is familiar. But what does ‘hard’ mean? How ‘hard’ do data turn out to be, once you start scrutinizing how they are collected? From the initial design of survey questions to the verbal delivery of experimental procedures — how often do we consider the practices that generate research data — ‘hard’ or ‘soft’? And what might such considerations mean for how we use them?
There are no neutral questions
A great deal of research data — quantitative and qualitative — comprises responses to questions. Researchers of all kinds ask people to report things — in words or numbers — about themselves and the world, from medical symptoms and voting intentions to the numbers of birds in their garden. Such reports are generated by questions, and questioning can be done well or badly. For instance, we might think that reports on a survey are more objective than those elicited in an interview, but there is ample evidence to show that they might not be.
The way questions are constructed — the words chosen and their grammar — affords and constrains the responses collected. A striking example can be found in the work of Elizabeth Loftus and colleagues in their studies of eyewitness reports in experimental settings. They demonstrated how the wording of questions about the events being witnessed influenced responses. When witnesses were asked for estimates of the speed at which cars in a collision were travelling, estimates varied according to whether they were asked “About how fast were the cars going when they smashed / collided / bumped / hit / contacted each other?” Not only that, but witnesses who were asked the ‘smashed’ version were more likely to report seeing broken glass than when asked about a collision, even though there was no glass at the scene.
The importance of each word that comprises a question is amplified further as we consider what we choose to ask about in the first place. As the psychologist Rom Harré discussed in his critique of experimental methods:
The use of questionnaires with limited range questions … which effectively preclude elaborations and reinterpretations … means that the concepts deployed are predetermined. The effect of this is to produce not a representation of the social world being studied, but the representation of the shadow cast upon the social world by the prior conceptual apparatus deployed by the person who constructed the questionnaire.
Furthermore, while we tend to think that people ‘just respond’ to any given question — and that survey designers have methods for managing demand characteristics or outright lies — in fact we are pushed and pulled around by words and wording without really being aware. Conversation analysts understand this pushing and pulling by scrutinizing real language in use.
One of the most compelling examples of how question design impacts the data collected — that is, what happens next — is a conversation analytic study that reveals what happens when just one word in a ‘closed’ or ‘yes/no’ question is changed. John Heritage and colleagues compared responses to questions asking “is there anything else…?” with responses to questions asking “is there something else…?” While both are interrogatively formatted questions, the ‘anything’ version is ‘negatively polarized’ — and more likely to get a ‘no’, while the ‘something’ version is ‘positively polarized’ — and more likely to get a ‘yes’. The latter inquiry frame significantly increased the number of elaborated responses.
In other words, apparently neutral questions are grammatically tilted towards particular outcomes, sometimes very subtly. As Irene Koshik writes, grammatically affirmative questions like ‘did someone call’ expect affirmative answers while grammatically negative questions like ‘didn’t he arrive yet’ seem to expect negative ones.
The softness of carrying out experimental designs
Laboratory experiments are often held up as the gold standard for collecting reliable and valid quantitative datasets in controlled conditions. However, there is little scrutiny of what goes on interactionally inside the ‘black box’ of experimentation.
One of the most widely known psychology experiments of the modern era is Stanley Milgram’s work in the 1960s on obedience and, in particular, on whether Germans in World War II had been especially obedient to authority figures. The experimental set-up involved research participants delivering mild to life-threatening electric shocks to another participant (who was in fact a confederate) under the standardized instructions of the experimenter. The most common interpretation of his findings was that, in fact, we are all “likely to follow orders given by an authority figure, even to the extent of killing an innocent human being”.
Milgram’s classic obedience experiments have generated much debate since their publication, but, as Matthew Hollander points out, “very few studies have focused on the concrete, empirical details of what his participants actually said and did”. Hollander and Stephen Gibson have since each conducted important studies of the interaction between experimenter and participant, both with implications for the collection of ‘hard’ data.
First, Hollander’s work showed that Milgram’s categorization of participants only as ‘obedient’ or ‘defiant’ did empirical damage to the continuum of practices through which participants resisted the experimenter’s instructions. Yet the much-quoted ‘hard’ data point is that “65% (two-thirds) of participants (i.e., teachers) continued to the highest level of 450 volts.”
Second, Gibson’s analysis showed that, rather than the experimenter delivering each verbal instruction in a replicable way, “participants could draw the experimenter into a process of negotiation over the continuation of the experimental session.” Since the actualities of social interaction — each gap, um, uh, rephrase, interruption, pause, and so on — are almost unavoidable in human communication, it should not be surprising that Gibson found “radical departures from the standardized experimental procedure.”
In another classic study, Robin Wooffitt showed that the way the “experimenter acknowledges the research participants’ utterances may be significant for the trajectory of the experiment.” Even the standardized questions for gaining research consent from participants — both on written forms and in spoken interaction — have been shown to tilt respondents towards a ‘yes’.
And so, having set out the powerful ways in which language shapes every encounter, and how variation occurs even under the strictest experimental conditions, we can start to see the multiple settings and contexts in which our assumptions about the reliable and standardized production of ‘hard’ data remain largely untested.
The delivery of ‘standardized’ survey and diagnostic tools
One reason why ‘hard’ data collection is inevitably impacted by its interactional production is that our encounters are shaped by what conversation analysts call ‘recipient design’. Recipient design refers to the way people shape what they say — word for word — for the person they are talking to. This means that we are constantly editing, elaborating, shortening, and amending everything we say for our particular recipient — often very subtly and without really being aware of it.
For instance, Figure 2 shows a checklist that health visitors complete when visiting parents of new babies, together with a transcript of its conversational completion. Rather than hand the form to the parent, the health visitor transforms it into talk. The researcher, John Heritage, points out how each item on the checklist appears as actual conversation. So, for instance, an item on the checklist like “Pregnancy normal/abnormal, specify” must be turned into something verbal — in this case, “and you had a normal pregnancy”. The question is designed to get a ‘no-problem’ response (which it does: “yeh”).
Given the extent of research data collected via surveys, it is important to know not only how the written design of each item may subtly constrain or afford responses, but also what happens when items are delivered verbally within social interaction. Douglas Maynard and colleagues have investigated the spoken delivery of many standardized survey tools. Not only do their transcripts (e.g., Figure 3) exemplify the simple actuality of what written questions ‘look like’ when delivered verbally, but they also show the push and pull between interviewer and respondent as the ‘codable’ responses are produced.
Sue Wilkinson similarly showed the tension between standardized instructions and their spoken delivery, and how responses get “transformed into entries on a coding sheet.” Taking the case of an organization’s ethnic monitoring reporting, she examined how official statistics were collected, including the fact that the call-takers ended up categorizing 86% of callers as ‘White European’ — even though that category wasn’t even on the list in front of them.
The impact of small deviations from standardized items is consequential in other ways. For example, Charles Antaki and Mark Rapley examined recordings of how a ‘Quality of Life’ diagnostic instrument was administered by psychologists to service-users. The instructions for “reading the items” included paying “close attention to the exact wording.” Each question had three response options. But the psychologists often reformulated three options into one ‘yes/no’ question with a positive tilt. Since treatment and support options were tied to questionnaire scores, the re-doing of questions had important consequences for those being assessed. The researchers concluded that it is hard to “draw conclusions from simple aggregation of recorded responses to this questionnaire, and, perhaps, to any questionnaire using a fixed-response schedule.”
Other researchers have identified similar problems with the verbal delivery of diagnostic instruments. For example, Danielle Jones and colleagues examined the delivery of a test to identify cognitive impairment. They showed that neurologists often deviate from the standardized procedures which are designed to ensure test accuracy and consistency, calling into question the validity of test outcomes. Similarly, analysis of the verbal delivery of the Patient Health Questionnaire (PHQ-9) depression tool revealed how interactional processes unfolded to establish ‘hard’ clinical outcomes.
Even when interviewers are meant to follow strict guidelines enforceable in law, interactional forces can shift them away from what is written on the page. Research has found that, for instance, police officers deviate from what is mandated by the Police and Criminal Evidence Act (1984) when it comes to opening interviews; that the right to remain silent can be undermined by the moral and interactional imperative to respond to questions and fill silence; that questions in evidence-gathering interviews with vulnerable adults and victims deviate from written guidance, and that narrative descriptions of events in the world get transformed and squeezed into some of the ‘hardest’ data categories around — whether someone broke the law or not.
Conclusion — Everything is ‘soft’ when you study ‘the world as it happens’
Conversation analytic research itself can — although this is much debated — be both qualitative and quantitative. For more than 50 years, its cumulative science — across sociology, psychology, linguistics, communication, and anthropology — has examined hundreds of thousands of cases of real conversation across myriad settings. It has shown that and how conversation is systematic (not messy); that there is ‘order at all points’ (even in one single turn at talk); and that much of its core machinery is universal across languages. When analysts have turned their attention to the production of research data itself, they have shown that what we know depends a lot on what we ask, and that standardization is largely assumed.
Our brief review is intended to show the value in understanding how data are actually produced and some of the assumptions baked into collecting ‘hard data’. We have shown that the production of ‘hard’ data — how fast the car was travelling; how many ‘yes’ and ‘no’ responses are reported; the ethnicity of a person; how many people are obedient or defiant; whether one is diagnosed or not — involves potential points of variation with respect to interaction and context that are only visible through collecting and analyzing the ‘soft’ encounters through which they are produced.
Concerns about the generalizability and reproducibility of qualitative research are legitimate. However, the notion that ‘hard’ and ‘soft’ data are clear and distinct categories may not be appropriate. One response to “whether we should give that much weight to qualitative studies on long-covid in children before we have collected any hard data” is to reverse the direction of this question — that is, to ask how much weight we should give to quantitative studies that relate to complex real-world processes until we understand something of the ways in which the data were collected.
Although there are well-established practices for ensuring the reliability of quantitative research, they seldom stray into the kind of territory we have presented. One of the benefits of qualitative research is that the data are often raw and non-technical — and thus transparent and easy to inspect. That said, conversation analysts also make the point that interview data are often presented without the initiating question or without showing how the questions and responses were delivered. Choosing what counts as data to collect, code and publish can keep hidden other sources of information that can have serious consequences for the conclusions we draw.
Decades of research in the sociology of scientific knowledge have turned the practices of doing and reporting research into a research topic in its own right, showing how facts and knowledge become disconnected from the ‘construction yard’ in which they are produced. The ‘hard’ data that emerge from the construction yard are shot through with their ‘soft’ production. But that production can be analyzed, understood, and used constructively to make research more rigorous and more open.
Authors
Elizabeth Stokoe (Loughborough University, UK), Charles Antaki (Loughborough University, UK), Mike Bracher (University of Southampton, UK), Leanne Chrisostomou (University of Portsmouth, UK), Elle Henderson (Open Polytechnic, New Zealand), Danielle Jones (University of Bradford, UK), and Simon Stewart (Staffordshire University, UK).