What's the evidence? A penny for your thoughts
Do you speak to yourself? Mutter under your breath, cuss, and swear even? You do? I wonder if you’d make a good candidate for think-aloud research. I’m thinking of the think-aloud protocol, where usability researchers ask you to speak your thoughts.
Usability professionals often use the method to identify usability issues or test a product or website. They ask users to say what they’re thinking as they carry out a task, to learn why participants do what they do, what they meant to do but couldn’t, and even how they feel.
A 2012 international survey of usability professionals found seven in ten of them usually or always used the method (Figure 1; McDonald, Edwards, & Zhao, 2012).
Figure 1: The think-aloud protocol is popular with usability professionals.
So how good is it really? Let’s discuss:
- the value it adds (what people say reveals more than eye-tracking can)
- the concerns about it
- how it changes behaviour
- the extra demands it makes on participants
- whether unmoderated tests work.
What people say reveals more than eye-tracking can
When technical communicator Cooke (2010) tracked 10 participants’ eye movements, she found that their verbalizations matched their eye movements 80 percent of the time.
Elling, Lentz, and de Jong (2012) asked: if verbalizations and eye movements match that closely, and participants are silent only 16 percent of the time, why do we need them to talk at all? Why not just track their eye movements? Their reservations about relying on eye-tracking alone included:
- Some verbalizations were actually reading. However, just because someone’s reading something doesn’t mean they understand what they’re reading.
- Verbalizations may be considered inaccurate because they do not correspond to eye movements, but research shows users process information faster than they can verbalize.
- Eye movements don’t say it all. For example, they won’t reveal expectations, ‘comments on missing information, expressions of doubt and confusion, and thoughts like “I feel foolish working at this site”’ (p. 209).
They sent 60 participants to municipal websites to search for information, for example, on a subsidy for first-time homebuyers. They found:
- Forty percent of verbalizations matched eye-tracking observations (half of what Cooke found).
- Participants were silent 27 percent of the time (much more than in Cooke’s study).
- Verbalizations also differed markedly from Cooke’s study. In Cooke’s study, most verbalizations were reading. In their study, most were observations such as ‘There’s a lot of links to choose from here’, and ‘I am not sure where to go from here’ (p. 214; Figure 2).
Figure 2: A different research design meant participants made more observations and read less.
This suggests the remaining 60 percent of verbalizations added value beyond eye-tracking, which the researchers attributed to better experimental design, especially the choice of participants, website, and tasks.
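As a quick sanity check on those percentages: if only 40 percent of verbalizations mirror eye movements, the remaining 60 percent are the ones carrying information that eye-tracking alone would miss. A minimal sketch (the helper function is my own, purely illustrative; the shares come from the studies above):

```python
# Back-of-the-envelope check of the figures reported above.
# The match shares come from the two studies; the helper is
# illustrative only, not part of either paper's analysis.

def added_value_share(match_share: float) -> float:
    """Share of verbalizations NOT mirrored by eye movements,
    i.e. the share that tells us something eye-tracking can't."""
    return round(1.0 - match_share, 2)

print(added_value_share(0.80))  # Cooke (2010): 80% matched -> 0.2
print(added_value_share(0.40))  # Elling et al. (2012): 40% matched -> 0.6
```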
Silences aren’t a concern, by the way. They often occur because participants are so occupied with their tasks that they have ‘no cognitive energy left to describe what goes on in their minds . . . . not all cognitive activity, such as scanning website pages, can be easily verbalized’ (p. 217).
Concerns about the think-aloud protocol
So maybe the protocol is useful, though concerns remain (Alhadreti & Mayhew, 2017; Alshammari, Alhadreti & Mayhew, 2015; McDonald, McGarry, & Willis, 2013). These are the main ones I found:
- Thinking aloud may change participants’ thoughts, actions, and performance, since they have to process their thoughts more than usual. Some scholars argue thinking aloud is unnatural, and may improve or hinder task performance, threaten test validity, point to more or fewer problems, and so on.
- Verbalizations can be mainly reading or about users’ actions and procedures, instead of explanations. That’s not terribly insightful.
- The classic think-aloud protocol requires evaluators to listen passively, something that can be uncomfortable. So some evaluators interact with participants, but this could affect performance and verbalization.
To search for solutions and explore new opportunities for the protocol, researchers have:
- explicitly told participants to explain their thoughts
- had participants explain their thoughts retrospectively
- varied their interaction with participants
- looked into unmoderated tests.
Explicitly telling participants to explain their thoughts
Applied scientists McDonald, McGarry, and Willis (2013) gave 20 participants ten tasks on a web-based encyclopaedia:
- One group received the classic think-aloud instruction: ‘I want you to say out loud everything that you say to yourself silently. Just act as if you are alone in the room speaking to yourself’.
- The other group was explicitly told to explain their thinking: ‘I want you to think-aloud, [and] explain your link choices as you complete each task’.
They found:
- No difference in the overall number of verbalizations, but when tasks got difficult, the explicit group talked more.
- No difference in performance on easy tasks, but when tasks got difficult, the explicit group performed better and clicked fewer links to navigate.
This suggests explaining things aloud changes behaviour and strategy.
Having participants explain their thoughts retrospectively
Another approach is to ask for thoughts retrospectively. However, computer scientists Alshammari, Alhadreti, and Mayhew (2015) found that those who think aloud as they go (concurrently) raise more usability problems and more unique ones too, perhaps because they can report problems in real time.
What if we gave retrospective participants a memory jogger? Researchers Beach and Willows (2017) gave an innovative, virtual twist to retrospective thinking aloud. They had 45 elementary school teachers use a professional development website, splitting them into three groups:
- Concurrent: thinking aloud while performing tasks.
- Retrospective: thinking aloud after completing the tasks, without memory aids.
- Virtual revisit: thinking aloud after completing the tasks, while watching a screen recording of their session.
- Concurrent participants made way more comments. (By the way, ‘comments’ is my word. The authors called them ‘thought units’: comments with a pause and a different idea before and after.)
- Retrospective and virtual revisit participants’ comments were more complex (Figure 3). Their comments involved more processing and long-term memory, such as comments about reasoning and planning. (Less complex comments are more mechanical, like those about procedures.)
- Retrospective participants tended not to comment on their online actions or navigation.
Figure 3: Those thinking aloud after completing their tasks made nearly twice as many complex comments since they didn’t have competing mental demands. An example of a complex idea is: ‘I know that there is a lot that I can do with this in terms of reading and writing and oral and drama and themes like social justice and history and so on’ (Beach & Willows, 2017, p. 76).
- Concurrent thinking aloud made more demands on participants’ thought processes, so their comments were less complex.
- Retrospective and virtual revisit participants had lighter demands on their thought processes when thinking aloud, so they had energy for more complex ideas.
- Screen recordings were helpful, since the virtual revisit group not only made more complex comments, they could also explain their navigation.
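To make ‘thought units’ concrete, here’s a rough mechanical approximation: split a timestamped transcript wherever a long pause separates two stretches of speech. This is only a sketch under my own assumptions; the 2-second threshold and the data format are invented, and Beach and Willows’s real units also required a different idea before and after each pause, which no simple script can check:

```python
# Illustrative sketch: segmenting a timestamped transcript into
# "thought units" by splitting on pauses. The threshold and data
# format are assumptions, not Beach & Willows's coding scheme.

PAUSE_THRESHOLD = 2.0  # seconds of silence that ends a unit

def thought_units(utterances):
    """utterances: list of (start_time, end_time, text) tuples."""
    units, current = [], []
    prev_end = None
    for start, end, text in utterances:
        # A long gap since the previous utterance closes the unit.
        if prev_end is not None and start - prev_end > PAUSE_THRESHOLD:
            units.append(" ".join(current))
            current = []
        current.append(text)
        prev_end = end
    if current:
        units.append(" ".join(current))
    return units

session = [
    (0.0, 1.5, "There's a lot of links here."),
    (1.8, 3.0, "Not sure where to go."),
    (7.5, 9.0, "I'll try the search box."),
]
print(thought_units(session))
# ["There's a lot of links here. Not sure where to go.",
#  "I'll try the search box."]
```

In practice, human coders also judge whether the idea changes across the pause, so a script like this would only be a first pass before manual review.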
Varying interaction with participants
Could how evaluators interact with participants affect performance and results? Perhaps their ‘tones of voice, attitude, and friendliness’ could affect what participants say (cited in Alhadreti & Mayhew, 2017). It turns out this isn’t so.
THREE THINK-ALOUD METHODS FIND SIMILAR USABILITY PROBLEMS
Computer scientists Alhadreti and Mayhew (2017) tested the Durham University library website on 60 students. Evaluators varied their behaviour:
- Traditional condition: they said ‘please keep talking’ if participants were quiet for 15 seconds.
- Active intervention condition: they asked participants to explain and give more details.
- Speech communication condition: they acknowledged participants’ comments with ‘Mm hmm’ and asked ‘And now…?’ if participants were quiet for 15 seconds.
- All three groups encountered similar usability problems (Figure 4), and of similar severity.
- Those in the active intervention group were least successful with their tasks and took the longest. It was also the most expensive condition, since both the sessions and the analysis took more time. Despite that, they detected only marginally more usability problems.
- Satisfaction with the website was similar, though the active intervention group considered the evaluator’s presence ‘disturbing’.
This suggests there’s little difference between the groups for uncovering usability problems.
Figure 4: Not only did the groups find a similar number of usability problems; most problems were also common to all the groups.
THINK-ALOUD METHODS WORK WELL, AND WORK BETTER THAN SILENCE
Bruun and Stage (2015), also computer scientists, ran a more elaborate experiment on a Danish statistics website. They had 43 university staff and students in four conditions:
- Traditional: evaluators only spoke to prompt participants to talk if they had been quiet.
- Active listening: evaluators gave feedback or acknowledgement, such as ‘Um-humm’.
- Coaching: evaluators encouraged, sympathised, gave feedback, and even prompted participants with questions like ‘what options do you have?’
- Silent: evaluators only introduced the experiment and tasks. Participants were specifically asked not to think aloud.
- Task completion and time taken were similar across the four conditions.
- The think-aloud conditions found a similar number of usability problems (double the silent condition; Figure 5), with similar severity (critical, serious, or cosmetic).
- Those in the traditional condition raised more unique problems.
- Those in the coaching condition raised more types of problem.
- Those in the coaching and traditional conditions were more satisfied with the website.
Figure 5: This study suggests evaluators should do more than just introduce a study and tasks to participants. Some interaction, even just prompting participants to speak when they have been quiet, makes all the difference to finding usability problems.
This suggests the think-aloud protocols had ‘limited influence on user performance and satisfaction’, and were significantly more successful than silence. Indeed, the authors conclude that ‘no single [think-aloud] protocol version is superior overall; each has strengths and weaknesses’ (p. 17).
Looking into unmoderated tests
In our increasingly online world, researchers conduct usability tests online too. Participants do these anytime and anywhere they like, without an evaluator in sight. Do unmoderated think-aloud sessions work?
Researchers Hertzum, Borlund, and Kristoffersen (2015) sent 14 participants to a music news site, splitting them into two groups:
- Moderated condition: evaluators probed when participants fell silent for some time, became visibly surprised without verbalizing why, or had completed a task.
- Unmoderated condition: participants were told to record the session and think aloud.
- Both groups made a similar number and type of comments. Moderated participants also acknowledged the evaluator.
- Unmoderated participants made proportionately more comments that clearly helped identify usability issues (high-relevance comments; Figure 6). However, they didn’t identify more problems. The authors think there was ‘more duplication [which] constituted a stronger set of evidence for the same usability issues’ (p. 14).
Figure 6: Unmoderated participants made more comments that were highly relevant to usability: comments that decisively identified usability issues. For example: ‘Well, what I really want to do now is just to go to Google and search because this is, eh’ (Hertzum, Borlund, & Kristoffersen, 2015; p. 10).
This suggests usability professionals should consider using unmoderated tests, at least to supplement moderated tests.
So what should we think?
These studies suggest it’s a good idea to get participants to think aloud, though explicitly telling them to explain their behaviour could change it. One way around it might be to record what they do and get them to think aloud retrospectively, while watching their screen recordings.
Don’t keep completely silent during the session, or ask them to; you might only find half the number of usability problems you otherwise would.
Finally, consider unmoderated tests using a recorded think-aloud method. Not only are they economical, you might find more highly relevant comments.
Alhadreti, O., & Mayhew, P. (2017). To intervene or not to intervene: an investigation of three think-aloud protocols in usability testing. Journal of Usability Studies, 12(3), 111-132. Retrieved from https://ueaeprints.uea.ac.uk/64914/1/Accepted_manuscript.pdf
Alshammari, T., Alhadreti, O., & Mayhew, P. (2015). When to ask participants to think aloud: A comparative study of concurrent and retrospective think-aloud methods. International Journal of Human Computer Interaction, 6(3), 48-64. Retrieved from https://ueaeprints.uea.ac.uk/57466/1/IJHCI_118.pdf
Beach, P., & Willows, D. (2017). Understanding teachers' cognitive processes during online professional learning: A methodological comparison. Online Learning, 21(1), 60-84. Retrieved from https://files.eric.ed.gov/fulltext/EJ1140245.pdf
Bruun, A., & Stage, J. (2015, September). An empirical study of the effects of three think-aloud protocols on identification of usability problems. In IFIP Conference on Human-Computer Interaction (pp. 159-176). Springer, Cham. Retrieved from https://hal.inria.fr/hal-01599881/document
Cooke, L. (2010). Assessing concurrent think-aloud protocol as a usability test method: A technical communication approach. IEEE Transactions on Professional Communication, 53(3), 202-215. Retrieved from IEEE database.
Elling, S., Lentz, L., & De Jong, M. (2012). Combining concurrent think-aloud protocols and eye-tracking observations: An analysis of verbalizations and silences. IEEE Transactions on Professional Communication, 55(3), 206-220. Retrieved from IEEE database.
Hertzum, M., Borlund, P., & Kristoffersen, K. B. (2015). What do thinking-aloud participants say? A comparison of moderated and unmoderated usability sessions. International Journal of Human-Computer Interaction, 31(9), 557-570. Retrieved from https://www.researchgate.net/profile/Morten_Hertzum/publication/281369948_What_Do_Thinking-Aloud_Those_Say_A_Comparison_of_Moderated_and_Unmoderated_Usability_Sessions/links/56353f4e08aeb786b702c4cf.pdf
McDonald, S., Edwards, H. M., & Zhao, T. (2012). Exploring think-alouds in usability testing: An international survey. IEEE Transactions on Professional Communication, 55(1), 2-19. Retrieved from IEEE database.
McDonald, S., McGarry, K., & Willis, L. M. (2013). Thinking-aloud about web navigation: The relationship between think-aloud instructions, task difficulty and performance. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 57(1), 2037-2041. Los Angeles, CA: SAGE Publications. Retrieved from https://fas-web.sunderland.ac.uk/~cs0kmc/mcdonld-mcgarry-willis%20full.pdf