Ruining Usability Tests

It’s almost impossible to entirely screw up usability tests. No matter how poorly you do them, you’re likely to get some useful information out of them. Nevertheless, if you do them better, you get better results, so it’s useful to keep some common sources of problems in mind. Here are two easy-to-remember concepts you should consider when doing usability tests.

The Clever Hans effect

Wilhelm von Osten was a German math teacher who lived around the turn of the 20th century. Besides being a math teacher, he was also an amateur horse trainer. Obviously (to him), the two went well together, because he decided to teach math to his horse. Clever Hans, his horse, quickly learned to do a number of complex calculations: the horse could add, subtract, multiply, divide, do date calculations, and even understand German. It would tap out the answer to any math question with its hoof. It could even read questions written on a piece of paper and give the correct answers.

Of course, when psychologist Oskar Pfungst investigated the horse, he quickly figured out what was happening. The horse didn’t understand German, couldn’t calculate, and couldn’t read. Instead, it responded to involuntary cues in the body language of Wilhelm von Osten, who, in turn, solved the math problems for his horse. Von Osten was completely unaware that he was providing these cues to the horse.

Later experiments by Oskar Pfungst made it clear that the same effect occurs between humans, and furthermore, that it is extremely hard to suppress these involuntary cues even if you are aware of their existence. In other words, it’s easy to unintentionally lead people through a usability test if you’re standing near them. This is why formal usability tests are often done behind one-way mirrors. Isolating the test subject from the people conducting the test removes most of this influence.

Now, we are (mostly) not doing statistically valid double-blind studies. We are merely trying to find problems with our interaction design, and it typically doesn’t matter much whether you influence the test subject; you’ll probably get useful data regardless, and a quick over-the-shoulder usability test is, of course, far more useful than no usability test at all. However, if you have the time but don’t have a usability lab, it’s often a good idea to use a recording program like Silverback and simply leave the room to avoid undue influence on the test subject.

This also solves another problem: if you’re standing next to a person who is blundering through your application, it’s sometimes almost impossible not to interfere and explain how to do something. And the more nervous you get, the more you upset the test subject, making it even less likely that they’ll figure out what to do. By leaving the room, you spare yourself the agony of holding back.


Terminology

Jakob Nielsen points out another common source of problems when doing usability tests: terminology. If you do task-based usability tests and phrase your tasks using words that also appear in your application, you give your test subjects very obvious hints about how to complete those tasks. Nielsen writes:

For example, if you want to test whether people can find and use Excel’s «Remove Duplicates» feature, you should not tell them: «You have a list of companies that have previously purchased your product, but some company names appear multiple times. Remove these duplicates.» Given this task wording, users will often scan the UI for a label containing the words «remove» and/or «duplicates.» Thus, you’re not testing whether the label effectively communicates the command’s functionality, nor are you testing the communicative benefits of combining the command name with its corresponding icon and tooltip. You’re simply testing whether users can match up the terms.

It’s better to avoid outlining the specific task at all, and instead point out the problem and ask them to find a solution: «Some of the names in this list appear twice, but you want each name to appear only once. Use Excel to fix this problem.»

Nielsen’s essay contains more information on terminology, and is definitely worth reading.
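
If you write down your tasks before a test, a rough check for this kind of leakage can be automated. The following is only a minimal sketch, not something from Nielsen’s essay: the function names, the stop word list, and the label list (apart from «Remove Duplicates») are illustrative assumptions, and it only catches exact word matches.

```python
import re

# Hypothetical stop word list: common words that are not meaningful hints.
STOP_WORDS = {"a", "an", "and", "the", "to", "of", "in", "this", "these",
              "you", "your", "but", "some", "have", "that"}


def words(text):
    """Lowercase the text and return the set of alphabetic words in it."""
    return set(re.findall(r"[a-z]+", text.lower()))


def leaked_terms(task, ui_labels):
    """Return the words in the task wording that also appear in a UI label."""
    label_words = set()
    for label in ui_labels:
        label_words |= words(label)
    return sorted((words(task) & label_words) - STOP_WORDS)


# Illustrative labels; replace them with the labels of your own application.
ui_labels = ["Remove Duplicates", "Sort", "Filter", "Text to Columns"]

leading_task = ("You have a list of companies that have previously purchased "
                "your product, but some company names appear multiple times. "
                "Remove these duplicates.")
neutral_task = ("Some of the names in this list appear twice, but you want "
                "each name to appear only once. Use Excel to fix this problem.")

print(leaked_terms(leading_task, ui_labels))  # ['duplicates', 'remove']
print(leaked_terms(neutral_task, ui_labels))  # []
```

A check like this won’t notice synonyms or different word forms («duplicate» versus «duplicates»), so treat it as a reminder to reread your task wording, not as a replacement for doing so.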

