A List Apart has just published an essay on usability testing titled The Myth of Usability Testing. Drawing on a number of comparative usability tests, Robert Hoekman, Jr. concludes that usability testing is not helpful when the goal is to identify problems in a user interface.
His conclusion rests on the observation that, in comparative usability tests, different teams testing the same product often find different problems. Worse, some teams may find only a few of the critical problems. About a comparative usability test done in 2003, Hoekman writes:
Collectively, the teams reported 340 usability problems. However, only nine of these problems were reported by more than half of the teams. And a total of 205 problems—60% of all the findings reported—were identified only once. Of the 340 usability problems identified, 61 problems were classified as “serious” or “critical” problems.
Based on that, Hoekman then writes:
For the Hotmail team to have identified all of the “serious” usability problems discovered in the evaluation process, it would have to have hired all nine usability teams. In CUE-4, to spot all 61 serious problems, the Hotel Penn team would have to have hired all 17 usability teams. Seventeen!
Finally, he concludes that «usability testing fails wholly to do what many people think is its most pertinent and relevant purpose—to identify problems and point a team in the right direction».
Hoekman makes a number of great points on how to improve usability tests and on the other benefits usability testing can offer. However, his final conclusion ignores exactly why some of the teams in the comparative usability tests performed so poorly. They performed poorly not because usability testing can’t «identify problems and point a team in the right direction», but because they did not test properly.
Some of the teams found few serious problems because they didn’t test properly
The goal of the comparative usability tests mentioned in Hoekman’s essay was to identify a set of standards and best practices for usability tests. Best practices were what these studies set out to discover, not a requirement for taking part, so many of the participating teams did not employ them.
Talking about a different set of tests, Hoekman points out that poor testing yields poor results:
In one recent case, the project goal was to improve usability for a site’s new users. A card-sorting session—a perfectly appropriate discovery method for planning information architecture changes—revealed that the existing, less-than-ideal terminology used throughout the site should be retained. This happened because the team ran the card-sort with existing site users instead of the new users it aimed to entice.
In another case, a team charged with improving the usability of a web application clearly in need of an overhaul ran usability tests to identify major problems. In the end, they determined that the rather poorly-designed existing task flows should not only be kept, but featured. This team, too, ran its tests with existing users, who had—as one might guess—become quite proficient at navigating the inadequate interaction model.
In short, poor testing yields poor results. The same holds for the comparative usability tests: the teams that did not follow best practices did not get good results.
In other words, their poor results do not reveal an inherent flaw in usability testing; they are a direct consequence of poorly run tests.
Usability testing is an iterative process
There is another reason why the poor performance of some of these teams does not show that usability testing fails: a usability test is not supposed to make a product perfect after a single iteration.
You can’t test a product once, find all problems, fix them, and then be done with usability testing. Usability testing should be part of the design process. You improve usability by regularly iterating between design and test. With every iteration, you will find more problems: some that earlier tests missed, and some introduced by your latest changes.
You will never find all problems in a single session.
Research by Jakob Nielsen and Tom Landauer shows that testing with five users uncovers most usability problems, provided that the tests are done properly. But even a well-run test with five users will not find every problem, and the changes you make to fix the problems you did find may introduce new ones.
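Nielsen and Landauer’s result is usually expressed as a simple model: if each test user independently runs into any given problem with probability L, then n users are expected to reveal a share of `1 - (1 - L)**n` of all problems. The sketch below assumes L = 0.31, the average detection rate Nielsen has reported for the projects he studied; the function name `share_found` and its parameters are illustrative only. Under that assumption, five users come out at roughly 84% of all problems.

```python
def share_found(users: int, detection_rate: float = 0.31) -> float:
    """Expected share of usability problems found after testing with `users` people.

    Assumes (as in Nielsen and Landauer's model) that every problem is equally
    likely to be found and that each user independently reveals a given problem
    with probability `detection_rate`. The 0.31 default is an assumed average.
    """
    if users < 0:
        raise ValueError("users must be non-negative")
    if not 0.0 <= detection_rate <= 1.0:
        raise ValueError("detection_rate must be between 0 and 1")
    # Probability that a single problem is missed by all users, subtracted from 1.
    return 1.0 - (1.0 - detection_rate) ** users


if __name__ == "__main__":
    for n in (1, 3, 5, 10):
        print(f"{n:2d} users: {share_found(n):.0%} of problems expected to be found")
```

The curve only approaches 100% asymptotically, and real problems are not all equally easy to find, so even this idealized model never promises that a single round finds everything. That is one more reason to test repeatedly rather than once.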
Is it better to rely on expert reviews rather than actual tests?
Hoekman implies this several times, noting:
A good usability professional must be able to identify high-priority problems and make appropriate recommendations—and the best evaluators do this quickly and reliably—but a good designer must also be able to design well in the first place.
It’s natural to assume that one is capable of creating usable designs without having to test. In reality, this doesn’t seem to work. Nielsen has an interesting article on this topic titled Guesses vs. Data as Basis for Design Recommendations. In this article, he explains how even a little testing can outperform «expert opinion». He writes:
Should you offer users help to adjust font sizes or can you simply rely on the built-in browser commands? This question was recently posted to an interaction designers’ discussion group (…).
In our discussion group example, 100% of the designers who provided external data were right, whereas 25% of the designers who relied on their personal opinion were right. Most strikingly, 75% of guessers were wrong. You’d be better off tossing a coin than asking advice of these people.
A few years ago, some of my colleagues decided to run a first-experience study on one of our software packages. The purpose of such a study is to gain an understanding of what our users go through in their first hour of use. What do they experience? Where do they get stuck? How far can they get in the software? What are their learning strategies? As a side experiment, my colleagues asked several experts in the company for their expert opinion as to what problems users would run into, and compiled them into a list. (These experts included the software designers, domain experts, and the people who trained users on the software.) Then my colleagues ran their study, observing sixteen people using this software for the first time, and made a list of the problems that users actually ran into. The result?
There was not one common item on the two lists.
We repeated this experiment a year or two later on a different product, and got exactly the same result. While our experts knew a great deal, they were not good at predicting the behaviour of novice users.
In other words, even if we would rather not hear it, actual evidence outperforms expert opinion. This matches my own experience: even when I test designs created by usability experts, the tests invariably find many problems that the designers did not foresee.1
That is not to say that expert opinions are useless. The more you know about usability, the better your design will be, and the better you will be able to solve problems found in usability tests. However, experience alone cannot replace actual testing. Trust your opinion, but verify whether you were right.
There are three specific points I think are important here.
First, the data in Hoekman’s essay doesn’t show that usability testing doesn’t work; it merely shows that you get better results if you do better tests.
Second, while it is tempting to assume that we are capable of making the right decision without testing, in reality, chances are that we are not.
Finally, usability testing is not something you can do once, find all mistakes, and then never do again. It’s iterative, and it should be part of your design process, not something separate you do towards the end of a product cycle.
Update: I’ve rewritten this section for more clarity based on feedback from Björn Busch-Geertsema. Thanks!