Lessons Learned in an Informal Usability Study
(Poster presented at SIGIR - July 1997)
Dawn (Hoffman) Tice,
Laura L. Downey, firstname.lastname@example.org
|This poster examines the challenges involved in conducting an informal usability study based on the introduction of a new information retrieval system to experienced users. We present a summary of activities performed during two iterations of usability testing and describe our analysis methodology. Results of the study include lessons learned about both the users and the testing techniques.|
|The purpose of the study was three-fold: to gain experience in conducting usability testing on information retrieval systems, to specifically examine the usability of the new ZPRISE interface, and to identify problems our users were having with the assigned task (topic development for TREC).|
The NIST users (assessors) are retired information analysts from the National
Security Agency (NSA). Most of the users have been performing the topic
development and relevance assessment tasks for NIST for four years within the
TREC [Harman 1996] project parameters.
For the usability study, we chose the TREC-5 topic development task. This allowed the actual TREC-5 topic development activity and the usability study to be conducted in parallel. Each user was instructed to compose topics on any subject of interest to them prior to the usability test. They were required to provide the following information per topic: a short title, a short description of the topic, and a narrative that explained what would constitute a relevant document match.
Once the usability test began, users searched a pre-selected database for their topics. During the search they marked documents relevant to the topic and also recorded the number of relevant documents found per topic. The users performed the searches using the NIST ZPRISE system, which was installed on networked Sun workstations.
Prior to the actual usability test, users answered questions on their topic development activities. This data was not part of the usability test but was gathered to support ongoing investigations into user search behavior.
Based on traditional usability practices [Nielsen 1993], we chose a three-step process: a tutorial, observations and verbal feedback, and a satisfaction survey.
|Usability Test 1|
We conducted the first test with the following parameters:
During the second part of the test, we observed the users for 50 minutes while they navigated the system performing their topic development task. We recorded the critical incidents and user comments. When users had trouble, we encouraged them to problem-solve on their own or to consult system help or the written instructions. Users were given an additional 30 minutes to finish their topic development without observers in the room.
For the final portion of the usability test, we administered a user satisfaction survey (30-45 minutes).
|Usability Test 2|
|Based on input from the first usability test, the interface was modified. A second usability test was conducted under the same conditions as Test 1, with two of the original users and two new users.|
After conducting each iteration of the usability tests, we performed an
analysis of the results using several grouping and prioritizing methods.
We identified critical incidents, in-scope and out-of-scope factors, and
prepared estimates for code changes resulting in a final decision model.
The challenge in analyzing all the collected data was to organize it in order to identify the major system and interface issues. We gathered the following data per user:
For the next step, the observation list was shortened by consolidating like problems and separating problems attributed to training issues. We also identified out-of-scope observations such as problems related to the underlying windowing system and not to the interface itself.
At this point we incorporated the data from the satisfaction survey that was relevant to the identified usability issues in order to combine all the observed and perceived problems. We also created two other lists: a set of positive comments about the system and a list of users' suggestions for future enhancements.
As the final step in organizing the usability problem matrix, we categorized the items into several sub-groups such as problems relating to messages in the interface or information organization. We then assigned high, medium and low priorities to the problems. With the development team, we proposed usability solutions and discussed the cost/benefit of each, resulting in a set of action items and estimates for changes to the interface.
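The organizing steps described above (consolidating like problems, setting aside training and out-of-scope items, grouping by category, and assigning priorities) can be sketched roughly in code. This is only an illustrative sketch, not the procedure or tooling used in the study: the `Observation` record, the `build_problem_matrix` function, the `kind` labels, and all sample entries are hypothetical and invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass

# Sort order for the three priority levels named in the text.
PRIORITY_ORDER = {"high": 0, "medium": 1, "low": 2}


@dataclass(frozen=True)
class Observation:
    """One observed problem (hypothetical record layout)."""
    description: str
    category: str             # e.g. "messages", "information organization"
    priority: str             # "high", "medium", or "low"
    kind: str = "interface"   # "interface", "training", or "out-of-scope"


def build_problem_matrix(observations):
    """Group in-scope interface problems by category, highest priority first."""
    matrix = defaultdict(list)
    seen = set()
    for obs in observations:
        if obs.kind != "interface":
            continue  # training and out-of-scope items are kept on separate lists
        key = (obs.category, obs.description.lower())
        if key in seen:
            continue  # consolidate like problems reported by several users
        seen.add(key)
        matrix[obs.category].append(obs)
    for items in matrix.values():
        items.sort(key=lambda o: PRIORITY_ORDER[o.priority])
    return dict(matrix)


# Invented sample data, for illustration only.
observations = [
    Observation("Error message wording unclear", "messages", "medium"),
    Observation("error message wording unclear", "messages", "medium"),
    Observation("Status message hidden behind window", "messages", "high"),
    Observation("Result list hard to scan", "information organization", "low"),
    Observation("Window manager focus problem", "windowing", "high", kind="out-of-scope"),
    Observation("User unsure how to start a search", "navigation", "medium", kind="training"),
]

matrix = build_problem_matrix(observations)
for category, items in matrix.items():
    print(category, [(o.priority, o.description) for o in items])
```

Keeping the training and out-of-scope items out of the matrix mirrors the separation described in the text; in practice those items would go onto their own lists rather than being discarded.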
Changes were made to the interface and the second usability test was conducted. In order to be able to compare and contrast the results from Test 1 and Test 2, we used the same basic analysis technique for Test 2.
During comparative analysis, we were concerned with two major questions. Had we minimized or eliminated the problems identified in Test 1? And had any new or previously unreported problems appeared in Test 2, especially ones that might have been introduced by the changes?
To account for the differences in users and in the number of users in each test, we compared the categories of usability issues from the two tests rather than the raw counts of observed problems.
Usability issues were identified in 19 categories in Test 1 and 14 categories in Test 2. When we compared the categories we found 11 in common. This translates to elimination of 8 groups of usability issues between Test 1 and Test 2 and the identification of three new groups in Test 2.
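The category arithmetic above is a simple set comparison, which the following sketch makes explicit. The category labels are placeholders: the poster reports only the counts (19, 14, and 11 in common), not the category names, so the names below are assumptions made purely for illustration.

```python
# Placeholder labels; only the counts come from the study.
common = {f"common_{i}" for i in range(1, 12)}                # 11 shared categories
test1 = common | {f"test1_only_{i}" for i in range(1, 9)}     # 19 categories in Test 1
test2 = common | {f"test2_only_{i}" for i in range(1, 4)}     # 14 categories in Test 2

shared = test1 & test2          # categories seen in both tests
eliminated = test1 - test2      # categories that disappeared after the changes
new_in_test2 = test2 - test1    # categories that first appeared in Test 2

print(len(shared), len(eliminated), len(new_in_test2))  # prints: 11 8 3
```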
We then analyzed the 11 common groups and the three new groups. We found that they could roughly be classified into two major divisions - navigation issues and conceptual issues. Navigation issues included widget co-location, size, placement, and existence. Conceptual issues primarily revolved around the definition and function of relevance feedback including the use and utility of enhanced query terms.
Mixed objectives made it difficult to collect and organize the data. During Test 1 we struggled with several classification schemes before deciding on a useful strategy. In the end, the final data organization and analysis were made easier by repeatedly examining the data from several different perspectives during Test 1.
Our first goal in conducting the informal usability study was to gain
experience in usability testing on information retrieval systems. First and
foremost, we learned that performing several activities in tandem can lead to
confusion between tasks and more difficult analysis of results. The users
were performing actual TREC topic development and in turn we were testing
the new interface for general usability while also testing this general use
interface on a specific task. Often the lines became blurred.
The second goal of identifying and correcting the problems related to our general use ZPRISE interface was relatively straightforward. During analysis, we identified navigation and conceptual difficulties which were corrected and retested. The poster session will explore these in more detail.
The third goal of identifying problems our assessors were having with the TREC task became the most complex (and interesting) of the three goals. This section will mainly concentrate on lessons learned in that area.
First, as in most usability studies [Koeneman 1994], we identified the typical user issues: