David L. Banks, Eric Lagergren, Nien-Fan Zhang
Statistical Engineering Division, ITL
Donna Harman, Walter Liggett, Paul Over, Ellen Voorhees
Information Access and User Interfaces Division, ITL
For many years, NIST has been a leader in text-retrieval research. This domain poses a number of statistical problems, and as we have built on previous progress, the remaining unsolved problems have grown increasingly difficult. The pressing issue now is to compare the performance of different document-retrieval systems across a range of retrieval topics. A solution in this area will enable the many universities and companies that have designed document browsers to achieve a high-level understanding of the strengths and weaknesses of their products, and will point the way to new improvements.
In the comparison problem, document-retrieval system i assigns a rank to document j for the kth retrieval topic; call this rank Xijk. From this information, one wants to determine which topics are intrinsically hard or easy, and how those topics might be identified a priori. Once meaningful clusters of topics are in hand, one then wants to find out which browsers work best with which clusters.
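A minimal sketch of how such rank data might be organized and summarized. The array layout, the choice of which documents count as relevant, and the difficulty proxy (mean rank that systems assign to relevant documents) are all illustrative assumptions for this sketch, not the analysis used in the study:

```python
import numpy as np

# Hypothetical ranks X[i, j, k]: system i's rank for document j on topic k.
# A lower rank means the document was retrieved earlier.
rng = np.random.default_rng(0)
n_systems, n_docs, n_topics = 4, 50, 3
X = rng.permuted(
    np.tile(np.arange(1, n_docs + 1), (n_systems, n_topics, 1)), axis=2
).transpose(0, 2, 1)  # shape (systems, docs, topics)

# Suppose (purely for illustration) that the first 10 documents are the
# relevant ones for every topic.
relevant = np.zeros(n_docs, dtype=bool)
relevant[:10] = True

# One crude difficulty proxy: the mean rank assigned to relevant documents,
# averaged over systems.  A higher mean rank suggests a harder topic.
difficulty = X[:, relevant, :].mean(axis=(0, 1))
print(difficulty)  # one number per topic
```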
Our first examination used straightforward analysis-of-variance techniques, especially Mandel's "bundle-of-lines" procedure, which overcomes the problem of testing for interaction when there is only one observation for each combination of factor levels. The results were definitive: there are interaction effects between topic and browser, and these effects are strongest for the most difficult topics.
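The interaction test can be sketched as follows. This is a generic implementation of Mandel's row-linear ("bundle-of-lines") model, not the study's actual code: each row (system) is fit as a line in the estimated column (topic) effects, and unequal slopes signal interaction even though each cell holds a single observation.

```python
import numpy as np

def bundle_of_lines(x):
    """Mandel's bundle-of-lines test for interaction with one
    observation per cell.

    x : (r, c) array; rows might be retrieval systems, columns topics.
    Fits each row as a line in the column effects and returns the
    F statistic (with its degrees of freedom) for the hypothesis that
    all row slopes equal 1, i.e. no concurrent interaction.
    """
    r, c = x.shape
    row_means = x.mean(axis=1, keepdims=True)
    col_dev = x.mean(axis=0) - x.mean()      # estimated column effects
    sxx = (col_dev ** 2).sum()

    # Least-squares slope of each row against the column effects.
    slopes = ((x - row_means) * col_dev).sum(axis=1) / sxx

    ss_slopes = sxx * ((slopes - 1.0) ** 2).sum()    # df = r - 1
    fitted = row_means + slopes[:, None] * col_dev
    ss_resid = ((x - fitted) ** 2).sum()             # df = (r - 1)(c - 2)
    df1, df2 = r - 1, (r - 1) * (c - 2)
    F = (ss_slopes / df1) / (ss_resid / df2)
    return F, df1, df2
```

A value of F that is large relative to the F(df1, df2) reference distribution indicates that the rows respond to the column effects with different slopes, which is exactly the topic-by-browser interaction pattern described above.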
This result suggested several analytical strategies, including block-cluster analysis, latent variable modelling (based on item response analysis in educational testing theory), and statistical models for rank-valued data. These analyses are at different stages of completion, but the emerging consensus is that none of them will enable a comprehensive solution to the broad problem.
We have developed an analysis based on a new graphical tool called the beadplot, shown on the following page. In the beadplot, the most relevant document is colored dark red, and less relevant documents are given colors that tend towards purple, according to the visible spectrum. Thus one can see in the beadplot how each system rates the documents in relevance (white spots correspond to irrelevant documents). This makes it easy to identify groups of documents that tend to be retrieved together; some systems retrieve them early and assign them low ranks, whereas others miss the documents entirely or assign them higher ranks.

Based on insights from these plots, we now represent each topic in a three-dimensional space whose axes reflect how easy it is to capture relevant documents, how many red herrings the document pool contains, and how much the systems differ in success on that topic.
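Coordinates along three such axes can be approximated from the rank data alone. The proxies below (mean reciprocal rank for ease, early-ranked non-relevant documents for red herrings, between-system dispersion for disagreement) are illustrative assumptions for this sketch, not the measures used in the study:

```python
import numpy as np

def topic_coordinates(ranks, relevant):
    """Place one topic in a three-dimensional space.

    ranks    : (n_systems, n_docs) ranks assigned by each system
               (lower = retrieved earlier).
    relevant : boolean mask marking the relevant documents.
    Returns (ease, red_herrings, disagreement), hypothetical proxies:
      ease         = mean reciprocal rank of the relevant documents,
      red_herrings = fraction of non-relevant documents that some
                     system ranked inside the top 10,
      disagreement = std. dev. across systems of the mean relevant rank.
    """
    rel = ranks[:, relevant]
    ease = (1.0 / rel).mean()
    nonrel = ranks[:, ~relevant]
    red_herrings = (nonrel <= 10).any(axis=0).mean()
    disagreement = rel.mean(axis=1).std()
    return ease, red_herrings, disagreement
```

Topics that land near each other in this space are candidates for the clusters discussed earlier, and the third axis directly measures the system-by-topic interaction that motivated the representation.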
Figure 4: Beadplots show the rank at which each relevant document was retrieved by each of the text-retrieval systems. The rows correspond to the retrieval systems, and the colored dots correspond to documents. Dots of the same color indicate the same document, and the order and spacings along each row indicate the ranks the documents were assigned.
Date created: 7/20/2001