3.1.4 Comparison of Information Retrieval Systems

David L. Banks, Eric Lagergren, Nien-Fan Zhang

Statistical Engineering Division, ITL

Donna Harman, Walter Liggett, Paul Over, Ellen Voorhees

Information Access and User Interfaces Division, ITL

For many years, NIST has been a leader in text-retrieval research. This domain poses a number of statistical problems, and as we have built on previous progress, the remaining unsolved problems have grown increasingly difficult. The pressing issue now is to compare the performance of different document-retrieval systems across a range of retrieval topics. A solution in this area will enable the many universities and companies that have designed document browsers to achieve a high-level understanding of the strengths and weaknesses of their products, and will point the way to new improvements.

In the comparison problem, document-retrieval system i assigns a rank to document j for the kth retrieval topic; call this rank X_ijk. From this information, one wants to determine which topics are intrinsically hard or easy, and how those topics might be identified a priori. Once one has some meaningful clusters of topics, one then wants to find out which browsers work best with which clusters.
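As a concrete illustration of this setup, the following minimal Python sketch stores the ranks X_ijk in a dictionary keyed by (system, document, topic) and summarizes each topic by the mean rank its relevant documents receive. The summary measure, the function name `mean_relevant_rank_by_topic`, and the toy data are assumptions made for illustration; they are not the measure used in the study.

```python
from collections import defaultdict
from statistics import mean


def mean_relevant_rank_by_topic(ranks, relevant):
    """Average the ranks given to relevant documents, per topic.

    ranks:    {(system, document, topic): rank}, i.e. X_ijk
    relevant: {topic: set of relevant document ids}
    A larger mean rank suggests (under this illustrative measure)
    that the systems found the topic harder.
    """
    by_topic = defaultdict(list)
    for (system, doc, topic), rank in ranks.items():
        if doc in relevant.get(topic, set()):
            by_topic[topic].append(rank)
    return {topic: mean(values) for topic, values in by_topic.items()}


# Toy data, invented for illustration only.
ranks = {
    ("sysA", "d1", "t1"): 1, ("sysA", "d2", "t1"): 2,
    ("sysB", "d1", "t1"): 2, ("sysB", "d2", "t1"): 1,
    ("sysA", "d3", "t2"): 5, ("sysA", "d4", "t2"): 9,
    ("sysB", "d3", "t2"): 7, ("sysB", "d4", "t2"): 3,
}
relevant = {"t1": {"d1", "d2"}, "t2": {"d3", "d4"}}

difficulty = mean_relevant_rank_by_topic(ranks, relevant)
print(difficulty)
```

In this toy example, topic t2 has the larger mean rank, so it would be flagged as the harder of the two topics.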

Our first examination used straightforward analysis of variance techniques, especially Mandel's `bundle-of-lines' procedure, which overcomes the problem of testing for interaction when there is only one observation for each combination of factor levels. The results were definitive: there are interaction effects between topic and browser, and these effects are strongest for the most difficult topics.
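The idea behind a bundle-of-lines analysis can be sketched as follows: in a two-way table with one observation per cell, each row (here, a system) is regressed on the column (topic) means. In a purely additive table every slope equals 1, so slopes that spread away from 1 point to row-by-column interaction. The Python sketch below computes only these slopes; it omits the formal test, and the function name and toy tables are assumptions for illustration rather than the analysis actually run.

```python
from statistics import mean


def bundle_of_lines_slopes(table):
    """Least-squares slope of each row regressed on the column means.

    table[i][k] holds the single observation for row i (system)
    and column k (topic). Slopes all equal to 1 are consistent with
    an additive model; unequal slopes suggest interaction.
    """
    n_cols = len(table[0])
    col_means = [mean(row[k] for row in table) for k in range(n_cols)]
    grand_mean = mean(col_means)
    centered = [c - grand_mean for c in col_means]
    sxx = sum(c * c for c in centered)

    slopes = []
    for row in table:
        row_mean = mean(row)
        sxy = sum(c * (y - row_mean) for c, y in zip(centered, row))
        slopes.append(sxy / sxx)
    return slopes


# Toy tables, invented for illustration only.
additive = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
interacting = [[1, 2, 3], [1, 3, 5]]

additive_slopes = bundle_of_lines_slopes(additive)
interacting_slopes = bundle_of_lines_slopes(interacting)
print(additive_slopes, interacting_slopes)
```

Plotting each row's fitted line against the column means gives the "bundle" that lends the procedure its name: parallel lines for an additive table, a fan of lines when the rows respond differently to column difficulty.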

This result suggested several analytical strategies, including block-cluster analysis, latent variable modelling (based on item response analysis in educational testing theory), and statistical models for rank-valued data. These analyses are at different stages of completion, but the emerging consensus is that none of them will enable a comprehensive solution to the broad problem.

We have developed an analysis based on a new graphical tool called the beadplot, shown in Figure 4. In the beadplot, the most relevant document is colored dark red, and less relevant documents are given colors that progress through the visible spectrum towards purple. One can therefore see in the beadplot how each system rates the documents in relevance (white spots correspond to irrelevant documents). It is also easy to identify groups of documents that tend to be retrieved together; some systems retrieve them early and assign them low ranks, whereas others assign them higher ranks or miss the documents entirely. Based on insights from these plots, we now represent each topic in a three-dimensional space whose axes reflect how easy it is to capture relevant documents, how many red herrings the document pool contains, and how much the systems differ in success for that topic.


Figure 4: Beadplots show the rank at which each relevant document was retrieved by each of the text-retrieval systems. The rows correspond to the retrieval system, and the colored dots correspond to documents. Dots of the same color indicate the same document, and the order and spacings along the row indicate the ranks the documents were assigned.
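The layout described in the caption can be approximated in plain text. In the minimal Python sketch below, each row is a system, each relevant document is a one-character label standing in for a bead color, and the character position is the rank the system assigned; a dot stands in for the white spots (irrelevant documents). The function name `text_beadplot` and the toy ranks are assumptions for illustration, not the tool used to produce the figure.

```python
def text_beadplot(ranks_by_system, max_rank):
    """Render one text row per system.

    ranks_by_system: {system: {document label: rank}}, with
                     one-character labels and ranks starting at 1.
    max_rank:        number of rank positions to draw.
    """
    width = max(len(name) for name in ranks_by_system)
    lines = []
    for system, doc_ranks in ranks_by_system.items():
        row = ["."] * max_rank  # '.' marks a non-relevant position
        for label, rank in doc_ranks.items():
            if 1 <= rank <= max_rank:
                row[rank - 1] = label
        lines.append(f"{system:<{width}} {''.join(row)}")
    return lines


# Toy ranks, invented for illustration only.
toy = {
    "sysA": {"a": 1, "b": 2, "c": 5},
    "sysB": {"a": 4, "b": 1},  # sysB never retrieves document c
}
plot_lines = text_beadplot(toy, max_rank=6)
print("\n".join(plot_lines))
```

Reading down a column of labels shows whether the same documents are retrieved early by every system, and a label missing from a row shows a document that system failed to retrieve.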


Date created: 7/20/2001
Last updated: 7/20/2001