ITL hosts Text REtrieval Conference (TREC) 2004

 

The Information Access Division hosted the thirteenth annual meeting of NIST's Text REtrieval Conference (TREC) November 16-19, 2004.  TREC serves the information retrieval research community and advances the state of the art in language technologies by providing the infrastructure for large-scale evaluation of those technologies, and an open forum for their discussion.  This year, TREC "tracks" included question answering, web search, robust retrieval, novelty detection in documents, and genomics.  Over one hundred groups from twenty-eight countries participated in the evaluation, representing organizations in academia, industry, and government.

 

One new track this year, the Terabyte Track, focuses on very-large-scale retrieval.  While systems in the field already operate on terabytes of data, the methodologies for evaluating their performance are not yet mature; the goal of the Terabyte Track is to extend our evaluation methods to that scale and beyond.  This year's test collection was an exhaustive crawl of publicly available web pages in the ".gov" domain.  Although the collection fell short of a full terabyte, its approximately 25 million web pages, totaling nearly 500 gigabytes, posed a challenge to the researchers and allowed us to begin to see how the evaluation scales up.  One unexpected outcome of the track was that several research teams took the opportunity to reimplement their systems to handle large-scale collections and to release those systems as open source.  As a result, the community now has more tools available for searching larger collections of text.

 

Planning for TREC 2005 has already begun, with new tracks in enterprise web search and e-mail spam detection.

 

Contact:  Ellen Voorhees, ext. 3761