ITL hosts Text REtrieval Conference (TREC) 2004
The Information Access Division hosted the thirteenth annual meeting of NIST's
Text REtrieval Conference (TREC) November 16-19, 2004.
TREC serves the information retrieval research community and advances the
state of the art in language technologies by providing the infrastructure for
large-scale evaluation of those technologies, and an open forum for their
discussion. This year, TREC "tracks" included question answering, web search,
robust retrieval, novelty detection in documents, and genomics.
Over one hundred groups from twenty-eight countries participated in the
evaluation, representing organizations in academia, industry, and government.
One new track this year, the Terabyte Track, focused on very large scale
retrieval. While systems exist in the field that operate with terabytes of
data, the methodologies for evaluating the performance of those systems are
not very mature; the goal of the Terabyte Track is to advance our evaluation
methods to that scale and beyond. This year's test collection was an
exhaustive crawl of publicly available web pages contained in the ".gov"
domain. While it did not add up to a full terabyte, the approximately 25
million web pages totaling nearly 500 gigabytes did pose a challenge to the
researchers and have allowed us to begin to see how the evaluation scales up.
One unexpected outcome of
the track was that several research teams took the opportunity to reimplement
their systems to handle large-scale collections, and to release those systems
as open source. Now the community has more tools available for searching
larger collections of text.
Planning for TREC 2005 has already begun, with new tracks in enterprise web
search and e-mail spam detection.
Contact: Ellen Voorhees, ext. 3761