Genomics Track introduced in TREC 2003

The Text REtrieval Conference (TREC) is a series of evaluation workshops managed by ITL’s Information Access Division and designed to foster research on technologies for information retrieval.  Before each workshop, participants produce retrieval results for one or more focus areas called tracks; they then meet at the workshop to discuss the results.  The twelfth TREC, TREC 2003, was held November 18-21, 2003, at NIST (Gaithersburg).  TREC 2003 contained six tracks, including tracks on question answering, retrieving web documents, and eliminating redundant information in a response.  Two new tracks focused on improving baseline retrieval effectiveness.  A third new track, the genomics track, examined retrieval effectiveness when the information sought is restricted to a particular domain, using genomics data as the domain of interest.

The primary task in the genomics track was to retrieve documents describing gene function.  Systems were given a gene name and an organism (e.g., "human" or "mouse"), which was interpreted as a request to retrieve documents describing the basic biology of the gene and its protein products in the specified organism.  The motivating scenario for the task was that of a biological researcher or graduate student (someone who already has considerable domain knowledge) confronted with the need to learn about a new gene very quickly.  The document collection used for the test consisted of approximately 526,000 MEDLINE records donated to the track by the National Library of Medicine.  Twenty-five research groups participated in the track, including academic (Berkeley, Stanford, University of Maryland, University Hospital of Geneva), commercial (Erasmus MC, Tarragon Consulting Corp.), and governmental (the National Library of Medicine, the Canadian National Research Council) groups.

More information regarding TREC can be found on the TREC web site, http://trec.nist.gov.

CONTACT: Ellen Voorhees, ext. 3761