TIPSTER Text Program: A multi-agency, multi-contractor program
Date created: Monday, 31-Jul-00
The TIPSTER Text Program was a Defense Advanced Research Projects Agency (DARPA) led government effort to advance the state of the art in text processing technologies through the cooperation of researchers and developers in Government, industry, and academia. The resulting capabilities were deployed within the intelligence community to provide analysts with improved operational tools. Due to lack of funding, the program formally ended in the Fall of 1998.
DARPA, the Department of Defense (DoD), and the Central Intelligence Agency (CIA) jointly funded and managed the program in close collaboration with the National Institute of Standards and Technology (NIST) and the Space and Naval Warfare Systems Center (SPAWAR, or SSC), formerly NCCOSC/NRaD. A TIPSTER Advisory Board was formed in 1998 with members representing users from other Government agencies interested in automated text processing, such as the Department of Energy (DOE), the Federal Bureau of Investigation (FBI), the Internal Revenue Service (IRS), the National Science Foundation (NSF), and the Treasury Department.
To improve the efficiency and cost effectiveness of document processing, TIPSTER focused on three underlying technologies: Document Detection, Information Extraction, and Summarization.
These three capabilities formed the basis for nearly all other information handling tasks.
TIPSTER Phase I
During the first phase of TIPSTER research (1991-1994), participants made major advances both in the algorithms for document detection and information extraction and in the techniques for measuring those advances, through activities such as the Message Understanding Conferences (MUC) and the Text Retrieval Conferences (TREC). Document Detection technologies improved Recall from roughly 30% to as high as 75%, and the processing of natural language queries also improved significantly. In Information Extraction, Recall increased from roughly 49% to 65% and Precision from 55% to 59%, and dramatic gains were made in the ability to automatically identify a wide range of items such as personal and organizational names, dates, locations, times, and phone numbers.
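Recall and Precision, the measures quoted for Phase I, are simple ratios: Recall is the fraction of all relevant items that a system found, and Precision is the fraction of the items it returned that were actually relevant. The following is a minimal sketch assuming set-based relevance judgments for a single query; the function name `recall_precision` and the document IDs are hypothetical, and the official TREC and MUC scoring used more elaborate procedures than this.

```python
def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one query.

    Hypothetical helper for illustration only.
    retrieved: IDs of the documents the system returned
    relevant:  IDs of the documents judged relevant
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents the system actually found
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


# Hypothetical query: 20 relevant documents exist; the system returns 10, of which 6 are relevant.
relevant_docs = [f"doc{i}" for i in range(20)]
retrieved_docs = [f"doc{i}" for i in range(6)] + [f"other{i}" for i in range(4)]
r, p = recall_precision(retrieved_docs, relevant_docs)
print(f"recall={r:.0%} precision={p:.0%}")  # recall=30% precision=60%
```

In this toy case a Recall of 30% means the system found 6 of the 20 relevant documents; raising it toward 75%, as the Phase I figures describe, means finding far more of them without flooding the results with irrelevant ones.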
TIPSTER Phase II
During the second phase (April 1994-September 1996), the TIPSTER research and development community turned its attention to creating a software architecture in order to standardize the technology components, enable "plug and play" capabilities among the various tools being developed, and permit the sharing of software among the participants. As funding permitted, the architecture continued to evolve based on feedback from the researchers, developers, and users of the existing prototype and implementation systems.
The Multilingual Entity Task (MET) developed Chinese and Japanese training collections with over 300 documents in each language. The task was initially confined to Named Entity extraction and the development of a variety of tools, such as a word boundary finder, part-of-speech-tagged Chinese lexicons, and dictionaries.
Various research projects and demonstration systems in support of Document Detection and Information Extraction were also completed.
TIPSTER Phase III
Phase III started in October 1996 and continued to build on Phase I and II achievements with new projects in supporting research, development, and evaluation. Summarization was also added as a fundamental task area. See Phase III Overview.