National Institute of Standards and Technology
TIPSTER Text Program: a multi-agency, multi-contractor program

Date created: Monday, 31-Jul-00

Glossary of Terms


- A -

Abstract - A document summary which succinctly captures the significant concepts in the document. Abstracts are usually prepared by humans. See summarization.

Annotation - The additional information associated with a document or a collection. Under the TIPSTER concept, annotations are the principal means by which components pass data to one another. Annotations are usually the result of extraction processes; however, users may also create annotations. See the Architecture Design document.

Attribute - A characteristic of a collection, document or annotation represented by a single value or set of values.
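The Annotation and Attribute entries describe data that one component attaches to a document and another component reads back. The minimal Python sketch below assumes an annotation is a type, a character span, and a dictionary of attributes; the class and method names (`Annotation`, `Document`, `annotate`, `spans`) are hypothetical and are not taken from the TIPSTER Architecture Design document.

```python
from dataclasses import dataclass, field

# Hypothetical data model for illustration only; not the TIPSTER Architecture API.

@dataclass
class Annotation:
    kind: str                  # annotation type, e.g. "person" or "organization"
    start: int                 # offset of the first annotated character
    end: int                   # offset just past the last annotated character
    attributes: dict = field(default_factory=dict)  # attribute name -> value(s)


@dataclass
class Document:
    doc_id: str
    text: str
    annotations: list = field(default_factory=list)

    def annotate(self, kind, start, end, **attributes):
        """Attach a new annotation to this document and return it."""
        ann = Annotation(kind, start, end, dict(attributes))
        self.annotations.append(ann)
        return ann

    def spans(self, kind):
        """Return the text covered by every annotation of the given type."""
        return [self.text[a.start:a.end] for a in self.annotations if a.kind == kind]


# An extraction component writes annotations...
text = "John Smith is the president of Big Linguistics, Inc."
doc = Document("doc-1", text)
doc.annotate("person", 0, len("John Smith"), source="name-finder")
org = "Big Linguistics, Inc."
org_start = text.find(org)
doc.annotate("organization", org_start, org_start + len(org), source="name-finder")

# ...and a downstream component reads them without re-analyzing the text.
print(doc.spans("organization"))  # ['Big Linguistics, Inc.']
```

Storing offsets rather than copied strings keeps annotations tied to the original text, so several components can layer their results on the same document.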

- C -

Collection - A group of documents, usually with some characteristic(s) in common. Under TIPSTER the implementation of a Collection is broad: a Collection may be the actual documents (text) or a list of document identifiers (IDs). A document may appear in more than one Collection.
Component - A major piece of code in the TIPSTER concept. Equivalent to a Computer System Component (CSC) in conventional life-cycle definitions. Example: a detection component. See also Module.
Coreference - An alternate reference to an entity. Example: "John Smith is the president of Big Linguistics, Inc. He had a problem with the board of directors. Eventually the board decided the president should be replaced." Here "He" and "the president" are coreferences to John Smith, and "the board" is a coreference to "the board of directors."
Corpus - All the documents in the domain of interest.

- D -

Detection Need - A statement that specifies the user's criteria for selecting documents from a Corpus. Under TIPSTER a Detection Need may contain any or all of the following: keywords, Boolean terms, free text describing a document or concept, and examples of desired or undesired documents. Interpretation of a Detection Need results in a query, which may be quite complex in structure. See Query.
Document Detection - The selection of one or more documents that meet a Detection Need or Query. Equivalent to the older term Information Retrieval.
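A toy sketch of the path from Detection Need to Query to Document Detection, assuming a need reduced to Boolean keywords (required and excluded terms). The names `DetectionNeed`, `to_query`, and `detect` are hypothetical; a real Detection Need may also contain free text and example documents, which this sketch ignores.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, keyword-only Detection Need.
@dataclass
class DetectionNeed:
    required: list                                 # every term must appear (Boolean AND)
    excluded: list = field(default_factory=list)   # no term may appear (Boolean NOT)


def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def to_query(need):
    """Translate the need into this toy engine's query format: two term sets."""
    return ({t.lower() for t in need.required}, {t.lower() for t in need.excluded})


def detect(corpus, query):
    """Document Detection: test many documents against one query."""
    required, excluded = query
    hits = []
    for doc_id, text in corpus.items():
        terms = tokenize(text)
        if required <= terms and not (excluded & terms):
            hits.append(doc_id)
    return hits


corpus = {
    "d1": "The board replaced the president of Big Linguistics.",
    "d2": "Big Linguistics announced quarterly earnings.",
    "d3": "The board met to discuss earnings.",
}
need = DetectionNeed(required=["board"], excluded=["earnings"])
print(detect(corpus, to_query(need)))  # ['d1']
```

Because `detect` loops over the whole corpus with a single query, it illustrates the many-documents, one-query case.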

- E -

Extraction - The selection of specific types of information from text, e.g., person names, place names, companies, organizations, or relationships between text entities. See Information Extraction.

- F -

Fill Rules - The criteria that describe the constraints used to select information for template slots and the conditions under which Template Objects are instantiated. See the MUC guidelines.

- G -

Graphical User Interface (GUI) - Graphical interfaces are not part of the TIPSTER Architecture; however, they are usually necessary for applications. TIPSTER components frequently interface to a GUI.

- I -

Information Extraction - Same as 'text extraction'. The selection of specific types of information from text, e.g., person names, place names, companies, organizations, temporal data, currency data, other entities, coreferences, and relationships between entities. The last two (coreferences and relationships) are more difficult to extract. The usual objective of extraction is to build databases that are more suitable than free text for querying, e.g., using SQL.
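To illustrate the database objective of extraction, the sketch below fills a SQLite table from one hard-coded regular expression and then queries it with SQL. The pattern, the table name `president_of`, and its columns are assumptions made for illustration; only Python's standard `re` and `sqlite3` modules are used, and real extraction systems rely on much richer rules and knowledge bases.

```python
import re
import sqlite3

# Hypothetical extraction rule for sentences of the form
# "<First Last> is the president of <Organization>".
PRESIDENT_OF = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) is the president of "
    r"(?P<org>[A-Z][a-z]+(?: [A-Z][a-z]+)*(?:, Inc\.)?)"
)


def extract(doc_id, text):
    """Return one (doc_id, person, org) row per match in the text."""
    return [(doc_id, m["person"], m["org"]) for m in PRESIDENT_OF.finditer(text)]


docs = {
    "d1": "John Smith is the president of Big Linguistics, Inc. He had a problem with the board.",
    "d2": "Mary Jones is the president of Acme Widgets. She joined last year.",
}

# Load the extracted facts into a table so they can be queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE president_of (doc_id TEXT, person TEXT, org TEXT)")
for doc_id, text in docs.items():
    conn.executemany("INSERT INTO president_of VALUES (?, ?, ?)", extract(doc_id, text))

rows = conn.execute(
    "SELECT person, doc_id FROM president_of WHERE org LIKE ?", ("%Linguistics%",)
).fetchall()
print(rows)  # [('John Smith', 'd1')]
```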

- K -

Knowledge Base - Files or lists of static information used in natural language processing, such as gazetteers, part-of-speech word lists, grammar rules, document structures, SGML tag sets, stemming lists, stop word lists, abbreviation lists, and dictionaries. These items are frequently domain dependent.
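A small sketch of two knowledge-base items, a stop word list and a gazetteer, being applied to text. The contents of `STOP_WORDS` and `GAZETTEER` are hypothetical stand-ins; real lists are far larger and often domain dependent.

```python
import re

# Hypothetical, tiny knowledge-base entries.
STOP_WORDS = {"the", "of", "a", "an", "in", "to", "and", "is"}
GAZETTEER = {"paris": "city", "france": "country", "gaithersburg": "city"}


def content_terms(text):
    """Lowercase word tokens with stop words removed, in text order."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def place_names(text):
    """Tokens found in the gazetteer, paired with their place type."""
    return [(t, GAZETTEER[t]) for t in content_terms(text) if t in GAZETTEER]


sentence = "The office is in Gaithersburg, not in Paris."
print(content_terms(sentence))  # ['office', 'gaithersburg', 'not', 'paris']
print(place_names(sentence))    # [('gaithersburg', 'city'), ('paris', 'city')]
```

Keeping the lists as data, separate from the functions that use them, is what allows a different domain to swap in its own lists.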

- M -

Module - Equivalent to a Computer System Unit (CSU) in the conventional life-cycle definition. A CSU is an element specified in the design of a CSC that is separately testable. A parser is an example of a module. Modules are used to build components. Generally, a module contains no more than 300 lines of code.
Multi-lingual - Refers to multiple languages in one document, or to multiple documents in different languages.

- P -

Pattern - An expression of a specific form that is used to match text during the extraction process. TIPSTER has a Pattern Specification Language, which describes how to write rules that control extraction engines.
Profile - A group of Detection Needs that describe a user's area of interest.
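The Pattern entry separates rules from the engine that applies them. The sketch below shows that separation with plain Python regular expressions; it is not the TIPSTER Pattern Specification Language, and the rule set `RULES` and the function `run_engine` are hypothetical.

```python
import re

# Hypothetical rule set: each rule pairs an annotation type with a regular expression.
RULES = [
    ("date", r"\b\d{1,2}-[A-Z][a-z]{2}-\d{2}\b"),          # e.g. 31-Jul-00
    ("money", r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),             # e.g. $1,250.00
    ("company", r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)*, Inc\."),  # e.g. Big Linguistics, Inc.
]


def run_engine(text, rules):
    """Apply every rule; return (type, start, end, matched text) tuples in text order."""
    compiled = [(kind, re.compile(pattern)) for kind, pattern in rules]
    found = [
        (kind, m.start(), m.end(), m.group())
        for kind, regex in compiled
        for m in regex.finditer(text)
    ]
    return sorted(found, key=lambda item: item[1])


text = "On 31-Jul-00 Big Linguistics, Inc. paid $1,250.00 for new servers."
for kind, start, end, match in run_engine(text, RULES):
    print(kind, match)
# prints:
# date 31-Jul-00
# company Big Linguistics, Inc.
# money $1,250.00
```

Adding a new entity type means adding a rule, not changing `run_engine`; that division of labor is the point of a rule language.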

- Q -

Query - Translation of a Detection Need results in one or more queries, each either in a user-understandable form or in the specific format required by the actual retrieval engine. A query typically produces a list of documents that meet the Detection Need criteria. (See also Routing.)

- R -

Retrieval Engine - The component that implements the retrieval code. Different Retrieval Engines are distinguished by the particular algorithms used for retrieval, e.g., text indexing approaches, term weighting methods, or document vectors.
Routing - The directing of a document to more than one user. Typically, each user has a Profile that describes that user's area of interest. A document is tested against all user Profiles to determine where it should be sent. In essence, routing tests one document against the multiple queries obtained from the Profiles, whereas Document Detection tests many documents against one query. See Profile.
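The contrast between Routing and Document Detection can be sketched as follows, assuming each Profile is a list of Detection Needs and each need is reduced to a set of terms that must all appear in the document. The profile contents and the `route` function are hypothetical.

```python
import re

# Hypothetical profiles: user -> list of Detection Needs (each a required-term set).
PROFILES = {
    "analyst_a": [{"merger"}, {"board", "president"}],
    "analyst_b": [{"earnings"}],
    "analyst_c": [{"satellite", "launch"}],
}


def tokenize(text):
    """Return the set of lowercase word tokens in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def route(document, profiles):
    """Routing: test ONE document against every query from every profile."""
    terms = tokenize(document)
    return [
        user
        for user, needs in profiles.items()
        if any(need <= terms for need in needs)  # matching any one need suffices
    ]


doc = "The board asked the president to explain weak earnings."
print(route(doc, PROFILES))  # ['analyst_a', 'analyst_b']
```

Here the loop runs over the queries for a single incoming document, the reverse of Document Detection's loop over the documents for a single query.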