Document Visualization

Emile Morse

 

 

 

Submitted: December 15, 1997

Revised: January 15, 1998

Table of contents

Abstract

1 Introduction

2 Scope and Definitions

2.1 Documents and Metadata

2.2 Shneiderman Framework

2.3 Information Retrieval versus Information Visualization

3 Document Data Types and Representation

3.1 Linear Text

3.2 Two-dimensional Text

3.3 Three-dimensional Text

3.4 Multidimensional Text

3.4.1 Text Analysis-the basic method

3.4.2 Refinements of the Basic Method

3.4.3 Alternative Methods for Encoding Documents

3.5 Temporal Text

3.6 Trees

3.7 Networks

3.8 Distributed Documents/Workspaces

4 Interface Issues

4.1 Overview

4.2 Zoom

4.3 Filtering

4.4 Details-on-Demand

4.5 Relate

4.6 History

4.7 External Memory/Extracts

5 Task Models

5.1 Wehrend -- a task level user model

5.2 Task Models from Library Environments

5.2.1 Marchionini

5.2.2 Bates

5.2.3 Belkin

5.3 VIRI Research Group Tasks

5.4 Summary of Task Models

6 Examples of Visualization Systems

6.1 Linear Text: TileBars

6.2 Two-dimensional Text: Pad++

6.3 Three-dimensional Text: WebBook

6.4 Multidimensional Text: SPIRE

6.5 Temporal Text: SeeSoft

6.6 Tree Text: Hyperbolic Tree

6.7 Networks: Navigational View Builder

6.8 Distributed Documents/Workspaces: CASCADE

7 Research Opportunities

8 Summary

Appendix A: Metadata

Appendix B: List of Tasks from Shneiderman

Appendix C: Research Design

9 References

 

List of Tables

Table 1: Shneiderman Classification

Table 2: Comparison of Tasks

Table 3: Information Seeking Dimensions (Belkin et al. 1995)

Table 4: List of Visualization Systems for Documents

 

 

List of Figures

Figure 1: Information retrieval model

Figure 2: Information visualization or browsing

Figure 3: TileBars embedded in a Scatter/Gather interface.

Figure 4: PAD++ rendering of a hypertext

Figure 5: WebBook close-up showing a book being riffled through

Figure 6: SPIRE Themescape showing topic distribution in a large document space

Figure 7: SeeSoft showing overview of a software project

Figure 8: Hyperbolic tree representation of document collection

Figure 9: Navigational View Builder showing relationships among a set of documents

Figure 10: CASCADE display demonstrating landmark feature including color-coded links, Mural and TileBar

Figure 11: VIBE display with Displacement feature activated causing tails to appear on document icons

Figure 12: WebVIBE showing the same document collection and POI selections as Figure 11

 

Abstract

Evaluating visual interfaces for documents requires knowledge about each of the components of a human-computer system. The computer component manages the document and its various surrogates, indexes and representations. A framework developed by Shneiderman suggests characterizing objects by their data type. The proposed types include linear, 2-D, 3-D, multidimensional, temporal, hierarchical, network and distributed. The interface component of visualization systems is discussed in terms of the functionality provided for interaction of the user and the computer. The basic elements are static overview, and several dynamic requirements, such as zoom, filter and details-on-demand. The user component is discussed from the viewpoint of various taxonomies of elemental tasks that are required for satisfactory goal realization. Finally, existing visual interfaces that support document space exploration are reviewed.

  1 Introduction

    Visualization is a cognitive process performed by humans in forming a mental image of a domain space. In computer and information science it is, more specifically, the visual representation of a domain space using graphics, images, animated sequences and sound augmentation to present the data, structure and dynamic behavior of large, complex data sets that represent systems, events, processes, objects and concepts (Williams et al. 1995).

    Visualizations may be characterized on a continuum ranging from physically concrete to purely abstract depending on the properties of the objects being rendered. Scientific visualization is mainly concerned with phenomena that are based in the physical world. The most concrete visualizations are renderings of objects as they exist in the world, e.g., a walk-through of a museum. Renderings of building plans could be placed a little further along the continuum since the building does not already exist but is made visible based on physical properties contained in the plan. Models that attempt to render properties that are not visible, such as forces in bridge beams or wind gusts in weather simulations, make visible things that cannot ordinarily be seen. Maps are another instance of visualizations that are rooted in the physical world; they can be used to magnify (e.g., computer chip map) or shrink (e.g., geographical maps). In addition, they may be used to display attributes that are physical (e.g., topography) or abstract (e.g., population density). Molecular modeling is an example of visualization that is both physically motivated and abstract. The objects, i.e., atoms and molecules, are concrete, but because they are invisible, chemists and physicists rely on models that have utility but that are not necessarily faithful representations of the underlying atomic species. Visualizing in databases can be mapped best to a portion of the abstract end of the physical/abstract scale. The attributes stored in many databases reflect characteristics of physical objects, but in many other systems, the objects may be abstract or a mixture of both. Since information has "no innate shape or color" (Koike 1993), its visualization has a purely abstract character.

    Information visualization covers areas such as visual reasoning, visual data modeling, visual programming, visual information retrieval and browsing, visualization of program execution, visual languages, visual interface design, and spatial reasoning.

    People have tremendous perceptual abilities for visual information. Visualizations rely on the fact that users can distinguish positions, colors, textures, and relationships. Relationships can be shown in such displays by proximity, by containment, by connected lines, by color-coding, etc. Fields containing hundreds or thousands of points can be scanned rapidly and efficiently for clusters, outliers, trends, and gaps. Attention can be drawn to salient items using a variety of techniques including highlighting, blinking, motion, and size. Direct manipulation of visualizations can be accomplished with a variety of methods, such as pointing to select, dragging, and zooming. Feedback is immediate and intuitive in such environments. "The eye, the hand and the mind seem to work smoothly and rapidly as users perform actions on visual displays" (Shneiderman 1996, p. 340).

    Examples of and guidelines for good graphical displays are common (e.g., Tufte 1983, 1990, 1997, Bertin 1983, Cleveland 1985). These guidelines are descriptive rather than prescriptive. They focus mainly on the data and not on the tasks that the user might need to perform with the data (Casner 1991). There is evidence that task performance is sometimes superior with graphical displays but in other situations, textual displays are better. Larkin & Simon (1987) investigated the usefulness of graphical displays in human task performance. They found that there were two ways in which graphical presentations could support more efficient task performance: 1) by allowing the substitution of rapid perceptual inferences for difficult logical inferences, and 2) by reducing search for information required for task completion. This study gives some theoretical support to add to the intuitive appeal of using graphics.

    Until now the discussion has centered on what visualization is and why it is interesting to consider as an alternative to textual information. The next topic pertains to documents, including why and how they can be the objects of visualization. Documents are important sources of information. A document's information is found not only in its content, but also in its metadata and its structure. Document metadata consists of elements such as author, publisher, and date of publication. Keywords comprise an intermediate form of document data; they can be viewed as both content and metadata. Metadata forms the primary type of information in library systems. The World Wide Web contains large volumes of documents that are poorly characterized in terms of both metadata and content descriptions. Automated methods for indexing documents have become an important research issue since there are not enough human resources available to index the documents that are published either in paper or in electronic formats. The methods for indexing documents are normally lexically based, but this is not the only kind of information available in the native full-text. Issues at the data level include how to prepare adequate document representations or surrogates and how to integrate metadata and derived indexes. Section 3 will apply a taxonomy developed by Shneiderman (1996, 1998) to support visualization. Details of this taxonomy, as originally proposed, can be found in Section 2 (Scope and Definitions).

    The second dimension of the Shneiderman taxonomy presents a set of functionalities required of any computer-human interface. Section 4 presents an overview of these interface functionalities with respect to visualization systems. Discussing the interface requires attention both to how the computer presents information to the user and to how the user communicates his needs to the computer. For this reason the discussion includes methods for mapping computer representations of documents to interface objects and interaction techniques.

    Much effort is devoted to developing visualization systems but there is less attention to developing schemes for testing whether these systems are useful in meeting the needs of users. Part of the problem with setting up a testing plan for visualization systems resides in the fact that these systems are purported to be exploration, creativity or browsing environments. As such, it seems to be an enormous undertaking to define at any level what the component tasks of such environments might be.

    If one intends to be able to evaluate visual presentations of data derived from documents, it is important to understand both the data source and the tasks of the users who need to accomplish document-centered goals. Section 5 describes three different ways that tasks can be categorized. The first is a low level task analysis of Wehrend & Lewis (1990) that seeks to determine, based on elemental data objects, the types of tasks that can be developed. The second approach to developing a task taxonomy comes from the library science literature and is of the naturalistic field testing variety. The final task set is a preliminary, high-level breakdown proposed by the VIRI (Visual Information Retrieval Interface) research group. The sixth section describes several current visualization systems with examples being taken from each of the main data type categories.

  2 Scope and Definitions

    The basic components of a human-computer system are the data, the user, and the interface between them. This paper will address 1) how documents in a system are processed for presentation in the interface, 2) how the data are mapped to interface objects and what kinds of interactions should be supported by the interface, and 3) the tasks that characterize the user model in the system. The organization of the discussion mirrors these three major elements. Section 3 concentrates on developing the Shneiderman framework as it might be applied to documents. Section 4 reports on the methods that can be used to render overviews in computer displays and on requirements for effective interaction with displays. Section 5 presents several alternative task models that have been used to characterize the human user of information systems. Finally, Section 6 presents examples of visualizations that derive from each of the major document data types described in Section 3.

    2.1 Documents and Metadata

      "Ask any group of ten information scientists to define 'document' and you will get ten different answers." - Spring (1991)

      According to Spring (1991, p. 8), "A document is an identifiable entity, having some durable form, produced by a person or persons toward the goal of communication and may take a number of forms, but must have at least one symbolic manifestation that can be comprehended by humans." Buckland (1997) presents an interesting historical treatment of document definitions. Various viewpoints have been held in the last 150 years. Otlet (Buckland 1997) regarded not only graphic and written records to be documents, but also objects that could inform observers, e.g., natural objects, artifacts, explanatory models, educational toys, archaeological finds, and sculpture. Briet (1951) has also taken an ecumenical view of documents and presented a discussion about whether or not an antelope was a document. Her conclusion was that a free-living antelope in the wild was not a document but that a specimen in a zoo was certainly a document. For the practical purpose of presenting visual representations of documents, it is most often plain text that is processed to generate the vectors, indexes, and surrogates used in systems. This paper focuses on this more constrained view of documents as plain text.

      Metadata in the current context refers to information about a document rather than to the content of the document itself. Author, publisher, date of publication, and even keywords constitute metadata. Standards are being developed for use on the WWW and in traditional library systems that seek to codify metadata. Ng et al. (1997) present an overview of the metadata standardization issues, including a discussion of the similarities and differences among the competing candidate standards. They mention the Dublin Core, URC, USMARC, IAFA, and TEI header methods. Such schemes will provide some basic structure to a large class of documents that are largely unclassified. A table of metadata classifications is presented in Appendix A.

      A final issue related to document spaces concerns granularity. Spring (personal communication) has suggested that a proper framework would include: document components, documents, document sets, document collections, and document analytics. Components can be defined as paragraphs, sections, or chapters of their parent document. Sets are groups of documents that are fewer in number than the full collection. Analytics are essentially metadata. Successful visualizations must clearly distinguish the grain size that they attempt to render. In this paper there is no attempt to divide applications explicitly along these lines. It should be clear in each case, however, whether a visualization is based on single documents, sets of documents or collections of documents.

    2.2 Shneiderman Framework

      The framework suggested by Shneiderman to support research in visualization is two-dimensional. The first dimension is the data-type of the objects to be represented in the interface. He lists seven types in an early paper (Shneiderman, 1996) and eight in later versions found in his textbook (Shneiderman 1998) and at the University of Maryland website. The types are linear, planar, volumetric, temporal, multidimensional, tree, network, and workspace. Workspace is the type that was added in the later versions. The second dimension is a task typology and includes: overview, zoom, filter, details-on-demand, relate, history, and extract. The scope of the Shneiderman framework is the entire domain of visualization. Table 1 provides a graphical view of the framework. It is pertinent to note that both of these dimensions are very high-level, more qualitative than quantitative. The purpose of the original framework was "to sort out the prototypes [that currently exist] and guide researchers to new opportunities" (Shneiderman 1996); the goal of the current examination is the same.

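      Table 1: Shneiderman Classification

                          | Interface Functionality
      Document Data Types | Overview | Zoom | Filter | Details-on-Demand | Relate | History | Extract
      --------------------+----------+------+--------+-------------------+--------+---------+--------
      Linear              |          |      |        |                   |        |         |
      2-Dimensional       |          |      |        |                   |        |         |
      3-Dimensional       |          |      |        |                   |        |         |
      Multidimensional    |          |      |        |                   |        |         |
      Temporal            |          |      |        |                   |        |         |
      Hierarchical        |          |      |        |                   |        |         |
      Network             |          |      |        |                   |        |         |
      Distributed         |          |      |        |                   |        |         |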
                   
    2.3 Information Retrieval versus Information Visualization

    The retrieval process in the traditional view is quite simple. Information is stored and later retrieved when it is needed. On-line retrieval systems typically consist of a large document database. Terms that describe the document contents (index) are selected from manual or automatic indexing. The index terms are descriptors of the represented document. Queries are requests to process information and a search query consists of different terms combined in a structured query language. The traditional information retrieval paradigm is a matching process according to the similarity between the keyword index entries and the search query. The problem is to find all and only the relevant documents. To evaluate the retrieval results two statistics are used: recall, the percentage of relevant documents found and precision, the percentage of the documents found that are relevant. From the evaluation of the retrieval results one can formulate a new query. The traditional model of information retrieval is shown in Figure 1. Problems can arise in this model when the number of retrieved documents is very large or when the language used to specify the query is poorly matched to the real information need of the user.
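The two evaluation statistics defined above can be computed directly from the set of retrieved documents and the set of relevant documents. The following is a minimal sketch; the function name and the document identifiers are hypothetical and serve only as an illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query as fractions in [0, 1]."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were found
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical document identifiers, for illustration only.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d5"]
p, r = precision_recall(retrieved, relevant)
# Two of the four retrieved documents are relevant (precision 1/2);
# two of the three relevant documents were found (recall 2/3).
```

Both statistics presuppose that the set of relevant documents is known in advance for each query.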

    Figure 1: Information retrieval model

    Using an information visualization as the user interface could help to overcome these problems. As shown in Figure 2, the abstract data model represented by the index is visualized as an information space.

    Figure 2: Information visualization or browsing

    The user interacts directly with the visualization to express his needs. The query in this model is stated implicitly in the view, and it is filtered and refined through manipulation of the interface. Navigation inside the information space helps the user follow a context-oriented search path to find domains of interest. Relevance in the visual setting is not necessarily a predetermined characteristic of a document. The interaction of the user with the interface supports browsing, creativity and constant refinement of the original statement of the goal of the search. Since relevance is not definable in the information visualization paradigm, assessment using recall and precision is impossible. New methods need to be developed to evaluate systems based on this model.

  3 Document Data Types and Representation

    "Information representation is multifaceted and flexible." - Gershon (1995)

    Although this paper was inspired by work on interfaces that support information retrieval, its scope is much larger than the view held by most workers in information retrieval, who treat documents solely as multidimensional objects. In this section text will be viewed variously as streams of words, flows of topics, collections of metadata, and even by reference to its physical manifestation in paper form. Shneiderman (1996, 1998) has suggested that a data type by interaction type framework could ground work in visualization as a whole. He proposed the types adopted here and has extended them recently (http://www.umd.cs.edu/users/north/infoviz.html).

    The current adaptation to a text-only environment has changed somewhat the classification scheme and many of the implementations fall in categories different from those suggested by Shneiderman. The use of this organization is not meant to imply that these are all the groupings that might be applied to text nor are the assignments the only ones possible. The structure is meant to be fluid.

    3.1 Linear Text

      Viewing documents as streams of words has a great deal of similarity to assessing speech. Techniques that are applied to spoken words can be adapted to analyzing written words. Speech is examined at several levels, including phonology, morphology, syntax, and semantics. A contextual method that has been applied to written text is discourse analysis. Hearst & Plaunt (1993) and Hearst (1994) have investigated using a statistical parser to segment text into topical elements. As text is scanned from top to bottom, a sliding window can be programmed to process chunks of the text. The output is analyzed to determine when a subtopic is being introduced. The method, called TextTiling, is a motivated segmentation that reflects a text's underlying subtopic structure, which can span paragraph boundaries. TextTiling is a two-step process. The first step compares adjacent blocks of text and assigns a similarity value to each pair; the blocks are usually 3-5 sentence units. The second step involves graphing the resulting similarity values and smoothing the generated curve. Peaks in the curve indicate regions of high subtopic coherence, whereas valleys indicate evidence for topic switching. Large expository documents were subjected to testing by asking volunteers to perform a topic identification task (Hearst & Plaunt 1993). The results showed that there was a high degree of correlation between the judgments of the subjects and the TextTiling algorithm.
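The two steps described above (similarity of adjacent blocks, then smoothing and valley detection) can be illustrated with a small sketch. This is a simplified approximation for illustration, not the published TextTiling algorithm: the cosine measure, the moving-average smoothing, the strict-local-minimum rule, and all function names are assumptions of the sketch.

```python
import math
import re
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bags of words (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def gap_similarities(sentences, block_size=3):
    """Step 1: compare the block before each sentence gap with the block after it."""
    bags = [Counter(re.findall(r"[a-z]+", s.lower())) for s in sentences]
    sims = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - block_size):gap], Counter())
        right = sum(bags[gap:gap + block_size], Counter())
        sims.append(cosine(left, right))
    return sims


def smooth(values, width=1):
    """Step 2a: moving average over a window of 2 * width + 1 values."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width):i + width + 1]
        out.append(sum(window) / len(window))
    return out


def topic_boundaries(sentences, block_size=3, width=1):
    """Step 2b: report gaps whose smoothed similarity is a strict local minimum.

    A returned value g means a candidate topic switch before sentence g.
    """
    curve = smooth(gap_similarities(sentences, block_size), width)
    return [i + 1 for i in range(1, len(curve) - 1)
            if curve[i] < curve[i - 1] and curve[i] < curve[i + 1]]


# Toy text: four sentences on one topic followed by four on another.
sentences = ["cat mat", "cat mouse", "mat mouse cat", "mouse cat",
             "stock market", "market price", "stock price market", "price stock"]
boundaries = topic_boundaries(sentences, block_size=2, width=1)
```

On the toy input the only valley falls at the gap before the fifth sentence, where the vocabulary changes completely.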

      This approach to analyzing documents is related to several other methods. Salton & Buckley (1991) have used author-supplied orthographic markup to segment documents into paragraphs. Whereas Hearst's motivation in performing text segmentation (Hearst, 1993) was merely to determine where topic boundaries occurred, Salton & Buckley (1991) sought to discover the content of the individual segments. Stanfill & Waltz (1992) created text segments by dividing documents into 30-word blocks. The results of their study can be compared with the variant employed by Hearst, which she terms 'unmotivated' segmentation. When the two are compared, 'motivated' segments are shown to produce superior recall-precision statistics (Hearst & Plaunt 1993).

    3.2 Two-dimensional Text

      There are several ways that text can be viewed in two dimensions. The first way to view text as 2-D is to focus on the characteristics of the text as it appears on a page. The key feature is the formatting, such as paragraphs, headings, tables, and general use of 'white space'. The 2-dimensional view of text is especially productive of metaphors. Several visualizations have been developed that build on the tangibility of printed matter. People dog-ear pages to provide bookmarks and they underline and annotate. The printed page is familiar and provides a great deal of utility apart from its primary function; it is only reasonable that graphical interface designers would borrow from it. Rather than trying to determine semantics of the page's content, implementers only show pictures of the page. This is a very simple but versatile mechanism for conveying to a user something about a text. Zooming in on a page reveals successively more and more detail and allows a user to orient himself with respect to the organization of material.

      The second way that text can be characterized as two-dimensional is not truly a function of the data source but is a derived measure. If a document can be characterized by a low-dimensional vector, then standard graphical methods, such as pie charts, histograms, scatterplots, and line graphs, can be applied to rendering the document space. Strictly speaking, the renderings are 2-dimensional but the number of attributes that can be mapped to such a space is greater than two. For instance, in a scatterplot each object exists at a particular x-y coordinate, and may have an associated shape, size, color, texture, etc. Each of these features may represent different attributes of the item being represented. These graphical representations are important since they are so common in everyday experience. People have years of training in interpreting such graphs. In addition, considerable work has been performed that allows automatic generation of graphs (Mackinlay 1986, Casner 1991, and Roth et al. 1994). Most of the work done on auto-generation has concentrated on relational data. The metadata that is usually available for documents is relational and might be amenable to viewing in such systems.

    3.3 Three-dimensional Text

      Just as the planar page can give rise to a unique view of text, a view of books as 3-dimensional objects can also serve to characterize another useful view. The tangibility of books, the feel of the pages, their location in physical space, the color of the bindings are but a few of the characteristics that are part of the 3-dimensional aspect of text. This view has given rise to several popular metaphors for graphically rendering documents, including the desktop, piles, and rooms.

      Mander et al. (1992) implemented the "Pile" metaphor. This is analogous to a pile of documents on a desk: the documents retain the order in which they were placed in the pile and some of their appearance, e.g., color. The pile of documents is displayed as a small perspective drawing; piles created by the user have a disheveled appearance, and those created by the system (perhaps as the result of a database query) appear neat. The design includes a gesture for spreading out the elements of the pile so they are all visible (a horizontal back and forth movement), and a gesture for starting to browse the pile elements (an up and down motion). The browsing operation uses a viewing cone, where the miniature document is displayed facing the viewer on the base of a pyramid pointing back towards the document's position in the pile.

      The Rooms system (Henderson and Card 1987) exchanges the idea of a single extended virtual surface for a collection of virtual screens of normal size. The reasoning is that typical work patterns are clustered into a collection of tasks between which people switch, and these tasks are not spatially related. The system also allows a window to appear in more than one room, and even to have a different location and shape depending on what room it is being seen from. An extension of this system to three dimensions is described in Robertson et al. (1993).

    3.4 Multidimensional Text

The use of natural language processing to generate better document vectors has been the object of intense investigation for a long time. Methods for detecting phrases (Croft et al. 1991) and for extracting names (Rau & Jacobs) and topics (Hahn 1990) have enriched the arsenal of information retrieval (IR) researchers. The advent of full-text rather than mere surrogates has opened the question of whether the old methods, which were developed to handle short text pieces, would scale up to handle full-text. The evidence shows that there is some degradation of processing effectiveness (Blair & Maron 1985). One of the possible factors that inhibits scalability is that long pieces of text are actually strings of related and dependent ideas whose major theme emerges from their juxtaposition. In order to capture the meaning of these longer texts there has been a considerable effort to detect and encode the content of subpassages of documents.

The purpose of this section is to guide the reader to an appreciation of the difficulty in producing the requisite data for visualizing text. The starting material for text characterization is usually full-text, but in some cases only surrogate documents composed variously of title, authors, abstract, and citation list are used. Methods for processing these text pieces are generally lexical in nature. Systems that are more ambitious employ syntactic and semantic parsing. There is some evidence that detection of phrases is useful in improving the effectiveness of retrieval. Other methods rely on neural networks to detect patterns in text. There is some intriguing evidence that the purely statistical methods and neural networks produce results that are highly similar (Schütze et al. 1995). The problem with all these methods is similar to the problem of people trying to understand each other. The overriding hope is that the words that are spoken or written convey some meaning that is intended by the speaker/writer and understood by the listener/reader. To ask machines to do what people often fail to do is a big task. The goal of all the methods is to capture some core essence of a passage, document or collection. The hope is that the content being examined is sufficiently clear, long enough, redundant on topic and sparsely populated by extraneous material. Two important criteria bear investigation:

Willett (1988), Schütze et al. (1995) and Lewis & Sparck Jones (1996) have presented reviews of data generation methods. The essential thing to keep in mind when performing visualizations based on this type of data is that the data is fuzzy at best. The computer slogan 'garbage in, garbage out' serves as a warning to those who attempt making pictures of questionable data.

      3.4.1 Text Analysis-the basic method

        Regardless of whether a Boolean, extended Boolean, fuzzy Boolean, probabilistic, or vector model is used for information retrieval, the document is represented in the computer system as a vector of terms. In some cases, the vector contents are binary (0, 1) to represent the presence or absence of a term. Other systems use numeric values to indicate the strength of a relationship between a document and a term element. The permissible range of values is of little consequence; systems using values between zero and one are common, as are those that use positive integers. The first step in processing any text collection is to count the frequencies of words in the texts. Usually one or more stop lists are employed at this stage in order to speed up processing and to generate more meaningful term sets. A generic stop list contains words that are too common in the language to allow reasonable retrieval characteristics, e.g., 'the', 'a', 'an', 'of', 'that'. Additional stop lists may be employed in a particular domain to prevent inclusion of words that are prevalent in the local environment, e.g., 'rock' in a geology textbase, or 'computer' in a computer and information science collection. In addition, words are usually stemmed by any of several methods (e.g., Lovins (1968), Porter (1980)) so that the set of potential keywords is compacted.
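The counting step described above (tokenize, drop stop words, stem, count) might be sketched as follows. The suffix-stripping function is a deliberately crude stand-in for the Lovins or Porter stemmers and is an assumption of this sketch, as are the function names and the sample sentence.

```python
import re
from collections import Counter

# Generic stop list taken from the examples in the text.
GENERIC_STOP = {"the", "a", "an", "of", "that"}


def toy_stem(word):
    """Crude suffix stripping; NOT the Lovins or Porter algorithm."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word


def term_counts(text, domain_stop=frozenset()):
    """Count stemmed terms, skipping generic and domain-specific stop words."""
    stop = GENERIC_STOP | set(domain_stop)
    counts = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in stop:
            continue
        stem = toy_stem(token)
        if stem in stop:  # e.g. 'rocks' stems to the domain stop word 'rock'
            continue
        counts[stem] += 1
    return counts


# Hypothetical sentence from a geology textbase.
doc = "The rocks of the canyon show folding; a fold of rock layers"
counts = term_counts(doc, domain_stop={"rock"})
```

Here 'folding' and 'fold' collapse into one term, which is the compaction of the keyword set that stemming is meant to achieve.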

      3.4.2 Refinements of the Basic Method

        The resulting raw count data is subjected to further processing by several methods. Depending on the domain and size of the collection, the number of terms identified at this stage may range from several hundred to tens of thousands or more. Among the most common methods used at this point are: normalization for document length, application of a term discrimination method, term intercorrelation determination, and thesaurus expansion.

        Normalization for document length

        Collections can vary widely in the size of documents that they contain. A book and an abstract might both contain the same number of occurrences of a particular term. It is clear that, in this case, the term is probably a better descriptor of the shorter document. In order to control for document length, it is customary to normalize the term counts for document length. The necessity for this correction factor depends on the similarity measure chosen for subsequent calculations. If the cosine is used to determine similarity, then no correction need be applied. The process of weighting by frequency of occurrence in the total document collection is an attempt to normalize document representatives with respect to expected frequency distributions.
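The remark that the cosine measure needs no length correction can be checked with a small numerical sketch. The term counts below are hypothetical: a 'book' vector is constructed with the same term proportions as an 'abstract' vector but one hundred times the counts.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def relative_frequencies(counts):
    """Length normalization: divide each count by the total number of terms."""
    total = sum(counts)
    return [c / total for c in counts] if total else list(counts)


# Hypothetical counts over a three-term vocabulary.
abstract = [2, 1, 0]
book = [200, 100, 0]   # same proportions, 100 times longer
query = [1, 1, 0]

sim_abstract = cosine(abstract, query)
sim_book = cosine(book, query)
```

Because the cosine depends only on the direction of a vector, `sim_abstract` and `sim_book` agree up to rounding, whereas a raw dot product would favor the longer document by a factor of one hundred.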

        Term discrimination value

        Thus far, a list of words or stems has been produced together with a frequency of occurrence of those elements in each text of a collection. The only adjustment has been for document length. One of the major purposes of a term list is to allow a user to appreciate differences and similarities among texts. Terms that appear in nearly every document are useless for this purpose, as are terms that occur rarely. The inverse document frequency in one of several forms is applied to normalize for term set size (Harman 1992). Alternatively, a commonly applied heuristic for the lower bound is that a term should appear in over 20% of all documents. Similarly, terms that appear in over 80% of all texts can be ignored. The term discrimination value is another method for determining which terms provide the best indexing terms for a collection of documents (Salton 1989). The hundreds or thousands of terms generated during the concordance phase of text processing can be viewed as a multidimensional term space within which the documents are suspended. It is theoretically possible to determine the effect of adding or removing a term on the placement of documents in the space. If adding (or deleting) a term causes a significant change in shape of the space then the term is considered important. If adding (or deleting) a term produces little effect on document distribution then it could probably be ignored. The Exact method of Willett (1985) compares each multidimensional document descriptor with each other document vector using the cosine similarity measure. Terms that produce positive cosine values indicate 'good' discriminators; terms that produce negative cosines are useful for dissecting out regions of space that indicate 'not'; intermediate and zero values are neutral for the process of discriminating.
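The document-frequency heuristic and an inverse document frequency weight can be sketched together. The form log(N/df) used here is only one of the several forms mentioned above, and the function names are hypothetical; the 20% and 80% bounds are the ones quoted in the text.

```python
import math

def document_frequency(doc_term_sets):
    """Number of documents in which each term occurs at least once."""
    df = {}
    for terms in doc_term_sets:
        for term in set(terms):
            df[term] = df.get(term, 0) + 1
    return df

def select_terms(doc_term_sets, low=0.2, high=0.8):
    """Keep terms found in more than `low` and at most `high` of the documents.

    Returns a dict mapping each surviving term to log(N / df), one common
    (assumed) form of the inverse document frequency.
    """
    n = len(doc_term_sets)
    df = document_frequency(doc_term_sets)
    return {
        term: math.log(n / count)
        for term, count in df.items()
        if low < count / n <= high
    }
```

With five documents, a term present in all five is dropped as too common, a term present in only one is dropped as too rare, and a term present in two survives with weight log(5/2).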

        The Exact method is an O(n^2) process. Even though the calculation of discrimination values is not performed dynamically during a browsing or retrieval session, the number of terms can lead to processing times on the order of tens of hours even on powerful processors. The method described by Salton (1989) proposes to calculate a centroid document which is used for comparison with each document vector. This process is clearly O(n). A study by Crouch (1988) showed that the results of using this approximate method were as good as those of the exact method in terms of specificity of term identification, with the expected huge reductions in processing time.
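A sketch of the centroid approach follows. It assumes one common formulation, which the text above does not spell out: the density of the space is the mean cosine between each document and the centroid, and the discrimination value of a term is the density with the term removed minus the density with it present. Under that assumption a positive value marks a good discriminator, because removing the term packs the documents closer together.

```python
import math

def cosine(u, v):
    """Cosine similarity, 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def centroid(docs):
    """Component-wise mean of the document vectors."""
    n = len(docs)
    return [sum(column) / n for column in zip(*docs)]

def density(docs):
    """Mean similarity to the centroid: n comparisons instead of n(n-1)/2."""
    center = centroid(docs)
    return sum(cosine(doc, center) for doc in docs) / len(docs)

def discrimination_values(docs):
    """For each term k: density without term k minus density with all terms."""
    base = density(docs)
    values = []
    for k in range(len(docs[0])):
        reduced = [doc[:k] + doc[k + 1:] for doc in docs]
        values.append(density(reduced) - base)
    return values

# Term 0 occurs equally in every document; terms 1 and 2 split the collection.
docs = [[1, 1, 0], [1, 0, 1], [1, 1, 0], [1, 0, 1]]
values = discrimination_values(docs)
```

In this toy collection the ubiquitous term 0 receives a negative value and the two terms that separate the documents receive equal positive values.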

        Intercorrelation determination

        The terms identified by either a pure concordance or those filtered by calculation of term discrimination value (TDV) are likely to be intercorrelated, i.e., different terms produce the exact same documents in response to a query. The implication is that the number of terms can be reduced without affecting the quality of the index terms. In addition, a reduction of correlating terms is indicated in the situation of vector model retrieval in which a usual assumption is that the terms are pairwise orthogonal. Raghavan and Wong present a detailed description of the side effects of violating this assumption (1986). They admit, however, that applications based on vectors as a notational convenience rather than a formal model of IR concepts have been successful. Clustering methods are frequently used to identify terms that co-occur. The review by Willett (1988) presents a lengthy discussion of the available methods and the advantages and disadvantages of each. Chen et al. (1995) review various methods and present results derived using several different clustering algorithms.
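The simplest case of intercorrelation, two terms that retrieve exactly the same documents, can be detected by comparing posting sets. The sketch below uses set overlap (the Jaccard coefficient) as the measure; that choice is an assumption made for brevity and is not one of the clustering methods reviewed in the works cited above.

```python
from itertools import combinations

def postings(doc_term_sets):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, terms in enumerate(doc_term_sets):
        for term in terms:
            index.setdefault(term, set()).add(doc_id)
    return index

def correlated_pairs(doc_term_sets, threshold=1.0):
    """List term pairs whose posting sets overlap by at least `threshold`.

    With the default threshold of 1.0, only pairs that retrieve exactly
    the same documents are reported.
    """
    index = postings(doc_term_sets)
    pairs = []
    for a, b in combinations(sorted(index), 2):
        overlap = len(index[a] & index[b]) / len(index[a] | index[b])
        if overlap >= threshold:
            pairs.append((a, b, overlap))
    return pairs
```

One member of each reported pair could then be dropped, or the pair merged into a single index term, which also moves the term set closer to the orthogonality assumed by the vector model.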

        Thesaurus expansion

        Thesauri can be applied to document collections to generate broader, narrower, synonymous and related terms. Research in this area comprises both creation of and use of thesauri. Chen et al. (1995) describe a method for creating a thesaurus using multiple sources. In addition to using the methods described thus far in this paper (term frequency, document frequency, weighting for length, and co-occurrence analysis), they subjected the term lists to one of two generative methods. In the first, they treated the terms as a single collection, regardless of source. In the other, they processed separately the terms from each of four different sources about the same topic. Their study concentrated on trying to determine if better methods could be devised for coping with the problems of information overload and language fluidity. This seems to be a major thrust of automatic thesaurus generation research: automating takes care of the 'overload' problem and creative indexing takes care of the 'fluidity' problem.

        The work of Losee and Haas (1995) is a typical study in the field of thesaurus development. Their work concentrates on sublanguages, the languages used by people working in a particular field or discipline. This area is particularly concerned with language that is changing rapidly to accommodate advances in science. Although all languages undergo gradual change, the world of scientific endeavor experiences even more rapid turnover due to the introduction of new concepts that need to find expression. A related problem is the borrowing of terms from one discipline to cover the needs of another. For automatic indexing systems, it is a special problem to know what the introduction of new terms might imply.

      3.4.3 Alternative Methods for Encoding Documents

Although keywords and vector representations are the most commonly encountered methods of representing text, especially in situations in which automatic encoding is desired, e.g., large on-line collections and/or the WWW, there are significant advantages to using different approaches to text processing. For instance, several of the projects from Xerox PARC employ citation tracing to support browsing of large information stores found in distributed sites (Mackinlay et al. 1995). The researchers undertaking these projects cite the utility of using the built-in schemes of large IR suppliers such as DIALOG. One of the side effects is the ability of such systems to use querying based on relational databases. While it would be difficult to encode in vector form the information about the year of publication, the names of the authors, or similar demographic information, systems that rely on relational databases can use this information quite effectively. Several projects are attempting to merge the two approaches to characterizing text, the statistical and the relational database (Blair 1988, Croft & Parenty 1985, Lynch & Stonebraker 1988, McLeod & Crawford 1983). Considerable interest exists but there is also much dissension regarding the proper methods to use (DeFazio et al. 1995). If the information sources that eventually become available include significant amounts of classical database material, then some of the methods that have been developed for visualizing databases will become immediately applicable to the visualization of document information.

The method called latent semantic indexing or LSI (Deerwester et al. 1990) seeks to leverage the correlations among terms in documents to yield superior indexing parameters. The method reduces the dimensionality required to render a document space. LSI uses a singular-value decomposition (SVD) method. A term-document association matrix is constructed using at least 100 terms. Transformation using SVD produces a series of matrices that have reduced dimensionality. In fact, this method generates orthogonal variables, which as mentioned earlier are a requirement for implementation of formally correct vector models (Raghavan and Wong 1986). Deerwester et al. (1990) showed that LSI was superior to several other methods with respect to both precision and recall. This method has been incorporated into other IR systems; e.g., Schütze et al. (1995) have found that LSI provides superior pre-processing for neural network inputs.
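The reduction can be written compactly. The notation below is an assumption introduced for illustration and is not taken from the cited papers: X is the t-by-d term-document matrix, and only the k largest singular values are retained.

```latex
X \;\approx\; X_k \;=\; U_k \,\Sigma_k\, V_k^{\mathsf{T}},
\qquad
U_k \in \mathbb{R}^{t \times k},\quad
\Sigma_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
V_k \in \mathbb{R}^{d \times k},\quad
k \ll \min(t, d)
```

Because the columns of U_k and of V_k are orthonormal, the k derived dimensions are orthogonal by construction. Each document can then be described by k coordinates (its row of V_k, usually scaled by the singular values) instead of t term weights.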

Kohonen maps have also been used to characterize information spaces (Lin 1991, 1992, 1997). Lin has developed displays that can show both content and structure of a document space. He provides as inputs to his algorithm N-dimensional vectors. Through a series of iterations of weight adjustments, the system converges. Sample experiments are described in which input vectors consist of a hundred to more than a thousand elements. The outputs were mapped to grids that were either 10 by 14 or 14 by 14. The mapping that is produced has large areas for concepts that are focal in the collection and smaller areas for less well-mentioned topics. In the examples shown (Lin 1997) the reduction in dimensionality was on the order of 10:1 or greater.
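The iterative weight adjustment can be illustrated with a very small self-organizing map. This is a generic textbook-style sketch, not Lin's implementation: the learning-rate schedule, the grid-distance neighborhood, and the fixed number of epochs (used in place of a convergence test) are all assumptions.

```python
import random

def best_unit(grid, vector):
    """Return the (row, col) cell whose weights are closest to the input."""
    return min(
        grid,
        key=lambda rc: sum((w - x) ** 2 for w, x in zip(grid[rc], vector)),
    )

def train_som(vectors, rows, cols, epochs=20, seed=0):
    """Fit a rows-by-cols map to the input vectors and return its weights."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs            # shrinks from 1 toward 0
        rate = 0.5 * remaining                      # learning rate
        radius = max(1.0, max(rows, cols) / 2 * remaining)
        for vector in vectors:
            b_row, b_col = best_unit(grid, vector)
            for (row, col), weights in grid.items():
                distance = abs(row - b_row) + abs(col - b_col)
                if distance <= radius:
                    # Cells nearer the winning cell are pulled harder.
                    pull = rate / (1 + distance)
                    for i in range(dim):
                        weights[i] += pull * (vector[i] - weights[i])
    return grid
```

After training, `best_unit(grid, v)` gives the grid cell for a document vector `v`. Topics represented by many input vectors tend to win more cells, which is the effect described above as larger areas for focal concepts.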

    3.5 Temporal Text

      The other forms of documents that have been considered thus far in this paper are alternative representations that can be generated with the materials at hand. Temporal data is both the same and different. Time, as a dimension, is the same as linear or low-dimensional data when one considers the content of a document, e.g., the timeline in a novel or news story. It is different when considered as metadata, e.g., creation date or date of last reading. Liddy (1995) has explored extracting temporal information from text in a system called CHESS. CHESS automatically creates a knowledge base which aggregates information about any named entity (people, places, events, organizations, companies or ideas) and organizes that knowledge into a timeline which covers the entire period of the knowledge base.

      Documents are created and edited in time. In paper form text is finalized and published. Although electronic text is said to be published, it is more difficult to say that it is actually finalized. There is no guarantee that the content might not be changed, more words added, sections removed, the whole reorganized or it might even disappear. Most documents do not have a version history. Computer programs have such records if they are maintained in a version control system. Similarly, legal documents and many electronically managed documents are tracked temporally. Some of the proposals for metadata standards include expanded temporal data fields (Ng et al. 1997).

      Docubase management is an issue that is becoming widely discussed. If changes to documents are to be recorded, what granularity of change should be used? Prep Editor (Neuwirth et al., 1990, 1992, 1994) is a system that has implemented a variable diff-ing in order to present users with various levels of changes of text over time. Each view of the document can be filtered to show the desired amount of detail in the editing process. Temporal issues are very important in Groupware settings. When documents are created and/or modified by more than one person, it is important that each participant know who made a change and when it was made.
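The granularity question can be made concrete with the standard `difflib` module. The sketch below is not the PREP algorithm; it simply compares two versions at either sentence or word level, with a deliberately naive sentence splitter (an assumption).

```python
import difflib
import re

def split_units(text, granularity):
    """Split text into the units that will be compared."""
    if granularity == "sentence":
        return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if granularity == "word":
        return text.split()
    raise ValueError("granularity must be 'sentence' or 'word'")

def changes(old, new, granularity):
    """Return (operation, old_units, new_units) for every non-matching span."""
    a = split_units(old, granularity)
    b = split_units(new, granularity)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

old = "Trees are useful. Networks are complex."
new = "Trees are useful. Networks are very complex."
by_sentence = changes(old, new, "sentence")
by_word = changes(old, new, "word")
```

The same edit is reported as a replaced sentence at the coarse level and as a single inserted word at the fine level, which is the kind of choice a variable-granularity view has to offer the reader.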

    3.6 Trees

      Many types of data lend themselves to representations as trees, including structured documents, directories, and some kinds of hypertext (those that have no cyclic links). Many approaches have been developed to render these spaces. Conventional methods merely draw a tree as large as it needs to be and then render an image that is controlled with scroll bars. This process has the problem that the user is prevented from seeing the overall structure and must keep most of a large space in memory rather than in view. Although by their very nature, trees can be rendered in a plane, there is no satisfactory 2-D layout of a large tree (Lamping et al. 1995). In order to make room for the leaf nodes, the nodes near the root must be placed far apart.

      Clearly trees are useful for representing large collections of documents, but single documents are also amenable to tree representations if the underlying structure of the document is hierarchical. There is a movement toward representing text structurally. SGML is a prime example of an effort to systematize document structure. Editors that are used to create SGML-compliant text maintain document structure as trees. In SGML trees, the content of a document resides in the leaf nodes of the tree.
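A document held as a tree with its content in the leaves can be sketched as follows. The `Node` class and the element names are hypothetical and do not correspond to any particular SGML document type.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    tag: str
    children: list = field(default_factory=list)
    text: str = ""  # in this sketch only leaf nodes carry content

def leaf_text(node):
    """Collect the content of the leaves in document order."""
    if not node.children:
        return [node.text]
    collected = []
    for child in node.children:
        collected.extend(leaf_text(child))
    return collected

def depth(node):
    """Number of levels in the tree rooted at node."""
    return 1 + max((depth(child) for child in node.children), default=0)

document = Node("article", [
    Node("title", text="Document Visualization"),
    Node("section", [
        Node("para", text="First paragraph."),
        Node("para", text="Second paragraph."),
    ]),
])
```

Reading the leaves in order recovers the linear text, while the interior nodes carry the structure that a tree visualization would display.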

    3.7 Networks

      Many views of documents can be thought of as networks. Queries, semantic networks, associative thesauri and hypertexts can all be represented as networks. Multidimensional data, discussed above, differ qualitatively from network data in that the latter have dependencies among the parts. Multidimensional scaling methods tend to drive concepts apart, i.e., to find orthogonal dimensions, while networks assume dependencies among the concepts being manipulated.

      Although paper hypertexts exist (Ted Nelson's Literary Machines is probably the most famous), the importance of document networks rests on the fact that the Internet is based on hypertext. Documents are connected to other documents through links and nodes. Attempts to bring order to the potential chaos of hyperlinks run amok have come in the form of several proposed standards. The Dexter (Halasz & Schwartz 1994) and Amsterdam Models (Hardman et al. 1994) are the primary examples.

      Network displays can represent more general and more complicated structures than hierarchical displays. The complexity of the information spaces when expressed as networks can be difficult for users to comprehend. A major issue then is how to simplify such displays without losing critical information. One method for reducing complexity is to reduce the dimensionality of the space. Latent semantic indexing (LSI) is a method that can be applied to reduce dimensionality. Furnas et al. (1994), however, suggest that too much information would be lost if a high-dimensional space were to be reduced to a small number of dimensions that could be rendered in two dimensions.

    3.8 Distributed Documents/Workspaces

Working groups need various types of support but within the context of this paper the only type of information that is pertinent is the documents that these groups create or manage. Increasingly groups share documents and the management of these texts is handled by Groupware systems. All of the above views of documents are relevant in the context of group work. Each of the views can be exploited to support richer environments for groups of authors. As noted earlier, the temporal dimension is particularly important in distributed situations. Making one person aware of what changes others have made is qualitatively different from reminding a sole author of what he had done. The work of Neuwirth et al. (1990, 1992, 1994) and Greenberg et al. (1994, 1996, 1997) with PrepEdit and GroupKit, respectively, is notable in this context. The results of studies performed on these systems show that designers need to provide different support for workers depending on whether or not they are co-located and whether they work synchronously, asynchronously, or in mixtures of these conditions.

The types of data that need to be kept are related to the kinds that are kept in a document versioning system or a database. GroupKit (Greenberg et al. 1994) discusses concurrency control issues and their effect on the groupware user interface. These investigators examine common strategies such as serialization and locking. Nichols et al. (1995) discuss many of the same issues in the context of the Jupiter project. Jupiter is a multi-user, multimedia virtual world which supports shared documents, shared tools, and, optionally, live audio/video communication. The success of this project was in part determined by the centralized architecture and optimistic concurrency control algorithm used to maintain common values for all instances of shared widgets. These investigations make clear that one of the greatest obstacles to distributed document sharing is a determination of the appropriate granularity for subdividing documents. Overly large fragments prevent smooth interaction; fragments that are too small can congest systems.

  4 Interface Issues

    "The purpose of computing is insight, not numbers." - Hamming (1962)

    4.1 Overview

      For a visualization to be effective, it must provide the user with a sense of the overall composition and layout of the space. For complicated displays such as those that attempt to render large hierarchies using trees or any representation of a large document collection, this task is not as straightforward or obvious as it sounds.

      Several issues arise when a data set is to be mapped to an interface, such as how to make the best mapping of the attributes of the data to attributes of objects in the interface. Spring & Jennings (1993) have provided a comprehensive account of the dimensions that might be mapped in an interface. They categorize each of the stimuli as to its suitability to map to data depending on whether the data is nominal, ordinal, interval or ratio. Bartram (1997) has recently raised the issue of incorporating motion as a key feature of complex display due to its easy perceptibility. Other constraints apply when deciding how to map data since different stimuli convey different degrees of salience to users. For instance, it has been shown that tilted lines are more readily apparent than vertical lines. Similar observations were made with respect to curvature, color, line ends, movement, closure, contrast, and brightness (Treisman 1986). In addition, the relative order of noticeability of some stimuli has been determined, e.g., color > line > tilt > angle (Cleveland 1985).

      Another issue that arises when addressing the overview of visualizations is how to fit large spaces on the screen and still allow some appreciation of the detail that resides there. Toward this end, the fish-eye view has been developed (Sarkar et al. 1994). The space with a fish-eye lens on it is distorted so that the view is expanded under the lens. Problems can occur if large areas of the screen are distorted; many types of tasks cannot be performed under these conditions, such as comparing two points that are of different magnifications.
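One distortion function that is commonly quoted in the graphical fisheye literature is G(x) = (d + 1)x / (dx + 1), where x is the normalized distance from the focus and d is the distortion factor. Whether this is the exact form used in the work cited above is an assumption; the sketch shows only the behavior described in the text, along a single axis.

```python
def fisheye(x, d):
    """Map a normalized distance x in [0, 1] from the focus; d >= 0 sets the strength."""
    return (d + 1) * x / (d * x + 1)

def distort(point, focus, boundary, d=3.0):
    """Distort a 1-D coordinate lying between the focus and a screen boundary."""
    span = boundary - focus
    if span == 0:
        return point
    return focus + fisheye((point - focus) / span, d) * span

def magnification(x, d):
    """Local magnification: the derivative of fisheye with respect to x."""
    return (d + 1) / (d * x + 1) ** 2
```

With d = 3 the magnification is 4 at the focus and 0.25 at the boundary. This sixteen-fold difference is why comparing two points under different magnifications is difficult.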

      Projection onto a hyperbolic surface has also been used to fit large data sources onto a single screen (Lamping et al. 1995). This method is suitable for some types of data such as hierarchies that can be viewed as trees and some networks but is not a general solution. Munzner has extended the work on hyperbolic trees to a virtual 3-D rendering (1997). This method of presentation lessens the perceived distortion and enhances the user's interaction via direct manipulation.
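Why an arbitrarily large tree fits on one screen can be seen from the Poincaré disk model of the hyperbolic plane, in which a point at hyperbolic distance r from the center is drawn at Euclidean radius tanh(r/2). The sketch assumes that model and unit curvature, and the function names are hypothetical.

```python
import math

def disk_radius(hyperbolic_distance):
    """Euclidean radius in the unit disk for a given hyperbolic distance from the center."""
    return math.tanh(hyperbolic_distance / 2.0)

def to_screen(hyperbolic_distance, angle, screen_radius=300.0):
    """Place a node given in hyperbolic polar coordinates on a circular display."""
    rho = disk_radius(hyperbolic_distance) * screen_radius
    return (rho * math.cos(angle), rho * math.sin(angle))

# Tree levels spaced one hyperbolic unit apart crowd toward the rim.
level_radii = [disk_radius(level) for level in range(6)]
```

Nodes near the center get most of the screen and deeper levels are compressed toward the edge of the disk without ever reaching it; moving the focus amounts to re-centering the disk on another node.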

      Work on 3-dimensional and virtual reality displays is highly evident. Examples of such systems that are used in visualizing documents include VR-VIBE (Benford et al. 1995), LyberWorld (Hemmje et al. 1994), SPIRE (Wise et al. 1995), and Bead (Chalmers 1993, 1996). Whether users perform better in 3-D environments than in 2-D has not yet been tested.

    4.2 Zoom

      Zooming is the technique for allowing a user to select a smaller region of the screen for display. Scrolling is an alternative to zooming but suffers greatly by comparison. Since only a portion of the display can ever be visible at one time, pieces of information that are at opposite ends of the display will never be subjected to some types of evaluations. Zooming includes any change in view from a larger portion to a smaller portion of a field or vice versa. As such, it is possible to implement zooming as a discrete number of intermediate views. Usually such views are available simultaneously to help the user to preserve his sense of place. However, smooth zooming is increasingly available.

      Smooth zooming has been incorporated into many of the currently available visualizations including PAD++ (Bederson & Hollan 1994) and the Document Lens (Robertson & Mackinlay 1993). The availability of fast algorithms and state-of-the-art hardware that incorporates graphical routines has made rapid screen update rates possible. Smooth zooming helps users maintain their sense of position and context (Schaffer et al., 1996). Variations in zoom techniques include the capability to move in more than one plane. In PAD++ the user needs only to hold down the mouse button on a location and the view will be transformed to move that region to the center of focus. To the user it appears that she has walked a straight line toward the region (Bederson & Hollan, 1993).

      Zooming is a method that has been widely used in virtual worlds. In fact, it is difficult to imagine a VR system that would not provide smooth movement of the virtual body through space. Although the issue of mapping using natural objects and landscapes might have fit more appropriately under the overview section, it seems mandatory to talk about it when movement is being addressed. George Robertson in a presentation at CMU (February 1997) made the observation that system designers who do not use a real-world metaphor in their interfaces are ignoring the fact that users live in a real world and know how to move there very well. He said "It is at their peril that designers will use any interface metaphor that doesn't incorporate what the user knows about moving in the real world." He included not only natural environments but virtual worlds in which the objects might be real-world correlates or abstract concepts. This point of view seems overly rigorous, but as a maxim for designers it should give an appropriate warning.

      Several projects have been developing systems whose metaphors rest on concrete objects and settings. In the document sphere, VR-VIBE (Benford et al. 1995) and Bead (Chalmers 1993, 1996) use a spatial metaphor that creates landscapes that encourage exploration. The Natural Scenes Paradigm project of P.K. Robertson (1991, 1994) gives an approach based on 1) using clearly and easily understood models such as 3-D structures or scenes, 2) representing data variables by the recognizable properties of the objects or scenes, and 3) inducing mental models in the observer's mind by using graphics scene simulation techniques.

    4.3 Filtering

      Filtering is the activity of weeding out uninteresting elements in a collection. With databases this is accomplished quite easily. Ahlberg has developed an Alphaslider (Ahlberg & Shneiderman 1994a) which maps an alphabetically sorted list to a slider, such that repositioning the thumb causes the list to be traversed in the expected order. The Alphaslider can be found in a variety of projects including FilmFinder (Ahlberg & Shneiderman 1994b), Spotfire (Ahlberg 1996) and HomeFinder (Williamson & Shneiderman 1992). A common term for this type of filtering is dynamic query (Ioannidis, 1996, Fishkin & Stone 1995).
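The two ideas in this paragraph, a slider mapped onto a sorted list and filtering by slider-controlled ranges, can be sketched as follows. The function names and record fields are hypothetical and do not reproduce the cited implementations.

```python
def slider_to_item(items, position):
    """Map a thumb position in [0, 1] to an entry of the alphabetically sorted list."""
    if not items:
        raise ValueError("the slider needs at least one item")
    ordered = sorted(items, key=str.lower)
    index = min(int(position * len(ordered)), len(ordered) - 1)
    return ordered[index]

def dynamic_query(records, ranges):
    """Keep the records whose fields fall inside every (low, high) slider range."""
    return [
        record
        for record in records
        if all(low <= record[name] <= high for name, (low, high) in ranges.items())
    ]

films = [
    {"title": "Alien", "year": 1979},
    {"title": "Brazil", "year": 1985},
    {"title": "Casablanca", "year": 1942},
]
eighties = dynamic_query(films, {"year": (1980, 1989)})
```

Each slider contributes one entry to `ranges`, which is why the approach scales poorly to document vectors with hundreds or thousands of dimensions.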

      All of these projects are based on information that is stored in databases. The indexing performed on documents, especially the multidimensional vector type, produces vectors that contain hundreds or even thousands of elements. It is difficult to imagine incorporating an equal number of sliders to control whether or not a factor is to be considered in a display. As Olsen et al. (1997) have noted, querying a database generates an answer that is 100% accurate; there is no concept of recall and precision in databases, because there is full recall and total precision. In the instance of docubases that are characterized by large vectors, the issue of similarity arises and how to address this in interactive situations is a question that is currently under investigation. The TREC evaluation series has recently initiated an interactive track that is attempting to answer this question (Over 1996). Using visualizations to represent document sets raises many of the same issues with respect to evaluation that text-based interactive systems do. Perhaps criteria that emerge for assessing the efficacy of text-based systems for interacting with document collections will provide information that will support evaluation of visual presentations of the same material (Newby 1996).
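For reference, precision and recall can be computed from a retrieved set and a set of relevant documents as in the sketch below. The document identifiers are made up; an exact database query corresponds to the case in which the two sets coincide.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against the relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Similarity-based retrieval: some hits, some misses, some false alarms.
ranked_result = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])

# Exact database query: the answer set is exactly the set of matching rows.
exact_result = precision_recall(retrieved=[2, 4, 5], relevant=[2, 4, 5])
```

In the first case half of the retrieved documents are relevant and two thirds of the relevant documents are found; in the second both measures are 1.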

      Another part of the problem is the special demands required to merge the data that is stored about documents. Some of the data is essentially metadata and this part is amenable to database treatment. Items such as 'author', 'date of publication', and 'publisher' are easily stored in relational form. The content of a document, the inverted file associated with it, the document vectors, and the other forms such as timeline, topic segmentation, and noun extracts are not so easily stored. The usual method is to perform an SQL search on the part of a query that is suitable (usually the metadata) and then to subject the resulting set to secondary methods, but results of this approach have not been entirely successful (DeFazio et al. 1995).
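The two-stage approach can be sketched with the standard `sqlite3` module: an SQL query narrows the collection by metadata and the survivors are then ranked by vector similarity. The table layout, the field names, and the choice of cosine ranking are assumptions made for illustration.

```python
import math
import sqlite3

def cosine(u, v):
    """Cosine similarity, 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def two_stage_search(conn, vectors, year_from, query_vector):
    """Stage 1: SQL filter on metadata. Stage 2: rank survivors by similarity."""
    rows = conn.execute(
        "SELECT id FROM docs WHERE year >= ? ORDER BY id", (year_from,)
    ).fetchall()
    candidates = [row[0] for row in rows]
    return sorted(
        candidates,
        key=lambda doc_id: cosine(vectors[doc_id], query_vector),
        reverse=True,
    )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [(1, "Salton", 1989), (2, "Willett", 1985), (3, "Lin", 1997)],
)
# Document vectors live outside the relational store in this sketch.
vectors = {1: [1, 0], 2: [1, 1], 3: [0, 1]}
ranking = two_stage_search(conn, vectors, year_from=1988, query_vector=[0, 1])
```

The split is visible in the code: the metadata sits in a table while the vectors are kept in a separate structure, so neither store can optimize the combined query on its own.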

    4.4 Details-on-Demand

      At some point in interacting with a visualization system, the user may decide to take a closer look at one or more objects in the field of view. When the requested view provides the content of the object, 'detail-on-demand' has been provided. Most systems support this function and it is usually invoked by clicking on an item or group of items or by allowing the sprite (cursor) to dwell on an object. In the former case, a dialog pops up that contains detailed information. In the latter case, a lens might be provided. Lenses have a variety of appearances but as a group they provide what are commonly referred to as 'see-through tools' (Bier et al. 1994). Zooming can also provide details. When zooming magnifies a piece of a display, the view can show different information at the more detailed level.

      Problems can arise in several situations, including 1) when the information that pops up occludes the original view, 2) when the smoothness of the movement from one view to another disorients the users, and 3) when the information that pops up is not what the user expects. This last issue brings up the question of what actually constitutes 'detail.' In the case of documents, details can include the full text that stands behind any representation. However, it may also be the case that details are to be found in another view of the same document. For instance, zooming to a highly clustered region of a visualization and clicking on an icon could indicate a need to see the text of the related document or it could be a request for metadata only.

    4.5 Relate

      The relate function seeks to make explicit the relationships between objects in a display. It can also refer to representing relationships between data in multiple associated windows. This function is implemented in a variety of ways. The idea of linking graphical representations is not new. Simple linking can be found in a wide variety of programs, e.g., BEAD (Chalmers, 1993), SeeSoft (Eick, 1992, Antis et al. 1996, Baker & Eick 1995), AutoVisual (Feiner & Beshers, 1990), VisDB (Keim & Kriegel, 1994), Nested Histograms (Mihalisin & Gawlinski, 1990), The Table Lens (Rao & Card, 1994), and The Dynamic HouseFinder (Williamson & Shneiderman, 1992). The Alphasliders found in FilmFinder and Spotfire (Ahlberg 1996) are updated to current values each time an object is selected. When performing Dynamic Queries (Ahlberg & Shneiderman 1994), users are shown consistent views onto the data in a similar way. Chuah et al. (1995) have used a similar technique to integrate multiple views that are a mix of tables and visualizations. The users can easily make connections between the pertinent relationships.

    4.6 History

      Maintaining histories is important for several reasons including placekeeping and supporting the ability to undo actions. Exploration in visualizations is a creative process and involves many sequential user actions to arrive at a satisfactory solution. The ability to retrace steps on a particular path is important. Shneiderman (1998) suggests that "most prototypes fail to deal with this requirement" and attributes this fact to the novelty of such interfaces. Borrowing from classical information retrieval systems would allow users to recover and refine intermediate searches.

      Animation of steps might be a useful mechanism for providing path retracing (Gonzalez 1996). A problem with maintaining an adequate history is the considerable resources that are required to maintain the various kinds of information that might be considered salient by the user. In navigating visualization spaces, it might be important to keep track of the landmarks that are visible, the granularity of the zoomable display, and the state of user-selectable options. Step-by-step unwinding is a storage-intensive endeavor.
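A minimal history mechanism can be sketched as a bounded stack of view states. The fields recorded here (center, zoom level, options) are examples chosen for illustration, and the `limit` parameter is an assumed way of capping the storage cost noted above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ViewState:
    center: tuple          # position of the view in the display space
    zoom: float            # granularity of the zoomable display
    options: tuple = ()    # user-selectable settings in effect

class History:
    """Bounded stack of view states supporting step-by-step unwinding."""

    def __init__(self, initial, limit=100):
        self._states = [initial]
        self._limit = limit

    def record(self, state):
        """Push a new state, discarding the oldest one once the limit is hit."""
        self._states.append(state)
        if len(self._states) > self._limit:
            self._states.pop(0)

    def undo(self):
        """Step back one state; the earliest retained state is never removed."""
        if len(self._states) > 1:
            self._states.pop()
        return self._states[-1]

    @property
    def current(self):
        return self._states[-1]
```

Storing whole states keeps undo trivial at the price of memory; storing only the user's actions would be more compact but would require each action to be invertible.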

    4.7 External Memory/Extracts

    Once users have found regions or elements of interest in a visual display, they should be able to save the subsets. Not only might the user desire to save this collection as a new starting point for further study, but she might also like to print or mail it. An alternative to exporting a whole data set could be to save interface settings. This is the approach taken by VIBE (Olsen et al. 1992, 1993). The Visage project supports a drag-and-drop feature that allows inter-application exchange of data (Roth et al. 1997).

  5 Task Models

    Several frameworks for information visualization have been proposed (Kennedy et al. 1996, Rogowitz & Treinish 1993, Wehrend 1990). Some of these structures include modeling of the user. Increasingly, user-centered design is being adopted. In this paradigm, explicit representation of the user is important. The user can be modeled in the system by assessing the user's goals and/or defining the tasks the user needs to perform.

    This section will present several task models. Shneiderman labels the preceding discussion on interface functionalities as a task level model (see Appendix B for another Shneiderman view). Evaluation of visual interfaces, however, needs to be more grounded in task models from the viewpoint of the user than from the interface side. Some of the models presented here are domain-dependent and others are independent of domain. The granularity of analysis runs the gamut from very fine-grained to very high level.

    A classification scheme supports the development of task sets for system evaluation and lays the groundwork for the development of automatic visualization systems. By knowing the data that exists, the requirements of the interface and the goals of the user, it becomes possible to ask how one might build visualizations automatically. The purpose of this paper is to discuss the issues that contribute to understanding how best to approach the evaluation of document visualization systems.

    5.1 Wehrend -- a task level user model

The task classification of Wehrend & Lewis (1990) is a low-level, domain-independent taxonomy of tasks that users might perform in a visual environment. Domain-independence allows generalizability. Wehrend & Lewis' classification consists of the following set of user actions.

  1. Locate: This action can be applied to dependent as well as to independent variables. It covers interaction techniques that allow the user to find special data entries. Annotation techniques are covered by this action, for example an arrow marking the most interesting point of the display. Locate can also work like a filter, e.g., by highlighting data items that lie in a special range. Locate includes search for an object that the user already knows about.
  2. Identify: Identify is similar to Locate, but in this case the user is being asked to describe an object that was not necessarily known previously.
  3. Distinguish: This action allows distinguishing between different values of the same variable, e.g., for a user to know which objects have already been identified or interacted with. The interface might show different iconic representations for each object type.
  4. Categorize: Categorizing means to define divisions that displayed objects can be sorted by. Examples of VIBE (Olsen 1992, 1993) tasks that are categorizations are 1) To define all the regions of a 3-POI (point of interest) Boolean display, 2) Draw boundaries in a vector VIBE 3-POI display for each of the possible Boolean combination of terms.
  5. Cluster: The cluster task covers techniques that allow us to determine whether data entries are clustered or not. The ambiguity introduced by flattening the hyperdimensional spaces into two dimensions would be probed by this activity. It includes finding gaps in the display field (cluster of nothing).
  6. Distribution: The distribution action is closely related to cluster in much the same way that locate and identify are related. To distribute, the user needs to describe the overall pattern while cluster merely asks that the set be detected.
  7. Rank: Ranking is only possible for scalar and ordinal data. Users could be asked to indicate the best and worst cases in a display. Since nominal data cannot be ranked, it is important that displays of nominal data be designed so that the user is discouraged from trying to perform such actions.
  8. Compare within entities: This action describes tasks in which a user is called upon to decide something based on the attributes of similar objects.
  9. Compare between relations: When different entities are used as the basis of comparison, the 'compare between relations' operator is used. For instance, if a set of objects has been marked as seen and the remainder of the set is unseen, then the user might compare and contrast attributes of the sets.
  10. Associate: The associate action calls upon the user to form relationships between objects in a display.
  11. Correlate: If objects in a display have multiple attributes, then it should be possible to discern which other objects share attributes. For instance, in a scatterplot in which the marks have shape and color as well as their x and y position, the objects should be groupable by any of the attributes.

Some of these tasks are similar to those enumerated by Roth & Mattis (1990) as shown in the following table.

Table 2: Comparison of Tasks

Wehrend & Lewis (1990)    Roth & Mattis (1990)
Identify                  Lookup value
Distribute                Distribute
Compare within            Compare within
Compare between           Pairwise or n-wise comparison
Rank                      Index a structure by an element
Correlate                 Correlate

 

5.2 Task Models from Library Environments

      Modeling users in information retrieval situations has a long history in library science. Systems have evolved from holding only titles and minimal other metadata, to holding abstracts, to the present situation in which most texts are available in full. Systems have increased in capacity to accommodate the requirements of full-text storage and have taken advantage of increased computing power to perform searches. Where once an intermediary worked with a user to formulate a query that would be submitted in essentially batch mode, current systems are used by the end user and searches are interactive. Only recently have visualizations been developed that might help satisfy some of the user's information needs. The models developed by library scientists have changed to accommodate evolving resources.

      The following task models developed for use in library environments were chosen to show how varied the approaches are and to describe some models that might actually have some utility in evaluating visual interfaces. Reviews of the historical evolution of information retrieval can be found in Spink (1997) and Bates (1989).

5.2.1 Marchionini

The breakdown of the information-seeking process provided by Marchionini (1992) describes a network of tasks that are performed in various, user-defined orders until the information-seeking problem is solved. Marchionini states clearly that there are two basic forms of information needs: fact knowledge and browsing. The subtasks that he provides are the same for both types of needs:

  1. Define the problem - this is the required first step in the process. As problem-solving proceeds, the problem will undergo a series of revisions.
  2. Select the source - a user must choose an entry point for a search, e.g., the Web via a particular engine, a library catalog, or an online periodical server.
  3. Articulate the problem - by articulation he means to form a query that can be processed by the system.
  4. Examine the results - the user must review the items returned in response to a query in order to determine whether useful information was retrieved.
  5. Extract information - when interesting and/or useful items are found, the user must be able to acquire a physical or electronic copy of the material.

This particular task list is relevant to interface design but provides little guidance on what subtasks might be. It is also highly grounded in the traditional information retrieval paradigm in that it relies on query formation and an iterative performance of steps to arrive at a satisfactory solution.

5.2.2 Bates

Bates (1989) describes a 'berrypicking' model of information retrieval, which she contrasts with the classical method. Her description of browsing in a world of text seems to offer similarities to visual representations. She presents a list of 6 tasks which are:

  1. Footnote chasing - a 'backward chaining' method, which enables a user to find material that preceded the current document in publication time.
  2. Citation searching - a 'forward chaining' method, which allows users to find other papers that cite the same reference material.
  3. Journal run - once a user finds a 'good' journal, he will scan entire issues and even volumes. Precision with the core journals in a field is very high with this method.
  4. Area scanning - in real libraries books having the same topic codes are stored in close proximity. Users frequently start with a single reference and expand their search by literally browsing the stacks.
  5. Subject search in bibliographies and abstracting and indexing service - strategy based on the commonly available indexes.
  6. Author searching - self-explanatory.

5.2.3 Belkin

Belkin et al. (1995) propose that information seeking can be defined with respect to four dimensions as shown in the following table.

Table 3: Information Seeking Dimensions (Belkin et al. 1995)

Searching as a method of interaction refers to trying to find some known item, while scanning refers to trying to find something interesting. The goal of interaction might be to learn something about an item or it might be to select the item. When looking for items, the user might specify what should be looked for or he might find it by recognizing it. The distinction between information and meta-information is the same distinction that has been made in this paper.

Belkin (Belkin et al. 1995) notes that there are 16 possible information-seeking strategies if each of the components is viewed as a Boolean value. For instance, traditional information retrieval might be characterized as Selecting + Specification + Meta-information + any method of interaction and information visualization would be described as Learning + Recognition + Information + any method of interaction but frequently Scanning.
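The count of 16 strategies follows from treating each of the four dimensions as a two-valued choice (2 x 2 x 2 x 2). The following is a minimal sketch of that enumeration, not code from Belkin et al.; the value pairs come from the description above, while the dictionary keys (`method`, `goal`, `mode`, `resource`) and the helper names are labels assumed here for illustration.

```python
from itertools import product

# Value pairs follow the prose description of Belkin et al. (1995).
# The key names are illustrative labels, not terms taken from the source table.
DIMENSIONS = {
    "method": ("Scanning", "Searching"),
    "goal": ("Learning", "Selecting"),
    "mode": ("Recognition", "Specification"),
    "resource": ("Information", "Meta-information"),
}


def enumerate_strategies():
    """Return every combination of one value per dimension."""
    names = list(DIMENSIONS)
    return [dict(zip(names, combo)) for combo in product(*DIMENSIONS.values())]


def matches(strategy, **fixed):
    """True if the strategy has the given value on each fixed dimension."""
    return all(strategy[name] == value for name, value in fixed.items())


strategies = enumerate_strategies()

# Traditional information retrieval: any method of interaction is allowed.
traditional_ir = [
    s for s in strategies
    if matches(s, goal="Selecting", mode="Specification", resource="Meta-information")
]

# Information visualization: again any method, though frequently Scanning.
info_vis = [
    s for s in strategies
    if matches(s, goal="Learning", mode="Recognition", resource="Information")
]

print(len(strategies), len(traditional_ir), len(info_vis))
```

Because the method of interaction is left free in both characterizations, each one covers two of the 16 strategies rather than a single one.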

5.3 VIRI Research Group Tasks

The VIRI (Visual Information Retrieval Interface) research group developed a set of tasks that we term 'tool-enabled' tasks. The idea behind the name is that visualizations provide ways of doing things that might not be possible or that might be much more difficult using less visual means. The list is as follows:

5.4 Summary of Task Models

Each of the task models presented above is incomplete and each is inadequate for supporting the development of an evaluation plan for assessing the usability of information visualization interfaces.

  1. The problem with the Shneiderman approach is that most of the operations that require cognitive engagement of the user fall under the 'relate' category, so the framework offers little in the way of a differentiated set of operations.
  2. The list of Wehrend is a bottom-up approach; it might fall short if the real goals of users are very complex. The advantage is that the set of operations is domain-independent.
  3. The problem with studies done with library patrons is that the tasks that they seek to accomplish are perhaps learned behaviors due to their prior knowledge of how libraries work -- they tend to ask questions that they know can be answered. Visualizations might support a different way of asking questions and getting answers.
  4. The task list developed by the VIRI Research group is not yet structured.

In all of these task sets, it is not clear what fraction of an information browser's tasks is covered by the list.

6 Examples of Visualization Systems

This section presents samples of visualizations that have been developed to handle data of the various types discussed above. The systems presented have been chosen because they are unique to the category, because they are prototypical, because they are famous, or because they are of personal interest. In each section, it should be noted that the dimensionality of the display is not necessarily mapped in a one-to-one fashion with the text dimension. Obviously, multidimensional data would be impossible to render at all if this were the case. However, linear and low-dimensional data is mapped variously to one, two or three dimensions. Table 4 contains a more complete listing of visualization systems that have applicability to documents.

Table 4: List of Visualization Systems for Documents

Data Type         System                        Reference
Linear            TileBars                      Hearst 1995
2-dimensional     Information Mural             Jerding & Stasko 1995
                  Pad++                         Bederson & Hollan 1994
                  Perspective Wall              Mackinlay et al. 1991
                  Document Lens                 Robertson and Mackinlay 1993
3-dimensional     WebBook                       Card et al. 1996
Multidimensional  Bead                          Chalmers 1993, 1996
                  LyberWorld                    Hemmje et al. 1994
                  Themescape / SPIRE            Wise et al. 1995
                  VIBE                          Olsen et al. 1993
                  VR-VIBE                       Benford et al. 1995
Temporal          GroupKit                      Greenberg & Roseman
                  SeeSoft                       Eick et al. 1992
                  LifeLines                     Plaisant et al. 1996
                  EditWear/ReadWear             Hill & Hollan 1992
Hierarchical      Cone/Cam-Trees                Robertson et al. 1991
                  Hyperbolic Trees              Lamping et al. 1995
                  3-D Hyperbolic Trees          Munzer 1997
                  TreeMaps                      Johnson & Shneiderman 1991
                  Elastic Windows               Kandogan & Shneiderman 1997
Network           Butterfly Citation Browser    Mackinlay et al. 1995
                  Influence Explorer            Tweedie et al. 1996
                  Multi-Trees                   Furnas & Zacks 1994
                  Navigational View Builder     Mukherjea & Foley 1995
                  SemNet                        Lin 1991, 1992
Distributed       CASCADE                       Spring et al. 1996
                  GroupKit                      Greenberg & Roseman
                  Web Forager                   Card et al. 1996

     

6.1 Linear Text: TileBars

      Figure 3: TileBars embedded in a Scatter/Gather interface.

      TileBars (Hearst 1995) is shown in Figure 3 as part of a Scatter/Gather (Pirolli et al. 1996, Hearst et al. 1995, Hearst & Pedersen 1996) interface. Each TileBar icon represents a single document and the length of the bar is proportional to the length of the document. Each grayscale block represents a segment of text as determined by the TextTiling method described previously (section 3.1 and Hearst & Plaunt 1993). Dark blocks connote segments with a high occurrence of a term or combination of terms and lighter blocks stand for pieces of text with relatively less of the topic. This particular query was composed of three sets of terms so the display for each document contains one row of blocks for each term set. This display makes some information easy to gather, e.g., relative size of documents, co-occurrence of term sets in a document, absence of a particular concept from a document.
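      The encoding described above can be sketched in a few lines. This is an illustrative approximation, not Hearst's implementation: it assumes the document has already been split into segments (for example by TextTiling), counts term-set hits per segment, and scales the counts to a gray level between 0 and 1. The function names, the normalization by the largest count, and the ASCII shading ramp are all assumptions made for this example.

```python
SHADES = " .:*#"  # assumed ASCII ramp: blank = no hits, '#' = densest segment


def tilebar_levels(segments, term_sets):
    """Return one row of gray levels (0.0 to 1.0) per term set.

    segments  -- list of token lists, one per text segment (assumed precomputed)
    term_sets -- list of term lists, one per query facet
    """
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        rows.append([sum(1 for tok in seg if tok.lower() in wanted) for seg in segments])
    # Normalize by the largest count in the document (an assumption of this sketch).
    peak = max((count for row in rows for count in row), default=0)
    return [[count / peak if peak else 0.0 for count in row] for row in rows]


def render(levels):
    """Map gray levels to characters; darker characters mean more hits."""
    top = len(SHADES) - 1
    return ["".join(SHADES[round(level * top)] for level in row) for row in levels]


# Toy document with three segments and a query with two term sets.
segments = [
    ["network", "security", "network"],
    ["law"],
    ["security", "law", "law"],
]
term_sets = [["network"], ["law"]]

levels = tilebar_levels(segments, term_sets)
bars = render(levels)
for bar in bars:
    print("[" + bar + "]")
```

      The number of columns reflects document length in segments, and reading down a column shows whether the term sets co-occur in the same segment.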

6.2 Two-dimensional Text: Pad++

      In Pad++ (Perlin & Fox 1993), a document can be visible at any scale or at more than one scale simultaneously. The Pad project explores techniques by which spatial scaling can be integrated into applications. Techniques such as placing microscopic text in place of a footnote marker, applications such as a calendar which reveals finer hierarchical structure as the user approaches, editors for hierarchically structured text, and a multi-scale painting program using wavelets are described in the text.

      Figure 4: PAD++ rendering of a hypertext

      The view of PAD++ shown in Figure 4 is of a hypertext. Each node shows a text segment at a different level of detail. The developers of this system (Bederson & Hollan 1994) contend that zooming, the primary mechanism of interacting with PAD++, provides superior way-finding in many environments, including hypertext (Páez et al. 1996). That study showed that subjects not only answered more questions correctly and in less time, but also reported greater subjective satisfaction.

6.3 Three-dimensional Text: WebBook

      Figure 5: WebBook close-up showing a book being riffled through

      Card et al. (1996) developed the interface shown in Figure 5 in response to the observation that users have difficulty finding pages, get lost, have difficulty relocating pages, and have problems organizing material they manage to find on the Internet. Although the book metaphor has been used often (e.g., Yankelovich et al. 1985, Remde et al. 1987), this particular implementation is quite compelling. The WebBook is not just a static interface object but provides a variety of interactions that are typical of the way people use books in the real world. Users can riffle through pages (the view shown in the figure), can rip pages from a book, and can tack a page to a desk or wall in the 3-D room in which the book is located. Even bookmarks behave as they do in the real world: they are flat objects that are inserted between pages and that hang out the end when the book is not open to the page where the bookmark was inserted.

      The WebBook represents a three-dimensional view of a virtual world that allows using documents at many levels. When a user scrolls in 'fontsize' mode, the display becomes a Document Lens (Robertson & Mackinlay 1993), which has many similarities to the PAD++ interface shown in the previous section. The WebBook (Card et al. 1996), Document Lens (Robertson & Mackinlay 1993), Perspective Wall (Mackinlay et al. 1991) and WebForager (Card et al. 1996) are a few of the elements of the larger Xerox (Xsoft) project that are collectively called Information Visualizer (Card et al. 1991, Rao et al. 1995).

6.4 Multidimensional Text: SPIRE

      Figure 6: SPIRE Themescape showing topic distribution in a large document space

      The number of potential choices for this section was larger than for any of the others. Multidimensional visualization is undoubtedly the most elusive and highly sought after of the visualization types. This is primarily due to efforts by many segments of the information community, including but not limited to database visualizers, geographical information systems and information retrievalists. Even when concentrating on information retrieval visualization, it was difficult to choose among the many systems such as BEAD (Chalmers 1993, 1996), VIBE (Olsen 1992, 1993), VR-VIBE (Benford et al. 1995), InfoCrystal (Spoerri 1993), and Lyberworld (Hemmje et al. 1994). SPIRE (Figure 6) is a project that was developed at Battelle National Labs and has gone into production by InXight. The goal of information retrieval projects is usually to find text that the user expects to read. SPIRE (Spatial Paradigm for Information Retrieval and Exploration) stands in contrast to this view. Wise et al. (1995) state that:

      "True text visualization that would overcome these time and attentional constraints must represent textual content and meaning to the analyst without them having to read it in the manner that text normally requires. These visualizations would instead result from a content abstraction and spatialization of the original text document that transforms it into a new visual representation that communicates by image instead of prose."

      The figure shows a Themescape to which a large document collection has been mapped. The layout of the concept space has been accomplished with a variety of clustering and dimensionality reducing algorithms (York & Bohn 1995). The elevation of a feature in the map indicates theme strength. Time sequence animations allow a high level appraisal of the overall change in topics over a period of time.

6.5 Temporal Text: SeeSoft

Figure 7: SeeSoft showing overview of a software project

Figure 7 shows the status of a large software project as filtered through SeeSoft (Eick 1992). The blocks in the figure represent program code modules. Color is used to profile 'hot spots' in the code. The information in this rendering is actually a frequency count: the more often a line of code is called, the closer the color is to the red end of the color spectrum shown in the legend to the left side of the image. Other modes of this tool can show the time elapsed since a module was last modified.

Other visualizations depict time in a variety of ways. SAGE (Roth et al. 1994, Chuah et al. 1995) has been used to code timelines of information extracted from historical documents. EditWear/ReadWear (Hill & Hollan 1992) creates a bar adjacent to the normal scroll bar when a document is being viewed. In the bar, a two-dimensional plot of user activity shows the length of time that the document has been either read or edited. ReadWear allows the computer to collect information about the time the user has spent working with various information objects at multiple levels of resolution, such as time spent writing the different chapters of a book, time spent editing assorted lines of code in a program, or time spent reading interesting net news messages. This use is very similar to some of the functionality of SeeSoft. Clearly, the temporal category covers a lot of territory. Time may be viewed as:

Temporal data is frequently overlaid with other types of information. One might think of temporal information as merely another dimension in a multidimensional space, but to do this is to risk losing the importance that people tie to this important kind of data.

6.6 Tree Text: Hyperbolic Tree

      The hyperbolic tree (Lamping et al. 1995) shown in Figure 8 is representative of the types of renderings that are possible for hierarchical data. Central objects are larger than more peripheral ones. Interaction with the display is accomplished with a click and drag, which causes an apparent rotation of the surface, pulling peripheral parts into more central focus.

      Other implementations of the hyperbolic tree include some 3-dimensional ones (Munzer 1997). Part of the appeal of these displays is the compelling direct-manipulation that is built-in; the user may drag the surface around using the mouse. Cone Trees (Robertson et al. 1991) project a tree into a semi-transparent 3-dimensional display. Trees of modest size can be rendered using this approach but very large trees are still difficult to manage. TreeMaps (Johnson & Shneiderman 1991) are another way to maximize the use of screen territory. They are a space-filling technique that uses alternating directions, icon size and texture to render large numbers of objects in a hierarchy. These displays require a considerable learning time and the hierarchy is often lost in the remapping.
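      The alternating-direction idea behind TreeMaps can be illustrated with a minimal slice-and-dice layout. This is a sketch under stated assumptions rather than the algorithm as published by Johnson & Shneiderman: the tree is a nested dictionary with hypothetical `name`, `size` and `children` keys, leaf sizes determine area, and only rectangle geometry is computed (no icon size or texture).

```python
def weight(node):
    """Total size of a node: its own size for a leaf, else the sum over children."""
    kids = node.get("children", [])
    return sum(weight(kid) for kid in kids) if kids else node["size"]


def slice_and_dice(node, x, y, w, h, horizontal=True, out=None):
    """Assign each node a rectangle (x, y, width, height).

    Children split the parent's rectangle in proportion to their weight,
    and the split direction alternates at every level of the hierarchy.
    """
    if out is None:
        out = {}
    out[node["name"]] = (x, y, w, h)
    total = weight(node)
    offset = 0.0  # fraction of the parent already consumed by earlier siblings
    for kid in node.get("children", []):
        share = weight(kid) / total if total else 0.0
        if horizontal:
            slice_and_dice(kid, x + offset * w, y, share * w, h, False, out)
        else:
            slice_and_dice(kid, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out


# Hypothetical three-leaf hierarchy laid out in an 8 x 4 display area.
tree = {
    "name": "root",
    "children": [
        {"name": "a", "size": 2},
        {"name": "b", "children": [
            {"name": "b1", "size": 1},
            {"name": "b2", "size": 1},
        ]},
    ],
}

layout = slice_and_dice(tree, 0, 0, 8, 4)
for name, rect in layout.items():
    print(name, rect)
```

      The output also hints at why the hierarchy can be hard to read in such displays: the rectangle for `b` is completely covered by `b1` and `b2`, so the parent is visible only through the boundaries of its children.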

6.7 Networks: Navigational View Builder

      Figure 9: Navigational View Builder showing relationships among a set of documents

      Figure 9 shows Navigational View Builder (Mukherjea & Foley 1995, Mukherjea et al. 1995), a tool for designing overview diagrams of hypermedia systems. The renderings produced by the tool are the result of a series of operations, including binding (mapping data attributes to visual display attributes), clustering (coalescing nearby objects into a single icon), filtering based on content, links, and structure, and hierarchization (reducing dimensionality by viewing 3-D trees instead of graphs). The authors admit that the problem is difficult and that their solution can be markedly improved. The three main outstanding issues that they identify are: 1) the system has not been subjected to usability testing, 2) the algorithms that they use have not proven to be scalable, and 3) the metadata that is currently available is too limited to allow interesting views to be built, and the content is not captured in the data that they do collect.

      There are many other projects whose aim is to automate the production of network displays, including SCALIR (Rose & Belew 1991), gIBIS (Conklin & Begeman 1989) and PFNETS (Fowler et al. 1991). SemNet (Fairchild et al. 1988) is a three-dimensional version of a network display in which the nodes and links are color-coded. Several heuristics are used in SemNet to derive semi-optimal placement of nodes so that intersecting links are minimized and conceptual clustering is enhanced. Another advantage of 3-D displays of networks or trees, mentioned both by Fairchild et al. (1988) and by Munzer (1997), is the ease with which users can interactively work with the visualization to change the point of view.

       

6.8 Distributed Documents/Workspaces: CASCADE

Figure 10: CASCADE display demonstrating landmark feature including color-coded links, Mural and TileBar

Workspaces, whether they are personal or shared and, if shared, whether the work is synchronous or asynchronous, are complicated environments. CASCADE is a research testbed for investigating computer-supported co-operative work (Spring et al. 1996). Figure 10 shows a display that has a document in the main portion of the screen. For the purposes of this paper, the most salient features to note are the Mural, TileBar, and intradocument colored icons. The colored blocks in the document show the locations of interactive comments made with the CASCADE editor. These icons serve as landmarks that can provide detail-on-demand. The mural is modeled along the lines of a project by Jerding & Stasko (1995) and shows the location of all the comment landmarks in the entire document. This is an important navigational mechanism in the interface. The TileBar is an adaptation of the work of Hearst (1995).

There are many other groupware projects, and each of them has some interesting ways of notifying group members about other people's whereabouts and activities. The right mix of tools is not yet known, nor is it known how many of the right ones will be discovered in single-user systems. Whether the needs of groups require the development of more, better, or smarter visualizations is an area of considerable interest.

7 Research Opportunities

The components of information visualization systems discussed in the state-of-the-art provide a springboard for a multifaceted research plan.

Broadly, there are three potential foci for research as described in this paper:

Although there are open issues in each of these areas, the following topics are of particular interest:

    1. developing a task typology to aid in evaluation of interfaces that support document retrieval
    2. using the task taxonomy to devise an evaluation method that can enable cross-system comparisons
    3. developing a task taxonomy based on Wehrend but specialized to better address the information retrieval and browsing domain.

With respect to tasks, it should be noted that while defining what people need from document interfaces is difficult, the availability of a data type breakdown of documents allows simplification since there are a finite number of questions that can be asked given a data description. If users are interrogated in field studies about their 'information needs', they provide answers that cover the gamut from 'I need to know X' where X is a simple fact such as a person, place or object attribute to 'I need to know how the world works.' When approached from the data type (a definitely bottom-up method), it is easy to see that if only temporal metadata is available, then questions regarding anything but time are useless. Collecting the tasks that can reasonably be generated from a particular data view can serve as a basis for evaluation of an interface that presents this view to the user. The greater goal is to collect a large set of tasks that can be used to test integrated interfaces that attempt to render documents from multiple data perspectives. It is likely that the set of tasks will grow linearly as a function of data type thus forming a coherent set of potential evaluation points.

 

8 Summary

Information visualization is in its infancy. The human, computer, and interface components of such systems are only partially understood. This review attempted to present the current state of knowledge about

The use of the Shneiderman framework to delineate document types proved to have good points and bad points. In its original conception, the framework was intended to dissect the data types of objects in general. In changing the objects to actual instances, i.e., documents, it became less clear whether the fit of framework and object was valid. Indeed, the framework served its purpose -- "to sort out the prototypes [that currently exist] and guide researchers to new opportunities" (Shneiderman 1996). Novel ways of viewing documents were found and interesting questions were generated regarding the breakdown. In the absence of any other document framework for visualization, it appears that there was some utility in the choice of Shneiderman's.

On the negative side, it appears that the number of categories in the framework was more than could be accommodated by the underlying data. In essence, documents are content, structure and metadata. Generally speaking, the hierarchical, network, 2-dimensional, and 3-dimensional elements of the Shneiderman framework map to structural information in documents, while linear and multidimensional data are related to document content. Collaborative documents are complex types that are essentially recursive of all the basic types. The temporal dimension, in its intradocument sense, is part of the multidimensional data; in its other form, it is clearly part of the metadata. It seems that a content, structure, metadata framework could have been equally useful.

A third alternative might have been to use the breakdown discussed in the Scope and Definitions section - document components, documents, document sets, document collections, and document analytics. Whether any of these other frameworks is better for guiding development of document visualizations and their evaluation than the one employed in this paper cannot be answered at this time.

Appendix A: Metadata

 

Element                          Dublin Core  URC  Semantic Header  USMARC  IAFA templates  TEI header

INTRINSIC
Subject                          +            +    +                +       +
Title                            +            +    +                +       +               +
Author                           +            +    +                +       +               +
Publisher                        +            +    *                +       +               +
Publication Place                             +    *                +                       +
Other agent                      +                 +                +                       +
Date                             +            +    +                +                       +
Object type                      +                 *                +
Form                             +            +                     +
Identifier (URN, ISBN...)        +            +    +                +       +               +
Relation                         +            +    *                +                       *
Source                           +                                  +       +               +
Language                         +                 *                +       +               *
Coverage                         +                 *                +
Abstract                                      +    *
Version (edition)                             +    *                +       +               +
Notes (annotation)                            +    *                                        *
Signature                                     +    +
Classification                                                      +                       *
Classification (security level)                    *
Keyword                                            +                        +               *

EXTRINSIC
System requirement                                 *                        +
Mode of Access                                                      +       +
Availability                                                                +               *
Cost                                               *                        +               *
Control                                            *                        +
Extent (size)                                      *                        +               *
Encoding description                                                        +               *
Revision description                               *                        +               *

+ Mandatory   * Optional

Table reproduced from Ng et al. (1997).

 

Appendix B: List of Tasks from Shneiderman

The following list was obtained from the Olive website at the University of Maryland (http://www.otal.umd.edu/Olive/). It has been edited to include only document-related tasks.

1-D

2-D

3-D

Multi-D

Temporal

Tree

Network

Workspace