Design of 3-D Visualization of Search Results:
Evolution and Evaluation
John Cugini / cuz@nist.gov
Sharon Laskowski / laskowski@nist.gov
Information Technology Laboratory
National Institute of Standards and Technology
(NIST)
Gaithersburg, MD 20899
Marc Sebrechts / sebrechts@cua.edu
The Catholic University of America
Washington, DC 20064-0001
Contribution of the National Institute of Standards and Technology.
Not subject to copyright. Reference to specific commercial products
or brands is for information purposes only; no endorsement or
recommendation by the National Institute of Standards and Technology,
explicit or implicit, is intended.
Abstract
We discuss the evolution of the NIST Information Retrieval
Visualization Engine (NIRVE). This prototype employs modern
interactive visualization techniques to provide easier access to a set
of documents resulting from a query to a search engine. The
motivation and evaluation of several design features, such as keyword
to concept mapping, explicit clustering, the use of 3-D vs. 2-D, and the
relationship of visualization to logical structure are described. In
particular, the results of an extensive usability experiment show how
visualization may lead to either increased or decreased cognitive
load.
Keywords
Comparison of 3-D and 2-D;
Design of Visualization;
Evaluation of Visualization;
Information Visualization;
Usability Experiment
1. Background and Motivation
For the past four years, the Information Technology Laboratory of the
National Institute of Standards and Technology (NIST) has supported a
small project ([Cugi96], [Cugi97]) to explore the potential value of
visualization for information access. In particular, we were
interested in exploiting 3-D technology to help users understand and
manipulate search results, i.e. the set of documents returned by a
search engine in response to some query.
There have been some attempts to provide an overview of the design
space for information visualization (see [Chal95], [Card97], and
[Zhou98]). These offer a top-down framework within which particular
visualizations may be categorized. As opposed to a "unified field
theory" of information visualization design, this paper takes a
bottom-up approach: we present a case study of iterative design, from
which some familiar and some novel lessons emerged. We hope that this
detailed critique of various prototypes can serve as a guide to other
researchers who wish to perform meaningful testing and evaluation of new
approaches to document visualization.
2. Relationship to Previous Work
By now, the literature on visualization of textual databases is too
extensive to review in detail (see [Youn96], [Card96], and [Card99] for
survey articles). We will mention only those prototypes that aim to
visualize a result set, not an entire database. Although there are
obvious similarities, these two goals are quite distinct, as we shall
see. Even when we so restrict the field of interest, we find a wide
variety of approaches.
2.1 Details of Previous Prototypes
Allen et al. [Alle93] developed a system of
hierarchical clustering, displayed as an interactive tree, with
logical zooming. There is always a selected document and a selected
subtree. The basic organizing principle of the tree is similarity to
the selected document. The 2-D visualization is contained in one of
four windows and serves as an overview. The other three windows have
textual details: they contain the current query, the subtree document
lists, and the text of the selected document.
Envision [Nowe96] is a flexible interface to a
digital library. Its
basic model is the scatterplot graph. It allows users to decide which
document attributes (e.g. relevance rank, score, author, date, index
terms) will be mapped to which visual attributes (e.g. location, size,
shape, color). It was decided to limit Envision to 2-D for the sake of
wider availability. Envision allows only one index term per document
as an attribute because of the "usability problems" that would arise
from multiple terms. The user evaluation consisted of comparing
subjects' performance to that of the interface designer, not to a
text-based equivalent. A satisfaction survey showed a positive
reaction to the interface.
Hearst and Pedersen [Hear96] use a
scatter/gather algorithm to do
dynamic clustering and refinement of search results. The clustering
"consistently outperforms ranked titles" in retrieval precision. The
interface, however, is not a true visualization, but a GUI that
textually displays the clusters, labelled with their characteristic
terms. That is, the emphasis is on the advantages of one
structure over another, not on visualization per se.
In [Veer97], Veerasamy and Heikes
describe a simple 2-D grid system
with search keywords along the y-axis, and document identifiers along
the x-axis, using the rank order as returned by the search
engine. Each cell of the grid shows the frequency of the corresponding
keyword within that document. A careful study showed that the
addition of this visual interface to the usual text-based interface
allowed users to judge document relevance more quickly and accurately.
Visualization was particularly effective in the identification of
irrelevant documents as such.
More recently, Swan and Allan [Swan98] performed
a controlled study comparing 1) a text-based system
[ZPRISE], 2) a GUI-oriented system,
and 3) the latter enhanced with 3-D visualization of document
clusters. They wanted to improve so-called "aspect-oriented" IR, which
emphasizes finding some specified information, not documents per
se. In that context, using recall as a measure, there was a small
advantage for the 3-D system over text-based, and for the text-based
over the plain GUI. However, there was no evidence for the overall
effectiveness of the use of 3-D. In addition, the utility varied
depending on task and users. Experienced users preferred the
text-based system, while novices liked the GUI systems. Some users
thought the 3-D approach was "worthless", others thought it natural and
intuitive.
In another recent paper, Borner [Born00]
describes a system in which Latent Semantic Analysis (LSA) is used to
pre-compute inter-document similarity within a known collection. The
result set of a query on the collection is then clustered based on
this analysis. The clustering and other document details are
represented in a rich 3-D environment, using the CAVE interface. A
force-directed algorithm is used to lay out the clusters in 3-D space.
As of January 2000, no usability testing has been performed.
[Shne00] describes a tool
presenting search results in a grid-like system, in which one
of the axes represents a hierarchy. Early user tests have
yielded encouraging results.
[Mann99] describes an interesting project
which allows users to choose dynamically among several of these
visualizations.
2.2 Summary of Prototypes
In sum, only [Veer97] and [Swan98] have directly compared the
effectiveness of visualization against functionally equivalent
traditional interfaces. These experiments have demonstrated modest to
significant improvements. Furthermore, apart from [Swan98], the
experimental visualizations have been very conservative: trees or 2-D
grids. The following table sums up seven of these efforts, plus NIRVE on
the last line.
| System   | Structure   | Structure basis    | Visualization | Evaluation baseline      |
|----------|-------------|--------------------|---------------|--------------------------|
| [Alle93] | tree        | term similarity    | 2-D           | ranked list (informal)   |
| [Nowe96] | scatterplot | attributes         | 2-D           | designer's performance   |
| [Hear96] | clusters    | term similarity    | GUI           | ranked list              |
| [Veer97] | grid        | terms              | 2-D           | text equivalent          |
| [Swan98] | clusters    | term similarity    | 3-D           | GUI and text equivalents |
| [Born00] | clusters    | LSA                | 3-D           | None                     |
| [Shne00] | grid        | pre-categorized    | 2-D           | None                     |
| NIRVE    | clusters    | concept similarity | 3-D           | 2-D and text equivalents |
3. Evolution of the NIRVE Prototype
There are a few design goals and constraints that have stayed constant
throughout the development of NIRVE. Since it was conceived as a
post-processor for NIST's PRISE [ZPRISE] search engine, NIRVE was
based on the information that PRISE accepted and returned. With a few
enhancements, PRISE basically accepts a set of terms, or keywords, as
a query; it does not take boolean combinations. It returns a set of
entries, one per document. Each entry contains:
- unique document identifier
- document title
- relevance score (indicating the search engine's estimate of the
"goodness" of the match between the document and the query)
- document rank (according to its score)
- document length
- number of occurrences of each keyword
The number of documents returned is controlled by the query.
Typically, we dealt with result sets of size 100-500. As a database,
we used all the news stories (about 90,000) issued by the
Associated Press for 1988, as made available through the Text
Retrieval Conference [TREC].
Early on, we developed a normalized keyword profile as the basic
metric for each document. Each component of this vector is calculated
as the square root of the number of occurrences of each keyword
divided by the document length, scaled to a maximum value of one for
each component.
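The profile calculation can be sketched as follows. This is a minimal illustration, not the NIRVE code: the function name is hypothetical, and two readings of the description are assumed, namely that the square root is taken of the ratio (count / length) and that each component is scaled by its maximum over the whole result set.

```python
import math

def keyword_profiles(counts, lengths):
    """Return one normalized keyword profile per document.

    counts[d][k] is the number of occurrences of keyword k in document d;
    lengths[d] is the length of document d (assumed to be positive).
    """
    # Raw component: square root of keyword frequency relative to length
    # (assumed reading of the description in the text).
    raw = [[math.sqrt(c / n) for c in doc] for doc, n in zip(counts, lengths)]
    n_keywords = len(raw[0]) if raw else 0
    # Scale each keyword dimension so its largest value in the set is one.
    maxima = [max(doc[k] for doc in raw) for k in range(n_keywords)]
    return [[v / m if m > 0 else 0.0 for v, m in zip(doc, maxima)]
            for doc in raw]
```

Every component then lies between zero and one, which is the property the later distance calculations rely on.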
We were inclined to experiment with highly metaphorical
visualizations, rather than something simple and schematic, such as a
grid. Our emphasis has always been on presenting the user with an
overview of the structure of the result set, rather than concentrating
on finding an individual document. Typically, visualization is more
helpful in such broad integrative tasks [Wick94] than in narrow
searching. In all the models, the user can move the display around in
3-D (rotate and shift), and also select icons for individual
operations, e.g. to view the full text of the document. As we shall
see, keywords have associated colors. When displaying the full text,
the keywords are rendered in the appropriate color.
All the NIRVE prototypes have been implemented as one or more
graphical windows managed by OpenGL, and a control menu window managed
by Tcl/Tk. These processes communicate via X events [Nye93].
3.1 Spiral Model
3.1.1 Spiral Design
Spiral Metaphor
In our first model, we tried to preserve the sequential structure of
the ranked list returned by PRISE, and enhance it with additional
information. We arranged document icons along a spiral in 2-D, with
the top-ranked document in the middle, and the others spaced out along
the spiral proportional to their scores (Figure 1). Thus the
user was encouraged to start at the middle and work out towards the
periphery - a reasonable metaphor. We believed that the most
important information to convey about each document was its keyword
content: this is what the user asked for, and also, what PRISE made
available. The icon, therefore, was a simple square containing a bar
chart, showing the relative frequency of color-coded keywords.
Keyword Weighting
The legend associating colors with keywords was in a separate window.
Each keyword had a colored slider which controlled its weighting
factor. As the user increased the factor for a given keyword,
document icons containing that keyword were elevated above the plane
of the spiral. The elevation was proportional to the sum of the
products of a document's keyword frequency and the keyword weighting
factor. That is, we used the 3rd dimension for an alternate ranking
of the documents based on keyword weights rather than on PRISE scoring.
3.1.2 Spiral Problems
Spurious Clustering
In the very earliest informal evaluations, the question that always
came up was why an apparent cluster of document icons was grouped
together. The answer, of course, was that they just happened to be
placed there because of the layout algorithm, which used the PRISE
scores. And so, lesson number 1: people will view spatial arrangement
metaphorically whether you want them to or not.
Complexity of Icon Elevation
The idea behind elevating document icons was to allow the user to
express which keywords were really important and which were ancillary
and then use that information to select the relevant documents. When
all the keywords but one had a zero weighting factor, this worked
moderately well. But assigning significant weights to several
keywords resulted in a display that was hard to interpret. Many icons
would float above the spiral plane and it was difficult to make use of
their positions (for similar remarks, see [Chal95]). In order to make
the weighting more selective, we added an "AND" mode (as opposed to
the implicit "OR" mode): in order to be elevated, a document had to
have a non-zero frequency for all keywords which had non-zero
weights. For example, if the user emphasized the keywords "michael"
and "jordan", then a document had to have some occurrence of both
keywords to be elevated. The moral here is less clear - perhaps two
can be suggested. First, make selection mechanisms selective; it
doesn't help to highlight 50% of a collection. Second, 3-D gets
confusing very quickly; layout in 3-space must be done with great
restraint.
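The elevation rule and its two modes can be illustrated with a short sketch. The function name and the `mode` parameter are hypothetical; the rule itself (sum of frequency times weight, with "AND" mode requiring a non-zero frequency for every keyword whose weight is non-zero) follows the description above.

```python
def elevation(profile, weights, mode="OR"):
    """Height of a document icon above the spiral plane.

    profile[k] is the document's frequency for keyword k and
    weights[k] is the slider setting for that keyword.
    """
    # Only keywords with a non-zero weight take part in the calculation.
    active = [(f, w) for f, w in zip(profile, weights) if w > 0]
    # In "AND" mode, one missing weighted keyword keeps the icon on the plane.
    if mode == "AND" and any(f == 0 for f, _ in active):
        return 0.0
    return sum(f * w for f, w in active)
```

With weights on "michael" and "jordan" only, a document that mentions just one of the two is still raised in "OR" mode but stays on the plane in "AND" mode.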
3.2 3-D Axes Model
3.2.1 3-D Axes Design
3-D Axes Metaphor
Another early design, developed simultaneously with the spiral model,
was the use of 3-D axes. In the first iteration, the user could
dynamically select three keywords to be assigned to the X, Y, and Z
axes and the icon would be placed in the location corresponding to
those three components of its keyword profile (Figure 2). Each
document icon still had a bar chart for the full keyword profile. The
supposed advantage in this model was that each spatial dimension would
have a direct and meaningful semantic interpretation.
Keyword Aggregation
Since we were limited to three spatial dimensions, we extended the
model to allow the user to assign sets of keywords to each
dimension. The natural tendency was to bundle together keywords which
were close in meaning - the first hint of keyword aggregation.
3.2.2 3-D Axes Problems
Volumetric Occlusion
It quickly became apparent that naively scattering icons in 3-space
led to occlusion and confusion. Certain features did emerge, such as
outliers and documents with a zero value for one or more of the three axes.
But most of the icons were located in the general volume down near the
origin and were very hard to distinguish. Moreover, unlike the spiral
model, there was no obvious sequential order in which the documents
could be accessed. And so, reinforcing what we found with the spiral
model: unless the data are already very structured (either naturally
or as a result of some analysis and feature extraction), volumetric
style 3-D is unlikely to be easily understood.
No Natural Clustering
We had hoped that some natural clustering might emerge from our 3-D
scatterplot, but this did not seem to happen in the examples we tried.
In retrospect, this was not surprising. This was not, after all, a
scatterplot of the entire database, but rather a small subset, chosen
precisely because its members matched a query. Since the documents
were chosen for their similarity with respect to the query, the query
keywords themselves are not likely to provide a means of strongly
differentiating among them. By contrast, using all the keywords of
the database to form high-dimensional document vectors may well result
in a good clustering, as in [Hear96].
3.3 Nearest Neighbor Circle Model
3.3.1 Nearest Neighbor Circle Design
Document Sequence
The two main ideas behind the Nearest Neighbor Circle (NNC) were to
radically simplify the visual display to avoid excess occlusion, and
to perform some analysis internally rather than relying on
visualization per se to generate good clustering. First, we
defined a distance metric between documents, such as the Euclidean
distance between their keyword profiles (recall the keyword profile is
essentially an n-dimensional point, with all values between zero and
one). NNC then orders the documents by applying the nearest neighbor
algorithm: the successor of a document is the closest document which
has not yet been chosen.
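The ordering step can be sketched as below. This is an illustration rather than the NIRVE code: the function name is hypothetical, the starting document (index 0, e.g. the top-ranked one) is an assumption, and ties are broken by document index.

```python
import math

def nearest_neighbor_order(profiles, start=0):
    """Order documents so that each successor is the closest unchosen one."""
    if not profiles:
        return []
    remaining = set(range(len(profiles)))
    remaining.remove(start)
    order = [start]
    while remaining:
        last = profiles[order[-1]]
        # Euclidean distance between keyword profiles; ties go to lower index.
        nxt = min(remaining, key=lambda i: (math.dist(last, profiles[i]), i))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

This greedy procedure is quadratic in the number of documents, which is unproblematic for result sets of a few hundred entries.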
Circle Metaphor
The corresponding icons are then arranged in a circle, with the icons
sitting upright, somewhat like photographic slides in a circular tray
(Figure 3). The spacing is not uniform, however, but is
proportional to the distance between adjacent documents. Thus, large
visual gaps form implicit cluster boundaries - and indeed, we did find
that somewhat sensible groups were generated by this procedure.
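One way to turn the ordered sequence into positions on the circle is sketched here. The function name is hypothetical, and treating the gap between the last and first document as part of the circle is an assumption; the text only states that spacing is proportional to the distance between adjacent documents.

```python
import math

def circle_angles(ordered_profiles):
    """Angle (in radians) of each icon, spaced in proportion to distance."""
    n = len(ordered_profiles)
    # Gap i separates document i from its successor, wrapping at the end.
    gaps = [math.dist(ordered_profiles[i], ordered_profiles[(i + 1) % n])
            for i in range(n)]
    total = sum(gaps)
    if total == 0:
        # Identical profiles (or a single document): fall back to even spacing.
        return [2 * math.pi * i / n for i in range(n)]
    angles, travelled = [], 0.0
    for gap in gaps:
        angles.append(2 * math.pi * travelled / total)
        travelled += gap
    return angles
```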
Keyword Weighting
As in the spiral model, NNC supported the ability to elevate icons
based on dynamic weighting of keywords. Because the icons were
arranged upright in a circle (essentially a 1-D structure) rather than
laid out flat on a 2-D spiral, the result of elevating icons was much
clearer. Furthermore, the elevation would show distinct patterns:
nearby icons tended to all be raised or all be left on the base plane.
This was a result, of course, of putting documents with similar
profiles near each other.
3.3.2 Nearest Neighbor Circle Problems
Too Many Keywords
We noticed that when the query consisted of just a few (4-5) keywords,
the resulting structure of the display usually turned out to be
reasonably coherent. By contrast, although the algorithms could
handle any number of keywords, the visualization became harder to
understand if many keywords were used. Natural clusters of documents
tended to fragment into small, non-adjacent sub-clusters. This was
especially frustrating since sometimes clusters would fragment based
on keywords that were essentially synonyms - e.g. "tornado"
vs. "twister".
Implicit Clusters
Although the size of the gaps between documents provided a clue about
implicit cluster boundaries, it seemed as if marking these explicitly
would help. Also, there was no easy way to tell what a cluster was
about other than scanning through its icons and noticing which
keywords were dominant.
No Workspace
Finally, the only thing users could really do with the documents was
to look at the visualization, play with the keyword weighting, and
view full text. There was no way to designate or save a desirable
subset of documents.
3.4 Spoke and Wheel Model
3.4.1 Spoke and Wheel Design
The Spoke and Wheel prototype introduced a number of new features; the
three most important were keyword-concept mapping, explicit
clustering, and user marking and filtering.
Mapping Keywords to Concepts
Keyword-concept mapping allows the user to dynamically aggregate
keywords into a presumably smaller set of concepts. For instance, the
keywords "tornado", "twister", and "storm" might all be mapped to the
single concept "STORM". Although it is common for each keyword to be
mapped to exactly one concept, it is not required. This change
typically cuts down the number of dimensions significantly and
therefore simplifies the resulting visualization. Each document is
then characterized by a concept profile, rather than keyword
profile, and the bar chart of its icon reflects color-coded concepts,
not keywords. Using concepts to describe a document often makes more
sense semantically than the full set of keywords. Several keywords
may be included in a query to make sure that all relevant documents
are returned, but they may not denote any meaningful distinction in
the subject matter of interest. Control of this mapping was
incorporated into the keyword slider window via a keyword-concept
matrix (Figure 4). The user can click each cell to toggle its
value: a checkmark indicates that the corresponding keyword and
concept are associated.
Explicit Adjustable Clusters
As with NNC, a sequence of documents is calculated (based on their
concept profiles), but now clusters are made explicit. We defined a
cluster boundary as any gap in the document sequence larger than a
given threshold. This had the interesting consequence of enabling
dynamic control of cluster granularity. The user can request fewer,
bigger clusters, causing the prototype to increase the threshold for
gap size - i.e. only larger gaps will count as cluster boundaries.
Conversely, lowering the threshold induces smaller, more numerous
clusters.
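The effect of the gap threshold can be shown with a brief sketch. The function and parameter names are hypothetical; the rule (a new cluster starts wherever the gap between consecutive documents exceeds the threshold) is the one described above.

```python
def split_into_clusters(order, gaps, threshold):
    """Cut a document sequence into clusters at every gap above threshold.

    order is the document sequence; gaps[i] is the distance between
    order[i] and order[i + 1].
    """
    if not order:
        return []
    clusters = [[order[0]]]
    for doc, gap in zip(order[1:], gaps):
        if gap > threshold:
            clusters.append([doc])    # large gap: start a new cluster
        else:
            clusters[-1].append(doc)  # small gap: extend the current cluster
    return clusters
```

Raising `threshold` can only merge clusters and lowering it can only split them, which is what gives the user a single control over granularity.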
Cluster Icons and Spatial Arrangement
As before, document icons stand upright on a base plane, but now there
are also larger 3-D icons for explicit clusters. The concept profile
of a cluster is defined as the average of the profiles of its
documents. The cluster icons are arranged around a circle, facing
outward. The associated document icons are arranged outward along a
radius aligned with the cluster icon (Figure 5). The angular
distance between clusters is proportional to the logical distance
between them; likewise the radial distance between documents reflects
the distance metric separating their concept profiles.
Textual Equivalent
Because the clusters were now explicit structures, we were able to
generate a web page in which document titles were organized
correspondingly. The direct motivation was to allow the user to see
several titles at once; however, this also formed the basis of later
experiments comparing interfaces with the same logical structure, but
different visual presentation, namely text, 2-D, and 3-D.
Marking and Filtering
The third principal innovation was decorating every icon with a small
colored flag, indicating the user's judgment of its value: red for
bad, yellow for undecided, and green for good. The user could mark
entities at the cluster or document level. Furthermore, once entities are
marked, the user can filter the display dynamically on these attributes, e.g.
show good and undecided documents, but suppress bad ones. In particular,
the suppression of irrelevant clusters served to simplify the
entire display.
Concept Weighting
This prototype still retained the ability, inherited from the spiral
model, to assign a weight to each concept. However, this weighting no
longer caused document icons to be elevated; rather it was used as a
scaling factor for each dimension and this in turn affected the
distance metric and clustering among documents. If a concept is
assigned a low weight, it matters little whether two
documents differ with respect to that concept. Conversely, a
high-weighted concept magnifies the logical distance between such
document pairs. We found that this scheme was too subtle for most
users, who generally ignored the sliders.
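The role of the weights as per-dimension scaling factors can be made concrete with a small sketch. The function name is hypothetical, and a Euclidean metric is assumed here as the underlying distance.

```python
import math

def weighted_distance(p, q, weights):
    """Euclidean distance between two concept profiles after scaling
    each dimension by its concept weight."""
    return math.sqrt(sum((w * (a - b)) ** 2
                         for a, b, w in zip(p, q, weights)))
```

For two documents that differ only in the first concept, a weight of 0.1 on that concept shrinks their distance tenfold, while a weight of 2 doubles it; the clustering changes only indirectly, which may be why users found the sliders hard to interpret.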
3.4.2 Spoke and Wheel Problems
Distinguishing Documents within a Cluster
Organizing documents according to concepts tended to generate fairly
clean homogeneous clusters. But this success now made it more
difficult to distinguish among documents within a cluster based solely
on their (quite similar) concept profiles. The user could slide the
mouse along a row of document icons to cause their titles to appear
sequentially, but the spatial arrangement and bar charts of the
document icons really conveyed very little information.
Cluster Relationships
The arrangement of clusters around a circle was essentially
one-dimensional and this precluded a good visualization of the
relationships among clusters. Note that inter-cluster distance was
well-defined for all pairs of clusters, not just those adjacent along
the circle.
Matrix Interface
While the association matrix between keywords and concepts was
logically correct and complete, it tended to intimidate some users.
Moreover, to make things easier, we limited concept names to
capitalized versions of the keywords. Users therefore had to
distinguish between lowercase keywords occupying columns and uppercase
concepts occupying rows.
Disjunctive Aggregation Only
Keyword-concept mapping was motivated by the presence of near-synonyms
in queries. For these, disjunction was the appropriate model. But in
other cases, particularly proper names, there was good reason to want
conjunctive aggregation of keywords (e.g. the concept "EPA" =
"environmental" and "protection" and "agency"). This was
interestingly similar to our earlier refinement of keyword elevation:
we started with "OR" mode, and wound up needing an "AND" mode
as well.
Confusion over Input Modes
A long-standing problem concerned input mode. We wanted to allow the
user both to manipulate the 3-D display (move mode) and to select
icons within it (pick mode). Both modes required all three
mouse buttons, so we could not, for instance, use button
#1 for move and button #2 for pick. We tried to make these modes very
apparent, by setting the cursor to indicate which was in effect, and
enabling use of the spacebar as a toggle. But in early experiments
done by our collaborators at the Catholic University of America
[Sebr99], users uniformly reported confusion and frustration over this
issue. We conclude that mode-switching has a high cognitive load;
this presents a difficult design problem when the user is limited to a
single 2-D input device.
3.5 Concept Globe Model
3.5.1 Concept Globe Design
Simplify Cluster Definition
We had found that when adjusting the cluster threshold, users tended
to form clusters in which all of the documents had the same set of
concepts present, i.e. their cluster profiles were similar in that all
had the same set of non-zero components. More generally, it seemed to
us from experience that slight variations in cluster profile conveyed
virtually no useful information - what was really wanted was
information about the presence or absence of a concept within a
cluster or document. And so, we decided to drastically simplify the
clustering algorithm: a cluster is defined as a set of documents all
of which have some occurrence of the same subset of concepts. Thus,
if five concepts are being used to distinguish among the documents,
there are at most 32 clusters: one with all five concepts, five with
four concepts, ten with three concepts, and so on.
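The simplified clustering rule amounts to grouping documents by the set of concepts they contain. A minimal sketch follows; the function name is hypothetical, and concepts are identified by their index in the profile.

```python
from collections import defaultdict
from math import comb

def cluster_by_concepts(profiles):
    """Group document indices by the set of concepts with non-zero value."""
    clusters = defaultdict(list)
    for doc_id, profile in enumerate(profiles):
        present = frozenset(i for i, v in enumerate(profile) if v > 0)
        clusters[present].append(doc_id)
    return dict(clusters)

# Number of possible clusters per concept count, for five concepts:
# all five, four, three, two, one, none.
possible = [comb(5, k) for k in range(5, -1, -1)]
```

The counts in `possible` are the binomial coefficients 1, 5, 10, 10, 5, 1, which sum to 2^5 = 32, the upper bound quoted above.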
Globe Metaphor
How then to arrange clusters spatially? It seemed reasonable that the
number of concepts was an important organizing principle; a cluster
with four or five concepts seemed more promising than one with only
one or two. We decided to arrange clusters on the surface of a globe
(Figure 6). The cluster icon was now a box, whose thickness
represented the number of documents it contained, and whose face
held the familiar colored bar chart for concept profile. The
latitude of an icon is determined by the number of concepts it
represents. Conveniently, there were unique locations - the North and
South Pole - for the unique clusters with all and no concepts. Also,
there was more room in the middle latitudes for the more numerous
clusters with an intermediate number of concepts. It turned out in
later experiments that subjects readily understood this metaphor.
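The mapping from concept count to latitude might be sketched as follows. The text states only that latitude is determined by the number of concepts, with the poles reserved for the all-concept and no-concept clusters; the even (linear) spacing of the bands and the function name are assumptions.

```python
def latitude(num_present, num_concepts):
    """Latitude in degrees: +90 (North Pole) when every concept is present,
    -90 (South Pole) when none is, evenly spaced bands in between."""
    if num_concepts <= 0:
        raise ValueError("at least one concept is required")
    return 90.0 * (2.0 * num_present / num_concepts - 1.0)
```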
Showing Cluster Relationships
What about the relationship among clusters? The closest relationship
was when two clusters differed by the presence of a single concept.
Note that two such adjacent clusters would necessarily be in two
adjacent bands of latitude. Therefore, we developed a heuristic
procedure to assign longitudinal position so as to try to keep such
adjacent pairs close; i.e. longitude had a relational but not an
absolute meaning. Of course, we had to avoid overlap within a
latitude band. The mere location of the cluster icons was not a
strong enough visual cue, and so we connected logically adjacent
clusters by an arc, whose color corresponded to the conceptual
difference between them. E.g. if cluster A has the concepts "boat",
"sink", and "ocean", and cluster B has "boat", "sink", "ocean", and
"storm", then they will be connected by an arc color-coded for
"storm". These arcs were put in almost as an afterthought, but turned
out to be quite successful: the subjects in our evaluation experiments
were able to use them to navigate among the clusters.
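The adjacency test behind the arcs can be sketched briefly. The function name is hypothetical; each cluster is represented by the frozenset of concept names it contains, and an arc is produced whenever two clusters differ by exactly one concept.

```python
from itertools import combinations

def cluster_arcs(cluster_keys):
    """Return (cluster_a, cluster_b, concept) for every pair of clusters
    whose concept sets differ by exactly one concept."""
    arcs = []
    for a, b in combinations(cluster_keys, 2):
        difference = a ^ b  # concepts present in one cluster but not the other
        if len(difference) == 1:
            (concept,) = difference
            arcs.append((a, b, concept))  # the arc takes this concept's color
    return arcs
```

For the example in the text, the cluster with "boat", "sink", and "ocean" and the cluster that adds "storm" yield a single arc labeled "storm".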
First-Class Concepts
We decided that concepts should be first-class objects, not just
uppercase versions of the keywords. Users could now freely assign
names and colors to concepts, specify whether they were conjunctive or
disjunctive, and add and delete them. Along with this, we changed the
interface for control of the keyword-concept mapping. Instead of a
matrix, we designed an interactive legend in which the colored
concepts were shown in a row, each with a column of its keywords
beneath it. Users could change the mapping by dragging and dropping
keywords among the concept columns. The last column is always
reserved for the UNUSED concept, in which unmapped keywords (if any)
are stored.
We did not take the final step of allowing concepts to be any boolean
function of keywords, although that might be useful in some cases,
e.g. PRESIDENT = ((bill or william) and clinton). Our tentative
judgment is that the utility is outweighed by the complexity of
meaning (most users are not skillful at formulating logical
expressions) and of the interface necessary to specify such
combinations.
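The distinction between the two supported aggregation modes can be illustrated with a short sketch. The function and parameter names are hypothetical, and only the presence or absence of a concept is modeled here, not its profile value.

```python
def concept_present(keyword_counts, keywords, conjunctive=False):
    """Decide whether a concept occurs in a document.

    keyword_counts maps keyword -> number of occurrences in the document;
    keywords lists the keywords mapped to the concept.
    """
    hits = [keyword_counts.get(k, 0) > 0 for k in keywords]
    if not hits:
        return False  # a concept with no keywords never matches
    # Conjunctive: every keyword must occur. Disjunctive: any one suffices.
    return all(hits) if conjunctive else any(hits)
```

A disjunctive concept STORM built from "tornado" and "twister" matches a document containing either word, whereas a conjunctive concept EPA built from "environmental", "protection", and "agency" matches only when all three occur.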
Representing Documents
For the first time, we decided not to show document icons by default.
The concept profiles of documents within a cluster were now, by
definition, only variations in quantity among a fixed subset of
concepts, so showing the bar charts seemed almost useless. The
salient issues then became how to distinguish among a cluster's documents,
and how to design meaningful icons. Since we could not get access to the
full text or full term vector (as used in [Hear96]) of documents
quickly enough to support real-time operation, the only additional
information available was the documents' titles and relevance scores
(as returned by the search engine).
This raises an important issue: presumably, what the user really wants
to know is what a document is about, in the true semantic sense.
Concept profiles and term vectors are merely possible indicators of a
document's true meaning. We use them because they are susceptible to
automatic manipulation, not because they are perfect representations
of a document. A document title is normally more informative
than these, but it is trickier to use as an object of computation.
We arrange the icons for documents within a cluster on a 2-D
document field. A document icon is a simple rectangle
containing the title (not a bar chart), along with a little value flag
as discussed above. These icons are arranged in the document field
such that similar titles (i.e. those containing some matching words -
we developed a simple metric for title similarity) have nearby
horizontal positions. Vertical position is controlled by the score
assigned by the search engine. Thus similar titles appear in the
same column, with better scores towards the top of the column.
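The layout of a document field might be approximated as in the sketch below. The title similarity metric actually used is not specified here, so word overlap (Jaccard similarity) stands in for it; the greedy column assignment, the `min_similarity` cutoff, and all names are likewise assumptions made for illustration.

```python
def title_similarity(a, b):
    """Stand-in metric: fraction of shared words between two titles."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)

def document_field(docs, min_similarity=0.25):
    """Arrange (title, score) pairs into columns of similar titles,
    each column sorted with the best score at the top."""
    columns = []
    for title, score in docs:
        for column in columns:
            # Compare against the title that started the column.
            if title_similarity(title, column[0][0]) >= min_similarity:
                column.append((title, score))
                break
        else:
            columns.append([(title, score)])
    return [sorted(col, key=lambda d: d[1], reverse=True) for col in columns]
```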
Clusters can be opened or closed, independently. When a cluster is
opened, its 2-D document field is projected outward from the cluster
icon and the view automatically zooms in on the field. Thus users can
decide whether to display just an overview of the entire result set,
or show details selectively.
Input Devices and Modes
We finally solved the nagging input mode problem by the simple
expedient of using a second input device. A Spaceball [Spac99] (a
6-dimensional input device) is used to move and rotate the entire 3-D
display. The mouse is used solely for picking. This elementary
change caused a major improvement in user satisfaction.
3.5.2 Concept Globe Evaluation
We had been performing informal evaluations of the various prototypes
described above. Once the globe model became stable, however, we
prepared to conduct a more formal usability experiment. In
particular, we wanted to measure the effects of various visualization
modes, namely 3-D, 2-D and text. Moreover, we wanted to carefully
isolate the effects of these modes, and not confound them with the
effects of functional differences among prototypes. Therefore, we
developed a 2-D and text version of the globe model, preserving as much
of the functionality as feasible. These prototypes were the object of
a detailed usability experiment, as reported in [Sebr99], from which
the following is excerpted.
In the 2-D model, the globe was flattened into a map on which all
clusters could be displayed simultaneously. Since there is no third
dimension to convey cluster box thickness, this information is
conveyed as the width of a gray bar located at the bottom of the box.
Arcs indicating conceptual similarity are depicted as straight lines,
and the field of document titles is simply drawn over the display of
cluster icons.
In the text model, an HTML file is displayed in Netscape. Clusters
are represented basically as lists of document titles. Each cluster
is labeled with a textual colored concept profile. The order of
clusters is according to the number of concepts contained, analogous
to the north-to-south arrangement on the globe.
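The ordering used in the text model can be sketched as follows. This is only an illustration: the dictionary layout, the function names, and the choice to list clusters with the most concepts first are assumptions, and the actual prototype generated HTML rather than plain text.

```python
def order_clusters(clusters):
    """Sort clusters so that those with more concepts come first.

    Ties are broken alphabetically by concept names to keep the output stable.
    """
    return sorted(
        clusters,
        key=lambda c: (-len(c["concepts"]), sorted(c["concepts"])),
    )


def render_text(clusters):
    """Render each cluster as a concept-profile line followed by its titles."""
    lines = []
    for cluster in order_clusters(clusters):
        lines.append("[" + " + ".join(sorted(cluster["concepts"])) + "]")
        lines.extend("  " + title for title in cluster["titles"])
    return "\n".join(lines)


clusters = [
    {"concepts": {"solar"}, "titles": ["T1"]},
    {"concepts": {"solar", "wind"}, "titles": ["T2", "T3"]},
]
listing = render_text(clusters)
```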
How Many Concepts?
All the models worked better when the result set was organized with a
"reasonable" number of concepts, typically four or five. Once the
number of concepts reached seven or eight, the resulting display
became complex and difficult to interpret. How commonly do users
inquire about topics for which five or so concepts are insufficient
(clearly, the number of keywords may be greater)? We tentatively
suggest that, most of the time, five or six concepts will be enough to
characterize a topic, but this remains an open question.
Text vs. 2-D vs. 3-D
Initially, subjects performed better, as measured by task completion
and response times, using the text model than the 2-D model, and using
2-D than 3-D. This was especially true for selective tasks, such as
finding a document title. In such cases, the need to open a cluster
and scan through the document field was probably the main disadvantage.
In retrospect, it may have been a more valid comparison to implement
the text version as a list of cluster titles only, each of which would
have to be explicitly opened in order to see the contained document
titles.
Each subject went through six sessions, however, and by the last
session, the performance gap largely closed. 3-D showed the greatest
improvement, 2-D somewhat less, and performance using the text model
actually seemed to grow slightly worse. Moreover, the performance of
"expert" users (those with extensive computer experience) was
virtually equal for 3-D and text, and somewhat worse for the 2-D model.
We surmise, first, that novices and experts alike were already
familiar with text-like operations, such as scrolling, but that
novices had some difficulty adapting to the graphical interfaces;
second, that it took some practice before the spatial metaphors became
familiar enough to be used without undue delay.
Cluster Grouping
Overall, the subjects understood and generally liked the
organizational aspects of NIRVE including clustering of documents and
the relational arrangement of clusters. They used the grouping of
concepts into clusters to narrow their search for particular
documents. If a particular concept was not of interest, the subject
knew which set of documents to avoid. The grouping also contributed to
the selection of potential documents because it showed concept
combinations that might not have otherwise been considered.
The relational structure of the clusters was also used to keep track
of preferred clusters. The vertical placement of clusters according to
the number of concepts helped users adopt the strategy of linking up
or down depending on whether they needed to add or subtract
concepts. Many 2-D and 3-D participants would start from one pole of
the globe and navigate through various links. In a number of cases,
they began with the lowest level containing the minimal number of
"potential" concepts required to find a document, and then worked
their way up the globe or map until they found a matching document.
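The "linking up or down" strategy can be sketched in a few lines. The sketch assumes, for illustration only, that each cluster is identified by its set of concepts and that a link joins two clusters whose concept sets differ by exactly one concept; the function name links is hypothetical.

```python
def links(cluster, all_clusters):
    """Return (up, down) neighbors of a cluster.

    up:   clusters that add exactly one concept to this cluster
    down: clusters that drop exactly one concept from this cluster
    Each cluster is a frozenset of concept names.
    """
    up = [c for c in all_clusters
          if len(c) == len(cluster) + 1 and cluster < c]
    down = [c for c in all_clusters
            if len(c) == len(cluster) - 1 and c < cluster]
    return up, down


all_clusters = [
    frozenset({"a"}),
    frozenset({"b"}),
    frozenset({"a", "b"}),
    frozenset({"a", "c"}),
    frozenset({"a", "b", "c"}),
]
up, down = links(frozenset({"a", "b"}), all_clusters)
```

Here the level of a cluster is simply the size of its concept set, so moving "up" adds a concept and moving "down" removes one.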
Color Works
The most frequently used feature of the NIRVE interface was
color. Users in all three modes took advantage of color-concept
mapping. The text condition benefited the most from this dimension,
making this otherwise tedious list more efficient than
anticipated. Instead of skimming or quickly reading the list of
concepts at the beginning of each cluster, the subjects adopted the
strategy of scanning the associated colors. This strategy is
efficient because visual scanning of color, an automatic process,
takes less time and effort than scanning words.
Visualization vs. Text
Subjects using the 2-D and 3-D models had difficulty using the document
field to find titles. Legibility was a problem, and the 2-D layout was
complex compared to a familiar one-dimensional scrollable list of
titles. Perhaps this is a case of "over-spatialization". Especially
for a list of moderate size (e.g. 20-30 titles), there is probably
little to be gained by structuring the set and then visualizing
the result. In our experiment, we used result sets of only 100
titles; larger result sets (and hence larger clusters) might
profit from such visualization.
3-D vs. 2-D
Although the globe was more visually appealing, it presented problems
for many users. First was our familiar nemesis, occlusion: roughly
half the clusters were not visible at any time because they were on
the back side of the globe. Secondly, subjects tended to get
disoriented. They would find a needed cluster, go look at another
one, and then have trouble re-locating the original one. We put
alphabetic markers along the equator to help give a sense of absolute
location, but they did not seem to help. In contrast, the 2-D version
showed everything at once, and the only manipulation allowed was
panning and zooming - whereas in the 3-D model, the scene could be
shifted in any of three directions and rotated around the X or Y axis.
In short, the surface of the globe is a 2-D manifold, and, in this
application, there was no real advantage to curving it through
3-D space.
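The occlusion problem can be made concrete with a small calculation. The sketch below assumes a unit globe, a distant viewer (orthographic view) looking along the Z axis, and a latitude/longitude convention chosen for this example; none of these details come from the NIRVE implementation.

```python
import math


def globe_position(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a point on the unit globe."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (
        math.cos(lat) * math.sin(lon),  # x
        math.sin(lat),                  # y (toward the north pole)
        math.cos(lat) * math.cos(lon),  # z (toward the viewer at lon = 0)
    )


def is_visible(point, view_dir=(0.0, 0.0, 1.0)):
    """True when the point lies on the hemisphere facing the viewer.

    view_dir is a unit vector from the globe center toward a distant viewer.
    """
    return sum(p * v for p, v in zip(point, view_dir)) > 0.0


def visible_fraction(latlons, view_dir=(0.0, 0.0, 1.0)):
    """Fraction of cluster positions that are not hidden behind the globe."""
    points = [globe_position(lat, lon) for lat, lon in latlons]
    return sum(is_visible(p, view_dir) for p in points) / len(points)


# Eight clusters spread evenly around the equator.
equator = [(0, lon) for lon in range(10, 370, 45)]
fraction = visible_fraction(equator)
```

For positions spread evenly over the globe, about half fall on the far hemisphere whatever the viewing direction, so rotating the globe only changes which half is hidden. A viewer at a finite distance would see somewhat less than half.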
3.6 2.5-D Design
As a follow-up to the usability experiment just discussed, we
developed a hybrid model, attempting to combine the better features of
the 2-D and 3-D models. In this prototype, cluster icons are laid
out on a 2-D map, but the icons themselves have thickness, and the
arcs connecting them loop up into the 3rd dimension (Figure 7).
Also, when clusters are opened, the document field is projected
outward, as in the globe model. The document field itself remains a
prime candidate for re-design.
We suspect that for many applications, comprehensible use of 3-D will
require a similar strategy: relatively small 3-D entities embedded in a
2-D manifold, rather than full volumetric-style 3-D.
4. Conclusions
There are many design dimensions to be aware of when developing a
visualization application:
- Data: size of database; complexity/simplicity of data;
  inherently structured or heterogeneous;
  type of data: numeric, text, image, etc.
- Tasks: finding specific items; overview; updating
- Users: experts; one-time users
- and finally, presentation mode: visual; text; other.
Often, the effect of each of these is not distinguished when
evaluating a new approach. This is understandable: it can be quite
expensive to test even a few variations along several dimensions.
However, a credible claim that visualization improved an application
must rest on a fair comparison with otherwise equivalent alternatives.
We suspect that because good visualization depends on good structure,
what often happens is that developers are motivated to perform a
deeper analysis in order to generate that structure. This
re-structuring not only enables the visualization but also may suggest
more powerful operations and functionality than originally foreseen.
Thus, the improved visual version surpasses its non-visual ancestor at
least as much because of this process of re-analysis as because of the
visualization itself.
We hope to extend and refine NIRVE as a vehicle for exploring some of
the many unresolved design issues surrounding the visualization of
search results:
- What is the effective scale for visualization techniques? Intuitively
  one might suppose that sets of fewer than 30 documents or so don't need
  visualization, and sets of more than 2000 may defeat any browsing
  technique; but no systematic tests seem to have been done.
- What really constitutes a "fair" baseline for evaluating a
  visualization? Should there be a text version that is structurally
  analogous? Is it more reasonable that it simply have similar
  functionality?
- Should the information visualization community try to develop and
  exploit data structures that capture the semantics of documents at a
  "deeper" level, thereby enabling more meaningful clustering? Or are
  present techniques sufficient?
Acknowledgments
We thank the following people. Dr. Christine Piatko actively collaborated
in the early design and development of NIRVE. Michael Miller and
Joanna Vasilakis of the Catholic University of America provided
valuable help in evaluating various prototypes and offering design
suggestions.
References
[Alle93]
R. Allen, P. Obry, M. Littman,
"An Interface for Navigating Clustered Document Sets Returned by Queries",
Proceedings of SIGOIS, pp.203-208, Milpitas, CA, June 1993.
[Born00]
K. Borner, "Visible Threads: A Smart VR Interface to Digital Libraries",
Proceedings of IST/SPIE's 12th Annual International
Symposium: Electronic Imaging 2000: Visual Data Exploration
and Analysis
(SPIE 2000),
San Jose, CA, 23-28 January 2000.
[Card96]
S.K. Card, "Visualizing Retrieved Information: A Survey",
IEEE Computer Graphics and Applications,
v.16(2), pp.63-67, March 1996.
[Card97]
Stuart K. Card, Jock D. Mackinlay, "The Structure of the
Information Visualization Design Space",
Proceedings of IEEE Symposium on Information Visualization,
Phoenix, AZ, October 1997.
[Card99]
Stuart K. Card, Jock D. Mackinlay, Ben Shneiderman,
Readings in Information Visualization: Using Vision to Think,
Morgan Kaufmann Publishers Inc. San Francisco, CA, 1999.
[Chal95]
M. Chalmers, "Design perspectives in visualising complex information",
Proc IFIP 3rd Visual Databases Conference,
Lausanne Switzerland, March 1995.
[Cugi96]
J. Cugini, C. Piatko, S. Laskowski,
"Interactive 3D Visualization for Document Retrieval",
Proceedings of the Workshop on New Paradigms in Information
Visualization and Manipulation, ACM Conference on Information
and Knowledge Management (CIKM '96), November 1996.
[Cugi97]
J. Cugini, S. Laskowski, C. Piatko,
"Document Clustering in Concept Space:
The NIST Information Retrieval Visualization Engine (NIRVE)",
CODATA Euro-American Workshop on Visualization of Information
and Data, Paris, France, June 1997.
[Hear96]
M. A. Hearst and J. O. Pedersen,
"Reexamining the cluster hypothesis: Scatter/gather on retrieval
results", Proceedings of SIGIR '96, Zurich, Switzerland,
Aug 18-22 1996.
[Mann99]
T.M. Mann, "Visualization of WWW-Search Results",
Proceedings of the International Workshop on Web-Based Information
Visualization
(WebVis'99), pp. 264-268,
(in conjunction with DEXA'99, Tenth International Workshop on
Database and Expert Systems Applications, eds A.M. Tjoa, A. Cammelli,
R.R. Wagner)
Florence Italy, September 1-3 1999, IEEE Computer Society.
[Nowe96]
L.T. Nowell, R.K. France, D. Hix, L.S. Heath, and E.A. Fox,
"Visualizing Search Results: Some Alternatives to Query-Document
Similarity", Proceedings of SIGIR '96, Zurich, Switzerland,
Aug 18-22 1996.
[Nye93]
Adrian Nye,
Xlib Reference Manual, O'Reilly & Associates, 1993.
[Sebr99]
M. Sebrechts, J. Vasilakis, M. Miller, J. Cugini, S. Laskowski,
"Visualization of Search Results: A Comparative Evaluation of Text,
2D, and 3D Interfaces",
22nd International ACM SIGIR Conference on Research and
Development in Information Retrieval,
Berkeley, California, August 1999.
[Shne00]
Ben Shneiderman, David Feldman, Anne Rose, Xavier Ferré Grau,
"Visualizing Digital Library Search Results with Categorical and
Hierarchical Axes",
Proceedings of ACM Digital Libraries 2000,
San Antonio, Texas, June 2-7, 2000.
[Spac99]
See
http://www.spacetec.com/index.htm.
[Swan98]
R.C. Swan, J. Allan,
"Aspect Windows, 3-D Visualizations, and Indirect Comparisons of
Information Retrieval Systems",
Proc. 21st Annual SIGIR'98,
Melbourne Australia, August 1998.
[TREC]
Text Retrieval Conference:
http://trec.nist.gov
[Veer97]
Aravindan Veerasamy, Russell Heikes, "Effectiveness of a graphical
display of retrieval results", Proceedings of SIGIR '97,
pp. 85-92, Philadelphia, PA, July 27-31, 1997.
[Wick94]
C.D. Wickens, D.H. Merwin, E.L. Lin, "Implications of Graphics
Enhancements for the Visualization of Scientific Data:
Dimensional Integrality, Stereopsis, Motion, and Mesh",
Human Factors, 36(1) 44-61, 1994.
[Youn96]
Peter Young,
"Three Dimensional Information Visualisation", Computer Science
Technical Report, No. 12/96, Department of Computer Science,
University of Durham, UK, November 1996.
[Zami98]
Oren Zamir,
"Visualization of Search Results in Document Retrieval Systems:
General Examination",
Department of Computer Science and Engineering,
University of Washington, September 1998.
[Zami??]
Oren Zamir and Oren Etzioni,
"Grouper: A Dynamic Clustering Interface to Web Search Results",
Department of Computer Science and Engineering,
University of Washington.
[Zhou98]
Michelle Zhou and Steven Feiner,
"Visual Task Characterization for Automated Visual Discourse Synthesis",
Proc. CHI'98,
pp.392-399, LA, Calif, April 18-23 1998.
[ZPRISE]
NIST PRISE Search Engine:
http://www.itl.nist.gov/div894/894.02/works/papers/zp2/main.html