

Organizer/Session Chair: Lisa Gill, NIST

Exploratory Data Analysis Techniques in a Science and Engineering Environment

James J. Filliben
Statistical Engineering Div., NIST

Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis which employs a variety of graphical techniques to

  1. maximize insight into a data set;
  2. uncover underlying structure;
  3. detect outliers/anomalies;
  4. test underlying assumptions; and
  5. develop parsimonious models.

This EDA Tutorial/Tour will consist of 3 parallel components:

  1. interesting data sets;
  2. appropriate EDA techniques; and
  3. underlying EDA principles.

The data sets will be drawn primarily from the physical sciences and engineering; we additionally include Clinton/Bush/Perot data from the last election, data from the last Olympics, and a revisiting of some classical textbook data.

EDA methods to be discussed include standard, commonly used tools such as histograms, probability plots, box plots, residual plots, Youden plots, and multiplotting.

In addition, less commonly used (but powerful) techniques such as 4-plots, lag plots, PPCC plots, bi-histograms, block plots, GANOVA, and interaction effects matrices will be discussed.
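
As a rough illustration of what a few of these techniques compute, the sketch below derives the numbers behind three panels of a 4-plot (lag plot, histogram, normal probability plot) together with a normal PPCC value; the fourth panel, the run sequence plot, is simply the data in collection order. It is a minimal sketch under stated assumptions, not code from the talk: the function names are hypothetical, the sample data are simulated, the order statistic medians use Filliben's published approximation, and no plotting is done.

```python
import math
import random
from statistics import NormalDist


def lag_pairs(y, lag=1):
    """Points (y[i-lag], y[i]) that a lag plot would display."""
    return list(zip(y[:-lag], y[lag:]))


def histogram_counts(y, bins=10):
    """Counts in equal-width bins spanning [min(y), max(y)]."""
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in y:
        idx = min(int((v - lo) / width), bins - 1)  # max value goes in last bin
        counts[idx] += 1
    return counts


def uniform_order_medians(n):
    """Filliben's approximation to the uniform order statistic medians."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_probability_points(y):
    """(theoretical normal quantile, ordered observation) pairs."""
    std_normal = NormalDist()
    q = [std_normal.inv_cdf(p) for p in uniform_order_medians(len(y))]
    return list(zip(q, sorted(y)))


def normal_ppcc(y):
    """Correlation of the normal probability plot; near 1 suggests normality."""
    pts = normal_probability_points(y)
    return pearson([p[0] for p in pts], [p[1] for p in pts])


if __name__ == "__main__":
    rng = random.Random(0)
    y = [rng.gauss(10.0, 2.0) for _ in range(200)]  # simulated, not talk data
    print("first lag-1 pairs:", lag_pairs(y)[:3])
    print("histogram counts:", histogram_counts(y, bins=8))
    print("normal PPCC:", round(normal_ppcc(y), 4))
```

A full PPCC plot would repeat the correlation calculation over a family of distributions indexed by a shape parameter and plot the coefficient against that parameter; only the single normal case is shown here.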

The link between a data set and the appropriate EDA technique/methodology is, of course, driven by EDA principles. These principles are extremely important: they serve as the guidance system for choosing the appropriate technique(s) from an ever-growing collection of EDA methods. Such principles will be discussed along the way in conjunction with each data set.

[ James J. Filliben, Statistical Engineering Div., NIST, Gaithersburg, MD 20899 USA ]

All data sets used in this talk are available over the web for possible use in academic teaching -- the URL is

Date created: 6/5/2001
Last updated: 6/21/2001