
Education and Training: Exploratory Data Analysis

Exploratory Data Analysis
James Filliben
Statistical Engineering Division, NIST

Time and Location
Tuesday and Wednesday, March 11 and 12, 2003, 1:00pm - 5:00pm
Administration Building, Lecture Room A
Tuesday and Wednesday, March 18 and 19, 2003, 1:00pm - 5:00pm
Administration Building, Lecture Room D
Gaithersburg, MD
Abstract
This 4-session tutorial-level workshop is an introduction to EDA--Exploratory Data Analysis--an approach/philosophy for data analysis (very much akin to "data mining") which employs a variety of (primarily) graphical techniques to
  1. maximize insight into a data set;
  2. uncover underlying structure;
  3. extract important factors;
  4. detect outliers & anomalies;
  5. test underlying assumptions;
  6. develop parsimonious models; and
  7. determine optimal factor settings.
General problem areas consist of
  1. Univariate;
  2. Multi-factor;
  3. Regression; and
  4. Multivariate.
For each of these 4 problem areas, heavy emphasis will be placed not only on the selection of appropriate EDA techniques, but also on the interpretation of output from such techniques so as to form a full and complete set of valid scientific/engineering conclusions. In short, the analyses will be conclusions-driven, and EDA will be the primary tool to develop such conclusions.

This EDA Workshop will consist of 3 parallel components:

  1. EDA techniques;
  2. EDA principles;
  3. interesting data sets.
EDA techniques to be discussed include standard commonly-used tools such as
    histograms, probability plots, box plots, residual plots,
and less commonly-used (but powerful) tools such as
    4-plots, lag plots, PPCC plots, bi-histograms, block plots, GANOVA plots, interaction plots, transformation plots, spectral plots, Youden plots, a variety of "multi-plots", etc.
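As a rough illustration of what two of the tools listed above compute, the following minimal Python sketch builds the point pairs that a lag plot would display and a normal probability plot correlation coefficient (PPCC). It is not taken from the workshop notes. The function names, the simulated data, and the approximation used for the order-statistic medians are assumptions made for this example only. For data that behave like independent draws from a normal distribution, one would expect the lag-1 correlation to be near 0 and the normal PPCC to be near 1.

```python
import random
from statistics import NormalDist, mean


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def lag_pairs(y, lag=1):
    """Return the points (y[i - lag], y[i]) that a lag plot would draw."""
    return list(zip(y[:-lag], y[lag:]))


def normal_ppcc(y):
    """Correlate the sorted data with approximate normal order-statistic medians."""
    n = len(y)
    # Approximate medians of uniform order statistics (an assumed, commonly
    # used approximation; the end points are handled separately).
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    # Map the uniform medians to standard normal quantiles.
    quantiles = [NormalDist().inv_cdf(p) for p in m]
    return pearson(sorted(y), quantiles)


# Simulated data for illustration only (not one of the workshop data sets).
random.seed(0)
data = [random.gauss(10.0, 2.0) for _ in range(200)]

pairs = lag_pairs(data)
lag1_corr = pearson([p[0] for p in pairs], [p[1] for p in pairs])
ppcc = normal_ppcc(data)

print(f"lag-1 correlation: {lag1_corr:.3f}")
print(f"normal PPCC:       {ppcc:.3f}")
```

Plotting the pairs from lag_pairs as a scatter plot gives the lag plot itself; visible structure in that plot would suggest the data are not random. Plotting the sorted data against the normal quantiles gives the normal probability plot, and the PPCC summarizes how close that plot is to a straight line.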
EDA principles, of course, serve as the link between data set and EDA technique. These principles are the "guidance system" to choose the appropriate EDA technique from the collection of possible EDA techniques. Such principles will be discussed along the way in conjunction with each data set.

The data sets will be drawn primarily from NIST physical science and engineering applications, but we additionally include a few non-scientific data sets (e.g., 1992 election data), and a few "classical" textbook data sets. All such data sets used in this course are available on the web at

Comments on Course
CLASS SIZE IS LIMITED TO 40.

A set of notes will be provided for the class.

Further Information
For further information, contact or register online at

Date created: 3/7/2003
Last updated: 3/7/2003