Exploratory Data Analysis Techniques in a Science and Engineering Environment
James J. Filliben
Exploratory Data Analysis (EDA) is an approach/philosophy for data analysis which employs a variety of graphical techniques to
The data sets will be drawn primarily from the physical sciences and engineering; we additionally include Clinton/Bush/Perot data from the last election, data from the last Olympics, and a revisiting of some classical textbook data.
EDA methods to be discussed include standard, commonly used tools such as histograms, probability plots, box plots, residual plots, Youden plots, and multiplotting.
In addition, less commonly used (but powerful) techniques such as 4-plots, lag plots, PPCC plots, bi-histograms, block plots, GANOVA, and interaction effects matrices will be discussed.
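As a rough illustration of two of the techniques named above, the following Python sketch computes the points of a lag plot and a normal probability plot correlation coefficient (PPCC). It is a minimal sketch under stated assumptions, not the software used in the talk: the function names (lag_pairs, pearson, normal_ppcc) are hypothetical, the sample values are made up for illustration, and the order statistic medians use the commonly cited Filliben approximation.

```python
import math
from statistics import NormalDist


def lag_pairs(y, lag=1):
    """Return the (y[i-lag], y[i]) points that a lag plot would display."""
    return list(zip(y[:-lag], y[lag:]))


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def normal_ppcc(y):
    """Correlation between the sorted data and approximate normal
    order statistic medians (assumes at least 3 observations)."""
    n = len(y)
    # Approximate uniform order statistic medians.
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    # Map them through the standard normal percent point function.
    quantiles = [NormalDist().inv_cdf(p) for p in m]
    return pearson(sorted(y), quantiles)


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    sample = [2.1, 1.9, 2.4, 2.0, 2.2, 1.8, 2.3]
    print(lag_pairs(sample))    # points to scatter in a lag plot
    print(normal_ppcc(sample))  # values near 1 suggest approximate normality
```

A lag plot with no visible structure is consistent with randomness in the data, while a PPCC close to 1 indicates that the normal probability plot is nearly linear.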
The link between a data set and the appropriate EDA technique/methodology is, of course, driven by EDA principles. These principles are extremely important and serve as the guidance system for choosing the appropriate technique(s) from an ever-growing collection of EDA methods. Such principles will be discussed along the way in conjunction with each data set.
[ James J. Filliben, Statistical Engineering Div., NIST, Gaithersburg, MD 20899 USA; firstname.lastname@example.org ]
All data sets used in this talk are available over the web for possible use in academic teaching -- the URL is http://www.nist.gov/itl/div882/conf/jrc/eda_datasets.html
Date created: 6/5/2001