5.1 Statistical Reference Datasets

M. Carroll Croarkin

James J. Filliben

Lisa M. Gill

William F. Guthrie

Eric S. Lagergren

Hung-kung Liu

Mark G. Vangel

Nien Fan Zhang

Statistical Engineering Division, CAML

Janet E. Rogers

Bert W. Rust

Applied and Computational Mathematics Division, CAML

James E. Gentle

George Mason University

Eric Lagergren and Will Guthrie are leading a team effort, in collaboration with the Applied and Computational Mathematics Division, to develop a suite of statistical reference datasets under the umbrella program "Tools for Evaluating Mathematical and Statistical Software".

The purpose of this project is to improve the reliability of statistical software by providing reference datasets with validated computational results, enabling users and developers to evaluate such software objectively. The statistical software industry is mature and stable; as a result, use of such software is proliferating, and statistical algorithms are increasingly being incorporated into traditionally non-statistical packages such as spreadsheets. However, the integrity of some of this software is questionable, so both software developers and users need reference datasets with validated answers to assess the quality of such products. Areas of particular concern include linear and nonlinear regression and analysis of variance.
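As an illustration of how a reference dataset with validated answers might be used, the sketch below fits a straight line with a general-purpose numerical library and reports roughly how many significant digits of each coefficient agree with the known values. It is a minimal sketch under stated assumptions and not the project's own procedure: the tiny dataset is hypothetical (constructed so that the exact answers are known), and the helper name `log_relative_error` and the choice of NumPy as the "software under test" are ours.

```python
import math

import numpy as np


def log_relative_error(computed, certified):
    """Approximate number of significant digits on which computed agrees with certified."""
    if computed == certified:
        return math.inf  # exact agreement
    if certified == 0.0:
        # Relative error is undefined at zero, so fall back to absolute error.
        return -math.log10(abs(computed))
    return -math.log10(abs(computed - certified) / abs(certified))


# Hypothetical reference dataset: y = 2 + 3x exactly, so the
# "validated" coefficients are known by construction.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x
certified = {"intercept": 2.0, "slope": 3.0}

# Software under test (an assumption for this sketch): NumPy least squares.
design = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
computed = {"intercept": float(coef[0]), "slope": float(coef[1])}

# Digits of agreement for each coefficient.
digits = {
    name: log_relative_error(computed[name], certified[name])
    for name in certified
}

for name, d in digits.items():
    print(f"{name}: computed={computed[name]!r}, digits of agreement ~ {d:.1f}")
```

A well-conditioned problem like this one should agree to nearly full double precision; a reference dataset is more informative when it is numerically difficult, because that is where differences between packages become visible.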

The team is seeking guidance from experts in the field. James Gentle, a leader in statistical computing, recently gave a talk at NIST on "Testing Mathematical Software" and met with the team to discuss relevant project issues, particularly with regard to validating computational results.

Currently, the team is collecting and developing reference datasets and investigating strategies for validating computational results. A web page will be available for public use by the end of this fiscal year.

Date created: 7/20/2001
Last updated: 7/20/2001
