
Statistical Engineering Division Seminar

Analysis of Variance of Cross-Validation Estimators of the Generalization Error of Computer Algorithms

Dr. Marianthi Markatou
Department of Biostatistics
Columbia University
Monday, March 27, 2006, 1:30-2:30 PM
NIST North Room 618

Abstract

We bring together methods from two different disciplines, statistics and machine learning, to address the problem of estimating the variance of cross-validation estimators of the generalization error of computer algorithms. We treat this as a problem of approximating the moments of a statistic. The approximation illustrates the role of training and test sets in the performance of the algorithm. It provides a unifying approach to evaluating the various methods used to obtain training and test sets, and it takes into account the variability due to different training and test sets. For the simple problem of predicting the sample mean, and in the case of smooth loss functions, we show that the variance of the cross-validation estimator of the generalization error is a function of the moments of the random variables Y = Card(S_i ∩ S_j) and Y* = Card(S_i^c ∩ S_j^c), where S_i and S_j are two training sets and S_i^c and S_j^c are the corresponding test sets. We prove that the distributions of Y and Y* are hypergeometric, and we compare our estimator of variance with the estimator proposed by Nadeau and Bengio (2003). We then extend the results to multiple linear regression, kernel regression, and classification.
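As a rough illustration of the overlap counts Y and Y* defined in the abstract, the following Python sketch simulates pairs of training sets and compares the empirical mean of Y with the hypergeometric mean. It is not taken from the talk or the paper: the sampling scheme (two training sets of equal size n1 drawn independently and uniformly without replacement from n observations) and the names hypergeom_pmf and simulate_overlaps are assumptions made for this example.

```python
import random
from math import comb


def hypergeom_pmf(k, n, n1):
    """P(Y = k) for the overlap of two size-n1 subsets drawn uniformly,
    independently, and without replacement from n items (assumed scheme)."""
    if k < max(0, 2 * n1 - n) or k > n1:
        return 0.0
    return comb(n1, k) * comb(n - n1, n1 - k) / comb(n, n1)


def simulate_overlaps(n, n1, trials, seed=0):
    """Return a list of (y, y_star) pairs from repeated random splits."""
    rng = random.Random(seed)
    population = range(n)
    pairs = []
    for _ in range(trials):
        s_i = set(rng.sample(population, n1))  # training set i
        s_j = set(rng.sample(population, n1))  # training set j
        y = len(s_i & s_j)                     # points shared by both training sets
        y_star = n - len(s_i | s_j)            # points shared by both test sets
        pairs.append((y, y_star))
    return pairs


if __name__ == "__main__":
    n, n1, trials = 20, 12, 20000
    pairs = simulate_overlaps(n, n1, trials)
    empirical_mean = sum(y for y, _ in pairs) / trials
    exact_mean = sum(k * hypergeom_pmf(k, n, n1) for k in range(n1 + 1))
    print("empirical mean of Y:", empirical_mean)
    print("hypergeometric mean of Y:", exact_mean)
```

Under this assumed scheme the test sets are the complements of the training sets, so Y* = n - 2*n1 + Y for every pair, and the hypergeometric mean of Y is n1^2 / n (7.2 when n = 20 and n1 = 12). The simulated mean should come out close to that value.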

Remark: The talk is based on the paper: Markatou, M., Tian, H., Biswas, S., and Hripcsak, G. (2005). Analysis of Variance of Cross-Validation Estimators of the Generalization Error. Journal of Machine Learning Research, 6, 1127-1168.

NIST Contact: Charles Hagwood, (301) 975-2846.

Date created: 3/10/2006
Last updated: 3/10/2006