8.2.1.5. Empirical model fitting - distribution free (Kaplan-Meier) approach

The Kaplan-Meier procedure gives CDF estimates for complete or censored sample data without assuming a particular distribution model

The Kaplan-Meier (K-M) Product Limit procedure provides quick, simple estimates of the reliability function or the CDF based on failure data that may even be multicensored. No underlying model (such as Weibull or lognormal) is assumed; K-M estimation is an empirical (non-parametric) procedure. Exact times of failure are required, however.

Calculating Kaplan-Meier Estimates

The steps for calculating K-M estimates are the following:

1. Order the actual failure times from $t_1$ through $t_r$, where there are $r$ failures.
2. Corresponding to each $t_i$, associate the number $n_i$, where $n_i$ is the number of operating units just before the $i$-th failure occurred at time $t_i$.
3. Estimate $R(t_1)$ by $(n_1 - 1)/n_1$.
4. Estimate $R(t_i)$ by $R(t_{i-1}) \times (n_i - 1)/n_i$.
5. Estimate the CDF $F(t_i)$ by $1 - R(t_i)$.

Note that unfailed units taken off test (i.e., censored) only count up to the last actual failure time before they were removed. They are included in the $n_i$ counts up to and including that failure time, but not after.
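The five steps can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `km_by_steps` and the small data set (5 units, 2 failures, 3 removals) are hypothetical, and a unit removed at exactly a failure time is assumed to still be operating at that failure.

```python
from fractions import Fraction

def km_by_steps(failure_times, censor_times):
    """Return a list of (t_i, R(t_i), F(t_i)) following the five steps."""
    failures = sorted(failure_times)            # step 1: order the failure times
    total = len(failure_times) + len(censor_times)
    reliability = Fraction(1)
    estimates = []
    for i, t in enumerate(failures):
        # step 2: n_i = units operating just before the failure at t
        # (all units, minus earlier failures, minus units removed before t)
        removed_before = sum(1 for c in censor_times if c < t)
        n_i = total - i - removed_before
        # steps 3 and 4: multiply the running product by (n_i - 1) / n_i
        reliability *= Fraction(n_i - 1, n_i)
        # step 5: the CDF estimate is 1 - R
        estimates.append((t, reliability, 1 - reliability))
    return estimates

# Hypothetical data: failures at 4 and 9 hours, removals at 6, 12 and 12 hours.
result = km_by_steps([4, 9], [6, 12, 12])
for t, r, f in result:
    print(f"t = {t}: R = {r}, F = {f}")
```

Exact fractions are used so the products can be compared directly with hand calculations; here $n_1 = 5$ and $n_2 = 3$, giving $R(4) = 4/5$ and $R(9) = 4/5 \times 2/3 = 8/15$.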
Example of K-M estimate calculations

A simple example will illustrate the K-M procedure. Assume 20 units are on life test and 6 failures occur at the following times: 10, 32, 56, 98, 122, and 181 hours. There were 4 unfailed units removed from the test for other experiments at the following times: 50, 100, 125, and 150 hours. The remaining 10 unfailed units were removed from the test at 200 hours. The K-M estimates for this life test are:

$R(10) = 19/20$
$R(32) = 19/20 \times 18/19$
$R(56) = 19/20 \times 18/19 \times 16/17$
$R(98) = 19/20 \times 18/19 \times 16/17 \times 15/16$
$R(122) = 19/20 \times 18/19 \times 16/17 \times 15/16 \times 13/14$
$R(181) = 19/20 \times 18/19 \times 16/17 \times 15/16 \times 13/14 \times 10/11$

A General Expression for K-M Estimates

A general expression for the K-M estimates can be written. Assume we have $n$ units on test and order the observed times for these $n$ units from $t_1$ to $t_n$. Some of these are actual failure times and some are running times for units taken off test before they fail. Keep track of all the indices corresponding to actual failure times. Then the K-M estimates are given by:

$$ \hat{R}(t_i) = \prod_{\begin{array}{c} j \in S \\ t_j \le t_i \end{array}} \frac{n-j}{n-j+1} \, , $$

where the "hat" over $R$ indicates that it is an estimate and $S$ is the set of all subscripts $j$ such that $t_j$ is an actual failure time. The notation $j \in S$ and $t_j \le t_i$ means we only form products for indices $j$ that are in $S$ and also correspond to times of failure less than or equal to $t_i$.

Once values for $R(t_i)$ are calculated, the CDF estimates are $F(t_i) = 1 - R(t_i)$.
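As a check on the general expression, the following Python sketch applies it to the 20-unit example. The function name `km_general` and the `(time, failed)` pair representation are illustrative choices, not part of the handbook. When a failure and a removal share the same time, the sketch assumes the failure is ordered first.

```python
from fractions import Fraction

def km_general(observations):
    """observations: list of (time, failed) pairs; failed is True for a failure."""
    # Order all n observed times; at tied times, failures come before removals.
    ordered = sorted(observations, key=lambda obs: (obs[0], not obs[1]))
    n = len(ordered)
    reliability = Fraction(1)
    estimates = {}
    for j, (t, failed) in enumerate(ordered, start=1):
        if failed:  # j belongs to the set S of failure indices
            reliability *= Fraction(n - j, n - j + 1)
            estimates[t] = reliability
    return estimates

# Data from the example: 6 failures, 4 early removals, 10 removals at 200 hours.
failures = [10, 32, 56, 98, 122, 181]
removed = [50, 100, 125, 150] + [200] * 10
data = [(t, True) for t in failures] + [(t, False) for t in removed]

km = km_general(data)
for t, r in km.items():
    print(f"R({t}) = {r} = {float(r):.4f}, F({t}) = {float(1 - r):.4f}")
```

The failure indices are $S = \{1, 2, 4, 5, 7, 10\}$, so the factors are 19/20, 18/19, 16/17, 15/16, 13/14 and 10/11, matching the step-by-step products. Evaluated, the estimates are $R(10) = 0.9500$, $R(32) = 0.9000$, $R(56) \approx 0.8471$, $R(98) \approx 0.7941$, $R(122) \approx 0.7374$ and $R(181) \approx 0.6704$.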
A small modification of K-M estimates produces better results for probability plotting

Modified K-M Estimates

When no units survive beyond the last failure (for example, in a complete sample), the K-M estimate at the time of the last failure is $R(t_r) = 0$ and $F(t_r) = 1$. This estimate has a pessimistic bias and cannot be plotted (without modification) on a probability plot, since the CDF for standard reliability models asymptotically approaches 1 as time approaches infinity. Better estimates for graphical plotting can be obtained by modifying the K-M estimates so that they reduce to the median rank estimates for plotting Type I Censored life test data (described in the next section). Modified K-M estimates are given by the formula

$$ \hat{R}(t_i) = \frac{n + 0.7}{n + 0.4} \prod_{\begin{array}{c} j \in S \\ t_j \le t_i \end{array}} \frac{n-j+0.7}{n-j+1.7} \, . $$

Once values for $R(t_i)$ are calculated, the CDF estimates are $F(t_i) = 1 - R(t_i)$.
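The modified formula can be sketched the same way. In the Python below, the name `modified_km` and the five failure times are hypothetical. The comparison column assumes the common median rank approximation $(i - 0.3)/(n + 0.4)$. For a complete sample the product telescopes to $\hat{R}(t_i) = (n - i + 0.7)/(n + 0.4)$, so the two columns should agree.

```python
def modified_km(observations):
    """observations: list of (time, failed) pairs; failed is True for a failure."""
    # Order all n observed times; at tied times, failures come before removals.
    ordered = sorted(observations, key=lambda obs: (obs[0], not obs[1]))
    n = len(ordered)
    # Leading factor (n + 0.7) / (n + 0.4) from the modified formula
    reliability = (n + 0.7) / (n + 0.4)
    estimates = {}
    for j, (t, failed) in enumerate(ordered, start=1):
        if failed:  # j belongs to the set S of failure indices
            reliability *= (n - j + 0.7) / (n - j + 1.7)
            estimates[t] = reliability
    return estimates

# Hypothetical complete sample: every unit fails, so S = {1, ..., n}.
complete = [(t, True) for t in (12, 25, 31, 47, 60)]
n = len(complete)
for i, (t, r) in enumerate(modified_km(complete).items(), start=1):
    median_rank = (i - 0.3) / (n + 0.4)  # assumed median rank approximation
    print(f"t = {t}: modified F = {1 - r:.4f}, median rank = {median_rank:.4f}")
```

Unlike the unmodified estimate, the last value of $F$ here is $(n - 0.3)/(n + 0.4)$, which is strictly less than 1 and can therefore be placed on a probability plot.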