8.1.8.5. Standby model

The Standby Model evaluates the improvement in reliability obtained when backup replacements are switched on as failures occur.

A Standby Model refers to the case in which a key component (or assembly) has an identical backup component in an "off" state until needed. When the original component fails, a switch turns on the "standby" backup component and the system continues to operate.

In the simple case, assume the primary (non-standby) unit has CDF $F(t)$ and there are ($n - 1$) identical backup units that operate in sequence, each one switched on when its predecessor fails. The system fails when the last unit fails.

The total system lifetime is the sum of $n$ independent, identically distributed random lifetimes, each having CDF $F(t)$.

Identical backup Standby model leads to convolution formulas

In other words, $T_n = t_1 + t_2 + \cdots + t_n$, where each $t_i$ has CDF $F(t)$ and $T_n$ has a CDF we denote by $F_n(t)$. This can be evaluated using convolution formulas: $$ \begin{eqnarray} F_2(t) & = & \int_0^t F(u) f(t-u) du \\ & & \\ F_n(t) & = & \int_0^t F_{n-1}(u) f(t-u) du \end{eqnarray} $$ where $f(t)$ is the PDF $F'(t)$. In general, these convolution integrals have to be evaluated numerically. However, for the special case when $F(t)$ is the exponential model, the integrations can be carried out in closed form.
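For lifetime models without a closed form, the recursion for $F_n(t)$ can be approximated on a grid. The following is a minimal sketch, not a definitive implementation: it applies the trapezoidal rule to the convolution integral, and the function name `standby_cdf` and its parameters are hypothetical names chosen for illustration. It assumes the PDF is finite at $t = 0$. The exponential model is used only as a check, because its $F_2(t)$ is known in closed form.

```python
import math

def standby_cdf(cdf, pdf, n, t_max, steps=1000):
    """Approximate F_n on a uniform grid over [0, t_max] by repeated convolution.

    Each pass applies the trapezoidal rule to
        F_k(t) = integral from 0 to t of F_{k-1}(u) * f(t - u) du.
    Returns (grid, values), where values[i] approximates F_n(grid[i]).
    """
    h = t_max / steps
    grid = [i * h for i in range(steps + 1)]
    f_vals = [pdf(t) for t in grid]
    current = [cdf(t) for t in grid]  # F_1 evaluated on the grid
    for _ in range(n - 1):
        nxt = [0.0] * (steps + 1)  # F_k(0) = 0
        for i in range(1, steps + 1):
            # Trapezoid end points (u = 0 and u = t_i), then interior points.
            total = 0.5 * (current[0] * f_vals[i] + current[i] * f_vals[0])
            for j in range(1, i):
                total += current[j] * f_vals[i - j]
            nxt[i] = h * total
        current = nxt
    return grid, current

# Check against the exponential closed form for F_2 (rate chosen for illustration).
lam = 1 / 30
grid, F2 = standby_cdf(
    lambda t: 1 - math.exp(-lam * t),   # F(t)
    lambda t: lam * math.exp(-lam * t), # f(t)
    n=2, t_max=24.0, steps=1000,
)
closed_form = 1 - lam * 24 * math.exp(-lam * 24) - math.exp(-lam * 24)
print(F2[-1], closed_form)  # the two values should agree to several decimals
```

The cost grows with the square of `steps` for each added standby unit, so a vectorized or FFT-based convolution would be a reasonable substitute for fine grids.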

Exponential standby systems lead to a gamma lifetime model

Special Case: The Exponential (or Gamma) Standby Model

If $F(t)$ has the exponential CDF (i.e., $F(t) = 1 - e^{-\lambda t}$), then $$ \begin{eqnarray} F_2(t) & = & 1 - \lambda t e^{-\lambda t} - e^{-\lambda t} \\ & & \\ f_2(t) & = & \lambda^2 t e^{-\lambda t} \,\, , \mbox{ and} \\ & & \\ f_n(t) & = & \frac{\lambda^n t^{n-1} e^{-\lambda t}}{(n-1)!} \end{eqnarray} $$ and the PDF $f_n(t)$ is the well-known gamma distribution.
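As a check on the first of these formulas, the convolution integral can be worked out directly by substituting the exponential CDF $F(u) = 1 - e^{-\lambda u}$ and PDF $f(t-u) = \lambda e^{-\lambda (t-u)}$:

$$ \begin{eqnarray} F_2(t) & = & \int_0^t \left( 1 - e^{-\lambda u} \right) \lambda e^{-\lambda (t-u)} du \\ & & \\ & = & \lambda e^{-\lambda t} \int_0^t e^{\lambda u} du - \lambda e^{-\lambda t} \int_0^t du \\ & & \\ & = & 1 - e^{-\lambda t} - \lambda t e^{-\lambda t} \end{eqnarray} $$

Differentiating gives $f_2(t) = \lambda^2 t e^{-\lambda t}$. Repeating the same step is one way to arrive at the standard CDF of a gamma distribution with integer shape $n$, stated here without derivation: $$ F_n(t) = 1 - e^{-\lambda t} \sum_{k=0}^{n-1} \frac{(\lambda t)^k}{k!} $$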

Example: An unmanned space probe sent out to explore the solar system has an onboard computer with reliability characterized by the exponential distribution with a Mean Time To Failure (MTTF) of $1/\lambda$ = 30 months (a constant failure rate of 1/30 = 0.033 failures per month). The probability of surviving a two-year mission is only $\mbox{exp}(-24/30)$ = 0.45. If, however, a second computer is included in the probe in a standby mode, the reliability at 24 months (using the above formula for $F_2$, with reliability equal to $1 - F_2$) becomes 0.8 $\times$ 0.449 + 0.449 = 0.81. The failure rate at 24 months ($f_2/(1-F_2)$) reduces to [(24/900) $\times$ 0.449]/0.81 = 0.015 failures per month. At 12 months the failure rate is only 0.0095 failures per month, which is less than 1/3 of the failure rate calculated for the non-standby case.
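The arithmetic in this example can be reproduced with a few lines of code. This is a small sketch using only the closed-form exponential results for one standby unit; the function names are hypothetical and chosen for illustration.

```python
import math

def reliability_with_standby(t, mttf):
    """1 - F_2(t) = (1 + t/mttf) * exp(-t/mttf) for one exponential standby unit."""
    x = t / mttf
    return (1 + x) * math.exp(-x)

def failure_rate_with_standby(t, mttf):
    """f_2(t) / (1 - F_2(t)), with f_2(t) = lam^2 * t * exp(-lam * t)."""
    lam = 1 / mttf
    f2 = lam ** 2 * t * math.exp(-lam * t)
    return f2 / reliability_with_standby(t, mttf)

mttf = 30.0  # months, as in the space probe example

print(round(math.exp(-24 / mttf), 2))                  # 0.45   single computer, 24 months
print(round(reliability_with_standby(24, mttf), 2))    # 0.81   with standby, 24 months
print(round(failure_rate_with_standby(24, mttf), 3))   # 0.015  failures per month
print(round(failure_rate_with_standby(12, mttf), 4))   # 0.0095 failures per month
```

The ratio simplifies to $\lambda^2 t / (1 + \lambda t)$, which starts at zero and rises toward $\lambda$ as $t$ grows. That is why the benefit is largest early in life.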

Standby units (as the example shows) are an effective way of increasing reliability and reducing failure rates, especially during the early stages of product life. Their improvement effect is similar to, but greater than, that of parallel redundancy. The drawback, from a practical standpoint, is the expense of extra components that are not needed for functionality.
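The comparison with parallel redundancy can be illustrated numerically. The sketch below assumes two independent exponential units with the same 30-month MTTF as the example, a parallel system that works while either active unit works, and a standby system with perfect switching and no failures while switched off. The function names are hypothetical.

```python
import math

def parallel_reliability(t, lam):
    """Two independent exponential units running at once: 1 - (1 - exp(-lam*t))^2."""
    return 1 - (1 - math.exp(-lam * t)) ** 2

def standby_reliability(t, lam):
    """One exponential unit plus one identical standby unit: (1 + lam*t) * exp(-lam*t)."""
    return (1 + lam * t) * math.exp(-lam * t)

lam = 1 / 30  # failures per month
for months in (12, 24):
    print(months,
          round(parallel_reliability(months, lam), 3),
          round(standby_reliability(months, lam), 3))
# 12 0.891 0.938
# 24 0.697 0.809
```

Under these assumptions the standby arrangement is more reliable at both times, because the backup unit does not age until it is switched on.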