Reliability Engineering with Weibull Distributions

The Crow-AMSAA Reliability Growth Model

The Crow-AMSAA Reliability Growth Model provides a statistical framework through which to analyze the reliability of a system. This tool implements most of its statistical analyses according to the Crow-AMSAA model, and extends them to a second, slightly more flexible model.

A treatment of the Crow-AMSAA model can be found in the ReliaSoft Reliability Growth and Repairable System Analysis Reference. However, we present our own treatment below for notational consistency and in order to explain any differences between the procedures followed by ReliaSoft and ASML DRGA.

Motivation

Reliability – or, more accurately, the frequency of failure – is modeled as a continuous stochastic process.

Indeed, covariates or confounding variables might influence or indicate the likelihood of failure; the failure of one trial may depend on that of previous trials; and the process governing failure may vary in time. The stochastic process governing the reliability of a system may therefore be quite complicated and not cleanly follow a simple, standard, analytic form. Still, reliability engineers have found a simple, standard, analytic form that empirically performs quite well in the role of modeling reliability data as a stochastic process. Properties such as reliability growth rate, confidence intervals, goodness-of-fit, expected reliability, and more all have corresponding assignments under this model.

A convenient and somewhat precedented postulate for reliability data asserts something about the reliability growth pattern of a system:

Duane Postulate: A plot of cumulative failure number \(N\) vs cumulative operational time \(T\) ought to take the form of a line on log-log scaling.

This is equivalent to saying that there is some power law \(p(x) = a \cdot x^b\) which relates cumulative failure number and cumulative time: i.e., \(N = p(T)\).

Without examples, a first glance at the Duane Postulate may give rise to certain questions:
- Why should these two properties be related at all?
- Even so, why should their relationship follow this power law and not, say, a polynomial, or a more generic curve?
- Why should this relationship persist over time?
- Why should this relationship be smooth?
- Notice: when \(b > 1\), the derivative at the beginning of the process satisfies \(\frac{dN}{dT}\big|_{T = 0} = 0\); why should there be this "slow start"? (For \(b < 1\), the rate instead diverges at \(T = 0\).)

Most of these whys follow from two qualities with a lot of inertia: that
1. the initial developers of the model found that the model fit their data well, and conjectured its broader applicability; and that
2. the model is mathematically feasible and reasonably interpretable.

Its feasibility follows from the analyticity of its form: many can identify a line when presented with a log-log plot, and many can follow the algebraic manipulations used in the model's derivations. Its interpretability follows from the slightly vaguer intuition that a reliability engineer can develop for the roles of the two parameters \(a\) and \(b\) in the postulated power law. That the model only requires two parameters also makes curve fitting or parameter estimation very feasible by standard methods.

The Model for Failure Times Data

We put ourselves in an experiment where a single component or system of components runs with the potential for failure. When a component fails, there may be downtime during which the component is fixed or replaced before resuming operation. We record the time between failures, the cumulative operational time (which excludes any downtime), and a tally of how many failures have occurred. We introduce some notation for these quantities:
- Let \(T\) represent the cumulative (operational) time of the system.
- Let \(N(T)\) represent the cumulative number of failures that have occurred in the system after cumulative operational time \(T\).
- Let \(t(N)\) represent the (operational) time between the \(N\)-th failure and (\(N - 1\))-st failure. We might refer to this as the \(N\)-th gap time.

To have _failure times data_ means to have access to observations in the form \(N\) vs. \(T\), recording the time of occurrence of the first so-many failures in a system.

Suppose further that we model the expected cumulative number of failures \(N(T)\) at cumulative time \(T\) by some model \(\mathcal{N}(T; \theta)\) with parameters \(\theta\). Moreover, denote by \(\mathcal{N}'(T; \theta)\) the derivative of \(\mathcal{N}(T; \theta)\), which is a model of the expected value of \(N'(T)\).

In symbols, we begin with
\[\mathbb{E}[N(T)] = \mathcal{N}(T; \theta); \quad \mathbb{E}[N'(T)] = \mathcal{N}'(T; \theta).\]

Then we may deduce the formulae for a few quantities under that model.

We will derive these formulae in general and apply them to Crow-AMSAA. To that end, let us make more explicit how Crow-AMSAA models these quantities.

In Crow-AMSAA, the instantaneous failure rate (otherwise known as the _failure intensity_) \(\displaystyle N'(T) := \frac{dN}{dT}\) is usually modeled as a Weibull failure rate function of the form
\[N'(T) \sim \mathcal{N}_{\text{Crow-AMSAA}}'(T; \lambda, \beta) := \rho(T) := \lambda \beta T^{\beta - 1} \quad \text{;} \quad \lambda, \beta > 0.\]

(If this model looks a bit contrived, look to the simplicity of the formula in the next step.)

Taking the antiderivative, we recover the Crow-AMSAA model for cumulative number of failures:

\[N(T) \sim \mathcal{N}_{\text{Crow-AMSAA}}(T; \lambda, \beta) := \lambda T^\beta.\]

This form assumes that no failures have occurred at time \(T = 0\), i.e., that \(N(0) = 0\). Moreover, this formula for \(N(T)\) recovers what we expect to see according to the Duane Postulate on which Crow-AMSAA is based.
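As a running aid, here is a minimal Python sketch of these two Crow-AMSAA model functions (the function names and parameter values are ours, purely for illustration):

```python
import numpy as np

def ca_mean(T, lam, beta):
    """Expected cumulative number of failures by time T: lambda * T**beta."""
    return lam * np.asarray(T, dtype=float) ** beta

def ca_intensity(T, lam, beta):
    """Instantaneous failure intensity rho(T) = lambda * beta * T**(beta - 1)."""
    T = np.asarray(T, dtype=float)
    return lam * beta * T ** (beta - 1.0)

# Illustrative parameters: beta < 1 corresponds to a decreasing failure intensity.
print(ca_mean(100.0, 0.5, 0.7), ca_intensity(100.0, 0.5, 0.7))
```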

A Probability Distribution on Cumulative Number of Failures

All trials of the system can be understood as independent instances of a Poisson process: while we might know the expected failure time of a trial, the exact moment of failure is randomly distributed in a reasonable way about that mean. In particular, under our model, we take the event that "the cumulative number of failures equals some number \(N\) by cumulative time \(T\)" to have probability:

\[\mathbb{P}[N(T) = N \mid \theta] = \frac{(\mathbb{E}[N(T)])^N \cdot e ^{-\mathbb{E}[N(T)]}}{N!} = \frac{\mathcal{N}(T; \theta)^N \cdot e^{-\mathcal{N}(T; \theta)}}{N!}.\]

This formula associates a high probability for the cumulative number of failures \(N(T)\) to be close in value to its expectation \(\mathbb{E}[N(T)]\), but still leaves open the possibility for very different values.

Applied to Crow-AMSAA,
\[\mathbb{P}[N(T) = N \mid \lambda, \beta] = \frac{(\lambda T^\beta)^N \cdot e ^{-\lambda T^\beta}}{N!}.\]
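For a concrete check, this probability is just a Poisson pmf evaluated at the Crow-AMSAA mean; a small sketch using scipy (the parameter values are illustrative only):

```python
from scipy.stats import poisson

lam, beta = 0.5, 0.7     # illustrative parameters
T, N = 100.0, 12

mu = lam * T ** beta        # E[N(T)] under Crow-AMSAA
print(poisson.pmf(N, mu))   # P[N(T) = N | lambda, beta]
```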

Cumulative Failures in a Time Interval

Modeling \(N(T) \sim \mathcal{N}(T ; \theta)\), we expect to observe a certain number of failures \(n(T_1, T_2)\) over any given time interval \([T_1, T_2]\) equal to

\[\begin{aligned} \mathbb{E}[n(T_1, T_2)] &:= \mathbb{E}[N(T_2) - N(T_1)]\\ &= \int_{T_1}^{T_2} \mathbb{E}[N'(T)] dT\\ &= \int_{T_1}^{T_2} \mathcal{N}'(T; \theta) dT\\ &= \mathcal{N}(T; \theta)\big|^{T_2}_{T = T_1}\\ &= \mathcal{N}(T_2; \theta) - \mathcal{N}(T_1; \theta). \end{aligned} \]

In particular, under this model, we should expect the number of failures to have occurred by cumulative time \(T\) to be

\[\mathbb{E}[N(T)] = \mathcal{N}(T; \theta).\]

We can actually say more: we have a non-homogeneous Poisson process describing the probability distribution on the number of failures to occur in a given time interval as follows:

\[n(T_1, T_2) = N(T_2) - N(T_1) \sim \text{Poisson}\left(\mathbb{E}[N(T_2)] - \mathbb{E}[N(T_1)] \right) = \text{Poisson}\left(\mathcal{N}(T_2; \theta) -\mathcal{N}(T_1; \theta) \right).\]

Applied to Crow-AMSAA, on time interval \([T_1, T_2]\):

\[\begin{aligned} \mathbb{E}[n(T_1, T_2)] = \mathbb{E}[N(T_2) - N(T_1)] &= \lambda T_2^\beta - \lambda T_1^\beta,\\ \mathbb{E}[N(T)] &= \lambda T^\beta,\\ n(T_1, T_2) = N(T_2) - N(T_1) &\sim \text{Poisson}\left(\lambda T_2^\beta - \lambda T_1^\beta \right). \end{aligned} \]
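A short simulation sketch makes the non-homogeneous Poisson property concrete: counts over disjoint intervals are independent Poisson draws with the interval means above (names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
lam, beta = 0.5, 0.7
edges = np.array([0.0, 50.0, 100.0, 200.0])   # interval boundaries T_0 < T_1 < ...

# E[n(T_{i-1}, T_i)] = lambda * (T_i**beta - T_{i-1}**beta)
means = lam * edges[1:] ** beta - lam * edges[:-1] ** beta
counts = rng.poisson(means)                   # one simulated realization per interval
print(means, counts)
```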

Note on Notation

Alternative symbols often appear in place of, or in addition to, the above. We record them here for the purpose of comparison with other sources, but do not make use of them in this document:

- The parameter symbol \(\eta\) satisfies \(\displaystyle \lambda = \frac{1}{\eta^\beta}\), meaning the instantaneous failure rate is modeled as \(\displaystyle \frac{\beta}{\eta^ \beta} \cdot T^{\beta - 1}.\)
- The expectation \(\mathbb{E}[N(T)]\) of the cumulative number of failures by time \(T\) is sometimes denoted \(\theta(T)\), and at other times \(\Lambda(T)\), and is referred to as the cumulative intensity function.
- Sometimes \(t\) is used in place of \(T\).

Shifting Crow-AMSAA: A Modification to Crow-AMSAA

We may add an extra degree of freedom to the Crow-AMSAA model in the form of a temporal shift parameter \(\tau\). This modification permits fitting to data that may have undergone a temporal shift, i.e., whose dynamics began matching the Duane Postulate at a time other than the experiment's "time zero." In symbols, we write the model as follows:

\[\begin{aligned} N'(T) &\sim \mathcal{N}_{\text{Shifting}}'(T; \lambda, \beta, \tau) := \beta \lambda (T + \tau)^{\beta - 1},\\ N(T) &\sim \mathcal{N}_{\text{Shifting}}(T; \lambda, \beta, \tau) := \lambda (T + \tau)^\beta. \end{aligned} \]

We dub this modified version of the Crow-AMSAA model the _Shifting Crow-AMSAA_ model.

One may derive expressions similar to those shown above under the Shifting model. Note that \(\mathcal{N}_{\text{Shifting}}(0; \lambda, \beta, \tau) = \lambda \tau^\beta\) is generally nonzero; since we still assume \(N(0) = 0\), cumulative counts over \([0, T]\) are governed by the difference \(\mathcal{N}_{\text{Shifting}}(T; \lambda, \beta, \tau) - \mathcal{N}_{\text{Shifting}}(0; \lambda, \beta, \tau)\), which is the source of the \(\tau^\beta\) terms below:

\[\begin{aligned} \mathbb{P}[N(T) = N \mid \lambda, \beta, \tau] &= \frac{(\lambda (T + \tau)^\beta - \lambda \tau^\beta)^N \cdot e ^{-(\lambda (T + \tau)^\beta - \lambda \tau^\beta)}}{N!},\\ \mathbb{E}[n(T_1, T_2)] = \mathbb{E}[N(T_2) - N(T_1)] &= \lambda (T_2 + \tau)^\beta - \lambda (T_1 + \tau)^\beta,\\ \mathbb{E}[N(T)] &= \lambda (T + \tau)^\beta - \lambda \tau^\beta,\\ n(T_1, T_2) = N(T_2) - N(T_1) &\sim \text{Poisson}\left(\lambda (T_2 + \tau)^\beta - \lambda (T_1 + \tau)^\beta \right). \end{aligned} \]

Parameter Estimation

Establish a Likelihood Function

Maximum Likelihood Estimation (MLE) is a popular method of statistical inference for estimating the most likely parameters by which the assumed model would have generated the observed data. It is equivalent to Maximum a Posteriori Estimation under a uniform prior on the parameters. The essential ingredients for performing MLE are

1. An assumed probability distribution (i.e., _model_) \(\mathcal{M}(\cdot ; \cdot)\).
2. Observed data \(\mathbf{x}\).

We model the set of observations as a random sample from an unknown joint probability distribution expressible in terms of some parameters \(\theta\). A likelihood function can be constructed from the assumed probability distribution, characterizing how likely the observed data would be under particular values of the model's parameters. Under the further assumption that observations are drawn independently from an identical distribution, we may express the likelihood function \(\mathcal{L}(\theta ; \mathbf{y})\) for all observations as a product of univariate likelihood functions, all of the same form \(\mathcal{M}(y ; \theta)\). MLE optimizes the likelihood function to find its argument maximum \(\hat{\theta}\), called the _maximum likelihood estimate_ with respect to the assumed model and observed data.

In order to perform parameter estimation for failure times data, we should not simply take a product over all individual probabilities for \(N(T_i) = i\) given parameters \(\theta\), because the data actually tell us a bit more. Knowing that the \(i\)-th failure was observed at moment \(T_i\) tells us as well when the \(i\)-th failure _didn't_ occur: namely, in the interval \((T_{i - 1}, T_i)\). We formalize this in the expression for the probability density function of the \(i\)-th failure time given that the \((i - 1)\)-st failure occurred at time \(T_{i - 1}\):

\[\begin{aligned} f(T_i \mid T_{i - 1}, \theta) &= \mathbb{E}[N'(T_i) \mid \theta] \cdot \exp(- (\mathbb{E}[N(T_i) \mid \theta] - \mathbb{E}[N(T_{i - 1}) \mid \theta]))\\ &= \mathcal{N}'(T_i; \theta) \cdot \exp(- (\mathcal{N}(T_i; \theta) - \mathcal{N}(T_{i - 1} ; \theta))). \end{aligned} \]

With this, we construct the likelihood function for estimating the parameters \(\theta\) given some data \(\mathbf{x} = ((T_i, i))_{i = 1}^k\) in the form \(N\) vs \(T\) as follows:

\[\mathcal{L}(\theta; \mathbf{x}) = \prod_{i = 1}^k f(T_i \mid T_{i - 1}, \theta) = \prod_{i = 1}^k \mathcal{N}'(T_i; \theta) \cdot \exp(- (\mathcal{N}(T_i; \theta) - \mathcal{N}(T_{i - 1} ; \theta))) = e^{-(\mathcal{N}(T_k; \theta) - \mathcal{N}(0; \theta))} \prod_{i = 1}^k \mathcal{N}'(T_i; \theta),\]

where we have set \(T_0 = 0\): the exponential factors telescope, leaving only the endpoint terms. When the model assigns zero expected failures at the onset of the experiment (\(\mathcal{N}(0; \theta) = 0\), as in Crow-AMSAA), the prefactor reduces to \(e^{-\mathcal{N}(T_k; \theta)}\).

In most cases, numerical optimization is the only hope for approximating the maximum likelihood estimate. For convenience, one might perform MLE on the _logarithm_ of the likelihood function, \(\ln \mathcal{L}(\theta ; \mathbf{y})\). Since the logarithm is a monotonic function, the argument maximum of \(\ln \mathcal{L}(\cdot ; \mathbf{y})\) is the same as that of \(\mathcal{L}(\cdot ; \mathbf{y})\). The log-likelihood function involves smaller numbers (avoiding overflow) and typically has simpler algebraic forms (e.g., sums instead of products) that are easier to implement.

Taking the logarithm, we find the log-likelihood to be:

\[\ln \mathcal{L}(\theta ; \mathbf{x}) = -\left(\mathcal{N}(T_k; \theta) - \mathcal{N}(0; \theta)\right) + \sum_{i = 1}^k \ln \mathcal{N}'(T_i; \theta).\]

Crow-AMSAA

For Crow-AMSAA, one adopts the model \(N(T) \sim \mathcal{M}_{\text{Crow-AMSAA}}(T ; \lambda, \beta) := \text{Poisson} \left(\lambda T^\beta \right)\). The corresponding quantities are:

\[\begin{aligned} f_{\text{Crow-AMSAA}}(T_i \mid T_{i - 1}, \lambda, \beta) &= \beta \lambda T_i^{\beta - 1} \cdot \exp \left(- \lambda \left(T_i^\beta - T_{i - 1}^\beta \right)\right),\\ \mathcal{L}_{\text{Crow-AMSAA}}(\lambda, \beta; \mathbf{x}) &= (\beta \lambda)^k e^{-\lambda T_k^\beta} \prod_{i = 1}^k T_i^{\beta - 1},\\ \ln \mathcal{L}_{\text{Crow-AMSAA}}(\lambda, \beta ; \mathbf{x}) &= k \ln \lambda + k \ln \beta - \lambda T_k^\beta + (\beta - 1) \sum_{i = 1}^k \ln T_i. \end{aligned} \]
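Transcribed into code, this log-likelihood becomes a few lines; a sketch assuming `times` holds the ordered cumulative failure times \(T_1 < \cdots < T_k\) (the function name is ours):

```python
import numpy as np

def ca_loglik_failure_times(lam, beta, times):
    """k ln(lam) + k ln(beta) - lam * T_k**beta + (beta - 1) * sum_i ln(T_i)."""
    times = np.asarray(times, dtype=float)
    k = times.size
    return (k * np.log(lam) + k * np.log(beta)
            - lam * times[-1] ** beta
            + (beta - 1.0) * np.log(times).sum())
```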

Shifting Crow-AMSAA

And for the Shifting Crow-AMSAA model,

\[\begin{aligned} N(T) \sim \mathcal{M}_{\text{Shifting}}(T ; \lambda, \beta, \tau) &:= \text{Poisson} \left(\lambda (T + \tau)^\beta - \lambda \tau^\beta \right),\\ f_{\text{Shifting}}(T_i \mid T_{i - 1}, \lambda, \beta, \tau) &= \beta \lambda (T_i + \tau)^{\beta - 1} \cdot \exp \left(- \lambda \left((T_i + \tau)^\beta - (T_{i - 1} + \tau)^\beta \right) \right),\\ \mathcal{L}_{\text{Shifting}}(\lambda, \beta, \tau ; \mathbf{x}) &= (\beta \lambda)^k e^{-\lambda ((T_k + \tau)^\beta - \tau^\beta)} \prod_{i = 1}^k (T_i + \tau)^{\beta - 1},\\ \ln \mathcal{L}_{\text{Shifting}}(\lambda, \beta, \tau ; \mathbf{x}) &= k \ln \lambda + k \ln \beta - \lambda ((T_k + \tau)^\beta - \tau^\beta) + (\beta - 1) \sum_{i = 1}^k \ln (T_i + \tau). \end{aligned} \]

Solution

Now we have two approaches towards finding the maximum likelihood estimate for the parameters of our model. The first way is numerical: simply pass the formula for \(\ln \mathcal{L}\) to a standard solver and get estimates \(\hat{\theta}\). This takes the form of a search for the argument-maxima of the log-likelihood function, which, by monotonicity of the logarithm function, is equivalent to maximizing the likelihood function; so the resulting parameter fits \(\hat{\theta}\) will represent the most likely parameters to have produced the observed data under our model \(N(T) \sim \mathcal{N}(T; \theta)\). Alternatively, we attempt to estimate these values analytically via application of derivative tests.

We will see both approaches in our analysis of the two models for failure times data. We mention first one criticism of the Crow-AMSAA model (in both its standard and shifting versions): for either failure times data or grouped data, the fit curve will pass through the latest point in the \(N\) vs \(T\) cumulative plot, granting seemingly undue influence to the chronologically last observation. We will see in both contexts how this phenomenon follows from the Duane Postulate.

Crow-AMSAA

Let us begin with Crow-AMSAA. We may solve this maximization problem analytically first by deriving the partial first derivatives of the log-likelihood function:

\[\begin{aligned} \frac{\partial (\ln \mathcal{L})}{\partial \lambda} &= \frac{k}{\lambda} - T_k^{\beta},\\ \frac{\partial (\ln \mathcal{L})}{\partial \beta} &= \frac{k}{\beta} - \lambda T_k^\beta \ln T_k + \sum_{i = 1}^k \ln T_i. \end{aligned} \]

Setting both partial first derivatives to zero produces the system:

\[\begin{cases} \hat{\lambda} &= \frac{k}{T_k^{\hat{\beta}}},\\ \hat{\beta} &= \frac{k}{k \ln T_k - \sum_{i = 1}^k \ln T_i}. \end{cases}\]

Here is our first indication that fitting under Crow-AMSAA necessarily produces a fitted curve passing through the latest observation; namely, the point \((T, N) = (T_k, k)\).

It's important to note that this result for \(\hat{\beta}\) is a _biased_ estimator, meaning its expected value does not match the real \(\beta\). We may unbias it by computing a new estimate \(\bar{\beta} := \frac{k - 1}{k} \hat{\beta}\). The fitted curve \((\lambda, \beta) = (\hat{\lambda}, \bar{\beta})\) would _not_ squarely pass through the chronologically last observation.
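These closed-form estimates, together with the \(\frac{k - 1}{k}\) correction, translate directly into code; a minimal sketch (the function name is ours):

```python
import numpy as np

def ca_mle_failure_times(times):
    """Closed-form Crow-AMSAA MLE from ordered failure times T_1 < ... < T_k."""
    times = np.asarray(times, dtype=float)
    k = times.size
    beta_hat = k / (k * np.log(times[-1]) - np.log(times).sum())
    lam_hat = k / times[-1] ** beta_hat       # forces the fit through (T_k, k)
    beta_bar = (k - 1) / k * beta_hat         # bias-corrected estimate
    return lam_hat, beta_hat, beta_bar
```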

Shifting Crow-AMSAA

Under Shifting Crow-AMSAA, the partial first derivatives of the log-likelihood function are:

\[\begin{aligned} \frac{\partial (\ln \mathcal{L})}{\partial \lambda} &= \frac{k}{\lambda} - \left((T_k + \tau)^{\beta} - \tau^\beta \right),\\ \frac{\partial (\ln \mathcal{L})}{\partial \beta} &= \frac{k}{\beta} - \lambda \left((T_k + \tau)^\beta \ln (T_k + \tau) - \tau^\beta \ln \tau \right) + \sum_{i = 1}^k \ln (T_i + \tau),\\ \frac{\partial (\ln \mathcal{L})}{\partial \tau} &= - \beta \lambda \left((T_k + \tau)^{\beta - 1} - \tau^{\beta - 1} \right) + (\beta - 1) \sum_{i = 1}^k \frac{1}{T_i + \tau}. \end{aligned} \]

Setting the partial first derivative in \(\lambda\) to zero yields:
\[\hat{\lambda} = \frac{k}{(T_k + \hat{\tau})^{\hat{\beta}} - \hat{\tau}^{\hat{\beta}}}.\]

Again, the maximum likelihood estimates for \(\lambda\), \(\beta\), and \(\tau\) produce a fitted curve passing through the final observation: the relation above guarantees \(\hat{\lambda} \left((T_k + \hat{\tau})^{\hat{\beta}} - \hat{\tau}^{\hat{\beta}}\right) = k\), i.e., the fitted expected cumulative count at \(T_k\) equals \(k\).
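In practice one maximizes the Shifting log-likelihood numerically; a minimal sketch using scipy.optimize, in which the data, starting point, and bounds are illustrative and would need tuning for real use:

```python
import numpy as np
from scipy.optimize import minimize

def shifting_negloglik(params, times):
    """Negative Shifting Crow-AMSAA log-likelihood for failure times data."""
    lam, beta, tau = params
    times = np.asarray(times, dtype=float)
    k = times.size
    ll = (k * np.log(lam) + k * np.log(beta)
          - lam * ((times[-1] + tau) ** beta - tau ** beta)
          + (beta - 1.0) * np.log(times + tau).sum())
    return -ll

times = np.array([12.0, 31.0, 55.0, 90.0, 140.0, 210.0])   # toy data
res = minimize(shifting_negloglik, x0=[0.5, 1.0, 1.0], args=(times,),
               bounds=[(1e-9, None), (1e-9, None), (1e-9, None)])
lam_hat, beta_hat, tau_hat = res.x
```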

Error Propagation

We also set up an error propagation method to produce error bars around an estimated value for some quantity which we will denote by \(X(T)\) given some maximum likelihood estimate \(\hat{\theta}\). We follow the Fisher matrix method, which uses the covariance matrix between all parameters in the model to propagate error in our estimate for \(X(T)\).

In particular, the Fisher information matrix is defined from the Hessian of the log-likelihood, evaluated at the maximum likelihood estimate:
\[\mathcal{F} = \left[\frac{\partial^2(\ln \mathcal{L})}{\partial \theta_i \partial \theta_j} \bigg|_{\theta = \hat{\theta}} \right]_{i,j},\]

where \(\theta_i\) and \(\theta_j\) are the \(i\)-th and \(j\)-th parameters in the model. We derive the covariance matrix as

\[\mathbf{K} = \begin{bmatrix} \text{Var}(\hat{p}_1) & \text{Cov}(\hat{p}_1, \hat{p}_2) & \cdots & \text{Cov}(\hat{p}_1, \hat{p}_\ell) \\ \text{Cov}(\hat{p}_1, \hat{p}_2) & \text{Var}(\hat{p}_2) & \cdots & \text{Cov}(\hat{p}_2, \hat{p}_\ell) \\ \vdots & \vdots & \ddots & \vdots \\ \text{Cov}(\hat{p}_1, \hat{p}_\ell) & \text{Cov}(\hat{p}_2, \hat{p}_\ell) & \cdots & \text{Var}(\hat{p}_\ell) \end{bmatrix} = [-\mathcal{F}]^{-1}.\]

Finally, we compute upper and lower bounds based on confidence level \(1 - \alpha\) by first computing the corresponding \(z\)-score \(z_\alpha\) under a normal distribution (according to whether we are performing a 1-sided or 2-sided confidence interval). The underlying assumption is that the estimate \(\hat{X}(T)\) for \(X(T)\) is normally distributed about \(X(T)\). The upper and lower bounds are computed to be:

\[X_\pm(T) = \hat{X}(T) \cdot \exp \left(\pm z_\alpha \sqrt{\text{Var}(\hat{X}(T))} / \hat{X}(T) \right),\]

where we set

\[\begin{aligned} \hat{X}(T) &:= \mathbb{E}\left[X(T) \mid \hat{\theta}\right],\\ \text{Var}(\hat{X}(T)) &:= \sum_{i, j} \left(\frac{\partial \hat{X}}{\partial p_i} \cdot \frac{\partial \hat{X}}{\partial p_j} \bigg|_{\theta = \hat{\theta}} \right) K_{i,j} = \sum_{i} \left(\frac{\partial \hat{X}}{\partial p_i} \bigg|_{\theta = \hat{\theta}} \right)^2 \text{Var}(\hat{p}_i) + 2 \sum_{i < j} \left(\frac{\partial \hat{X}}{\partial p_i} \cdot \frac{\partial \hat{X}}{\partial p_j} \bigg|_{\theta = \hat{\theta}} \right) \text{Cov}(\hat{p}_i, \hat{p}_j). \end{aligned} \]
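This is the standard delta method; a generic sketch, assuming one already has the gradient of \(\hat{X}\) with respect to the parameters at \(\hat{\theta}\) and the covariance matrix \(\mathbf{K}\) (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def propagate(x_hat, grad, K, alpha=0.05, two_sided=True):
    """Delta-method variance and the log-normal-style bounds from the text."""
    grad = np.asarray(grad, dtype=float)
    var = grad @ np.asarray(K, dtype=float) @ grad   # sum_ij (dX/dp_i)(dX/dp_j) K_ij
    z = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    spread = np.exp(z * np.sqrt(var) / x_hat)
    return var, x_hat / spread, x_hat * spread       # (variance, lower, upper)
```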

Three quantities for which we would like to propagate error for prediction are the cumulative number of failures \(N(T)\), the failure intensity \(\rho(T)\), and its reciprocal: the Mean Time Between Failure (MTBF). Because the Crow-AMSAA and Shifting Crow-AMSAA models provide different numerical models for each of these quantities, we must perform each propagation separately under both models.

Crow-AMSAA

For Crow-AMSAA, the covariance matrix obtained from the Fisher information takes the form:

\[\mathbf{K}_{\text{Crow-AMSAA}} = \begin{bmatrix} \text{Var}(\hat{\lambda}) & \text{Cov}(\hat{\lambda}, \hat{\beta}) \\ \text{Cov}(\hat{\lambda}, \hat{\beta}) & \text{Var}(\hat{\beta}) \end{bmatrix} = \begin{bmatrix} \frac{k}{\hat{\lambda}^2} & T_k^{\hat{\beta}} \ln T_k \\ T_k^{\hat{\beta}} \ln T_k & \frac{k}{{\hat{\beta}}^2} + k \cdot (\ln T_k)^2 \end{bmatrix} ^{-1}.\]
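Numerically, obtaining \(\mathbf{K}_{\text{Crow-AMSAA}}\) is a 2-by-2 inversion; a sketch consistent with the closed-form estimates above (the function name is ours):

```python
import numpy as np

def ca_covariance(lam_hat, beta_hat, times):
    """Covariance of (lam_hat, beta_hat) for failure times data via [-F]^{-1}."""
    Tk, k = float(times[-1]), len(times)
    neg_fisher = np.array([
        [k / lam_hat ** 2,            Tk ** beta_hat * np.log(Tk)],
        [Tk ** beta_hat * np.log(Tk), k / beta_hat ** 2 + k * np.log(Tk) ** 2],
    ])
    return np.linalg.inv(neg_fisher)
```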

Propagating Error for Cumulative Number of Failures

When propagating error for the cumulative number of failures \(\hat{N}\) according to Crow-AMSAA, this looks like

\[\begin{aligned} \hat{N}(T) &= \hat{\lambda} \cdot T^{\hat{\beta}}, \\ \text{Var}(\hat{N}(T)) &= \left(\frac{\partial \hat{N}}{\partial \lambda} \bigg|_{\theta = \hat{\theta}} \right)^2 \text{Var}(\hat{\lambda}) + \left(\frac{\partial \hat{N}}{\partial \beta} \bigg|_{\theta = \hat{\theta}} \right)^2 \text{Var}(\hat{\beta}) + 2 \left(\frac{\partial \hat{N}}{\partial \lambda} \cdot \frac{\partial \hat{N}}{\partial \beta} \bigg|_{\theta = \hat{\theta}} \right) \text{Cov}(\hat{\lambda}, \hat{\beta}), \end{aligned} \]

where the partial first derivatives of the estimate \(\hat{N}(T)\) are:

\[\begin{aligned} \frac{\partial \hat{N}}{\partial \lambda} &= T^\beta = \frac{\hat{N}(T)}{\lambda},\\ \frac{\partial \hat{N}}{\partial \beta} &= \lambda T^\beta \ln T = \hat{N}(T) \ln T. \end{aligned} \]
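Putting the pieces together for \(N(T)\), an end-to-end sketch; the parameter estimates and covariance entries below are illustrative placeholders, not outputs of a real fit:

```python
import numpy as np
from scipy.stats import norm

lam_hat, beta_hat = 0.45, 0.72                # illustrative MLE values
K = np.array([[4.0e-3, -1.1e-3],              # illustrative covariance matrix
              [-1.1e-3, 3.6e-4]])

T = 250.0
N_hat = lam_hat * T ** beta_hat
grad = np.array([N_hat / lam_hat, N_hat * np.log(T)])   # (dN/dlam, dN/dbeta)
var = grad @ K @ grad

z = norm.ppf(0.975)                           # 2-sided bounds, alpha = 0.05
lower, upper = N_hat * np.exp(np.array([-1.0, 1.0]) * z * np.sqrt(var) / N_hat)
print(N_hat, lower, upper)
```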

Propagating Error for Failure Intensity

We may propagate error for other quantities as well, such as the failure intensity \(\rho(T)\) and its reciprocal, the Mean Time Between Failure (MTBF), under Crow-AMSAA. For \(\rho(T)\), we estimate

\[\hat{\rho}(T) = \hat{\beta} \hat{\lambda} T^{\hat{\beta} - 1},\]

and obtain the value of \(\text{Var}(\hat{\rho}(T))\) by computing the following partial first derivatives of the estimate \(\hat{\rho}(T)\):

\[\begin{aligned} \frac{\partial \hat{\rho}}{\partial \lambda} &= \beta T^{\beta - 1} = \frac{\hat{\rho}(T)}{\lambda},\\ \frac{\partial \hat{\rho}}{\partial \beta} &= \lambda T^{\beta - 1} ( 1 + \beta \ln T ) = \left( \frac{1}{\beta} + \ln T \right) \hat{\rho}(T). \end{aligned} \]

Propagating Error for Mean Time Between Failures

For \(\text{MTBF}(T)\) under Crow-AMSAA, we have

\[\widehat{\text{MTBF}}(T) = \frac{1}{\hat{\rho}(T)},\]

and obtain the value of \(\text{Var}(\widehat{\text{MTBF}}(T))\) by computing the following partial first derivatives of the estimate \(\widehat{\text{MTBF}}(T)\):

\[\begin{aligned} \frac{\partial \widehat{\text{MTBF}}}{\partial \lambda} &= - \frac{1}{\lambda} \widehat{\text{MTBF}}(T),\\ \frac{\partial \widehat{\text{MTBF}}}{\partial \beta} &= - \left(\frac{1}{\beta} + \ln T \right) \widehat{\text{MTBF}}(T). \end{aligned} \]

Shifting Crow-AMSAA

For the Shifting Crow-AMSAA model, the covariance matrix is computed from the Fisher information matrix as follows:

\[{\tiny \begin{aligned} \mathbf{K}_{\text{Shifting}} &= \begin{bmatrix} \text{Var}(\hat{\lambda}) & \text{Cov}(\hat{\lambda}, \hat{\beta}) & \text{Cov}(\hat{\lambda}, \hat{\tau}) \\ \text{Cov}(\hat{\lambda}, \hat{\beta}) & \text{Var}(\hat{\beta}) & \text{Cov}(\hat{\beta}, \hat{\tau})\\ \text{Cov}(\hat{\lambda}, \hat{\tau}) & \text{Cov}(\hat{\beta}, \hat{\tau}) & \text{Var}(\hat{\tau}) \end{bmatrix}\\ &= \begin{bmatrix} \frac{k}{\hat{\lambda}^2} & (T_k + \hat{\tau})^{\hat{\beta}} \ln (T_k + \hat{\tau}) - \hat{\tau}^{\hat{\beta}} \ln \hat{\tau} & \hat{\beta} \left( (T_k + \hat{\tau})^{\hat{\beta} - 1} - \hat{\tau}^{\hat{\beta} - 1}\right)\\ (T_k + \hat{\tau})^{\hat{\beta}} \ln (T_k + \hat{\tau}) - \hat{\tau}^{\hat{\beta}} \ln \hat{\tau} & \frac{k}{\hat{\beta}^2} + \hat{\lambda} \left((T_k + \hat{\tau})^{\hat{\beta}}(\ln (T_k + \hat{\tau}))^2 - \hat{\tau}^{\hat{\beta}} (\ln \hat{\tau})^2 \right) & \hat{\lambda} \left((T_k + \hat{\tau})^{\hat{\beta} - 1} (\hat{\beta} \ln(T_k + \hat{\tau}) + 1) - \hat{\tau}^{\hat{\beta} - 1} (\hat{\beta} \ln \hat{\tau} + 1) \right) - \sum_{i = 1}^k \frac{1}{T_i + \hat{\tau}} \\ \hat{\beta} \left( (T_k + \hat{\tau})^{\hat{\beta} - 1} - \hat{\tau}^{\hat{\beta} - 1}\right) & \hat{\lambda} \left((T_k + \hat{\tau})^{\hat{\beta} - 1} (\hat{\beta} \ln(T_k + \hat{\tau}) + 1) - \hat{\tau}^{\hat{\beta} - 1} (\hat{\beta} \ln \hat{\tau} + 1) \right) - \sum_{i = 1}^k \frac{1}{T_i + \hat{\tau}} & (\hat{\beta} - 1) \left(\hat{\beta} \hat{\lambda} \left((T_k + \hat{\tau})^{\hat{\beta} - 2} - \hat{\tau}^{\hat{\beta} - 2} \right) + \sum_{i = 1}^k \frac{1}{(T_i + \hat{\tau})^2} \right) \end{bmatrix} ^{-1}. \end{aligned} }\]

Propagating Error for Cumulative Number of Failures

For the cumulative number of failures, we estimate \(N(T)\) and its variance under Shifting Crow-AMSAA as follows:

\[\begin{aligned} \hat{N}(T) &= \hat{\lambda} \cdot (T + \hat{\tau})^{\hat{\beta}}, \\ \text{Var}(\hat{N}(T)) &= \left(\frac{\partial \hat{N}}{\partial \lambda} \bigg|_{\theta = \hat{\theta}} \right)^2 \text{Var}(\hat{\lambda}) + \left(\frac{\partial \hat{N}}{\partial \beta} \bigg|_{\theta = \hat{\theta}} \right)^2 \text{Var}(\hat{\beta}) + \left(\frac{\partial \hat{N}}{\partial \tau} \bigg|_{\theta = \hat{\theta}} \right)^2 \text{Var}(\hat{\tau})\\ &+ 2 \left(\frac{\partial \hat{N}}{\partial \lambda} \cdot \frac{\partial \hat{N}}{\partial \beta} \bigg|_{\theta = \hat{\theta}} \right) \text{Cov}(\hat{\lambda}, \hat{\beta}) + 2 \left(\frac{\partial \hat{N}}{\partial \lambda} \cdot \frac{\partial \hat{N}}{\partial \tau} \bigg|_{\theta = \hat{\theta}} \right) \text{Cov}(\hat{\lambda}, \hat{\tau}) + 2 \left(\frac{\partial \hat{N}}{\partial \beta} \cdot \frac{\partial \hat{N}}{\partial \tau} \bigg|_{\theta = \hat{\theta}} \right) \text{Cov}(\hat{\beta}, \hat{\tau}), \end{aligned} \]

where the partial first derivatives of the estimate \(\hat{N}(T)\) are:

\[\begin{aligned} \frac{\partial \hat{N}}{\partial \lambda} &= (T + \tau)^\beta = \frac{\hat{N}(T)}{\lambda},\\ \frac{\partial \hat{N}}{\partial \beta} &= \lambda (T + \tau)^\beta \ln (T + \tau) = \hat{N}(T) \ln(T + \tau),\\ \frac{\partial \hat{N}}{\partial \tau} &= \beta \lambda (T + \tau)^{\beta - 1} = \frac{\beta \hat{N}(T)}{T + \tau}. \end{aligned} \]

Propagating Error for Failure Intensity

For \(\rho(T)\) under Shifting Crow-AMSAA, we estimate

\[\hat{\rho}(T) = \hat{\beta} \hat{\lambda} (T + \hat{\tau})^{\hat{\beta} - 1},\]

and obtain the value of \(\text{Var}(\hat{\rho}(T))\) by computing the following partial first derivatives of the estimate \(\hat{\rho}(T)\):

\[\begin{aligned} \frac{\partial \hat{\rho}}{\partial \lambda} &= \beta (T + \tau)^{\beta - 1} = \frac{\hat{\rho}(T)}{\lambda},\\ \frac{\partial \hat{\rho}}{\partial \beta} &= \lambda (T + \tau)^{\beta - 1} ( 1 + \beta \ln (T + \tau) ) = \left( \frac{1}{\beta} + \ln (T + \tau) \right) \hat{\rho}(T) ,\\ \frac{\partial \hat{\rho}}{\partial \tau} &= (\beta - 1) \beta \lambda (T + \tau)^{\beta - 2} = \frac{\beta - 1}{T + \tau} \hat{\rho}(T). \end{aligned} \]

Propagating Error for Mean Time Between Failures

For \(\text{MTBF}(T)\) under Shifting Crow-AMSAA, we have

\[\widehat{\text{MTBF}}(T) = \frac{1}{\hat{\rho}(T)},\]

and obtain the value of \(\text{Var}(\widehat{\text{MTBF}}(T))\) by computing the following partial first derivatives of the estimate \(\widehat{\text{MTBF}}(T)\):

\[\begin{aligned} \frac{\partial \widehat{\text{MTBF}}}{\partial \lambda} &= - \frac{1}{\lambda} \widehat{\text{MTBF}}(T),\\ \frac{\partial \widehat{\text{MTBF}}}{\partial \beta} &= - \left(\frac{1}{\beta} + \ln(T + \tau) \right) \widehat{\text{MTBF}}(T) ,\\ \frac{\partial \widehat{\text{MTBF}}}{\partial \tau} &= -\frac{\beta - 1}{T + \tau} \widehat{\text{MTBF}}(T). \end{aligned} \]

The Model for Grouped Data

As opposed to failure times data, _grouped data_ is assumed to arrive as \(d\) observations of the form \(n\) vs. \(T\), where \(n\) refers to the number of failures observed between two values of cumulative time, \(T_{i - 1}\) and \(T_i\). The exact failure times within those intervals are unknown, so we view the grouped data format as a weaker representation than the full failure times format.

The interpretation of the \(T\) column is subtly different from that for failure times data: we do _not_ interpret row data \((n_i, T_i)\) to mean "the \(\left(\sum_{j \leq i} n_j\right)\)-th failure happened at time \(T_i\)," but rather that "\(\left(\sum_{j \leq i} n_j\right)\) many failures have occurred by time \(T_i\)."

Parameter Estimation

In order to fit a model such as \(\mathcal{N}_{\text{Crow-AMSAA}}(T) = \lambda \cdot T^\beta\) to grouped data, one could follow one of a few paths. One might hope to recover failure times data \(N\) vs \(T\) from grouped data, but this is (1) not uniquely determined, and (2) not necessary in order to perform curve fitting. The Crow-AMSAA model already outlines a Maximum Likelihood Estimation method for fitting \(\lambda\) and \(\beta\) to grouped data; and we extend this procedure to any model \(\mathcal{N}(T; \theta)\), including the Shifting Crow-AMSAA model.

Establish a Likelihood Function

Recall under a model:

\[\mathbb{E}[N(T)] = \mathcal{N}(T; \theta); \quad \mathbb{E}[N'(T)] = \mathcal{N}'(T; \theta),\]

we have the following expression for the probability distribution on the number of failures to occur in a given time interval:

\[n(T_1, T_2) = N(T_2) - N(T_1) \sim \text{Poisson}\left(\mathcal{N}(T_2; \theta) -\mathcal{N}(T_1; \theta) \right).\]

Let us refer to the given grouped data in the form \(n\) vs. \(T\) as \(\mathbf{x} = \left( (T_i, n_i) \right)_{i = 1}^d\). Under the above model for \(n\), we may establish a likelihood function describing how likely we find the parameters \(\theta\) given previous observations \(\mathbf{x}\): all observations in \(\mathbf{x}\) are assumed independent, so we may write the probability for parameters \(\theta\) producing these observations as the product of probabilities of the individual events according to the assumed Poisson distribution:

\[\begin{aligned} \mathcal{L}(\theta; \mathbf{x}) &:= \prod_{i = 1}^d \mathbb{P}_{\mathcal{N}} (n(T_{i - 1}, T_i) = n_i \mid \theta)\\ &= \prod_{i = 1}^d \frac{\left(\mathcal{N}(T_i; \theta) -\mathcal{N}(T_{i - 1}; \theta)\right)^{n_i} \exp\left(-(\mathcal{N}(T_i; \theta) -\mathcal{N}(T_{i - 1}; \theta))\right)}{n_i!}\\ &= e^{-(\mathcal{N}(T_d; \theta) - \mathcal{N}(0; \theta))} \prod_{i = 1}^d \frac{\left(\mathcal{N}(T_i; \theta) -\mathcal{N}(T_{i - 1}; \theta)\right)^{n_i}}{n_i!}, \end{aligned} \]

where we have set \(T_0 = 0\), so that the exponential factors telescope; for Crow-AMSAA, \(\mathcal{N}(0; \theta) = 0\) and the prefactor reduces to \(e^{-\mathcal{N}(T_d; \theta)}\). Taking logarithms of both sides, we obtain the log-likelihood function:

\[\ln\mathcal{L}(\theta; \mathbf{x}) = -\left(\mathcal{N}(T_d ; \theta) - \mathcal{N}(0; \theta)\right) + \sum_{i = 1}^d \left[n_i \ln (\mathcal{N}(T_i; \theta) -\mathcal{N}(T_{i - 1}; \theta) ) - \ln n_i! \right].\]

Fix the symbol \(k\) to mean the total number of failures observed across the grouped data. I.e.,

\[k = \sum_{i = 1}^d n_i.\]

Further, interpret \(T_d\) as the final cumulative observation time.

Crow-AMSAA

Applied to Crow-AMSAA, the log-likelihood function for grouped data can be written as:

\[\ln\mathcal{L}_{\text{Crow-AMSAA}}(\lambda, \beta; \mathbf{x}) = - \lambda T_d^\beta + k \ln \lambda + \sum_{i = 1}^d n_i \ln \left(T_i^\beta - T_{i - 1}^\beta \right) - \sum_{i = 1}^d \ln n_i!.\]
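A sketch of this grouped-data log-likelihood, with \(T_0 = 0\) prepended internally and \(\ln n_i!\) computed via the log-gamma function (the function name is ours):

```python
import numpy as np
from scipy.special import gammaln

def ca_loglik_grouped(lam, beta, T, n):
    """Grouped-data Crow-AMSAA log-likelihood; T are interval endpoints, n counts."""
    T = np.concatenate(([0.0], np.asarray(T, dtype=float)))
    n = np.asarray(n, dtype=float)
    increments = T[1:] ** beta - T[:-1] ** beta    # T_i**beta - T_{i-1}**beta
    return (-lam * T[-1] ** beta + n.sum() * np.log(lam)
            + (n * np.log(increments)).sum() - gammaln(n + 1).sum())
```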

Shifting Crow-AMSAA

For Shifting-Crow-AMSAA, the log-likelihood function for grouped data is:

\[\ln\mathcal{L}_{\text{Shifting}}(\lambda, \beta, \tau; \mathbf{x}) = - \lambda \left((T_d + \tau)^\beta - \tau^\beta \right) + k \ln \lambda + \sum_{i = 1}^d n_i \ln \left((T_i + \tau)^\beta - (T_{i - 1} + \tau)^\beta \right) - \sum_{i = 1}^d \ln n_i!.\]

Solution

We attempt to solve the maximization problem analytically, first deriving the partial first derivatives of the log-likelihood function and setting them to zero. For both models, we will nevertheless need to resort to numerical methods to complete the estimation.

Crow-AMSAA

For Crow-AMSAA, we begin by taking first partial derivatives:

\[\begin{aligned} \frac{\partial (\ln \mathcal{L})}{\partial \lambda} &= \frac{k}{\lambda} - T_d^{\beta},\\ \frac{\partial (\ln \mathcal{L})}{\partial \beta} &= \sum_{i = 1}^d n_i \left[\frac{T_i^\beta \ln T_i - T_{i - 1}^\beta \ln T_{i - 1}}{T_i^\beta - T_{i - 1}^\beta} \right] - \lambda T_d^\beta \ln T_d. \end{aligned} \]

Setting the partial first derivative in \(\lambda\) to zero produces the relation:

\[\hat{\lambda} = \frac{k}{T_d^{\hat{\beta}}}.\]

So, the results of MLE on grouped data – whether done fully numerically or analytically – ought to obey this relation. It is exactly this condition that guarantees that the resulting curve defined by the parameters \(\hat{\lambda}\) and \(\hat{\beta}\) passes through the final observation \((T, N) = (T_d, k)\). Algebraically, the non-homogeneous Poisson process by which we model the expected number of failures to occur in a given time interval – when concatenated across multiple contiguous time intervals – contributes a telescoping series of terms of which only the extreme terms survive: \(\mathcal{N}(T_d ; \theta) - \mathcal{N}(T_0 ; \theta) = \mathcal{N}(T_d ; \theta)\) (since the number of failures is assumed to be zero at the initial time \(T_0 = 0\)). Therefore, this phenomenon is a feature of Crow-AMSAA: it follows from the Duane Postulate.

We cannot analytically solve the relation implied by setting \(0 = \partial (\ln \mathcal{L}) / \partial \beta\), i.e.,

\[0 = \sum_{i = 1}^d n_i \left[\frac{T_i^{\hat{\beta}} \ln T_i - T_{i - 1}^{\hat{\beta}} \ln T_{i - 1}}{T_i^{\hat{\beta}} - T_{i - 1}^{\hat{\beta}}} - \ln T_d \right].\]

However, one still can numerically solve the resulting equation to approximate \(\hat{\beta}\), and then obtain \(\hat{\lambda}\) from the previous relation.
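One way to carry this out, sketched with scipy's Brent root-finder; the data and the bracketing interval are illustrative and would need care on real data:

```python
import numpy as np
from scipy.optimize import brentq

def beta_equation(beta, T, n):
    """Stationarity condition in beta for grouped Crow-AMSAA data (T_0 = 0)."""
    T = np.concatenate(([0.0], np.asarray(T, dtype=float)))
    with np.errstate(divide="ignore"):
        logT = np.where(T > 0, np.log(T), 0.0)   # convention: T**beta * ln T -> 0 at T = 0
    num = T[1:] ** beta * logT[1:] - T[:-1] ** beta * logT[:-1]
    den = T[1:] ** beta - T[:-1] ** beta
    return (np.asarray(n, dtype=float) * (num / den - logT[-1])).sum()

T = np.array([100.0, 200.0, 300.0, 400.0])   # toy interval endpoints
n = np.array([8, 5, 4, 3])                   # toy failure counts per interval
beta_hat = brentq(beta_equation, 0.05, 5.0, args=(T, n))
lam_hat = n.sum() / T[-1] ** beta_hat        # from the relation above
```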

Shifting Crow-AMSAA

The partial first derivatives in each direction of the log-likelihood function under the Shifting Crow-AMSAA model for grouped data are as follows:

\[\begin{aligned} \frac{\partial (\ln \mathcal{L})}{\partial \lambda} &= \frac{k}{\lambda} - \left((T_d + \tau)^\beta - \tau^\beta \right),\\ \frac{\partial (\ln \mathcal{L})}{\partial \beta} &= \sum_{i = 1}^d n_i \left[\frac{(T_i + \tau)^{\beta} \ln (T_i + \tau) - (T_{i - 1} + \tau)^{\beta} \ln (T_{i - 1} + \tau)}{(T_i + \tau)^{\beta} - (T_{i - 1} + \tau)^{\beta}}\right] - \lambda \left((T_d + \tau)^\beta \ln (T_d + \tau) - \tau^\beta \ln \tau \right),\\ \frac{\partial (\ln \mathcal{L})}{\partial \tau} &= \beta \sum_{i = 1}^d n_i \left[\frac{(T_i + \tau)^{\beta - 1} - (T_{i - 1} + \tau)^{\beta - 1}}{(T_i + \tau)^{\beta} - (T_{i - 1} + \tau)^{\beta}}\right] - \beta \lambda \left((T_d + \tau)^{\beta - 1} - \tau^{\beta - 1} \right). \end{aligned} \]

Setting the partial first derivative of the log-likelihood function in \(\lambda\) to zero yields:

\[\hat{\lambda} = \frac{k}{(T_d + \hat{\tau})^{\hat{\beta}} - \hat{\tau}^{\hat{\beta}}}.\]

Once again, numerical methods help us with the final two relations in particular, yielding estimates \(\hat{\beta}\) and \(\hat{\tau}\) from which we may compute the likeliest estimate \(\hat{\lambda}\). In particular, this equation guarantees that the resulting curve defined using \(\hat{\lambda}\), \(\hat{\beta}\), and \(\hat{\tau}\) will also pass through the point \((T, N) = (T_d, k)\).

Error Propagation

We will be interested in propagating error for the same quantities as in the case of failure times data. The formulas for the estimates and variances of the quantities of interest (such as \(N(T)\), \(\rho(T)\), and \(\text{MTBF}(T)\)) all carry over from the previous section on failure times data for the corresponding model; but the Fisher matrices will differ, since grouped data presents a different likelihood function.

Crow-AMSAA

For the purposes of error propagation of some quantity under the Crow-AMSAA model for grouped data, we find the covariance matrix from the Fisher information matrix:

\[\mathbf{K}_{\text{Crow-AMSAA}} = \begin{bmatrix} \text{Var}(\hat{\lambda}) & \text{Cov}(\hat{\lambda}, \hat{\beta}) \\ \text{Cov}(\hat{\lambda}, \hat{\beta}) & \text{Var}(\hat{\beta}) \end{bmatrix} = \begin{bmatrix} \frac{k}{\hat{\lambda}^2} & T_d^{\hat{\beta}} \ln T_d \\ T_d^{\hat{\beta}} \ln T_d & \hat{\lambda} T_d^{\hat{\beta}} (\ln T_d)^2 + \sum_i n_i \left[\frac{T_{i}^{\hat{\beta}} T_{i - 1}^{\hat{\beta}} (\ln T_i - \ln T_{i - 1})^2}{\left(T_i^{\hat{\beta}} - T_{i - 1}^{\hat{\beta}}\right)^2} \right] \end{bmatrix} ^{-1}.\]

Upper and lower bounds at confidence level \(1 - \alpha\) then follow exactly as before, via the corresponding \(z\)-score \(z_\alpha\) and the estimate-and-variance formulas of the failure times section.

Shifting Crow-AMSAA

For the purposes of error propagation under the Shifting Crow-AMSAA model for grouped data, we again find the covariance matrix by inverting the negated Fisher information matrix:

\[\mathbf{K}_{\text{Shifting}} = \begin{bmatrix} \text{Var}(\hat{\lambda}) & \text{Cov}(\hat{\lambda}, \hat{\beta}) & \text{Cov}(\hat{\lambda}, \hat{\tau}) \\ \text{Cov}(\hat{\lambda}, \hat{\beta}) & \text{Var}(\hat{\beta}) & \text{Cov}(\hat{\beta}, \hat{\tau})\\ \text{Cov}(\hat{\lambda}, \hat{\tau}) & \text{Cov}(\hat{\beta}, \hat{\tau}) & \text{Var}(\hat{\tau}) \end{bmatrix} = [-\mathcal{F}]^{-1},\]

where the entries of \(-\mathcal{F}\) are:

\[\tiny { \begin{aligned} (-\mathcal{F})_{\lambda \lambda} &= \frac{k}{\hat{\lambda}^2} ,\\ (-\mathcal{F})_{\beta \beta} &= \hat{\lambda}\left((T_d + \hat{\tau})^{\hat{\beta}} (\ln(T_d + \hat{\tau}))^2 - \hat{\tau}^{\hat{\beta}} (\ln \hat{\tau})^2 \right) + \sum_{i = 1}^d n_i \left[\frac{(T_i + \hat{\tau})^{\hat{\beta}} (T_{i - 1} + \hat{\tau})^{\hat{\beta}} (\ln (T_i + \hat{\tau}) - \ln (T_{i - 1} + \hat{\tau}))^2}{((T_i + \hat{\tau})^{\hat{\beta}} - (T_{i - 1} + \hat{\tau})^{\hat{\beta}})^2} \right],\\ (-\mathcal{F})_{\tau \tau} &= \hat{\beta} \hat{\lambda} (\hat{\beta} - 1)\left((T_d + \hat{\tau})^{\hat{\beta} - 2} - \hat{\tau}^{\hat{\beta} - 2} \right)\\ & - \hat{\beta} \sum_{i = 1}^d n_i \left[ \frac{(\hat{\beta} - 1) \left((T_i + \hat{\tau})^{\hat{\beta} - 2} - (T_{i - 1} + \hat{\tau})^{\hat{\beta} - 2} \right)((T_i + \hat{\tau})^{\hat{\beta}} - (T_{i - 1} + \hat{\tau})^{\hat{\beta}}) - \hat{\beta} \left((T_i + \hat{\tau})^{\hat{\beta} - 1} - (T_{i - 1} + \hat{\tau})^{\hat{\beta} - 1} \right)^2}{((T_i + \hat{\tau})^{\hat{\beta}} - (T_{i - 1} + \hat{\tau})^{\hat{\beta}})^2} \right],\\ (-\mathcal{F})_{\lambda \beta} &= (T_d + \hat{\tau})^{\hat{\beta}} \ln (T_d + \hat{\tau}) - \hat{\tau}^{\hat{\beta}} \ln \hat{\tau},\\ (-\mathcal{F})_{\lambda \tau} &= \hat{\beta} \left( (T_d + \hat{\tau})^{\hat{\beta} - 1} - \hat{\tau}^{\hat{\beta} - 1}\right),\\ (-\mathcal{F})_{\beta \tau} &= \hat{\lambda} \left((T_d + \hat{\tau})^{\hat{\beta} - 1} (\hat{\beta} \ln (T_d + \hat{\tau}) + 1) - \hat{\tau}^{\hat{\beta} - 1} (\hat{\beta} \ln \hat{\tau} + 1) \right)\\ & - \sum_{i = 1}^d n_i \left[ \frac{\left((T_i + \hat{\tau})^{\hat{\beta} - 1} - (T_{i - 1} + \hat{\tau})^{\hat{\beta} - 1} \right)\left((T_i + \hat{\tau})^{\hat{\beta}} - (T_{i - 1} + \hat{\tau})^{\hat{\beta}} \right) + \hat{\beta} (T_i - T_{i - 1}) (T_i + \hat{\tau})^{\hat{\beta} - 1} (T_{i - 1} + \hat{\tau})^{\hat{\beta} - 1} (\ln(T_i + \hat{\tau}) - \ln(T_{i - 1} + \hat{\tau}))}{((T_i + \hat{\tau})^{\hat{\beta}} - (T_{i - 1} + \hat{\tau})^{\hat{\beta}})^2} \right]. \end{aligned} }\]
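Transcribing such closed forms is error-prone; a pragmatic alternative is to differentiate the log-likelihood numerically at the MLE. A generic sketch using central finite differences (the step size is illustrative; `loglik` is any callable mapping a parameter vector to the log-likelihood, e.g., a wrapper around one of the sketches above):

```python
import numpy as np

def covariance_from_loglik(loglik, theta_hat, eps=1e-5):
    """Covariance matrix K = (-H)^{-1}, with H the numerical Hessian at theta_hat."""
    theta_hat = np.asarray(theta_hat, dtype=float)
    p = theta_hat.size
    H = np.empty((p, p))
    for i in range(p):
        for j in range(p):
            ei = np.zeros(p); ei[i] = eps
            ej = np.zeros(p); ej[j] = eps
            # Central-difference approximation of the mixed second partial.
            H[i, j] = (loglik(theta_hat + ei + ej) - loglik(theta_hat + ei - ej)
                       - loglik(theta_hat - ei + ej) + loglik(theta_hat - ei - ej)
                       ) / (4.0 * eps ** 2)
    return np.linalg.inv(-H)
```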
