4.7: Fault Tree Analysis
The previous section described various models of component failure forecasting. Fault trees are used to forecast the failure probability of a system of components based upon the likelihood of component failures. Fault trees provide a means of identifying weaknesses in systems and allowing managers to make changes to reduce the risk of failure.
Fault trees begin with a top node representing the condition of the entire system. Causes for system failure are then developed as a series of events and subcomponents that may cause failures. Multiple layers of subcomponents may be included. Figure 4.7.1 illustrates a simple fault tree with three layers and eight different elements.
As an example, suppose an infrastructure manager is tasked with ensuring electricity is available in a building at all times. The manager invests in a back-up generator in case the grid electricity fails. In this case, the building would be without electricity only if the grid fails and the back-up generator fails to start. On any given day, if there is a 1% chance (0.01 probability) that the power grid fails and a 5% chance (0.05 probability) that the back-up generator fails, and the two failures are independent, then there is a 0.01*0.05 = 0.0005 or 0.05% chance that electricity will fail. This is an example in which the redundancy of power sources reduces the chances of not having power for the building. A further step of analysis might be to examine reasons for failure of the back-up generator such as lack of fuel or damage to wiring. Also, a manager might set up a regular inspection regime for the back-up generator to attempt to reduce its 5% chance of failure.
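The arithmetic for this redundant arrangement can be sketched in a few lines of Python. The helper name `and_gate` is hypothetical, introduced here only for illustration, and the calculation assumes the grid and the generator fail independently.

```python
from math import prod


def and_gate(failure_probs):
    """Failure probability when ALL independent inputs must fail."""
    return prod(failure_probs)


p_grid = 0.01       # daily probability that the power grid fails (from the text)
p_generator = 0.05  # probability that the back-up generator fails (from the text)

# The building loses power only if both sources fail.
p_no_power = and_gate([p_grid, p_generator])
print(f"Probability of no electricity: {p_no_power:.4f}")
```

Lowering `p_generator`, for instance through regular inspection, lowers the system failure probability in direct proportion.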
This electric power provision is an example of redundancy with an ‘and’ node: both the power grid and the backup generator must fail for the system to fail. Unfortunately, system failures might also occur if any one of a number of events occurs. This would be an ‘or’ node. For example, a ladder would fail if either of the two vertical supports failed. If the probability that a vertical support fails is 1% (0.01 probability) in normal use, then the probability of success is 0.99. There are four cases that might arise from use:
- Both vertical supports work with probability 0.99*0.99 = 0.9801
- Left support breaks and right support does not fail, but ladder as a system fails with probability 0.01*0.99 = 0.0099
- Right support breaks and left support does not fail, but ladder as a system fails with probability 0.01*0.99 = 0.0099
- Both vertical supports fail and the ladder system fails with probability 0.01*0.01 = 0.0001
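With an ‘or’ node relationship (multiple potential causes of failure), the system works only if every subcomponent works. For independent subcomponents, the probability of failure can be calculated as:
\[\Pr\{\text{failure}\} = 1 - \prod_{i}\left\{1 - \Pr(\text{subcomponent } i \text{ failure})\right\}\]
where the product \(\Pi\) is taken over all the subcomponents included in the ‘or’ node level. For the ladder, this gives 1 - 0.99*0.99 = 0.0199, which equals the sum of the three failure cases listed above.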
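The ladder example can be checked numerically. The sketch below enumerates the four cases and compares the total failure probability with one minus the probability that no support fails. It assumes the two supports fail independently; `or_gate` and the variable names are hypothetical helpers, not part of the original text.

```python
from itertools import product
from math import prod, isclose


def or_gate(failure_probs):
    """Failure probability when ANY one independent input failing causes failure."""
    return 1.0 - prod(1.0 - p for p in failure_probs)


p_support = 0.01  # failure probability of one vertical support (from the text)

# Enumerate the four (left_fails, right_fails) cases and their probabilities.
cases = {}
for left_fails, right_fails in product([False, True], repeat=2):
    p_left = p_support if left_fails else 1.0 - p_support
    p_right = p_support if right_fails else 1.0 - p_support
    cases[(left_fails, right_fails)] = p_left * p_right

# The ladder fails in every case where at least one support fails.
p_ladder_fails = sum(p for (left, right), p in cases.items() if left or right)

# Both routes should agree.
print(isclose(p_ladder_fails, or_gate([p_support, p_support])))
```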
A common convention in drawing fault tree networks is to represent ‘or’ gate relationships with a curve at the bottom (as in the top gate in Figure 4.6.6) and an ‘and’ gate relationship with a straight bottom (as in the bottom gate for events 7 and 8 in Figure 4.6.6). The failure probability of the system in Figure 4.6.6 would then be traced through the three ‘or’ gates and the two ‘and’ gates:
\[\begin{aligned}
\Pr\{\text{System Fig. 4.x failure}\} ={} & \left[1-(1-\Pr\{\text{event 1 failure}\})*(1-\Pr\{\text{event 2 failure}\})\right] \\
& + \Pr\{\text{event 3 failure}\}*\Pr\{\text{event 4 failure}\}*\Pr\{\text{event 5 failure}\} \\
& + \left[1-(1-\Pr\{\text{event 6 failure}\})*(1-\Pr\{\text{event 7 failure}\}*\Pr\{\text{event 8 failure}\})\right]
\end{aligned}\]
Adding the three branch probabilities at the top ‘or’ gate is a close approximation when those probabilities are small. For independent branches, the exact value is one minus the product of the three branch success probabilities.
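Tracing probabilities through the gates can be sketched as follows. This is an illustration under stated assumptions, not the author's implementation. The gate structure is inferred from the equation (events 1 and 2 under an ‘or’ gate, events 3 to 5 under an ‘and’ gate, and event 6 under an ‘or’ gate with the ‘and’ of events 7 and 8). The event probabilities of 0.01 are hypothetical, all events are assumed independent, and the helper names are invented here.

```python
from math import prod


def and_gate(failure_probs):
    """All independent inputs must fail for the gate output to fail."""
    return prod(failure_probs)


def or_gate(failure_probs):
    """Any one independent input failing makes the gate output fail."""
    return 1.0 - prod(1.0 - p for p in failure_probs)


# Hypothetical failure probabilities for events 1 to 8 (not from the text).
p = {event: 0.01 for event in range(1, 9)}

# Second-layer branches, following the structure implied by the equation.
branch_a = or_gate([p[1], p[2]])
branch_b = and_gate([p[3], p[4], p[5]])
branch_c = or_gate([p[6], and_gate([p[7], p[8]])])

# Top 'or' gate: summed form versus the exact independent-events form.
p_system_summed = branch_a + branch_b + branch_c
p_system_exact = or_gate([branch_a, branch_b, branch_c])

print(f"summed: {p_system_summed:.6f}")
print(f"exact:  {p_system_exact:.6f}")
```

With small event probabilities the two results differ only slightly, and the summed form is never smaller than the exact one.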
More complicated relationship gates can be defined (such as exclusive ‘or’ gates), but they are not widely used for any infrastructure failure models. These more complicated relationships can find use in fault tree analysis of circuits or computer operating systems.
A complication in the calculation of failure probabilities shown above will occur when failures are correlated in some fashion. For example, flooding might cause both the power grid and the backup generator to fail in the electric power example above. In this case, the straightforward probability of the backup generator failing, Pr{failure backup generator}, would be replaced with the probability of failure of the backup generator conditional on the power grid failure: Pr{failure backup generator │ failure power grid}. Of course, a prudent infrastructure manager might ensure that the backup generator is protected from floods, so this chance of system failure due to flooding would disappear.
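The effect of correlation on the electric power example can be illustrated with a short calculation. The conditional probability of 0.20 used below is a purely hypothetical value chosen for illustration. The text gives only the 0.01 and 0.05 figures.

```python
p_grid = 0.01            # Pr{failure power grid} (from the text)
p_gen_marginal = 0.05    # Pr{failure backup generator} (from the text)
p_gen_given_grid = 0.20  # Pr{failure backup generator | failure power grid}, hypothetical

# Independent failures: multiply the two unconditional probabilities.
p_system_independent = p_grid * p_gen_marginal

# Correlated failures: multiply by the conditional probability instead.
p_system_correlated = p_grid * p_gen_given_grid

print(f"independent: {p_system_independent:.4f}")
print(f"correlated:  {p_system_correlated:.4f}")
```

Under this assumed conditional probability, ignoring the correlation would understate the system failure probability by a factor of four.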
Another difficulty for fault tree analysis for infrastructure is that some systems may not fail completely but may degrade in performance. For example, a roof may start to leak rather than fail completely. For such cases, separate degradation states can be defined and fault trees developed for each level of degradation.
Fault tree analysis is fairly labor-intensive and it is difficult to be comprehensive about potential failure modes. However, the conceptual process of identifying failure causes and events can be helpful in managing the reliability of infrastructure systems.