Failure analysis is a general term for the testing employed to determine why something has failed. It often starts with a simple discussion and description of the failure and of the process conditions when the failure occurred. Testing conditions are prescribed once all of the information has been gathered and the conditions are known. The testing employed will include techniques used throughout the laboratory as well as surface analysis.
The surface analysis laboratory is equipped with Scanning Electron Microscopes (SEM) fitted with Energy Dispersive X-ray Spectrometers (EDS) for elemental analysis of targeted areas. In addition, the laboratory houses Fourier Transform Infrared (FTIR) spectrometers for molecular analysis of targeted areas.
What is a Failure Analysis?
This is a very general question, yet it ultimately dictates about ninety percent of what we are contracted to do in an independent testing laboratory. The contract boils down to answering a simple question: “why did this break?” The recipient of that answer will likely have a further question: “what can I do to make this not break?” This is sometimes still within the scope of the contracted laboratory, but most often it requires engineering information and expertise outside the scope of the investigation.
How is Failure Analysis Categorized?
We all encounter failures in everyday life, and for discussion purposes we classify these failures into two broad categories. The first is failures of physical parts or components of a system, for which it is readily apparent that a failure has occurred. More simply, the system fails to work. The second category relates to failures in processes, for which the failure may not be so readily apparent. For the purpose of this discussion, however, we will stick to the first category.
In the laboratory, somebody walks through the door daily with a component in their hand or truck and asks, “why did this break?” Without any additional information, we answer in jest that the part broke because the load exceeded the engineered strength of the component. In this joke we use the term “load” generally, and “engineered strength” to include items such as materials and manufacturing. All jokes aside, to perform a proper and informative failure analysis, the part in hand is only the start of a process that requires interaction with the client.
What is Involved in a Failure Analysis?
Examination of Failure
The failure analysis process begins with an examination of the part. It is clear to see that it broke, but what were the conditions that led up to that point? There was an obvious expectation that the part should not break, but what was this expectation based on? Was the part loaded properly? Was the part maintained properly? Answers to these questions may rely on the user's recollection of the facts, or on documentation kept as part of the maintenance of the system from which the part came.
Review of Failure
As a commercial testing facility, we receive inquiries with varying levels of sophistication and familiarity with the failure analysis process. Even seasoned engineers will overlook or fail to supply information paramount to understanding the birth and life of the part. We consider several items key to structuring a proper failure analysis, but the documentation review must begin with engineering drawings and materials specifications. In addition, a good review of any documented maintenance, or an interview with the user, is key to understanding the timeline and the possible conditions leading up to the failure. A good interviewer knows the questions to ask, even if not fully familiar with the process from which the part came.
Examining a Complete System for Failure Analysis
The image below shows how a complete system (the bearing discussed) can be examined for failure analysis. The inner and outer raceway surfaces, the cage that holds the ball bearings, and the bearings themselves are all components within this complete system. It also shows that proper forensic protocols are followed when components are labeled during sectioning.
In addition, parts are often received for failure analysis after being removed from a complete system, where only the component observed to have failed was examined and not the system as a whole. We must learn to treat the failure of a system as a crime scene of sorts. The operation should be isolated or quarantined. Any surrounding materials should be examined for relevance to the failure, and the scene should be documented with plenty of photographs. Even soiling materials may need to be isolated and saved for possible analytical needs. One example that comes to mind is a failure analysis of a fairly large loaded bearing that we previously worked on, in which it was not known whether the bearing seized first and caused a generator fire, or whether the generator fire caused the seizure. As part of the test plan constructed for that failure analysis, we kept the grease isolated from the race so that it could be characterized for wear or abrasive metal content.
Identifying Possible Failure Modes
The next portion of the failure analysis is to identify the possible failure modes that could have led to the observed failure. It then becomes critical for the investigator to become an expert in the part, either by becoming familiar with case histories or by bringing in team members who have expertise in the part, material, manufacturing process, and usage. The test plan will be constructed around the possible failure modes. Some failure modes may be immediately discounted by observation and consultation with the experts, but it is also important not to leave a stone unturned if we simply do not have the information to rule out a possible mode. The test plan is constructed both to document the process and to ensure that the theories are tested adequately and that all data and information are available.
The image above shows that classical evidence of fatigue can be identified easily, even on a large fracture surface. Note, however, that while fatigue may have initiated the cracking, it was not the failure mode by which the quarry bucket failed. The bucket was improperly manufactured and did not meet the manufacturer’s own specification requirements for fracture toughness, which measures a material’s resistance to crack propagation and therefore bears on the component’s ability to handle impact loading.
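The idea of building a test plan around candidate failure modes, and of not discounting a mode without a documented reason, can be sketched in a few lines of Python. This is only an illustrative sketch: the class names, field names, part, failure modes, and tests below are hypothetical and do not describe any laboratory's actual system.

```python
from dataclasses import dataclass, field


@dataclass
class FailureMode:
    """One candidate failure mode and the tests planned to evaluate it."""
    name: str
    planned_tests: list = field(default_factory=list)
    ruled_out: bool = False
    rationale: str = ""  # documented reason the mode was ruled out, if any

    def rule_out(self, rationale: str) -> None:
        # A mode may only be discounted with a documented reason.
        if not rationale.strip():
            raise ValueError("ruling out a failure mode requires a rationale")
        self.ruled_out = True
        self.rationale = rationale


@dataclass
class AnalysisTestPlan:
    """A test plan organized around the possible failure modes of a part."""
    part: str
    modes: list = field(default_factory=list)

    def open_modes(self):
        """Modes that have not been ruled out."""
        return [m for m in self.modes if not m.ruled_out]

    def unturned_stones(self):
        """Names of open modes that still have no planned test."""
        return [m.name for m in self.open_modes() if not m.planned_tests]


# Hypothetical example data, for illustration only.
plan = AnalysisTestPlan(
    part="example component",
    modes=[
        FailureMode("fatigue", ["SEM fractography"]),
        FailureMode("overload", ["mechanical testing"]),
        FailureMode("corrosion"),
    ],
)
print(plan.unturned_stones())  # modes that are neither tested nor ruled out
```

A plan in this form would be ready for client approval only once `unturned_stones()` returns an empty list, meaning every mode either has a planned test or has been ruled out with a recorded rationale.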
What Happens to the Failure Analysis Sample?
As testing under the test plan commences, we must acknowledge that destructive testing will likely occur, which of course means that the pristine broken part will no longer be available for inspection as received. All possible needs for preservation of material or material sections, as well as photography, must be anticipated and incorporated into the test plan before any destructive testing begins. The test plan should be accepted by the client and approved (preferably in writing) prior to the start of testing. In the case of litigation support, the attorneys for both sides should be part of this approval. The last thing a failure analyst wants is to be caught in the middle of an accusation of destroying “evidence”.
At RTI Laboratories, we diligently photograph any critical as-received features prior to sectioning. For example, the image below shows the outline of a metallographic specimen taken through a large, branched crack that progressed through aluminum plate material.
How is Failure Analysis Reported?
The failure analyst will subsequently compile the data and test the datasets against the proposed theories of failure mode. It is important to note that the failure analyst is a third-party investigator who examines the data and documentation impartially. The trained investigator is simply not going to “spin” the datasets or discount conflicting datasets to support a prevailing theory. The data are examined comprehensively against all possible failure modes. If during review a dataset is determined to be lacking for a theory that has since become supported elsewhere, the test plan can be re-examined and supplemented.
As the data are compiled and the report is constructed, the failure analyst will present all aspects of the data that the laboratory was contracted to present, including any that may not support the prevailing theory or conclusion. The analyst will present this data so that additional parties can follow the timeline of the test plan and reconstruct the “story” of the failure. The analyst will make conclusive statements where they are supported by the data presented in the report.
Root Causes of Failure
It is important to note that up to this point we have talked only about the “physical cause” of failure, which must not be confused with the “root cause” of the failure. The root cause involves the assessment of two additional causal items that have to be put together with the physical cause: the “human cause” and the “latent cause”. For human causal analysis, we must consider the human factors that led up to the physical cause. For latent causal analysis, we must consider the cultural rules that led up to the human cause. It is far too common for the contracted laboratory to be asked to provide root cause analysis when the human and latent causes are outside the scope of the investigation. The party contracting the laboratory must perform its own root cause analysis, which combines the physical causes presented by the failure analyst with the human and latent causes known from its own (or additional experts’) experience with the particular system.
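The relationship between the three causal levels described above can be expressed as a small data structure. The following is a minimal sketch, assuming each level is recorded as free text; the class name, field names, and example strings are hypothetical and are included only to show that a laboratory report on its own fills in one of the three levels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CausalAssessment:
    """The three causal levels that together make up a root cause."""
    physical_cause: Optional[str] = None  # supplied by the failure analyst
    human_cause: Optional[str] = None     # human factors behind the physical cause
    latent_cause: Optional[str] = None    # cultural rules behind the human cause

    def missing_levels(self):
        """Return the causal levels that have not yet been assessed."""
        levels = {
            "physical": self.physical_cause,
            "human": self.human_cause,
            "latent": self.latent_cause,
        }
        return [name for name, value in levels.items() if not value]

    def is_root_cause_complete(self) -> bool:
        """A root cause requires all three levels, not just the physical one."""
        return not self.missing_levels()


# Hypothetical example: what a laboratory report alone can provide.
lab_report_only = CausalAssessment(
    physical_cause="material did not meet the fracture toughness specification"
)
print(lab_report_only.is_root_cause_complete())  # False
print(lab_report_only.missing_levels())          # ['human', 'latent']
```

In this sketch the human and latent fields would be filled in by the party that owns the system, not by the laboratory.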
As you can see from this description of failure analysis, there truly is no magic box into which we can put a part and be told why it failed. A proper and informative failure analysis has to be systematically constructed from information from many sources, including the data provided by the test lab. True root cause analysis demands that other factors be examined besides the physical causes that the laboratory is contracted to provide.