Could the same machine learning (ML) techniques that keep spam emails from invading your inbox, detect unusual transactions on your credit card and create Netflix recommendations also support pipeline integrity management? The answer is yes, and the fact is, while we think of ML as something new — a scientific way to get computers to recognize patterns and learn from data without being specifically programmed — the pipeline industry has been using the concepts behind it to provide insight into integrity issues for decades.
As far back as the 1980s, integrity engineers and service providers were constructing models to predict quantities such as the geometric dimensions of metal loss using a combination of finite element modeling [1][2], statistical modeling and predictive functions [3]. Today, ML algorithms are used across the pipeline industry to boost operating efficiency, control costs and support integrity management. Applying ML techniques to in-line inspections (ILI) performed by intelligent pigs is improving the actionable data that pipeline operators need to keep their systems running safely.
The pipeline industry began applying formal ML techniques in the early 2000s to classify pipeline features and identify mechanical damage. Since then, the use of ML has expanded considerably: It's now being applied to improve metal-loss sizing, fitting classification and the identification of interactive threats [4]. Much of that progress goes hand in hand with the development of advanced ILI technology: ML is increasingly considered a powerful way to analyze the complex datasets produced by electromagnetic acoustic transducer (EMAT) and ultrasonic (UT) tools, or by combinations of multiple inspection technologies.
Despite how successfully ML has been deployed, it’s not a silver bullet. Even when the goal seems simple — alerting people to the possibility of credit card fraud — achieving the best solutions isn’t a one-and-done proposition; there’s no single model that can be employed to achieve 100% accuracy. Often what is required is breaking the overall problem down into smaller subsets and developing specialized models focused on solving a specific task. These specialized models can then be ensembled together to produce a final overall prediction. This method of ensembling often outperforms a single general model.
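As a loose illustration of that idea, the sketch below (Python with scikit-learn; the data, features and estimator choices are hypothetical and are not any vendor's actual implementation) blends two specialized learners into one ensemble prediction:

```python
# Minimal sketch of ensembling specialized models (hypothetical data and features).
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, VotingRegressor
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                               # four generic input features
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(scale=0.1, size=500)

# Two learners, each suited to part of the problem, blended into a single overall prediction.
ensemble = VotingRegressor([
    ("linear_part", Ridge()),                                       # captures the broad linear trend
    ("nonlinear_part", GradientBoostingRegressor(random_state=0)),  # captures nonlinear structure
])
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))                              # combined prediction from both models
```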
And it all requires good data. The quantity, quality and representation of the data directly influence the performance of predictions made by any ML model. But collecting, analyzing and labeling the critical data used to develop ML models can be difficult and time-consuming. As previously published [5], if you want to predict housing prices in Tulsa but use a model that learned from Los Angeles-based data, your predictions will be grossly incorrect. The importance of quantity, quality and representation cannot be overstated. This is especially true for ILI, as much of the data used in model development is based on field investigations, which can be very expensive to complete.
Fortunately, it's possible to carefully select, annotate and fine-tune the available data to maximize its value for ML model development.
At the risk of conflating man and machine, implementing an ML model to identify possible integrity threats relies on the same element an ILI data analyst uses to detect, identify and report the signals recorded by ILI tools: experience based on examples and outcomes. Both analysts and ML models use their experience to identify patterns and make predictions, and the more experience they have, the better.
The experience of ML models is referred to as training data. Training data is what ML models use to develop a mathematical function that can be applied to make future predictions. The better the training data describes and represents a given problem, the better the resulting model will be at making future predictions. There are several subcategories of ML, each with different ways that training data is presented and utilized. Supervised learning models use a set of data with known or desired answers referred to as labels. Labels are assigned to specific features that describe the problem, creating a one-to-one mapping between inputs (features) and outputs (labels).
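As a schematic illustration (the feature names and values below are hypothetical), a single labeled example in a supervised setting is simply a set of input features mapped to a known output:

```python
# One labeled training example: engineered input features mapped one-to-one to a known outcome.
# All names and values are hypothetical.
example = {
    "features": {                       # inputs describing the recorded signal
        "signal_amplitude": 0.82,
        "axial_length_mm": 34.0,
        "width_mm": 22.0,
        "wall_thickness_mm": 7.1,
    },
    "label": 0.35,                      # known outcome, e.g. field-measured depth as a fraction of wall thickness
}

# A supervised model learns a function f(features) -> label from many such pairs.
```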
Input features are often engineered by subject matter experts to describe the important characteristics a model should learn. This becomes especially important when the quantity of data is not large enough for the model to learn the characteristics on its own. As mentioned earlier, this is generally the case for ILI as data labeling is often dependent on expensive field investigations. However, the beauty of these models is that they can be trained using multiple features. At T.D. Williamson (TDW), for example, the MDS™ platform allows subject matter experts to develop relevant, focused and valuable input features from multiple advanced technologies for a single labeled example.
Even with descriptive features or large quantities of data, the quality of labels is extremely important. If a set of perfectly descriptive features is mapped to an inaccurate label, any model will just learn to be inaccurate. Large datasets do not always mean better models; a small high-quality dataset is always preferred over a large one riddled with inaccuracies. When both input features and labeled outputs are accurate, a model’s performance is likely to be high. However, it may only be high for predictions made on data that has some correlation to the training data. Understanding the future prediction space and ensuring it is represented in the training data will lead to higher-performing models.
The quantity, quality and representation of training data are extremely important. Simply stated, the quality of the output is determined by the quality of the input. Garbage in, garbage out. Like almost everything else involved in ML, there's nothing easy about assessing the performance of these models. For one thing, the quantity of training examples should be maximized, which can mean there are only a few blind examples remaining for the actual testing. Ideally, performance metrics should be based on completely blind data unseen by the training process.
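A minimal sketch of that trade-off, reserving a small blind hold-out set before any training takes place (hypothetical data; scikit-learn's train_test_split is one common way to do it):

```python
# Sketch: holding back a blind test set so performance is measured on data unseen during training.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 4))                    # hypothetical feature matrix (~100 examples)
y = X[:, 0] + rng.normal(scale=0.1, size=120)    # hypothetical target

# Maximizing the training share leaves only a handful of blind examples for testing.
X_train, X_blind, y_train, y_blind = train_test_split(X, y, test_size=0.2, random_state=0)
```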
To highlight some of the nuances in assessing model performance, TDW trained an ML model using a technique called gradient boosting [6] to predict metal loss depth from a few simple features: signal amplitude, length, width and wall thickness (t). The training dataset included more than 100 samples of high-quality corrosion field investigations from a single ILI of pipeline A.
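A minimal sketch of that kind of workflow is shown below, with scikit-learn's gradient boosting regressor standing in for the actual implementation; the file name and column names are hypothetical:

```python
# Sketch: gradient boosting to predict metal loss depth from a few simple signal features.
# The CSV path and column names are hypothetical stand-ins for the field-verified training set.
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

train = pd.read_csv("pipeline_A_field_digs.csv")            # >100 field-investigated corrosion samples
features = ["signal_amplitude", "length", "width", "wall_thickness"]

model = GradientBoostingRegressor(random_state=0)
model.fit(train[features], train["field_depth_pct"])        # depth expressed as a percentage of wall thickness

# Residuals on the training data itself: useful for internal development,
# but an optimistic view of future performance (see the discussion of Figure 1).
train["predicted_depth_pct"] = model.predict(train[features])
train["residual"] = train["predicted_depth_pct"] - train["field_depth_pct"]
```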
Figure 1 shows the distribution of metal loss depths and a unity plot of the residual errors against depths measured in the field for pipeline A. Because the model made predictions on the data it was trained with, any assessment of future prediction performance would be extremely optimistic. This process of reviewing the residual model error may be beneficial for internal model development but should be avoided when projecting future performance.
However, the assessment of residual performance highlighted some interesting outliers, circled in red and orange. In one case, axial magnetic flux leakage (MFL) indicated two independent metal loss signatures that spiral magnetic flux leakage (SMFL) revealed to be one long, connected feature. Association with the long seam and the high-amplitude SMFL signal response suggested selective seam weld corrosion (SSWC), which poses a more serious integrity concern than general metal loss crossing the long seam. This feature was verified in the field as SSWC. The other outliers were found to be mechanical damage associated with denting. The model performed poorly on both the SSWC and mechanical damage features because they were significantly underrepresented in our dataset and because our input features were chosen to describe general volumetric corrosion, not these unique anomalies.
Despite the model's good performance, we wouldn't make any assumptions about its predictive power for the next inspection. Building confidence in a model requires testing it on a blind dataset and striving to answer the question "How is it going to perform on the next inspection?" For our simple example, we'll use pipeline B as our blind dataset. This pipeline has a similar number of features to pipeline A but is dissimilar in other characteristics, such as wall thickness, diameter and material type. It is important to note that none of pipeline B's features were included in the model training data and that the model learned only from pipeline A's data. Figure 2 shows the distribution of metal loss depths for pipelines A and B and a unity plot of the prediction performance against field depths for pipeline B. This time, the model overestimated metal loss depths, and its performance came nowhere close to the residual performance on pipeline A. This is a clear example of how assessing model performance without a blind dataset can lead to overly optimistic performance metrics.
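Continuing the earlier sketch (and reusing its hypothetical model and features variables), the blind test simply scores the pipeline A model on data it never saw:

```python
# Sketch: blind evaluation of the pipeline A model on pipeline B, which was excluded from training.
# Reuses `model` and `features` from the earlier sketch; the file and column names are hypothetical.
import pandas as pd
from sklearn.metrics import mean_absolute_error

blind = pd.read_csv("pipeline_B_field_digs.csv")            # field digs from a different pipeline
blind_pred = model.predict(blind[features])

# Comparing this blind error against the training residuals above shows how
# optimistic the residual-based assessment was.
print("Blind MAE (pipeline B):", mean_absolute_error(blind["field_depth_pct"], blind_pred))
```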
A closer look at the data distributions shows that pipeline A contains generally deeper metal loss samples than pipeline B. The model likely learned the characteristics of deeper metal loss features, and those characteristics extrapolate poorly to shallower features.
Additionally, the ILI data for pipelines A and B have different characteristics that were not accounted for in our simple set of input features. Despite the poor performance, we now have a more accurate understanding of how the model will perform on the next inspection.
This process of blind testing allows ILI vendors to state performance specifications with greater confidence, which directly affects how reported results can be used in managing pipeline integrity.
The above example demonstrates that no two pipelines are alike and that model development requires careful consideration of the quantity, quality and representation of training data. Collecting enough data to represent all geometries at all depths across all material types is an arduous task, made even more difficult by the fact that learning models generally need thousands of examples in each prediction space. Curating and refining data is a never-ending process.
The good news is that the ILI industry is successfully applying learning models to solve challenging problems every day. For example, pipelines with incomplete documentation are a perfect problem for another subcategory of ML, unsupervised machine learning, in which the model receives only the inputs for each example and attempts to discover structured patterns within the dataset on its own. By using signals from the patterns in permeability and bore variations created during the manufacturing process, as captured in high field MFL, axial low field MFL (LFM) and high resolution profilometry deformation (DEF) data, clusters of similar joints of pipe can be identified automatically.
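As a loose illustration of that approach (the joint-level signature names and file are hypothetical stand-ins for the high field MFL, LFM and DEF measurements), an unsupervised clustering sketch might look like this:

```python
# Sketch: unsupervised clustering of pipe joints by their manufacturing signatures.
# The file and feature names are hypothetical.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

joints = pd.read_csv("joint_signatures.csv")                # one row per joint of pipe
signature_cols = ["permeability_variation", "bore_variation", "lfm_response", "def_ovality"]

# Scale the signatures, then let the model group similar joints with no labels provided.
X = StandardScaler().fit_transform(joints[signature_cols])
joints["cluster"] = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Joints sharing a cluster have similar manufacturing signatures and are candidates
# for belonging to the same pipe population or vintage.
```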
TDW has also used strategically engineered features available only from MDS datasets to develop learning models that can separately classify two types of pipeline threats with small populations and unique characteristics: selective seam weld corrosion (SSWC) [7][8][9] and dents with gouges from coincidental corrosion [10][11]. When we applied these models to the outliers in pipeline A, they predicted a 99% probability of SSWC for the long seam feature and correctly classified both dents as having associated gouging. By focusing the model's objective on specific anomalies, we were able to engineer suitable input features, which narrowed the prediction space and improved predictions. Ultimately, this work, more than 40 years in the making, has given operators a more comprehensive way to prioritize mitigation plans and manage pipeline integrity.
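A hedged sketch of that kind of focused, single-threat classifier appears below; the features, file name and estimator are hypothetical stand-ins for the engineered MDS features and models described above:

```python
# Sketch: a focused binary classifier for one specific threat (here, SSWC vs. general corrosion).
# The file, feature names and estimator choice are hypothetical.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

labeled = pd.read_csv("seam_features_labeled.csv")          # field-verified examples
cols = ["smfl_amplitude", "seam_offset_mm", "axial_extent_mm", "mfl_amplitude_ratio"]

clf = GradientBoostingClassifier(random_state=0)
clf.fit(labeled[cols], labeled["is_sswc"])

# For a new anomaly the model returns a probability of SSWC rather than a hard yes/no,
# which lets operators prioritize the highest-probability calls for investigation.
new_anomaly = labeled[cols].iloc[[0]]
print("P(SSWC) =", clf.predict_proba(new_anomaly)[0, 1])
```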
[1] Atherton, D.L. and Daly, M.G., 1987, “Finite element calculation of magnetic flux leakage detector signals,” NDT International, Vol. 20 No. 4, pp 235-238.
[2] Yang, S., Sun, Y., Upda, L., Upda, S. and Lord, W., 1999, “3D Simulation of Velocity Induced Fields for Nondestructive Evaluation Application,” IEEE Transactions on Magnetics, Vol. 35, No. 3, pp 1754-1756.
[3] Nestleroth, J.B., Rust, S.W., Burgoon, D.A. and Haines, H., 1996, “Determining Corrosion Defect Geometry from Magnetic Flux Leakage Pig Data,” Corrosion96, March 24-29, Denver, NACE-96044.
[4] Bubenik, T. A., Nestleroth, J.B., Davis, R. J., Crouch, A., Upda, S., Afzal, A. K., 2000, “In-line Inspection Technologies for Mechanical Damage and SCC in Pipelines: Final Report,” US DOT, OPS, Report No. DTRS56-96-C-0010.
[5] Burden, D., Dalfonso, P. and Belanger, A., 2020, “The Current Progeny of In-Line Inspection Machine Learning,” Proceedings of the 2020 Pipeline Pigging and Integrity Management Conference, Houston, Clarion Technical Conferences and Great Southern Press.
[6] “How to Explain Gradient Boosting,” explained.ai, https://explained.ai/gradient-boosting/index.html (accessed April 18, 2023).
[7] Nestleroth, B.J., Simek, J. and Ludlow, J., 2016, “New Classification Approach for Dents with Metal Loss and Corrosion Along the Seam Weld,” Proceedings of the 2016 11th International Pipeline Conference, Calgary, American Society of Mechanical Engineers.
[8] Andrew, J. and Simek, J., 2019, United States Patent Application Publication No. US 2019/0162700 A1.
[9] Romney, M., Burden, D. and Lunstrom, R., 2023, “Validating Selective Seam Weld Corrosion Classification Using ILI Technology,” Pipeline Pigging and Integrity Management Conference (PPIM), Feb 6-10, Houston, Texas.
[10] Romney, M. and Kirkwood, M., 2023, “The Power to Know More About Pipeline Anomalies,” Pipeline Technology Conference, 8-11 May, Berlin, Germany.
[11] Burden, D. and Romney, M., 2022, “A Case Study Applying Gouge Classification to Mechanical Damage Defects,” Proceedings of the 2022 14th International Pipeline Conference, Calgary, American Society of Mechanical Engineers.