This study introduces a machine learning framework tailored to large-scale industrial processes characterized by a large number of numerical and categorical inputs. The framework aims to (i) identify the critical parameters that influence the output and (ii) generate accurate out-of-sample qualitative and quantitative predictions of production outcomes. Specifically, we address the central question of how significant each input is in shaping the process outcome, using an industrial Chemical Vapor Deposition (CVD) process as an example. The first step combines subject matter expertise with clustering techniques applied exclusively to the process output, here the coating thickness measured at various positions in the reactor. This approach identifies groups of production runs that share similar qualitative characteristics, such as the mean film thickness and its standard deviation. The differences between the outcomes represented by the clusters can be attributed to differences in specific inputs, which indicates that these inputs are potentially critical to the production outcome. Shapley value analysis corroborates the resulting hypotheses. Building on this insight, we then apply supervised classification and regression methods using the identified critical process inputs. The proposed methodology proves valuable in scenarios with many inputs and too little data for the direct application of deep learning techniques, and it provides meaningful insights into the underlying processes.
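The abstract does not name the clustering algorithm, the criterion used to link clusters to inputs, or the models used for the Shapley analysis. What follows is therefore only a minimal pure-Python sketch of the workflow under stated assumptions, not the authors' implementation. It makes three assumptions. First, it uses k-means with a deterministic farthest-point initialization for the unsupervised step. Second, it uses the correlation ratio η² as a hypothetical score of how strongly each input differs between the output clusters. Third, it computes exact Shapley values by enumerating coalitions for a hand-written surrogate model. The data, the inputs `switch` and `nuisance`, and the `surrogate` coefficients are all invented for illustration.

```python
import itertools
import math
import random
import statistics


def summarize_run(profile):
    """Reduce a thickness profile (one value per reactor position) to (mean, std)."""
    return (statistics.fmean(profile), statistics.pstdev(profile))


def standardize(rows):
    """Z-score each column so mean thickness and spread carry comparable weight."""
    cols = list(zip(*rows))
    mus = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(r, mus, sds)) for r in rows]


def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, ctr) for ctr in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for c in range(k):
            # An empty cluster keeps its previous center.
            members = [p for p, lab in zip(points, labels) if lab == c] or [centers[c]]
            new_centers.append(tuple(map(statistics.fmean, zip(*members))))
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def correlation_ratio(values, labels):
    """eta^2: share of an input's variance explained by cluster membership (0 to 1)."""
    grand = statistics.fmean(values)
    total = sum((v - grand) ** 2 for v in values)
    if total == 0:
        return 0.0
    between = 0.0
    for c in set(labels):
        group = [v for v, lab in zip(values, labels) if lab == c]
        between += len(group) * (statistics.fmean(group) - grand) ** 2
    return between / total


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) against a baseline, enumerating all coalitions.
    Inputs outside a coalition are held at their baseline value (feasible for few inputs)."""
    n = len(x)

    def value(coalition):
        return model([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions S of this size.
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for subset in itertools.combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def make_synthetic_runs(n_runs=60, n_positions=8, seed=1):
    """Synthetic runs: a binary 'switch' shifts and tilts the profile; 'nuisance' does nothing."""
    rng = random.Random(seed)
    inputs, profiles = [], []
    for _ in range(n_runs):
        switch, nuisance = rng.choice([0.0, 1.0]), rng.uniform(0.0, 1.0)
        profile = [10.0 + 5.0 * switch + 0.5 * switch * pos + rng.gauss(0.0, 0.3)
                   for pos in range(n_positions)]
        inputs.append((switch, nuisance))
        profiles.append(profile)
    return inputs, profiles


def surrogate(z):
    # Hypothetical stand-in for a trained regressor of mean thickness.
    return 10.0 + 6.75 * z[0] + 0.0 * z[1]


inputs, profiles = make_synthetic_runs()
# Unsupervised step: cluster the runs on their outputs only.
labels = kmeans(standardize([summarize_run(p) for p in profiles]), k=2)
# Score each input by how strongly it differs between the output clusters.
ratios = [correlation_ratio([row[i] for row in inputs], labels) for i in range(2)]
print("eta^2 (switch, nuisance):", [round(r, 3) for r in ratios])
print("Shapley values:", shapley_values(surrogate, [1.0, 0.9], [0.0, 0.5]))
```

Exact enumeration costs 2^n coalitions per input, so it only suits a handful of inputs. For many inputs and tree-based models, approximations such as TreeSHAP (implemented in the `shap` package) are the usual choice. η² is likewise just one simple way to flag inputs that separate the clusters; the abstract pairs clustering with subject matter expertise for this step.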
Disciplines:
Chemical engineering
Author, co-author:
Papavasileiou, Paris; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE); NTUA - National Technical University of Athens [GR] > School of Chemical Engineering
Giovanis, Dimitrios G.; JHU - Johns Hopkins University [US-MD] > Department of Civil & Systems Engineering, Whiting School of Engineering
Kevrekidis, Ioannis G.; JHU - Johns Hopkins University [US-MD] > Department of Chemical and Biomolecular Engineering & Department of Applied Mathematics and Statistics, Whiting School of Engineering
Boudouvis, Andreas G.; NTUA - National Technical University of Athens [GR] > School of Chemical Engineering
Bordas, Stéphane; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE)
Koronaki, Eleni; University of Luxembourg > Faculty of Science, Technology and Medicine (FSTM) > Department of Engineering (DoE)
External co-authors:
yes
Language:
English
Title:
Integrating supervised and unsupervised learning approaches to unveil critical process inputs