Feature importance analysis for enhanced interpretability of spectrophotometric Machine Learning (ML) models in water quality monitoring

 

Bibliographic details
Authors: Hernández-Alpízar, Laura; Gómez-Mejía, José Andrés
Format: original article
Status: Published version
Publication date: 2026
Description: Ultraviolet-visible (UV-Vis) spectrophotometry for real-time NO3- quantification in water is commonly affected by spectral interference from dissolved organic matter (DOM). This study evaluates machine learning (ML) models for this task, using feature importance analysis to improve chemical interpretability and to detect spectral interferences. Four models were compared on a dataset of 29 surface water samples: PCA-Random Forest (PCA-RF), PCA-XGBoost, full-spectrum RF (All-RF), and full-spectrum XGBoost (All-XGB). Leave-one-out cross-validation (LOOCV) showed no significant performance differences among the models (p = 0.182), with mean RMSE values between 0.6 and 0.8 mg/L. Nonetheless, feature importance analysis revealed that the PCA-based models depend on variance rather than chemical relevance, which limits their reliability. The full-spectrum XGBoost model showed the best spectral interpretability, identifying both the NO3- absorption peak (≈ 220 nm) and the DOM interference correction peak (≈ 260 nm). This suggests that XGBoost could be advantageous for continuous water monitoring systems because it can identify spectral interferences.
Country: Portal de Revistas TEC
Institution: Instituto Tecnológico de Costa Rica
Repository: Portal de Revistas TEC
Language: Spanish
OAI Identifier: oai:ojs.pkp.sfu.ca:article/8521
Online access: https://revistas.tec.ac.cr/index.php/tec_marcha/article/view/8521
Keywords: UV-Vis spectroscopy
nitrate
water
spectral interference
Random Forest
XGBoost
Espectroscopía UV-Vis
nitrato
agua
interferencia espectral
XGBoost
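
Illustrative sketch: the workflow summarized in the description (LOOCV comparison of a full-spectrum model against a PCA-based one, followed by feature importance inspection) could look roughly like the following. This is not the authors' code. The spectra are synthetic stand-ins generated here, the band shapes, concentration ranges, and hyperparameters are assumptions, and scikit-learn's RandomForestRegressor is used for both models; an XGBoost regressor could presumably be swapped in for the All-XGB and PCA-XGBoost variants. RMSE is pooled over the 29 held-out predictions, which may differ from how the study averaged it.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's measurements).
wavelengths = np.arange(200, 401, 5)           # nm, 41 spectral channels
n_samples = 29                                 # same sample count as the study
nitrate = rng.uniform(0.5, 10.0, n_samples)    # mg/L, hypothetical range
dom = rng.uniform(0.0, 5.0, n_samples)         # arbitrary units, hypothetical


def band(center, width):
    """Gaussian absorption band on the wavelength grid (assumed shape)."""
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)


# Absorbance = nitrate band near 220 nm + broad DOM band near 260 nm + noise.
spectra = (
    np.outer(nitrate, 0.10 * band(220, 8))
    + np.outer(dom, 0.05 * band(260, 30))
    + rng.normal(0.0, 0.005, (n_samples, wavelengths.size))
)


def loocv_rmse(model, X, y):
    """RMSE pooled over all leave-one-out predictions."""
    pred = cross_val_predict(model, X, y, cv=LeaveOneOut())
    return float(np.sqrt(np.mean((y - pred) ** 2)))


# Full-spectrum model: every wavelength is a feature.
all_rf = RandomForestRegressor(n_estimators=100, random_state=0)
# PCA-based model: the forest only sees variance-ranked components.
pca_rf = make_pipeline(
    PCA(n_components=3),
    RandomForestRegressor(n_estimators=100, random_state=0),
)

rmse_all = loocv_rmse(all_rf, spectra, nitrate)
rmse_pca = loocv_rmse(pca_rf, spectra, nitrate)

# Importances of the full-spectrum model map directly onto wavelengths,
# which is what makes them chemically readable.
all_rf.fit(spectra, nitrate)
importances = all_rf.feature_importances_
top_wavelengths = wavelengths[np.argsort(importances)[::-1][:5]]

print(f"All-RF LOOCV RMSE: {rmse_all:.2f} mg/L")
print(f"PCA-RF LOOCV RMSE: {rmse_pca:.2f} mg/L")
print("Most important wavelengths (nm):", top_wavelengths.tolist())
```

In the full-spectrum model each importance value belongs to one wavelength, so a peak near an absorption band can be read directly. In the PCA pipeline the importances belong to components, which are ordered by explained variance rather than by chemical meaning.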