A Hybrid Machine Learning Model Based on Deep Learning for Air Quality Prediction

Document Type: Original Research Paper

Authors

1 Department of Industrial Management, Faculty of Industrial Management and Technology, College of Management, University of Tehran, Tehran, Iran

2 Department of Information Technology and Operations Management, Faculty of Management and Accounting, Allameh Tabataba'i University, Tehran, Iran

DOI: 10.22059/poll.2025.388743.2750

Abstract

Air pollution is a major global challenge that directly affects public health, urban sustainability, and environmental policy, and accurate air quality prediction has become essential to addressing it. This study proposes a novel hybrid machine learning model that combines deep learning with advanced ensemble techniques to improve air quality prediction. The model combines a Deep Neural Network (DNN) with ensemble learning algorithms such as XGBoost, CatBoost, LightGBM, and Random Forest as a metamodel to aggregate the predictions. The model was tested on a dataset whose features range from environmental variables (PM2.5, PM10, CO, and NO2) to socio-economic variables such as proximity to industrial areas and population density. Feature selection and class imbalance were handled using recursive feature elimination with cross-validation (RFECV) and the synthetic minority oversampling technique (SMOTE), respectively. Hyperparameters were tuned using both the Tree-structured Parzen Estimator (TPE) implemented in Optuna and Bayesian optimization in Keras-Tuner. The model achieved an accuracy of 97.34%, outperforming conventional approaches. The results make a case for hybrid machine learning techniques in air quality prediction as a basis for intelligent global environmental monitoring that is interpretable, accurate, and scalable. Future work could integrate real-time data from the Internet of Things (IoT) and extend the model to multi-prediction benchmarks for other environmental indices, broadening its applicability to emerging global environmental challenges.
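To make the pipeline described in the abstract more concrete, the following is a minimal sketch of feature selection with RFECV followed by a stacked ensemble whose predictions are aggregated by a Random Forest metamodel. It is not the authors' implementation. It rests on several assumptions: the data are synthetic (generated with scikit-learn's `make_classification`), a small `MLPClassifier` stands in for the DNN, `GradientBoostingClassifier` stands in for XGBoost, CatBoost, and LightGBM, and the SMOTE resampling and Optuna or Keras-Tuner tuning steps are omitted. The function names `build_stacked_model` and `run_demo` are hypothetical.

```python
# Illustrative sketch only: scikit-learn stand-ins on synthetic data,
# not the authors' models, dataset, or hyperparameters.
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFECV
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stacked_model(random_state=0):
    """Stack a small neural net and a boosted-tree model under a Random Forest metamodel."""
    base_learners = [
        # Assumption: a small multilayer perceptron stands in for the DNN.
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(32, 16), max_iter=500,
                          random_state=random_state),
        )),
        # Assumption: one gradient-boosting model stands in for the boosting libraries.
        ("gbm", GradientBoostingClassifier(n_estimators=50,
                                           random_state=random_state)),
    ]
    # The metamodel is trained on cross-validated predictions of the base learners.
    return StackingClassifier(
        estimators=base_learners,
        final_estimator=RandomForestClassifier(n_estimators=100,
                                               random_state=random_state),
        cv=3,
    )


def run_demo(random_state=0):
    """Return (number of selected features, test accuracy) on synthetic data."""
    # Synthetic four-class problem; the real study uses pollutant and
    # socio-economic features that are not reproduced here.
    X, y = make_classification(n_samples=600, n_features=12, n_informative=6,
                               n_redundant=2, n_classes=4,
                               random_state=random_state)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=random_state)

    # Recursive feature elimination with cross-validation, fitted on training data only.
    selector = RFECV(
        estimator=RandomForestClassifier(n_estimators=50,
                                         random_state=random_state),
        step=1, cv=3, min_features_to_select=4)
    selector.fit(X_train, y_train)

    model = build_stacked_model(random_state)
    model.fit(selector.transform(X_train), y_train)
    accuracy = model.score(selector.transform(X_test), y_test)
    return selector.n_features_, accuracy


if __name__ == "__main__":
    n_selected, accuracy = run_demo()
    print(f"selected features: {n_selected}, test accuracy: {accuracy:.3f}")
```

In a fuller version, oversampling such as SMOTE would be applied to the training split only, so that synthetic samples do not leak into the evaluation data. The accuracy printed by this sketch reflects the synthetic data and has no bearing on the 97.34% reported above.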
