A Review of the Application of Machine Learning and Geospatial Analysis Methods in Air Pollution Prediction

Document Type: Review Paper


1 GIS Department, Faculty of Geodesy & Geomatics Engineering, K. N. Toosi University of Technology, P.O. Box 16315-1355, Tehran, Iran

2 School of Built Environment, Faculty of the Arts, Design & Architecture, University of New South Wales (UNSW), P.O. Box 259, Sydney, Australia


In recent years, air quality has become an important global issue because of its severe impact on human health and the environment. To help control air pollution effectively, forecasting models have been developed over the past decades to support decision-makers and urban managers. In general, these methods fall into three classes: statistical methods, machine learning methods, and hybrid methods. The primary aim of this study is to provide an overview of air pollution prediction techniques for urban areas, along with their advantages and disadvantages. The methods are also compared in terms of error assessment and their use of geospatial information systems (GIS). In addition, several approaches were applied to real data, and the findings were compared with those reported in previously published literature. The results showed that forecasting with machine learning and hybrid methods provided better results than forecasting with statistical methods. They also indicated that GIS can improve the results of the forecasting methods.

