Predicting Covid-19 ICU Needs Using Deep Learning, XGBoost and Random Forest Regression with the Sliding Window Technique

Written by Aristeidis Mystakidis¹, Nikolaos Stasinos¹, Anestis Kousis¹, Vangelis Sarlis¹, Paraskevas Koukaras¹, Dimitris Rousidis¹, Ioannis Kotsiopoulos², and Christos Tjortjis¹*

COVID-19 has placed severe strain on healthcare systems globally. Healthcare infrastructures have been tested to their limits in almost every country and city, smart or not. This article applies deep learning and machine learning forecasting algorithms: Artificial Neural Networks (ANN), XGBoost and Random Forest. Using the sliding window technique, we predict the number of Intensive Care Unit (ICU) beds required over short (one week), mid (two weeks) and longer-term (three weeks) time frames. We consider daily confirmed COVID-19 cases, current ICU, regular and special bed occupation, hospitalized cases, recovered and intubated patients, and deaths. Results show that the models achieve a very high coefficient of determination (R²) in the training phase, while providing accurate predictions in the forecasting phase. We report the weighted average output of ANN, XGBoost and Random Forest, which resulted in a low Mean Absolute Percentage Error (MAPE), below 4% for one-week-ahead predictions for Greece. Accurate and timely prediction of ICU bed needs can support decision making in healthcare systems and help optimize the deployment of resources as needed. Our approach can be enhanced by incorporating non-clinical parameters based on smart city infrastructures, such as data from smart sensors.

1. Introduction

The recent COVID-19 pandemic caused a worldwide crisis [1], [2], motivating the research community to address many of its facets [3], [4]. A multivariable regression model, developed on hospitalized COVID-19 positive patients and based on the TRIPOD guideline (Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis) [5], reached an accuracy of 74% for predicting ICU admissions and 83% for predicting mortality [6]. Several classification models predict level-of-care requirements based on clinical and laboratory data [7], [8], [9]. Another study considered 2,566 COVID-19 patients, with the model achieving 88% accuracy for hospitalization, 87% for ICU and 86% for mechanical ventilation requirements [10].

A Machine Learning (ML) risk-based prioritization tool was used to forecast imminent ICU transfer of hospitalized COVID-19 patients within a 24-hour period. Time-series data, such as vital signs, nursing assessments, laboratory data, and electrocardiograms, were used to train a Random Forest (RF) model. The dataset consisted of 1,987 COVID-19 patients who were admitted to non-ICU hospital units. The model achieved 76.2% accuracy [11]. In a similar study, researchers developed a Deep Learning prediction model for the likelihood of ICU admission and mortality. They collected data about chronic comorbidities, vital signs, symptoms, laboratory tests on admission, and demographics [2].

Many studies aim to detect early predictive factors upon admission to improve the management of patients moved to ICUs [4], [7], [12]. Others try to forecast the spread of COVID-19 and ICU requirements by applying regression analysis, such as the autoregressive integrated moving average (ARIMA) model [13], to confirmed cases in order to predict future cases [3]. For short-term forecasting of ICU beds, a combination of autoregressive models, ML and epidemiological models yielded promising results. Such an approach demonstrated average forecasting errors of 4% and 9% for one- and two-week-ahead forecasts, respectively, outperforming several competing forecasting models [3].

2. Methodology

In our work, we aimed to predict ICU needs by collecting weekly data from the Greek Ministry of Health [14], [15] and other sources, and pre-processing them to improve modelling and forecasting [16], [17]. The parameters we used include COVID-19 cases, ICU beds, hospitalized, intubated and recovered patients, and deaths. We conducted data analysis and evaluation, and report here on short-, mid- and longer-term results for the period from December 28, 2020 to March 22, 2021.

We used a triple-model forecasting approach to minimise MAPE, focusing on ICU beds [18], [19]. The three prediction models are ANN, XGBoost and Random Forest [20], [21], [22], [23]. The process incorporates the sliding window technique [24], combining seven timestamps, each representing a single day’s data. Each algorithm is trained with input parameters such as daily positive cases, current ICU, regular/special bed occupation, hospitalized cases, recovered/intubated patients, imported cases and deaths for days n – 7, n – 6, …, n – 1. There are 21 separate training executions, one per target: the ICU beds needed on day n, n + 1, …, n + 20. The 21 executions share the same data architecture and hyperparameters, but each is trained separately against its own target.
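The windowing described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `make_windows`, the constants and the toy data are assumptions introduced here, and only two indicators per day are used to keep the example short.

```python
from typing import List, Sequence, Tuple

WINDOW = 7     # input days n-7 ... n-1
HORIZONS = 21  # one target per day n ... n+20


def make_windows(
    features: Sequence[Sequence[float]],
    icu: Sequence[float],
    horizon: int,
    window: int = WINDOW,
) -> Tuple[List[List[float]], List[float]]:
    """Build (X, y) training pairs for one forecasting horizon.

    features[d] holds the daily indicators for day d and icu[d] the ICU
    bed count for day d. horizon=0 targets day n, horizon=20 targets n+20.
    """
    if len(features) != len(icu):
        raise ValueError("features and icu must cover the same days")
    X, y = [], []
    # n is the first day after the input window; day n + horizon must exist.
    for n in range(window, len(icu) - horizon):
        # Flatten the seven daily indicator vectors into one input row.
        row = [value for day in features[n - window:n] for value in day]
        X.append(row)
        y.append(icu[n + horizon])
    return X, y


# Toy data (illustrative values only): 30 days, 2 indicators per day.
days = 30
features = [[float(d), float(d * 2)] for d in range(days)]
icu = [100.0 + d for d in range(days)]

# One dataset per horizon, mirroring the 21 separate training executions.
datasets = {h: make_windows(features, icu, h) for h in range(HORIZONS)}
X0, y0 = datasets[0]
X20, y20 = datasets[20]
print(len(X0), len(X0[0]), len(X20))
```

Each of the 21 datasets could then be passed to any regressor. Longer horizons leave fewer usable training rows, because the target day must fall inside the observed period.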

The most accurate prediction for the period from December 28, 2020 to March 22, 2021 was obtained from the weighted average of the ANN, XGBoost and Random Forest outputs.
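The article does not state how the weights are chosen. As one possible sketch, the example below assumes weights inversely proportional to each model's validation MAPE; the weighting rule, function names and all numbers are hypothetical.

```python
from typing import Dict


def inverse_error_weights(errors: Dict[str, float]) -> Dict[str, float]:
    """Turn per-model errors into weights that sum to 1 (lower error, higher weight)."""
    inverse = {name: 1.0 / err for name, err in errors.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}


def weighted_forecast(preds: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine the per-model forecasts for one target day."""
    return sum(weights[name] * preds[name] for name in preds)


# Hypothetical validation MAPE (%) per model; not taken from the article.
validation_mape = {"ann": 4.0, "xgboost": 5.0, "random_forest": 5.0}
weights = inverse_error_weights(validation_mape)

# Hypothetical ICU bed forecasts from each model for the same target day.
forecasts = {"ann": 410.0, "xgboost": 420.0, "random_forest": 400.0}
combined = weighted_forecast(forecasts, weights)
print(weights, combined)
```

Other weighting schemes, such as fixed equal weights or weights tuned on a hold-out period, fit the same `weighted_forecast` function.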

3. Results & Evaluation

Each model was evaluated using the coefficient of determination, R² (Equation 1) [25], [26].

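Equation 1: Coefficient of determination (R²).

$$R^2 = \left( \frac{n \sum xy - \sum x \sum y}{\sqrt{\left[ n \sum x^2 - \left( \sum x \right)^2 \right] \left[ n \sum y^2 - \left( \sum y \right)^2 \right]}} \right)^2$$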

Where n is the total number of values, x is the actual value and y is the forecast value. Each model scored above 99% for forecasts up to 12 days ahead, and above 98% for 13 to 21 days ahead (Table 1).
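For readers who want to reproduce the metric, the sketch below computes R² from n paired actual (x) and forecast (y) values. It assumes the squared Pearson correlation form, one common definition that uses exactly these three symbols; the article's own formula may differ (for example, 1 minus the ratio of residual to total sum of squares). The sample values are made up.

```python
import math
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between actual (x) and forecast (y) values."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("R² is undefined when x or y is constant")
    return (numerator / denominator) ** 2


# Illustrative ICU bed counts (hypothetical, not from the article).
actual = [300.0, 320.0, 345.0, 360.0]
forecast = [305.0, 318.0, 340.0, 366.0]
score = r_squared(actual, forecast)
print(f"R² = {score:.4f}")
```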

Table 1: Coefficient of determination (R²) of each model by forecasting horizon.

To assess the reliability of combining the three algorithms, we used MAPE (Equation 2) to measure the error of predicted values against actual results for a target date.

Equation 2: Mean Absolute Percentage Error (MAPE).

$$\mathrm{MAPE} = \frac{100\%}{i} \sum_{t=1}^{i} \left| \frac{A_t - F_t}{A_t} \right|$$

Where i is the number of fitted points, t is the timestamp, A_t is the actual value and F_t is the forecast value. We note that using a 3-day moving average MAPE (comparing forecasts with the average of the target date and its two previous days) further improves results (Table 2).
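The two error measures described above can be sketched as follows. The function names and sample values are assumptions for illustration; the moving average is taken over the target date and its two previous days, as the text states, so the first two days have no smoothed value.

```python
from typing import List, Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def trailing_mean(values: Sequence[float], window: int = 3) -> List[float]:
    """Average of each day and its (window - 1) previous days."""
    return [
        sum(values[i - window + 1:i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical daily ICU occupancy and forecasts for the same target dates.
actual = [100.0, 110.0, 120.0, 130.0, 140.0]
forecast = [104.0, 108.0, 126.0, 125.0, 147.0]

daily_mape = mape(actual, forecast)

# Compare forecasts with the 3-day moving average of the actual values.
smoothed_actual = trailing_mean(actual)
moving_average_mape = mape(smoothed_actual, forecast[2:])
print(daily_mape, moving_average_mape)
```

Whether smoothing lowers the error depends on the data: it helps when day-to-day reporting noise in the actual values is larger than the trend over three days.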

Table 2: Forecasting method MAPE based on daily results and 3-day moving average.

We focus on Greece as a whole and on the Attica region. For Greece, for one-week-ahead predictions over the period from January 4, 2021 to March 22, 2021, the “actual data” average MAPE was 3.78%, while the 3-day moving average MAPE was 3.75%. For two-weeks-ahead predictions, from January 11 to March 22, the “actual data” average MAPE was 11.77%, whereas the 3-day moving average MAPE was 10.54%. Finally, for three-weeks-ahead predictions, from January 18 to March 22, the “actual data” average MAPE was 25.30%, while the 3-day moving average MAPE was 24.24%.

For Attica, the reported predictions start from February 1, 2021. For one-week-ahead predictions, from February 1 to March 22, the “actual data” average MAPE was 5.86%, while the 3-day moving average MAPE was 4.05%. For two-weeks-ahead predictions, from February 8 to March 22, the “actual data” average MAPE was 13.71%, while the 3-day moving average MAPE was 11.77%. Finally, for three-weeks-ahead predictions, from February 15 to March 22, the “actual data” average MAPE was 25.29%, whereas the 3-day moving average MAPE was 23.91%.

4. Conclusion

Results show that the best individual model in terms of R² was ANN, with an average value of 99.17% over 21 days, with RF and XGBoost a close second and third at 99.06% and 99.05%, respectively. For one- and two-weeks-ahead predictions, the best average model MAPE was achieved for Greece, whilst for three weeks ahead it was achieved for Attica. Furthermore, for all three time periods (one, two and three weeks) and for both Greece and Attica, the predictions had lower MAPE for 3-day moving average results than for daily results. Finally, ongoing and future work includes the use of several other indicators, such as the impact of temperature, climate and incubation period, combined with demographics, country characteristics or government mitigation actions (lockdown, social distancing, vaccinations, etc.), aiming to further enhance prediction performance.

References

1. J. A. Teixeira da Silva and P. Tsigaris, “The role of lockdowns and health policies for COVID-19 in Italy,” Ital. J. Med., no. March, 2020, doi: 10.4081/itjm.2020.1366.
2. W. J. Guan et al., “Clinical Characteristics of Coronavirus Disease 2019 in China,” N. Engl. J. Med., vol. 323, no. 16, pp. 1545–1546, 2020, doi: 10.1056/NEJMoa2002032.
3. M. Goic, M. S. Bozanic-Leal, M. Badal, and L. J. Basso, “COVID-19: Short-term forecast of ICU beds in times of crisis,” PLoS One, vol. 16, no. 1, pp. 1–24, 2021, doi: 10.1371/journal.pone.0245272.
4. Y. Allenbach et al., “Development of a multivariate prediction model of intensive care unit transfer or death: A French prospective cohort study of hospitalized COVID-19 patients,” PLoS One, vol. 15, no. 10, pp. 1–12, 2020, doi: 10.1371/journal.pone.0240711.
5. G. S. Collins, J. B. Reitsma, D. G. Altman, and K. G. M. Moons, “Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement,” J. Br. Surg., vol. 102, no. 3, pp. 148–158, 2015.
6. Q. Li et al., “Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus–Infected Pneumonia,” N. Engl. J. Med., vol. 382, no. 13, pp. 1199–1207, 2020, doi: 10.1056/nejmoa2001316.
7. E. R. Lusczek et al., “Characterizing COVID-19 clinical phenotypes and associated comorbidities and complication profiles,” medRxiv, pp. 1–18, 2020, doi: 10.1101/2020.09.12.20193391.
8. A. Banerjee et al., “Estimating excess 1-year mortality from COVID-19 according to underlying conditions and age in England: a rapid analysis using NHS health records in 3.8 million adults,” medRxiv, no. March, p. 2020.03.22.20040287, 2020, doi: 10.1101/2020.03.22.20040287.
9. S. P. J. M. Horbach, “No time for that now! Qualitative changes in manuscript peer review during the Covid-19 pandemic,” Res. Eval., pp. 1–9, 2021, doi: 10.1093/reseval/rvaa037.
10. L. Wynants et al., “Prediction models for diagnosis and prognosis of covid-19 infection: Systematic review and critical appraisal,” BMJ, vol. 369, 2020, doi: 10.1136/bmj.m1328.
11. C. Huang et al., “Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China,” Lancet, vol. 395, no. 10223, pp. 497–506, 2020, doi: 10.1016/S0140-6736(20)30183-5.
12. C. Anastassopoulou, L. Russo, A. Tsakris, and C. Siettos, “Data-Based Analysis, Modelling and Forecasting of the COVID-19 outbreak,” PLoS One, vol. 15, no. 3, pp. 1–21, 2020, doi: 10.1371/journal.pone.0230405.
13. G. E. P. Box and G. M. Jenkins, Time Series Analysis: Forecasting and Control. San Francisco: Holden-Day, 1970.
14. “NPHO. 2020. Home - NPHO EODY.” [Online]. Available: https://eody.gov.gr/category/covid-19. [Accessed: 20-April-2021].
15. “Ministry of Health Data resource.” [Online]. Available: https://www.pio.gov.cy/coronavirus/eng. [Accessed: 29-April-2021].
16. “Worldometer - Coronavirus Pandemic.” [Online]. Available: https://www.worldometers.info/coronavirus/. [Accessed: 06-May-2021].
17. “Coronavirus COVID-19 Global Cases by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University (JHU).” [Online]. Available: https://coronavirus.jhu.edu/map.html. [Accessed: 06-May-2021].
18. B. H. Andrews, M. D. Dean, R. Swain, and C. Cole, “Building ARIMA and ARIMAX Models for Predicting Long-Term Disability Benefit Application Rates in the Public / Private Sectors Sponsored by Society of Actuaries Health Section Prepared by University of Southern Maine,” Soc. Actuar., no. August, 2013.
19. N. Yau, Visualize This: The FlowingData Guide to Design, Visualization, and Statistics. 2011.
20. A. Géron, Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow. O’Reilly Media, 2019.
21. J. Grus, Data Science from Scratch: First Principles with Python. O’Reilly Media, 2019.
22. M. J. Kane, N. Price, M. Scotch, and P. Rabinowitz, “Comparison of ARIMA and Random Forest time series models for prediction of avian influenza H5N1 outbreaks,” BMC Bioinformatics, vol. 15, no. 1, 2014, doi: 10.1186/1471-2105-15-276.
23. S. Madhavan, Mastering Python for Data Science. 2015.
24. C.-H. Lee, C.-R. Lin, and M.-S. Chen, “Sliding-window filtering: an efficient algorithm for incremental mining,” in Proceedings of the tenth international conference on Information and knowledge management, pp. 263–270, 2001.
25. B. Ratner, Statistical and machine-learning data mining: Techniques for better predictive modeling and analysis of big data, third edition. 2017.
26. J. Song and T. M. Song, Big Data Analysis Using Machine Learning for Social Scientists and Criminologists. Cambridge Scholars Publishing, 2019.

This article was edited by Bernard Fong

For a downloadable copy of the July 2021 eNewsletter which includes this article, please visit the IEEE Smart Cities Resource Center.

Aristeidis Mystakidis is a PhD candidate in Data Science for Smart Cities at the International Hellenic University, School of Science and Technology, and a member of the Data Mining and Analytics Research Group. He holds a Master of Engineering (MEng) in Electrical and Computer Engineering as his first degree from Democritus Polytechnical University of Thrace, a Master in Business Administration (MBA) with a specialization in Operations Research from the University of Macedonia, and a Master of Science (MSc) focused on Mobile / Web Computing and Data Science from the International Hellenic University. He worked as an instructor/tutor at University’s Students Tutorial before joining INTRACOM Constructions – Intrakat as a Data Engineer at SKG Airport. For the last 3+ years, he has been a research Data Scientist/Engineer at EMISIA SA, working directly on many research projects with the European Environmental Agency and the Air Pollution, Transport, Noise and Industrial Pollution (ETC/ATNI) department, regarding CO2 emissions and fleet characterization of passenger cars, light commercial vehicles and heavy-duty trucks.

Nikolaos "Nikos" Stasinos is the owner, CEO/Co-Founder and Senior IT Solution Architect of his company. He is carrying out his PhD studies in Data Science and is a member of the Data Mining and Analytics Research Group. He has 20 years of development experience with high-risk projects in Telecoms, Banking, and Commerce. Education: PhD candidate in Data Science for Law at the International Hellenic University (2019-Now), MSc in Data Communication Systems at Brunel University (London 2008), BSc in Computer Science at Brunel University (London 2006). His professional experience involves complex projects in the financial sector. He has held various positions, including Solution Architect, Business Developer Manager, Team Leader, Project Manager and Quality Evaluator. In his research, he aims to bring Data Science and the Law together to detect violations of legislation using technology.

Anestis Kousis is a computer science teacher in secondary education, and he also works as an adult educator. He received the bachelor’s degree in Computer Science from the Hellenic Open University, Greece, in 2015 and his master’s degree in educational sciences from Aristotle University of Thessaloniki, Greece, in 2017. He is currently pursuing the Ph.D. degree with the School of Science and Technology, International Hellenic University (IHU), Greece, working with Prof. C. Tjortjis on data science for smart cities. He is a member of the Data Mining and Analytics research group (DaMA). He works as an IT lab assistant in courses, such as data mining and advanced database systems, at IHU. His current research focus is on developing and applying new data mining techniques (machine learning, social network analysis, natural language processing, graph mining algorithms, sentiment analysis) for smart cities and smart communities.

Vangelis Sarlis is a solid and dependable professional with high experience in Sports Analytics and specialization in advanced basketball Analytics. He has more than 10 years’ involvement in a variety of Projects budgeted up to 10M euros. Versatile experience covering IT, Consulting, Analytics, QA, Project & Product Management, Marketing and Business Analysis. Vangelis is a PhD candidate in Data Science for Sports Analytics, holding an MBA from ALBA Business School (2017), an MSc in Computing & IT from the University of Sunderland (2013) and a BSc in Physics from Aristotle University of Thessaloniki (2012). Before joining Deloitte, he was part of OSeven S.A. regarding computational intelligence applications in “big data” transportation problems and driving behavior analysis. He also worked in COSMOTE Group, Intrasoft International S.A., Globo Plc, and Entersoft S.A. He has experience both in Startups and Corporate multicultural environments in fields of Telecommunications, Banking, Insurance, FinTech, ERP\CRM and Software houses.

Paraskevas Koukaras is a research assistant at the Information Technologies Institute (ITI) of the Centre for Research and Technology - Hellas (CERTH) and a PhD candidate at the International Hellenic University (IHU). He received his bachelor’s degree from the Department of Informatics, Alexander Technology Educational Institute of Thessaloniki (ATEI) in 2013. In 2017, he received his master’s degree in Information and Communication Technology (ICT) Systems from IHU. Since 2018, he has been pursuing his doctoral degree at IHU, investigating interdisciplinary approaches for knowledge extraction in Social Media and Energy. Since 2019, he has participated in H2020 projects, working on demand response technologies for improved decision-making in energy ecosystems and on prescriptive analytics for increased energy efficiency and wellbeing in residential buildings. He is a member of the Hellenic Artificial Intelligence Society (EETN) and IHU’s Data Mining and Analytics Research Group (DaMA), and an active reviewer for various international conferences and journals.

Dimitris Rousidis is an IT Instructor in various higher education institutes and lifelong learning seminars and a PhD candidate with the International Hellenic University (IHU). His research is associated with social media analytics and the improvement of forecasting algorithms. He holds a BSc in Physics from Aristotle University of Thessaloniki, a BSc in Library Science and Information Systems from IHU, and an MSc in Computation from UMIST. He worked as a tutor in Databases modules in all three years of the Computing course at the DEI College, Thessaloniki. He has worked as an IT lab assistant in courses such as advanced databases and spreadsheets, digital signal processing, automation of library administration and web design. For the past 13 years he has been working as a lifelong educator in Second Chance Schools, Vocational Training Centres, Lifelong Learning Centres and Vocational Training Institutes.

Dr Ioannis Kotsiopoulos is Secretary General of Health Services at the Greek Ministry of Health. He studied Mathematics at the Kapodistrian University of Athens and holds an MSc in Decision Support Systems and a PhD in Informatics from the University of Manchester. He worked as Chief Information Officer in the British NHS and is a member of the NHS Digital Academy. He has served as Deputy Chief at Konstantopoulio General Hospital of Nea Ionia, member of the Supervisory Board of the Hellenic Center for Documentation & Costing of Hospital Services (DRGs) and has taught at the Technological Educational Institute of Athens. He has worked in the UK on research and technology projects and has published many research papers in international conferences and journals. His research interests are in information systems and the digital transformation of health services as well as quality assurance in health.

Christos Tjortjis is the Dean of the School of Science & Technology (SST) at the International Hellenic University (IHU) and Assoc. Professor in Knowledge Discovery and Software Engineering systems. He is also the Director for the MSc in Data Science, the MSc in ICT systems and the EMJMDs MSc in Smart Cities and Communities. He holds a Deng(Hons) from Patras, Computer Eng. & Informatics, a BSc(Hons) from Democritus Law School, Greece, an MPhil in Computation from UMIST, and a PhD in Informatics from the University of Manchester (UoM), U.K. He was a Lecturer at Computation, UMIST, and the Schools of Informatics and Computer Science, UoM. His research focuses on Data Mining Analytics with an emphasis on health systems. He has published some 80 papers in international journals and conferences and has received over 1000 citations (h-index 18). He leads the Data Mining and Analytics (DaMA) Research Group, comprising 6 PhD and 11 MSc students.
