# Covid-19 Prediction

Created by:
Manuel Oviedo and Manuel Febrero
Dpt. of Statistics, Mathematical Analysis and Optimization

#### Table of observed and predicted rates (per 100,000 inhabitants)

## Methods

### Raw case numbers

1. World country dataset from the Johns Hopkins University Center for Systems Science and Engineering, which is updated daily in DATA1. The names of the latest time series files (since 22/3) are:

• time_series_covid19_confirmed_global.csv for cumulative confirmed cases.
• time_series_covid19_deaths_global.csv for cumulative deaths.
• time_series_covid19_recovered_global.csv for cumulative recovered cases.
2. Spanish region dataset. Confirmed cases, hospitalised cases, intensive care unit (UCI) cases, deaths and recovered cases by Autonomous Community of Spain, available at Situation of COVID-19 in Spain from Instituto de Salud Carlos III. Data updated daily in DATA2.

3. Italian region dataset. Confirmed cases, hospitalised cases, intensive care unit cases, deaths and recovered cases by region of Italy, available at COVID-19 Italia - Monitoraggio situazione from Presidenza del Consiglio dei Ministri - Dipartimento della Protezione Civile. Data updated daily in DATA3.

### Variable definitions:

• Cumulative cases at day $$t$$: $$x_t^{(j)}$$ with $$j=1$$ for confirmed cases and $$j=2$$ for deaths.
• New cases at day $$t$$: $$x_t^{(j)} - x_{t-1}^{(j)}$$
• Growth rate of cases at horizon $$k$$ (H$$_k$$): $$r_{k}^{(j)}(t)=\frac{x_{t+k}^{(j)} - x_{t}^{(j)}}{x_{t}^{(j)} + 1}$$ for $$t=…,t_0-1$$ and $$k=1,\ldots,5$$
• Active cases at day $$t$$: $$a_t = x_{t}^{(1)} - x_{t}^{(2)} - z_{t}$$, where $$z_{t}$$ are the recovered cases at day $$t$$.
Note: new active cases can be negative on a given day if the new recoveries plus new deaths on that day exceed the new confirmed cases.
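
The definitions above translate directly into a few lines of code. The following is a minimal sketch in plain Python; the function names and the five-day counts are made up for illustration and are not part of the original application.

```python
def new_cases(x):
    """Daily new cases x_t - x_{t-1} from a cumulative series x."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def growth_rate(x, k=1):
    """Growth rate r_k(t) = (x_{t+k} - x_t) / (x_t + 1) wherever x_{t+k} exists."""
    return [(x[t + k] - x[t]) / (x[t] + 1) for t in range(len(x) - k)]


def active_cases(confirmed, deaths, recovered):
    """Active cases a_t = x_t^(1) - x_t^(2) - z_t."""
    return [c - d - z for c, d, z in zip(confirmed, deaths, recovered)]


# Made-up cumulative counts for five consecutive days (illustration only).
confirmed = [99, 149, 199, 299, 399]
deaths = [1, 2, 4, 6, 9]
recovered = [0, 5, 10, 200, 250]

daily = new_cases(confirmed)            # new confirmed cases per day
rates = growth_rate(confirmed, k=1)     # growth rate H_1
active = active_cases(confirmed, deaths, recovered)
new_active = new_cases(active)          # can be negative, as the note says
```

In this toy series the fourth day has 100 new confirmed cases but 190 new recoveries and 2 new deaths, so the change in active cases on that day is negative.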

## Methodology

Following the idea of “flattening the curve”, we consider the curve $$r_{1}^{(j)}(t)$$, which captures how the growth rate changes over time. We also smooth this signal to reduce the effect of sudden changes in notification (such as the weekend effect).

Objective: predict the growth rate at horizon $$k$$ from the growth rate H$$_1$$ observed over the last 15 days:
$R_{1}(0)=\{r_1^{(j)}(-14),\ldots,r_1^{(j)}(0)\}$
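
As an illustration of how the predictor curve could be assembled, the sketch below smooths a series of H$$_1$$ growth rates and keeps the last 15 values. The text does not say which smoother is used, so the trailing moving average, its width of 3 and the function names are assumptions made only for this example.

```python
def moving_average(values, width=3):
    """Trailing moving average (an assumed smoother, not the authors' choice).

    Early entries average over the days available so far.
    """
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def last_window(rates, length=15):
    """Return the most recent `length` values, i.e. r_1(-14), ..., r_1(0)."""
    if len(rates) < length:
        raise ValueError("need at least %d days of growth rates" % length)
    return rates[-length:]


# Made-up H_1 growth rates for 20 days (illustration only).
rates = [0.30 - 0.01 * t for t in range(20)]
smoothed = moving_average(rates, width=3)
R1 = last_window(smoothed, length=15)   # the curve R_1(0)
```

Here `R1[0]` plays the role of $$r_1^{(j)}(-14)$$ and `R1[-1]` the role of $$r_1^{(j)}(0)$$.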

### Algorithm steps:

1. Filtering:

• Data from some regions are excluded because of inconsistencies in their records: “Diamond Princess”, “Iran”, “Japan”, “Bahrain” and “Qatar”.
• For the response $$r_{k}^{(1)}(t)$$ (confirmed cases), we use the countries or regions with more than 200 confirmed cases at time $$t$$.
• For the response $$r_{k}^{(2)}(t)$$ (deaths), we use the countries or regions with more than 30 deaths at time $$t$$.
2. Fit the model. Three functional regression models of the general form $$r_{k}^{(j)}(0) = f(R_{1}(0)) + \epsilon$$ are fitted; they differ in the form of $$f$$:

• FLM: $$f$$ is a linear function, $$f(R_{1}(0))= \int{R_{1}(t)\beta(t)dt}$$.
• FNP: $$f$$ is a nonparametric kernel estimate.
• SAM: $$f$$ is an additive combination of smooth functions of the main functional principal components.
3. Predictions:

• Re-estimate the functional models (Step 2) when new data are available (all countries and regions of DATA1, DATA2 and DATA3).
• Reconstruct the expected number of cumulative cases and deduce the new cases at each horizon (confirmed, deaths and active).
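
The reconstruction in the last step follows from inverting the growth rate definition: since $$r_{k}^{(j)}(t)=\frac{x_{t+k}^{(j)} - x_{t}^{(j)}}{x_{t}^{(j)} + 1}$$, a predicted rate $$\hat r_k$$ gives $$\hat x_{t+k} = x_t + \hat r_k\,(x_t + 1)$$. The sketch below applies this in plain Python; the function names and the predicted rates are hypothetical, and in practice the rates would come from one of the fitted models.

```python
def reconstruct_cumulative(x_t, predicted_rates):
    """Invert r_k(t) = (x_{t+k} - x_t) / (x_t + 1) for k = 1, 2, ...

    predicted_rates[k - 1] is the predicted growth rate at horizon k.
    """
    return [x_t + r * (x_t + 1) for r in predicted_rates]


def new_cases_by_horizon(x_t, cumulative):
    """Differences between consecutive horizons, starting from the observed x_t."""
    path = [x_t] + list(cumulative)
    return [path[k] - path[k - 1] for k in range(1, len(path))]


# Hypothetical predicted growth rates for horizons k = 1..5 (illustration only).
x_today = 999
predicted = [0.10, 0.20, 0.25, 0.30, 0.40]
cumulative = reconstruct_cumulative(x_today, predicted)
daily = new_cases_by_horizon(x_today, cumulative)
```

With these made-up inputs the expected cumulative counts are about 1099, 1199, 1249, 1299 and 1399, and the implied new cases per horizon are about 100, 100, 50, 50 and 100. Because cumulative counts cannot decrease, predicted rates should be non-decreasing in $$k$$; this sketch does not enforce that.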

## Funding

This work has been supported by Project MTM2016-76969-P from Ministerio de Economía y Competitividad - Agencia Estatal de Investigación and the European Regional Development Fund (ERDF), and by the IAP network StUDyS from Belgian Science Policy.

## Acknowledgements

Thanks to Diego Campanario for creating the Shiny server.