Methods
Raw case numbers
- Country/region data from the Johns Hopkins University Center for Systems Science and Engineering dataset, which is updated daily in DATA1.
- Confirmed cases and accumulated deaths by Autonomous Community of Spain available at Situation of COVID-19 in Spain from Instituto de Salud Carlos III. Data updated daily in DATA2.
Variable definitions:
- Cumulative cases at day \(t\): \(x_t^{(j)}\), with \(j=1\) for confirmed cases and \(j=2\) for deaths.
- New cases at day \(t\): \(x_t^{(j)} - x_{t-1}^{(j)}\)
- Growth rate of cases at horizon \(k\) (H\(_k\)): \(r_{k}^{(j)}(t)=\frac{x_{t+k}^{(j)} - x_{t}^{(j)}}{x_{t}^{(j)} + 1}\) for \(t=…,t_0-1\) and \(k=1,\ldots,5\)
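As a minimal sketch of these definitions, the snippet below computes new cases and the growth rate \(r_k(t)\) from one cumulative series. The function names and the sample counts are illustrative assumptions, not part of the original implementation.

```python
import numpy as np

def new_cases(x):
    """Daily new cases x_t - x_{t-1} from a cumulative series x."""
    x = np.asarray(x, dtype=float)
    return np.diff(x)

def growth_rate(x, k=1):
    """Growth rate r_k(t) = (x_{t+k} - x_t) / (x_t + 1) for every t with t + k in range."""
    x = np.asarray(x, dtype=float)
    if k < 1 or k >= len(x):
        raise ValueError("k must satisfy 1 <= k < len(x)")
    return (x[k:] - x[:-k]) / (x[:-k] + 1.0)

cumulative = [0, 1, 3, 7, 15]      # made-up cumulative counts
print(new_cases(cumulative))       # [1. 2. 4. 8.]
print(growth_rate(cumulative, 1))  # [1. 1. 1. 1.]
```

The \(+1\) in the denominator keeps the rate finite when the cumulative count is still zero.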
Methodology
In line with the idea of “flattening the curve”, we consider the curve \(r_{1}^{(j)}(t)\), which captures how the growth rate changes over time. We also smooth this signal to dampen the effect of sudden changes in reporting (such as the weekend effect).
Objective: Predict the growth rate at horizon \(k\) from the H\(_1\) growth-rate curve observed over the last 15 days:
\[R_{1}(0)=\{r_1^{(j)}(-14),\ldots,r_1^{(j)}(0)\}\]
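The sketch below illustrates how such a 15-day predictor window might be built from a daily growth-rate series. The text does not say which smoother is used, so the 7-day trailing moving average here is purely an assumption, as are the function names.

```python
import numpy as np

def smooth(r, width=7):
    """Trailing moving average (assumed smoother, not the documented one).

    The first width-1 values are averaged over the days available so far.
    """
    r = np.asarray(r, dtype=float)
    csum = np.cumsum(np.insert(r, 0, 0.0))
    out = np.empty_like(r)
    for i in range(len(r)):
        lo = max(0, i - width + 1)
        out[i] = (csum[i + 1] - csum[lo]) / (i + 1 - lo)
    return out

def last_window(r, length=15):
    """Return the most recent `length` values, i.e. r(-14), ..., r(0)."""
    r = np.asarray(r, dtype=float)
    if len(r) < length:
        raise ValueError("series shorter than the window")
    return r[-length:]
```

A trailing window is chosen in this sketch because the value at day 0 cannot depend on days that have not yet been observed.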
Algorithm steps:
Filtering:
- Data from some regions are excluded because of inconsistencies in the records: “Diamond Princess”, “Iran”, “Japan”, “Bahrain” and “Qatar”.
- For the response \(r_{k}^{(1)}(t)\) (confirmed cases), we use the countries or regions with more than 200 confirmed cases at time \(t\).
- For the response \(r_{k}^{(2)}(t)\) (deaths), we use the countries or regions with more than 30 deaths at time \(t\).
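The filtering rules above can be sketched as follows. The data layout (a dictionary from region name to cumulative count at time \(t\)) and the function name are assumptions made for illustration only.

```python
# Regions excluded because of inconsistent records (list taken from the text).
EXCLUDED = {"Diamond Princess", "Iran", "Japan", "Bahrain", "Qatar"}

# Minimum cumulative counts at time t, exclusive ("more than").
MIN_COUNT = {"confirmed": 200, "deaths": 30}

def eligible_regions(cumulative_at_t, kind):
    """Return the sorted regions that pass the filter for `kind`.

    cumulative_at_t: dict mapping region name -> cumulative count at time t.
    kind: "confirmed" or "deaths".
    """
    threshold = MIN_COUNT[kind]
    return sorted(
        region
        for region, count in cumulative_at_t.items()
        if region not in EXCLUDED and count > threshold
    )

counts = {"Spain": 500, "Iran": 10000, "Andorra": 200, "Italy": 201}  # made-up values
print(eligible_regions(counts, "confirmed"))  # ['Italy', 'Spain']
```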
Fit the model. Three functional regression models are fitted, all of the general form
\(r_{k}^{(j)}(0) = f(R_{1}(0)) + \epsilon\), and differing only in the form of \(f\):
- FLM: \(f\) is a linear function, \(f(R_{1}(0))= \int{R_{1}(t)\beta(t)dt}\).
- FNP: \(f\) is a nonparametric kernel estimate.
- SAM: \(f\) is an additive combination of smooth functions of the main functional principal components.
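To make the nonparametric option more concrete, here is a minimal sketch of a kernel (Nadaraya-Watson) estimate for curves observed on a common daily grid. The Gaussian kernel, the Euclidean distance between discretized curves, and the fixed `bandwidth` argument are all assumptions; the text does not specify the kernel, the metric, or how the bandwidth is chosen.

```python
import numpy as np

def fnp_predict(curves, responses, new_curve, bandwidth):
    """Kernel-weighted average of past responses (illustrative sketch).

    curves: array of shape (n, 15), one 15-day growth-rate window per row.
    responses: array of shape (n,), the observed r_k for each window.
    new_curve: array of shape (15,), the window to predict from.
    bandwidth: positive smoothing parameter (hypothetical, fixed by hand here).
    """
    curves = np.asarray(curves, dtype=float)
    responses = np.asarray(responses, dtype=float)
    new_curve = np.asarray(new_curve, dtype=float)
    # Distance between each training curve and the new curve.
    dist = np.sqrt(np.sum((curves - new_curve) ** 2, axis=1))
    # Gaussian kernel: closer curves receive larger weights.
    weights = np.exp(-0.5 * (dist / bandwidth) ** 2)
    total = weights.sum()
    if total == 0.0:
        raise ValueError("bandwidth too small: all kernel weights underflowed")
    return float(weights @ responses / total)
```

The prediction is a weighted mean of the responses of past windows, so regions whose recent growth-rate curve resembles the new one dominate the estimate.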
Predictions:
- Re-estimate the functional models (Step 2) whenever new data become available (all countries and regions of DATA1 and DATA2).
- Reconstruct the expected number of cumulative cases and derive the new cases at each horizon (confirmed cases and deaths).
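The reconstruction step follows from inverting the growth-rate definition: \(x_{t+k} = x_t + r_k(t)\,(x_t + 1)\). The sketch below applies this to a list of predicted rates; the function name and the example numbers are hypothetical.

```python
def reconstruct(x_t, predicted_rates):
    """Turn predicted growth rates into cumulative and new cases.

    x_t: last observed cumulative count.
    predicted_rates: predicted r_k(t) for k = 1, 2, ... (in that order).
    Returns (cumulative, new), each with one entry per horizon.
    """
    # Invert r_k(t) = (x_{t+k} - x_t) / (x_t + 1).
    cumulative = [x_t + r * (x_t + 1) for r in predicted_rates]
    # New cases at horizon k are the increase over horizon k - 1
    # (horizon 0 being the last observed count).
    previous = [x_t] + cumulative[:-1]
    new = [c - p for c, p in zip(cumulative, previous)]
    return cumulative, new

cumulative, new = reconstruct(99, [0.25, 0.5, 0.75])  # made-up rates
print(cumulative)  # [124.0, 149.0, 174.0]
print(new)         # [25.0, 25.0, 25.0]
```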
Funding
This work has been supported by Project MTM2016-76969-P from Ministerio de Economía y Competitividad - Agencia Estatal de Investigación and European Regional Development Fund (ERDF) and IAP network StUDyS from Belgian Science Policy.
Acknowledgements
Thanks to Diego Campanario for creating the R server.