Covid-19 Prediction

Updated: 2020/04/13 and created by:
Manuel Oviedo and Manuel Febrero
Dpt. of Statistics, Mathematical Analysis and Optimization


Raw case numbers

  1. Country/Region data from the Johns Hopkins University Center for Systems Science and Engineering dataset, which is updated daily in DATA1.

  2. Confirmed cases and accumulated deaths by Autonomous Community of Spain, available at Situation of COVID-19 in Spain from Instituto de Salud Carlos III. Data updated daily in DATA2.

Variables definition:

  • Cumulative cases at day \(t\): \(x_t^{(j)}\), with \(j=1\) for confirmed cases and \(j=2\) for deaths.
  • New cases at day \(t\): \(x_t^{(j)} - x_{t-1}^{(j)}\)
  • Growth Rate of cases - H\(_k\): \(r_{k}^{(j)}(t)=\frac{x_{t+k}^{(j)} - x_{t}^{(j)}}{x_{t}^{(j)} + 1}\) for \(t=…,t_0-1\) and \(k=1,\ldots,5\)
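The definitions above translate directly into code. The following is a minimal Python sketch, not the authors' implementation: the function names `new_cases` and `growth_rate` are hypothetical, and the cumulative series is assumed to be a plain list indexed by day.

```python
def new_cases(cumulative):
    """Daily new cases: x_t - x_{t-1} for each day t with a previous day."""
    return [cumulative[t] - cumulative[t - 1] for t in range(1, len(cumulative))]


def growth_rate(cumulative, k=1):
    """Growth rate H_k: (x_{t+k} - x_t) / (x_t + 1) for every t with t + k observed.

    The +1 in the denominator follows the definition above and avoids
    division by zero when no cases have been recorded yet.
    """
    return [
        (cumulative[t + k] - cumulative[t]) / (cumulative[t] + 1)
        for t in range(len(cumulative) - k)
    ]


# Illustrative (made-up) cumulative counts for four consecutive days.
series = [9, 19, 39, 79]
daily = new_cases(series)        # increments between consecutive days
h1 = growth_rate(series, k=1)    # one-day-ahead growth rate
h2 = growth_rate(series, k=2)    # two-day-ahead growth rate
```

Note that a series of length \(n\) yields only \(n-k\) values of the H\(_k\) growth rate, because the last \(k\) days have no observed future value yet.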


In line with the idea of “flattening the curve”, we consider the curve \(r_{1}^{(j)}(t)\), which captures how the growth rate changes over time. In addition, we smooth this signal to reduce the effect of sudden changes in reporting (such as the weekend effect).
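The text does not say which smoother is applied. As an illustration only, the sketch below uses a centered moving average, which is an assumption and not necessarily the method used here. The function name `moving_average` and the 7-day default window (chosen to span one weekly reporting cycle) are likewise hypothetical.

```python
def moving_average(values, window=7):
    """Centered moving average; the window is truncated at both ends of the series.

    This smoother is an assumption made for illustration; the source does
    not specify the smoothing method.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# Made-up growth-rate values with a dip that mimics under-reporting on one day.
raw = [0.0, 3.0, 6.0]
smooth = moving_average(raw, window=3)
```

Truncating the window at the edges keeps the output the same length as the input, at the cost of averaging over fewer points for the first and last days.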

Objective: Predict the growth rate at horizon \(k\) from the H\(_1\) growth rate curve observed over the last 15 days.

Algorithm steps:

  1. Filtering:

    • Data from some regions are excluded because of inconsistencies in their records: “Diamond Princess”, “Iran”, “Japan”, “Bahrain” and “Qatar”.
    • For the response \(r_{k}^{(1)}(t)\) (confirmed cases), we use the countries or regions with more than 200 confirmed cases at time \(t\).
    • For the response \(r_{k}^{(2)}(t)\) (deaths), we use the countries or regions with more than 30 deaths at time \(t\).
  2. Fit the model. Three functional regression models of the general form \(r_{k}^{(j)}(0) = f(R_{1}(0)) + \epsilon\) are fitted; they differ in the form of \(f\):

    • FLM: \(f\) is a linear function, \(f(R_{1}(0))= \int{R_{1}(t)\beta(t)dt}\).
    • FNP: \(f\) is a nonparametric kernel estimate.
    • SAM: \(f\) is an additive combination of smooth functions of the main functional principal components.
  3. Predictions:

    • Re-estimate Functional Models (Step 2) when new data is available (all countries and regions of Data1 and Data2).
    • Reconstruct the expected number of accumulated cases and derive the new cases for each horizon (confirmed cases and deaths).
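The filtering and reconstruction steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the names `keep_region` and `reconstruct` are hypothetical, and the predicted growth rates are taken as given (the functional models themselves are not reproduced here). The reconstruction inverts the growth rate definition: if \(\hat r_k\) is the predicted growth rate at horizon \(k\) from the last observed day \(t_0\), then \(\hat x_{t_0+k} = x_{t_0} + \hat r_k\,(x_{t_0}+1)\).

```python
# Regions excluded in the filtering step, and the minimum counts required
# at time t for each response, as listed in the text.
EXCLUDED = {"Diamond Princess", "Iran", "Japan", "Bahrain", "Qatar"}
MIN_COUNT = {"confirmed": 200, "deaths": 30}


def keep_region(name, count_at_t, kind):
    """True if the region passes the filter: not excluded and strictly above the threshold."""
    return name not in EXCLUDED and count_at_t > MIN_COUNT[kind]


def reconstruct(x_t0, predicted_rates):
    """Turn predicted growth rates into cumulative and new cases per horizon.

    predicted_rates[k - 1] is the predicted growth rate at horizon k
    (k = 1, 2, ...), measured from the last observed cumulative count x_t0.
    """
    cumulative = [x_t0 + r * (x_t0 + 1) for r in predicted_rates]
    previous = [x_t0] + cumulative[:-1]
    new = [c - p for c, p in zip(cumulative, previous)]
    return cumulative, new


# Made-up example: 99 cumulative cases today, hypothetical predicted rates
# for horizons 1 to 3.
cum, new = reconstruct(99, [0.25, 0.5, 1.0])
```

Because every horizon is predicted from the same base day \(t_0\), the new cases at horizon \(k\) are obtained as the difference between consecutive reconstructed cumulative counts, with the observed \(x_{t_0}\) serving as the starting value.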


This work has been supported by Project MTM2016-76969-P from Ministerio de Economía y Competitividad - Agencia Estatal de Investigación and European Regional Development Fund (ERDF) and IAP network StUDyS from Belgian Science Policy.


Thanks to Diego Campanario for creating the R server.