Modelling COVID-19 Cases for South Korea, India, and Sweden.

I recently wrote a paper that modelled the spread of COVID-19 in Italy using a logistic fit. That was a while back, and I was curious about how such a logistic function would behave now, so I decided to apply a logistic fit to India, South Korea, and Sweden. I chose these three countries because they adopted entirely different approaches to tackling the pandemic, and I wanted to see whether the effects of those approaches would be graphically discernible. For each country, I modelled the cumulative confirmed cases against the number of days since the beginning of the outbreak. I’ll go through only the results; you can find the full code here.

I’m using an API to obtain the required data. It provides a good deal of data and has excellent support, so I’d recommend this API for all things COVID-19. After obtaining the data, I cleaned it by filtering out undesired columns and rows, converted the date string to an integer representing the number of days since the beginning of the outbreak, and split the data into a feature matrix and a target vector. I then fitted a logistic function to the data.
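To make the preprocessing and fitting steps concrete, here is a minimal sketch in Python. It is not the full code mentioned above. It assumes ISO-formatted date strings, uses SciPy's curve_fit as one possible fitting routine, and runs on synthetic numbers rather than real case counts. The names logistic, days_since_start, K, r, and t0 are illustrative choices.

```python
from datetime import date

import numpy as np
from scipy.optimize import curve_fit


def logistic(t, K, r, t0):
    """Logistic curve: K is the plateau, r the growth rate, t0 the midpoint day."""
    return K / (1.0 + np.exp(-r * (t - t0)))


def days_since_start(date_strings):
    """Convert ISO date strings (YYYY-MM-DD) to integer days since the first date."""
    dates = [date.fromisoformat(s) for s in date_strings]
    return np.array([(d - dates[0]).days for d in dates])


# Synthetic stand-in for the API response (hypothetical values, not real case counts).
date_strings = [f"2020-03-{day:02d}" for day in range(1, 32)]
rng = np.random.default_rng(0)
t = days_since_start(date_strings)
cases = logistic(t, 10_000, 0.3, 15) * rng.normal(1.0, 0.02, size=t.size)

X = t.reshape(-1, 1)  # feature matrix: one column, days since the outbreak began
y = cases             # target vector: cumulative confirmed cases

# Initial guesses: plateau near the largest observation, midpoint in the middle of the range.
p0 = [y.max(), 0.1, t.mean()]
(K_fit, r_fit, t0_fit), _ = curve_fit(logistic, X[:, 0], y, p0=p0, maxfev=10_000)
```

The fitted K_fit, r_fit, and t0_fit can then be passed back into logistic to draw the fitted curve over the observed points. A sensible initial guess matters here, because a logistic fit can fail to converge when the data has not yet started to flatten.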

South Korea

Here are the results, graphically, for South Korea.

It’s evident that the logistic model can’t correctly explain the epidemic in South Korea, and that’s because of the South Korean government’s highly effective handling of its localised cluster. South Korea rolled out tests and implemented stringent social distancing policies not just more effectively, but arguably more quickly, than any other country. The graph reflects this. Real-world cases overtake the logistic fit, but then fall back below it. The effect of South Korea’s strict social distancing and mass-scale testing becomes visible at around the 50-day mark: the curve flattens out to some extent, but then resumes an upward slope.


India

India demonstrates a nearly perfect logistic fit. The logistic model explains the real-world data with what appears to be nearly perfect accuracy. I believe this is because of some inherent conditions in India, the first being population density: it’s hard not to come into contact with someone else while sick in India. When modelling a disease’s spread, you use an R0 value, the average number of people that one infected person goes on to infect. In other countries, lower population density may limit how many people an infected person infects. In India, however, anyone who is sick and not quarantined will likely come into contact with more people than the R0 value would suggest.
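One way to put a number on "nearly perfect" is the coefficient of determination, R². The sketch below is only an illustration of that idea, not a result from this analysis: the function name r_squared and all of the values are hypothetical.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)        # unexplained variation
    ss_tot = np.sum((observed - observed.mean()) ** 2)  # total variation
    return 1.0 - ss_res / ss_tot


# Hypothetical numbers for illustration only, not real case counts.
observed = np.array([100.0, 220.0, 480.0, 950.0, 1700.0, 2600.0])
good_fit = np.array([105.0, 215.0, 470.0, 960.0, 1690.0, 2610.0])
poor_fit = np.array([300.0, 300.0, 300.0, 1500.0, 1500.0, 1500.0])

score_good = r_squared(observed, good_fit)
score_poor = r_squared(observed, poor_fit)
```

Cumulative counts only ever go up, so they tend to produce high R² values even for mediocre fits. It is worth looking at the residuals as well as the score.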


Sweden

A logistic model also seems to describe the situation in Sweden pretty well. The logistic fit largely agrees with the real-world data. This is somewhat of a surprise, as Sweden implemented a vastly different policy regarding COVID-19. Instead of carrying out mass-scale testing or enforcing a lockdown, Sweden opted for ‘herd immunity.’ Herd immunity is immunity to a disease that arises when a large proportion of the population has recovered from it. On a social level, this didn’t exactly go well for Sweden.

I think it’s quite interesting to look at how accurate logistic models are in countries with different approaches. For instance, South Korea succeeded in pushing down the R0 in its cluster through testing and quarantine. India, on the other hand, has so far failed to implement any policy that successfully reduces its R0 value. Sweden is, if anything, tolerating a higher R0 so as to develop herd immunity. One inherent assumption in any logistic model, and in most epidemiological models, is that the R0 value remains largely constant. This assumption appears to hold for India and Sweden, but breaks down for South Korea.
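For reference, the constant-rate assumption can be read directly off the standard logistic growth equation. The symbols here are my own notation: C(t) is cumulative cases, K the final plateau, r the growth rate, and t_0 the day of fastest growth. The link between r and R0 depends on the underlying epidemic model; the last line assumes a simple SIR model with recovery rate γ, and only describes the early phase of an outbreak.

```latex
\frac{dC}{dt} = r\,C\left(1 - \frac{C}{K}\right),
\qquad
C(t) = \frac{K}{1 + e^{-r\,(t - t_0)}}

% Early-phase growth rate in a simple SIR model (an assumption, not part of the fit):
r \approx \gamma\,(R_0 - 1)
```

Because r and K are single fixed numbers, the curve has no way to represent an intervention that changes the transmission rate partway through the outbreak, which is consistent with the fit struggling for South Korea.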