In [1]:
import sys
sys.path.insert(0,'../lib')
from generallib import *

#connection = getConnection()

startDate = '2020-01-01'
thisYearRolling = 1
lastYearRolling = 7

thisYearRollingLabel = ''
lastYearRollingLabel = ' (rolling 7d avg)'

In [2]:
display(md('# COVID-19 Pandemic'))


# COVID-19 Pandemic

In [3]:
display(md('This is an analysis of the effects of COVID-19. Dates covered are '+startDate+' to yesterday.'))
display(md("Compiled "+datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")+" UTC."))


This is an analysis of the effects of COVID-19. Dates covered are 2020-01-01 to yesterday.

Compiled 2020-04-04 12:00:04 UTC.


# Why Should I Care?

A 2% case fatality rate may not seem high. Let's get some perspective by really looking at the numbers.

In recent years, the United States has had an average annual mortality rate of about 0.72%.(1) The average American knows around 600 people.(2) You may learn that someone you know personally has died once every two years or so.

In recent years, the common flu has infected and produced symptoms in around 35 million Americans each year, or a little over 10% of the total population.(5) In an average-size circle of acquaintances, then, you may know around 60 people who show flu symptoms each year. The common flu has a mortality rate below 0.1%, so most people do not actually know anyone who has died from flu complications.

Now, let's compare this to our current situation. The mortality rate for COVID-19 has been calculated to be anywhere from 1% to 4%.(3,4) The consensus seems to be that the real mortality rate for COVID-19 will be around 2%. Imagine that COVID-19 spreads at the same rate as the common flu: around 60 people you know would be infected, and 2% of 60 is about 1.2. On average, at least one person you know will die from COVID-19.

The situation may actually be worse, as recent research indicates that there is a high rate of asymptomatic COVID-19 infections, meaning people are walking around with it and have no idea.(4,6) This is what leads to the higher mortality calculations of 3% or 4%, because the deaths are carefully recorded+, while total infections may be under-counted by a large margin. If those estimates are true, two or three people you know may die.

Of course, all this is predicated on COVID-19 spreading as widely as the common flu. By following current guidelines and mandates for social distancing and sheltering in place, these numbers could be reduced.

+ Even this may not be true. Recent data out of Italy shows that many deaths may not have been counted at all,(7) falsely deflating reported mortality rates even further.
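
The arithmetic behind this section can be checked with a short sketch. The constant and function names are illustrative, the flu mortality uses the 0.1% upper bound quoted above, and the 10% attack rate for COVID-19 is the same "spreads like the flu" assumption made in the text, not a measured value.

```python
# Rough expected-deaths arithmetic for a circle of acquaintances.
# All inputs are the round figures quoted in the text above.
ACQUAINTANCES = 600        # people the average American knows (2)
FLU_ATTACK_RATE = 0.10     # share of the population with flu symptoms per year (5)
FLU_MORTALITY = 0.001      # upper bound: "below 0.1%"
COVID_MORTALITY_RATES = [0.01, 0.02, 0.03, 0.04]  # range cited in (3,4)


def expected_deaths(acquaintances, attack_rate, mortality_rate):
    """Expected deaths among acquaintances = people infected * mortality rate."""
    return acquaintances * attack_rate * mortality_rate


flu_deaths = expected_deaths(ACQUAINTANCES, FLU_ATTACK_RATE, FLU_MORTALITY)

# Assumption from the text: COVID-19 infects the same share of people as the flu.
covid_deaths = {
    rate: expected_deaths(ACQUAINTANCES, FLU_ATTACK_RATE, rate)
    for rate in COVID_MORTALITY_RATES
}

print(f"Acquaintances infected: {ACQUAINTANCES * FLU_ATTACK_RATE:.0f}")
print(f"Expected flu deaths: {flu_deaths:.2f}")
for rate, deaths in covid_deaths.items():
    print(f"COVID-19 at {rate:.0%} mortality: {deaths:.1f} expected deaths")
```

At 2% the sketch gives a little over one expected death, and at 3% to 4% roughly two, which is where the "at least one" and "two or three" figures come from.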

# Disclaimer

I am not an epidemiologist. I do not work in the medical field. I am a data analyst whose perspective is "numbers are numbers", but as we all know, context is key. Please take this data with a grain of salt. Note that the DDP metric below is completely made up by me after thinking through this data for about 10 minutes.

# COVID-19 Statistics

Data is sourced from Johns Hopkins CSSE and is available at https://github.com/CSSEGISandData/COVID-19.

In [4]:
covidinfdf = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv')
covidinfdf = covidinfdf.set_index(['Country/Region','Province/State'])
#covidinfdf = covidinfdf.drop('China',level=0)
covidinfdf = covidinfdf.drop(columns=['Lat','Long'])

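coviddeathdf = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv')
coviddeathdf = coviddeathdf.set_index(['Country/Region','Province/State'])
coviddeathdf = coviddeathdf.drop(columns=['Lat','Long'])

covidrecovdf = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv')
covidrecovdf = covidrecovdf.set_index(['Country/Region','Province/State'])
covidrecovdf = covidrecovdf.drop(columns=['Lat','Long'])
# Normalize any four-digit years in the date column labels to the two-digit form used by the other files
newcols = {}
for sub in covidrecovdf.columns:
    newcols[sub] = sub.replace('/2020','/20')
covidrecovdf = covidrecovdf.rename(columns=newcols)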


## Infections and Deaths Overview

In [5]:
covidtotdf = pd.DataFrame(data=None, columns=covidinfdf.columns)
covidtotdf.loc['TotalInfections'] = covidinfdf.sum(axis=0)
covidtotdf.loc['TotalDeaths'] = coviddeathdf.sum(axis=0)
covidtotdf.loc['TotalRecovered'] = covidrecovdf.sum(axis=0)

covidtotdf = covidtotdf.T
covidtotdf.index.name = 'Date'
covidtotdf = covidtotdf.reset_index()

covidtotdf = covidtotdf[-30:]
covidtotdf['Date'] = [datetime.datetime.strptime(d,'%m/%d/%y') for d in covidtotdf['Date']]

In [6]:
fig = generateGenericGraphDF('Total Infections, Deaths, and Recovered Worldwide',covidtotdf,['TotalInfections','TotalDeaths','TotalRecovered'],labels=['infections','deaths','recovered'])
show(fig)


Total COVID-19 infections, deaths, and recoveries worldwide.

In [7]:
fig = generateGenericGraphDF('Total Deaths Worldwide',covidtotdf,['TotalDeaths'],labels=['deaths'],ylabel='Deaths')
show(fig)


Total deaths caused by COVID-19 worldwide.

## World Epicenters

In [8]:
rollingDays = 3

# Column prefix -> country name as it appears in the CSSE data
epicenters = {'us':'US','china':'China','italy':'Italy','spain':'Spain'}

# One row per country and metric; columns are the date labels from the CSV
covidusdf = pd.DataFrame(data=None,columns=coviddeathdf.columns)
for prefix,country in epicenters.items():
    covidusdf.loc[prefix+'death'] = coviddeathdf.loc[[country]].sum(axis=0)
    covidusdf.loc[prefix+'inf'] = covidinfdf.loc[[country]].sum(axis=0)
    covidusdf.loc[prefix+'recov'] = covidrecovdf.loc[[country]].sum(axis=0)

covidusdf = covidusdf.T
covidusdf.index.name = 'Date'
covidusdf = covidusdf.reset_index()
covidusdf = covidusdf[-30:]
covidusdf['Date'] = [datetime.datetime.strptime(d,'%m/%d/%y') for d in covidusdf['Date']]
covidusdf = covidusdf.set_index('Date')
# Rows assigned through .loc leave object dtype; cast so the rolling means below work
covidusdf = covidusdf.astype(float)

for prefix in epicenters:
    # New deaths reported on each day
    covidusdf[prefix+'dailydeath'] = covidusdf[prefix+'death'] - covidusdf[prefix+'death'].shift()
    # Adding the day's deaths back means only previous deaths are subtracted
    covidusdf[prefix+'active'] = covidusdf[prefix+'inf'] - covidusdf[prefix+'recov'] - covidusdf[prefix+'death'] + covidusdf[prefix+'dailydeath']
    # DDP: rolling mean of daily deaths over rolling mean of active infections
    covidusdf[prefix+'deathprob'] = covidusdf[prefix+'dailydeath'].rolling(rollingDays).mean() / covidusdf[prefix+'active'].rolling(rollingDays).mean()

In [9]:
fig = generateGenericGraphDF('Infections in Epicenters',covidusdf,['usinf','chinainf','italyinf','spaininf'],labels=['infections in US','infections in China','infections in Italy','infections in Spain'],ylabel='Infections')
show(fig)


COVID-19 infections reported in epicenters.

In [10]:
fig = generateGenericGraphDF('Deaths in Epicenters',covidusdf,['usdeath','chinadeath','italydeath','spaindeath'],labels=['deaths in US','deaths in China','deaths in Italy','deaths in Spain'],ylabel='Deaths')
show(fig)


COVID-19 deaths reported in epicenters.

In [11]:
usinfdf = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_US.csv')
usinfdf = usinfdf.drop(columns=['UID','iso2','iso3','code3','FIPS','Country_Region','Lat','Long_','Combined_Key'])

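usdeathdf = pd.read_csv('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_US.csv')
usdeathdf = usdeathdf.drop(columns=['UID','iso2','iso3','code3','FIPS','Country_Region','Lat','Long_','Combined_Key'])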

statesinfdf = usinfdf.groupby('Province_State').sum()
statesinfdf = statesinfdf.T
statesinfdf.index.name = 'Date'
statesinfdf = statesinfdf.reset_index()
statesinfdf = statesinfdf[-30:]
statesinfdf['Date'] = [datetime.datetime.strptime(d,'%m/%d/%y') for d in statesinfdf['Date']]
statesinfdf = statesinfdf.set_index('Date')

statesdeathdf = usdeathdf.groupby('Province_State').sum()
statesdeathdf = statesdeathdf.T
statesdeathdf.index.name = 'Date'
statesdeathdf = statesdeathdf.reset_index()
statesdeathdf = statesdeathdf[-30:]
statesdeathdf['Date'] = [datetime.datetime.strptime(d,'%m/%d/%y') for d in statesdeathdf['Date']]
statesdeathdf = statesdeathdf.set_index('Date')


## Washington

In [12]:
fig = generateGenericGraphDF('Infections in Washington',statesinfdf,['Washington'],labels=['infections in WA'],ylabel='Infections')
fig2 = generateGenericGraphDF('Deaths in Washington',statesdeathdf,['Washington'],labels=['deaths in WA'],ylabel='Deaths')
fig.plot_width=300
fig2.plot_width=300
show(row(fig,fig2))


COVID-19 infections and deaths reported in Washington.

Because Washington was the first state in the US hit by a major outbreak, it is worth watching how the pandemic there plays out.

## Florida

In [13]:
fig = generateGenericGraphDF('Infections in Florida',statesinfdf,['Florida'],labels=['infections in FL'],ylabel='Infections')
fig2 = generateGenericGraphDF('Deaths in Florida',statesdeathdf,['Florida'],labels=['deaths in FL'],ylabel='Deaths')
fig.plot_width=300
fig2.plot_width=300
show(row(fig,fig2))


COVID-19 infections and deaths reported in Florida.

## Georgia

In [14]:
fig = generateGenericGraphDF('Infections in Georgia',statesinfdf,['Georgia'],labels=['infections in GA'],ylabel='Infections')
fig2 = generateGenericGraphDF('Deaths in Georgia',statesdeathdf,['Georgia'],labels=['deaths in GA'],ylabel='Deaths')
fig.plot_width=300
fig2.plot_width=300
show(row(fig,fig2))


COVID-19 infections and deaths reported in Georgia.

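## Alaska

In [15]:
fig = generateGenericGraphDF('Infections in Alaska',statesinfdf,['Alaska'],labels=['infections in AK'],ylabel='Infections')
fig2 = generateGenericGraphDF('Deaths in Alaska',statesdeathdf,['Alaska'],labels=['deaths in AK'],ylabel='Deaths')
fig.plot_width=300
fig2.plot_width=300
show(row(fig,fig2))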


COVID-19 infections and deaths reported in Alaska.

## Daily Death Probability

The actively infected total is calculated by subtracting total previous deaths and total recovered from total infections. Daily Death Probability (DDP) is the ratio of deaths on a particular day to that total. This metric may indicate quality of healthcare. Note that I am not an epidemiologist.
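
As a minimal sketch of that definition, the function below computes DDP from three cumulative series in plain Python. The function name and the sample figures are made up for illustration; the notebook itself computes the same ratio with pandas rolling means. Days without a full rolling window are returned as `None`.

```python
def daily_death_probability(infections, recovered, deaths, window=3):
    """DDP per day from cumulative totals; None where the window is incomplete."""
    n = len(deaths)
    # Day 0 has no previous day, so daily deaths and active cases are undefined there.
    daily = [None] + [deaths[i] - deaths[i - 1] for i in range(1, n)]
    # Active = total infections - total recovered - deaths up to the previous day.
    active = [None] + [infections[i] - recovered[i] - deaths[i - 1] for i in range(1, n)]

    ddp = []
    for i in range(n):
        start = i - window + 1
        if start < 1:  # the window would reach into the undefined day 0
            ddp.append(None)
            continue
        mean_daily = sum(daily[start:i + 1]) / window
        mean_active = sum(active[start:i + 1]) / window
        ddp.append(mean_daily / mean_active if mean_active else None)
    return ddp


# Made-up cumulative counts for five consecutive days.
sample_infections = [100, 120, 150, 200, 260]
sample_recovered = [10, 12, 15, 20, 30]
sample_deaths = [2, 4, 7, 11, 16]

sample_ddp = daily_death_probability(sample_infections, sample_recovered, sample_deaths)
print(sample_ddp)
```

Dividing the rolling mean of deaths by the rolling mean of active cases, rather than averaging the daily ratios, keeps a single day with few active cases from dominating the result.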

In [16]:
fig = generateGenericGraphDF('Daily Death Probability (DDP) in Epicenters',covidusdf,['usdeathprob','chinadeathprob','italydeathprob','spaindeathprob'],labels=['probability in US','probability in China','probability in Italy','probability in Spain'],ylabel='Probability')
show(fig)


Probability that any actively infected person will die on a given day, shown as a 3-day rolling average.

DDP smooths over time as both numbers of actively infected and deaths increase. Of note:

• Italy's and Spain's DDPs have recently started declining. This coincides with the recent transition from exponential to linear growth in deaths and may indicate "turning the corner" in those countries.
• The US DDP has begun a slow increase as infections ramp up and healthcare infrastructure feels the strain.

# COVID-19 Event Timeline

Below is a timeline of events related to COVID-19 that may have affected US consumer behavior.

In [17]:
dates = [
'2020-01-21',
'2020-01-29',
'2020-01-31',
'2020-02-02',
'2020-02-04',
'2020-02-24',
'2020-02-26',
'2020-02-29',
'2020-03-04',
'2020-03-06',
'2020-03-08',
'2020-03-11',
'2020-03-13',
'2020-03-16',
'2020-03-16',
'2020-03-25',
'2020-03-29',
]
names = [
'1st US infection',
'Task force', # assumed label; the original short name for this event is not shown in this export
'US travel restrictions',
'1st non-Chinese death',
'Diamond Princess',
'Dow drops 1000',
'1st untraceable US case',
'1st death in US',
'10 dead in WA',
'100k worldwide',
'500 US cases',
'Pandemic',
'Nat\'l emerg.',
'NYSE halted',
'Advisory',
'Stimulus',
'Extension',
]
descriptions = [
'First COVID-19 infection in the US',
'White House announces a dedicated task force',
'Travel restrictions for those entering the US who have recently traveled in China',
'First death of a COVID-19 victim outside of China',
'Diamond Princess quarantine reported by media',
'Dow Jones sheds 1000 points, beginning a five-day correction',
'First case in the US that could not be traced to an origin',
'First death of a COVID-19 victim in the US',
'Four more dead in Washington state, bringing total to ten in that state',
'Worldwide infections pass 100,000 mark',
'Over 500 infections in the US',
'WHO officially declares COVID-19 a pandemic',
'President Trump declares national emergency',
'NYSE temporarily halted after 2,725 point drop',
'DHS issues "no unnecessary travel" advisory',
'Congress agrees on $2 trillion stimulus bill',
'Trump extends distancing guidelines through April 30',
]
timelinedf = pd.DataFrame()
timelinedf['Date'] = dates
timelinedf['Event'] = names
timelinedf['Description'] = descriptions
timelinedf = timelinedf.set_index('Date')
fig,axis = getTimeline("COVID-19 Event Timeline",dates,names,interval=3)
display(fig)


Details on most recent events:

In [18]:
timelinedf = timelinedf.reset_index()
def prettyDateFormat(val):
    valDate = datetime.datetime.strptime(val,'%Y-%m-%d')
    return valDate.strftime('%b %d')
timelinedf['Date'] = timelinedf.apply(lambda x: prettyDateFormat(x['Date']), axis=1)
pd.set_option('max_colwidth',100)
display(timelinedf[-10:].style.hide_index())
pd.reset_option('max_colwidth')


| Date | Event | Description |
|---|---|---|
| Feb 29 | 1st death in US | First death of a COVID-19 victim in the US |
| Mar 04 | 10 dead in WA | Four more dead in Washington state, bringing total to ten in that state |
| Mar 06 | 100k worldwide | Worldwide infections pass 100,000 mark |
| Mar 08 | 500 US cases | Over 500 infections in the US |
| Mar 11 | Pandemic | WHO officially declares COVID-19 a pandemic |
| Mar 13 | Nat'l emerg. | President Trump declares national emergency |
| Mar 16 | NYSE halted | NYSE temporarily halted after 2,725 point drop |
| Mar 16 | Advisory | DHS issues "no unnecessary travel" advisory |
| Mar 25 | Stimulus | Congress agrees on $2 trillion stimulus bill |
| Mar 29 | Extension | Trump extends distancing guidelines through April 30 |

# References

1. "Mortality in the United States, 2018", CDC. (https://www.cdc.gov/nchs/products/databriefs/db355.htm)
2. "The Average American Knows How Many People?", NY Times. (https://www.nytimes.com/2013/02/19/science/the-average-american-knows-how-many-people.html)
3. "The WHO Estimated COVID-19 Mortality at 3.4%. That Doesn't Tell the Whole Story", Time. (https://time.com/5798168/coronavirus-mortality-rate/)
4. "Coronavirus disease 2019 (COVID-19) Situation Report – 46", WHO. (https://www.who.int/docs/default-source/coronaviruse/situation-reports/20200306-sitrep-46-covid-19.pdf?sfvrsn=96b04adf_2)
5. "Disease Burden of Influenza", CDC. (https://www.cdc.gov/flu/about/burden/index.html)
6. "CDC Director On Models For The Months To Come: 'This Virus Is Going To Be With Us'", NPR. (https://www.npr.org/sections/health-shots/2020/03/31/824155179/cdc-director-on-models-for-the-months-to-come-this-virus-is-going-to-be-with-us)
7. "Italy's Coronavirus Death Toll Is Far Higher Than Reported", MSN. (https://www.msn.com/en-us/news/world/italys-coronavirus-death-toll-is-far-higher-than-reported/ar-BB122vvc)