GitHub - mezcalhead/covid-19-tools: COVID-19 Tools

COVID-19 Tools Project

The purpose of this site and code is to provide data analysis tools and scripts for the JHU CSSE COVID-19 feed. The code is NumPy/SciPy friendly, so the data is ready for performant analysis. We credit Johns Hopkins for the data that makes this project possible. Other data sets may be added later.

The code directory has the core python code, classes, and utilities.
The data directory is where output files are placed, as well as reference files.
Large data files are here: https://www.dropbox.com/sh/u5cy9lb3mst7bta/AAB6jHGFCrWzsawkHBtlWSKna?dl=0
See README.md in the data directory for more information about the data files.

Two lines of code ingest the JHU data and perform some basic cleanup:

import covid_tools as ct
world = ct.ingestData('some-path-to-JHU-CSSE-dir')

Or take advantage of our nightly serialized world object, so your data initializes blazingly fast. With our serialized cache, you don't need to sync directly with JHU's GitHub repository:

import covid_tools as ct
world = ct.fetchWorld()
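
The README does not say how the nightly cache is serialized. As a rough illustration of why a serialized object loads faster than re-parsing the CSV files, here is a minimal sketch that assumes Python's pickle module. The names world_stub, save_cache, and load_cache are hypothetical and are not part of covid_tools, and the values are made up.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical stand-in for the parsed data (made-up values);
# the real world object is richer than this.
world_stub = {"dates": ["day-1", "day-2"], "US": {"CONFIRMED": [1, 2]}}

def save_cache(obj, cache_path):
    """Serialize obj to cache_path so later runs can skip parsing."""
    with open(cache_path, "wb") as fh:
        pickle.dump(obj, fh, protocol=pickle.HIGHEST_PROTOCOL)

def load_cache(cache_path):
    """Return the cached object, or None when no cache file exists yet."""
    cache_path = Path(cache_path)
    if not cache_path.is_file():
        return None
    with open(cache_path, "rb") as fh:
        return pickle.load(fh)

cache_file = Path(tempfile.gettempdir()) / "world_stub.pickle"
save_cache(world_stub, cache_file)
restored = load_cache(cache_file)
```

If a cache like this is downloaded rather than built locally, keep in mind that pickle files should only be loaded from sources you trust.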

Below is a simple code example that reports the US confirmed cases and deaths, iterates over all the countries, then iterates over all the counties in VA. All geographic hierarchies descend from the base class 'Area' in covid-structures.py:

import covid_tools as ct
from datetime import datetime
import os
from os import listdir, path
from os.path import isfile, join

# this populates the world hierarchy
basepath = path.abspath(path.join(path.dirname(__file__), '../../COVID-19/csse_covid_19_data/csse_covid_19_time_series/'))
world = ct.ingestData(basepath)

# get the US
c = world.getArea('US')
data = c.getData('CONFIRMED')
print('US TOTAL CASES:\n', data)
data = c.getData('DEATHS')
print('US TOTAL DEATHS:\n', data)

# iterate over all countries
for country in world.areas():
  print(country.name())

# iterate over VA counties (looked up through the US, not the last country iterated)
for area in world.getArea('US').getArea('Virginia').areas():
  data = area.getData('DEATHS')
  print(area.name() + ' has ' + str(data[-1]) + ' deaths as of ' + area.world.getDates()[-1].strftime('%m/%d/%Y') + '...')
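
To make the nested lookups above easier to picture, here is a minimal sketch of a tree of areas. It is not the project's Area class: only the method names name, getArea, areas, and getData come from the example above, while addArea, setData, and the internal attributes are hypothetical helpers added for illustration. The data values are made up.

```python
class Area:
    """Hypothetical stand-in for the project's Area base class."""

    def __init__(self, name, parent=None):
        self._name = name
        self.parent = parent
        self._children = {}  # child name -> Area
        self._data = {}      # label such as 'DEATHS' -> list of values

    def name(self):
        return self._name

    def addArea(self, name):
        # Hypothetical helper: create the child if needed, then return it.
        if name not in self._children:
            self._children[name] = Area(name, parent=self)
        return self._children[name]

    def getArea(self, name):
        return self._children[name]

    def areas(self):
        return list(self._children.values())

    def setData(self, label, values):
        # Hypothetical helper: attach a time series to this area.
        self._data[label] = list(values)

    def getData(self, label):
        return self._data[label]


world = Area("World")
virginia = world.addArea("US").addArea("Virginia")
virginia.addArea("Fairfax").setData("DEATHS", [0, 1, 2])  # made-up values

# Same access pattern as the README example: country -> state -> counties.
for county in world.getArea("US").getArea("Virginia").areas():
    print(county.name(), county.getData("DEATHS")[-1])
```

Because every level is the same kind of node, the same calls work whether the area is a country, a state, or a county.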

There are also some basic plotting functions, such as simplePlot. Just give it an area (a county, state, nation, etc.):

ct.simplePlot(area, 'some title', filename, 20, xaxis = 'Days')

Here's a loop to plot several US states and countries in a single multiPlot trend. We use a dict ({}) to store the mix of areas we identified, then pass it to two multiPlot calls, one for confirmed cases and one for deaths. Each area's curve starts once that area crosses a threshold of 10 cases or 10 deaths.

Here's the code snippet that produces those plots:

# timer is assumed to be timeit's default_timer; the original snippet uses timer() and start without defining them
from timeit import default_timer as timer
start = timer()

# plot individual areas
areas = {} # maps area name -> area
v_thresh = 10 # threshold for starting particular plots

for key in ['US', 'Italy', 'Germany', 'United Kingdom']:
  area = world.getArea(key)
  areas[area.name()] = area
for key in ['New York', 'New Jersey', 'Michigan', 'Louisiana']:
  area = world.getArea('US').getArea(key)
  areas[area.name()] = area

filename = path.abspath(path.join(basepath, 'multiplot_mix_c.png'))
ct.multiPlot(areas, 'CONFIRMED', 'Confirmed', filename, v_thresh, \
  xaxis='Days (since ' + str(v_thresh) + '+ cases) thru ' + area.world.getDates()[-1].strftime('%m/%d/%Y'), overlay=['avg'])
filename = path.abspath(path.join(basepath, 'multiplot_mix_d.png'))
ct.multiPlot(areas, 'DEATHS', 'Deaths', filename, v_thresh, \
  xaxis='Days (since ' + str(v_thresh) + '+ deaths) thru ' + area.world.getDates()[-1].strftime('%m/%d/%Y'), overlay=['avg'])

print('\nDone.')
duration = timer() - start
print('Execution Time: {:0.2f}s'.format(duration))
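
The threshold argument is what lets areas with outbreaks that began on different dates share one "days since" axis. The sketch below shows one way such an alignment could work. It is not taken from multiPlot: the function name align_to_threshold is hypothetical, the counts are made up, and treating "crossed" as greater than or equal to the threshold is an assumption.

```python
def align_to_threshold(series, threshold):
    """Return the part of a cumulative series starting on the first day
    its value reaches threshold, or an empty list if it never does."""
    for day, value in enumerate(series):
        if value >= threshold:
            return list(series[day:])
    return []

# Made-up cumulative counts for two areas whose outbreaks start on different days.
early = [2, 9, 15, 40, 90]
late = [0, 0, 3, 12, 30]

print(align_to_threshold(early, 10))  # [15, 40, 90]
print(align_to_threshold(late, 10))   # [12, 30]
```

After alignment, index 0 of each result means "first day at or above the threshold", so the curves can be drawn against the same x-axis.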

Here is a second code snippet. It plots every area with more than 1000 confirmed cases, then draws multiPlot trends for the top 30 of them:

# plot individual areas
areas = {} # maps area name -> area
sort_set = {} # maps area name -> latest confirmed count
v_thresh = 20 # threshold for starting particular plots

for area in world.areas():
  # delete if exists - not necessary but can be useful
  filename = path.abspath(path.join(basepath, area.a['name'].replace(' ','_').replace(',','') + '.png'))
  if os.path.isfile(filename):
    os.remove(filename)
  # keep only the more affected areas
  if max(area.getData('CONFIRMED')) > 1000:
    areas[area.name()] = area
    sort_set[area.name()] = int(area.getData('CONFIRMED')[-1]) # last value
    xaxis_label = 'Days (since ' + str(v_thresh) + '+ occurrences) thru ' + area.world.getDates()[-1].strftime('%m/%d/%Y')
    print('Plotting ' + area.name() + '...')
    ct.simplePlot(area, area.a['name'], filename, v_thresh, xaxis = xaxis_label)

# work with the top 30 subset (the slice stops at 30; the original loop let 31 through)
print('++++++++++++++++++++++++++++++++++++++++++++')
bag = {}
for k, v in sorted(sort_set.items(), key = lambda kv:(kv[1], kv[0]), reverse = True)[:30]:
  print(k, v)
  bag[k] = areas[k]

# plot top 30 subset
print('++++++++++++++++++++++++++++++++++++++++++++')
filename = path.abspath(path.join(basepath, 'multiplot_g_c.png'))
ct.multiPlot(bag, 'CONFIRMED', 'Confirmed', filename, v_thresh, \
  xaxis='Days (since ' + str(v_thresh) + '+ cases) thru ' + area.world.getDates()[-1].strftime('%m/%d/%Y'), in_h = 8)
filename = path.abspath(path.join(basepath, 'multiplot_g_d.png'))
ct.multiPlot(bag, 'DEATHS', 'Deaths', filename, v_thresh, \
  xaxis='Days (since ' + str(v_thresh) + '+ deaths) thru ' + area.world.getDates()[-1].strftime('%m/%d/%Y'), in_h = 10, in_w = 6.5)
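
The top-30 selection above sorts (count, name) pairs in descending order and keeps the leading entries. Here is that ranking step in isolation as a small sketch. The function name top_n and the sample counts are made up for illustration and are not part of covid_tools.

```python
def top_n(latest_counts, n):
    """Return the n (name, count) pairs with the largest counts.
    Ties on count are broken by name, in descending order."""
    ranked = sorted(latest_counts.items(),
                    key=lambda kv: (kv[1], kv[0]),
                    reverse=True)
    return ranked[:n]

# Made-up latest counts keyed by placeholder area names.
latest = {"A": 500, "B": 1200, "C": 1200, "D": 50}

print(top_n(latest, 2))  # [('C', 1200), ('B', 1200)]
```

Slicing the sorted list avoids a manual counter, which is where off-by-one mistakes in the cutoff tend to appear.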

This project has been generously sponsored, in part, by Culmen International (www.culmen.com).
