Note:If any of the reports below look funny, check the Pipelines Status Dashboard.
COVID-19
I've developed a handful of resources on the COVID-19 pandemic:
-
Pandemic Overview focuses on infections and deaths worldwide and in selected epicenters.
An automated Airflow process extracts Johns Hopkins data from GitHub each day, transforms it into daily row data, and loads it into my database. It then generates an HTML report using Jupyter Lab.
-
Anomalies Analysis report highlights data of note using Verticals.
This report is also based on the source data from above process. Data is then analyzed by Verticals to identify data points from previous day that are anomalous when compared to the days before. These are then addressed in an HTML report generated by Jupyter Lab.
-
US Dashboard provides overview and drill-down views for total and daily infections and deaths in the US.
This dashboard is also based on the source data from above. An additional Airflow process calculates daily deltas for the time series data and uploads to Google Sheets using the Sheets API for use by the dashboard.
The Stack
My personal data processing, analysis, and reporting stack includes the following:
- Apache Airflow and Python for ETL task automation.
- MySQL, primarily, for data storage.
- Verticals, an anomaly detection system for time series data using machine learning methods, developed by me.
- Jupyter Lab for static HTML reports.
- Integration with Google Sheets and Drive APIs as alternative and convenient data storage solution.
- Google Data Studio for interactive dashboards.
Stack Status
The
Pipelines Status Dashboard is automatically updated every 10 minutes with the latest status.