
Air Quality Dashboard

Date post: 22-Feb-2017
Upload: puspal-hore
Page 1: Air Quality Dashboard

CleanAir: Air Quality Dashboard

Puspal Hore

Insight Data Engineering Fellow

Page 2: Air Quality Dashboard

Air Pollution

• Air Pollution is a Health Hazard

• Many Types of Pollutants

– Ozone

– Particulate matter (pm2.5, pm10)

– NO2

– SO2

– CO

• Sensitive Groups: Children, Elderly, Patients with Lung and Heart Disease etc.

Page 3: Air Quality Dashboard

Data Source: EPA

• EPA tracks air quality

• Monitoring stations across the country

• Hourly, daily data feed available as flat files with delay (only data up to 2015 is available at present)

Page 4: Air Quality Dashboard

AQI

• Air Quality Index

• Composite Index based on

–Ozone, pm2.5, pm10, NO2, SO2, CO
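A minimal sketch of how a composite index like this is usually computed: each pollutant concentration is mapped to a sub-index by linear interpolation between breakpoints, and the reported AQI is the highest sub-index. The breakpoint rows below are illustrative values for 24-hour pm2.5 and should be checked against the current EPA table; the names `PM25_BREAKPOINTS`, `sub_index` and `composite_aqi` are hypothetical and do not come from the slides.

```python
from typing import Dict, List, Optional, Tuple

# Rows of (C_lo, C_hi, I_lo, I_hi) for 24-hour pm2.5 in micrograms per cubic metre.
# Illustrative values only (an assumption): verify against the current EPA table.
PM25_BREAKPOINTS: List[Tuple[float, float, int, int]] = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def sub_index(conc: float,
              table: List[Tuple[float, float, int, int]]) -> Optional[int]:
    """Interpolate linearly inside the breakpoint row that contains conc.

    conc is assumed to be already truncated to the table's precision
    (one decimal place for pm2.5), so it never falls between two rows.
    """
    for c_lo, c_hi, i_lo, i_hi in table:
        if c_lo <= conc <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo)
    return None  # concentration is outside the table


def composite_aqi(sub_indices: Dict[str, Optional[int]]) -> Optional[int]:
    """The composite AQI is the largest available pollutant sub-index."""
    values = [v for v in sub_indices.values() if v is not None]
    return max(values) if values else None
```

Each of the other pollutants (ozone, pm10, NO2, SO2, CO) would need its own breakpoint table and averaging period before being passed to `composite_aqi`.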

Page 5: Air Quality Dashboard

Data

• Location Data : State, County, Site

• Time Data : Date, Hour, Minute

• Value : AQI, individual pollutant concentration
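One way to picture the three groups of fields above is as a single record type. This is only a sketch: the class name `Reading`, the row keys and the date format are assumptions for illustration, not the layout of the EPA files.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Mapping


@dataclass(frozen=True)
class Reading:
    # Location data
    state: str
    county: str
    site: str
    # Time data (date, hour, minute)
    observed_at: datetime
    # Value: either "AQI" or the name of an individual pollutant
    parameter: str
    value: float


def parse_row(row: Mapping[str, str]) -> Reading:
    """Build a Reading from one flat-file row (hypothetical column names)."""
    observed_at = datetime.strptime(
        f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M"
    )
    return Reading(
        state=row["state"],
        county=row["county"],
        site=row["site"],
        observed_at=observed_at,
        parameter=row["parameter"],
        value=float(row["value"]),
    )
```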

Page 6: Air Quality Dashboard

Data Volume

• Daily data : 52 MB/ year for each pollutant

• Hourly data : 1.3 – 3 GB / year / pollutant

• 2 – 6 million rows

Page 7: Air Quality Dashboard

Data Pipeline

Simulated Air Pollution Data

Demo: https://youtu.be/6ILrpyf8zPQ
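The slide does not say how the simulated data was produced. As one possible sketch, the generator below emits hourly readings as JSON strings using a clamped random walk; the function name, message fields and parameters are all assumptions. The versions slide lists Kafka, so messages like these were presumably published to a Kafka topic, but that step is omitted here.

```python
import json
import random
from datetime import datetime, timedelta
from typing import Iterator


def simulate_readings(site: str, parameter: str, start: datetime,
                      steps: int, seed: int = 0,
                      base: float = 35.0) -> Iterator[str]:
    """Yield one JSON message per simulated hour.

    The value follows a Gaussian random walk starting near `base`
    and is clamped at zero, since concentrations cannot be negative.
    """
    rng = random.Random(seed)  # seeded so that runs are reproducible
    value = base
    for i in range(steps):
        value = max(0.0, value + rng.gauss(0.0, 2.0))
        yield json.dumps({
            "site": site,
            "parameter": parameter,
            "time": (start + timedelta(hours=i)).isoformat(),
            "value": round(value, 1),
        })
```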

Page 8: Air Quality Dashboard

Challenges: Forecasting

• Time series forecasting support in Python is poor compared to R

• SparkR vs Python / rpy2

• Python / statsmodels.tsa

• Cloudera Spark-TS
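When comparing the options listed above, a dependency-free baseline can be useful. The sketch below is simple exponential smoothing in plain Python. It is not the method used in the project and not part of any library named on this slide; the function name and defaults are hypothetical.

```python
from typing import List, Sequence


def ses_forecast(series: Sequence[float], alpha: float = 0.5,
                 horizon: int = 1) -> List[float]:
    """Simple exponential smoothing with a flat forecast.

    The level starts at the first observation and is updated as
    level = alpha * x + (1 - alpha) * level for every later point.
    Every step of the forecast horizon repeats the final level.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = float(series[0])
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return [level] * horizon
```

A baseline like this ignores the daily and seasonal cycles in air quality data, which is where the dedicated time series libraries would be expected to do better.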

Page 9: Air Quality Dashboard

About Me

• Electrical Engineer, IIT, India

• DBA, Systems Engineer

• MD, Rutgers NJMS

• Books, Travel, Photography

Page 10: Air Quality Dashboard

InfluxDB

• Time Series DB with clustering

• CRud: built mainly for create and read; update and delete are limited

• SQL like query language, HTTP, JSON

• Continuous Queries and Downsampling

• Built-in data retention policy

Page 11: Air Quality Dashboard

Versions…

• Kafka v0.8.2.2 with Scala 2.10

• Zookeeper v3.4.6

• Spark v1.5.2 with Hadoop v2.4+

• Hadoop v2.7.1

• InfluxDB 0.10.0

• Grafana 2.6.0

