Date post: | 12-Apr-2017 |
Category: | Data & Analytics |
Upload: | dhaval-bhatt |
['flu','headache','bodyache','sinusinfection','tonsillitis','tonsil','pneumonia','influenza','strep','fever','migraine','colds','cold','bronchitis','sniffly','sniffling','sinuses','sinus','runnynose','throat','allergy','allergies', 'allergyseason','food','pollen','allergyproblems','allergens','medication']
Architecture Detailed Design
Data Processing & Analysis
Data Analysis – Use Case
Sample Use Case
Location: New York
Date: 06/28/2015
# Matched Tweets: 2000
Matched Tweets: Tweets that contain at least one of the keywords listed below
Flu Cold Allergy
flu sinusinfection allergy
bodyache tonsillitis allergies
headache tonsil allergyseason
pneumonia migraine pollen
influenza colds allergyproblems
Fever sniffly allergens
strep cough
bronchitis sniffly
throat sniffling
infection sinus
runnynose
Keywords
Raw Data
Tweets
Weather
Allergy
EMR 1 – Tweets Processing
1, 2, 3, ..., n
n = Number of DISTINCT combinations of keywords
EMR Reduce Algorithm
For each tweet
• Identify the keywords present in the tweet text
• If there is more than 1 keyword in the tweet text: Keyword = keyword 1 + ‘+’ + keyword 2
• Get the cumulative count for each keyword
Location: New York
Date: 06/28/2015
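The reduce steps above can be sketched in a few lines of Python. This is only an illustration, not the deck's actual EMR job: the function names, the word-level matching, the shortened keyword set, and the sample tweets are all assumptions. Sorting the matched keywords before joining them is also an assumption, made so that the same combination always produces the same key.

```python
import re
from collections import Counter

# Shortened subset of the keyword list from the slides (for illustration only).
KEYWORDS = {"flu", "fever", "headache", "allergies", "pollen", "food", "throat"}

def combined_keyword(text):
    """Return the '+'-joined keywords found in a tweet, or None if none match."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Sorting is an assumption (not from the slides): it gives one stable key
    # per distinct combination of keywords.
    found = sorted(words & KEYWORDS)
    return "+".join(found) if found else None

def keyword_counts(tweets):
    """Cumulative count per (combined) keyword for one location and date."""
    counts = Counter()
    for text in tweets:
        key = combined_keyword(text)
        if key is not None:
            counts[key] += 1
    return counts

# Hypothetical sample tweets, not data from the deck.
tweets = [
    "Down with the flu and a fever again",
    "Pollen is terrible today",
    "Fever and flu, staying home",
    "Nice weather in New York",
]
print(keyword_counts(tweets))
```

With these sample tweets, the two flu-and-fever tweets share the key "fever+flu", and the tweet with no keyword is skipped.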
Data Pipeline Processes
A bucket is created for each combination of
• Location
• Tweet date
• Keyword
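As a rough sketch of the bucketing step, each bucket can be modeled as a dictionary entry keyed by the (location, tweet date, keyword) triple. The function name and the sample records below are hypothetical and serve only to show the shape of the key.

```python
from collections import defaultdict

def bucket_counts(records):
    """Count tweets per (location, tweet_date, keyword) bucket."""
    buckets = defaultdict(int)
    for location, tweet_date, keyword in records:
        buckets[(location, tweet_date, keyword)] += 1
    return dict(buckets)

# Illustrative records (location, tweet date, keyword); not data from the deck.
records = [
    ("New York", "06/28/2015", "fever+flu"),
    ("New York", "06/28/2015", "fever+flu"),
    ("New York", "06/28/2015", "pollen"),
    ("New York", "06/29/2015", "pollen"),
]
buckets = bucket_counts(records)
for key, count in buckets.items():
    print(key, count)
```

Keying on the full triple keeps counts for the same keyword on different dates or in different locations separate.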
For each bucket
Syndrome Count Process
Flu Category Cold Category Allergy Category
Preference: Flu > Allergy > Cold
Noise Category
Examples of keywords
• “throat”
• “food + allergies”
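One way to picture the syndrome count process is the sketch below. The slides give the preference order (Flu > Allergy > Cold) and two noise examples, but not the full rules, so the category keyword sets, the noise set, and the rule that any noise keyword sends the whole bucket to Noise are assumptions made for illustration.

```python
from collections import Counter

# Preference order from the slides: Flu > Allergy > Cold.
PREFERENCE = ["Flu", "Allergy", "Cold"]

# Assumed, shortened keyword-to-category mapping (illustration only).
CATEGORY_KEYWORDS = {
    "Flu": {"flu", "influenza", "fever", "bodyache"},
    "Allergy": {"allergy", "allergies", "pollen", "allergens"},
    "Cold": {"colds", "sniffly", "sniffling"},
}

# Assumed noise keywords, chosen to reproduce the two noise examples.
NOISE_KEYWORDS = {"throat", "food"}

def categorize(bucket_keyword):
    """Map a (possibly combined) keyword to Flu, Allergy, Cold, or Noise."""
    parts = [p.strip() for p in bucket_keyword.split("+")]
    if any(p in NOISE_KEYWORDS for p in parts):
        return "Noise"
    for category in PREFERENCE:  # first match wins, so Flu beats Allergy beats Cold
        if any(p in CATEGORY_KEYWORDS[category] for p in parts):
            return category
    return "Noise"

def syndrome_counts(buckets):
    """Sum bucket counts per (location, tweet_date, category)."""
    totals = Counter()
    for (location, tweet_date, keyword), count in buckets.items():
        totals[(location, tweet_date, categorize(keyword))] += count
    return totals

print(categorize("flu+pollen"))        # Flu
print(categorize("pollen+colds"))      # Allergy
print(categorize("food + allergies"))  # Noise
```

Walking the preference list in order is what resolves a combined keyword that matches more than one category.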
Data for July 29 is missing, so ignore the dip on that date