OpenFDA Queries
https://api.fda.gov/drug/event.json?
search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+anti-inflammatory+drug"
&count=patient.reaction.reactionmeddrapt.exact
Endpoint: https://api.fda.gov/drug/event.json
search: find records where openfda.pharm_class_epc (pharmacologic class) contains "nonsteroidal anti-inflammatory drug".
count: tally the values of the field patient.reaction.reactionmeddrapt (patient reactions).
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22nonsteroidal+anti-inflammatory+drug%22&count=patient.reaction.reactionmeddrapt.exact
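The query above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_count_query` is hypothetical, and no request is sent, so the code only shows how the search and count parameters are encoded.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_count_query(search_field, phrase, count_field):
    """Build an openFDA URL that searches for an exact phrase and counts a field.

    Hypothetical helper for illustration; it only builds the URL string.
    """
    # Wrap the phrase in double quotes so it is matched as a phrase.
    params = {"search": f'{search_field}:"{phrase}"', "count": count_field}
    # safe=":" keeps the field:value separator readable; spaces become "+"
    # and the double quotes become %22.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


url = build_count_query(
    "patient.drug.openfda.pharm_class_epc",
    "nonsteroidal anti-inflammatory drug",
    "patient.reaction.reactionmeddrapt.exact",
)
print(url)
```

The same helper could be pointed at the other pharmacologic class fields (pharm_class_moa, pharm_class_pe, pharm_class_cs) by changing the first argument.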
Important OpenFDA data types
What the drug is supposed to fix: Established Pharmacologic Class (EPC) - pharm_class_epc
How the drug works: Mechanism of Action (MOA) - pharm_class_moa
What the drug affects: Physiologic Effect (PE) - pharm_class_pe
What is in the drug: Chemical Structure (CS) - pharm_class_cs
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22Serotonin+and+Norepinephrine+Reuptake+Inhibitor%22
Safety Report ID
Biographical Data
Adverse Reactions
Drug Information
More OpenFDA data types
How serious is the reaction: serious (1 for Yes, 2 for No)
• "serious": "1",
• "seriousnesscongenitalanomali": "1",
• "seriousnessdeath": "1",
• "seriousnessdisabling": "1",
• "seriousnesshospitalization": "1",
• "seriousnesslifethreatening": "1",
• "seriousnessother": "1"
What is the drug indicated for: drugindication
Circumstances for taking drug: patient.drug.drugadditional
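As a rough illustration of how the seriousness fields might be read from a safety report, here is a small sketch. The `sample_report` dictionary is made up for the example and is not a real record; the helper names are hypothetical.

```python
# Seriousness fields as listed on the slide.
SERIOUSNESS_FIELDS = [
    "seriousnesscongenitalanomali",
    "seriousnessdeath",
    "seriousnessdisabling",
    "seriousnesshospitalization",
    "seriousnesslifethreatening",
    "seriousnessother",
]


def is_serious(report):
    """True when the report's "serious" field is "1" (Yes)."""
    return report.get("serious") == "1"


def seriousness_reasons(report):
    """Return the seriousness fields that are set to "1" in the report."""
    return [name for name in SERIOUSNESS_FIELDS if report.get(name) == "1"]


# Made-up report for illustration only.
sample_report = {
    "serious": "1",
    "seriousnesshospitalization": "1",
    "seriousnessother": "1",
}

print(is_serious(sample_report))
print(seriousness_reasons(sample_report))
```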
Predictions on OpenFDA Data
Hierarchical Clustering (“unsupervised learning”) on Manufacturers by Drug Class and Adverse Events
Generates insights and further questions to explore, such as: Do some adverse events dominate all others? What is the role of retail distributors, as opposed to manufacturers: is it an artifact of the data, or something they do between themselves and the patient?
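To make the clustering step concrete, here is a minimal sketch of agglomerative (bottom-up) hierarchical clustering in plain Python. The slides do not state which linkage or distance was used, so average linkage and Euclidean distance are assumptions, and the manufacturer names and counts are invented toy values, not openFDA results.

```python
from itertools import combinations
from math import dist


def agglomerate(points, n_clusters):
    """Average-linkage agglomerative clustering.

    points: dict mapping a label to a numeric feature tuple.
    Returns a list of sets of labels, one set per cluster.
    """
    clusters = [{name} for name in points]

    def linkage(a, b):
        # Average Euclidean distance over all cross-cluster pairs.
        pairs = [(p, q) for p in a for q in b]
        return sum(dist(points[p], points[q]) for p, q in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
    return clusters


# Toy data: (adverse events for class 1, adverse events for class 2).
toy_counts = {
    "maker_a": (950, 40),
    "maker_b": (900, 55),
    "maker_c": (120, 10),
    "maker_d": (100, 12),
    "maker_e": (110, 8),
}

groups = agglomerate(toy_counts, n_clusters=2)
print(sorted(sorted(g) for g in groups))
```

With real data, each manufacturer would have one count per drug class or per adverse event, and the number of groups would be chosen by inspecting the merge order.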
Manufacturers by All Drug Classes
Group distinguished by an abnormally large number of adverse events for the products they make; includes Mylan and Teva
Group troubling in the large number of adverse events for the products they make; includes AbbVie and Pfizer
Group above average in the number of product adverse events; includes private-label companies CVS, Kroger, Wal-Mart, and Publix
Other manufacturers, not troubling in the number of adverse events
Manufacturers by All Adverse Events
Other manufacturers not troubling in the number of adverse events
Group of one (Mylan), highly distinguished by an abnormally large number of adverse events for the products it makes
Group troubling in the large number of adverse events for the products they make; includes Teva and the grocery chain Kroger
Group above average in the number of product adverse events; includes big pharma maker Merck
Conditional Probability Models (Bayes) Very Helpful for Predictions
Model Type               % Correct on Age   % Correct on Gender
Random Forest            48%                55%
Support Vector Machine   48%                55%
Decision Trees           14%                9%
Naïve Bayes              64%                78%
Why is Bayes So Much Better?
Works on Conditional Probability
Utilizes Much More of What We Already Know
Probability of Age 18to34 | Rating (drug); % Age 18to34 (drug)
Bayes is Conditional Probability
Intuition is “What are the chances of X, given that I know Y?”
This will always be better than flipping a coin – as in the case of gender prediction
The probability of Female (F) for any given Drug (T) equals the probability of the Drug given Female, times the probability of being Female, divided by the probability of the Drug: P(F | T) = P(T | F) × P(F) / P(T).
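A short worked example of this rule follows. The counts are invented for illustration (1,000 reports, 600 from female patients, 200 mentioning drug T, 150 of those from female patients) and do not come from openFDA.

```python
def posterior(p_t_given_f, p_f, p_t):
    """Bayes' rule: P(F | T) = P(T | F) * P(F) / P(T)."""
    return p_t_given_f * p_f / p_t


# Invented counts, for illustration only.
total_reports = 1000
female_reports = 600
drug_t_reports = 200
female_and_drug_t = 150

p_f = female_reports / total_reports                # P(F)
p_t = drug_t_reports / total_reports                # P(T)
p_t_given_f = female_and_drug_t / female_reports    # P(T | F)

p_f_given_t = posterior(p_t_given_f, p_f, p_t)
print(round(p_f_given_t, 3))
```

The result matches the direct ratio of female reports among the drug T reports (150 / 200), which is a quick sanity check on the formula.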
Bayes Results for Single Person Households
                           ACCURACY                            WEIGHTED ACCURACY
Genre                      Gender  Age     Size        Weight  Gender  Age
ADVENTURE                  75.4%   62.0%   16,565      1.001   75.5%   62.1%
AUDIENCE PARTICIPATION     84.1%   78.8%   46,283      1.003   84.4%   79.0%
AWARD CEREMONIES           60.4%   42.6%   655         1.000   60.4%   42.6%
CHILD - LIVE               78.6%   67.7%   4,868       1.000   78.6%   67.7%
CHILD DAY - ANIMATION      74.7%   59.3%   3,487       1.000   74.7%   59.4%
CHILD MULTI-WEEKLY         81.6%   73.2%   1,916,697   1.144   93.3%   83.8%
CHILDREN'S NEWS            76.0%   33.3%   300         1.000   76.0%   33.3%
COMEDY VARIETY             76.7%   68.9%   326,770     1.025   78.6%   70.6%
CONCERT MUSIC              67.8%   54.6%   2,822       1.000   67.9%   54.6%
CONVERSATIONS, COLLOQUIES  76.8%   63.3%   113,290     1.009   77.5%   63.9%
DAYTIME DRAMA              81.1%   62.5%   20,478      1.002   81.2%   62.6%
DEVOTIONAL                 64.0%   47.8%   1,344       1.000   64.0%   47.8%
EVENING ANIMATION          80.7%   76.7%   481,722     1.036   83.6%   79.5%
FEATURE FILM               74.5%   62.7%   449,549     1.034   77.0%   64.8%
FORMAT VARIES              76.6%   56.0%   1,127       1.000   76.6%   56.0%
GENERAL DOCUMENTARY        74.6%   63.9%   2,004,256   1.150   85.8%   73.5%
GENERAL DRAMA              75.0%   63.6%   1,949,243   1.146   86.0%   72.9%
GENERAL VARIETY            73.4%   62.1%   377,859     1.028   75.5%   63.8%
INSTRUCTION, ADVICE        79.1%   67.2%   1,000,586   1.075   85.0%   72.2%
NEWS                       77.8%   65.4%   971,951     1.073   83.5%   70.1%
NEWS DOCUMENTARY           77.5%   63.2%   100,634     1.008   78.1%   63.7%
OFFICIAL POLICE            46.6%   29.2%   1,009       1.000   46.6%   29.2%
PARTICIPATION VARIETY      75.3%   62.3%   174,900     1.013   76.3%   63.1%
POPULAR MUSIC              77.0%   67.5%   458,606     1.034   79.6%   69.8%
POPULAR MUSIC STANDARD     69.0%   50.5%   2,335       1.000   69.0%   50.5%
PRIVATE DETECTIVE          71.5%   71.5%   20,522      1.002   71.6%   71.7%
QUIZ GIVE AWAY             79.1%   68.7%   76,822      1.006   79.5%   69.1%
QUIZ PANEL                 79.8%   63.4%   1,700       1.000   79.8%   63.4%
SCIENCE FICTION            76.1%   65.3%   24,219      1.002   76.2%   65.4%
SITUATION COMEDY           75.4%   61.3%   1,124,687   1.084   81.8%   66.5%
SPORTS ANTHOLOGY           83.8%   64.8%   52,166      1.004   84.1%   65.0%
SPORTS COMMENTARY          79.0%   68.7%   993,734     1.075   84.9%   73.9%
SPORTS EVENT               75.0%   62.2%   204,127     1.015   76.2%   63.1%
SPORTS NEWS                81.1%   68.3%   15,275      1.001   81.2%   68.4%
SUSPENSE/MYSTERY           81.3%   70.9%   342,405     1.026   83.4%   72.7%
UNCLASSIFIED               77.8%   62.8%   38,060      1.003   78.0%   63.0%
WESTERN DRAMA              75.6%   63.8%   4,300       1.000   75.7%   63.9%
AVERAGE                    75.4%   62.1%   13,325,353          77.5%   63.9%
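The slide does not define the Weight column. The figures appear consistent with weight = 1 + (genre size / total size) and weighted accuracy = accuracy × weight; that reading is inferred from the numbers and is an assumption. The sketch below checks it against the GENERAL DOCUMENTARY row.

```python
TOTAL_SIZE = 13_325_353  # total from the AVERAGE row


def genre_weight(size, total=TOTAL_SIZE):
    """Assumed definition: 1 plus the genre's share of all records."""
    return 1 + size / total


def weighted_accuracy(accuracy_pct, size, total=TOTAL_SIZE):
    """Assumed definition: raw accuracy scaled by the genre weight."""
    return accuracy_pct * genre_weight(size, total)


# GENERAL DOCUMENTARY row: Gender 74.6%, Age 63.9%, Size 2,004,256.
weight = genre_weight(2_004_256)
gender_weighted = weighted_accuracy(74.6, 2_004_256)
age_weighted = weighted_accuracy(63.9, 2_004_256)

print(round(weight, 3), round(gender_weighted, 1), round(age_weighted, 1))
```

Under this reading, large genres get a boost of up to about 15%, while small genres are left essentially unchanged.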
Simplifying the Problem Set
Single Households
Multi-Person Households
  Same Gender & Same Age Class: nothing to predict
  Same Gender & Diff. Age Class: predict age
  Diff. Gender & Same Age Class: predict gender
  Diff. Gender & Diff. Age Class: predict both
Counts shown on the slide: 123K, 21K, 44K, 303K, 133K, 500K
Age / Gender models by Drug
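The split into prediction tasks can be sketched as a small routing function. This assumes a two-person household whose members' gender and age class are known at the household level; the function name and the tuple layout are hypothetical.

```python
def prediction_task(member_a, member_b):
    """Decide what must be predicted for a two-person household.

    Each member is a (gender, age_class) tuple. Attributes shared by both
    members are already known, so only the differing ones need a model.
    """
    same_gender = member_a[0] == member_b[0]
    same_age = member_a[1] == member_b[1]
    if same_gender and same_age:
        return "nothing to predict"
    if same_gender:
        return "predict age"
    if same_age:
        return "predict gender"
    return "predict both"


print(prediction_task(("F", "18to34"), ("F", "18to34")))
print(prediction_task(("F", "18to34"), ("F", "35to54")))
print(prediction_task(("F", "18to34"), ("M", "18to34")))
print(prediction_task(("F", "18to34"), ("M", "35to54")))
```

The age class labels other than 18to34 are placeholders chosen for the example.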
2 Stage Models
Same Gender & Diff. Age Class: predict age
Diff. Gender & Same Age Class: predict gender
Diff. Gender & Diff. Age Class: predict both
Diagram labels: Stage 1, Stage 2, Single Households, Age / Gender Conditional Probability, Age / Gender Models by Drug
Full Bayes Model
Using all the independent variables, MAX is the predicted Age or Gender class: the class with the highest probability given all the known conditional probabilities.
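The slide's formula is not reproduced here. As a hedged sketch of the general idea, the code below implements a small naive Bayes classifier that scores each class by its prior times the conditional probability of each feature value and then takes the maximum. The add-one (Laplace) smoothing, the feature names, and the training rows are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from math import log


def train(rows):
    """rows: list of (features_dict, label) pairs."""
    label_counts = Counter(label for _, label in rows)
    feature_counts = defaultdict(Counter)  # (label, feature) -> value counts
    values = defaultdict(set)              # feature -> distinct values seen
    for features, label in rows:
        for name, value in features.items():
            feature_counts[(label, name)][value] += 1
            values[name].add(value)
    return label_counts, feature_counts, values


def class_scores(model, features):
    """Log of prior times smoothed conditional probabilities, per class."""
    label_counts, feature_counts, values = model
    total = sum(label_counts.values())
    scores = {}
    for label, n in label_counts.items():
        score = log(n / total)
        for name, value in features.items():
            count = feature_counts[(label, name)][value]
            # Add-one smoothing (an assumption) avoids log(0) for unseen values.
            score += log((count + 1) / (n + len(values[name])))
        scores[label] = score
    return scores


# Made-up training rows, for illustration only.
rows = [
    ({"drug": "drug_t", "reaction": "nausea"}, "F"),
    ({"drug": "drug_t", "reaction": "headache"}, "F"),
    ({"drug": "drug_u", "reaction": "nausea"}, "F"),
    ({"drug": "drug_u", "reaction": "rash"}, "M"),
    ({"drug": "drug_u", "reaction": "headache"}, "M"),
]

model = train(rows)
scores = class_scores(model, {"drug": "drug_t", "reaction": "nausea"})
prediction = max(scores, key=scores.get)  # the MAX step
print(prediction)
```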
NOTE: The MAX prediction for Age is constrained by ID. Each ID has only 2 possible Age classes, since these are known, so if the model predicts an Age class outside the boundaries of an ID, pick the class with the next-highest MAX probability for Age.
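The constraint in the note can be sketched as a ranked lookup: sort the classes by probability and return the first one that the ID allows. The function name, the class labels, and the probabilities below are hypothetical.

```python
def constrained_prediction(class_probs, allowed):
    """Return the highest-probability class that is allowed for this ID.

    class_probs: dict mapping class label to model probability.
    allowed: the set of classes known to be possible for the ID.
    """
    ranked = sorted(class_probs, key=class_probs.get, reverse=True)
    for label in ranked:
        if label in allowed:
            return label
    raise ValueError("none of the allowed classes were scored by the model")


# Made-up probabilities: the unconstrained MAX is "55plus", but this ID
# is known to contain only the two classes in `allowed`.
age_probs = {"18to34": 0.30, "35to54": 0.25, "55plus": 0.45}
allowed = {"18to34", "35to54"}

print(constrained_prediction(age_probs, allowed))
```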