Date post: | 26-Jan-2015 |
Category: | Technology |
Upload: | yhat |
analyzing MLB data with ggplot
Greg Lamp
ggplot
● What is it?
● Alternatives
● How it works
● Why should I use it?
● Brief case study
● Questions
Here I am on the Internet.
Founder/CTO @ Yhat
Hi, I’m Greg!
What is ggplot?
DSL for graphics
scatterplot
histogram
labels
color
shape
What about matplotlib?
a quick example
matplotlib ggplot
it’s not all bad!
matplotlib
syntax, api, default themes, learning curve
matplotlib
maturity, ipython, customization, community
What about d3.js?
d3.js
ggplot
ggplot d3.js
How it works
Format
ggplot
data frame
“aesthetics”
Aesthetics
color
shape
size
...fill, alpha, slope, intercept, ymin, ymax, ...
Geoms, Stats, & Scales
geom_point
geom_area
...there are many
stat_smooth
...there are a few
scale_color_brewer
scale_color_gradient
...there are many
Layers
ggplot()
ggplot() + geom_point()
ggplot() + geom_point() + stat_smooth()
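The layering above can be sketched with a toy model in plain Python. This is not ggplot's actual implementation: `ToyPlot`, `ToyAes`, and `Layer` are hypothetical stand-ins, the helper functions only borrow the names from the slide, and the data rows are made up. It only illustrates how `+` can stack layers on top of a data set and its aesthetic mapping.

```python
class ToyAes:
    """Maps aesthetic names (x, y, color, ...) to column names."""

    def __init__(self, **mapping):
        self.mapping = dict(mapping)


class Layer:
    """One plot component, such as a geom or a stat."""

    def __init__(self, name):
        self.name = name


class ToyPlot:
    """Holds the data, the aesthetic mapping, and an ordered list of layers."""

    def __init__(self, aesthetics, data, layers=()):
        self.aesthetics = aesthetics
        self.data = data
        self.layers = list(layers)

    def __add__(self, layer):
        # Return a new plot so the left-hand operand is left unchanged.
        return ToyPlot(self.aesthetics, self.data, self.layers + [layer])

    def describe(self):
        """List the layer names in the order they were added."""
        return [layer.name for layer in self.layers]


def geom_point():
    # Stand-in for a geom; the real one knows how to draw points.
    return Layer("geom_point")


def stat_smooth():
    # Stand-in for a stat; the real one fits and draws a smoothed line.
    return Layer("stat_smooth")


# Made-up rows standing in for a data frame.
data = [
    {"pitch_num": 1, "speed": 95.0},
    {"pitch_num": 2, "speed": 94.2},
]

base = ToyPlot(ToyAes(x="pitch_num", y="speed"), data)
plot = base + geom_point() + stat_smooth()

print(base.describe())  # []
print(plot.describe())  # ['geom_point', 'stat_smooth']
```

Because `+` returns a new object each time, a base plot can be reused and extended with different layers without rebuilding it.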
Why is this good?
Makes “reasonable assumptions”
not real colors
matplotlib freaks
still not real colors
...but i can guess what you mean
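One reading of "I can guess what you mean": when a column mapped to color holds category labels rather than valid color names, the values can be assigned colors automatically. The sketch below shows that idea in plain Python. It is not ggplot's code; `discrete_color_scale`, the palette, and the pitch-type labels are all hypothetical.

```python
from itertools import cycle

# Hypothetical palette; any list of valid color names or hex codes would do.
PALETTE = ["#1b9e77", "#d95f02", "#7570b3"]


def discrete_color_scale(values, palette=PALETTE):
    """Assign each distinct value a palette color, in order of first appearance.

    Colors repeat from the start of the palette once it is exhausted.
    """
    assigned = {}
    colors = cycle(palette)
    for value in values:
        if value not in assigned:
            assigned[value] = next(colors)
    return [assigned[value] for value in values]


# Made-up category labels; none of them is a valid color name.
pitch_types = ["FF", "CU", "FF", "SL"]

print(discrete_color_scale(pitch_types))
# ['#1b9e77', '#d95f02', '#1b9e77', '#7570b3']
```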
Concise yet expressive
Looks pretty good (and is easy to customize)
Seaborn: github.com/mwaskom/seaborn
Case Study
pitch speed
103.4 mph
Load ggplot and pandas
Read in our PITCHf/x data
define the x-axis
pass in your data frame
add a histogram
How does fatigue impact velocity?
...not helpful
What about at the individual level?
Justin Verlander
ggplot lets you fail quicker
Finding Help
/tagged/python-ggplot
What’s next?