A Baseball Statistics Class
Jim Albert
Department of Mathematics and Statistics
Bowling Green State University
Supported by the National Science Foundation
Outline
Describe the intro stats class at BGSU
Why focus a class on sports?
Examples of data analysis
Examples of probability
Examples of inference
Address some questions
MATH 115 – Introduction to Statistics
Satisfies math elective for students in College of Arts and Sciences
Required for students in the health college
Students have a range of math skills
Goal of course is statistical literacy: how does one draw conclusions from data?
Book is at the level of Moore, Basic Practice of Statistics
Class is hard to teach
No one wants to take stats.
Easy to focus on number crunching rather than concepts.
Students have little interest in the topics and datasets discussed.
How to make the class more relevant to everyday life?
Statistics can be made more interesting if we capitalize on “good” datasets
Come in raw form
Are authentic
Are intrinsically interesting
Are topical or controversial
Offer substantial learning
Lend themselves to a variety of statistical analyses
Why base a stats course on baseball?
Great American game.
Great historical tradition.
Statistics are an integral part of baseball, used to rate players and teams.
Players are known by their statistics (60, 56, 1.12).
Relatively easy to model using probability.
MATH 115 b
Special section of MATH 115 with a baseball emphasis
I’ve taught it several times, most recently this summer.
Text: Albert, Teaching Statistics Using Baseball, Mathematical Association of America.
Getting started with data analysis
Looked at Bernie Williams’ baseball card.
Started with a question “Was Bernie a big home run hitter?”
Used graphs to answer the question.
Great home run hitters
Watched part of Ken Burns’ documentary about Babe Ruth.
Explored the slugging percentages of Babe.
Interesting to plot SLG against his AGE (his career trajectory).
Notice a familiar pattern.
Interesting outlier (the bellyache heard around the world).
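A minimal sketch of how the SLG values behind such a plot could be computed. It uses the standard definition of slugging percentage (total bases divided by at-bats). The function name and the season lines are made up for illustration and are not Babe Ruth's actual statistics.

```python
def slugging_percentage(at_bats, singles, doubles, triples, home_runs):
    """Slugging percentage: total bases divided by at-bats."""
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


# Hypothetical season lines keyed by age (NOT a real player's numbers).
seasons = {
    25: dict(at_bats=450, singles=70, doubles=35, triples=9, home_runs=50),
    30: dict(at_bats=360, singles=60, doubles=12, triples=2, home_runs=25),
    35: dict(at_bats=500, singles=95, doubles=25, triples=9, home_runs=45),
}

# Printing SLG by age gives the points of a career-trajectory plot.
for age, line in sorted(seasons.items()):
    print(age, round(slugging_percentage(**line), 3))
```

Plotting these (age, SLG) pairs for a real player is what reveals the rise, peak, and decline pattern, and any season that falls well off it.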
Do all players show a similar trajectory?
Look at Barry Bonds’ slugging percentages over time.
Shows unusual pattern towards the end of his career.
Baseball shapes
Counts of things, like home run counts, tend to be right-skewed.
Derived baseball stats tend to be symmetric.
The Babe, Roger, and Barry
Watched part of the movie “61*”.
Compared the home run rates of players in 1921, 1961, 2001.
Which outlier was most notable?
The Second Best Baseball Player from BGSU?
Orel Hershiser was the best.
Who was the 2nd best: Grant Jackson or Roger McDowell? (Grant’s niece was in my class.)
Compared their strikeout rates.
Jackson was the better strikeout pitcher.
Fitting lines to scatterplots
Used spaghetti to fit a line to (Home run, Slugging Percentage) for Mike Piazza’s data (note the Italian connection).
Talked about the best batting measure. Is batting average or OBP better in predicting runs scored per game?
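In class the line was fit by eye with a strand of spaghetti. As a point of comparison, the sketch below computes a least-squares line, which is a different (formula-based) technique. The function name and the data pairs are hypothetical and are not Mike Piazza's actual numbers.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (home runs, slugging percentage) pairs for illustration only.
home_runs = [24, 32, 36, 40, 33]
slugging = [0.541, 0.606, 0.563, 0.638, 0.570]

slope, intercept = fit_line(home_runs, slugging)
print(f"predicted SLG = {intercept:.3f} + {slope:.4f} * HR")
```

The same function can be used to compare batting average and OBP as predictors of runs scored per game: fit each line and see which one leaves smaller residuals.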
Regression effect
Suppose your favorite team had a crummy season last year.
I predict they will do better this season.
This is the regression effect.
Illustrate by looking at the number of wins of teams for two consecutive seasons.
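The regression effect can also be illustrated by simulation. The sketch below assumes a simple model that is not from the slides: each team has a fixed win probability (its ability), drawn uniformly between .400 and .600, and plays two independent 162-game seasons. All names and parameter values are hypothetical.

```python
import random


def simulate_two_seasons(n_teams=30, games=162, seed=1):
    """Return a list of (season 1 wins, season 2 wins), one pair per team."""
    rng = random.Random(seed)
    # Assumed model: ability is a fixed win probability for each team.
    abilities = [rng.uniform(0.40, 0.60) for _ in range(n_teams)]

    def season_wins(p):
        # Each game is an independent win with probability p.
        return sum(rng.random() < p for _ in range(games))

    return [(season_wins(p), season_wins(p)) for p in abilities]


def average_change(seasons, k, worst=True):
    """Mean change in wins for the k worst (or best) teams of season 1."""
    ranked = sorted(seasons, key=lambda pair: pair[0], reverse=not worst)
    chosen = ranked[:k]
    return sum(second - first for first, second in chosen) / len(chosen)


seasons = simulate_two_seasons()
print("worst 5 teams, average change:", average_change(seasons, 5))
print("best 5 teams, average change:", average_change(seasons, 5, worst=False))
```

Under this model the teams at the bottom in the first season tend to gain wins in the second and the teams at the top tend to lose some, even though no team's ability has changed.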
Field of Dreams
Watched part of the movie.
Looked at the statistics of Shoeless Joe Jackson and Moonlight Graham.
Who was better: Ty Cobb or Shoeless Joe Jackson?
Can you predict Jackson’s triple count for a season if you know his double count?
Introducing probability
Played a simple dice game, Big League Baseball.
A single die controls the pitch (ball or strike).
Two dice control the “in play” outcome.
Simple enough that you can talk about probabilities of various events (like a hit).
All-Star Baseball
Spinner game where each spinner controls the hitting outcome for a single player.
Students had a project in which they constructed a spinner for a player given his career hitting statistics.
Played a spinner game in class.
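One way the spinner construction could be sketched: each outcome gets a slice of the circle proportional to how often it occurred in the player's career. The outcome categories and counts below are invented for illustration and are not any real player's statistics; the function names are hypothetical.

```python
import random


def spinner_angles(counts):
    """Map each outcome to the size of its spinner region in degrees."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {outcome: 360.0 * n / total for outcome, n in counts.items()}


def spin(counts, rng):
    """Simulate one spin: pick an outcome with probability count / total."""
    outcomes = list(counts)
    weights = [counts[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


# Hypothetical career plate-appearance counts (NOT real statistics).
career = {"out": 650, "walk": 120, "single": 130,
          "double": 40, "triple": 5, "home run": 55}

for outcome, angle in spinner_angles(career).items():
    print(f"{outcome:>8}: {angle:5.1f} degrees")

rng = random.Random(2024)
print([spin(career, rng) for _ in range(5)])
```

Drawing the slices at these angles gives a physical spinner; the `spin` function plays the same game on a computer.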
Spinner for Mike Schmidt (one of my favorite players)
The spinner game motivates inference
There is a distinction between a player's ability and his performance. An ability is an intrinsic quality of a player, say his batting talent, that we really don't know exactly. We do observe a player's performance, say his batting average for a particular season.
The objective of Statistics is to learn about a player's ability on the basis of his performance.
Suppose a player’s true on-base percentage is .400.
Use a 10-sided die to simulate the performance of a player in 20 plate appearances.
Big distinction between his ability and his on-base performance in these games.
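The die-rolling exercise can be mimicked on a computer. The sketch below assumes that faces 1 through 4 of the 10-sided die mean the player reaches base (the slides do not say which faces were used), so the chance of reaching base on each roll is .4. The function name is hypothetical.

```python
import random


def simulate_plate_appearances(n_pa=20, rng=None):
    """Roll a 10-sided die n_pa times; return the observed on-base fraction."""
    rng = rng or random.Random()
    rolls = [rng.randint(1, 10) for _ in range(n_pa)]
    # Assumption: faces 1-4 count as reaching base, so ability is 4/10.
    times_on_base = sum(roll <= 4 for roll in rolls)
    return times_on_base / n_pa


rng = random.Random(115)
# Ten sets of 20 plate appearances for the same player with ability .400.
performances = [simulate_plate_appearances(20, rng) for _ in range(10)]
print(performances)
```

The printed values scatter around .400 rather than equaling it, which is the gap between ability and performance that the slide describes.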
Do we observe chance variation in baseball?
Watched part of “Angels in the Outfield”.
Went to a Toledo Mud Hens game.
Students were asked to look for lucky things that happened in the game (such as a groundball that found the right location for a hit).
Concluded with a discussion of some interesting issues in inference
Are baseball players really streaky?
Are situational statistics in baseball meaningful? (This is how players perform in different situations, like home/away, in different months, against different pitchers, etc.)
Arguments against teaching this type of course
I’ll describe five objections
“All students aren’t interested in baseball”
At BGSU, easy to fill one section with students who like baseball
Don’t need to be a baseball fan, just willing to learn some baseball and statistics.
“Baseball (game) and statistics (serious science) don’t mix”
Baseball is a serious business for players, managers and owners.
Need a proper interpretation of statistics to be a successful baseball team.
Controversy about the use of statistics, similar to the mistrust of statistics in the public arena.
“The course appeals mainly to one gender”
Course does tend to attract more men.
But the course only requires a willingness to learn.
“I don’t know any baseball, but my brothers played sports, and I was learning to learn.”
“Students won’t be able to think statistically in other settings”
Use baseball as the medium where students learn statistical concepts, such as learning about an ability (a parameter).
Once the concept is learned, it is relatively easy to expose students to other examples outside baseball.
“Course doesn’t cover all topics in a first statistics course”
Only topic that didn’t receive much attention was collecting data through sample surveys and designed experiments.
But could include these topics within context of baseball.
Was the course successful?
Fun for both instructor and the students.
Enthusiasm of the instructor about the material had a positive impact on learning.
Baseball is a great context for learning many statistics concepts.
Students could make sense of the statistical conclusions.
Moral of this experiment
Should explore alternative methods of teaching statistics.
In particular, explore ways of engaging students through interesting applications so they can make more sense of statistical thinking.
Some references
“A Baseball Statistics Class”, Journal of Statistics Education
http://www.amstat.org/publications/jse/v10n2/albert.html
I created a blog of my recent class.
http://bstats.blogspot.com/
See my website http://bayes.bgsu.edu for more information about the book.