Statistics
Using StatCrunch in a Large Enrollment Course
Roger Woodard
Department of Statistics
NC State University
ST311
• Introductory Statistics
– 700 students per semester
– Majors from social and biological sciences
– Not for business or engineering
• Sections
– Taught by graduate students
– 10 to 12 per semester
– 65 students per section
• No computer labs
GAISE
• Guidelines for
• Assessment and
• Instruction in
• Statistics
• Education
GAISE Recommendations
• Stress conceptual understanding rather than mere knowledge of procedures;
• Emphasize statistical literacy and develop statistical thinking;
• Use real data;
• Foster active learning in the classroom;
• Use assessments to improve and evaluate student learning;
• Use technology for developing conceptual understanding and analyzing data.
Use software?
• Requirements
– Good graphics
– Good statistics: full range of procedures up through multiple regression
– Minimal technical overhead
– Easy to use
– Low cost
Software?
• R
– Free for everyone
– Reasonable graphics
– Must be installed, command line
• JMP
– Free for NCSU students
– Good graphics
– Must be installed
Software?
• Excel
– Everyone has it
– Doesn’t do statistics well
– Graphics good for some but not others
– Not interactive
StatCrunch
• Does not need to be installed
– Runs from within a web browser
– www.statcrunch.com
• Low cost
– $12 per 6 months
– Site license also possible
• Point and click interface, menu driven
– Ease of use
StatCrunch
• Look at StatCrunch:
Good Graphics
Interactive features
• Allow better understanding of statistical issues
Interactive features
• Interactive examination of outliers
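The effect of a single outlier can also be checked numerically. What follows is a minimal sketch in Python, not StatCrunch's own interface: it computes the correlation with and without one point. The height and shoe size pairs are invented for illustration, and the helper names `pearson_r` and `r_without` are hypothetical.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def r_without(points, index):
    """Correlation after dropping the point at the given index."""
    kept = [p for i, p in enumerate(points) if i != index]
    xs, ys = zip(*kept)
    return pearson_r(xs, ys)

# Hypothetical (height in inches, shoe size) pairs, NOT real survey data.
# The last pair is a deliberately unusual point.
points = [(62, 6.5), (64, 7.5), (66, 8.5), (68, 9.5),
          (70, 10.5), (72, 11.5), (65, 15.0)]

all_x, all_y = zip(*points)
r_all = pearson_r(all_x, all_y)      # correlation with every point included
r_clean = r_without(points, 6)       # correlation with the unusual point removed
print(f"r with outlier: {r_all:.2f}, without: {r_clean:.2f}")
```

Comparing the two values gives students a concrete number to attach to what they see when a point is removed from the plot.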
Linked graphics
Standard statistical methods
Web based advantages
• Opens data from a variety of sources
– Computer
– Websites
– Paste
• Can be shared to Facebook, Twitter, etc.
• Can administer surveys
• Direct Data Link
Direct data link
• Instructors do not need to reload data sets
• Accessible from all computers across campus
Link of data from within homework.
Recommendations
• Concentrate on the statistics
– Minimize the amount of work students need to do to use software
– Avoid busy work
– Easy links to get data into software
– Build up techniques over several assignments
– Avoid giving software a “bad reputation”
Recommendations
• Use video instructions
– Students don’t read text documentation
– Videos are easy
– www.youtube.com
– Segment videos into short clips (30 seconds to 3 minutes)
– Organize videos by the task to perform
Video instructions
Recommendations
• Ask questions that matter
– Ask what real-world conclusions can be gleaned
• Get students involved in the data
– Use data sets that are understandable
– Why would the distribution look like it does?
– Why would there be outliers?
– What other sources of variability are there?
• Survey of students
Survey of students
Textbooks
• Why is the average around $400?
• Why would some be around $0?
• Why would some be up near $1000?
Survey of students
• Car age?
– How old are students’ cars?
– Do males or females have older cars?
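Outside StatCrunch, the same two questions can be answered with a grouped summary. This is a minimal Python sketch under stated assumptions: the survey responses are invented for illustration (not real class data), and the function name `summarize_by_group` is hypothetical.

```python
from statistics import median

# Hypothetical survey responses: (gender, car age in years). NOT real class data.
responses = [("F", 3), ("F", 8), ("F", 5), ("M", 12),
             ("M", 6), ("M", 9), ("F", 10)]

def summarize_by_group(rows):
    """Return {group: {"n", "median", "min", "max"}} for (group, value) pairs."""
    by_group = {}
    for group, value in rows:
        by_group.setdefault(group, []).append(value)
    return {
        group: {
            "n": len(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
        for group, values in by_group.items()
    }

summary = summarize_by_group(responses)
for group, stats in sorted(summary.items()):
    print(group, stats)
```

The median is used here rather than the mean because a few very old cars could pull a mean upward; either summary would serve for a class discussion.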
Height and Shoe size
Outliers?
Assessment
• Assessment should match what we want students to do.
– Think about a real-world question
– Use software to explore data
Data exploration problem
• John is a new college graduate working at his first job. After years of living in an apartment, he has decided to purchase a home. He has found a great neighborhood from which he can walk to work. Before buying a home in the area, he has decided to collect some data on the homes in this neighborhood. A data set has been compiled that represents a sample of 100 homes in the neighborhood he is considering. The variables in this data set are:
– Value: the current value of the home as determined by the county tax assessor.
– Size: the size of the home in square feet.
– Year: the year the home was built.
– Basement: does the home have a basement (y=yes, n=no).
– Fireplace: does the home have a fireplace (y=yes, n=no).
– Type: whether the structure is a single-family house or a townhouse (house or townhouse).
• Create histograms for each of the numeric variables and bar charts for each of the categorical variables. Use these graphs to explore the data and determine which of the following best fits this situation.
Data exploration problem
• The histogram for value is clearly bimodal. It appears to be bimodal because the neighborhood has higher-priced single-family houses and lower-priced townhouses.
• The histogram for value is clearly bimodal. It appears to be bimodal because the neighborhood was built in two phases: the newer phase consists of larger, more expensive homes and the older phase consists of smaller, less expensive homes.
• The histogram for value is clearly bimodal. It appears to be bimodal because the neighborhood was built in two phases: the older phase consists of larger, more expensive homes and the newer phase consists of smaller, less expensive homes.
• The histogram for value is clearly bimodal. It appears to be bimodal because the neighborhood has one group of homes with basements that tend to be larger and another group of homes without basements that tend to be smaller.
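One way to weigh explanations like these is to bin the values and then split them by each categorical variable in turn. The sketch below is a Python illustration under stated assumptions, not the StatCrunch workflow used in the course. The six rows are invented and do not come from the 100-home data set, so they say nothing about which choice is correct. The helper names `binned_counts` and `mean_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows for illustration only; NOT the course data set.
homes = [
    {"value": 310000, "type": "house", "basement": "y"},
    {"value": 295000, "type": "house", "basement": "n"},
    {"value": 325000, "type": "house", "basement": "n"},
    {"value": 180000, "type": "townhouse", "basement": "y"},
    {"value": 175000, "type": "townhouse", "basement": "n"},
    {"value": 190000, "type": "townhouse", "basement": "n"},
]

def binned_counts(values, bin_width):
    """Return {bin start: count}, a crude text stand-in for a histogram."""
    counts = defaultdict(int)
    for v in values:
        counts[(v // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))

def mean_by_group(rows, group_key, value_key="value"):
    """Return {group label: mean of value_key} for one categorical column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {label: mean(vals) for label, vals in groups.items()}

hist = binned_counts([h["value"] for h in homes], 50000)
by_type = mean_by_group(homes, "type")
by_basement = mean_by_group(homes, "basement")
print(hist)
print(by_type)
print(by_basement)
```

A categorical variable whose groups have clearly separated means is a candidate explanation for two peaks; one whose groups have similar means is not.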
• Let's explore the data:
Web links
• GAISE report:
– http://www.amstat.org/education/gaise/
• StatCrunch:
– www.statcrunch.com
• Video instructions:
– Youtube.com
– http://www4.stat.ncsu.edu/~woodard/statcrunch/