Lab 1 intro

Post on 20-Feb-2017


Labs & assignments

Lab activities will parallel lecture material (to the extent possible), and handout materials will be used as appropriate.

All lab assignments must be submitted via Blackboard one week after the assigned dates unless otherwise noted by the instructor.

No duplicated labs!

Biol 205: Lab 1

Ecological Data & Descriptive Statistics

Dr. Davenport

Objectives

• What is statistics, and why use it?

• What is data?

• Basic principle of statistics: the relationship between a (statistical) population and a sample

• Descriptive statistics

• Assignment and questions

Why statistics?

How do we make intelligent judgments in the presence of uncertainty?

Statistics is a branch of applied mathematics that helps us make intelligent judgments and informed decisions in the presence of uncertainty and variation.

• Useful in the planning of experiments and studies that will result in meaningful data.

• Provides a set of tools to extract and understand information resulting from experiments.

Data is:

• A collection of facts from which conclusions may be drawn.

• A representation of facts, concepts, or instructions in a formal manner suitable for communication, interpretation, or processing by human beings or by computers.

• A formal representation of raw material from which information is constructed via processing or interpretation.

Why do you need data? Basic principle of statistics

Data is essential for presenting, summarizing, and interpreting ecological phenomena.

However, it is usually impossible or impractical to monitor the entire habitat or obtain measurements of all the organisms in a given area.

So most of the time, only part of the population is sampled when you acquire a set of data.


Population

The entire group of individuals is called the population. For example, a researcher may be interested in the relation between class size (variable 1) and academic performance (variable 2) for a population of third-grade children.


Sample

Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample (a subset of the population) is selected to represent the population in a research study. The goal is to use the results obtained from the sample to infer information about the population.

Basic principle of statistics

Summary

Population: the set of all measurements of interest to the investigator.

Sample: a subset of measurements selected from the population of interest.

[Figure: population, sample, and statistics]

Selecting Samples

Samples should be taken in random order. Why?

Random sampling implies that each measurement in the population has an equal opportunity of being selected as part of your sample.

Otherwise, your samples could be biased.
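A minimal Python sketch of simple random sampling, assuming the whole population can be listed. The population here is only a stand-in (ID numbers 1 to 189, matching the N = 189 data set used later in this lab); the seed and the sample size of 15 are illustrative choices, not part of the lab.

```python
import random

# Hypothetical population: ID numbers 1..189 (stand-in for real measurements).
population = list(range(1, 190))

# Fixed seed so the draw is reproducible; drop it for a fresh sample each run.
rng = random.Random(205)

# sample() draws without replacement, so every ID has the same chance of
# being selected and no ID can be picked twice.
sample = rng.sample(population, k=15)

print(sorted(sample))
```

Picking, say, only the first 15 IDs would be easier, but it would give the remaining IDs no chance of selection, which is exactly the kind of bias random sampling avoids.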

Sampling Replication

Why do we need replication?

A single measurement is generally insufficient to draw a conclusion about a population.

Definitions

Descriptive Statistics: basic tools for summarizing and presenting numerical data.

Low Birth Weight Data

Abbreviation  Variable
ID            Identification Code
LOW           Low Birth Weight (0 = Birth Weight >= 2500g, 1 = Birth Weight < 2500g)
AGE           Age of the Mother in Years
LWT           Weight in Pounds at the Last Menstrual Period
RACE          Race (1 = White, 2 = Black, 3 = Other)
SMOKE         Smoking Status During Pregnancy (1 = Yes, 0 = No)
PTL           History of Premature Labor (0 = None, 1 = One, etc.)
HT            History of Hypertension (1 = Yes, 0 = No)
UI            Presence of Uterine Irritability (1 = Yes, 0 = No)
FTV           Number of Physician Visits During the First Trimester (0 = None, 1 = One, 2 = Two, etc.)
BWT           Birth Weight in Grams

Low Birth Weight Data

 ID  LOW  AGE  LWT  RACE  SMOKE  PTL  HT  UI  FTV   BWT
 85    0   19  182     2      0    0   0   1    0  2523
 86    0   33  155     3      0    0   0   0    3  2551
 87    0   20  105     1      1    0   0   0    1  2557
 88    0   21  108     1      1    0   0   1    2  2594
 89    0   18  107     1      1    0   0   1    0  2600
 91    0   21  124     3      0    0   0   0    0  2622
 92    0   22  118     1      0    0   0   0    1  2637
 76    1   20  105     3      0    0   0   0    3  2450
 77    1   26  190     1      1    0   0   0    0  2466
 78    1   14  101     3      1    1   0   0    0  2466
 79    1   28   95     1      1    0   0   0    2  2466
 81    1   14  100     3      0    0   0   0    2  2495
 82    1   23   94     3      1    0   0   0    0  2495
 83    1   17  142     2      0    0   1   0    0  2495
 84    1   21  130     1      1    0   1   0    3  2495

Hosmer and Lemeshow (2000) Applied Logistic Regression: 2nd Edition; John Wiley & Sons

N=189

Data Presentation

Three ways to summarize, or describe data:

1. Tables

2. Graphics

3. Basic Summary Statistics

Tabulations

Tables are used to describe qualitative data. The tables simply present the counts, or frequencies, observed in each category of a variable of interest.

Race   Count   %
White     96  51
Black     26  14
Other     67  35
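As a rough sketch of how the percentage column of such a table can be derived from the counts, the following Python snippet recomputes it for the Race table. The dictionary name and the rounding to whole percents are choices made for this illustration.

```python
# Counts copied from the Race table above.
counts = {"White": 96, "Black": 26, "Other": 67}

total = sum(counts.values())  # 189 mothers in the data set

# Percentage for each category, rounded to the nearest whole percent.
percents = {race: round(100 * n / total) for race, n in counts.items()}

for race, n in counts.items():
    print(f"{race:<6} {n:>5} {percents[race]:>4}%")
```

The counts add up to 189, which agrees with the N = 189 reported for the data set.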

Tabulations

Visits        Count  Percent
None            100     52.9
One              47     24.9
Two              30     15.9
Three             7      3.7
Four or More      5      2.6

Bar Chart: Physician Visits During First Trimester

[Figure: bar chart of counts (vertical axis 0 to 100) by number of physician visits: No Visits, One Visit, Two Visits, Three Visits, Four or More]

Pie Chart: Physician Visits During First Trimester

[Figure: pie chart of physician visits: No Visits, One Visit, Two Visits, Three Visits, Four or More]

Summary Statistics

Measures of Center (Central Tendency)

• Mean

• Median

• Mode

Measures of Spread (Variability)

• Range

• Variance

• Standard Deviation

Mean

The mean of a data set is the average of all the data values.

If the data are from a sample, the mean is denoted by $\bar{x}$:

$$\bar{x} = \frac{\sum x_i}{n}$$

If the data are from a population, the mean is denoted by $\mu$ ("mu"):

$$\mu = \frac{\sum x_i}{N}$$

Measures of Center

Mean (average): sum of sampled values divided by the number of samples taken.

n = sample size

$X_i$ = sampled value

$\Sigma$ = symbol for summation

$\mu$ = population mean

$\bar{X}$ = sample mean

Measures of Center

Example:

30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{15}(30 + 26 + \dots + 37) = 32.47$$

Note: The mean is sensitive to extreme values.

How do extreme values affect the mean?

30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37, 113

$$\bar{X} = 37.50$$
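The effect of the extreme value on the mean can be checked with a few lines of Python. This is only a sketch using the standard `statistics` module; the variable names are chosen for this illustration.

```python
from statistics import mean

data = [30, 26, 26, 36, 48, 50, 16, 31, 22, 27, 23, 35, 52, 28, 37]
with_extreme = data + [113]  # same data plus the extreme value

mean_original = mean(data)         # 487 / 15
mean_extreme = mean(with_extreme)  # 600 / 16

print(round(mean_original, 2))  # 32.47
print(round(mean_extreme, 2))   # 37.5
```

A single added value moves the mean by about five units, which is what "sensitive to extreme values" means in practice.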

Measures of Center

Median: the value of a set of measurements that falls in the middle position when the data are ordered from smallest to largest.

$$\tilde{X} = \begin{cases} x_{[(n+1)/2]} & \text{if } n \text{ is odd} \\ \dfrac{x_{[n/2]} + x_{[n/2+1]}}{2} & \text{if } n \text{ is even} \end{cases}$$

Measures of Center

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

n = 15 is odd, so the 8th value, 30, is the median.

Why 8? (15 + 1)/2 = 8

How do extreme values affect the median?

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52, 113

Now n = 16, so the median is the average of the 8th and 9th values, (30 + 31)/2 = 30.5, which is not much different from the original median of 30!
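The same comparison for the median, again as a small sketch with Python's standard `statistics` module (variable names are illustrative):

```python
from statistics import median

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]
with_extreme = data + [113]

# n = 15 (odd): the middle value, at position (15 + 1) / 2 = 8.
median_original = median(data)

# n = 16 (even): the average of the 8th and 9th ordered values.
median_extreme = median(with_extreme)

print(median_original, median_extreme)  # 30 30.5
```

The median moves by only half a unit, so it is much less affected by the extreme value than the mean is.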

Measures of Center

Mode: the value of a set of measurements that occurs most frequently.

In our example data, the mode is 26.

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

Fact: For data that are symmetric and unimodal, the mean, median, and mode are similar.
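A short sketch of finding the mode in Python. It uses `statistics.multimode` (available in Python 3.8 and later), which is one reasonable choice here because it also reports ties; the lab itself does not prescribe a tool.

```python
from statistics import multimode

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

# multimode returns every value tied for the highest frequency,
# so it also handles data sets with more than one mode.
modes = multimode(data)

print(modes)  # [26]
```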

Measures of Spread

Range: the difference between the largest and smallest sample measurements.

In our example, the range is 36.

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

R = 52 - 16 = 36

Note: Two data sets may have the same range but very different shape and variability.
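To illustrate the note about equal ranges, here is a small Python sketch. The `clustered` list is a made-up data set, constructed only so that it shares the same smallest and largest values as the example; it is not part of the lab data.

```python
from statistics import stdev

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

# Hypothetical comparison data: same minimum and maximum,
# but every other value sits at 34.
clustered = [16] + [34] * 13 + [52]

range_data = max(data) - min(data)
range_clustered = max(clustered) - min(clustered)

print(range_data, range_clustered)  # 36 36

# Same range, yet the clustered data vary much less around their mean.
print(stdev(data) > stdev(clustered))  # True
```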

Measures of Spread

Sum of squared deviations from the mean, referred to simply as the sum of squares (SS):

$$SS = \sum (X_i - \bar{X})^2$$

Measures of Spread

Variance ($s^2$): the sum of the squares of the deviations divided by the sample size minus one.

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

Standard Deviation ($s$): the square root of the variance.

$$s = \sqrt{s^2}$$

Measures of Spread

Degrees of freedom (DF):

DF = n - 1

Measures of Spread

A computationally more convenient formula to calculate the variance:

$$s^2 = \frac{\sum x_i^2 - \dfrac{\left(\sum x_i\right)^2}{n}}{n - 1} = \frac{\sum x_i^2 - n\bar{x}^2}{n - 1}$$
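A minimal sketch, in plain Python, showing that the definitional formula (sum of squared deviations over n - 1) and the computational formula give the same variance. The function names are made up for this illustration, and the data are the 15 example values used throughout this lab.

```python
def variance_definitional(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)
    return ss / (n - 1)


def variance_computational(xs):
    """(sum of x^2 minus (sum of x)^2 / n), divided by n - 1."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]

v_def = variance_definitional(data)
v_comp = variance_computational(data)

print(round(v_def, 2), round(v_comp, 2))  # 112.98 112.98
```

The computational form needs only the running totals of x and x squared, which is why it is convenient for hand or calculator work.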

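Measures of Spread

The variance and standard deviation for our example are:

16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52

$$s^2 = 112.98, \qquad s = \sqrt{s^2} = 10.63$$

With the extreme value 113 included (n = 16), these become $s^2 = 510.8$ and $s = 22.6$.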
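As a quick check, the variance and standard deviation can be computed with Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor. The sketch below runs both on the 15 listed values and on the 16 values that include the extreme value 113: the 15 listed values give about 112.98 and 10.63, while 510.8 and 22.6 are what the 16 values give.

```python
from statistics import stdev, variance

data = [16, 22, 23, 26, 26, 27, 28, 30, 31, 35, 36, 37, 48, 50, 52]
with_extreme = data + [113]

# The 15 listed values.
print(round(variance(data), 2), round(stdev(data), 2))  # 112.98 10.63

# The 16 values including the extreme value 113.
print(round(variance(with_extreme), 1), round(stdev(with_extreme), 1))  # 510.8 22.6
```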

Normal Distribution

https://en.wikipedia.org/wiki/Normal_distribution#/media/File:Normal_Distribution_PDF.svg

Lab 1: Assignment

As a fishery scientist working for NOAA, you have done a lot of research on the striped bass (rockfish) population in the Chesapeake Bay. In one of your studies, you gathered data on the age structure of the rockfish population in the Chesapeake Bay, and you need to do some statistical analysis before you can present your data to the public.

The fish samples you collected fall into 3 age groups: age 1 (1 year old), age 2 (2 years old), and age 3 (3 years old).

Lab 1: Questions

1. What is a statistical population (N)? What is a sample (n)? What is the relationship between a statistical population and a sample? What can be inferred about the statistical population (N) from the sample (n)?

2. Write the definitions (formulas) for variance and standard deviation.

3. Draw a bar chart and a pie chart of the number of fish in each age group (the age structure of your sample).

Lab 1: Questions (continued)

4. What is the average weight of the fish in your entire sample?

5. What are the average weights of the fish in the different age groups (age 1, age 2, and age 3)?

6. What is the median weight of the fish in the age 1 group? And what is the median weight of the fish in the age 3 group?

7. What is the range of the weights of the fish in the age 2 group?

Lab 1: Questions (continued)

8. Calculate the variance of the weights of the fish in the age 2 group.

9. Calculate the standard deviation of the weights of the fish in the age 1 group.