
1. Metric Variability

Date post: 06-Jan-2016
Upload: trung-tran
  • 8/23/2015


    Robert Kuan-Hung Lin
    E-Mail: [email protected]

    RESEARCH FOCUS: (1) Molecular breeding: marker-assisted selection and QTL mapping; (2) Plant stress physiology; (3) Genetic engineering

    COURSE TEACHING: Biostatistics, Genetics and Lab, Plant Breeding and Lab, Biotechnology, etc.

    Homepage: http://www.crrmib.pccu.edu.tw/files/11-1123-1656.php

    Copyright The McGraw-Hill Companies, Inc. Permission required to reproduce or display

    Hartwell et al., 4th ed., Chapter 2


    [Figure: biotic stress (disease, insects) and environmental stresses]

    Course Objectives

    1. To transmit basic concepts of biostatistics in sufficient detail;

    2. To provide the necessary background for basic and advanced training in specialized areas;

    3. To realize the links between biostatistics and our daily life;

    4. To become familiar with and practice SAS software

    Theory, methods and applications of biostatistics in life sciences

    Course Outline, 8/24 (Monday) to 8/28 (Friday):

    (1) Lesson 1: Metric variability; populations and samples; central tendency and dispersion (variability); normal distribution; SAS

    (2) Lesson 2: Hypothesis tests; analysis of variance (ANOVA); CRD, RCBD, LSD; regression and correlation analysis using SAS

    Data Is Everywhere!

    Stroke rate decreased from 3.8% (2012) to 0.2% (2013).

    The divorce rate increased from 2.8% (2010) to 3.1% (2011).

    Average total cost of insurance decreased from $13,344 (2010) to $9,548 (2011).


    Data: pieces of information

    The raw (original) materials of statistics are data.

    Standardize data collection for consistency.

    Careful planning is prudent: many errors in research arise from poor planning (e.g., of data collection).

    Fancy statistical methods cannot rescue garbage data.

    We may define data as figures that result from counting or from taking a measurement. For example:

    - A hospital administrator counts the number of patients (counting).

    - A nurse weighs a patient (measurement).

    Sources of data

    Data can be obtained or summarized from: surveys, experiments, reports, research literature, popular media, counting, and analysis of records (statistics).

    Statistics

    Statistics is a field of study concerned with:

    1. Collection, study design, organization, monitoring, summarization, and analysis of data.

    2. Drawing inferences about a body of data when only a part of the data is observed.

    3. Interpreting the data and communicating the results to others.

    4. An important part of the scientific method.

    What is biostatistics?

    It is the part of statistics in which the data are derived from the biological, ecological, and environmental sciences, public health, medicine, etc.

    1. Planning/design of the study

    2. Data collection

    3. Data analysis

    4. Interpretation of informative data

    Biostatistics = 1 + 2 + 3 + 4

    Example: Heights of Ton Duc Thang University students

    Suppose I want to know the average height of a student at TDTU. Of course, I will never know the heights of all TDTU students. However, I can draw a sample from this population, for example from this class now (no good, because it is NOT a random sample!). Now, the question is how large a sample I should draw. Suppose I have a choice of drawing a sample of 4 people, 14 people, 40 people, or 400 people: which sample is more reliable? Why?

    If the variation in the population is large (small), then you ought to use a large (small) sample size. The variation in height among TDTU students is considered small because you are all of similar age (19 to 22 years old), so a small sample size can be used. A sample average based on 40 individuals (n = 40) varies less from sample to sample than one based on 14 or 4 individuals, which tends to bring the sample average closer to the real average. (In this course, N > 100 is considered a population.)
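    The effect of sample size can be checked with a small simulation. The sketch below is written in Python rather than SAS, and the population is hypothetical: 5,000 heights generated from a normal distribution with mean 165 cm and SD 7 cm (assumed numbers, not TDTU data). It draws many random samples of each size and reports how much the sample averages vary.

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats=2000, seed=0):
    """Return the SD of `repeats` sample means, each from a random sample of size n."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

# Hypothetical population of student heights in cm (assumed values, not real data).
_rng = random.Random(42)
population = [_rng.gauss(165, 7) for _ in range(5000)]

spreads = {n: spread_of_sample_means(population, n) for n in (4, 14, 40, 400)}
for n, s in spreads.items():
    print(f"n = {n:3d}: SD of sample means = {s:.2f} cm")
```

    Under these assumptions the spread of the sample averages shrinks as n grows, which is why the larger sample is the more reliable one.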

    [Figure: a sample drawn from a population]

    How do we draw a representative sample of TDTU students from this population? Choose 40 students from this class? Or choose 4 to 14 students from each department? Etc.

    Why use a small sample size? It costs less, takes less time, needs less labor, etc.

    Population

    Full data set: X1, X2, ..., XN

    The research target: the entire group for which information is wanted.

    For example, the height of all 19- to 22-year-old male college students at Ton Duc Thang University.

    The blood pressure of females aged 25-39 in Ho Chi Minh City.

    The weight of boys aged 5-10 in Vietnam.

    The size of the population is designated by N and the size of the sample is designated by n.

    Sample

    [Figure: a sample drawn from a small population and from a large population]

    Subset of data: x1, x2, ..., xn

    A sample is a subset of the population, from which conclusions about the population can be drawn.

    The sample size (n) is the number of individual observations in a sample.

    For example, a sample of blood pressures from n = 14 twenty-year-old male college students at Ton Duc Thang University.

    Q: How many subjects are needed in total?
    Q: How many are needed in each of the groups to be compared?

    (Normally, n < 30 is considered a sample.)

    Populations vs Samples

    Studying whole populations is too expensive and time-consuming, and thus impractical.

    If a sample is representative of the population, then by observing the sample we can learn something about the population.

    By looking at the characteristics of the sample (statistics), we may learn something about the characteristics of the population (parameters).

    A statistic is a descriptive measure computed from the data of the sample.

    A parameter is a descriptive measure computed from the data of the population.

    If we had chosen a different sample, we would have obtained different statistics (sampling variation or random variation). However, note that we are trying to estimate the same (constant) population parameters.

    Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} X_i$

    Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$

    Population parameters are denoted with Greek letters ($\mu$, $\sigma$, etc.); sample statistics are denoted with Latin letters ($\bar{x}$, $s$, etc.).

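    Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N}(X_i - \mu)^2}{N}$

    $\sigma^2$ = population variance; $X_i$ = individual value; $\mu$ = population mean; N = population size

    Sample variance: $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

    $s^2$ = sample variance; $x_i$ = individual value; $\bar{x}$ = sample mean; n = number of values

    [Figure labels: disease, normal]

    How to sample?

    A representative sample does not have a major impact on sample size, but a random sample does.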

    Random sample

    A subset of the population.

    Conclusions about the population can be drawn from the sample.

    Each member of the population has an equal chance of being selected.

    The selection of any member of the population does not influence the selection of any other member.

    Random sampling

    Each member of the population has an equal chance of being selected, and all selections are independent of each other.

    Scott Evans, Ph.D., Lynne Peeples, M.S.

    [Figure labels: statistical inference; descriptive statistics]

    Inferential Statistics

    It is the procedure used to reach a conclusion about a population based on the information derived from a sample that has been drawn from that population.

    Make inferences about the population using what is observed in the sample.

    Primarily performed in two ways: hypothesis testing and confidence intervals.

    Descriptive Statistics

    Measures of central tendency
    - Measure the center of the data
    - Mean, median, mode

    Measures of variability
    - Measure the spread in the data
    - Variance, standard deviation, range, coefficient of variation
    - The larger the value of these measures, the larger the spread and variability

    Mean

    A measure of the center of the data: the average value of a set of numbers.

    Five 20-year-old students' systolic blood pressures (mmHg) from Ton Duc Thang University: 120, 80, 90, 110, 95

    $\bar{x} = \frac{\sum x_i}{n} = \frac{120 + 80 + 90 + 110 + 95}{5} = \frac{495}{5} = 99$ mmHg

    [population mean = $\mu$; sample mean = $\bar{x}$]

    The sample size n is the number of observations; here n = 5.

    Median

    1. Sort the observations in ascending order.

    2a. If n is odd, the median is the middle value of the ordered list of observations.

    2b. If n is even, the median is the average of the two middle values.

    Mode

    The mode is the most frequently occurring value in a data set.

    [Q] What are the median and mode of the following data: 120, 80, 90, 110, 95, 90 kg?

    Sort the data: 80, 90, 90, 95, 110, 120

    The median is (90 + 95) / 2 = 92.5 kg; the mode is 90 kg.

    [Q] What are the median and mode of the following data: 120, 80, 90, 110, 95, 90, 95 kg?

    Sort the data: 80, 90, 90, 95, 95, 110, 120

    The median is 95 kg; the modes are 90 and 95 kg.
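    The mean, median, and mode examples above can be checked in a few lines. This is a minimal sketch in Python (not SAS) using the standard-library statistics module; the variable names are chosen here for illustration.

```python
import statistics

bp = [120, 80, 90, 110, 95]                 # systolic blood pressure, mmHg
even_case = [120, 80, 90, 110, 95, 90]      # n = 6 (even)
odd_case = [120, 80, 90, 110, 95, 90, 95]   # n = 7 (odd)

mean_bp = statistics.mean(bp)
median_even = statistics.median(even_case)    # average of the two middle values
median_odd = statistics.median(odd_case)      # the single middle value
modes_even = statistics.multimode(even_case)  # every most-frequent value
modes_odd = statistics.multimode(odd_case)

print(mean_bp, median_even, modes_even, median_odd, modes_odd)
```

    multimode is used instead of mode so that a data set with two equally frequent values reports both of them.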

    Standard deviation / variance

    Measures of data variability: the dispersion of the data around the mean value.

    Variance ($\sigma^2$) = the average squared deviation of the values from the mean.

    Standard deviation ($\sigma$, SD) = square root of the variance = the typical distance of a randomly chosen value from the mean.

    If a mean ($\mu$) is reported without its SD, the mean is hard to interpret, because you do not know how precise it is!

    The larger the SD, the more variable the data and the less precise the estimate of $\mu$.

    The smaller the SD, the less variable the data and the more precise the estimate of $\mu$.

    Variance = sum of squares / degrees of freedom:

    $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$

    SD = square root of the variance:

    $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$

    $(X_i - \mu)$, or $(x_i - \bar{x})$ for a sample, is the difference between each value and the mean.

    Population SD: $\sigma$; sample SD: s

    Why do we divide by n - 1 instead of n? The values in a sample tend to lie closer to their own sample mean than to the population mean, so the spread of a sample tends to be smaller than that of the population. To compensate, we divide by the smaller number n - 1 instead of n.

    Five 20-year-old students' systolic blood pressures (mmHg) from Ton Duc Thang University: 120, 80, 90, 110, 95

    $\bar{x} = 99$ mmHg

    $s^2 = \frac{(120-99)^2 + (80-99)^2 + (90-99)^2 + (110-99)^2 + (95-99)^2}{5 - 1} = \frac{1020}{4} = 255$ mmHg^2 [the unit of variance is squared]

    Standard error (SE)

    SE = population standard deviation (SD, $\sigma$) divided by the square root of the sample size:

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

    It is the standard error ($\sigma_{\bar{x}}$) of the sample mean ($\bar{x}$).

    SE is an index of sampling error: an estimate of how much a sample mean can be expected to vary from the actual population value.

    The larger the sample size, the lower the SE and the closer the sample mean is expected to be to the population mean.
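    The variance, SD, and SE of the five blood pressure readings can be worked through step by step. The sketch below is in Python (not SAS). Because the population SD is unknown for these five students, it uses the sample SD s in place of sigma when computing the SE, which is an assumption of this example.

```python
import math
import statistics

bp = [120, 80, 90, 110, 95]  # systolic blood pressure, mmHg

n = len(bp)
mean = statistics.mean(bp)
sum_of_squares = sum((x - mean) ** 2 for x in bp)  # numerator of the variance
variance = sum_of_squares / (n - 1)                # divide by the degrees of freedom
sd = math.sqrt(variance)
se = sd / math.sqrt(n)   # s stands in for the unknown population sigma

print(f"SS = {sum_of_squares}, s^2 = {variance} mmHg^2, "
      f"s = {sd:.2f} mmHg, SE = {se:.2f} mmHg")
```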

    Q: A large study has determined that the diastolic blood pressure among women aged 18-74 in Taiwan is normally distributed with a mean of 70 mmHg (millimeters of mercury) and a standard deviation of $\sigma$ = 10 mmHg. Suppose that you measure the diastolic blood pressure of n = 25 women. What are the mean and standard error of the sampling distribution of the sample mean $\bar{X}_{25}$?

    A: mean of the sample means = population mean: $\mu_{\bar{x}} = \mu$ = 70 mmHg

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{25}}$ = 2 mmHg

    The distribution of the sample mean ($\bar{x}$) from a finite population is called the sampling distribution.

    Q: Now suppose that you measure the diastolic blood pressure of n = 50 women from the same population (mean 70 mmHg, $\sigma$ = 10 mmHg). What are the mean and standard error of the sampling distribution of the sample mean $\bar{X}_{50}$?

    A: $\mu_{\bar{x}} = \mu$ = 70 mmHg

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{10}{\sqrt{50}}$ = 1.41 mmHg

    ------------------------------------------------------------------
            original      sample     sample     sample     sample
            population    (n=5)      (n=25)     (n=50)     (n=100)
    ------------------------------------------------------------------
    mean    70            70 ?       70 ?       70 ?       70 ?
    SD      10
    SE                    4.47       2.0        1.41       1.0
    ------------------------------------------------------------------
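    The SE row of the table follows directly from SE = sigma / sqrt(n). A short Python sketch (not SAS; the function name is chosen here for illustration) recomputes it for the four sample sizes:

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

sigma = 10  # population SD of diastolic blood pressure, mmHg
table = {n: round(standard_error(sigma, n), 2) for n in (5, 25, 50, 100)}
print(table)
```

    Quadrupling the sample size (from 25 to 100) only halves the SE, because n enters through its square root.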

    Q: The mean blood cholesterol concentration of a large population (N = 100) of adult males is 200 mg/dL with a standard deviation of 20 mg/dL. Assume that blood cholesterol measurements are normally distributed. What are the mean and standard error of the sample mean for a sample of 20 men (n = 20)?

    [A]: For an ND, mean of the sample means = population mean = 200 mg/dL

    SE = $\frac{20}{\sqrt{20}}$ = 4.47 mg/dL

    Coefficient of variation (CV)

    A measure of the relative amount of variation. The units cancel out, so the CV has no unit or is expressed in %.

    [Q]: Suppose that the average and SD of the final exam scores were 70 and 10 points in class A, and 65 and 8 points in class B. Which class showed higher variability in this exam?

    [A]: $CV_A = (s / \bar{x}) \times 100\% = (10 / 70) \times 100\% = 14.29\%$

    $CV_B = (s / \bar{x}) \times 100\% = (8 / 65) \times 100\% = 12.31\%$

    12.31% (B) < 14.29% (A). Therefore, the scores of students in class A varied more in the final exam than those in class B. This might be due to outlier scores (more than 3 SD from the mean) in class A, such as scores < 40 points (see the ND curve later!).
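    The class A versus class B comparison is a one-line calculation per class. A minimal Python sketch (not SAS; the function name is illustrative):

```python
def coefficient_of_variation(sd, mean):
    """CV in percent: (SD / mean) * 100."""
    return sd / mean * 100

cv_a = coefficient_of_variation(10, 70)  # class A: SD 10, mean 70 points
cv_b = coefficient_of_variation(8, 65)   # class B: SD 8, mean 65 points
print(f"CV_A = {cv_a:.2f}%, CV_B = {cv_b:.2f}%")
```

    Because the CV is unit-free, the same function can compare variables measured in different units, such as height in cm and weight in kg.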

    Calculate the mean, SD, SE, and CV of the data 2, 4, 6, 8, 10 kg

    $\bar{x} = (2 + 4 + 6 + 8 + 10) / 5 = 6$ kg

    $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{40}{4}} = 3.16$ kg

    SE = $\frac{s}{\sqrt{n}} = 3.16 / \sqrt{5} = 1.41$ kg

    CV = $s / \bar{x}$ = 3.16 / 6 = 0.527 = 52.7%

    [Practice]: Please calculate the SD, SE, mean, mode, median, and CV of the blood sugar (mg/dL) of 8 freshmen at the Biotechnology Department, TDTU, with readings of 120, 140, 120, 118, 125, 140, 90, and 89.

    [Key]: $\bar{x}$ = 117.75 mg/dL; median = 120 mg/dL; mode = 120 and 140 mg/dL (each occurs twice); s = 19.46 mg/dL;

    SE = 6.88 mg/dL; CV = 16.52%
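    Both the worked example (2, 4, 6, 8, 10 kg) and the blood sugar practice data can be checked with one helper. The sketch below is in Python (not SAS); describe is a hypothetical helper name, and the SE uses the sample SD in place of the population SD.

```python
import math
import statistics

def describe(data):
    """Return mean, median, mode(s), sample SD, SE and CV (%) of a list of numbers."""
    n = len(data)
    mean = statistics.mean(data)
    sd = statistics.stdev(data)  # divides by n - 1
    return {
        "mean": mean,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),
        "sd": sd,
        "se": sd / math.sqrt(n),
        "cv": sd / mean * 100,
    }

weights = describe([2, 4, 6, 8, 10])                            # kg
blood_sugar = describe([120, 140, 120, 118, 125, 140, 90, 89])  # mg/dL
for name, value in blood_sugar.items():
    print(name, value)
```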

    Visual Data

    Descriptive data can be plotted in various ways to show their distribution. Some visual ways to summarize data:

    Tables

    Graphs: histograms, pie/bar charts, etc.

    One table or figure is worth a thousand words!

    Histogram

    It is a type of vertical bar graph with no spaces between the bars. The bars touch each other to show that all values of the data are accounted for.

    Histograms provide a view of the data density. Higher bars show where the data are relatively more common (higher frequency).

    It is used to depict frequency distributions of continuous data visually and is especially convenient for describing the shape of the data distribution.

    [Figure: four bar charts comparing INITIAL and FINAL counts of Score 0, Score 1, Score 2, and Score 3 (y-axis 0 to 25). Panels: Ability 1: To design a reliable experiment that solves the problem; Ability 2: To choose a productive mathematical procedure; Ability 3: To communicate details of the experiment completely; Ability 4: To evaluate the effects of experimental uncertainties]

    [Figure: Cigarette consumption between 1900 and 1990 (y-axis: cigarette consumption, 0 to 4000; x-axis: years 1900 to 1990). Scott Evans, Ph.D., Lynne Peeples, M.S.]

    Frequency Distributions

    Frequency can be displayed graphically using a histogram, which is a vertical bar graph.

    Frequency tells how many of the data values fall into each class interval.

    Data can be grouped into a set of non-overlapping, contiguous intervals called class intervals, which are used to sort the data.

    The normal distribution is a theoretical frequency distribution.

    Frequency distributions are often depicted by a histogram.

    [Figure: histogram of the height of freshmen at the Biotechnology Department, TDTU, with height grouped into 5 cm intervals]

    Frequency distribution for Bacterial Cell Lengths

    Below are the measured lengths (µm) of 30 individual bacterial cells. As they have not yet been sorted into an ordered list, they can be considered raw data.

     1) 1.5    2) 2.0    3) 2.0    4) 3.0    5) 2.0
     6) 3.2    7) 2.3    8) 1.5    9) 2.0   10) 2.0
    11) 1.0   12) 1.0   13) 2.5   14) 3.4   15) 2.1
    16) 2.0   17) 4.0   18) 3.0   19) 2.0   20) 2.0
    21) 2.2   22) 2.0   23) 2.0   24) 2.0   25) 2.0
    26) 1.5   27) 2.0   28) 1.0   29) 1.0   30) 1.0
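    Turning the 30 raw cell lengths into a frequency distribution means choosing class intervals and counting the values in each. The sketch below is in Python (not SAS). The class width of 0.5 is an assumption made here; the slide does not state one.

```python
from collections import Counter

lengths = [1.5, 2.0, 2.0, 3.0, 2.0, 3.2, 2.3, 1.5, 2.0, 2.0,
           1.0, 1.0, 2.5, 3.4, 2.1, 2.0, 4.0, 3.0, 2.0, 2.0,
           2.2, 2.0, 2.0, 2.0, 2.0, 1.5, 2.0, 1.0, 1.0, 1.0]

WIDTH_TENTHS = 5  # class width 0.5, expressed in tenths (assumed choice)

# Work in integer tenths so that boundary values such as 2.0 are binned exactly.
counts = Counter((round(x * 10) // WIDTH_TENTHS) * WIDTH_TENTHS for x in lengths)

# Print a text histogram: one row per class interval, lower bound included.
for lower in range(min(counts), max(counts) + 1, WIDTH_TENTHS):
    upper = lower + WIDTH_TENTHS
    freq = counts[lower]
    print(f"{lower / 10:.1f} to <{upper / 10:.1f}: {freq:2d} {'*' * freq}")
```

    The intervals are non-overlapping and contiguous, so every one of the 30 values is counted exactly once.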

    [Figure: forced expiratory volume (FEV, litres), a measure of lung function, of 57 male medical students]

    The normal distribution is a theoretical frequency distribution described by a mean and a variance.

    Normal distribution (ND)

    It is named after Carl Friedrich Gauss and is therefore also called the Gaussian distribution.

    In large samples, many outcomes governed by chance follow an ND, such as the weight of seeds or humans and human height.

    It is a bell-shaped curve in which the mean ($\mu$) is the average outcome and the standard deviation ($\sigma$) measures the spread. It is symmetrical about its mean: $X \sim N(\mu, \sigma^2)$

    Mean, median, and mode are all equal.

    Density function of the normal distribution

    [Figure: banknotes; labels: EURO, German (Deutsche) Mark, 10 dollars]

    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right]$
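    The density function of the normal distribution, f(x) = 1 / (sigma * sqrt(2 * pi)) * exp(-((x - mu) / sigma)^2 / 2), can be evaluated directly. The Python sketch below (not SAS) does so; the mean of 70 and SD of 10 are example values chosen for illustration.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# Example values (mean 70, SD 10) chosen for illustration.
peak = normal_pdf(70, 70, 10)    # density at the mean
left = normal_pdf(60, 70, 10)    # one SD below the mean
right = normal_pdf(80, 70, 10)   # one SD above the mean
print(peak, left, right)
```

    The density is highest at the mean and takes the same value at equal distances on either side of it, which is the symmetry of the bell curve.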

    [Figure: histogram with an overlying normal curve; the mean is marked]

    The highest point of the overlying normal curve is at the mean

    Mean = median = mode

    ND is symmetrical about its mean

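    [Figure: normal curve marked at $\mu$, $\mu - 2\sigma$, and $\mu + 2\sigma$, with 2.5% labelled in each tail]

    Human height follows an ND

    [Figure: distribution of height among 5000 British women]

    $X \sim N(\mu, \sigma^2)$

    About 68.3% of the area under a normal curve is within 1$\sigma$ of the mean, 95.4% within 2$\sigma$, and 99.7% within 3$\sigma$. (Note: data that fall outside 3$\sigma$ are outliers or extreme values.)

    Suppose the mean ($\mu$) and SD ($\sigma$) of the scores in class A were 70 and 10 points, respectively. Then scores < 40 and > 100 points are outliers.

    [Figure: normal curve of the class A scores with axis ticks at 40, 50, 60, 70, 80, 90, and 100 points]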
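    The areas within 1, 2, and 3 SD of the mean can be computed from the error function, since for a normal distribution the fraction within k SD is erf(k / sqrt(2)). The Python sketch below (not SAS) also works out the 3-SD outlier limits for the class A example.

```python
import math

def area_within(k):
    """Fraction of a normal distribution lying within k SDs of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {100 * area_within(k):.1f}%")

# Class A example: mean 70 and SD 10, so the 3-SD limits are 40 and 100 points.
mu, sigma = 70, 10
lower, upper = mu - 3 * sigma, mu + 3 * sigma
print(f"outliers: below {lower} or above {upper} points")
```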

    [Figure: normal curves A, B, and C with $\sigma_A = 2$, $\sigma_B = 2$, $\sigma_C = 1$ and $\mu_A = 1$, $\mu_B = \mu_C = 6$]

    $\mu$ = location; $\sigma$ = shape (spread)

    Treatments A, B, and C have the same mean but different variances.

    The variance of a distribution measures the spread of the distribution around the same mean.

    [Figure: normal curves for treatments A, B, and C with spread parameters 0.5, 1, and 2, respectively]

    Location & Shape

    $\mu$ = location; $\sigma$ = shape (spread)

    [Figure: normal curves A, B, C, and D]

    The larger the SD, the more variable the data and the less precise the estimate of $\mu$. The smaller the SD, the less variable the data and the more precise the estimate of $\mu$.

    [Figure: an ND with less variation (due to a smaller standard deviation) and an ND with higher variation (due to a larger standard deviation)]

    A wide spread corresponds to a higher SD; a narrow spread corresponds to a lower SD.

    $\mu_{\bar{x}} = \mu$

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

    The mean of the sampling distribution equals the population mean.

    $X \sim N(\mu, \sigma^2)$

    $\bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

    [Figure labels: sampling distribution; ND; bar chart]


    Are you still awake?

    Are you aware?

    [Q]: There were 42 senior students who took my Horticulture Biotechnology class last year. The final grades of the students in this course were: 74, 86, 65, 89, 98, 82, 84, 62, 56, 94, 77, 82, 79, 75, 84, 68, 90, 82, 73, 77, 85, 92, 76, 99, 62, 73, 87, 95, 67, 67, 83, 82, 73, 67, 64, 95, 68, 70, 82, 89, 61, and 63 points.

    Please calculate the mean, mode, median, variance, standard deviation, standard error, and coefficient of variation, and draw a pie chart and a histogram with a normal curve.

    [Figure: SAS screen showing the Log window and the Editor window, where SAS programs are written]

    data biotech;
      input grade @@;
      cards;
    74 86 65 89 98 82 84 62 56 94 77 82 79 75
    84 68 90 82 73 77 85 92 76 99 62 73 87 95
    67 67 83 82 73 67 64 95 68 70 82 89 61 63
    ;
    proc univariate;
      var grade;
      histogram grade / normal;
    run;
    quit;

    In the Editor window, the program can also start with the following three lines:

        dm 'clear log';
        dm 'clear output';
        title 'Assignment 1 Q 2';

    followed by the same data step and proc univariate step as before. With these lines, SAS clears whatever was left over in the previous Log and Output windows and puts a title on the output.

    Personally, I do NOT like these 3 lines: keep a program as short and simple as possible!

    data biotech;
      input grade @@;
      cards;
    74 86 65 89 98 82 84 62 56 94 77 82 79 75
    84 68 90 82 73 77 85 92 76 99 62 73 87 95
    67 67 83 82 73 67 64 95 68 70 82 89 61 63
    ;
    proc univariate;
      var grade;
      histogram grade / normal;
    run;
    quit;

    cards; can also be written as datalines;

    A semicolon (;) ends each command and the block of data entries.

    @@ lets SAS keep reading values along each line until it reaches the final semicolon (;). ALWAYS use two @ signs (@@), not one @.

    Run (submit) the program above.

    Without @@, each value must be entered on its own line:

    data biotech;
      input grade;
      cards;
    74
    86
    65
    89
    :
    :
    89
    61
    63
    ;
    proc univariate;
      var grade;
      histogram grade / normal;
    run;
    quit;

    If no @ or only one @ is used here, SAS reads only one value per line. However, @@ can continuously read all the values on a line until it reaches the semicolon (;). ALWAYS use two @ signs (@@) to save space!

    SAS code and commands

    proc univariate is a procedure that provides summary statistics on any quantitative variable.

    var grade requests that SAS perform PROC UNIVARIATE only on the variable grade. If you do not specify which variables to analyze, SAS will perform PROC UNIVARIATE on every numeric variable, which generates a lot of unnecessary output.

    histogram grade requests a histogram for the variable grade. The option after the / in the histogram statement (normal) requests that a normal curve be drawn over the histogram.

    Assumptions of proc univariate

    The observations are random samples drawn from normally distributed populations. This can be tested using the UNIVARIATE procedure.

    If the normality assumption is not satisfied, then use the NPAR1WAY procedure with the WILCOXON option. Non-parametric statistics will not be taught in this training course!

    Output window

    Assignment 3 Q 4

    The UNIVARIATE Procedure
    Variable: grade

    N                  42            Sum Weights         42
    Mean               78.0238095    Sum Observations    3277
    Std Deviation      11.3018196    Variance            127.731127
    Skewness           0.0240368     Kurtosis            -0.9315826
    Uncorrected SS     260921        Corrected SS        5236.97619
    Coeff Variation    14.485090     Std Error Mean      1.74390863

    Median             78.00000      Mode                82.00000
    Range              43.00000      Interquartile Range 18.00000

    Quantile       Estimate
    100% Max       99
    99%            99
    95%            95
    90%            94
    75% Q3         86
    50% Median     78
    25% Q1         68
    10%            63
    5%             62
    1%             56
    0% Min         56

    Extreme Observations
    ----Lowest----     ----Highest---
    Value    Obs       Value    Obs
    56       9         94       10
    61       41        95       28
    62       25        95       36
    62       8         98       5
    63       42        99       24

    Median: the central point of the sample. Half of the sample falls below the median and half falls above it.
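    The main figures in the PROC UNIVARIATE output can be cross-checked outside SAS. The sketch below uses Python's standard library (an illustration, not part of the SAS workflow) on the same 42 grades; the dictionary keys simply mirror the SAS labels.

```python
import math
import statistics

grades = [74, 86, 65, 89, 98, 82, 84, 62, 56, 94, 77, 82, 79, 75,
          84, 68, 90, 82, 73, 77, 85, 92, 76, 99, 62, 73, 87, 95,
          67, 67, 83, 82, 73, 67, 64, 95, 68, 70, 82, 89, 61, 63]

n = len(grades)
mean = statistics.mean(grades)
variance = statistics.variance(grades)  # corrected SS / (n - 1)
sd = math.sqrt(variance)
summary = {
    "N": n,
    "Sum Observations": sum(grades),
    "Mean": mean,
    "Variance": variance,
    "Std Deviation": sd,
    "Std Error Mean": sd / math.sqrt(n),
    "Coeff Variation": sd / mean * 100,
    "Median": statistics.median(grades),
    "Mode": statistics.mode(grades),
    "Range": max(grades) - min(grades),
}
for name, value in summary.items():
    print(f"{name:18s} {value}")
```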

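                Population    Sample
    Mean        $\mu$         $\bar{x}$
    SD          $\sigma$      s

    Standard error (SE)

    SE = population standard deviation (SD, $\sigma$) divided by the square root of the sample size. It is the standard error ($\sigma_{\bar{x}}$) of the sample mean ($\bar{x}$).

    $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = 11.30 / \sqrt{42} = 1.74$ points

    Coefficient of variation (CV)

    CV = $(s / \bar{x}) \times 100\%$ = (11.30 / 78.02) x 100% = 14.49%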

    [Figure: histogram of grade with a fitted normal curve (x-axis: grade, class midpoints 56, 64, 72, 80, 88, 96; y-axis: Percent, 0 to 30)]

    Notice that the mean (78.02) is almost equal to the median (78.0), which suggests that the data are roughly symmetric, as expected for a normal distribution.

    data biotech;
      input grade @@;
      cards;
    74 86 65 89 98 82 84 62 56 94 77 82 79 75
    84 68 90 82 73 77 85 92 76 99 62 73 87 95
    67 67 83 82 73 67 64 95 68 70 82 89 61 63
    ;
    proc gchart data=biotech;
      pie3d grade / percent=arrow;
    run;
    quit;

    With the same data step, the histogram and the pie chart can be requested in one program:

    proc univariate;
      var grade;
      histogram grade;
    proc gchart data=biotech;
      pie grade;
    run;
    quit;

    [Figure: pie chart, FREQUENCY of grade]

    Midpoint    Frequency    Percent
    56          1            2.38%
    64          9            21.43%
    72          8            19.05%
    80          10           23.81%
    88          8            19.05%
    96          6            14.29%
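    The pie-chart frequencies can be reproduced by grouping the 42 grades into class intervals around the midpoints 56, 64, ..., 96. The Python sketch below (not SAS) assumes each interval runs from midpoint - 4 up to, but not including, midpoint + 4; that boundary convention is an assumption made here, chosen because it is consistent with the counts in the chart.

```python
from collections import Counter

grades = [74, 86, 65, 89, 98, 82, 84, 62, 56, 94, 77, 82, 79, 75,
          84, 68, 90, 82, 73, 77, 85, 92, 76, 99, 62, 73, 87, 95,
          67, 67, 83, 82, 73, 67, 64, 95, 68, 70, 82, 89, 61, 63]

WIDTH = 8            # class width implied by the midpoints 56, 64, ..., 96
FIRST_MIDPOINT = 56

def midpoint(x):
    """Midpoint of the class interval [m - 4, m + 4) that contains x."""
    lowest_bound = FIRST_MIDPOINT - WIDTH // 2
    return FIRST_MIDPOINT + ((x - lowest_bound) // WIDTH) * WIDTH

counts = Counter(midpoint(g) for g in grades)
for m in sorted(counts):
    print(f"{m}: {counts[m]:2d}  {100 * counts[m] / len(grades):.2f}%")
```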

    data biotech;
      infile 'a:grade.1';
      input grade @@;
    proc univariate;
      var grade;
    run;

    If you have saved the data on a drive as grade.1, you can use the infile statement to read the saved data instead of typing the values after cards;.

    http://www.uri.edu/sasdoc

    http://support.sas.com/documentation/onlinedoc/sas9doc.html


    Source: http://www.okstate.edu/sas/v8/saspdf/stat/chap67.pdf


    For more practice, please read the questions in the Assignments!

    Please ask me questions!

    [Practice 1]:

    (1) Please calculate the mean, mode, median, variance, SD, SE, and CV of the data 6, 1, 8, 4, 7, 7, 8, 5, 1, and 28 kg.

    (2) Is the value 28 an outlier?
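    For part (2), one possible approach is the rule used earlier in this lesson: values more than 3 SD from the mean are treated as outliers. The Python sketch below (not SAS) computes how many SDs the value 28 lies from the mean. Note that the suspect value itself inflates the SD, so other outlier rules may judge it differently.

```python
import statistics

data = [6, 1, 8, 4, 7, 7, 8, 5, 1, 28]  # kg

mean = statistics.mean(data)
sd = statistics.stdev(data)   # sample SD (divides by n - 1)
z_28 = (28 - mean) / sd       # how many SDs the value 28 lies above the mean
print(f"mean = {mean} kg, SD = {sd:.2f} kg, z(28) = {z_28:.2f}")
```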

    Practice 2: The sugar content (mg/g fresh weight) of 25 oranges from Lin's garden was measured using a sweetness meter, with the following data:

    1, 5, 7, 10, 11, 12, 15, 8, 9, 11, 10, 9, 11, 16, 15, 6, 14, 11, 7, 8, 10, 12, 17, 21, and 13.

    Calculate the mean, mode, median, variance, standard deviation, standard error, and coefficient of variation, and draw a pie chart and a histogram with a normal curve.

    How to install SAS?

    Biostatistics with SAS (Statistical Analysis System/Software; Strategic Application Software)

    1966: founded by Dr. A.J. Barr and Jim Goodnight at North Carolina State University, USA

    1976: SAS Institute Inc., Cary, NC, USA

    http://v8doc.sas.com/sashtml/
    http://support.sas.com/documentation/onlinedoc/v82/whatsnew.html

    SAS v8.2 has 5 important windows

    Editor - for writing and submitting SAS jobs; the programming is done here.
    Log - your session log, which informs you of the current status. Messages appear here to tell you how things are going. Notes and error messages also appear here.
    Output - where results are displayed.
    Results - stores objects created by SAS, allowing for easier searching of SAS output.
    Explorer - a view of the SAS files, data, and graphs.

    Always smiling !

    Cheers !


    Please review: Hypothesis testing; ANOVA

