Date post: 19-Dec-2015
Page 1: Biostatistical Methods II PubH 6415 Spring 2007. 2 PubH 6415 – Biostatistics I Instructor: Susan Telke email: susant@biostat.umn.edu (office hours: lecture.

Biostatistical Methods II
PubH 6415

Spring 2007

Page 2:

PubH 6415 – Biostatistics I

Instructor: Susan Telke
email: susant@biostat.umn.edu (office hours: lecture hall or by appointment; location: A349 Mayo building)

Teaching Assistants:
Fang Liu (email address at biostat.umn.edu)
Katie Schomaker (email address at biostat.umn.edu)

Page 3:

Books for 6415

Textbook: Introductory Biostatistics (Chap T. Le), Wiley

SAS books (highly recommended):
The Little SAS Book – Delwiche and Slaughter
Applied Statistics and the SAS Programming Language – Cody and Smith

Page 4:

Web Page

http://www.biostat.umn.edu/~susant/

Information on the web:
1. General class information
2. Syllabus
3. Course notes (updated weekly)
4. Homework
5. Computer Help: How to access SAS!
6. In Class Data Sets – More SAS examples

Page 5:

Computer Labs

Mayo D199 (Classroom & Lab)
The Teaching Assistants will hold lab sessions in this classroom before and after Wednesday's class.

Diehl Hall (Medical Library)
Coffman Union
Carlson School of Management
School of Public Health Lounge (Mayo)

Page 6:

SAS

The primary computing environment will be PC SAS.
PC SAS can be purchased at the bookstore (a one-year agreement is about $150).
www.umn.edu/adcs/software

OR

SAS (not PC SAS) is available using the UNIX version of SAS by SSH to the biostat workstation saturn. Instructions for use are on the course website.

Page 7:

Exams and Homework

There will be weekly homework assignments.
There will be two midterms and one final exam.
The midterms account for 25% each and the final accounts for 30% of the course grade.
The remaining 20% is based on homework (best 10).
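As a quick check of the weights (25% + 25% + 30% + 20% = 100%), here is a small sketch of the grade arithmetic as a one-observation DATA step. The four scores are made-up numbers, and `homework` is assumed to already be the percentage over the best 10 assignments.

```sas
/* Hypothetical scores (percent) for one student. All four numbers are invented. */
DATA grade;
  midterm1 = 80;
  midterm2 = 90;
  final    = 85;
  homework = 95;   /* assumed: percentage over the best 10 homeworks */

  /* Weights from the slide: 25% per midterm, 30% final, 20% homework */
  course = 0.25*midterm1 + 0.25*midterm2 + 0.30*final + 0.20*homework;

  PUT course=;     /* writes the result to the log */
RUN;
```

With these numbers the log should show course=87 (20 + 22.5 + 25.5 + 19).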

Page 8:

Course objectives:

Write and run simple SAS programs to perform common analyses.
Analyze health science data using basic statistical and inferential techniques.
Understand statistical methods as commonly presented in public health literature.

Page 9:

Topics Covered

T-tests (review)
One Factor ANOVA / Two Factor ANOVA
Linear regression
Logistic regression (plus Poisson)
Survival analyses
Proportional Hazards
Sample Size Determination (if time allows)

SAS programming to do the above analyses

Page 10:

SAS Usage

SAS is the world's largest privately held software company.
40,000 customer sites worldwide
3.5 million users worldwide
90% of Fortune 500 companies use SAS
Nearly all analyses in medical research publications use SAS
SAS invests extensive resources in R&D.

Page 11:

What is SAS?

SAS is a programming language that reads, processes, and performs statistical analyses of data.

A SAS program is made up of programming statements which SAS interprets to do the above functions.

Page 12:

Raw Data

Data Step:
Read in Data
Process Data (Create new variables)
Output Data (Create SAS Dataset)

PROCs:
Analyze Data Using Statistical Procedures

Page 13:

Structure of Data

Made up of rows and columns
Rows in SAS are called observations
Columns in SAS are called variables

An observation is all the information for one entity (patient, patient visit, clinical center, county).
SAS processes data one observation at a time.
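To see "one observation at a time" in action, the sketch below prints a line to the log on each pass through the DATA step. The dataset name `walk` and the three data lines are made up for illustration; `_N_` is the automatic variable SAS uses to count DATA step iterations.

```sas
/* Illustrative data only: three observations, two variables */
DATA walk;
  INPUT id age;
  /* Runs once per observation, so the log gets one line per row */
  PUT 'Processing observation ' _N_ id= age=;
  DATALINES;
1 23
2 35
3 41
;
RUN;
```

The log should contain three "Processing observation" lines, one for each row read.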

Page 14:

Example of Data

12 observations and 5 variables

F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN

Variables, in column order:
• Gender
• Age
• Marital status
• Number of credits
• State of residence

Page 15:

* This is a short example program to demonstrate what a
  SAS program looks like. This is a comment statement because
  it begins with a * and ends with a semi-colon ;

DATA demo;
  INPUT gender $ age marstat $ credits state $ ;

  if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
  if state = 'MN' then resid = 'Y'; else resid = 'N';
DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;

TITLE 'Running the Example Program';
PROC PRINT DATA=DEMO ;
  VAR gender age marstat credits fulltime state ;
RUN;

Page 16:

Rules for SAS Statements and Variables

SAS statements end with a semicolon (;)
SAS statements can be entered in lower or uppercase
Multiple SAS statements can appear on one line
A SAS statement can use multiple lines
Variable names can be from 1-32 characters and begin with A-Z or an underscore (_)
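The rules above can be seen in one short program. This is only a sketch: the dataset name `rules`, the variable names, and the two data lines are invented for illustration.

```sas
/* Lowercase keywords, and two statements (data, input) starting on one line. */
/* The input statement then continues over several lines until its semicolon. */
data rules; input ID
                  Age
                  _score ;   /* a variable name may begin with an underscore */
DATALINES;
1 23 88
2 35 92
;
RUN;

/* Two complete statements on a single line */
proc print data=rules; run;
```

SAS reads statements up to each semicolon, so the line layout is a matter of readability rather than syntax.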

Page 17:

DATA demo;    /* Create a SAS dataset called demo */
INPUT gender $ age marstat $ credits state $ ;    /* What are the variables */

if credits > 12 then fulltime = 'Y'; else fulltime = 'N';

if state = 'MN' then resid = 'Y'; else resid = 'N';

The last two statements create 2 new variables (fulltime and resid, both character).

Page 18:

DATALINES;    /* Tells SAS the data is coming */
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;             /* Tells SAS the data is ending */

RUN;          /* Tells SAS to run the statements */

Page 19:

Types of Data

Numeric (e.g. age, blood pressure)
Character (e.g. patient name, ID, diagnosis)
Each type is treated differently by SAS
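A small sketch of how the two types behave differently. The dataset `types` and its values are hypothetical. The `$` after `id` on the INPUT statement marks it as character, which is why the leading zeros survive.

```sas
/* Illustrative data: id is read as character, age as numeric */
DATA types;
  INPUT id $ age;
  age_next = age + 1;        /* arithmetic applies to numeric variables */
  id_label = 'ID-' || id;    /* concatenation applies to character variables */
  DATALINES;
007 23
012 35
;
RUN;

/* Lists each variable with its type (Num or Char) and length */
PROC CONTENTS DATA=types;
RUN;
```

Reading `id` without the `$` would store 007 as the number 7, which is usually not what is wanted for an identifier.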

Page 20:

TITLE 'Running the Example Program';

PROC PRINT DATA=demo ;
  VAR gender age marstat credits fulltime state ;
RUN;

* You can run additional procedures;
PROC MEANS DATA=demo ;
  VAR age credits ;
RUN;

PROC FREQ DATA=demo ;
  TABLES gender ;
RUN;

Page 21:

Files Generated When SAS Program is Submitted

Log file – a text file listing program statements processed and giving notes, warnings and errors.
(in UNIX the file will be named filename.log)

Always look at the log file! It tells how SAS understood your program.

Output file – a text file giving the output generated from the PROCs.
(in UNIX the file will be named filename.lst)

Page 22:

Messages in SAS Log

Notes – messages that may or may not be important
Warnings – messages that are usually important
Errors – fatal, in that the program will abort

(notes and warnings will not abort your program)
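A note can matter a great deal. In the sketch below (dataset and values made up), a misspelled variable name does not stop the program, but SAS should report in the log that the variable `credit` is uninitialized and that missing values were generated.

```sas
/* Deliberate typo: credit instead of credits */
DATA check;
  INPUT age credits;
  ratio = credit / age;   /* credit was never read in, so ratio is missing */
  DATALINES;
23 15
21 12
;
RUN;
```

The program runs to completion and creates WORK.CHECK, yet every value of `ratio` is missing. Only the notes in the log reveal the problem.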

Page 23:

LOG FILE

NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
      Licensed to UNIVERSITY OF MINNESOTA, Site 0009012001.
NOTE: This session is executing on the WIN_NT platform.

NOTE: SAS initialization used:
      real time           7.51 seconds
      cpu time            0.89 seconds

1    * This is a short example program to demonstrate what a
2    SAS program looks like. This is a comment statement because
3    it begins with a * and ends with a semi-colon ;
4
5    DATA demo;
6    INFILE DATALINES;
7    INPUT gender $ age marstat $ credits state $ ;
8
9    if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
10   if state = 'MN' then resid = 'Y'; else resid = 'N';
11   DATALINES;

NOTE: The data set WORK.DEMO has 12 observations and 7 variables.
NOTE: DATA statement used:
      real time           0.38 seconds
      cpu time            0.06 seconds

Page 24:

25   RUN;
26   TITLE 'Running the Example Program';
27   PROC PRINT DATA=demo ;
28   VAR gender age marstat credits fulltime state ;
29   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE PRINT used:
      real time           0.19 seconds
      cpu time            0.02 seconds

30   PROC MEANS DATA=demo N SUM MEAN;
31   VAR age credits ;
32   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE MEANS used:
      real time           0.25 seconds
      cpu time            0.03 seconds

33   PROC FREQ DATA=demo; TABLES gender;
34   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE FREQ used:
      real time           0.15 seconds
      cpu time            0.03 seconds

