
Public Health 5415 Biostatistical Methods II

Spring 2005

Greg Grandits

[email protected].

612-626-9033

Class Times

Monday 10:10am-12:05pm

Wednesday 10:10am-11:00am

Course objectives:

• Write and run simple SAS programs to perform common analyses.

• Analyze health science data using basic statistical and inferential techniques.

• Understand statistical methods as commonly presented in the public health literature.

Topics Covered

• Linear regression

• Logistic regression

• Life-table analyses

• Cox regression

• Relative risk, odds ratio, and hazard ratio estimation

• SAS programming to perform the above analyses

SAS Usage

• SAS is the world's largest privately held software company

• 40,000 customer sites worldwide

• 3.5 million users worldwide

• 90% of Fortune 500 companies use SAS

• Nearly all analyses in published medical research are done with SAS

• SAS invests extensive resources in R&D

Why SAS?

• It is widely used

– Industry, government, and academia

• It is very powerful

– Programming language

– Sophisticated analyses (better than Excel)

JAMA January 12, 2005

Meat Consumption and Risk of Colorectal Cancer, Chao

Colon and rectal cancer incidence rate ratios (RRs) and 95% CIs by meat intake were estimated using Cox proportional hazards regression modeling. P values for linear trend were estimated by modeling meat intake (g/wk) using the median value within quintiles; these results were similar when modeled as continuous variables.

All P values were 2-sided and considered significant at P<.05. All analyses were conducted using SAS version 9.0 (SAS Institute Inc, Cary, NC).

Consumption of Veg/Fruits and Risk of Breast Cancer

All analyses were performed using SAS version 8 (SAS Institute Inc, Cary, NC). All tests were 2-sided with an α of .05.

JAMA January 12, 2005

Fasting Serum Glucose Level and Cancer Risk in Korean Men and Women

Age-adjusted death and cancer incidence rates were calculated for each category of fasting serum glucose level and directly standardized to the age distribution of the 1995 Korean national population. All analyses were stratified by sex.

All analyses were conducted using SAS statistical software, version 8.0 (SAS Institute Inc, Cary, NC).

Details

http://www.biostat.umn.edu/~greg-g/ph5415.html

– Homework, readings, programs, data files

– Class slides

– Lab/office hours: 4 hours per week (TA or instructor)

Details

Textbooks:

Applied Statistics and the SAS Programming Language, RP Cody and JK Smith

(Read Chapter 1 for next week)

Introductory Biostatistics, CT Le

The Little SAS Book, LD Delwiche and SJ Slaughter

(Chapter 1 available on website)

Grading

Homework - 30% (late homework receives half credit and is accepted no later than 2 weeks after the due date)

Two tests - 30% each

Short project - 10%

No final exam
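The four weights above add up to 100% (30 + 30 + 30 + 10). A minimal Python sketch of the weighted-average arithmetic follows; it assumes every component is scored on a 0-100 scale, reads "no later than 2 weeks" as up to 14 days late, and uses made-up scores. The names `WEIGHTS`, `homework_credit` and `course_grade` are hypothetical and not part of the course materials.

```python
# Hypothetical sketch of the grading arithmetic (not official course code).
WEIGHTS = {"homework": 0.30, "test1": 0.30, "test2": 0.30, "project": 0.10}

def homework_credit(score: float, days_late: int) -> float:
    """Credit for one homework scored 0-100 (assumed: half credit if late, none after 14 days)."""
    if days_late <= 0:
        return score
    if days_late <= 14:
        return score / 2
    return 0.0  # not accepted more than 2 weeks after the due date

def course_grade(scores: dict) -> float:
    """Weighted average of component scores, each on a 0-100 scale."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS)

# Made-up example: one on-time homework (90) and one 3 days late (80 -> 40).
hw = [homework_credit(90, 0), homework_credit(80, 3)]
scores = {"homework": sum(hw) / len(hw), "test1": 85, "test2": 75, "project": 100}
print(round(course_grade(scores), 2))  # 77.5
```

Here the homework average is (90 + 40) / 2 = 65, so the grade is 0.3(65) + 0.3(85) + 0.3(75) + 0.1(100) = 77.5.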

Using SAS

SAS is available in several ways:

• In the Mayo A-269 (TRC) lab

• On other PCs with SAS installed

• From the biostatistics UNIX computer via telnet

• By purchase from the University: 152 Shepherd Labs (ADCS), 612-625-1300, $150 per year

What is SAS ?

• SAS is a programming language that reads, processes, and performs statistical analyses of data.

• A SAS program is made up of programming statements, which SAS interprets to perform these functions.

Flow of a SAS program:

Raw data

→ Data step: read in data; process data (create new variables); output data (create a SAS dataset)

→ PROCs: analyze data using statistical procedures

Structure of Data

• Made up of rows and columns

• Rows in SAS are called observations

• Columns in SAS are called variables

An observation is all the information for one entity (patient, patient visit, clinical center, county)

SAS processes data one observation at a time

Example of Data

12 observations and 5 variables

F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN

Variables, in order:

• Gender

• Age

• Marital status

• Number of credits

• State of residence
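These records are in list (free-format) style: fields are separated by blanks and read in the order the variables are named. The Python sketch below (not SAS) mirrors that reading to confirm the 12 × 5 shape. As with the `$` markers in the SAS INPUT statement used in the program that follows, gender, marital status and state stay as text while age and credits become numbers, so `09` is read as 9. The names `RAW`, `VARIABLES` and `read_list_input` are illustrative assumptions.

```python
# Python sketch (not SAS) mirroring list input of the 12 example records.
RAW = """\
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
"""

# (name, is_character) pairs, in the same order as "gender $ age marstat $ credits state $"
VARIABLES = [("gender", True), ("age", False), ("marstat", True),
             ("credits", False), ("state", True)]

def read_list_input(text):
    """Return one dict per non-blank line; numeric fields are converted to int."""
    observations = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        obs = {}
        for (name, is_char), value in zip(VARIABLES, fields):
            obs[name] = value if is_char else int(value)
        observations.append(obs)
    return observations

demo = read_list_input(RAW)
print(len(demo), len(demo[0]))  # 12 5
print(demo[2])  # {'gender': 'F', 'age': 22, 'marstat': 'S', 'credits': 9, 'state': 'MN'}
```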

* This is a short example program to demonstrate what a SAS program looks like.
  This is a comment statement because it begins with a * and ends with a semicolon ;

DATA demo;
    INPUT gender $ age marstat $ credits state $ ;
    if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
    if state = 'MN' then resid = 'Y'; else resid = 'N';
    DATALINES;
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;
RUN;

TITLE 'Running the Example Program';
PROC PRINT DATA=demo ;
    VAR gender age marstat credits fulltime state ;
RUN;

Rules for SAS Statements and Variables

• SAS statements end with a semicolon (;)

• SAS statements can be entered in lowercase or uppercase

• Multiple SAS statements can appear on one line

• A SAS statement can span multiple lines

• Variable names can be 1-32 characters long and must begin with a letter (A-Z) or an underscore (_)
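The naming rule above can be checked mechanically. This Python sketch (not SAS) assumes the standard rule in effect under the SAS system option VALIDVARNAME=V7: 1-32 characters, the first a letter or underscore, and the rest letters, digits, or underscores (no blanks). The names `SAS_NAME` and `is_valid_sas_name` are hypothetical.

```python
import re

# Sketch of the SAS variable-name rule (assumes VALIDVARNAME=V7):
# 1-32 characters; first is a letter or underscore; the rest are letters, digits, or underscores.
SAS_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,31}")

def is_valid_sas_name(name: str) -> bool:
    """Return True if name fits the V7 naming rule."""
    return SAS_NAME.fullmatch(name) is not None

for candidate in ["gender", "_temp", "credits2", "2credits", "full time", "x" * 33]:
    print(repr(candidate[:12]), is_valid_sas_name(candidate))
```

Of these candidates, `2credits` fails because it starts with a digit, `full time` fails because of the blank, and the 33-character name fails on length.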

1  DATA demo;   ← creates a SAS dataset called demo
2  INPUT gender $ age marstat $ credits state $ ;   ← names the variables to read
3  if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
4  if state = 'MN' then resid = 'Y'; else resid = 'N';

Statements 3 and 4 create 2 new variables.

5  DATALINES;   ← tells SAS the data is coming
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
;   ← tells SAS the data is ending
6  RUN;   ← tells SAS to run the statements
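Statements 3 and 4 are executed once per observation, because the DATA step processes the input one observation at a time. The Python sketch below (not SAS) imitates that loop and applies the same two IF/ELSE rules. It also shows why the log reports 7 variables: the 5 read in plus fulltime and resid. `RAW` and `add_flags` are illustrative names.

```python
# Python sketch (not SAS) of statements 3 and 4 applied one observation at a time.
RAW = """\
F 23 S 15 MN
F 21 S 15 WI
F 22 S 09 MN
F 35 M 02 MN
F 22 M 13 MN
F 25 S 13 WI
M 20 S 13 MN
M 26 M 15 WI
M 27 S 05 MN
M 23 S 14 IA
M 21 S 14 MN
M 29 M 15 MN
"""

def add_flags(obs):
    """Create fulltime and resid the way the two IF/ELSE statements do."""
    obs["fulltime"] = "Y" if obs["credits"] > 12 else "N"  # statement 3
    obs["resid"] = "Y" if obs["state"] == "MN" else "N"    # statement 4
    return obs

demo = []
for line in RAW.splitlines():  # one observation per pass, like the DATA step loop
    gender, age, marstat, credits, state = line.split()
    obs = {"gender": gender, "age": int(age), "marstat": marstat,
           "credits": int(credits), "state": state}
    demo.append(add_flags(obs))

print(len(demo), len(demo[0]))                  # 12 7
print(sum(o["fulltime"] == "Y" for o in demo))  # 9
print(sum(o["resid"] == "Y" for o in demo))     # 8
```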

Types of Data

• Numeric (e.g. age, blood pressure)

• Character (patient name, ID, diagnosis)

Each type is treated differently by SAS.
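One practical difference is ordering: character values are compared character by character, while numeric values are compared by magnitude. SAS likewise orders character values character by character (for example in PROC SORT), which is one reason credits is read as a number. The snippet below is plain Python, not SAS, and only illustrates the ordering idea.

```python
# Python sketch (not SAS): the same digits order differently as text and as numbers.
as_text = ("9", "12")
as_numbers = (9, 12)

print(as_text[0] > as_text[1])         # True: '9' sorts after '1' at the first character
print(as_numbers[0] > as_numbers[1])   # False: 9 is less than 12
print(sorted(["15", "9", "13", "2"]))  # ['13', '15', '2', '9']
print(sorted([15, 9, 13, 2]))          # [2, 9, 13, 15]
```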

TITLE 'Running the Example Program';

PROC PRINT DATA=demo ;
    VAR gender age marstat credits fulltime state ;
RUN;

* You can run additional procedures;
PROC MEANS DATA=demo N SUM MEAN ;
    VAR age credits ;
RUN;

PROC FREQ DATA=demo ;
    TABLES gender ;
RUN;

Files Generated When SAS Program is Submitted

• Log file – a text file that lists the program statements processed and gives notes, warnings, and errors (in UNIX the file will be named fname.log)

Always look at the log file! It tells you how SAS understood your program.

• Output file – a text file containing the output generated by the PROCs (in UNIX the file will be named fname.lst)

Messages in SAS Log

• Notes – messages that may or may not be important

• Warnings – messages that are usually important

• Errors – fatal, in that the program will abort

(Notes and warnings will not abort your program.)
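Because a few warnings or errors can be buried among many notes, it can help to tally them before reading the log in detail. The Python helper below is hypothetical and not part of SAS. It assumes each message starts at the beginning of a line, either as `NOTE:`, `WARNING:`, `ERROR:` or as a numbered form such as `ERROR 180-322:`, and it would miss messages that are indented. The sample lines are illustrative, not a real session log.

```python
import re
from collections import Counter

# Hypothetical helper (not a SAS feature): count message types in SAS log text.
MESSAGE = re.compile(r"^(NOTE|WARNING|ERROR)\b")

def tally_log(log_text: str) -> Counter:
    """Count lines that begin with NOTE, WARNING, or ERROR."""
    counts = Counter()
    for line in log_text.splitlines():
        match = MESSAGE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Illustrative sample lines only.
sample = """NOTE: The data set WORK.DEMO has 12 observations and 7 variables.
NOTE: DATA statement used:
WARNING: example warning text.
ERROR 180-322: Statement is not valid or it is used out of proper order.
"""
print(dict(tally_log(sample)))  # {'NOTE': 2, 'WARNING': 1, 'ERROR': 1}
```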

LOG FILE

NOTE: Copyright (c) 1999-2001 by SAS Institute Inc., Cary, NC, USA.
NOTE: SAS (r) Proprietary Software Release 8.2 (TS2M0)
      Licensed to UNIVERSITY OF MINNESOTA, Site 0009012001.
NOTE: This session is executing on the WIN_NT platform.

NOTE: SAS initialization used:
      real time           7.51 seconds
      cpu time            0.89 seconds

1    * This is a short example program to demonstrate what a
2    SAS program looks like. This is a comment statement because
3    it begins with a * and ends with a semi-colon ;
4
5    DATA demo;
6    INFILE DATALINES;
7    INPUT gender $ age marstat $ credits state $ ;
8
9    if credits > 12 then fulltime = 'Y'; else fulltime = 'N';
10   if state = 'MN' then resid = 'Y'; else resid = 'N';
11   DATALINES;

NOTE: The data set WORK.DEMO has 12 observations and 7 variables.
NOTE: DATA statement used:
      real time           0.38 seconds
      cpu time            0.06 seconds

25   RUN;
26   TITLE 'Running the Example Program';
27   PROC PRINT DATA=demo ;
28   VAR gender age marstat credits fulltime state ;
29   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE PRINT used:
      real time           0.19 seconds
      cpu time            0.02 seconds

30   PROC MEANS DATA=demo N SUM MEAN;
31   VAR age credits ;
32   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE MEANS used:
      real time           0.25 seconds
      cpu time            0.03 seconds

33   PROC FREQ DATA=demo; TABLES gender;
34   RUN;

NOTE: There were 12 observations read from the data set WORK.DEMO.
NOTE: PROCEDURE FREQ used:
      real time           0.15 seconds
      cpu time            0.03 seconds

LST FILE

Running the Example Program

Obs   gender   age   marstat   credits   fulltime   state
  1     F       23      S         15        Y         MN
  2     F       21      S         15        Y         WI
  3     F       22      S          9        N         MN
  4     F       35      M          2        N         MN
  5     F       22      M         13        Y         MN
  6     F       25      S         13        Y         WI
  7     M       20      S         13        Y         MN
  8     M       26      M         15        Y         WI
  9     M       27      S          5        N         MN
 10     M       23      S         14        Y         IA
 11     M       21      S         14        Y         MN
 12     M       29      M         15        Y         MN

The MEANS Procedure

Variable     N            Sum           Mean
----------------------------------------------
age         12    294.0000000     24.5000000
credits     12    143.0000000     11.9166667
----------------------------------------------
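The log shows this step was submitted as `PROC MEANS DATA=demo N SUM MEAN;`, which is why only N, Sum and Mean appear. Without a statistics list, PROC MEANS reports N, Mean, Std Dev, Minimum and Maximum by default. A quick Python cross-check (not SAS) of the three columns, using the age and credits values from the example data:

```python
from statistics import mean

# Python cross-check (not SAS) of the N, Sum and Mean columns.
age = [23, 21, 22, 35, 22, 25, 20, 26, 27, 23, 21, 29]
credits = [15, 15, 9, 2, 13, 13, 13, 15, 5, 14, 14, 15]

for name, values in [("age", age), ("credits", credits)]:
    # Same layout as the SAS listing: variable, N, Sum, Mean (7 decimals)
    print(f"{name:8s} {len(values):3d} {sum(values):14.7f} {mean(values):14.7f}")
```

The mean of credits is 143 / 12 = 11.91666..., which SAS rounds to 11.9166667 in the listing.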

The FREQ Procedure

                                   Cumulative    Cumulative
gender    Frequency     Percent     Frequency       Percent
-----------------------------------------------------------
F                 6       50.00             6         50.00
M                 6       50.00            12         100.0
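Each column of the one-way table follows from simple counting: Percent is the count divided by the total, and the cumulative columns are running sums down the rows. By default PROC FREQ lists levels in sorted order of their internal (unformatted) values. The Python sketch below (not SAS) rebuilds the four columns; `one_way_table` is a hypothetical name.

```python
from collections import Counter

# Python sketch (not SAS) of the four columns in a one-way frequency table.
gender = ["F"] * 6 + ["M"] * 6  # gender values of the 12 example observations

def one_way_table(values):
    """Return (level, frequency, percent, cumulative frequency, cumulative percent) rows."""
    counts = Counter(values)
    total = len(values)
    rows, cumulative = [], 0
    for level in sorted(counts):  # levels in sorted order, as in the listing
        cumulative += counts[level]
        rows.append((level, counts[level], 100 * counts[level] / total,
                     cumulative, 100 * cumulative / total))
    return rows

for row in one_way_table(gender):
    print("%-6s %9d %8.2f %10d %10.2f" % row)
```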

