Post on 23-Dec-2015

SAS for Categorical Data

Copyright © 2004 Leland Stanford Junior University. All rights reserved. Warning: This presentation is protected by copyright law and international treaties. Unauthorized reproduction of this presentation, or any portion of it, may result in severe civil and criminal penalties and will be prosecuted to the maximum extent possible under the law.

SAS

SAS is a huge integrated data management and analysis suite. It takes years to master 20% of SAS. Most people take weeks, if not months, to get comfortable working with it.

The course I teach has online slides which demonstrate how to do categorical data analyses as well as data management: http://www.stanford.edu/class/hrp223/

Topic 0 has information on using SAS as a calculator.

Using SAS

When you start SAS in a windowing environment you automatically have access to at least 4 windows.

The (enhanced) program editor is a place where you type instructions to SAS.

The log window gives you feedback on how SAS interprets your work.

The output window displays any printed results from your request.

There is also a two-paned window. One pane, Explorer, allows you to look at data sets. The other, Results, acts like a hyperlinked table of contents for the output window.

Telling SAS what to do.

You type instructions in the program editor and then push the run button.

The instructions you will use for this class will be data steps (to create data sets) and procedures (to analyze data sets).

Data steps

In data steps you can create variables (a variable is just like a box that can hold either numbers or letters). You can do math on variables, including using functions that are built into SAS.

data work.someData;
    theAnswer = 1 + 1;
run;

After you type the instructions you have to tell SAS to actually do the work. Push the running person icon to do this.

Data steps

The above code will create a data set that will exist until you quit SAS. You can view it as if it were a spreadsheet by double-clicking on Libraries, then the Work library, and finally the data set inside the SAS Explorer window.
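Another way to check the result, without leaving the program editor, is to print the data set to the output window. This is a minimal sketch, assuming the work.someData data set from the previous slide has already been created in the current session:

```sas
/* Print every row and variable of work.someData to the output window */
proc print data = work.someData;
run;
```

The Results pane should then show a new entry for the printed table.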

Functions

SAS has thousands of functions built in:

data work.blah;
    numberOne = 1;
    someTrigThing = sin(numberOne);
run;

I have tried to document the ones that students frequently need in Lecture 2 of 223. Take a look at the slides labeled Frequently Used Functions.

Finding functions

… or you can look up the function in the SAS online documentation.

One of the useful links in the useful links section of the class website, http://www.stanford.edu/class/hrp223/2002f/usefulLinks.html, is the SAS online documentation.

The URL of SAS OnLineDoc is: http://v9doc.sas.com/sasdoc/

If you enter a bad password 3 times, it will take you to the registration page. Access to the documentation is free.

Example of a Function

If you roll a die 50 times what's the chance that you'll get more than 10 "6"'s?

data work.pfft;
    /* P(more than 10 sixes) = 1 - P(10 or fewer sixes in 50 rolls) */
    x = 1 - CDF('BINOMIAL', 10, 1/6, 50);
run;
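As a cross-check on the CDF call, the same probability can be built up by adding the individual binomial probabilities for 0 through 10 sixes and subtracting the total from 1. This is only a sketch: the data set name work.checkBinomial and the variable names are made up for illustration.

```sas
/* Hypothetical check: sum P(X = k) for k = 0..10, then take the complement */
data work.checkBinomial;
    p = 1/6;          /* chance of a six on one roll */
    n = 50;           /* number of rolls */
    atMostTen = 0;
    do k = 0 to 10;
        atMostTen = atMostTen + pdf('BINOMIAL', k, p, n);
    end;
    moreThanTen = 1 - atMostTen;              /* built from the PDF */
    viaCdf = 1 - cdf('BINOMIAL', 10, p, n);   /* same quantity from the CDF */
    put moreThanTen= viaCdf=;                 /* writes both values to the log */
run;
```

The two values written to the log should agree up to rounding error.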

Procedures

SAS has many built-in statistical analysis procedures. The ones you will use for this class are:

proc freq – contingency tables (see 223 topics 12 and 13)

proc logistic – logistic regression (see 223 topics 14 and 15)

Real data looks like this:

data work.epi;
    input subjectID exposure $ disease $;
    datalines;
1 exposed Diseased
2 exposed Diseased
3 exposed Diseased
4 exposed Diseased
5 exposed notDiseased
6 notExposed notDiseased
7 exposed Diseased
8 exposed Diseased
9 exposed Diseased
10 notExposed notDiseased
11 exposed notDiseased
12 exposed Diseased
13 notExposed Diseased
14 notExposed notDiseased
15 exposed Diseased
16 exposed Diseased
17 exposed notDiseased
18 notExposed notDiseased
19 exposed Diseased
20 exposed Diseased
21 exposed Diseased
22 notExposed notDiseased
23 exposed notDiseased
24 exposed Diseased
25 notExposed notDiseased
26 exposed notDiseased
27 notExposed notDiseased
28 exposed Diseased
;
run;

Contingency tables

You can get a frequency table like this:

proc freq data = work.epi;
    tables exposure * disease;
run;

Contingency tables analysis

proc freq data = epi;
    tables exposure * disease / chisq;
run;
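With only 28 subjects, it can help to look at the expected cell counts alongside the chi-square test. The sketch below is a variation on the call above, assuming the one-row-per-subject work.epi data set: the expected option adds the counts expected under independence, and norow, nocol and nopercent trim the percentages so the cells are easier to read.

```sas
/* Same 2-by-2 table, with expected counts and without percentages */
proc freq data = work.epi;
    tables exposure * disease / chisq expected norow nocol nopercent;
run;
```

For a 2-by-2 table the chisq option also reports Fisher's exact test, which is the usual fallback when expected counts are small.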

Grouped Data

You will get grouped data in statistics classes…

In a case-control study of 50 patients with pancreatic cancer and 50 hospital controls, 15 patients and 25 controls are non-coffee-drinkers, 15 patients and 10 controls are mid-level coffee drinkers, and 20 patients and 15 controls are high-octane coffee addicts. What are the odds ratios for the association between coffee drinking and pancreatic cancer (comparing high to low, high to none, low to none, and any to none)?

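Grouped Data

data work.epi;
    /* Character variables read this way default to 8 characters, */
    /* so notExposed is stored as notExpos */
    input exposure $ disease $ people;
    datalines;
notExposed diseased 15
notExposed notDiseased 25
little diseased 15
little notDiseased 10
lots diseased 20
lots notDiseased 15
;
run;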

Problems…

proc freq data = epi;
    tables exposure * disease;
run;

Weighted data

proc freq data = epi;
    weight people;
    tables exposure * disease;
run;

Analysis of weighted data

proc freq data = epi;
    weight people;
    tables exposure * disease / relrisk;
    where exposure in ("notExpos", "lots");
run;
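The same call can be pointed at a different pair of groups by changing only the where statement. This is a sketch, assuming the grouped work.epi data set defined earlier, that compares the little group with the non-drinkers:

```sas
/* Low-level coffee drinkers versus non-drinkers */
proc freq data = epi;
    weight people;
    tables exposure * disease / relrisk;
    where exposure in ("notExpos", "little");
run;
```

As a hand check, the cross-product ratio from the counts in the problem is (15 × 25) / (10 × 15) = 2.5, which the odds ratio line of the relrisk output should match.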

Other groups

Just copy and paste the proc freq and pick different groups.

To get the combined groups use a character format (if you took 223) or just add the two exposed groups by hand.
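Adding the two exposed groups by hand could look like the sketch below. The data set name work.epiAny and the labels any and none are made up for illustration. The counts come from the problem statement: 15 + 20 = 35 coffee-drinking patients and 10 + 15 = 25 coffee-drinking controls.

```sas
/* Hypothetical collapsed data set: any coffee versus none */
data work.epiAny;
    /* notDiseased is stored as notDisea because of the default length of 8 */
    input exposure $ disease $ people;
    datalines;
any diseased 35
any notDiseased 25
none diseased 15
none notDiseased 25
;
run;

proc freq data = work.epiAny;
    weight people;
    tables exposure * disease / relrisk;
run;
```

By hand, the any-to-none odds ratio is (35 × 25) / (25 × 15), which is about 2.33.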

Formats in Freq

proc format;
    value $coffee
        "lots" = "Exposed"
        "little" = "Exposed"
        "notExpos" = "notExpos";
run;

proc freq data = epi;
    weight people;
    format exposure $coffee.;
    tables exposure * disease / relrisk;
run;

Analysis of Formatted Grouped Data