Day2_PPT

Date post: 02-Dec-2015
Description:
SAS Programming
Page 1: Day2_PPT

SAS Environment and Concepts of Libraries

SAS Training

Page 2: Day2_PPT

Producing Descriptive Statistics:

PROC FREQ

Produces one-way and n-way frequency tables, concisely describing the data by reporting the distribution of variable values

Creates crosstabulation tables that summarize data for two or more categorical variables by showing the number of observations for each combination of variable values

Can include many statements and options for controlling frequency output

By default, PROC FREQ creates a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set

Syntax

Proc Freq Data = <SAS-data-set>; Run;

Where,

SAS-data-set is the name of the data set to be used

Page 3: Day2_PPT

Example:

proc freq data = parts.widgets;

run;

Here,

The FREQ procedure creates a one-way frequency table for each variable in the data set Parts.Widgets

Page 4: Day2_PPT

Specifying Variables in PROC FREQ:

To specify the variables to be processed by the FREQ procedure, include a TABLES statement

Syntax:

Proc Freq Data = <SAS-data-set> ; Tables variable(s);

Run;

Where,

SAS-data-set is the name of the data set to be used

variable(s) lists the variables to include

Page 5: Day2_PPT

Example:

proc freq data = finance.loans;

tables rate months;

run;

Here,

The FREQ procedure creates a one-way frequency table for each of the variables Rate and Months in the data set Finance.Loans

Rate      Frequency   Percent   Cumulative Frequency   Cumulative Percent
9.50%     2           22.22     2                      22.22
9.75%     1           11.11     3                      33.33
10.00%    2           22.22     5                      55.56
10.50%    4           44.44     9                      100.00

Months    Frequency   Percent   Cumulative Frequency   Cumulative Percent
12        1           11.11     1                      11.11
24        1           11.11     2                      22.22
36        1           11.11     3                      33.33
48        1           11.11     4                      44.44
60        2           22.22     6                      66.67
360       3           33.33     9                      100.00
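The one-way tables above can be tried out without access to the Finance library. The sketch below builds a small stand-in data set; the name work.loans and the pairing of Rate with Months in each row are assumptions made for illustration, since only the per-variable counts are known.

```sas
/* Hypothetical stand-in for finance.loans: nine loans whose Rate and  */
/* Months counts match the tables above. The row pairing is assumed.   */
data work.loans;
   input rate months;
   format rate percent8.2;   /* display 0.095 as 9.50% */
   datalines;
0.095 12
0.095 24
0.0975 36
0.10 48
0.10 60
0.105 60
0.105 360
0.105 360
0.105 360
;
run;

/* One one-way frequency table per variable listed in TABLES */
proc freq data=work.loans;
   tables rate months;
run;
```

Each percent is the frequency divided by the nine observations, and the cumulative columns are running totals of the rows above.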

Page 7: Day2_PPT

Creating Two-Way Tables:

Crosstabulate frequencies with the values of other variables

Simplest crosstabulation is a two-way table

To create a two-way table, join two variables with an asterisk (*) in the TABLES statement of a PROC FREQ step

Syntax:

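Proc Freq Data = <SAS-data-set>; Tables variable-1 * variable-2 <* ... variable-n>; Run;

Where,

SAS-data-set is the name of the data set to be used

variable-1 specifies table rows (in a two-way request)

variable-2 specifies table columns (in a two-way request)

variable-n, when present, requests a multi-way table: the last two variables listed form the rows and columns, and each earlier variable defines the levels for which a separate two-way table is produced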

Page 8: Day2_PPT

Example:

proc freq data = clinic.diabetes;

tables weight * height;

run;

Here,

The above program creates a two-way table with Weight as the rows and Height as the columns

Page 9: Day2_PPT

Creating N-Way Tables:

Create n-way crosstabulation tables

A series of two-way tables is produced, with a table for each level of the other variables

Example:

proc freq data = clinic.diabetes;

tables sex*weight*height;

run;

Here,

The above program produces one Weight-by-Height crosstabulation table for each value of Sex (two tables when Sex has two values).

Page 10: Day2_PPT

Suppressing Table Information:

Limit the output of the FREQ procedure to a few specific statistics

To control the depth of crosstabulation results, add a slash (/) and any combination of the following options to the TABLES statement:

NOFREQ suppresses cell frequencies.

NOPERCENT suppresses cell percentages

NOROW suppresses row percentages.

NOCOL suppresses column percentages.

Example:

proc freq data = clinic.diabetes;

tables sex*weight / nofreq norow nocol;

run;

Here,

The output contains only the cell percentages, because cell frequencies, row percentages, and column percentages are suppressed.
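To see what the suppression options change, the same crosstabulation can be requested twice in one step. This is a minimal sketch on made-up data: the data set work.diabetes_demo and the variable wtgroup are hypothetical stand-ins for clinic.diabetes and its Weight variable.

```sas
/* Hypothetical six-row stand-in for clinic.diabetes */
data work.diabetes_demo;
   input sex $ wtgroup $;
   datalines;
F Low
F Low
F High
M Low
M High
M High
;
run;

proc freq data=work.diabetes_demo;
   tables sex*wtgroup;                 /* default: frequency, percent, row %, column % */
   tables sex*wtgroup / norow nocol;   /* frequency and overall percent only */
run;
```

Comparing the two tables side by side makes it clear which cell statistic each option removes.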

Page 12: Day2_PPT

PROC MEANS

Provides mean, minimum, maximum and other data summarization tools, as well as helpful options for controlling the output

Includes many statements and options for specifying needed statistics

Syntax:

Proc Means <DATA=SAS-data-set> <statistic-keyword(s)> <option(s)>; Run;

Where,

SAS-data-set is the name of the data set to be used

statistic-keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

Page 13: Day2_PPT

Example:

proc means data = perm.survey;

run;

Here,

PROC MEANS prints the n-count (number of nonmissing values), the mean, the standard deviation, and the minimum and maximum values of every numeric variable in the data set perm.survey

Page 14: Day2_PPT

Specifying Statistics:

To specify statistics, include statistic keywords as options in the PROC MEANS statement

When a statistic is specified in the PROC MEANS statement, default statistics are not produced

Example

proc means data=perm.survey median range;

run;

Here,

The MEANS procedure prints only the median and range for all numeric variables, because specifying statistic keywords suppresses the default statistics

Page 15: Day2_PPT

The following keywords can be used with PROC MEANS to compute statistics:

Descriptive Statistics

Keyword             Description
CLM                 Two-sided confidence limit for the mean
CSS                 Corrected sum of squares
CV                  Coefficient of variation
KURTOSIS / KURT     Kurtosis
LCLM                One-sided confidence limit below the mean
MAX                 Maximum value
MEAN                Average
MIN                 Minimum value
N                   Number of observations with non-missing values
NMISS               Number of observations with missing values
RANGE               Range
SKEWNESS / SKEW     Skewness
STDDEV / STD        Standard deviation
STDERR / STDMEAN    Standard error of the mean
SUM                 Sum
SUMWGT              Sum of the Weight variable values
UCLM                One-sided confidence limit above the mean
USS                 Uncorrected sum of squares
VAR                 Variance

Page 16: Day2_PPT

Quantile Statistics

Keyword          Description
MEDIAN / P50     Median or 50th percentile
P1               1st percentile
P5               5th percentile
P10              10th percentile
Q1 / P25         Lower quartile or 25th percentile
Q3 / P75         Upper quartile or 75th percentile
P90              90th percentile
P95              95th percentile
P99              99th percentile
QRANGE           Difference between upper and lower quartiles: Q3-Q1

Hypothesis Testing

Keyword          Description
PROBT            Probability of a greater absolute value for the t value
T                Student's t for testing the hypothesis that the population mean is 0

Page 17: Day2_PPT

Specifying Variables in PROC MEANS:

By default, the MEANS procedure generates statistics for every numeric variable in a data set

To specify the variables that PROC MEANS analyzes, add a VAR statement and list the variable names

Syntax:

Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>; Var variable(s);

Run;

Where,

SAS-data-set is the name of the data set to be used

statistic-keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

variable(s) lists numeric variables for which to calculate statistics

Page 18: Day2_PPT

Example:

proc means data = clinic.diabetes min max;

var age height weight;

run;

Here,

The MEANS procedure computes only the minimum and maximum, and only for the variables Age, Height, and Weight.
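Statistic keywords, a procedure option, and a VAR statement can be combined in one step. The sketch below assumes a small made-up data set, work.patients, and uses the MAXDEC= option to limit the number of decimal places printed.

```sas
/* Hypothetical data; idnum is numeric but is not wanted in the report */
data work.patients;
   input idnum age height weight;
   datalines;
101 48 63 150
102 16 61 102
103 44 70 204
104 15 66 140
;
run;

/* Only the listed statistics are printed, only for the VAR variables */
proc means data=work.patients n mean min max maxdec=1;
   var age height weight;
run;
```

Without the VAR statement, the numeric identifier idnum would be summarized along with the analysis variables.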

Page 19: Day2_PPT

Group Processing Using the CLASS Statement:

Gives statistics for grouped observations, instead of for the observations as a whole

To produce separate analyses of grouped observations, add a CLASS statement to the MEANS procedure

PROC MEANS does not generate statistics for CLASS variables, because their values are used only to categorize data

CLASS variables can be either character or numeric, but they should contain a limited number of discrete values that represent meaningful groupings

Syntax:

Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>; Class variable(s);

Run;

Where,

SAS-data-set is the name of the data set to be used

statistic-keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

variable(s) specifies category variables for group processing

Page 20: Day2_PPT

Example:

proc means data = clinic.heart;

var arterial heart cardiac urinary;

class survive sex;

run;

Here,

The output of the program shown above is categorized by values of the variables Survive and Sex.

Page 21: Day2_PPT

Group Processing Using the BY Statement:

Specifies variables to use for categorizing observations

Syntax:

Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>; By variable(s);

Run;

Where,

SAS-data-set is the name of the data set to be used

statistic-keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

variable(s) specifies category variables for group processing

Page 22: Day2_PPT

Example:

proc means data = work.heartsort;

var arterial heart cardiac urinary;

by survive sex;

run;

Here,

The output of the program shown above is categorized by values of the variables Survive and Sex.

A separate table is created for each BY group, that is, for each combination of Survive and Sex values

Page 23: Day2_PPT

Differences Between BY and CLASS Statements:

Unlike CLASS processing, BY processing requires that the data is already sorted or indexed in the order of the BY variables

BY group results have a layout that is different from the layout of CLASS group results.
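The sorting requirement is easiest to see by running both forms on the same data. This sketch uses a made-up four-row data set (work.heart_demo is hypothetical) and sorts it with PROC SORT before the BY step.

```sas
/* Hypothetical unsorted data */
data work.heart_demo;
   input survive $ sex arterial;
   datalines;
SURV 1 88
DIED 2 72
SURV 2 81
DIED 1 69
;
run;

/* CLASS: works on unsorted data, one combined table */
proc means data=work.heart_demo;
   class survive sex;
   var arterial;
run;

/* BY: the data must first be sorted by the BY variables */
proc sort data=work.heart_demo out=work.heartsort;
   by survive sex;
run;

/* One separate table per BY group */
proc means data=work.heartsort;
   by survive sex;
   var arterial;
run;
```

Running the BY step against the unsorted work.heart_demo instead would stop with an error that the BY variables are not properly sorted.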

Page 24: Day2_PPT

Creating a Summarized Data Set Using PROC MEANS:

Create an output SAS data set that contains only the summarized variable

Syntax:

Proc Means Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>; Output Out = SAS-data-set <statistic-keyword=variable-name(s)>;

Run;

Where,

SAS-data-set in the OUTPUT statement specifies the name of the output data set

statistic-keyword= specifies the summary statistic to be written out

variable-name(s) specifies the names of the variables that will be created to contain the values of the summary statistic. These variables correspond to the analysis variables that are listed in the VAR statement.

Page 25: Day2_PPT

Example:

proc means data = clinic.diabetes;

var age height weight;

class sex;

output out = work.sum_gender

mean = AvgAge AvgHeight AvgWeight

min = MinAge MinHeight MinWeight;

run;

Here,

The above program creates a typical PROC MEANS report and also creates a summarized output data set that includes only the MEAN and MIN statistics

Obs   Sex   _TYPE_   _FREQ_   AvgAge    AvgHeight   AvgWeight   MinAge   MinHeight   MinWeight
1           0        20       46.7000   66.9500     174.650     15       61          102
2     F     1        11       48.9091   63.9091     150.455     16       61          102
3     M     1        9        44.0000   70.6667     204.222     15       66          140

Page 26: Day2_PPT

Creating a Summarized Data Set Using PROC SUMMARY

Creates a summarized output data set

Similar to the MEANS procedure

The difference between the two procedures is that PROC MEANS produces a report by default. By contrast, to produce a report with PROC SUMMARY, you must include the PRINT option in the PROC SUMMARY statement.

Syntax:

Proc Summary Data = <SAS-data-set> <statistic-keyword(s)> <option(s)>; Run;

Where,

SAS-data-set is the name of the data set to be used

statistic-keyword(s) specifies the statistics to compute

option(s) controls the content, analysis, and appearance of output

Page 27: Day2_PPT

Example:

proc summary data = clinic.diabetes;

var age height weight;

class sex;

output out = work.sum_gender

mean = AvgAge AvgHeight AvgWeight;

run;

Here,

The above program creates an output data set but does not create a report
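For comparison, adding the PRINT option makes PROC SUMMARY produce a report as well as the output data set. The sketch below runs on a small made-up data set (work.diabetes_demo is a hypothetical stand-in for clinic.diabetes) and prints the summary data set so that the _TYPE_ and _FREQ_ variables can be inspected.

```sas
/* Hypothetical four-row stand-in for clinic.diabetes */
data work.diabetes_demo;
   input sex $ age height weight;
   datalines;
F 48 63 150
F 16 61 102
M 44 70 204
M 15 66 140
;
run;

/* PRINT requests the report that PROC MEANS would produce by default */
proc summary data=work.diabetes_demo print;
   var age height weight;
   class sex;
   output out=work.sum_gender mean=AvgAge AvgHeight AvgWeight;
run;

/* Inspect the summary data set, including _TYPE_ and _FREQ_ */
proc print data=work.sum_gender;
run;
```

In the printed data set, the row with _TYPE_ equal to 0 summarizes all observations and the rows with _TYPE_ equal to 1 summarize each value of Sex.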

Page 28: Day2_PPT

Output Delivery System (ODS)

Use ODS statements to specify destinations for your output

Create output in a variety of formats

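In the SAS release that this material describes, the Listing destination is open by default (in SAS 9.3 and later, the windowing environment opens the HTML destination by default instead)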

Syntax:

ODS open-destination;

ODS close-destination CLOSE;

Where,

open-destination is a keyword and any required options for the type of output that is to be created, such as

HTML FILE='html-file-pathname'

LISTING

close-destination is a keyword for the type of output

Page 29: Day2_PPT

Example:

ods html body = 'c:\mydata.html';

proc print data = sasuser.mydata;

run;

ods html close;

Here,

The ODS HTML statement creates HTML output in the file mydata.html at the path specified.

Page 30: Day2_PPT

ODS Destinations:

The table that follows lists the ODS destinations that are supported.

This destination…         Produces…
HTML                      output that is formatted in HyperText Markup Language (HTML)
Listing                   output that is formatted like traditional SAS procedure (listing) output
Markup Language Family    output that is formatted using markup languages such as Extensible Markup Language (XML)
ODS Document              a hierarchy of output objects that enables you to render multiple ODS output without re-running procedures
Output                    SAS data sets
Printer Family            output that is formatted for a high-resolution printer, such as PostScript (PS), Portable Document Format (PDF), or Printer Control Language (PCL) files
RTF                       Rich Text Format output for use with Microsoft Word

Page 31: Day2_PPT

Closing Multiple ODS Destinations Concurrently:

Produce output in multiple formats concurrently by opening each ODS destination at the beginning of the program

The keyword _ALL_ is used in the ODS CLOSE statement to close all open destinations concurrently

Syntax:

ODS open-destination1;

ODS open-destination2;

ODS _all_ CLOSE;

Where,

open-destination1 is a keyword and any required options for the first type of output that is to be created

open-destination2 is a keyword and any required options for the second type of output that is to be created

_all_ is a keyword that closes all open destinations concurrently

Page 32: Day2_PPT

Example:

ods html file = 'c:\admit.html';

ods pdf file = 'c:\admit.pdf' ;

proc print data = sasuser.admit;

run;

ods _all_ close;

Here,

The ods html statement creates an HTML output of the name admit.html in the path specified

The ods pdf statement creates a PDF output of the name admit.pdf in the path specified

FILE= can also be used to specify the file that contains the HTML output. FILE= is an alias for BODY=.

Page 33: Day2_PPT

Creating HTML Output from Multiple Procedures:

You can also use the ODS HTML statement to direct the results from multiple procedures to the same HTML file

Syntax:

ODS open-destination;

Procedure1

Procedure2

ODS close-destination CLOSE;

Where,

open-destination is a keyword and any required options for the type of output that is to be created

Procedure1 is the proc step for first procedure

Procedure2 is the proc step for second procedure

close-destination is a keyword for the type of output

Page 34: Day2_PPT

Example:

ods html body = 'c:\records\data.html';

proc print data = clinic.admit label;

var id sex age height weight actlevel;

label actlevel = 'Activity Level';

run;

proc tabulate data = clinic.stress2;

var resthr maxhr rechr;

table min mean, resthr maxhr rechr;

run;

ods html close;

Here,

The program above generates HTML output for the PRINT and TABULATE procedures

Page 35: Day2_PPT

Creating and Applying User-Defined Formats

SAS formats can be associated with variables either temporarily or permanently

Users can create custom formats and apply them to variables. For example, we can format a product number so that it is displayed as descriptive text

The FORMAT procedure can be used to create user-defined formats for variables

Formats can be stored temporarily or permanently

Page 36: Day2_PPT

Syntax:

Proc Format <options> ;

Value format-name range1='label1' range2='label2' ... ;

Where,

options includes:

   Library=libref, which specifies the libref for a SAS data library that contains a permanent catalog in which user-defined formats are stored

   Fmtlib, which prints the contents of a format catalog

format-name names the format that is being created. The name

   must begin with a dollar sign ($) if the format applies to character data

   cannot be longer than eight characters in older SAS releases (SAS 9 and later accept longer names)

   cannot be the name of an existing SAS format

   cannot end with a number

   does not end in a period when specified in a VALUE statement

range specifies one or more variable values to be mapped to a label (a character string or an existing format)

label is a text string enclosed in quotation marks

Page 37: Day2_PPT

When PROC FORMAT is used to create a format, the format is stored in a format catalog

If the SAS data library does not already contain a format catalog, SAS automatically creates one

If LIBRARY= option is not specified, then the formats are stored in a default format catalog named Work.Formats

Formats are stored in a permanent format catalog named Formats when we specify the LIBRARY= option in the PROC FORMAT statement

PROC FORMAT LIBRARY=libref;

A LIBNAME statement is needed to associate the libref with the permanent SAS data library in which the format catalog is to be stored

It is recommended, but not required, to use the word Library as the libref when creating our own permanent formats

libname library 'c:\sas\formats\lib';

Page 38: Day2_PPT

Example:

Sample Data Set Empdata :

Here,

The values for JobTitle are coded, and they are not easily interpreted

Using proc format we can create a format for this variable which describes the values of

this variable

(Without Format)

FirstName   LastName   JobTitle   Salary
Donny       Evans      112        29996.63
Lisa        Helms      105        18567.23
John        Higgins    111        25309.00
Amy         Larson     113        32696.78
Mary        Moore      112        28945.89
Jason       Powell     103        35099.50

Page 39: Day2_PPT

libname library 'c:\sas\formats\lib';

proc format lib = library;
   value jobfmt
      103='manager'
      105='text processor'
      111='assoc. technical writer'
      112='technical writer'
      113='senior technical writer';
run;

data empinfo;
   set empdata;
   /* a format name ends with a period when it is applied */
   format jobtitle jobfmt.;
run;

Here,

The format JOBFMT is stored in a catalog named Library.Formats, which is located in the directory C:\Sas\Formats\Lib in the Windows environment

The user defined format JOBFMT is used for formatting a variable called jobtitle

Format statement can be placed in either a DATA step or a PROC step
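As an illustration of the last point, a FORMAT statement in a PROC step applies the format for that step only, leaving the stored values and the data set unchanged. This sketch assumes a temporary copy of the format in Work.Formats (no LIBRARY= option) and a hypothetical data set work.empdata holding the sample rows shown earlier.

```sas
/* No LIBRARY= option, so the format is stored in Work.Formats */
proc format;
   value jobfmt
      103='manager'
      105='text processor'
      111='assoc. technical writer'
      112='technical writer'
      113='senior technical writer';
run;

/* Hypothetical copy of the sample data */
data work.empdata;
   input FirstName $ LastName $ JobTitle Salary;
   datalines;
Donny Evans 112 29996.63
Lisa Helms 105 18567.23
John Higgins 111 25309.00
Amy Larson 113 32696.78
Mary Moore 112 28945.89
Jason Powell 103 35099.50
;
run;

/* The format applies to this report only; JobTitle stays numeric */
proc print data=work.empdata;
   format jobtitle jobfmt. salary comma10.2;
run;
```

Placing the same FORMAT statement in a DATA step instead would store the association permanently in the new data set.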

Page 40: Day2_PPT

Output:

(With Format)

FirstName   LastName   JobTitle                  Salary
Donny       Evans      technical writer          29996.63
Lisa        Helms      text processor            18567.23
John        Higgins    assoc. technical writer   25309.00
Amy         Larson     senior technical writer   32696.78
Mary        Moore      technical writer          28945.89
Jason       Powell     manager                   35099.50

Page 41: Day2_PPT

Example:

proc format lib = library;

value $grade

'A'='Good'

'B'-'D'='Fair'

'F'='Poor'

'I','U'='See Instructor';

run;

Here,

Format is created for character variable ( $ sign before the format name)

proc format lib= library;

value jobfmt

103='manager'

105='text processor'

111='assoc. technical writer'

112='technical writer'

113='senior technical writer';

run;

Here,

Format is created for numeric variable ( no $ sign before the format name)

Page 42: Day2_PPT

Example: Specifying Value Ranges

proc format lib = library;

value agefmt

0-<13 = 'child'

13-<20 = 'teenager'

20-<65 = 'adult'

65-100 = 'senior citizen';

run;

or

proc format lib = library;

value agefmt

low -<13 = 'child'

13-<20 = 'teenager'

20-<65 = 'adult'

65-high = 'senior citizen'

other = 'unknown';

run;
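The boundary behaviour of these ranges can be checked with the PUT function in a DATA step that writes to the log. This is a minimal sketch: it stores the format in Work.Formats rather than a permanent library, and the test values are chosen only to probe the range edges.

```sas
proc format;   /* temporary copy in Work.Formats */
   value agefmt
      low -<13 = 'child'
      13 -<20  = 'teenager'
      20 -<65  = 'adult'
      65 - high = 'senior citizen'
      other    = 'unknown';
run;

data _null_;
   /* 13 is excluded from the first range by -< and falls in the second; */
   /* a missing value is not covered by LOW, so it falls into OTHER.     */
   do age = 12.9, 13, 19, 20, 65, .;
      group = put(age, agefmt.);
      put age= group=;
   end;
run;
```

The less-than sign in a range such as 13-<20 excludes the upper endpoint, which is what keeps adjacent ranges from overlapping.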

Page 43: Day2_PPT

Defining Multiple Formats:

proc format lib=library;

value jobfmt

103='manager'

105='text processor'

111='assoc. technical writer'

112='technical writer'

113='senior technical writer';

value $response

'Y'='Yes'

'N'='No'

'U'='Undecided'

'NOP'='No opinion';

run;

To define several formats, use multiple VALUE statements in a single PROC FORMAT step

Page 44: Day2_PPT

Displaying a List of Your Formats:

libname library 'c:\sas\formats\lib';

proc format library = library fmtlib ;

run;

Adding the keyword FMTLIB to the PROC FORMAT statement displays a list of all the formats in the catalog, along with descriptions of their values

Output:

Format Name: JobFmt   Length: 23   Number of Values: 5
Min Length: 1   Max Length: 40   Default Length: 23   Fuzz: Std

START   END   LABEL (VER. 9.00 29AUG2002:11:13:14)
103     103   manager
105     105   text processor
111     111   assoc. technical writer
112     112   technical writer
113     113   senior technical writer

Page 45: Day2_PPT

Proc Transpose

Restructures the data by turning selected variables into observations and observations into variables

Syntax

PROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET>

<NAME=name> <OUT=output-data-set> <PREFIX=prefix>;

BY <DESCENDING> variable-1 <...<DESCENDING> variable-n>;

COPY variable (s);

ID variable;

VAR variable (s);

Run;

where,

LABEL= assigns a name to the variable that contains the labels of the variables being transposed

NAME= assigns a name to the variable that contains the names of the variables being transposed (the default is _NAME_)

PREFIX= assigns the prefix for the transposed variables. The default is COL, which produces COL1, COL2, COL3, etc.

VAR selects which variables to transpose

BY transposes within each combination of the BY variables

ID uses the values of the listed variable as the names of the transposed variables

COPY transfers variables to the output data set without transposing them

Page 46: Day2_PPT

Example:

proc transpose data=long1 out=wide1 prefix=faminc;

by famid ;

id year;

var faminc;

run;

Original Dataset

Obs   famid   year   faminc
1     1       96     40000
2     1       97     40500
3     1       98     41000
4     2       96     45000
5     2       97     45400
6     2       98     45800
7     3       96     75000
8     3       97     76000
9     3       98     77000

Result Dataset

Obs   famid   _NAME_   faminc96   faminc97   faminc98
1     1       faminc   40000      40500      41000
2     2       faminc   45000      45400      45800
3     3       faminc   75000      76000      77000

Page 47: Day2_PPT

Example:

proc transpose data=long1 out=wide1 prefix=faminc name=family;

by famid ;

id year;

var faminc;

run;

Obs   famid   family   faminc96   faminc97   faminc98
1     1       faminc   40000      40500      41000
2     2       faminc   45000      45400      45800
3     3       faminc   75000      76000      77000
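The transpose examples can be reproduced end to end with the sketch below. It rebuilds the long data set from the values shown (as work.long1, an assumed name), sorts it because the BY statement expects sorted input, and drops the _NAME_ variable with a data set option, which is one optional way to tidy the result.

```sas
/* Rebuild the long data set from the values shown above */
data work.long1;
   input famid year faminc;
   datalines;
1 96 40000
1 97 40500
1 98 41000
2 96 45000
2 97 45400
2 98 45800
3 96 75000
3 97 76000
3 98 77000
;
run;

/* BY processing requires the data to be sorted by famid */
proc sort data=work.long1;
   by famid;
run;

/* One output row per famid; columns are named faminc96, faminc97, faminc98 */
proc transpose data=work.long1 out=work.wide1(drop=_name_) prefix=faminc;
   by famid;
   id year;
   var faminc;
run;

proc print data=work.wide1;
run;
```

The column names come from the PREFIX= value followed by each value of the ID variable Year.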

Page 48: Day2_PPT

Exporting Data

Export Using SAS GUI:

SAS GUI can be used to export a SAS dataset

SAS dataset can be exported as an external file of any type such as:

Excel (.xls)

SAS dataset (.sas7bdat)

Text (.txt)

CSV (.csv)

HTML (.html)

Microsoft Access Files (.mdb)

Page 49: Day2_PPT

Exporting SAS data set Using Proc Export:

Syntax:

Proc Export Data= <SAS-data-set>

Outfile =filename | Outtable = <table-name>

Dbms = <identifier>

Replace ; delimiter=<character>;

Where,

Data=SAS-data-set :- identifies the input SAS data set with either a one- or two-level SAS name (library and member name)

Outfile="filename" :- specifies the complete path and filename of the output PC file, spreadsheet, or delimited external file

Outtable="tablename" :- specifies the table name of the output DBMS table

DBMS=identifier :- specifies the type of data to export. For example, DBMS=DBF exports a dBASE file and DBMS=ACCESS exports a Microsoft Access table

REPLACE :- overwrites an existing file

Delimiter=<character> :- specifies the delimiting character; it should be specified when DBMS=DLM

Page 50: Day2_PPT

Exporting a Delimited External File:

Example:

proc export data=myfiles.class outfile="d:/myfiles/class" dbms=dlm;

delimiter='&';

run;

Here,

A text file that uses & as the delimiter is created at the path specified in OUTFILE=

Page 51: Day2_PPT

Exporting to an Excel Spreadsheet:

Example:

proc export data = SASUSER.Accounts

outfile="c:\myfiles\accounts.xls";

run;

Here,

An Excel file is created at the path specified by OUTFILE=

Page 52: Day2_PPT

Exporting a Microsoft Access Table:

Example:

proc export data = sasuser.cust
   outtable = "customers"
   dbms = access;
   database = "c:\myfiles\mydatabase.mdb";
run;

Here,

A table named customers is created in the Access database specified by DATABASE=, which is a separate statement rather than an option in the PROC EXPORT statement

Page 53: Day2_PPT

General Form of SAS Functions

To use a SAS function, specify the function name followed by the function arguments, which are enclosed in parentheses

Even if the function does not require arguments, the function name must still be followed by parentheses

Unless the length of the target variable has been previously defined, a default length is assigned

Syntax:

function-name (argument-1 <, argument-n>);

where,

arguments can be

variables     mean(x,y,z)
constants     mean(456,502,612,498)
expressions   mean(37*2,192/5,mean(22,34,56))

Page 54: Day2_PPT

Example:

A function that contains multiple arguments

std(x1,x2,x3) ;

mean (of x1-x3) ;

AvgScore = mean (exam1,exam2,exam3) ;

Page 55: Day2_PPT

Sum Function

Calculates the sum of values

Syntax:

sum( argument , argument,...)

where,

argument can be sas variables, constants and expressions

Page 56: Day2_PPT

Example:

Data work.after;

Set work.before;

totalsal = sum (sal1,sal2,sal3);

Run;

Here,

The above program calculates the sum of the values in sal1, sal2 and sal3 variables.
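One practical reason to prefer the SUM function over the + operator is how missing values are handled. The sketch below uses made-up salary values and writes both results to the log.

```sas
data _null_;
   sal1 = 1000;
   sal2 = .;        /* missing value */
   sal3 = 500;
   total_fn = sum(sal1, sal2, sal3);   /* ignores the missing value: 1500 */
   total_op = sal1 + sal2 + sal3;      /* any missing operand gives missing */
   put total_fn= total_op=;
run;
```

The log shows total_fn=1500 and total_op=. together with a note that missing values were generated by an operation on missing values.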

Page 57: Day2_PPT

MEAN Function

Calculates the average of nonmissing values

Syntax:

mean (argument, argument,...)

where,

argument can be sas variables, constants and expressions

Page 58: Day2_PPT

Example:

Data work.after;

Set work.before;

avg = mean (marks1,marks2,marks3);

Run;

Here,

The above program calculates the average of the values in the marks1, marks2 and marks3 variables.

Page 59: Day2_PPT

MIN Function

Finds the minimum value

Syntax:

min ( argument, argument,...)

where,

argument can be sas variables, constants and expressions

Page 60: Day2_PPT

Example:

Data work.after;

Set work.before;

minimum =min (marks1,marks2,marks3);

Run;

Here,

The above program finds the minimum of the values in the marks1, marks2 and marks3 variables.

Page 61: Day2_PPT

MAX Function

Finds the maximum value

Syntax:

max(argument, argument,...)

where,

argument can be sas variables, constants and expressions

Page 62: Day2_PPT

Example:

Data work.after;

Set work.before;

maximum =max (marks1,marks2,marks3);

Run;

Here,

The above program finds the maximum of the values in the marks1, marks2 and marks3 variables.

Page 63: Day2_PPT

VAR Function

calculates the variance of the values

Syntax:

var(argument, argument,...)

where,

argument can be sas variables, constants and expressions

Page 64: Day2_PPT

Example:

Data work.after;

Set work.before;

variance = var (s1, s2, s3);

Run;

Here,

The above program calculates the variance of the values in the s1, s2 and s3 variables.

Page 65: Day2_PPT

STD Function

Calculates the standard deviation of the values

Syntax:

std(argument, argument,...)

where,

argument can be sas variables, constants and expressions

Page 66: Day2_PPT

Example

Data work.after;

Set work.before;

stdev =std (s1, s2, s3);

Run;

Here,

The above program calculates the standard deviation of the values in the s1, s2 and s3 variables.

Page 67: Day2_PPT

Converting Data with Functions

INPUT function

Explicitly converts character values to numeric values

Syntax:

INPUT (source, informat );

Where,

source indicates the character variable, constant, or expression to be converted to a numeric value

informat is the numeric informat to be specified. When choosing the informat, be sure to select a numeric informat that can read the form of the values.

Page 68: Day2_PPT

Example

Data hrd.newtemp;

Set hrd.temp;

Test=input(saletest,comma9.);

Run;

Here,

The function uses the numeric informat COMMA9. to read the values of the character variable SaleTest. The resulting numeric values are stored in the variable Test.

Character Value   Informat
2115233           7.
2,115,233         COMMA9.
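The conversion can be checked in isolation with a DATA step that writes to the log. This sketch uses the character value from the table above as a constant instead of reading hrd.temp.

```sas
data _null_;
   saletest = '2,115,233';            /* character value, 9 characters wide */
   test = input(saletest, comma9.);   /* COMMA9. strips the commas */
   put test=;                         /* writes test=2115233 to the log */
run;
```

The informat width has to cover the whole character value, commas included, which is why COMMA9. rather than 7. is needed here.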

Page 69: Day2_PPT

PUT Function

Explicitly converts numeric values to character values

The format specified in the PUT function must match the data type of the source

Syntax:

PUT(source,format) ;

Where,

source indicates the numeric variable, constant, or expression to be converted to a character value

format is the format to apply; its type must match the data type of the source (a numeric format for a numeric source)

Page 70: Day2_PPT

The PUT function always returns a character string.

The PUT function returns the source written with a format.

The format must agree with the source in type.

Numeric formats right-align the result; character formats left-align the result.

If you use the PUT function to create a variable that has not been previously identified, it creates a character variable whose length is equal to the format width.

Page 71: Day2_PPT

Example

data hrd.newtemp;

set hrd.temp;

Assignment = put (site,2.) || '/' || dept;

run;

Here,

Because Site has a length of 2, it is given 2. as the numeric format.

The PUT function converts the numeric value of Site to a character value.

The converted value is then concatenated with a slash and the value of Dept, and the result is saved in the new variable Assignment.
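A standalone version of this conversion can be run without the hrd library. The values assigned to Site and Dept below are made up for illustration.

```sas
data _null_;
   site = 26;       /* hypothetical numeric site code */
   dept = 'DP';     /* hypothetical department code */
   assignment = put(site, 2.) || '/' || dept;
   put assignment=; /* writes assignment=26/DP to the log */
run;
```

Using PUT explicitly avoids relying on automatic numeric-to-character conversion, which would pad the number with leading blanks and write a conversion note to the log.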

Page 72: Day2_PPT

Manipulating SAS Date Values with Functions

YEAR Function

Extracts the year value from a SAS date value

Syntax:

YEAR (date);

Where,

date is a SAS date value that is specified either as a variable or as a SAS date constant

Page 73: Day2_PPT

Example

Data hrd.temp98;

Set hrd.temp;

yr = year(startdate);

Run;

Here,

The YEAR function extracts the year portion from the date variable startdate and saves it in the new variable yr.

Page 74: Day2_PPT

QTR Function

Extracts the quarter value from a SAS date value

Syntax:

QTR (date) ;

Where,

date is a SAS date value that is specified either as a variable or as a SAS date

constant.

Page 75: Day2_PPT

Example

Data hrd.temp98;

Set hrd.temp;

quarter = qtr(startdate);

Run;

Here,

The QTR function extracts the quarter value from the date variable startdate and saves it in the new variable quarter.

Page 76: Day2_PPT

MONTH Function

Extracts the month value from a SAS date value

Syntax:

MONTH (date) ;

where,

date is a SAS date value that is specified either as a variable or as a SAS date

constant.

Page 77: Day2_PPT

Example

data hrd.nov99;

set hrd.temp;

mn = month(startdate);

Run;

Here,

The MONTH function extracts the month value from the startdate variable and saves it in the new variable mn.

Page 78: Day2_PPT

DAY Function

Extracts the day value from a SAS date value.

Syntax:

DAY (date);

Where,

date is a SAS date value that is specified either as a variable or as a SAS date constant

Page 79: Day2_PPT

Example:

data hrd.nov99;

set hrd.temp;

days = day(date);

Run;

Here,

The DAY function extracts the day value from the date variable and saves it in the new variable days.

Page 80: Day2_PPT

WEEKDAY Function

Extract the day of the week from a SAS date value

Syntax:

WEEKDAY (date) ;

where,

date is a SAS date value that is specified either as a variable or as a SAS date constant

Page 81: Day2_PPT

Example

data hrd.nov99;

set hrd.temp;

weekday = weekday(date);

Run;

Here,

The WEEKDAY function extracts the day of the week value from the date variable and saves it in the new variable weekday.

Page 82: Day2_PPT

The WEEKDAY function returns a numeric value from 1 to 7. The values represent the days of the week.

Value equals Day of the Week

1 = Sunday

2 = Monday

3 = Tuesday

4 = Wednesday

5 = Thursday

6 = Friday

7 = Saturday

Page 83: Day2_PPT

MDY Function

Creates a SAS date value from numeric values that represent the month, day, and year

Syntax:

MDY ( month , day , year );

Where,

month can be a variable that represents the month, or a number from 1-12

day can be a variable that represents the day, or a number from 1-31

year can be a variable that represents the year, or a number that has 2 or 4 digits.

Page 84: Day2_PPT

Example:

data hrd.newtemp (drop=month day year);

set hrd.temp;

Date= mdy(month,day,year);

run;

Here,

A new variable Date is created by combining the values in the variables month, day and year using the MDY function.
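The date functions covered so far can be exercised together on one date. The sketch below builds 15 November 1999 with MDY (an arbitrary date chosen for illustration) and takes it apart again, writing the results to the log.

```sas
data _null_;
   date = mdy(11, 15, 1999);   /* SAS date value for 15NOV1999 */
   yr = year(date);            /* 1999 */
   q  = qtr(date);             /* 4 */
   mn = month(date);           /* 11 */
   d  = day(date);             /* 15 */
   wd = weekday(date);         /* 2, a Monday */
   put date= date9. yr= q= mn= d= wd=;
run;
```

The DATE9. format in the PUT statement displays the stored numeric date value as 15NOV1999.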

Page 85: Day2_PPT

DATE and TODAY Functions

Return the current date from the system clock as a SAS date value

Syntax:

DATE()

TODAY()

These functions require no arguments, but they must still be followed by parentheses.

Page 86: Day2_PPT

Example

data hrd.newtemp;

set hrd.temp;

EditDate = date();

run;

Here,

The DATE function returns the current system date and stores it in a new variable EditDate.

Page 87: Day2_PPT

TIME Function

Return the current time as a SAS time

Syntax:

time ( );

This function requires no arguments, but it must still be followed by parentheses

Page 88: Day2_PPT

Example:

data hrd.newtemp;

set hrd.temp;

starttime = time();

run;

Here,

The TIME function returns the current system time and stores it in the new variable starttime.

Page 89: Day2_PPT

INTCK Function

Returns the number of time intervals that occur in a given time span

Used to count the passage of days, weeks, months, and so on

Counts intervals from fixed interval beginnings, not in multiples of an interval unit from the from value

Partial intervals are not counted

For example :

WEEK intervals are counted by Sundays rather than seven-day multiples from the from argument

MONTH intervals are counted by day 1 of each month

YEAR intervals are counted from 01JAN, not in 365-day multiples
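Boundary counting is easiest to see on dates that sit next to an interval boundary. The sketch below uses date constants picked only for illustration and writes the counts to the log with PUT.

```sas
/* INTCK counts interval boundaries crossed, not elapsed time */
data _null_;
   /* One day apart, but a 01JAN boundary is crossed: 1 */
   y1 = intck('year', '31dec2005'd, '01jan2006'd);
   /* Almost a full year, but no 01JAN boundary is crossed: 0 */
   y2 = intck('year', '01jan2005'd, '31dec2005'd);
   /* Day 1 of a month is crossed once: 1 */
   m1 = intck('month', '31jan2005'd, '01feb2005'd);
   /* Saturday to Sunday crosses one week boundary: 1 */
   w1 = intck('week', '03jun1995'd, '04jun1995'd);
   put y1= y2= m1= w1=;
run;
```

Because of this behaviour, a YEAR count of 20 does not by itself mean that 20 full years have elapsed.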

Page 90: Day2_PPT

Syntax:

INTCK ('interval', from, to);

Where,

'interval' specifies a character constant or variable. The value must be one of the following:

Date intervals: DAY, WEEKDAY, WEEK, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR

Datetime and time intervals: DTMONTH, DTWEEK, HOUR, MINUTE, SECOND

from specifies a SAS date, time, or datetime value that identifies the beginning of the time span

to specifies a SAS date, time, or datetime value that identifies the end of the time span

The type of interval (date, time, or datetime) must match the type of value in from

Page 91: Day2_PPT

Example:

Data work.anniv20;

Set flights.mechanics (Keep=id lastname firstname hired);

Years = INTCK ('year', hired, today());

If years=20 and Month (hired) = Month (TODAY());

Run;

Proc Print Data = work.anniv20;

Run;

Here,

The program identifies mechanics whose 20th year of employment occurs in the

current month

It uses the INTCK function to compare the value of the variable Hired to the date on

which the program is run.

Page 92: Day2_PPT

INTNX Function:

Applies multiples of a given interval to a date, time, or datetime value and returns the resulting value

Used to identify past or future days, weeks, months, and so on

Syntax:

INTNX ('interval', start-from, increment <, 'alignment'>)

Where,

'interval' specifies a character constant or variable

start-from specifies a starting SAS date, time, or datetime value

increment specifies a negative or positive integer that represents time intervals toward the past or future

Page 93: Day2_PPT

'alignment' (optional) forces the alignment of the returned date to the beginning, middle, or end of the interval.

The type of interval (date, time, or datetime) must match the type of value in start-from.

When specifying date intervals, the value of the character constant or variable that is used in interval must be one of the following:

Date intervals: DAY, WEEKDAY, WEEK, TENDAY, SEMIMONTH, MONTH, QTR, SEMIYEAR, YEAR

Datetime and time intervals: DTMONTH, DTWEEK, HOUR, MINUTE, SECOND

When specifying date alignment in the INTNX function, use the following arguments or their corresponding aliases:

Argument | Alias

BEGINNING | B

MIDDLE | M

END | E

SAMEDAY | S

Page 94: Day2_PPT

Example:

The statements below count five months from January, but the returned value depends on whether alignment specifies the beginning, middle, or end day of the resulting month.

If alignment is not specified, the beginning day is returned by default.

SAS Statement Date Value

MonthX = intnx ('month','01jan95'd,5,'b'); 12935 (June 1, 1995)

MonthX = intnx ('month','01jan95'd,5,'m'); 12949 (June 15, 1995)

MonthX = intnx ('month','01jan95'd,5,'e'); 12964 (June 30, 1995)
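The SAMEDAY alignment is listed among the alignment arguments but not shown in the table. A minimal sketch, assuming a mid-month start date chosen for illustration, compares it with the default and END alignments:

```sas
/* INTNX alignments on a start date that is not the first of the month */
data _null_;
   start = '15jan1995'd;
   b = intnx('month', start, 5);        /* default (beginning): 01JUN1995 */
   e = intnx('month', start, 5, 'e');   /* end of month:        30JUN1995 */
   s = intnx('month', start, 5, 's');   /* same day of month:   15JUN1995 */
   put b= date9. e= date9. s= date9.;
run;
```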

Page 95: Day2_PPT

DATEPART Function

Extracts the date portion from a SAS datetime value

Syntax:

Datepart (variable);

where,

variable specifies the name of a variable that holds a SAS datetime value

Page 96: Day2_PPT

Example

data hrd.newtemp;

set hrd.temp;

Date = datepart(saledate);

run;

Here,

The DATEPART function extracts the date portion from saledate, which holds a datetime value, and saves it in the new variable Date.
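As a quick check, DATEPART can be applied to a datetime constant. The value below is invented for this sketch; the variable names mirror the example above.

```sas
/* A datetime value counts seconds since 01JAN1960; a date value counts days */
data _null_;
   saledate = '01jun1995:14:30:00'dt;   /* datetime constant (illustrative) */
   date = datepart(saledate);           /* keeps only the date: 01JUN1995   */
   put saledate= datetime18. date= date9.;
run;
```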

Page 97: Day2_PPT

DATDIF Function

Calculate the difference in days between two SAS dates

Accept dates that are specified as SAS date values

Syntax:

DATDIF( start_date , end_date , basis ) ;

Where,

start_date specifies the starting date as a SAS date value

end_date specifies the ending date as a SAS date value

basis specifies a character constant or variable that describes how SAS calculates the

date difference.

Page 98: Day2_PPT

Example

data hrd.newtemp;

set hrd.temp;

date = DATDIF (sdate, edate, 'ACT/ACT');

run;

Here,

DATDIF function gives the difference between two dates in number of days.

Page 99: Day2_PPT

YRDIF Function

Calculate the difference in years between two SAS dates

Accept start dates and end dates that are specified as SAS date values

Use a basis argument that describes how SAS calculates the date difference

Syntax

YRDIF ( start_date , end_date , 'basis' )

where,

start_date specifies the starting date as a SAS date value

end_date specifies the ending date as a SAS date value

basis specifies a character constant or variable that describes how SAS calculates the date difference.

Page 100: Day2_PPT

Example:

data hrd.newtemp;

set hrd.temp;

date = YRDIF (sdate, edate, 'ACT/ACT');

run;

Here,

YRDIF function gives the difference between the two dates in number of years.

Page 101: Day2_PPT

There are two character strings that are valid for basis in the DATDIF function and four character strings that are valid for basis in the YRDIF function. These character strings and their meanings are listed in the table below.

Character String | Meaning | Valid in DATDIF | Valid in YRDIF

'30/360' | specifies a 30-day month and a 360-day year | yes | yes

'ACT/ACT' | uses the actual number of days or years between dates | yes | yes

'ACT/360' | uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 360) | no | yes

'ACT/365' | uses the actual number of days between dates in calculating the number of years (calculated by the number of days divided by 365) | no | yes
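The effect of the basis argument can be seen by applying several basis strings to the same pair of dates. This is a sketch only; the dates are invented, and sdate and edate follow the variable names used in the earlier examples.

```sas
/* Same date pair, different basis strings */
data _null_;
   sdate = '01jan1995'd;
   edate = '01jun1995'd;
   days_act = datdif(sdate, edate, 'ACT/ACT');   /* actual days: 151          */
   days_30  = datdif(sdate, edate, '30/360');    /* every month counted as 30 */
   yrs_act  = yrdif(sdate, edate, 'ACT/ACT');    /* actual days / actual year */
   yrs_365  = yrdif(sdate, edate, 'ACT/365');    /* actual days / 365         */
   put days_act= days_30= yrs_act= yrs_365=;
run;
```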

Page 102: Day2_PPT

Modifying Character Values with Functions

SCAN Function:

Enables you to separate a character value into words and to return a specified word

Uses delimiters, which are characters that are specified as word separators, to separate a character string into words

Can specify as many delimiters as needed to correctly separate the character expression

The default delimiters are

blank . < ( + | & ! $ * ) ; ^ - / , %

Page 103: Day2_PPT

Syntax:

SCAN (argument , n , delimiters);

where,

argument specifies the character variable or expression to scan

n specifies which word to read

delimiters are special characters that must be enclosed in single quotation marks (' ').

Page 104: Day2_PPT

Example:

Data hrd.newtemp ( DROP=name);

Set hrd.temp;

LastName = SCAN (name, 1, ' ');

FirstName = SCAN (name, 2, ' ');

MiddleName = SCAN (name, 3, ' ');

Run;

Here,

It creates three variables, LastName, FirstName and MiddleName, from the first, second and third words of the value stored in the variable name
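When the words are separated by more than one kind of character, all of the separators can be listed in the delimiters argument. A minimal sketch, assuming a made-up name value that uses a comma and blanks:

```sas
/* SCAN with two delimiters: comma and blank */
data _null_;
   name = 'CICHOCK, ELIZABETH MARIE';   /* illustrative value */
   LastName   = scan(name, 1, ', ');    /* CICHOCK   */
   FirstName  = scan(name, 2, ', ');    /* ELIZABETH */
   MiddleName = scan(name, 3, ', ');    /* MARIE     */
   put LastName= FirstName= MiddleName=;
run;
```

Consecutive delimiters (the comma followed by a blank) are treated as a single separator, so no empty word is returned between them.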

Page 105: Day2_PPT

SUBSTR Function:

Extract a portion of a character value

Replace the contents of a character value

When the function is on the right side of an assignment statement, the function returns the requested string

When the function is on the left side of an assignment statement, the function is used to modify variable values

Syntax:

SUBSTR (argument, position, <n>)

Where,

argument specifies the character variable or expression from which to extract substring.

position is the character position to start from.

n specifies the number of characters to extract. If n is omitted, all remaining characters are included in the substring.

Page 106: Day2_PPT

Example:

Data work.newtemp (DROP = middlename);

Set hrd.newtemp;

MiddleInitial = Substr ( middlename , 1 ,1 );

Run;

Here,

It extracts the first letter of the MiddleName value to create the new variable MiddleInitial.

Data hrd.temp2 (DROP = exchange );

Set hrd.temp;

Exchange= Substr ( phone , 1 , 3 );

If exchange='622' Then Substr (phone , 1 , 3) = '433';

Run;

Here,

It checks whether the first three characters of phone (the exchange) are 622 and, if so, replaces them with 433
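Both uses of SUBSTR can be tried on a single literal value. The phone number below is invented for this sketch:

```sas
/* SUBSTR on the right side extracts; on the left side it replaces in place */
data _null_;
   length phone $ 8;
   phone = '622-3251';                               /* illustrative value */
   exchange = substr(phone, 1, 3);                   /* extract: 622       */
   if exchange = '622' then substr(phone, 1, 3) = '433';
   put exchange= phone=;                             /* phone=433-3251     */
run;
```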

Page 107: Day2_PPT

SCAN Function Compared with SUBSTR Function:

SCAN extracts words within a value that is marked by delimiters

The SCAN function is best used when we

know the order of the words in the character value

the starting position of the words varies

the words are marked by some delimiter

SUBSTR extracts a portion of a value by starting at a specified location

SUBSTR function is best used when the exact position of the substring that is to be extracted from the character value is known

Substring does not need to be marked by delimiters

Page 108: Day2_PPT

TRIM Function:

Removes trailing blanks from character values

Whenever the value of a character variable is shorter than the length of the variable, SAS pads the value with trailing blanks

These trailing blanks cause unwanted gaps when two variable values are concatenated

If trimmed values are assigned to a new variable, they are padded with trailing blanks again when they are shorter than the length of the new variable

Syntax:

TRIM ( argument )

Where,

argument can be any character expression, such as

a character variable: trim ( address )

another character function: trim (left (id) )

Page 109: Day2_PPT

Examples:

Data hrd.newtemp ( Drop = address city state zip);

Set hrd.temp;

NewAddress = Trim (address) || ', ' || TRIM (city) || ', ' || zip;

Run;

Here,

A new variable called newaddress is created, which contains the full address taken from three different variables called address, city and zip

The trailing blanks of the variables address and city are removed by the TRIM function.

Page 110: Day2_PPT

CATX Function:

Concatenates character strings, removes leading and trailing blanks, and inserts separators

Returns a value to a variable, or returns a value to a temporary buffer

Results of the CATX function are usually equivalent to those that are produced by a combination of the concatenation operator and the TRIM and LEFT functions

Syntax:

CATX ( separator , string-1 <,...string-n> )

Where,

separator specifies the character string that is used as a separator between concatenated strings

string specifies a SAS character string.

Page 111: Day2_PPT

Example:

Data hrd.newtemp ( DROP = address city state zip);

Set hrd.temp;

NewAddress = CATX (', ', address, city, zip);

Run;

Here,

The above program uses the CATX function to concatenate the variables address, city and zip into the new variable newaddress and separates the values with a comma and a blank.
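The difference between plain concatenation, TRIM and CATX shows up when the variables are longer than their values. The following sketch uses invented address values and variable lengths:

```sas
/* Plain concatenation vs. TRIM vs. CATX (values are illustrative) */
data _null_;
   length address $ 20 city $ 15 zip $ 5;
   address = '65 ELM DR';
   city    = 'CARY';
   zip     = '27513';
   padded  = address || ', ' || city || ', ' || zip;             /* keeps the padding blanks */
   trimmed = trim(address) || ', ' || trim(city) || ', ' || zip; /* padding removed by TRIM  */
   joined  = catx(', ', address, city, zip);                     /* same result, less typing */
   put padded= / trimmed= / joined=;
run;
```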

Page 112: Day2_PPT

INDEX Function:

Searches a character value for a specified string

Searches values from left to right, looking for the first occurrence of the string

Returns the position of the string's first character

If the string is not found, it returns a value of 0

Is case sensitive

Syntax:

INDEX (source ,excerpt )

Where,

source specifies the character variable or expression to search

excerpt specifies a character string that is enclosed in quotation marks (' ').

Page 113: Day2_PPT

Example:

Data hrd.datapool;

Set hrd.temp;

If Index ( job , 'word processing' ) > 0;

Run;

Here,

It creates a new data set that contains only those observations in which the function locates the string 'word processing' and returns a value greater than 0.

Page 114: Day2_PPT

FIND Function:

Search for a specific substring of characters within a character string specified

Returns the position of that substring

If the substring is not found in the string, returns a value of 0

Similar to the INDEX function

Page 115: Day2_PPT

Syntax:

FIND (string , substring , <modifiers> , < startpos> )

Where,

string specifies a character constant, variable, or expression that will be searched for substrings

substring is a character constant, variable, or expression that specifies the substring of characters to search for in string

modifiers is a character constant, variable, or expression that specifies one or more modifiers

startpos is an integer that specifies the position at which the search should start and the direction of the search

If startpos is not specified, FIND starts the search at the beginning of the string and searches the string from left to right.

If startpos is positive, FIND searches from startpos to the right

If startpos is negative, FIND searches from startpos to the left

The modifiers argument specifies one or more modifiers for the function, as listed below.

The modifier i causes the FIND function to ignore character case during the search. If this modifier is not specified, FIND searches for character substrings with the same case as the characters in substring.

The modifier t trims trailing blanks from string and substring

Page 116: Day2_PPT

Example:

Data hrd.datapool;

Set hrd.temp;

If Find ( job , 'word processing' , 't' ) > 0;

Run;

Here,

It creates a new data set that contains only those observations in which the function locates the string 'word processing' and returns a value greater than 0.
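The practical difference between INDEX and FIND is the modifiers argument. A minimal sketch, assuming a made-up job value with mixed case:

```sas
/* INDEX and FIND are case sensitive unless FIND is given the i modifier */
data _null_;
   job = 'Word Processing, filing';                 /* illustrative value */
   p_index  = index(job, 'word processing');        /* case differs: 0    */
   p_find   = find(job, 'word processing');         /* case differs: 0    */
   p_find_i = find(job, 'word processing', 'i');    /* case ignored: 1    */
   put p_index= p_find= p_find_i=;
run;
```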

Page 117: Day2_PPT

UPCASE Function:

Converts all letters in a character expression to uppercase

Syntax:

UPCASE (argument)

Where,

argument can be any SAS expression, such as a character variable or constant

Page 118: Day2_PPT

Example:

Data hrd.newtemp;

Set hrd.temp;

Job = UPCASE (job) ;

Run;

Here,

The above program converts the values of Job to uppercase and saves them in a new data set.

Page 119: Day2_PPT

LOWCASE Function:

Converts all letters in a character expression to lowercase

Syntax:

LOWCASE ( argument )

Where,

argument can be any SAS expression, such as a character variable or constant.

Page 120: Day2_PPT

Example:

Data hrd.newtemp;

Set hrd.temp;

Contact = LOWCASE ( contact);

Run;

Here,

The above program converts the values of the variable contact to lowercase and stores them in a new data set.

Page 121: Day2_PPT

PROPCASE Function:

Converts all words in an argument to proper case (the first letter in each word is capitalized)

First copies a character argument and converts all uppercase letters to lowercase letters

Then converts to uppercase the first character of a word that is preceded by a delimiter

Uses the default delimiters unless specified

Syntax:

PROPCASE (argument , <delimiter (s)> )

Where,

argument can be any SAS expression, such as a character variable or constant

delimiter(s) specifies one or more delimiters that are enclosed in quotation marks. The default delimiters are blank, forward slash, hyphen, open parenthesis, period, and tab.

Page 122: Day2_PPT

Example:

Data hrd.newtemp;

Set hrd.temp;

Contact = PROPCASE(contact);

Run;

Here,

The program converts the values of the variable contact to proper case and saves them in a new data set.
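The three case functions can be compared on one value. The contact value below is invented for this sketch and includes a hyphen, one of the default PROPCASE delimiters:

```sas
/* UPCASE, LOWCASE and PROPCASE applied to the same value */
data _null_;
   contact = 'd. lange-smith';      /* illustrative value */
   up   = upcase(contact);          /* D. LANGE-SMITH */
   low  = lowcase(up);              /* d. lange-smith */
   prop = propcase(contact);        /* the letter after a blank or hyphen is capitalized */
   put up= low= prop=;
run;
```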

Page 123: Day2_PPT

TRANWRD Function

Replaces or removes all occurrences of a pattern of characters within a character string

Translated characters can be located anywhere in the string

Syntax

TRANWRD (source, target, replacement)

where

source specifies the source string that you want to translate

target specifies the string that SAS searches for in source

replacement specifies the string that replaces target.

target and replacement can be specified as variables or as character strings

Page 124: Day2_PPT

Example:

Data work.after;

Set work.before;

name = TRANWRD (name, 'Miss', 'Ms.');

name = TRANWRD (name, 'Mrs.', 'Ms.');

Run;

Here,

The above program changes all occurrences of Miss or Mrs. to Ms. in the variable name.

Page 125: Day2_PPT

TRANSLATE Function

Replaces or removes all occurrences of a character within a character string

Syntax

TRANSLATE (source, to-1, from-1 <, ...to-n, from-n>)

where,

source specifies the source string or name of the variable whose value is to be translated

to-1 ... to-n specifies the characters that replace the from characters

from-1 ... from-n specifies the characters to be replaced

Page 126: Day2_PPT

Example:

Data work.after;

Set work.before;

name = TRANSLATE (name, 'XYZ', 'ABC');

Run;

Here,

The above program will replace all the A's with X, B's with Y and C's with Z in the name variable.
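TRANWRD and TRANSLATE are easy to confuse: the first replaces whole substrings, the second replaces individual characters position by position. A sketch on an invented value:

```sas
/* TRANWRD replaces substrings; TRANSLATE replaces single characters */
data _null_;
   name  = 'Miss ABC CAB';                    /* illustrative value */
   step1 = tranwrd(name, 'Miss', 'Ms.');      /* Ms. ABC CAB */
   step2 = translate(step1, 'XYZ', 'ABC');    /* A->X, B->Y, C->Z: Ms. XYZ ZXY */
   put step1= / step2=;
run;
```

Note the argument order in TRANSLATE: the replacement characters come before the characters to be replaced.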

Page 127: Day2_PPT

Modifying Numeric Values with Functions

INT Function

Return the integer portion of a numeric value

Decimal portion of the INT function argument is discarded

Syntax:

INT (argument)

Where,

argument is a numeric variable, constant, or expression.

Page 128: Day2_PPT

Example:

Data work.after;

Set work.before;

Intamt = INT(amount);

Run;

Here,

The integer portion of the value of the variable amount is stored in the new variable Intamt.

Page 129: Day2_PPT

ROUND Function

Round values to the nearest specified unit

If a round-off unit is not provided, a default value of 1 is used

Syntax:

ROUND ( argument , round-off-unit );

Where,

argument is a numeric variable, constant, or expression.

round-off-unit is numeric and nonnegative.

Page 130: Day2_PPT

Example:

Data work.after;

Set work.before;

amt = ROUND (amount, .01);

Run;

Here,

The value of the variable amount is rounded to the nearest 0.01, that is, to two decimal places.
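The round-off unit is the multiple to round to, not a count of decimal places. The sketch below, on an invented amount, contrasts several units and also shows how INT treats a negative value:

```sas
/* INT truncates; ROUND rounds to the nearest multiple of the round-off unit */
data _null_;
   amount  = 1234.5678;             /* illustrative value */
   whole   = int(amount);           /* 1234 */
   neg     = int(-3.7);             /* -3: the decimal portion is discarded */
   cents   = round(amount, .01);    /* nearest 0.01: 1234.57 */
   fifths  = round(amount, .2);     /* nearest multiple of 0.2: 1234.6 */
   nearest = round(amount);         /* default unit of 1: 1235 */
   put whole= neg= cents= fifths= nearest=;
run;
```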

Page 131: Day2_PPT

SAS System Options

System options control how SAS runs and how output appears; the OPTIONS statement is used to modify them

An OPTIONS statement can be placed anywhere in a SAS program to change the settings from that point onwards

The OPTIONS statement is global, i.e. the settings remain in effect until they are modified or the SAS session ends

Syntax:

OPTIONS options;

Where,

options specifies one or more system options to be changed

The available system options depend on the host operating system

Page 132: Day2_PPT

NUMBER | NONUMBER and DATE | NODATE Options:

By default, page numbers and dates appear with output

NONUMBER & NODATE Options:

Syntax:

options nonumber nodate;

This suppresses the printing of both page numbers and the date and time in listing output

NUMBER & DATE Options:

Syntax:

options number date;

This prints both page numbers and the date and time in listing output

Page 133: Day2_PPT

Example:

options nonumber nodate;

proc print data=clinic.admit ;

var id sex age height weight;

where age>=30;

run;

options date;

proc freq data = clinic.diabetes;

where fastgluc >= 300;

tables sex;

run;

Here,

Page numbers and the current date are not displayed in the PROC PRINT output

Page numbers are not displayed in the PROC FREQ output, either, but the date does

appear at the top of the page that contains the PROC FREQ report

Page 134: Day2_PPT

Output:

The SAS System

Obs ID Sex Age Height Weight

2 2462 F 34 66 152

3 2501 F 31 61 123

4 2523 F 43 63 137

5 2539 M 51 71 158

7 2552 F 32 67 151

8 2555 M 35 70 173

The SAS System

15:19 Thursday, September 23, 1999

Sex Frequency Percent Cumulative Frequency Cumulative Percent

F 2 25.0 2 25.0

M 6 75.0 8 100.0

Page 135: Day2_PPT

PAGENO, PAGESIZE & LINESIZE Options:

PAGENO= option is used to specify the beginning page number for the report

If it is not specified, the output is numbered sequentially throughout the SAS session, starting with page 1

The PAGESIZE= option specifies how many lines each page of output should contain

The LINESIZE= option specifies the width of the print line for the procedure output and log

Observations that do not fit within the line size continue on a different line

Syntax:

options pageno = n pagesize =n linesize = n;

Where,

n is any number

Page 136: Day2_PPT

Example:

options pageno =1 pagesize=15 linesize =64 ;

proc print data = clinic.admit ;

run ;

Here,

The output pages are numbered sequentially, beginning with page 1

Each page of the output that the PRINT procedure produces contains 15 lines

Each line of output is no wider than 64 characters

Page 137: Day2_PPT

YEARCUTOFF Option:

This option specifies which 100-year span is used to interpret two-digit year values

When a two-digit year value is read, SAS interprets it based on a 100-year span that starts with the YEARCUTOFF= value

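The default value of YEARCUTOFF= is 1920 in the SAS release described here; the default depends on the release, so check the current setting with PROC OPTIONS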

The default value of yearcutoff can be changed using the YEARCUTOFF= option

The value of the YEARCUTOFF= system option affects only two-digit year values

Date Expression Interpreted As

12/07/41 12/07/1941

18Dec15 18Dec2015

04/15/30 04/15/1930

15Apr95 15Apr1995

Page 138: Day2_PPT

Syntax:

options YEARCUTOFF = YEAR;

Where,

YEAR is the first year of the 100 year span

Page 139: Day2_PPT

Example:

options yearcutoff =1950 ;

Here,

The 100-year span will be from 1950 to 2049

Using YEARCUTOFF=1950, dates are interpreted as shown below:

Date Expression Interpreted As

12/07/41 12/07/2041

18Dec15 18Dec2015

04/15/30 04/15/2030

15Apr95 15Apr1995
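The table can be reproduced by reading two-digit years after setting the option. This is a sketch under the assumption that the INPUT function with the MMDDYY8. and DATE7. informats is an acceptable way to read the literal values; the deck itself does not show how the values are read.

```sas
/* Two-digit years are resolved inside the span 1950-2049 */
options yearcutoff=1950;

data _null_;
   d41 = input('12/07/41', mmddyy8.);   /* read as 07DEC2041 */
   d95 = input('15Apr95', date7.);      /* read as 15APR1995 */
   put d41= date9. d95= date9.;
run;
```

The OPTIONS statement changes the setting for the rest of the SAS session, so reset YEARCUTOFF= afterwards if later steps rely on a different span.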

Page 140: Day2_PPT

OBS, FIRSTOBS options:

Used to specify the observations to process from SAS data sets

Can specify either or both of these options as needed

OBS= to specify the last observation to be processed

FIRSTOBS= to specify the first observation to be processed

FIRSTOBS= and OBS= together to specify a range of observations to be processed

Page 141: Day2_PPT

Syntax:

OPTIONS FIRSTOBS=n;

OPTIONS OBS=n;

Where,

n is a positive integer

For FIRSTOBS=, n specifies the number of the first observation to process

For OBS=, n specifies the number of the last observation to process

By default, FIRSTOBS=1. The default value for OBS= is MAX

Page 142: Day2_PPT

Example:

options firstobs =10 ;

proc print data =sasuser.heart ;

run ;

Assume the data set Sasuser.Heart contains 20 observations.

Here SAS reads the 10th observation of the data set first and reads through the last observation

(for a total of 11 observations)

options firstobs =1 obs =10 ;

proc print data =sasuser.heart ;

run ;

Here SAS reads 1st to 10th observation (for a total of 10 observations)

Page 143: Day2_PPT

To reset the number of the last observation to process, you can specify OBS=MAX in the

OPTIONS statement.

options obs = max;

This instructs any subsequent SAS programs in the SAS session to process through the last

observation in the data set being read

FIRSTOBS= and OBS= settings remain in effect for the duration of the current SAS session unless they are changed
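FIRSTOBS= and OBS= can be combined to process a range and then reset. The sketch below assumes the sample data set sashelp.class, which ships with SAS, instead of the Sasuser.Heart data set used in the slides.

```sas
/* Process observations 5 through 10 only (6 observations) */
options firstobs=5 obs=10;

proc print data=sashelp.class;   /* sashelp.class is an assumption of this sketch */
run;

/* Restore the defaults so that later steps read every observation */
options firstobs=1 obs=max;
```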

Page 144: Day2_PPT

Viewing System Options:

OPTIONS procedure can be used to display the current setting of one or all SAS system options

The results are displayed in the log

Syntax:

PROC OPTIONS < option (s ) > ;

RUN;

Where, option(s) specifies how SAS system options are displayed

Example:

proc options;

Run;

This lists all SAS system options, their settings, and a description

Page 145: Day2_PPT

To list the value of one particular system option, use the OPTION= option in the PROC OPTIONS

statement as shown below:

proc options option = yearcutoff ;

run ;

If a SAS system option uses an equal sign, such as YEARCUTOFF=, you do not include the

equal sign when specifying the option to OPTION=.

Page 146: Day2_PPT

Importing Raw Data Files

Raw Data Files:

A raw data file is an external text file whose records contain data values that are organized in fields

Raw data files are non-proprietary and can be read by a variety of software programs

Page 147: Day2_PPT

Create Dataset From Raw Data Files:

1. Reference the SAS library to store the data set.

2. Write a DATA step program to read the raw data file and create a SAS data set.

To read the raw data file, the DATA step must provide the following instructions to SAS:

the location or name of the external text file

a name for the new SAS data set

a reference that identifies the external file

a description of the data values to be read.

Page 148: Day2_PPT

The table below outlines the basic statements that are used to import a raw data file

To Do This | Use This SAS Statement

Reference a SAS data library | LIBNAME statement

Reference an external file | FILENAME statement

Name a SAS data set | DATA statement

Identify an external file | INFILE statement

Describe data | INPUT statement

Execute the DATA step | RUN statement

List the data | PROC PRINT statement

Execute the final program step | RUN statement

Page 149: Day2_PPT

FILENAME statement:

Is used to reference an external file

Before raw data can be read, SAS must be pointed to the location of the external file that contains the data

FILENAME performs a function similar to LIBNAME:

Both create a temporary reference that points to a storage location: LIBNAME to a SAS library, FILENAME to an external file

Page 150: Day2_PPT

Syntax:

FILENAME fileref 'path';

where,

fileref is a name that is associated with an external file containing data

The name must be 1 to 8 characters long

It must begin with a letter or underscore

It can contain only letters, numbers, or underscores

'path' is the location of the external file on the file system

Page 151: Day2_PPT

Example:

filename tests 'c:\users\tmill.dat';

Here,

The FILENAME statement temporarily associates the fileref Tests with the external file that

contains the data

Page 152: Day2_PPT

Referencing Aggregate Storage Location:

A FILENAME statement can also be used to associate a fileref with an aggregate storage

location, such as a directory that contains multiple external files

Page 153: Day2_PPT

Syntax:

FILENAME fileref 'directoryname';

Where,

fileref is a name that is associated with the directory

The name must be 1 to 8 characters long

It must begin with a letter or underscore

It can contain only letters, numbers, or underscores

directoryname is the full path or location of the directory.

Page 154: Day2_PPT

Example:

filename finance 'c:\users\personal\finances';

Here,

The FILENAME statement temporarily associates the fileref Finance

with the aggregate storage directory C:\Users\Personal\Finances

Page 155: Day2_PPT

Infile Statement:

Is used to indicate the file which contains the Data

Syntax:

INFILE file-specification <options> ;

Where,

file-specification can take the form fileref to name a previously defined file reference or 'filename'

to point to the actual name and location of the file

options describes the input file's characteristics and specifies how it is to be read with the INFILE

statement.

Page 156: Day2_PPT

Example:

FILENAME test 'c:\irs\personal\refund.dat';

INFILE test obs =100;

Here,

INFILE statement is used along with FILENAME statement

Test is the file reference for the file that contains the data

The OBS= option reads only the first 100 records from the file

The INFILE statement can also specify the complete path of a file instead of using the FILENAME statement:

Example: INFILE 'c:\irs\personal\refund.dat';

Page 157: Day2_PPT

Input Statement:

Describes the fields of raw data to be read and placed into the SAS data set.

Specify the variable names and data types

Syntax:

INPUT variable <$> startcol - endcol . . . ;

where

variable is the SAS variable name assigned to the field

($) identifies the variable type as character (if the variable is numeric, then $ is not specified)

startcol represents the starting column for this variable

endcol represents the ending column for this variable.

Page 158: Day2_PPT

Example:

The following code reads data from the raw data file exer.dat.

filename exer 'c:\users\exer.dat';

data exercise ;

infile exer ;

input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;

run ;

Page 159: Day2_PPT

Reading Column input or fixed field raw data files

It is the most common input style

Column input specifies actual column locations for values

In such files the values for each variable are in the same location in all records

When column input is used, the data must be:

Standard character or numeric values

In fixed fields

The file below contains fixed fields;

Page 160: Day2_PPT

Syntax:

The complete sequence of statements for importing a raw data file into SAS is:

LIBNAME statement

FILENAME statement

DATA statement

INFILE statement

INPUT statement

RUN statement

Page 161: Day2_PPT

Example:

libname libref 'SAS-data-library';

filename exercise 'c:\users\exer.dat';

data exer ;

infile exercise ;

input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14 ;

Run ;

Here,

The LIBNAME statement creates a library reference

The FILENAME statement references an external file

The DATA statement names the SAS data set to be created

The INFILE statement identifies the external file

The INPUT statement describes the data in the external file

Page 162: Day2_PPT

Features of Column Input:

It can be used to read character variable values that contain embedded blanks.

input Name $ 1-25;

No placeholder is required for missing data. A blank field is read as missing and does not cause

other fields to be read incorrectly.

input Item $ 1-13 IDnum $ 15-19 Instock 21-22 Backord 24-25;

Page 163: Day2_PPT

Fields or parts of fields can be re-read.

input Item $ 1-13 IDnum $ 15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;

Fields do not have to be separated by blanks or other delimiters.

input Item $ 1-13 IDnum $ 14-18 InStock 19-20 BackOrd 21-22;
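These features can be tried without an external file by reading in-stream records. The following is a sketch only: the three records are invented, DATALINES stands in for the raw data file, and the column positions follow the earlier exercise example.

```sas
/* Column input on in-stream data; the second record has a blank Age field */
data exercise;
   infile datalines;
   input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;
   datalines;
2810 61 MOD  F
2804    HIGH M
2807 42 LOW  M
;
run;

proc print data=exercise;
run;
```

The blank in columns 6-7 of the second record is read as a missing value for Age, and the fields after it are still read from their own columns.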

Page 164: Day2_PPT

Standard and Nonstandard Numeric Data:

Standard numeric data values can contain only

numbers

decimal points

numbers in scientific or E-notation (2.3E4, for example)

plus or minus signs

Nonstandard numeric data includes

values that contain special characters, such as percent signs (%), dollar

signs ($), and commas (,)

date and time values

data in fraction, integer binary, real binary, and hexadecimal forms

Page 165: Day2_PPT

The file below contains personnel information for a technical writing department of a small

computer manufacturer. The fields contain values for each employee's last name, first name, job

title, and annual salary.

The values for Salary contain commas. The values for Salary are considered to be nonstandard

numeric values.

Column input cannot be used to read these values.

Page 166: Day2_PPT

Choosing an Input Style:

Nonstandard data values require an input style that is more flexible than column input

Formatted input can be used, which combines the features of column input with the ability to read both standard and nonstandard data.

When raw data that is organized into fixed fields is to be read, use:

Column input to read standard data only

Formatted input to read both standard and nonstandard data.

Page 167: Day2_PPT

Reading formatted input:

INPUT Statement:

General Form of the INPUT Statement Using Formatted Input is :

Syntax:

INPUT < column pointer-control > variable informat . ;

Where,

Column pointer-control positions the input pointer on a specified column

variable is the name of the variable that is being created

informat is the special instruction that specifies how SAS reads raw data.

Page 168: Day2_PPT

Column pointer controls:

The two column pointer controls are:

@n :- Moves the input pointer to a specific column number

+n :- Moves the input pointer forward to a column number that is relative to the current position

Page 169: Day2_PPT

@n Column Pointer Control:

It moves the input pointer to a specific column number

The @ moves the pointer to column n, which is the first column of the field that is being read

The Syntax for Input using @n column pointer control is:

INPUT @n variable informat.;

Where,

variable is the name of the variable that is being created

informat is the special instruction that specifies how SAS reads raw data

Page 170: Day2_PPT

Example:

input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9. ;

Here,

The value for FirstName is read first, starting in column 9.

LastName is read next, by moving the pointer back to column 1 with @1

JobTitle and Salary are read from column 15 and column 19 respectively

Page 171: Day2_PPT

The +n Pointer Control:

It moves the input pointer forward to a column number that is relative to the current position

It moves the pointer forward n columns

The Syntax for Input using +n column pointer control is:

INPUT +n variable informat . ;

Where,

variable is the name of the variable that is being created

informat is the special instruction that specifies how SAS reads raw data

In order to count correctly, it is important to understand where the column pointer control is located after each data value is read

Page 172: Day2_PPT

Example:

input LastName $7. +1 FirstName $5. +5 Salary comma9. @15 JobTitle 3.;

Here,

Because the values for LastName begin in column 1, a column pointer control is not needed

After LastName is read, the pointer is at column 8

To start reading FirstName, which begins in column 9, move the pointer ahead 1 column with +1

After FirstName is read, the pointer is at column 14

To read Salary, which begins in column 19, move the pointer ahead 5 columns from column 14 with +5

The @n column pointer control is then used to return to column 15 to read JobTitle
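The pointer arithmetic described above can be checked with a short DATA step. This is a sketch under the same assumed layout (LastName in columns 1-7, FirstName in 9-13, JobTitle in 15-17, Salary in 19-27); the data set name and the records are illustrative.

```sas
/* Sketch only: data set name and raw records are illustrative assumptions */
data work.empinfo2;
   input LastName $7.       /* reads columns 1-7; pointer is then at column 8  */
         +1 FirstName $5.   /* moves to column 9; reads columns 9-13           */
         +5 Salary comma9.  /* moves from column 14 to 19; reads columns 19-27 */
         @15 JobTitle 3.;   /* absolute jump back to column 15                 */
   datalines;
LARSON  AMY   113 32,696.78
MOORE   MARY  112 28,945.89
;
run;

proc print data=work.empinfo2;
run;
```

Mixing +n and @n in one statement is valid: +n is relative to where the previous field ended, while @n ignores the current position.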

Page 173: Day2_PPT

INFORMAT

An informat is used to read data values in certain forms into standard SAS values

It determines how data values are read into a SAS data set

Informats are used to read numeric values that contain letters or other special characters

Informats must be used to read nonstandard data (numeric data containing letters or special characters such as commas)

The numeric value $1,234.00 contains two special characters, a dollar sign ($) and a comma (,). An informat is used to read the value while removing the dollar sign and comma, and then store the resulting value as a standard numeric value

$1,000,000 is nonstandard numeric data because it contains a dollar sign ($) and commas (,). To remove the dollar sign and commas before storing the numeric value 1000000 in a variable, read the value with the COMMA11. informat
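The two values discussed above can be converted with the INPUT function, which applies an informat to a character string. This is a minimal sketch; the variable names Amount1 and Amount2 are chosen only for illustration.

```sas
/* Sketch: apply the COMMAw.d informat to the two values from the text */
data _null_;
   Amount1 = input('$1,234.00', comma9.);   /* dollar sign and comma removed */
   Amount2 = input('$1,000,000', comma11.); /* stored as 1000000             */
   put Amount1= Amount2=;                   /* writes both values to the log */
run;
```

The log line should read Amount1=1234 Amount2=1000000, showing that only the standard numeric values are stored.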

Page 174: Day2_PPT

INFORMAT statement:

It specifies the informat for reading the values of the variables that are listed in the INFORMAT statement

An INFORMAT statement in a DATA step permanently associates an informat with a variable

Standard SAS informats or previously defined user-written informats can be used

A single INFORMAT statement can associate the same informat with several variables, or it can associate different informats with different variables

If a variable appears in multiple INFORMAT statements, SAS uses the informat that is assigned last

Page 175: Day2_PPT

Syntax:

INFORMAT <variablename> [$] informat<w>.<d>;

Where,

variablename is the name of the variable for which we are specifying the informat

$ indicates a character informat; its absence indicates a numeric informat

informat names the informat

w specifies the informat width, which for most informats is the number of columns in the input data

d specifies an optional decimal scaling factor in the numeric informats

If the w and d values are omitted from the informat, SAS uses default values

An informat can also be specified in the INPUT statement
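The effect of the decimal scaling factor d can be seen with the w.d informat. In this small sketch (variable names are illustrative), d is applied only when the raw value contains no decimal point.

```sas
/* Sketch: how d behaves in the w.d informat */
data _null_;
   a = input('12345', 5.2);  /* no decimal point in the data: divided by 10**2 */
   b = input('123.4', 5.2);  /* decimal point present: d is ignored            */
   put a= b=;                /* writes both values to the log                  */
run;
```

The log should show a=123.45 b=123.4.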

Page 176: Day2_PPT

Some important informats:

$w. – reads standard character data.

w.d – reads standard numeric data

COMMAw.d – reads numeric values and removes embedded characters such as commas, blanks, and dollar signs

DATEw. – reads date values in the form ddmmmyy or ddmmmyyyy

DATETIMEw. – reads datetime values in the form ddmmmyy hh:mm:ss.ss or ddmmmyyyy hh:mm:ss.ss

DDMMYYw. – reads date values in the form ddmmyy or ddmmyyyy

TIMEw. – Reads hours, minutes, and seconds in the form hh:mm:ss.ss

Page 177: Day2_PPT

Example:

INFORMAT Birthdate Interview date9. ;

Here,

The DATE9. informat, a numeric (date) informat, is associated with the variables Birthdate and Interview
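A sketch of this INFORMAT statement in a full DATA step follows. The data set name, the Name variable, the records, and the FORMAT statement (added only so the dates print readably) are assumptions for illustration.

```sas
/* Sketch only: data set name, Name variable, and records are illustrative */
data work.interviews;
   informat Birthdate Interview date9.; /* associate DATE9. with both variables */
   input Name $ Birthdate Interview;    /* list input uses the informats above  */
   format Birthdate Interview date9.;   /* display the stored dates as dates    */
   datalines;
Ann 12MAR1985 04JAN2015
Raj 30NOV1990 17FEB2015
;
run;

proc print data=work.interviews;
run;
```

Because the informat is declared in the INFORMAT statement, the INPUT statement does not need to repeat it for each variable.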

Page 178: Day2_PPT

Using Informat in Input Statement:

Informat is used in input statement to read the data in a particular format from the raw data file

Example:

input @9 FirstName $5. @1 LastName $7. +7 JobTitle 3. @19 Salary comma9.;

Here,

Because FirstName and LastName are character variables, $ is used; 5 and 7 are the widths of FirstName and LastName, respectively

Because JobTitle is a numeric value that is 3 columns wide, 3. is used to read those values

Page 179: Day2_PPT

COMMA9. is used to read the Salary value, because it contains nonstandard numeric values

The COMMAw.d informat is used to read numeric values and to remove embedded blanks, commas, dashes, dollar signs, percent signs, right parentheses, and left parentheses

Output:

Obs FirstName LastName JobTitle Salary

1 DONNY EVANS 112 29996.63

2 ALISA HELMS 105 18567.23

3 JOHN HIGGINS 111 25309.00

4 AMY LARSON 113 32696.78

5 MARY MOORE 112 28945.89

6 JASON POWELL 103 35099.50

7 JUDY RILEY 111 25309.00

Page 180: Day2_PPT

Format

A Format is an instruction that SAS uses to write data values

It is used to control the written appearance of data values

In some cases, formats are used to group data values together for analysis

SAS software offers a variety of character, numeric, and date and time formats

Users can also create and store their own formats

A format can be permanently assigned to a variable in a SAS data set

A format can be temporarily specified in a PROC step to determine the way the data values appear in output

Page 181: Day2_PPT

Syntax:

FORMAT <variablename> [<$>] format<w>.<d>;

Where,

variablename specifies the name of the variable for which the format is used

$ Indicates a character format; its absence indicates a numeric format.

format names the format

w specifies the format width, which for most formats is the number of columns in the output data.

d Specifies an optional decimal scaling factor in the numeric formats.

Formats always contain a period (.) as a part of the name.

Page 182: Day2_PPT

If the w and d values are omitted from the format, SAS uses default values

The d value specified with format tells SAS to display that many decimal places, regardless of how many decimal places are in the data

Formats never change or truncate the internally stored data values.

If the format width is too narrow to represent a value, SAS tries to squeeze the value into the space available

Character formats truncate values on the right

Numeric formats sometimes revert to the BESTw.d format

SAS prints asterisks if adequate width is not specified

When a FORMAT statement is used in a procedure step, the formats that are associated with the variables remain in effect only for that particular step. That is, the format association is temporary and not permanent
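The difference between a permanent and a temporary format association can be sketched as follows. The data set work.pay and its values are assumptions made for illustration.

```sas
/* Sketch only: data set name and values are illustrative assumptions */
data work.pay;
   input Name $ Salary;
   format Salary comma10.2;   /* permanent: stored with the data set */
   datalines;
Evans 29996.63
Helms 18567.23
;
run;

proc print data=work.pay;
   format Salary dollar12.2;  /* temporary: applies to this step only */
run;

proc print data=work.pay;     /* falls back to the stored COMMA10.2 format */
run;
```

In both PROC PRINT steps the stored Salary values are unchanged; only the way they are written differs.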

Page 183: Day2_PPT

Some Important Formats:

$w. – writes standard character data.

w.d – writes standard numeric data

COMMAw.d – writes numeric values with commas and decimal points

DATEw. – writes date values in the form ddmmmyy or ddmmmyyyy

DATETIMEw.d – writes datetime values in the form ddmmmyy hh:mm:ss.ss or ddmmmyyyy hh:mm:ss.ss

DDMMYYw. – writes date values in the form ddmmyy or ddmmyyyy

TIMEw.d – writes time values as hours, minutes, and seconds in the form hh:mm:ss.ss

Page 184: Day2_PPT

Example:

To display the value 1234 as $1234.00 in a report, use the DOLLAR8.2 format

The WORDS22. format, which converts numeric values to their equivalent in words, writes the numeric value 692 as six hundred ninety-two.
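The formats mentioned above can be tried by writing values to the log with the PUT statement. This is a minimal sketch; the variable names and the date value are assumptions chosen for illustration.

```sas
/* Sketch: write the same stored values with different formats */
data _null_;
   Amount = 1234;
   Count  = 692;
   Day    = '02DEC2015'd;   /* SAS date constant */
   put Amount dollar8.2;    /* dollar sign and two decimal places      */
   put Amount comma8.2;     /* comma and two decimal places            */
   put Count  words22.;     /* number written out in words             */
   put Day    date9.;       /* ddmmmyyyy form                          */
   put Day    ddmmyy10.;    /* ddmmyyyy form with separators           */
run;
```

Each PUT statement writes one line to the log, so the effect of each format can be compared directly against the descriptions above.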

Page 185: Day2_PPT

Reading Variable-Length Records (Using PAD option):

Variable-Length Records:

Files that have a variable-length record format have an end-of-record marker after the last field in each record

In variable-length records, some values are shorter than others or are missing, so the records are not all the same length

This can cause problems when trying to read the raw data into a SAS data set

Page 186: Day2_PPT

Example:

input Dept $ 1-11 @13 Receipts comma8.;

Here,

In the raw data for this example, an asterisk symbolizes the end-of-record marker and is not part of the data

The INPUT statement specifies a field width of 8 columns for Receipts

In the third record, the input pointer encounters an end-of-record marker before the 8th column of the field

The input pointer moves down to the next record in an attempt to find a value for Receipts

However, GRILL is a character value, and Receipts is a numeric variable. Thus, an invalid data error occurs, and Receipts is set to missing

Page 187: Day2_PPT

The PAD Option:

When using column input or formatted input to read fixed-field data in variable-length records, the PAD option can be used to avoid problems

The PAD option is used in the INFILE statement

It pads each record with blanks so that all data lines have the same length

Example:

infile receipts pad;

Here,

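Here, receipts is the fileref of the raw data file, not a variable. The PAD option pads each record read from that file with blanks, so that short records can be read without the input pointer moving to the next record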
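The following sketch makes the PAD example self-contained by first writing a small variable-length file and then reading it back. The temporary fileref, the department names, and the amounts are assumptions for illustration; only the INFILE and INPUT statements come from the text above.

```sas
/* Sketch only: the file contents are illustrative assumptions */
filename receipts temp;      /* temporary file standing in for the raw data file */

data _null_;                 /* write four records of differing lengths */
   file receipts;
   put 'BED/BATH    1,590.50';
   put 'HOUSEWARES  2,113.70';
   put 'SHOES       982';    /* ends before the 8-column Receipts field is full */
   put 'GRILL       1,200.25';
run;

data work.receipts;
   infile receipts pad;      /* pad each record with blanks */
   input Dept $ 1-11 @13 Receipts comma8.;
run;

proc print data=work.receipts;
run;
```

Removing the PAD option from the INFILE statement and rerunning the second step is a simple way to observe the invalid data problem described earlier, since the short third record then causes the pointer to move to the next record.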