Posted: 12-Apr-2017 | Category: Data & Analytics | Uploaded by: ayapparaj-sks
Ayapparaj / Praxis Business School 1 Chapter 7
Chapter 7
Performing Conditional Processing
/* 2. Using the SAS data set Hosp, use PROC PRINT to list observations for
Subject values of 5, 100, 150, and 200. Do this twice, once using OR
operators and once using the IN operator. Note: Subject is a numeric
variable */
data a15009.hospques2;
set a15009.hosp;
where Subject = 5 or Subject = 100 or Subject = 150 or Subject = 200;
*using OR operators in the WHERE statement to select the four subjects;
run;
proc print data=a15009.hospques2;
run;
/* OR */
data a15009.hospques22;
set a15009.hosp;
where Subject in (5,100,150,200);
*using the IN operator in the WHERE statement to select the same subjects;
run;
proc print data=a15009.hospques22;
run;
/*4. Using the Sales data set, create a new, temporary SAS data set
containing Region and TotalSales plus a new variable called Weight with
values of 1.5 for the North Region, 1.7 for the South Region, and 2.0 for
the West and East Regions. Use a SELECT statement to do this */
data a15009.salesques4;
set a15009.sales (keep = TotalSales Region);
*dataset is in the blog folder uploaded in Dropbox;
select;
*using a SELECT statement: each WHEN clause tests one condition and
assigns the Weight value for that Region;
when (Region = 'North') Weight = 1.5;
when (Region = 'South') Weight = 1.7;
when (Region = 'East') Weight = 2.0;
when (Region = 'West') Weight = 2.0;
otherwise;
end;
run;
proc print data=a15009.Salesques4;
run;
/*6. Using the Sales data set, list all the observations where Region is
North and Quantity is less than 60. Include in this list any observations
where the customer name (Customer) is Pet's are Us */
data a15009.salesques6;
set a15009.sales;
where (Region = 'North' and Quantity < 60) or Customer = "Pet's are Us";
*the OR condition adds every observation for that customer, and the name is
in double quotes because it contains an apostrophe;
run;
proc print data=a15009.Salesques6;
run;
Chapter 8
Performing Iterative Processing: Looping
/*2. Run the program here to create a temporary SAS data set (MonthSales):
data monthsales;
input month sales @@;
---add your line(s) here---
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000 7 6500 8 4500
9 5100 10 5700 11 6500 12 7500
;
Modify this program so that a new variable, SumSales, representing Sales to
date, is added to the data set. Be sure that the missing value for Sales in
month 3 does not result in a missing value for SumSales */
data a15009.monthsales;
input month sales @@;
sumsales+sales;
*the SUM statement retains SumSales, starts it at 0, and treats a missing
Sales value as 0, so month 3 does not make SumSales missing
(a separate RETAIN statement is not needed);
datalines;
1 4000 2 5000 3 . 4 5500 5 5000 6 6000
7 6500 8 4500 9 5100 10 5700 11 6500 12 7500
;
proc print data=a15009.monthsales;
run;
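To see why the SUM statement is used here rather than ordinary addition, the sketch below builds both kinds of running total from the first four MonthSales values. It needs only Base SAS. The data set WORK.CompareSums and the variables SumStmt and PlainAdd are illustrative names, not part of the exercise.

```sas
/* Running totals two ways, using the first four MonthSales values.
   WORK.CompareSums, SumStmt, and PlainAdd are illustrative names. */
data work.CompareSums;
   input month sales;
   SumStmt + sales;              *SUM statement: a missing Sales adds 0;
   retain PlainAdd 0;            *RETAIN keeps PlainAdd across observations;
   PlainAdd = PlainAdd + sales;  *plus operator: a missing Sales gives missing;
   datalines;
1 4000
2 5000
3 .
4 5500
;
proc print data=work.CompareSums;
run;
```

SumStmt ends at 14,500. PlainAdd becomes missing in month 3 and stays missing, because the retained missing value is carried into every later addition.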
/*4. Count the number of missing values for the variables A, B, and C in
the Missing data set. Add the cumulative number of missing values to each
observation (use variable names MissA, MissB, and MissC). Use the MISSING
function to test for the missing values */
data a15009.missingdata;
input X $ Y Z A;
if missing(X) then misscounterX+1;
if missing(Y) then misscounterY+1;
if missing(Z) then misscounterZ+1;
if missing(A) then misscounterA+1;
*using SUM statements (counter+1), which retain each counter and start it
at 0, to count the missing values in each variable;
datalines;
M 56 68 89
F 33 60 71
M 45 91 .
F 35 35 68
M . 71 81
M 50 68 71
. 23 60 46
M 65 72 103
. 35 65 67
M 15 71 75
;
proc print data=a15009.missingdata;run;
/*6. Repeat Problem 5, except have the range of N go from 5 to 100 by 5 */
data a15009.logy2;
do N=5 to 100 by 5;
*using do loop to initialize N variable and assign values from 5 to 100 in
increments of 5;
LogN=LOG(N);
output;
end;
run;
proc print data=a15009.logy2;run;
/*8. Use an iterative DO loop to plot the following equation:
Logit = log(p / (1 - p))
Use values of p from 0 to 1 (with a point at every .05). Using the
following GPLOT statements will produce a very nice plot. (If you do not
have SAS/GRAPH software, use PROC PLOT to plot your points).
goptions reset=all
ftext='arial'
htext=1.0
ftitle='arial/bo'
htitle=1.5
colors=(black);
symbol v=none i=sm;
title "Logit Plot";
proc gplot data=logitplot;
plot Logit * p;
run;
quit;*/
data a15009.logitplot;
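do p=0 to 1 by 0.05;
*using a DO loop to step p from 0 to 1 in increments of 0.05;
if 0 < p < 1 then Logit=LOG(p/(1-p));
else Logit=.;
*the logit is undefined at p=0 and p=1, so those points are set to missing
instead of producing invalid-argument notes in the log;
output;
end;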
run;
goptions reset=all ftext='arial' htext=1.0 ftitle='arial/bo' htitle=1.5
colors=(black);
symbol v=none i=sm;
title "Logit Plot";
proc gplot data=a15009.logitplot;
*using PROC GPLOT to draw the plot;
plot Logit * p;
run;
quit;
/*10. You are testing three speed-reading methods (A, B, and C) by randomly
assigning 10 subjects to each of the three methods. You are given the
results as three lines of reading speeds, each line representing the
results from each of the three methods,
respectively. Here are the results:
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
Create a temporary SAS data set from these three lines of data. Each
observation should contain Method (A, B, or C), and Score. There should be
30 observations in this data set. Use a DO loop to create the Method
variable and remember to use a single trailing @ in your INPUT statement.
Provide a listing of this data set using PROC PRINT */
data a15009.reading;
do Method = 'A','B','C';
*using a DO loop to assign the three method values in turn;
do SNo=1 to 10;
input Score @;
*the single trailing @ holds the current data line so the next pass
through the loop keeps reading scores from it;
output;
end;
end;
datalines;
250 255 256 300 244 268 301 322 256 333
267 275 256 320 250 340 345 290 280 300
350 350 340 290 377 401 380 310 299 399
;
proc print data=a15009.reading noobs;
var Method score;
run;
/* 12. You place money in a fund that returns a compound interest of 4.25%
annually. You deposit $1,000 every year. How many years will it take to
reach $30,000? Do not use compound interest formulas. Rather, use “brute
force” methods with DO WHILE or DO UNTIL statements to solve this problem
*/
data a15009.inte;
interest = 0.0425; *annual interest rate;
total = 0;         *balance before the first deposit;
do year = 1 to 100 until (total ge 30000);
*stop the loop as soon as the balance reaches $30,000;
total = total + 1000;           *deposit $1,000 at the start of each year;
total = total + interest*total; *add one year of compound interest;
output;
end;
format total dollar11.2; *specifying format for total variable;
run;
proc print data=a15009.inte;run;
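The exercise allows either DO WHILE or DO UNTIL. The DO WHILE version below is a sketch that rests on one assumption: $1,000 is deposited at the start of each year and then earns 4.25% for that year. WORK.Inte_While is a hypothetical data set name. WHILE is tested at the top of the loop, so no upper bound on Year is needed as long as the balance keeps growing.

```sas
/* DO WHILE version of the brute-force savings loop.
   Assumption: each $1,000 deposit is made at the start of the year and
   earns a full year of 4.25% interest. WORK.Inte_While is hypothetical. */
data work.Inte_While;
   interest = 0.0425;
   total = 0;
   year = 0;
   do while (total lt 30000);          *tested before each pass;
      year + 1;
      total = total + 1000;            *start-of-year deposit;
      total = total + interest*total;  *one year of compound interest;
      output;
   end;
   format total dollar11.2;
run;
proc print data=work.Inte_While;
run;
```

Under this assumption, the last row printed is year 20. That is the first year in which the balance is at least $30,000.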
/*14. Generate a table of integers and squares starting at 1 and ending
when the square value is greater than 100. Use either a DO UNTIL or DO
WHILE statement to accomplish this*/
*using DO UNTIL;
data a15009.square;
do Integers = 1 to 100 until (Squares ge 100);
*the UNTIL condition is checked at the bottom of the loop, so the loop
stops after the row where Squares reaches 100;
Squares = Integers * Integers;
output;
end;
run;
proc print data=a15009.square;run;
*using IF and LEAVE;
data a15009.square;
do Integers = 1 to 100 by 1;
Squares = Integers * Integers;
if Squares gt 100 then leave;
*LEAVE exits the loop before the first square above 100 is output;
output;
end;
run;
proc print data=a15009.square;run;
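The exercise also accepts DO WHILE. In this sketch the condition is checked at the top of each pass, so the loop stops before any square above 100 is computed or output. Like the two versions above, it produces the integers 1 through 10. WORK.Square_While is a hypothetical data set name.

```sas
/* DO WHILE version: the loop runs only while the next square is at
   most 100. WORK.Square_While is a hypothetical data set name. */
data work.Square_While;
   Integers = 1;
   do while (Integers*Integers le 100);
      Squares = Integers*Integers;
      output;
      Integers + 1;
   end;
run;
proc print data=work.Square_While;
run;
```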
Chapter 9 Working with Dates
/* 2. Using the following lines of data, create a temporary SAS data set
called ThreeDates. Each line of data contains three dates, the first two in
the form mm/dd/yyyy and the last in the form ddmmmyyyy. Name the
three date variables Date1, Date2, and Date3. Format all three using the
MMDDYY10. format. Include in your data set the number of years from Date1
to Date2 (Year12) and the number of years from Date2 to Date3 (Year23).
Round these values to the nearest year. Here are the lines of data (note
that the columns do not line up):
01/03/1950 01/03/1960 03Jan1970
05/15/2000 05/15/2002 15May2003
10/10/1998 11/12/2000 25Dec2005 */
*loading the values as a separate data set in permanent library;
data a15009.three;
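input Date1 : mmddyy10.
Date2 : mmddyy10.
Date3 : date9.;
*the colon modifier reads each date as a blank-delimited field, so the
values do not need to start in fixed columns;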
format Date1 Date2 Date3 mmddyy10.;
datalines;
01/03/1950 01/03/1960 03Jan1970
05/15/2000 05/15/2002 15May2003
10/10/1998 11/12/2000 25Dec2005
;
*reading the dates above with a SET statement, using the YRDIF function to
compute the years from Date1 to Date2 and from Date2 to Date3, and ROUND
to round them to the nearest year;
data a15009.threedates;
set a15009.three;
year12=round(yrdif(Date1,Date2,'Actual'));
year23=round(yrdif(Date2,Date3,'Actual'));
run;
proc print data=a15009.threedates;run;
/* 4. Using the Hosp data set, compute the subject’s ages two ways: as of
January 1, 2006 (call it AgeJan1), and as of today’s date (call it
AgeToday). The variable DOB represents the date of birth. Take the integer
portion of both ages. List the first 10 observations */
data a15009.hospques4;
set a15009.hosp;
AgeToday=int(yrdif(DOB,today(),'Actual'));
AgeJan1=int(yrdif(DOB,'01Jan2006'd,'Actual'));
*using YRDIF to find the years between DOB and each reference date, and
INT to keep only the integer portion;
run;
proc print data=a15009.hospques4(obs=10);run;
/* 6. Using the Medical data set, compute frequencies for the days of the
week for the date of the visit (VisitDate). Supply a format for the days of
the week and months of the year */
*loading the medical dataset in the permanent library;
data a15009.medical;
input @1 VisitDate mmddyy10. @12 patno $3.;
datalines;
11/29/2003 879
11/30/2003 880
09/04/2003 883
08/28/2003 884
09/04/2003 885
08/26/2003 886
08/31/2003 887
08/25/2003 888
11/16/2003 913
11/15/2003 914
;
run;
data a15009.sevenques6;
set a15009.medical(keep=VisitDate); *reading the Medical data with a SET statement;
Days = weekday(VisitDate); *day of the week, 1=Sunday through 7=Saturday;
Month = month(VisitDate); *month of the year, 1=January through 12=December;
run;
proc format; *providing formats for the day and month numbers;
value days 1='Sun' 2='Mon' 3='Tue'
4='Wed' 5='Thu' 6='Fri'
7='Sat';
value monthfmt 1='Jan' 2='Feb' 3='Mar' 4='Apr' 5='May' 6='Jun'
7='Jul' 8='Aug' 9='Sep' 10='Oct' 11='Nov' 12='Dec';
run;
title "Frequencies for Visit Dates";
proc freq data=a15009.sevenques6;
tables Days Month / nocum nopercent;
format Days days. Month monthfmt.;
run;
/* 8. Using the values for Day, Month, and Year in the raw data below,
create a temporary SAS data set containing a SAS date based on these values
(call it Date) and format this value using the MMDDYY10. format. Here are
the Day, Month, and Year values:
25 12 2005
1 1 1960
21 10 1946 */
*storing the data in the permanent library;
data a15009.dataset;
input Day Month Year;
datalines;
25 12 2005
1 1 1960
21 10 1946
;
data a15009.sevenques8;
set a15009.dataset;
Date = mdy(Month,Day,Year);
*MDY combines the Month, Day, and Year values into a SAS date value;
format Date mmddyy10.;
run;
proc print data=a15009.sevenques8;run;
/* 10. Using the Hosp data set, compute the number of months from the
admission date (AdmitDate) and December 31, 2007 (call it MonthsDec). Also,
compute the number of months from the admission date to today's date (call
it MonthsToday). Use a date interval function to solve this problem. List
the first 20 observations for your solution */
data a15009.sevenques10;
set a15009.hosp; *you can find hosp dataset in the blog folder uploaded in
the dropbox;
MonthsDec = intck('month',AdmitDate,'31Dec2007'd);
*using INTCK to count the month boundaries crossed between AdmitDate and
31Dec2007;
MonthsToday = intck('month',AdmitDate,today());
run;
proc print data=a15009.sevenques10(obs=20);
run;
/* 12. You want to see each patient in the Medical data set on the same day
of the week 5 weeks after they visited the clinic (the variable name is
VisitDate). Provide a listing of the patient number (Patno), the visit
date, and the date for the return visit */
data a15009.sevenques12;
set a15009.medical;
Followdate=intnx('week',VisitDate,5,'sameday');
*using INTNX with the WEEK interval and SAMEDAY alignment to move 5 weeks
ahead while keeping the same day of the week;
run;
proc print data=a15009.sevenques12;
var Patno VisitDate Followdate;
format Followdate VisitDate date9.;
run;
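One way to confirm that the follow-up date falls five weeks later on the same weekday is to compare the INTNX result with plain date arithmetic, since five weeks is 35 days. This sketch uses one visit date from the Medical data above (29Nov2003). WORK.FollowCheck and its variables are hypothetical names.

```sas
/* Check that moving 5 weeks with SAMEDAY alignment equals adding 35 days.
   WORK.FollowCheck is a hypothetical data set name. */
data work.FollowCheck;
   VisitDate = '29Nov2003'd;
   Followdate = intnx('week',VisitDate,5,'sameday');
   DaysApart = Followdate - VisitDate;                        *expected 35;
   SameWeekday = (weekday(Followdate) = weekday(VisitDate));  *1 means true;
   format VisitDate Followdate date9.;
run;
proc print data=work.FollowCheck;
run;
```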
Chapter 10
Subsetting and Combining SAS Data Sets
/* 2.Using the SAS data set Hosp, create a temporary SAS data set called
Monday2002, consisting of observations from Hosp where the admission date
(AdmitDate) falls on a Monday and the year is 2002. Include in this new
data set a variable called Age, computed as the person’s age as of the
admission date, rounded to the nearest year */
data a15009.monday2002;
set a15009.hosp;
*you can take hosp dataset from blog folder uploaded in dropbox;
where year(AdmitDate) eq 2002 and
weekday(AdmitDate) eq 2;
*using where statement to specify the condition for AdmitDate
Weekday gives value of Monday as 2 as series starts from 1 for Sunday
Year(admitdate) gives year value of admitdate;
Age = round(yrdif(DOB,AdmitDate,'Actual'));
*using yrdif function to find difference between DOB and AdmitDate;
run;
title "Listing of MONDAY2002";
proc print data=a15009.monday2002;
run;
/* 4. Using the SAS data set Bicycles, create two temporary SAS data sets
as follows: Mountain_USA consists of all observations from Bicycles where
State is Uttar Pradesh and Model is Mountain. Road_France consists of all
observations from Bicycles where State is Maharastra and Model is Road
Bike. Print these two data sets */
data a15009.Mountain_USA a15009.Road_France;
set a15009.Bicycles;
*bicycle dataset is available in the blog folder uploaded in dropbox;
if State="Uttar Pradesh" and Model="Mountain Bike" then output
a15009.Mountain_USA;
else if State="Maharastra" and Model="Road Bike" then output
a15009.Road_France;
run;
*introducing two new datasets as a15009.Mountain_USA a15009.Road_France and
saving the observations to both the datasets based on the conditions given;
proc print data= a15009.Mountain_USA;run;
proc print data= a15009.Road_France;run;
/*6. Repeat Problem 5, except this time sort Inventory and NewProducts
first (create two temporary SAS data sets for the sorted observations).
Next, create a new, temporary SAS data set (Updated) by interleaving the
two temporary, sorted SAS data sets. Print out the result.*/
*sorting inventory dataset by model variable;
proc sort data=a15009.inventory out=a15009.inventory;
by Model;
run;
*sorting newproducts dataset by model variable;
proc sort data=a15009.newproducts out=a15009.newproducts;
by Model;
run;
*interleaving the two sorted data sets: SET with a BY statement combines
their rows in Model order;
data a15009.updated;
set a15009.inventory a15009.newproducts;
by Model;
run;
title "Listing of UPDATED";
proc print data=a15009.updated;
run;
/* 8. Run the program here to create a SAS data set called Markup:
data markup;
input manuf : $10. Markup;
datalines;
Cannondale 1.05
Trek 1.07
;
Combine this data set with the Bicycles data set so that each observation
in the Bicycles data set now has a markup value of 1.05 or 1.07, depending
on whether the bicycle is made by Cannondale or Trek. In this new data set
(call it Markup_Prices),create a new variable (NewTotal) computed as
TotalCost times Markup */
*creating the markup data set with the manufacturers used in this version
of Bicycles (Atlas and Hero);
data a15009.markup2;
input manuf : $10. Markup;
datalines;
Atlas 1.05
Hero 1.07
;
*sorting markup2 data by manuf variable;
proc sort data=a15009.markup2;
by manuf;
run;
*sorting Bicycles by Manuf, the variable name (Manufacturer is only its
label);
proc sort data=a15009.Bicycles;
by Manuf;
run;
*combining both data sets by manufacturer so each bicycle gets its markup,
then computing NewTotal as TotalCost times Markup;
data a15009.markup_prices;
merge a15009.bicycles a15009.markup2;
by Manuf;
NewTotal = TotalCost * Markup;
run;
proc print data=a15009.markup_prices;run;
/*10 Using the Purchase and Inventory data sets, provide a list of all
Models (and the Price) that were not purchased*/
*sorting the inventory dataset by Model Variable;
proc sort data=a15009.inventory out=a15009.inventory;
by Model;
run;
*sorting the purchase dataset by Model Variable;
proc sort data=a15009.purchase out=a15009.purchase;
by Model;
run;
*merging the two data sets by Model, using IN= variables to keep the
models (with their prices) that appear in Inventory but not in Purchase;
data a15009.notpurchased;
merge a15009.inventory(in=InInventory) a15009.purchase(in=InPurchase);
by Model;
if InInventory and not InPurchase;
keep Model Price;
run;
title "Listing of NOT_BOUGHT";
proc print data=a15009.notpurchased noobs;
run;
/*12 You want to merge two SAS data sets, Demographic and Survey1, based on
an identifier. In Demographic, this identifier is called ID; in Survey1,
the identifier is called Subj. Both are character variables.*/
*you can find both demographictwo and survey1 dataset in the blog folder
uploaded in dropbox;
proc sort data=a15009.demographictwo out=a15009.demographictwo;
by ID;
run;
proc sort data=a15009.survey1 out=a15009.survey1;
by Subj;
run;
data a15009.combine12ten;
merge a15009.demographictwo a15009.survey1 (rename=(Subj = ID));
by ID;
run;
proc print data=a15009.combine12ten ;
run;
/*14 Data set Inventory contains two variables: Model (an 8-byte character
variable) and Price (a numeric value). The price of Model M567 has changed
to 25.95 and the price of Model X999 has changed to 35.99. Create a
temporary SAS data set (call it NewPrices) by updating the prices in the
Inventory data set*/
data a15009.modelnew;
input Model $ Price;
datalines;
M567 25.95
X999 35.99
;
*sorting inventory data by model variable;
proc sort data=a15009.inventory out=a15009.inventory;
by Model;
run;
*updating inventory data with modelnew for price for the models;
data a15009.newprices;
update a15009.inventory a15009.modelnew;
by Model;
run;
proc print data=a15009.newprices ;
run;
Chapter 11
Working with Numeric Functions
/* 2. Count the number of missing values for WBC, RBC, and Chol in the
Blood data set. Use the MISSING function to detect missing values */
data a15009.choly;
set a15009.blood;
*blood dataset is present in the blog folder uploaded in dropbox folder;
if missing(Gender) then MissG+1;
if missing(WBC) then MissWBC+1;
if missing(RBC) then MissRBC+1;
if missing(Chol) then MissChol+1;
*using SUM statements (counter+1) to count the missing values in each
variable, so the last observation holds the totals;
run;
proc print data=a15009.choly;run;
/* 4. The SAS data set Psych contains an ID variable, 10 question responses
(Ques1–Ques10), and 5 scores (Score1–Score5). You want to create a new,
temporary SAS data set (Evaluate) containing the following:
a. A variable called QuesAve computed as the mean of Ques1–Ques10. Perform
this computation only if there are seven or more non-missing question
values.
b. If there are no missing Score values, compute the minimum score
(MinScore), the maximum score (MaxScore), and the second highest score
(SecondHighest) */
data a15009.evaluate;
set a15009.psych;
*psych dataset is present in the blog folder uploaded in dropbox folder;
if n(of Ques1-Ques10) ge 7 then QuesAve=mean(of Ques1-Ques10);
*computing the score statistics only when all five scores are present;
if n(of Score1-Score5) eq 5 then do;
MaxScore=max(of Score1-Score5);
MinScore=min(of Score1-Score5);
SecondHighest=largest(2,of Score1-Score5);
end;
run;
proc print data=a15009.evaluate;run;
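/* 6. Write a short DATA _NULL_ step to determine the largest integer you
can store on your computer in 3, 4, 5, 6, and 7 bytes */
data _null_;
do Bytes = 3 to 7;
Largest = constant('exactint',Bytes);
*CONSTANT with EXACTINT returns the largest integer that can be stored
exactly in the given number of bytes;
put Bytes= Largest= comma24.;
end;
run;
*output will appear in the log window;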
/* 8. Create a temporary SAS data set (Random) consisting of 1,000
observations, each with a random integer from 1 to 5. Make sure that all
integers in the range are equally likely. Run PROC FREQ to test this
assumption */
data a15009.random;
do i=1 to 1000;
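x=int(rand('uniform')*5)+1;
*alternatively x=int(ranuni(0)*5)+1 with the older RANUNI function;
*RAND with the UNIFORM distribution returns a value strictly between 0 and
1, so x is an equally likely integer from 1 to 5;
output;
end;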
run;
proc freq data=a15009.random;
tables x/missing;run;
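Each run of the step above draws a different set of 1,000 values. If a reproducible sample is wanted, CALL STREAMINIT can seed RAND before its first call, as in this sketch. The seed 12345 and the data set name WORK.Random_Seeded are arbitrary choices, not part of the exercise.

```sas
/* Seeded version of the Random data set: the same 1,000 integers are
   generated on every run. Seed and data set name are arbitrary. */
data work.Random_Seeded;
   call streaminit(12345);            *seed RAND once, before the loop;
   do i = 1 to 1000;
      x = int(rand('uniform')*5) + 1; *equally likely integer from 1 to 5;
      output;
   end;
   drop i;
run;
proc freq data=work.Random_Seeded;
   tables x / nocum;
run;
```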
/* 10. Data set Char_Num contains character variables Age and Weight and
numeric variables SS and Zip. Create a new, temporary SAS data set called
Convert with new variables NumAge and NumWeight that are numeric values of
Age and Weight, respectively, and CharSS and CharZip that are character
variables created from SS and Zip. CharSS should contain leading 0s and
dashes in the appropriate places for Social Security numbers and CharZip
should contain leading 0s. Hint: The Z5. format includes leading 0s for the
ZIP code */
data a15009.convert;
set a15009.char_num;
*char_num dataset is present in the blog folder uploaded in dropbox folder;
NumAge = input(Age,8.);
NumWeight = input(weight,8.);
*converting character variables weight and age into numeric variables;
CharSS = put(SS,ssn11.);
CharZip = put(Zip,z5.);
*converting numeric variables SS and Zip into character variables;
run;
proc print data=a15009.convert;
run;
/* 12. Using the Stocks data set (containing variables Date and Price),
compute daily changes in the prices. Use the statements here to create the
plot.
Note: If you do not have SAS/GRAPH installed, use PROC PLOT and omit the
GOPTIONS and SYMBOL statements.
goptions reset=all colors=(black) ftext=swiss htitle=1.5;
symbol1 v=dot i=smooth;
title "Plot of Daily Price Differences";
proc gplot data=difference;
plot Diff*Date;
run;
quit; */
data a15009.difference;
set a15009.stocks;
Diff = Dif(Price);
*using the DIF function to compute the change in price from the previous
observation;
run;
goptions reset=all colors=(black) ftext=swiss htitle=1.5;
symbol1 v=dot i=smooth;
title "Plot of Daily Price Differences";
proc gplot data=a15009.difference;
plot Diff * Date;
run;quit;
Chapter 12
Working with Character Functions
/*2 Using the data set Mixed, create a temporary SAS data set (also called
Mixed) with the following new variables:
a. NameLow – Name in lowercase
b. NameProp – Name in proper case
c. (Bonus – difficult) NameHard – Name in proper case without using the
PROPCASE function*/
data a15009.mixed;
set a15009.mixed;
*you can find mixed dataset in the blog folder uploaded in dropbox;
length First Last $ 15 NameHard $ 20;
NameLow = lowcase(Name);
*converting entire word into lower case;
NameProp = propcase(Name);
*making the first letter of each word uppercase;
First = lowcase(scan(Name,1,' '));
*extracting the first name and converting it to lowercase;
Last = lowcase(scan(Name,2,' '));
*extracting the last name and converting it to lowercase;
substr(First,1,1) = upcase(substr(First,1,1));
*converting only the first letter of the first name to uppercase;
substr(Last,1,1) = upcase(substr(Last,1,1));
*converting only the first letter of the last name to uppercase;
NameHard = catx(' ',First,Last);
*joining the two names with CATX to get proper case without using
PROPCASE;
drop First Last;
run;
proc print data=a15009.mixed;
run;
/*4 Data set Names_And_More contains a character variable called Height. As
you can see in the listing in Problem 3, the heights are in feet and
inches. Assume that these units can be in upper- or lowercase and there may
or may not be a period following the units. Create a temporary SAS data set
(Height) that contains a numeric variable (HtInches) that is the height in
inches.*/
data a15009.height;
set a15009.names_and_more(keep = Height);
Height = compress(Height,'INFT.','i');
*using COMPRESS with the i modifier to remove the letters I, N, F, T and
periods while ignoring case;
/* Alternative
Height = compress(Height,' ','kd');
*keep digits and blanks;
*/
Feet = input(scan(Height,1,' '),8.);
Inches = input(scan(Height,2,' '),?? 8.);
*using SCAN to extract the feet value (first word) and the inches value
(second word), the ?? modifier suppresses notes when there is no inches
value;
if missing(Inches) then HtInches = 12*Feet;
else HtInches = 12*Feet + Inches;
drop Feet Inches;
run;
proc print data=a15009.height;
run;
/*6 Data set Study (shown here) contains the character variables Group and
Dose. Create a new, temporary SAS data set (Study) with a variable called
GroupDose by putting these two values together, separated by a dash. The
length of the resulting variable should be 6 (test this using PROC CONTENTS
or the SAS Explorer). Make sure that there are no blanks (except trailing
blanks) in this value. Try this problem two ways: first using one of the
CAT functions, and second without using any CAT functions*/
*Using CAT functions;
data a15009.study;
set a15009.study;
length GroupDose $ 6;
GroupDose = catx('-',Group,Dose);
*here we are using catx to supply "-" as a separator between Group and Dose
variables;
run;
proc print data=a15009.study;
run;
*Without using CAT functions;
data a15009.study;
set a15009.study;
length GroupDose $ 6;
GroupDose = trim(Group) || '-' || Dose;
*using TRIM to remove trailing blanks from Group before joining it to Dose
with a dash in between;
run;
proc print data=a15009.study;
run;
/*8 Notice in the listing of data set Study in Problem 6 that the variable
called Weight contains units (either lbs or kgs). These units are not
always consistent in case and may or may not contain a period. Assume an
upper- or lowercase LB indicates pounds and an upper- or lowercase KG
indicates kilograms. Create a new, temporary SAS data set (Study) with a
numeric variable also called Weight (careful here) that represents weight
in pounds, rounded to the nearest 10th of a pound. Note: 1 kilogram = 2.2
pounds*/
data study;
*writing a temporary WORK data set so the permanent Study data set keeps
its character Weight variable;
set a15009.study(keep=Weight rename=(Weight = WeightUnits));
Weight = input(compress(WeightUnits,,'kd'),8.);
*using COMPRESS with the k and d modifiers to keep only the digits, then
INPUT to convert them to a number;
*using FIND with the i modifier to search for the units while ignoring
case, and converting kilograms to pounds;
if find(WeightUnits,'KG','i') then Weight = round(2.2*Weight,.1);
else if find(WeightUnits,'LB','i') then Weight = round(Weight,.1);
run;
proc print data=study;
run;
/*10 Data set Errors contains character variables Subj (3 bytes) and
PartNumber (8 bytes). (See the partial listing here.) Create a temporary
SAS data set (Check1) with any observation in Errors that violates either
of the following two rules: first, Subj should contain only digits, and
second, PartNumber should contain only the uppercase letters L and S and
digits. Here is a partial listing of Errors:*/
data a15009.violates_rules;
set a15009.errors;
where notdigit(trim(Subj)) or
verify(trim(PartNumber),'0123456789LS');
*using NOTDIGIT to flag any Subj value containing a non-digit character and
VERIFY to flag any PartNumber character other than digits, L, and S.
TRIM is needed because both functions would otherwise return the position
of the first trailing blank;
run;
proc print data=a15009.violates_rules;
run;
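The TRIM advice in the comment above can be checked directly. This sketch uses a made-up Subj value of '12', which a 3-byte variable stores with one trailing blank. WORK.TrimDemo and its variables are hypothetical names.

```sas
/* How a trailing blank affects NOTDIGIT. The value '12' is made up. */
data work.TrimDemo;
   length Subj $ 3;
   Subj = '12';                         *stored as 12 plus one trailing blank;
   WithoutTrim = notdigit(Subj);        *3, the position of the blank;
   WithTrim    = notdigit(trim(Subj));  *0, only digits remain;
run;
proc print data=work.TrimDemo;
run;
```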
/*12 List the subject number (Subj) for any observations in Errors where
PartNumber contains an upper- or lowercase X or D.*/
proc print data=a15009.errors;
where findc(PartNumber,'XD','i');
*using FINDC with the i modifier to find an X or a D in either case;
var Subj PartNumber;
run;
/*14. List all patients in the Medical data set where the word antibiotics
is in the comment field (Comment).*/
title "Observations Involving the word Antibiotics";
proc print data=a15009.medicaltwo;
where findw(Comment,'antibiotics');
*using FINDW to select comments that contain the word antibiotics;
run;
/*16 Provide a list, in alphabetical order by last name, of the
observations in the Names_And_More data set. Set the length of the last
name to 15 and remove multiple blanks from Name. Note: The variable Name
contains a first name, one or more spaces, and then a last name.*/
data a15009.names;
set a15009.names_and_more;
length Last $ 15;
Name = compbl(Name);
*using COMPBL to replace multiple blanks with a single blank;
Last = scan(Name,2,' ');
*using SCAN to take the second word of Name (the last name) and store it
in Last;
run;
*sorting the data in names dataset based on last variable values;
proc sort data=a15009.names;
by Last;
run;
proc print data=a15009.names;
id Name;
var Phone Height Mixed;
run;
Chapter 13 Working with Arrays
/* 1 Using the SAS data set Survey1, create a new, temporary SAS data set
(Survey1) where the values of the variables Ques1–Ques5 are reversed as
follows: 1 -> 5; 2 -> 4; 3 -> 3; 4 -> 2; 5 -> 1.
Note: Ques1–Ques5 are character variables. Accomplish this using an
array.*/
*Data set SURVEY;
proc format library=a15009;
value $gender 'M' = 'Male'
'F' = 'Female'
' ' = 'Not entered'
other = 'Miscoded';
value age low-29 = 'Less than 30'
30-50 = '30 to 50'
51-high = '51+';
value $likert '1' = 'Strongly disagree'
'2' = 'Disagree'
'3' = 'No opinion'
'4' = 'Agree'
'5' = 'Strongly agree';
run;
data a15009.survey12;
set a15009.survey1;
array Ques{5} $ Q1-Q5;
*creating array with 5 values for storing variables from Q1 to Q5;
do i = 1 to 5;
Ques{i} = translate(Ques{i},'54321','12345');
*using do loop to create "i" variable with values from 1 to 5 and to
reverse the question using translate function inside the Ques array;
end;
drop i;
run;
proc print data=a15009.survey12;
run;
/* 2.Redo Problem 1, except use data set Survey2. Note: Ques1–Ques5 are
numeric variables.*/
data a15009.survey22;
set a15009.survey2;
array Ques{5} Q1-Q5;
do i = 1 to 5;
Ques{i} = 6 - Ques{i};
end;
drop i;
run;
proc print data=a15009.survey22;
run;
/* 4.Data set Survey2 has five numeric variables (Q1–Q5), each with values
of 1, 2, 3, 4, or 5. You want to determine for each subject (observation)
if they responded with a 5 on any of the five questions. This is easily
done using the OR or the IN operators. However, for this question, use an
array to check each of the five questions. Set variable (ANY5) equal to Yes
if any of the five questions is a 5 and No otherwise.*/
data a15009.any5;
set a15009.survey2;
array Ques{5} Q1-Q5;
Any5 = 'No ';
do i = 1 to 5;
if Ques{i} = 5 then do;
Any5 = 'Yes';
leave;
end;
end;
drop i;
run;
proc print data=a15009.any5;
run;
Chapter 14 Displaying Your Data
/*1 List the first 10 observations in data set Blood. Include only the
variables Subject,WBC (white blood cell), RBC (red blood cell), and Chol.
Label the last three variables “White Blood Cells,” “Red Blood Cells,” and
“Cholesterol,” respectively. Omit the Obs column, and place Subject in the
first column. Be sure the column headings are the variable labels, not the
variable names.*/
proc print data=a15009.blood (obs=10) label;
id Subject;
var WBC RBC Chol;
label WBC = 'White Blood Cells'
RBC = 'Red Blood Cells'
Chol = 'Cholesterol';
run;
/*2 Using the data set Sales, create the report shown here:*/
proc sort data=a15009.sales out=a15009.sales;
by Region;
run;
proc print data=a15009.sales;
by Region;
id Region;
var Quantity TotalSales;
sumby Region;
run;
/*4.List the first five observations from data set Blood. Print only
variables Subject, Gender, and BloodType. Omit the Obs column.*/
proc print data=a15009.blood(obs=5) noobs;
var Subject Gender BloodType;
run;
Chapter 15 Creating Customized Reports
/*2 Using the Blood data set, produce a summary report showing the average
WBC and RBC count for each value of Gender as well as an overall average.
Your report should look like this:*/
proc report data=a15009.blood nowd headline;
column Gender WBC RBC;
define Gender / group width=6;
define WBC / analysis mean "Average WBC"
width=7 format=comma6.0;
define RBC / analysis mean "Average RBC"
width=7 format=5.2;
rbreak after / dol summarize;
run;
quit;
/*4 Using the SAS data set BloodPressure, compute a new variable in your
report. This variable (Hypertensive) is defined as Yes for females
(Gender=F) if the SBP is greater than 138 or the DBP is greater than 88 and
No otherwise. For males (Gender=M), Hypertensive is defined as Yes if the
SBP is over 140 or the DBP is over 90 and No otherwise. Your report should
look like this:*/
*Data set BLOODPRESSURE;
proc report data=a15009.bloodpressure nowd;
column Gender SBP DBP Hypertensive;
define Gender / Group width=6;
define SBP / display width=5;
define DBP / display width=5;
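define Hypertensive / computed "Hypertensive?" width=13;
*Gender is a group variable, so in a compute block its value is blank on
all but the first row of each group. A COMPUTE BEFORE block saves it for
the remaining rows;
compute before Gender;
CurrentGender = Gender;
endcomp;
compute Hypertensive / character length=3;
if CurrentGender = 'F' and (SBP gt 138 or DBP gt 88)
then Hypertensive = 'Yes';
else if CurrentGender = 'M' and (SBP gt 140 or DBP gt 90)
then Hypertensive = 'Yes';
else Hypertensive = 'No';
endcomp;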
run;
quit;
/*6 Using the SAS data set BloodPressure, produce a report showing Gender,
Age, SBP, and DBP. Order the report in Gender and Age order as shown
here:*/
proc report data=a15009.bloodpressure nowd;
column Gender Age SBP DBP;
define Gender / order width=6;
define Age / order width=5;
define SBP / display "Systolic Blood Pressure" width=8;
define DBP / display "Diastolic Blood Pressure" width=9;
run;
quit;
/*8 Using the data set Blood, produce a report like the one here. The
numbers in the table are the average WBC and RBC counts for each
combination of blood type and gender.*/
proc report data=a15009.bloodnew nowd headline;
column BloodType Gender,WBC Gender,RBC;
define BloodType / group 'Blood Type' width=5;
define Gender / across width=8 '-Gender-';
define WBC / analysis mean format=comma8.;
define RBC / analysis mean format=8.2;
run;
quit;