Christina Gustafsson
Introductory Guide to
SAS Enterprise Guide 6.1
Part I
1. Introduction and Main Windows .................................................................................................. 1
2. Help? ............................................................................................................................................. 5
3. SAS datasets ................................................................................................................................. 6
3.1. Libraries ................................................................................................................................... 6
3.2. How to Type in a SAS Dataset in EG ...................................................................................... 7
3.3. How to Open a Microsoft Excel File in EG ........................................................................... 10
3.4. How to Open a SAS Dataset in EG....................................................................................... 13
3.5. How to Import Data from a Text File .................................................................................... 13
3.6. How to Save Datasets ............................................................................................................ 14
4. About SAS Tasks and Results .................................................................................................... 15
5. About Modifying Sas Datasets .................................................................................................. 19
5.1. Creating Display Formats ...................................................................................................... 19
5.2. Transforming Variables and Creating Computed Variables .................................................. 21
5.3. Classifying Continuous Variables And Recoding Data Values ............................................. 27
5.4. Sorting And Filtering Data ..................................................................................................... 31
5.5. How to Merge Datasets ......................................................................................................... 36
5.6. Some Other Tasks to Modify SAS Datasets .......................................................................... 37
1
1. INTRODUCTION AND MAIN WINDOWS
SAS Enterprise Guide 6.1 (EG) is a menu-driven Windows client application. SAS Enterprise
Guide is a graphical user interface to SAS 9.4 (= Statistical Analysis Software) software. When you
use SAS Enterprise Guide, you are also using SAS software behind the scenes. As you access data
and build tasks, SAS Enterprise Guide generates SAS code. When you run a task, the generated
code is sent to SAS for processing and the results are returned to SAS Enterprise Guide.
The first time you start SAS EG, a welcoming window is opened. You can select one of the options
or just close the window.
If you close the welcoming window and you are using SAS EG for the first time, the windows are
arranged in the default application layout which consists of the Project Tree, the Server List
window and the workspace area. The workspace area is the main area of the SAS EG application
and it is used to display for example your data, task results and process flows. At first, the Process
Flow is the only window that is open in the workspace area. When you generate reports or open
datasets, other windows open in the workspace with a tabbed interface. You can use the recently
viewed items menu in the upper-left corner of the workspace to navigate between the windows.
If you customize your layout by closing, opening, or changing the position of windows, your
changes are saved when you exit SAS Enterprise Guide. If you want to restore the default layout,
you can select from the menu bar Tools > Options, and then click Restore Window Layout button
and then the OK button. If you close one of the application windows and want to restore it, you can
select the window name from the View menu.
The File menu contains commands relating to handling of files, such as open, save, import and
print. The Edit menu contains commands relating to text editing, such as copy and paste. With this
menu you can also protect or unprotect your dataset. With the commands in the View menu you can
select for example which windows are open in your project. With the commands in the Tasks menu
you can modify your dataset, perform statistical analysis and create graphs. The Program menu
contains commands relating to creating and running SAS code. With the commands in the Tools
menu you can change EG’s options and assign libraries. With the commands in the Help menu you
can access comprehensive help.
2
In SAS Enterprise Guide, the work that you do is saved in projects. A project is a collection of data,
tasks and results. You can save a project and its contents to any location by selecting File > Save
Project As. The Save window opens, then select a location for the project and type a nice name for
your project in the File name box, and click “Save”. Project file is saved with the extension .egp
and the project file icon looks like this:
The Project Tree displays a hierarchical view of the active project. You can use the Project Tree
window to manage the objects in your project. You can delete, rename, and reorder the items in the
project. You can also modify and re-run your tasks via the Project Tree window. In front of the
name of the task there is a little icon to show what kind of result the task generated. For example the
icon means that the result of the task One-Way Frequencies is (at least) one table and the icon
illustrates that with the task Linear Regression the results are graphs (and tables, too). The icon
represents a SAS dataset.
3
You can have one or more process flows in your project. You can copy or move objects between
process flows. When you create a new project, an empty Process Flow window opens. As you add
data, run tasks, etc., an icon for each object is added to the process flow. The process flow displays
the objects, any relationships that exist between the objects, and the order in which the objects have
been generated.
In the previous Process Flow window, the SAS dataset questionnairedata is the input of several
tasks, for example the One-Way Frequencies task, and the SAS Report -One-Way Frequencies
object is the output of the task. So, this Process Flow window shows perhaps a little clearer picture
how your tasks have proceeded than the Project Tree window.
4
In SAS Enterprise Guide you use tasks for example to transform data and to run specific analytical
procedures. You choose tasks by using the Task List window or by using menus. The Task List
window is hidden by default, but you can display it by clicking the Task List button in the
Resources section in the lower-left corner of the workspace.
The Task List has four different views. The Task Gallery view shows individual tasks, grouped by
categories. The Tasks by Category view lists individual tasks, grouped by type. The Tasks by
Name view lists individual tasks alphabetically and it also lists the SAS procedures that are related
to the task. The Task Templates view shows your task templates if you have done some.
The Task Status window displays messages about the status of tasks as they are processing. The
Task Status window is not displayed by default, but you can view it by selecting from the menu bar
View > Task Status.
Project Log and Log Summary windows/tabs contain information about your SAS tasks: which
procedures have been used and did everything go as planned.
5
2. HELP?
There are many ways that you can get help with your work in SAS EG. Comprehensive help is
available by selecting from the menu bar Help > SAS Enterprise Guide Help. From this window
you can browse through the table of contents and index, or you can use the search feature.
There is a context help area at the bottom of each task window that provides targeted help for the
options that you can select within a SAS task window. In some task windows there is an animated
help to assign the variables. And in each task window there is Help button that provides primary
help. Sometimes it is even necessary to use the manual of the SAS 9.4 software
(http://support.sas.com/documentation/).
6
3. SAS DATASETS
3.1. Libraries
Libraries are shortcut names for directory locations or folders on your computer. Libraries contain
for example SAS datasets and variable display formats. You need the library reference to define the
folder where the new SAS datasets will be saved in, or where the existing SAS datasets are. Some
libraries are defined by SAS, and some are defined by SAS EG. The first time you start the SAS
EG, two folders will be created automatically: My SAS Files and its subfolder 9.4. In SAS software
a permanent SAS library called SASUSER is the shortcut name for the folder 9.4. Beware of the
SAS library called WORK because it is a temporary library that exists only during your SAS
session.
You can also create your own libraries. When you are assigning an own library, first you have to
create a new folder on your computer (e.g. SASDATAT). When you are assigning your own library
reference, it stands only the SAS session that is running. So, every time you start SAS EG or your
project again, the temporary library reference has to be assigned again. You can start assigning our
own library by selecting Tools > Assign Project Library… A four step tool window opens. In the
first step, type in a name for your library (e.g. MYOWN), and then select the server (usually the
only option is Local). Click “Next” to enter the second step.
In the second step, specify the path of your folder. Click “Next”. The next two steps you can just
skip by clicking “Next”. Lastly click “Finish”.
7
You do not necessarily have to use your own libraries, because as a permanent library you can use
the SASUSER library.
3.2. How to Type in a SAS Dataset in EG
SAS is a bit clumsy tool to type in a dataset, but here are some ideas how you might proceed. To
type in a new dataset select from the menu bar File > New > Data. A two-step tool window opens.
In the first step, type in a name for your dataset (e.g. Data1) in the Name box, and at the Locations
zone select the library (e.g. MYOWN) to save your data in. Click “Next” to enter the second step.
8
In the second step here you can define properties of the variables. Variables have properties such as
name, type, length, label, informat, and format.
Click the default name A of the first variable (=column). You can type in a new short name for the
first variable in the Name box (for example age). The name of the variable can contain up to 32
characters, but it’s recommended to use shorter names. The characters you can use are mainly
letters (not Scandinavian) and numbers. Do not use special characters or space. Start the variable
name with a letter.
You can type a label of the variable in the Label box (for example Age in years) to explain what the
variable really measures. You can use whatever characters you want to.
Column (variable) Type can be either character or numeric. The column's type is important because
it affects how the column/variable can be used in a SAS Enterprise Guide task. It’s recommended to
use numeric variables where every you can use them. Numeric variables can contain only numeric
values (as the variable age does). When the Type is numeric, the Group can be either numeric,
currency, date, or time. If your variable is numeric, currency, date, or time, you can select from a
selection of Read-in formats, how to type in the data values. You can also select from a selection
on Display formats, how to show the data values in your dataset. Unfortunately, the formats for
date and time do not always work as they should. (So, when you are typing in your data, check that
your data looks the way it should. If it doesn’t, you can try to use another format.) Most of
statistical variables are numeric, so the values are usually integers or real numbers and you do not
have to select any special read-in or display formats. But for special purposes (usually for
categorical variables) you can even define your own formats.
9
Character variables can contain any values. When the Type is character, the Group is also
character.
10
When you have defined your variables, click “Finish”. A data grid opens and you can type in the
data values. For numerical variables the decimal separator is period. If you do not have any value
for a certain numeric variable just leave the cell empty, and automatically there will appear a period
in the cell as a mark of a missing value if the variable is a numeric one. Missing character values are
represented by a blank. In the data grid one column is one variable, and one row is usually one case
(= observation). On the top of a column you can see the variable name and also information about
the type or group of that variable.
When you have finished typing in your data values, you can protect the data by selecting Edit >
Protect Data. If you want to edit your data again, you have to unprotect it first by unselecting Edit
> Protect Data. If you want to edit the properties of your variables, you can do it by right-clicking
the variable name and then select Properties.
Now this dataset is ready, the name of the dataset is DATA1 and it is stored in a library called
MYOWN which refers to the folder called SASDATAT. The file extension for a SAS dataset is
.sas7bdat and the icon for a dataset looks like this: .
3.3. How to Open a Microsoft Excel File in EG
Before you convert the Excel file into a SAS dataset, it’s a good idea to maybe edit the Excel sheet
a bit. The first row of the Excel sheet might contain just the short names of the variables (but do not
use special characters and Scandinavian letters), and starting from the second row you then have the
variable values of your data. It’s also a good idea to look through the data: how many variables
there are, how many cases there are, are the variables character or numeric ones, …
You can start to open a data file in SAS Enterprise Guide simply by selecting File > Open > Data.
Then you can browse the folder structure of your computer to locate your data file. You can open
any type of data file that SAS Enterprise Guide can read (for example SAS data files (with
extension .sas7bdat, .sas7bvew, .sd7, .sd2), Microsoft Excel data files (.xls, .xlsx, .xslm, .xlsb) and
other kind of data files (.txt, .html)).
11
Double-click the file icon you want to open. A four step tool window “Import Data from …”
opens. You can use this task to convert non-SAS data into SAS dataset. In the first step you can
define a name for your SAS dataset and the destination library (e.g. SASUSER). The source data
file (Consumptions.xls) is the file that you are using as a basis for your output SAS dataset
(Consumptions_in_Europe). Click “Next” to enter to the next step.
In the second step you select which Excel worksheet contains the data. You can also define which
columns and rows in the Excel sheet has the data you want to convert in to a SAS dataset. If you
don’t use this option, all the rows and columns in the worksheet are going to be converted. If there
are variables names in the first row of the selected range in your Excel sheet, the names will be used
in SAS also, when you check “First row of range contains field names..”. If you suspect that the
variable names in Excel sheet do not comply the SAS variable naming conventions, you should also
check “Rename columns to comply…”.
In the third step you can browse through the variables and their properties. If you do not need some
variable in you SAS dataset, you can uncheck “Inc” (=Included). You can rename the variables and
change their properties by clicking first the variable you want to modify and then Modify. Date and
time data might cause some problems, so sometimes it’s necessary to modify the formats of those
variables. Sometimes you even have to modify your Excel sheet beforehand to get the data as it
should be.
The fourth step you can usually skip. Lastly click “Finish”. A data grid opens and it consists of the
observations in your Excel file. Check that the dataset looks at it should look. If it doesn’t, you
might have to edit your Excel sheet or/and import the file again with some modifications.
12
13
3.4. How to Open a SAS Dataset in EG
You can open an existing SAS dataset in EG simply by selecting File > Open > Data. In the Open
Data window select the folder where you have saved your dataset and the double-click the dataset
icon. If you have saved your data with the library reference (for instance SASUSER), then you can
open the dataset in the Open Data window by first clicking Servers and then open Local by
double-clicking the icon, open the Libraries by double-clicking, open the proper library by double-
clicking the library icon, and open your SAS dataset by double-clicking the icon.
3.5. How to Import Data from a Text File
If your data is saved as a text file (extension .txt), first you have to figure out how the data values
are arranged in your file. If the variable fields are aligned so that each variable field starts in the
same column for each row, then your text format is Fixed Width. If there is a specific character (for
instance such as comma, tab, semicolon) that separates the variable fields, then your text format is
Delimited.
You start to import the data from a text file by selecting File > Import Data. Again you have to
select the proper folder to access your text file. Double-click your text file icon and again a four
step tool window opens. The first step is similar to cases when you’re converting an Excel sheet to a
SAS dataset. In the first step you select your destination library and type in a name for your SAS
dataset
In the second step, in the Text format zone, you select either Delimited fields or Fixed columns
depending on how the variable values are arranged in your text file. If you select Delemited fields,
you need to define which character is the delimiter.
If you select Fixed columns, you need to specify column breaks by clicking on the ruler at the start
of each variable (except first one). If the variable names are included in your text file for instance
on the first row, you should check “File contains field names on record number”.
14
In the third step, you can browse through the variables and their properties. If you do not need some
variable in you SAS dataset, you can uncheck “Inc” (=Included). You can rename the variables and
change their properties by clicking first the variable you want to modify and then “Modify”.
In the last step, you can accept the default selections and click “Finish”.
3.6. How to Save Datasets
Whether you create your SAS dataset by typing it in, opening an Excel file, or importing a text file,
your data is saved automatically as a SAS dataset file in your destination library (e.g. SASUSER).
So, even if you do not save your projects, you still have your SAS datasets (unless you have been
using WORK library as your destination library).
You can also save the dataset in a SAS data file format (file extension is then .sas7bdat) or a
different file format (for example as an Excel file) by selecting from the menu bar File > Export >
Export [Datasetsname]. Then choose the folder where you want to save in the file and select file
type (for example as Microsoft Excel) and type your file a name. You can do the saving also via
project tree or process flow by right-clicking the dataset icon and then from the pop-up menu select
Export [Datasetsname] and etc.
15
4. ABOUT SAS TASKS AND RESULTS In SAS Enterprise Guide, task windows have a common format, so once you are familiar with
running one task, running other tasks is easy.
So, let’s perform some simple statistics. I have a SAS dataset questionnairedata. There are 322
cases (= rows) and 10 variables (= columns). The cases represent students in the University of
Vaasa. One case is one student. When the SAS dataset is open in the workspace, you can browse
the properties of the dataset by clicking Properties button.
When the Properties window opens, you can see on the left side a selection pane. When you click
Columns tab, you can browse the properties of the variables.
In the project tree or process flow, select your dataset to make it active by double-clicking the
dataset icon. (I know and maybe you know, too that) the variable math is measured on ordinal level,
so a nice way to present the distribution of math, is to create a frequency table. Select from the
main menu bar Tasks > Describe > One-Way Frequencies. You can do the same task by selecting
from the dataset’s own menu bar (or from the Task List window) Describe > One-Way
Frequencies. In the selection pane, select Data. From the Variables to assign list select math and
drag it to the Analysis variables role. (You can select the proper role also by using the arrow
button). Click Run to perform the task. Save saves the task without running it, Cancel cancels the
task and Help displays help for the task. (As you select options in the task window, SAS Enterprise
16
Guide generates the SAS code that creates the results. If you want to see the code, click the Preview
code button in the task window.)
Wait a while and you can view the result and other tabs related to One-Way Frequencies task:
17
If you want to present the distribution as a bar graph, too, you can modify your task by clicking
Modify Task button. In the selection pane, select Plots. Then check either Vertical or Horizontal
and click Run. You can replace your previous results by answering Yes to the next question.
And again you can view the results.
Tasks generate SAS Report output by default. You can change the output type by selecting Tools >
Options. In the selection pane, select Results General and check one of the Result Formats. If
you use the default format: SAS Report, then you can copy result tables and graphs into a word
processing software and the copied tables are editable, however, the graphs are uneditable.
You can also change the output type of an individual task by changing the task properties. In the
project tree, right-click the task which output format you want to change, then select Properties
from the pop-up menu. In the selection pane, select Results. Select Customize result formats,
styles, and behavior and then select the desired type and click “OK”.
18
19
5. ABOUT MODIFYING SAS DATASETS
Before you can modify your SAS dataset, you have to make your dataset active (for instance
double- click the data set icon in the project tree).
You can start modifying your data by selecting from the main menu bar Tasks > Data. Then select
a task from the pop-up menu. You can select the modifying task also from the Task List window.
Some tasks you can even select from the dataset’s menu bar.
5.1. Creating Display Formats
When you have categorical variables in your dataset, you usually have some code numbers or
characters to present the data values. For example, in the questionnairedata, I have originally typed
in integers 1 and 2 as code numbers for gender. So, at first, the gender column looked like this:
20
In order to explain what the code numbers or character mean in a dataset, you can create a special
display format for a specific variable.
You start creating your own display format by selecting Tasks > Data > Create Format. In the
selection pane, select Options tab. Type a name (for example gend) for the format in the Format
name box, in the Location to store format zone select always Currently assigned libraries and
the SASUSER library. In the Format type zone select now Numeric (because integers 1 and 2 are
numbers and they have been used as code numbers for gender).
In the selection pane, select Define formats. Click the upper New button and replace Label1 with
the word male and type in 1 in the Values box to explain that number code 1 means male. Again
click the upper “New” and replace Label2 with the word female and type in 2 in the Values box to
explain that code 2 means female. Click “ Run“.
21
Activate your dataset, unprotect it and in the dataset tab right-click variable name gender to link
up the format with the right variable. Select Properties. In the selection pane, select Formats tab
and from the Categories select User Defined and from the Formats select the GEND and click
“OK”. In this Formats list you can only such display formats where the type is the same than the
current variable’s type. If you have created a display format for a character variable, SAS
automatically adds the prefix $ sign in the format’s name.
And now the gender column looks like this:
So, first you create a display format and then you link it up with the proper variable.
5.2. Transforming Variables and Creating Computed Variables
When you want to transform your variables or want to create new variables based on some existing
variables, you can use Query Builder tool. With this tool, the original input dataset does not
change at all, but as a result you get a new dataset with all the modifications you have defined.
By default, no variables are included in the result set of a query. If you want to add all the variables
that exist in your input dataset to the result dataset, you can do that beforehand by selecting from
the main menu bar Tools > Options and select in the selection pane Query and then check
“Automatically add columns from input tables to result set of query” and click “OK”.
22
Let’s make a new SAS dataset that has all the same variables as questionnairedata and some new
variables, too. Let’s assume that you have beforehand defined that all the variables in the input
dataset are included in the result dataset. Select Tasks > Data > Query Builder. You can type in a
descriptive name for your query task in the Query name box (e.g. Transformations and new
variables). You can change the result dataset name and library by clicking “Change” and typing in
a descriptive name (e.g. modified_questionnairedata) and selecting the desired library. In the
Output name box you see the name of the selected library (e.g. SASUSER) in front of the
dataset’s name. On the left side of the window, you can see the variables in the input dataset (=
QUESTIONNAIREDATA, Query Builder uses shortened name t1 (= table1). All the variables in
the Select Data tab are included in the result dataset.
Let’s make a new variable which measures the percentage of attendance.. The new variable is based
on two existing variables: attend1 and attend2. Click “Computed Columns” button and “New”.
A six (or four) step tool window opens. Select Advanced expression to build a mathematical
expression and click “Next”.
23
In the second step, you can enter for instance mathematical expression by using functions,
constants, mathematical operators and variables. In the selection pane, you can see three main
items: Functions, Tables and Selected Columns. If you double-click “Functions”, you can look at
all the functions (in alphabetical order) that are available for building an expression. If you click a
function (e.g. ABS) once, you can see a short description of that function. If you double-click a
function, you are adding that function to your expression. If you click “ Tables”, you can browse all
the datasets that you can use in expression building. If you click “t1”, you can see all the variables
that you can use in expression building. If you double-click a variable, you are adding that variable
to your expression. In that case the “t1” name will appear in front of the variable’s name. If you
click “Selected Columns”, you can see all the variables in the output dataset.
You can also enter an expression by just typing if you know for example the names of the variables
and functions you need in your expression.
Now, let’s enter the expression into the expression box at the top of the window in order to
calculate percentage of attendance. The first version was created by double-clicking the variable
names on t1 table’s name list and the latter version was created by just simply typing the names of
the variables. The result is the same whichever method you use. Then click “Next”.
24
In the third step, you can rename your new variable by typing in a new name in Column box (e.g.
attperc) and you can type a Label, too (e.g. Percentage of attendance).You can Change the
display Format if you need to. Click “Next”.
In the fourth step, you can see properties of the new variable. Click “Finish” and the Computed
Columns window opens.
25
In the previous expression, only variables, arithmetical operators and a constant were used. Let’s
create a new expression with a function. Many statistical tests have some special assumptions about
the distribution of the variable to be analyzed (e.g. normal distribution). If the assumptions are not
satisfied, you usually have to try out some kind of transformation to satisfy the assumptions.
If you examine the distribution age, you can see, that the distribution is positively skewed (= right-
skewed). To get the distribution less skewed, you could try out to do a logarithmical transformation.
So, click “New”. Select Advanced expression and continue with “Next”. Double-click Functions
and double-click the function LOG10 to add it into the expression box. Function LOG10 requires
one numeric argument in round brackets, so replace <numvalue> with the variable age by first
double-clicking the table “t1” and then double-clicking age to add it into the expression box OR
you can just type age to replace <numvalue>.
Click “Next” and again you can rename your new variable (e.g. Log10Age). And click “Finish”.
The Computed Columns window opens and now you can close it by clicking “Close”.
In the Query Builder task window, you can see the two new variables. You can check what your
dataset looks like by clicking “Preview” and then selecting Results tab. Close the window. If you
have to do some corrections, you can edit you expression by double-clicking your new variable in
the Select Data tab.
26
When everything is correct, click “Run” and now you can look your new dataset and admire your
two new variables along with original ones.
So, if you want to create a new variable with a (mathematical) expression, you start to create a
query where you give your result dataset a nice name and you specify the library you want to save
the dataset in and you can give a name for your query task. Then you compute the new column
(variable) by advanced expression where you enter the expression and then you give a name (and
maybe label and display format, too) for your new variable.
27
5.3. Classifying Continuous Variables And Recoding Data Values
With Query Builder you can replace individual values with values that you specify, or classify
values of continuous variable into specified classes.
Again, you start by selecting Tasks > Data > Query Builder. You can type in a descriptive name
for your query task in the Query name box and you can change the result dataset name (and
library) by clicking “Change” and typing in a nice name (and selecting the desired library).
Let’s classify exam into four classes: 1) at most 9 , 2) 10 - 19, 3) 20 - 29 and 4) 30 or more. Click
“Computed Columns” button and click “New”.
A six (or four) step tool window opens. Select Recoded column and click “Next”.
In the second step, select the variable you want to classify or recode (e.g. exam). Click “Next”.
28
In the third step, you can specify the type for the new variable and you can specify the classes (or
the new code numbers) by clicking “Add”.
T
Now, a new Specify a Replacement window opens. You can choose whether you want to replace a
single value, a range of values, or values that meet a specific condition.
Since exam is a continuous variable, click the Replace a Range tab. For the first class, you don’t
have to set a lower limit, so check the Set an upper limit check box. Then type the upper limit
value (e.g. 9) or select the value from the drop-down list. In the With this value box, type the value
that you want to assign (e.g. 1) and click OK to add the replacement value to the list. So, now the
first class is done.
29
Click “Add” again to create the second class, click the Replace a Range tab, check the Set a lower
limit check box, type the lower limit value (e.g. 10), check the Set an upper limit check box, type
the upper limit value (e.g. 19), in the With this value box, type the value that you want to assign
(e.g. 2) and click OK.
To create the third class click “Add” again, click the Replace a Range tab, check the Set a lower
limit check box, type the lower limit value (e.g. 20), check the Set an upper limit check box, type
the upper limit value (e.g. 29), in the With this value box, type the value that you want to assign
(e.g. 3) and click OK. To complete the fourth class, click “Add” again, click the Replace a Range
tab, check the Set a lower limit check box, type the lower limit value (e.g. 30). For the last
(=fourth) class, you do not have to set an upper limit. In the With this value box, type the last value
that you want to assign (e.g. 4) and click OK.
30
In the third step window, you can see all your new classes. In the Other values zone, you can select
the type of value with which you want to replace all other values that are not included in the list of
replacements. The current value option uses the current value of the selected variable, A missing
value option replaces the values with a missing value and Specify a value option replaces the
values with a new value that you can type in the value box. When you have done all the
replacements that you want, click “Next”.
In the fourth step, you can specify a name and a label for your new variable (e.g. examcl, Classified
exam) and you can change display format, too. Click “Next”. In the fifth step, you can see
properties of the new variable. Click “Finish” and the Computed Columns window opens and you
can close it by clicking “Close”.
31
In the Query Builder task window, you can see the new variable. Again, you can check what your
dataset looks like by clicking “Preview” and then selecting Results tab. If you have to do some
corrections, you can edit you expression by double-clicking your new variable in the Select Data
tab. When everything is correct, click “Run” and now you can look your new dataset and admire
your new variable.
To replace a single value, you have to click the Replace Values tab in the third step. Click “Add”
and type in the value you want to replace (you can also select the value from the list of values by
clicking the button next to “Add”). Then enter the new value in the With this value box and click
OK. You can repeat these steps to add more replacement expressions. After you have created all of
the replacement expressions that you want, click “Next” and continue in the same way you did
earlier when classifying a continuous variable.
So, if you want to create a new variable with classification or recoding, you start to create a query
where you give your result dataset a nice name and you specify the library you want to save the
dataset in and you can give a name for your query task. Then you compute the new column
(variable) by recoded columns where you enter replacements for your data values and then you give
a name (and maybe label and display format, too) for your new variable.
5.4. Sorting And Filtering Data
If you want to sort the data by one or more variables from the original dataset or you want to create
a filter in order to select only the cases that meet certain criteria, select Tasks > Data > Filter and
Sort. (You can also filter or sort data with Query Builder.)
32
Filter and Sort window opens. In the Variables tab, you have to select always whether you are
filtering or sorting, which variables to include in the output dataset. You add or remove the
variables by using the arrow buttons. In the next example three variables (math, prog and random)
are not included in the output dataset, however, seven variables are selected to the result dataset.
When you want to sort your data, select the Sort tab. From the Sort by drop-down list, select the
variable (e.g. age) that you want to use to sort the cases and the select whether you want to sort the
values for the variable in ascending or descending order. To sort by additional variables, select
another variable (e.g. exam) from the Then by drop-down list and select whether you want to sort
the values for the variable in ascending or descending order. You can repeat this step to sort your
data by additional variables.
Then select the Results tab in order to give your Task and your new Output dataset nice names
(e.g. Sort_dataset and sorted_by_age_exam) and also to specify desired library to save the dataset
in (e.g. SASUSER).
33
Now, let’s filter your data. Let’s create an output dataset that is a subset of the original dataset in
the sense, that the output dataset contains only those cases that meet certain criteria. To just filter
your data, select Filter tab. You can use just one variable in a filter, or you can use multiple
variables to create for instance several comparison expressions. Let’s try out now the latter version
of filtering.
Let’s create an output dataset where the cases are such female students who are at most 25 years
old. Select a variable from the first drop-down list (e.g. age). Select an operator from the second
drop-down list (e.g. Less than or equal to). Enter a value in the third box (e.g. 25). You can add
additional filter conditions by selecting AND or OR from the last drop-down list (e.g. AND). Then
select a variable from the drop-down list (e.g. gender) and select an operator from the drop-down
list (e.g. Equal to). Enter a value in the box (e.g. 2). You can repeat these steps to create additional
filter elements. Click “OK” to create the output dataset. Then select the Results tab in order to give
your Task and your new Output dataset nice names (e.g. Filter_dataset and females_under_26)
and also to specify desired library to save the dataset in (e.g. SASUSER).
If you want to filter your dataset and need to add some computed variables to the dataset at the
same time, you should do the filtering with Query Builder. Select Tasks > Data > Query Builder.
You can (and your should) type in a descriptive name for your query task in the Query name box
and you can change the result dataset name (and library) by clicking “Change” and typing in a
descriptive name. Then select the Filter Data tab. Click the New Filter button.
A four step tool window opens. If you are using just one variable in a filter, select in the first step
Basic Filter and click “Next”. Let’s create such an output dataset where the cases are such students
whose math values are at least 4.
34
In the second step, select the variable on which you want to filter the data (e.g. math) and click
Next.
In the third window, select a comparison operator (e.g. Greater than or equal to) from the
Operator drop-down list. If the operator that you have selected requires a value, you can enter the
value in the Value box (e.g. 4). Click “Next” or “Finish”. Now, the output dataset contains only
those students whose knowledge of basic mathematics has been either good (4) or very good (5).
35
If you are using multiple variables in creating a filter, select in the first step Advanced Filter and
click “Next”. In the second step, you can build your filtering expression by using mathematical and
logical operators (e.g. >=, =, OR), functions, variables (e.g. math, age), and constant values (e.g. 4,
22). Click “Next” or “Finish”. Now, the output dataset contains only students whose knowledge of
basic mathematics has been either good (4) or very good (5) or whose age is exactly 22.
36
5.5. How to Merge Datasets
Let’s assume that you have several datasets which you want to merge into a single dataset to
perform statistical tasks.
If the datasets have the same variables but the cases are different, then you can concatenate the
datasets. Activate one of your datasets and select Tasks > Data > Append Table. The Tables to
append area lists the tables that will be concatenated, in the order in which they will be
concatenated. First, the only dataset is the one you have just activated. To add another dataset, click
“Add Table”. Open Data window opens. Select the location and the name of the dataset to add. If
you want to select several datasets at the same time, hold down the CTRL key while selecting the
datasets that you want to add. Click “Open” to add the dataset to the Tables to append area. In the
selection pane click Results tab, and in order to create a SAS dataset, select Data table. Click
“Browse” and type in a name for your new dataset and select the location to save the data.
If the datasets have the same cases but at least some of the variables are different, you can join the
datasets by using Query Builder. But at first, activate one of your datasets and then select Tasks >
Data > Query Builder. Again, you can change the names of this task and the result dataset. Click
Join Tables button. Table and Joins window opens. To add another dataset to this task, click Add
Tables button. Open Data window opens. Select the location and the name of the dataset to add. If
both of your datasets have a key variable (e.g. Sukunimi) to identify and correctly match cases from
the two datasets, then EG automatically joins the datasets in a way that the cases in the result
dataset include those cases for which the value of a key variable in the first dataset matches the
value of a key variable in the second dataset. This join type is called an Inner join. There are other
kind of joins, too.
37
When you have defined the join type, click “Close”. In the Query Builder task window, it’s always
a good idea to check what your dataset looks like by clicking “Preview” and then selecting Results
tab.
5.6. Some Other Tasks to Modify SAS Datasets
When you want to delete SAS datasets or your own display formats, the best way to do it is to use
Delete Datasets and Formats tool.
To standardize your variable values for example in order to look if there are any outliers, you can
use Standardize Data tool.
With Compare data tool, you can compare whether two SAS datasets have the same variables and
cases. With this tool, you can also compare whether two variables have the same values.
With Import SPSS (or Stata) file, you can easily open SPSS (or Stata) datasets.