Guide to Using SPSS

7/29/2019 Guide to Using SPSS


For additional SPSS help, visit http://www.youtube.com/mycsula

PASW Statistics 17 (SPSS 17)

INFORMATION TECHNOLOGY SERVICES

California State University, Los Angeles

www.youtube.com/mycsula

Version 1.0 Winter 2010

Table of Contents

Introduction Part 1
Downloading the Data Files
Starting PASW Statistics
The PASW Statistics Window
Data View
Variable View
Creating a Data File
Defining Variables
Data Entry
Descriptive Statistics
Frequency Analysis
Crosstabs
Data Manipulation
Select Cases
Splitting a File
Find and Replace
Reporting
Appendix

Introduction Part 2
Downloading the Data Files
Null Hypothesis
Statistical Tests
Tests of Significance
Correlations
Paired-Samples T Test
Independent-Samples T Test
Multiple Response Sets
Multiple Response Frequencies
Multiple Response Crosstabs
Data Manipulation
Copying and Pasting Variable Properties
Inserting Variables and Cases
Deleting Variables and Cases
Merging Data Files
Creating the Data File for Merging
Inputting the Data in Variable View
Merging the Data Files
Appendix

Simple Regression
Scatter Plot
Predicting Values of Dependent Variables
Predicting This Year's Sales with Simple Regression Model
Multiple Regression
Predicting Values of Dependent Variables
Predicting This Year's Sales with Multiple Regression Model
Data Transformation
Computing
Polynomial Regression
Regression Analysis
Analyzing the Results
Chart Editing
Adding a Line to the Scatter Plot
Manipulating the Scales on X- and Y-axes
Adding a Title to the Chart
Adding Colors to the Chart
Filling a Background Color

Chi-Square
Chi-Square Test for Goodness-of-Fit
With Fixed Expected Values
With Fixed Expected Values and within a Contiguous Subset of Values
With Customized Expected Values
One-Way Analysis of Variance
Post Hoc Tests
Two-Way Analysis of Variance
Importing/Exporting Microsoft Excel and PowerPoint
Using Scripting for Redundant Statistical Analyses




Introduction Part 1

PASW stands for Predictive Analytics Software. This program can be used to analyze data collected from surveys, tests, observations, and other sources. It can perform a variety of data analysis and presentation functions, including statistical analysis and graphical presentation of data. Among its features are modules for statistical data analysis. These include 1) descriptive statistics, such as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis. PASW Statistics is particularly well suited for survey research, though it is by no means limited to that use.

This handout (Descriptive Statistics) introduces the basic skills needed to run PASW Statistics, including how to create a data file and run descriptive statistics. It is tailored to answer three research questions formulated in the sample survey questionnaire (see Appendix), giving users an overview of how PASW Statistics can be used for survey research. The three research questions are as follows:

1. What kind of computer do people prefer to own?

2. What color do people prefer for their computer?

3. Is computer color preference different between genders?

Downloading the Data Files

This handout includes sample data files that can be used for hands-on practice. The data files are stored in a self-extracting archive, which must be downloaded and run to extract the data files.

The data files used with this handout are available for download at http://www.calstatela.edu/its/training/datafiles/pasw17p1.exe.

Instructions on how to download and extract the data files are available at http://www.calstatela.edu/its/docs/download.php.

Starting PASW Statistics

The following steps are for starting PASW Statistics 17 using the computers in the Open Access Labs (OALs). The steps for starting the program at home or on other computers may be slightly different.

To start PASW Statistics 17:

1. Click the Start button, point to All Programs, point to Course Work, point to SPSS Inc, point to PASW Statistics 17, and select PASW Statistics 17. The PASW Statistics 17 dialog box opens (see Figure 1).

2. Click the Cancel button to create a new data file.

Figure 1 - PASW Statistics 17 Dialog Box


The PASW Statistics Window

The Data Editor window opens with two view tabs: Data View and Variable View. Data View is used for data input, and Variable View is used for adding variables and defining variable properties (e.g., modifying attributes of variables). As displayed in Figure 2, the Data Editor window includes several components. The Title bar displays the name of the current file and the application. The Menu bar gives access to commands grouped by function. The Toolbar provides shortcuts to commonly used menu commands.

Figure 2 - PASW Statistics Data Editor Window

DATA VIEW

When PASW Statistics is launched, the Data Editor window opens in Data View, which looks similar to a Microsoft Excel spreadsheet (an array of rows and columns). The difference is that the rows and columns in Data View are referred to as cases and variables, respectively (see Table 1).

Table 1 - Elements in Data View

Variable: Each column represents a variable. Any survey questionnaire item or test item can be a variable. Commonly defined variable types are numeric or string. When defining variables as numeric, users need to specify decimal places. Variable names can be up to 64 bytes long (64 characters for plain English text) and must start with a letter. Make variable names meaningful and easily recognizable.

Case: Each row represents a case. The participants in the study can be cases. For example, if 100 participants are involved in your study, then 100 cases (or rows) of information should be generated. Responses to the question items should be entered consistently from left to right for each participant.

Cell: A cell is the intersection of a case and a variable. Each participant's response to a survey question should be entered in a cell according to the variable's defined data type.

VARIABLE VIEW

Variable View is where variables are defined by assigning variable names and specifying attributes such as data type (String, Date, Numeric, etc.), value labels, and measurement scales (Nominal, Ordinal, or Scale). Users can think of Variable View as the backbone structure for Data View: each row in Variable View defines one column (variable) in Data View (see Table 2).

Table 2 - Elements in Variable View

Variable Name: PASW Statistics initially gives a default variable name (VAR00001) that users can change. It is recommended to assign brief and meaningful names to variables (e.g., Name, Gender, and GPA).

Variable Type: The variable type determines what kind of values can be entered. Generally, text is of String type and numbers are of Numeric type. For example, if a user has a variable called Name, then its variable type should be String. Similarly, a variable named GPA should be of Numeric type with (normally two) decimal places.

Value Labels: Value labels describe what each coded value of a variable stands for. For example, if Gender is coded as 1 and 2, value labels can record that 1 means female and 2 means male. To describe what a variable itself stands for (for example, a variable named Fav, whose meaning others may not guess), use the variable label in the Label column instead.
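To make the difference between a variable label and value labels concrete, here is a minimal Python sketch. It is an illustration, not PASW Statistics syntax, and it models the Gender and Age variables defined later in this handout. The dictionary layout and the helper `describe` are hypothetical names chosen only for this example.

```python
# Hypothetical model of two variable definitions from Variable View.
# "label" plays the role of the Label column (what the variable means);
# "values" plays the role of the Values column (what each code means).
variables = {
    "Gender": {
        "type": "Numeric",
        "decimals": 0,
        "label": "What is your gender?",
        "values": {1: "female", 2: "male"},
    },
    "Age": {
        "type": "Numeric",
        "decimals": 0,
        "label": "What is your age?",
        "values": {1: "19 or younger", 2: "20-23", 3: "24-27",
                   4: "28-31", 5: "32 or over"},
    },
}


def describe(name: str, code: int) -> str:
    """Return the variable label and the value label for one coded response."""
    var = variables[name]
    # Fall back to the raw code when no value label has been defined.
    value_label = var["values"].get(code, str(code))
    return f"{var['label']} -> {value_label}"


print(describe("Gender", 1))  # What is your gender? -> female
print(describe("Age", 4))     # What is your age? -> 28-31
```

A code with no value label, such as Gender = 9, falls back to the bare number. This is the same reason an unlabeled color option shows up in the output only as a number.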

Creating a Data File

Creating a new PASW Statistics data file consists of two stages: (1) defining variables and (2) entering the data. Defining the variables involves multiple processes and requires careful planning. Once the variables have been defined, the data can then be added.

DEFINING VARIABLES

First, assign variable names based on your research questionnaire. If variable names are not assigned, PASW Statistics will assign default names that may not be recognizable. Second, specify the Type attribute for each variable. If necessary, assign labels to values to help all users of the file understand the data better.

To define variables (example):

1. Click the Variable View tab at the lower left corner of the Data Editor window (see Figure 3).

2. Type [Name] in the first cell under the Name column and press the [Enter] key.

3. Under the Type column, click the ellipsis button. The Variable Type dialog box opens (see Figure 4).

4. Select the String option.

5. Click the OK button.




Figure 3 - Variable View Tab

Figure 4 - Variable Type Dialog Box

6. Type [Gender] in row two under the Name column.

7. Activate the cell in row two under the Decimals column and change the entry to 0 using the spin box.

8. Type [What is your gender?] in row two under the Label column.

9. Click the ellipsis button in row two under the Values column. The Value Labels dialog box opens (see Figure 5).

10. Type [1] in the Value: box.

11. Type [female] in the Label: box.

12. Click the Add button.

13. Repeat steps 10-12 using a value of [2] and a label of [male].

Figure 5 - Value Labels Dialog Box (Gender)

14. Click the OK button.

15. Type [GPA] in row three under the Name column and press the [Enter] key.

16. Type [Age] in row four under the Name column.

17. Click row four under the Decimals column and change the entry to 0 using the spin box.

18. Type [What is your age?] in row four under the Label column.

19. In row four under the Values column, click the ellipsis button. The Value Labels dialog box opens (see Figure 6).

20. Type [1] in the Value: box.

21. Type [19 or younger] in the Label: box.

22. Click the Add button.




23. Repeat steps 20-22 for values [2] through [5] and label them as shown in Table 3 (you may also refer back to the sample questionnaire). See Figure 6 for the results.

24. Click the OK button.

Table 3 - Value Labels

Value Label

2 20-23

3 24-27

4 28-31

5 32 or over

Figure 6 - Value Labels Dialog Box (Age)

DATA ENTRY

After defining the variables, users can enter data for each case. If a variable is defined as Numeric, numeric data should be entered. PASW Statistics accepts only numbers for a Numeric variable: the digits 0-9, plus a decimal point or minus sign where needed. If a variable is defined as String, any keyboard character can be entered.

To enter data:

1. Click the Data View tab at the lower left corner of the Data Editor window (see Figure 7).

2. Click in a cell and type the corresponding data. The entry will also appear in the Cell Editor (see Figure 8).

Figure 7 - Data View Tab


Figure 8 - Data Entry




Descriptive Statistics

After data has been entered, users may begin analyzing the data by using descriptive statistics. Descriptive statistics are the most commonly used statistics for summarizing data, for example as frequencies or as measures of central tendency (mean, median, and mode).

Research Question # 1

What kind of computer do people prefer to own?

FREQUENCY ANALYSIS

We can use frequency analysis to answer the first research question. Frequency analysis is a descriptive statistical method that shows the number of occurrences of each response chosen by the respondents. When using frequency analysis, PASW Statistics can also calculate the mean, median, and mode to help users analyze the results and draw conclusions. The following example uses frequency analysis to answer Research Question # 1 (What kind of computer do people prefer to own?) with the data collected from our sample survey (see Appendix).

To perform frequency analysis:

1. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.

2. Locate and open the Part 1.sav file.

3. Click the Analyze menu, point to Descriptive Statistics, and select Frequencies (see Figure 9). The Frequencies dialog box opens (see Figure 10).

4. Select the variable(s) to be analyzed. In this case, select the variable Computer Owned from the list box on the left.

5. Click the transfer arrow button. The selected variable is moved to the Variable(s): list box.

6. Select the Display frequency tables check box if necessary.

Figure 9 - Frequency Analysis from Analyze Menu

Figure 10 - Frequencies Dialog Box

7. Click the Statistics button. The Frequencies: Statistics dialog box opens (see Figure 11).

8. Select the Mean, Median, and Mode check boxes in the Central Tendency section; select the Std. deviation check box in the Dispersion section.




Figure 11 - Frequencies: Statistics Dialog Box

9. Click the Continue button. This returns you to the Frequencies dialog box.

10. Click the OK button. An Output Viewer window opens and displays the statistics and frequency table (see Figure 12). The columns of the Computer Owned table display the Frequency, Percent, Valid Percent, and Cumulative Percent for each type of computer owned.

Figure 12 - Frequencies Output
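The columns of the frequency table follow simple arithmetic. The Python sketch below reproduces the four columns for a small hypothetical set of Computer Owned responses. It is an illustration, not how PASW Statistics is implemented. A value of None stands for a missing answer.

- Percent is based on all cases.
- Valid Percent is based only on non-missing cases.
- Cumulative Percent is the running total of Valid Percent.

```python
from collections import Counter

# Hypothetical responses to "What kind of computer do you own?"
# (codes as in the sample survey; None marks a missing answer).
responses = [3, 3, 2, 3, 1, None, 3, 2, 4, 3]
labels = {1: "Toshiba", 2: "Apple", 3: "IBM or Compatible", 4: "Other", 5: "None"}


def frequency_table(data):
    """Return rows of (code, frequency, percent, valid percent, cumulative percent)."""
    total = len(data)                                 # all cases, including missing
    valid = [x for x in data if x is not None]        # non-missing cases only
    counts = Counter(valid)
    rows, cumulative = [], 0.0
    for code in sorted(counts):
        freq = counts[code]
        percent = 100 * freq / total
        valid_percent = 100 * freq / len(valid)
        cumulative += valid_percent                   # running total of Valid Percent
        rows.append((code, freq, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows


for code, freq, pct, vpct, cum in frequency_table(responses):
    print(f"{labels[code]:<20}{freq:>4}{pct:>8}{vpct:>8}{cum:>8}")

# The mode is the most frequent valid code (3 = IBM or Compatible in this data).
mode_code = Counter(x for x in responses if x is not None).most_common(1)[0][0]
print("Mode:", mode_code, labels[mode_code])
```

With one missing answer out of ten, Percent and Valid Percent differ. For example, IBM or Compatible has 5 responses: 50.0 percent of all cases but about 55.6 percent of valid cases.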

The measures of central tendency (mean, median, and mode) can be used to summarize various types of data. The mode can be used for nominal data, such as computer type, computer color, ethnicity, etc. The mean or median can be used for interval/ratio data, such as test scores, age, etc. The median is preferable for data with a skewed distribution, because it is less affected by extreme values than the mean.
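To see why the median is the safer summary for skewed data, consider the following Python sketch. It uses the standard `statistics` module and a hypothetical list of monthly incomes in which one value is far larger than the rest. It also shows the mode applied to nominal color codes.

```python
import statistics

# Hypothetical monthly incomes (in dollars); the last value is an extreme outlier.
incomes = [900, 1100, 1200, 1300, 1500, 9000]

mean_income = statistics.mean(incomes)      # 2500: pulled upward by the outlier
median_income = statistics.median(incomes)  # 1250: average of the two middle values

print(f"Mean:   {mean_income:.2f}")
print(f"Median: {median_income:.2f}")

# For nominal codes such as color preference, only the mode is meaningful.
colors = [1, 2, 1, 1, 4, 2]  # hypothetical codes: 1 = Beige, 2 = Black, 4 = White
print("Mode color code:", statistics.mode(colors))
```

Five of the six incomes are at or below $1500, yet the mean is $2500. The median of $1250 is much closer to a typical respondent.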

Answer to Research Question # 1

What kind of computer do people prefer to own?




Answer: IBM or Compatible

Explanation: Look at question # 7 in the Sample Survey. Notice that option # 3 is IBM or Compatible. In the output Statistics table, the mode for Computer Owned is 3, which is IBM or Compatible. In addition, the frequency analysis results for Computer Owned indicate that 49 out of 80 people own an IBM or Compatible computer. Taking ownership as an indication of preference, IBM or Compatible is the preferred type.

Research Question # 2

What color do people prefer for their computer?

CROSSTABS

Crosstabs are used to examine the relationship between two variables. To answer the second research question, users will need to analyze two variables: Computer Owned and Color (which indicates color preference). Using crosstabs will show the intersection between these two variables and reveal the computer type and color preferred by most people.

To perform a crosstabs analysis:

1. In Data View, click the Analyze menu, point to Descriptive Statistics, and select Crosstabs (see Figure 13). The Crosstabs dialog box opens.

2. Select the variable Computer Owned from the list box on the left.

3. Click the transfer arrow button to move it to the Row(s): list box.

4. Select the variable color (see Figure 14).

5. Click the transfer arrow button to move it to the Column(s): list box.

6. Click the OK button. An Output Viewer window opens and displays two tables: Case Processing Summary and the Crosstabulation matrix (see Figure 15).

Figure 13 - Crosstab Analysis from Analyze Menu

Figure 14 - Crosstabs Dialog Box




Figure 15 - Crosstabs Output

Answer to Research Question # 2

What color do people prefer for their computer?

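Answer: IBM or Compatible in beige color

Explanation: As shown in the Crosstabulation matrix above, IBM or Compatible is the most preferred computer type in the row variable (Computer Owned), and beige is the most preferred color in the column variable (color). Before concluding that most people prefer IBM or Compatible computers in beige, check the cell where the IBM or Compatible row meets the beige column. That cell should hold the largest count in the matrix, because the most common row and the most common column do not always meet in the largest cell.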
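Row and column totals alone do not identify the most common combination. The Python sketch below builds a crosstabulation from hypothetical (Computer Owned, Color) pairs; the data are invented, not taken from Part 1.sav. It compares the most common row, the most common column, and the largest cell.

```python
from collections import Counter

# Hypothetical (Computer Owned, Color) responses.
# Computer codes: 1 = Toshiba, 2 = Apple, 3 = IBM or Compatible
# Color codes:    1 = Beige, 2 = Black, 4 = White
pairs = [(3, 2)] * 5 + [(3, 4)] * 3 + [(2, 1)] * 4 + [(1, 1)] * 3

cells = Counter(pairs)                                   # count in each (row, column) cell
row_totals = Counter(computer for computer, _ in pairs)  # row marginals
col_totals = Counter(color for _, color in pairs)        # column marginals

top_row = row_totals.most_common(1)[0][0]   # most common computer type overall
top_col = col_totals.most_common(1)[0][0]   # most common color overall
top_cell = cells.most_common(1)[0][0]       # most common combination

print("Most common computer:", top_row)
print("Most common color:", top_col)
print("Count where that row meets that column:", cells[(top_row, top_col)])
print("Most common combination:", top_cell)
```

In this invented data, IBM or Compatible (3) is the most common row and beige (1) is the most common column, yet nobody chose an IBM or Compatible computer in beige. The largest cell is IBM or Compatible in black. In the real survey the two may coincide; the point is that the combination has to be read from the cell itself.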

Data Manipulation

Data files are not always organized in a form that meets specific needs. For example, users may wish to select a specific group of subjects or split the data file into separate groups for analysis.

SELECT CASES

If you have two or more subject groups in your data and you want to analyze one group in isolation, you can use the Select Cases option. For example, the data we are currently analyzing has both male and female participants. If you wish to analyze only female cases, you select cases based on Gender and set the condition for female cases only.

To select cases for analysis:

1. Click the Data menu and select Select Cases (see Figure 16). The Select Cases dialog box opens (see Figure 17).

2. Click the If condition is satisfied option.

3. Click the If button. The Select Cases: If dialog box opens.

4. Select the variable Gender in the left list box.

5. Click the transfer arrow button to move it to the right text box.

6. Click the = button.

7. Click the 1 button.

8. Click the Continue button. This takes you back to the Select Cases dialog box.

9. Click the OK button. This takes you back to Data View. All males (Gender = 2) will be excluded from the statistical analysis.

10. Rerun the crosstabs analysis by following steps 1-5 of the Crosstabs section of this handout.

11. Click the OK button. The Output Viewer window updates (see Figure 18).




Figure 16 - Select Cases from Data Menu

Figure 17 - Select Cases Dialog Box
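Conceptually, Select Cases with the condition Gender = 1 keeps only the rows that satisfy the condition before the analysis runs. The Python sketch below shows that filtering step on hypothetical records. The field names mirror the handout's variables, but the data are invented.

```python
from collections import Counter

# Hypothetical cases: gender 1 = female, 2 = male (as defined in Variable View).
cases = [
    {"gender": 1, "computer": 3, "color": 5},
    {"gender": 2, "computer": 3, "color": 2},
    {"gender": 1, "computer": 3, "color": 5},
    {"gender": 1, "computer": 2, "color": 1},
    {"gender": 2, "computer": 2, "color": 2},
]

# Equivalent of "If condition is satisfied: gender = 1"; male cases are left out.
selected = [case for case in cases if case["gender"] == 1]

# Crosstab counts of (computer, color) for the selected cases only.
female_table = Counter((case["computer"], case["color"]) for case in selected)
print(len(selected), "cases selected")
print(female_table.most_common(1))  # most common (computer, color) pair among females
```

PASW Statistics keeps the excluded male cases in the data file. In the same way, the original `cases` list is unchanged here: the excluded rows are simply not used in the analysis.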

From the crosstabulation in the Output Viewer window in Figure 18 below, look at the column for the most preferred color and the row for the computer types. Since we selected only female cases, what is the computer color most preferred by women? Ten women chose IBM or Compatible with color option 5. Thus, you may conclude that female participants most often prefer color 5 for IBM or Compatible computers. However, what does 5 represent? This problem arose because the value 5 was not labeled as Other. Moreover, even if it were labeled Other, it would not indicate any particular color, making it difficult to draw a conclusion. To avoid such problems, provide a blank space where participants can specify other color preferences besides the ones listed in the survey questionnaire.

Figure 18 - Select Cases Output

Example: What color do you like to have for your computer?

1. Beige 2. Black 3. Gray 4. White 5. Other __________

Research Question # 3




Is computer color preference different between genders?

SPLITTING A FILE

To answer the third research question, we need to split the file. You can analyze one particular group of subjects using the Select Cases option. However, if you wish to compare responses or performance across the groups defined by one variable, it is best to use the Split File option.

To split a file for analysis:

1. Turn off the Select Cases option (steps 2-4).

2. Click the Data menu and select Select Cases. The Select Cases dialog box opens.

3. Select the All cases option.

4. Click the OK button. Notice that the male cases that were excluded are now all included in the data file.

5. Click the Data menu and select Split File (see Figure 19). The Split File dialog box opens (see Figure 20).

Figure 19 - Split File from Data Menu

Figure 20 - Split File Dialog Box

6. Select the variable Gender from the left list box.

7. Select the Compare groups option.

8. Click the transfer arrow button to move the variable Gender to the Groups Based on: list box.

9. Click the OK button.

10. Rerun the crosstabs analysis by following steps 1-5 of the Crosstabs section of this handout.

11. Click the OK button. The Output Viewer window displays the crosstabulation table for each gender (see Figure 21).




Figure 21 - Split File Output Data
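Split File with Compare groups runs the same procedure once for each group and presents the results side by side. The Python sketch below imitates that idea with hypothetical (gender, color) records: it groups the cases by gender and builds a separate color tally for each group. It is an analogy, not PASW Statistics code.

```python
from collections import Counter, defaultdict

# Hypothetical (gender, color) pairs.
# Gender: 1 = female, 2 = male. Color: 1 = Beige, 2 = Black, 5 = Other.
cases = [(1, 5), (1, 5), (1, 1), (2, 2), (2, 2), (2, 1), (1, 5)]

# Equivalent of "Groups Based on: Gender": one color tally per gender value.
by_gender = defaultdict(Counter)
for gender, color in cases:
    by_gender[gender][color] += 1

for gender in sorted(by_gender):
    top_color, count = by_gender[gender].most_common(1)[0]
    print(f"Gender {gender}: most common color code {top_color} ({count} cases)")
```

Unlike Select Cases, Split File drops nothing: every case lands in exactly one group, and the groups are analyzed separately.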

Answer to Research Question # 3

Is computer color preference different between genders?

Answer: Yes

Explanation: The crosstabulation output shows a difference in computer color preference by gender. Females most often chose IBM or Compatible computers in an Other color rather than beige, black, gray, or white. Males most often chose IBM or Compatible computers in black. This comparison is descriptive; it does not test whether the difference is statistically significant.

FIND AND REPLACE

The Find and Replace function in PASW Statistics locates a value quickly and can change every matching value in one step. Both Find and Replace are available in Data View; in Variable View, only the Find function is available.

To use the Find and Replace function:

1. Click the Edit menu and select Find. The Find and Replace dialog box opens (see Figure 22).

2. In the Find: box, type [Clinton].

3. Select the Replace check box to replace Clinton with another word.

4. Click in the Replace with: box, and type the name [Cliff].

5. Click the Show Options button.

6. Under Match to, select the Entire cell option.

7. Click the Replace All button.

Figure 22 - Find and Replace Dialog Box (Data View)



NOTE: Under the Match to section of the Find and Replace dialog box (see Figure 22), Contains means PASW Statistics will find each instance of the word/phrase/number appearing in a cell, whether or not it is the only information in the cell. The Entire cell option finds only cells whose whole content matches the word/phrase/number. The Begins with and Ends with options find cells whose content begins or ends, respectively, with the text entered by the user.
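The four Match to options differ only in how the search text is compared with each cell. The Python sketch below is a rough analogy using plain string methods on a hypothetical Name column. It is case-sensitive and is not how PASW Statistics implements the search. The function `matches` is a name invented for this example.

```python
def matches(cell: str, text: str, mode: str) -> bool:
    """Mimic the Match to options with case-sensitive string comparisons."""
    if mode == "contains":
        return text in cell
    if mode == "entire cell":
        return cell == text
    if mode == "begins with":
        return cell.startswith(text)
    if mode == "ends with":
        return cell.endswith(text)
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical values in a Name column.
names = ["Clinton", "Clinton Smith", "Mary Clinton"]
for mode in ("contains", "entire cell", "begins with", "ends with"):
    hits = [name for name in names if matches(name, "Clinton", mode)]
    print(f"{mode:<12} -> {hits}")

# Replace All with "Entire cell" changes only the exact match.
replaced = ["Cliff" if matches(n, "Clinton", "entire cell") else n for n in names]
print(replaced)  # ['Cliff', 'Clinton Smith', 'Mary Clinton']
```

This is why the steps above select Entire cell. With Contains, the cells "Clinton Smith" and "Mary Clinton" would also be treated as matches.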

Reporting

Once the statistical analysis is complete, the final step is to create a report. In the report, you may include PASW Statistics output (e.g., graphs and tables) to support your analysis. Using the Copy and Paste functions, the tables/graphs generated in PASW Statistics can be copied from the Output Viewer window and pasted into a Microsoft Word document without having to create new tables or graphs.

To create a report using Microsoft Word:

1. In the Output Viewer window, right-click a table. A box appears around the table and a red arrow appears to the left of the table (which means it is selected).

2. Select Copy from the shortcut menu.

3. Open Microsoft Word.

4. Right-click in the Word document and select Paste from the shortcut menu. The table is copied into the Word document.




Appendix

SAMPLE SURVEY

Research Questions

1. What kind of computer do people prefer to own?

2. What color do people prefer for their computer?

3. Is computer color preference different between genders?

Survey Questions

1. What is your name? ____________________________

2. What is your gender? ____________________________

3. What is your G.P.A.? ____________________________

4. What is your age?

1. 19 or younger 2. 20-23 3. 24-27 4. 28-31 5. 32 or over

5. How much do you make in a month?

1. Less than $1000 2. $1000-$1499 3. $1500-$1999 4. $2000-$2499 5. Over $2500

6. What is your class standing?

1. Freshman 2. Sophomore 3. Junior 4. Senior 5. Graduate

7. What kind of computer do you own?

1. Toshiba 2. Apple 3. IBM or Compatible 4. Other 5. None

8. What kind of computer have you used?

1. IBM or Compatible 2. Apple 3. Toshiba 4. Other 5. None

9. What color do you like to have for your computer?

1. Beige 2. Black 3. Gray 4. White 5. Other




Introduction Part 2

This handout (Test of Significance) introduces 1) several data entry and data manipulation techniques that help you save time, 2) basic skills to perform tests of significance, such as correlations and t tests, and 3) multiple response sets. The step-by-step instructions will help you understand how to interpret the output of tests run on the data supplied by your research question(s). Follow the steps carefully: deviating from them, even slightly, may produce unexpected or confusing results. This is a continuation of the PASW Statistics Descriptive Statistics handout.




Null Hypothesis

The null hypothesis (H0) represents a theory that has been put forward but has not been proven. It is put forward either because it is believed to be true or because it is to be used as a basis for an argument. It is important to realize that the null hypothesis is the statement of no difference (or no effect). For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug (in other words, the new drug behaves the same as the old drug). The null hypothesis (H0) and the alternative hypothesis (H1) can be stated as:

H0: There is no difference between the two drugs.

H1: There is a difference between the two drugs.

Special consideration is given to the null hypothesis because it is the statement being tested. The alternative hypothesis is the statement to be accepted if and when the null is rejected.

The final conclusion, once the test has been carried out, is always given in terms of the null hypothesis. The result is either "Reject H0 in favor of H1" or "Do not reject H0"; the conclusion is never "Reject H1" or "Accept H1."


If the conclusion is "Do not reject H0," this does not necessarily mean that the null hypothesis is true. It only suggests that there is insufficient evidence against H0 in favor of H1. Rejecting the null hypothesis, on the other hand, suggests that the alternative hypothesis may be true.

NOTE: The null hypothesis essentially states that the given cases or items under consideration are statistically the same or exhibit the same behavior without any significant difference. The alternative hypothesis states that the given cases exhibit different behavior or that they have a statistically significant difference.
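In practice, PASW Statistics reports a significance value (labeled Sig. in the output) for each test, and the conclusion is phrased in terms of H0 as described above. The Python sketch below shows that decision rule. It assumes the conventional significance level of 0.05; this is an assumption, and the level should be chosen before the test is run. The function name `conclusion` is invented for this example.

```python
def conclusion(p_value: float, alpha: float = 0.05) -> str:
    """State the test result in terms of the null hypothesis only."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "Reject H0 in favor of H1"
    # Not rejecting H0 is not the same as showing that H0 is true.
    return "Do not reject H0"


print(conclusion(0.012))  # Reject H0 in favor of H1
print(conclusion(0.34))   # Do not reject H0
```

The function never returns "Accept H0" or "Reject H1". The only two outcomes are the ones the text allows.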

Statistical Tests

Statistics is a set of mathematical techniques used to summarize research data and determine whether the data support a proposed hypothesis. PASW Statistics includes tools that can be used to analyze variables, to determine the strength and nature of the relationship between two variables, and to determine whether the means (averages) of two data sets (samples) are statistically the same or different.

Tests of Significance

The following examples are sample research questions that can be answered using PASW Statistics analytical methods.

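CORRELATIONS

A correlation is a statistical measure of the strength and direction of a linear association between two variables (when more than two variables are selected, PASW Statistics computes a correlation for each pair). One of the more common measures is the Pearson correlation, which estimates the linear relationship between two interval variables. Its value ranges from -1 (perfect negative relationship) through 0 (no linear relationship) to +1 (perfect positive relationship).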
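The Pearson coefficient reported by the Bivariate Correlations procedure can be sketched in plain Python as below. This is an illustration only, assuming each variable is a list of numbers with no missing values; the function names pearson_r and correlation_t are hypothetical. The sketch stops at the t statistic because a p-value requires the t distribution, which the Python standard library does not provide.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of the deviations from each mean
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_t(r, n):
    """t statistic (df = n - 2) for testing H0: the population correlation is 0."""
    return r * math.sqrt((n - 2) / (1 - r * r))


if __name__ == "__main__":
    # Hypothetical scores for five students (not taken from Part 2.sav)
    active = [1, 2, 2, 3, 4]
    gpa = [2.1, 2.8, 2.5, 3.2, 3.6]
    r = pearson_r(active, gpa)
    print(f"r = {r:.3f}, t = {correlation_t(r, len(gpa)):.3f}")
```

Note that correlation_t is undefined when r is exactly -1 or +1, since the denominator becomes zero.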

Research Question # 1
Is there a relationship between academic performance and Internet access?

H0: There is no relationship between academic performance and Internet access.
H1: There is a significant relationship between academic performance and Internet access.

To run a correlation analysis:
1. Locate and open the Part 2.sav file.
2. Click the Analyze menu, point to Correlate, and select Bivariate. The Bivariate Correlations dialog box opens (see Figure 23).
3. Select the variables active, posttest, and gpa in the list box on the left.
4. Click the transfer arrow button to move them to the Variables: list box.
5. Select the Pearson check box and the Two-tailed option if necessary.
6. Click the OK button. The Output Viewer window opens with a Correlations table (see Figure 24).






Research Question # 2
Is there an instructional effect taking place in the computer class?

H0: There is no influence of using the Internet on academic achievement for this class.
H1: There is an influence of using the Internet on academic achievement for this class.

The null hypothesis is that Internet familiarity does not influence academic achievement in the computer class. The variables that reflect academic achievement are pretest and posttest.

To run a Paired-Samples T Test:
1. Click the Analyze menu, point to Compare Means, and select Paired-Samples T Test. The Paired-Samples T Test dialog box opens (see Figure 25).
2. Select the variables pretest and posttest in the list box on the left.
3. Click the transfer arrow button to move them to the Paired Variables: list box.
4. Click the OK button. The Output Viewer window opens (see Figure 26).
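The arithmetic behind the Paired-Samples T Test can be sketched as follows. This is a minimal illustration, assuming two equal-length lists with no missing values; the helper name paired_t and the sample scores are hypothetical. Each difference is taken as the first variable minus the second (pretest minus posttest here), so a negative mean difference means the second variable is higher on average.

```python
import math
import statistics


def paired_t(first, second):
    """Return (mean difference, t, df) for a paired-samples t test.

    Each difference is first - second, taken case by case.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("both variables need the same number of cases (at least 2)")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    std_error = sd_diff / math.sqrt(n)
    return mean_diff, mean_diff / std_error, n - 1


if __name__ == "__main__":
    # Hypothetical pretest and posttest scores for four students
    pretest = [60, 72, 65, 70]
    posttest = [66, 75, 71, 72]
    mean_diff, t, df = paired_t(pretest, posttest)
    print(f"mean difference = {mean_diff:.4f}, t = {t:.3f}, df = {df}")
```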

Figure 25 - Paired-Samples T Test Dialog Box

The Answer to Research Question # 2
Is there an instructional effect taking place in the computer class?

Figure 26 - Paired-Samples T Test Output Table

Answer: Yes
Explanation: The observed mean difference is -4.5172. The value of t is -3.820, and its significance (Sig. = .001) is less than .05, so the mean difference between pretest and posttest is statistically significant and the null hypothesis is rejected. Therefore, it can be inferred that an instructional effect took place in the computer class.




INDEPENDENT-SAMPLES T TEST

An Independent-Samples T Test is used to determine how likely it is that two independent data samples came from populations that have identical means. If this were true, the difference between the population means would be zero. The null hypothesis in this case is that the two means are equal.

Two variables are required in the data set. One variable is the measured parameter; examples include weight, height, or frequency. The second variable divides the data set into two groups. In the example below, Light and Dark are the groups whose means will be compared.

Research Question # 3
Is there a difference in the average number of seedlings grown in the light and those grown in the dark?

In this example, 20 Petri dishes each contained 10 celery seeds. Ten of the dishes were kept in the dark for one week; the other 10 were placed under a grow light for the same amount of time. At the end of the week, the number of seeds that sprouted was counted in each dish.

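H0: The variances are equal, $\sigma^2_{\text{light}} = \sigma^2_{\text{dark}}$.
H1: The variances are not equal, $\sigma^2_{\text{light}} \neq \sigma^2_{\text{dark}}$.

H0: There is no difference in the mean number of seedlings under the light and in the dark ($\mu_{\text{light}} = \mu_{\text{dark}}$).

H1: There is a significant difference in the mean number of seedlings under the light and in the dark ($\mu_{\text{light}} \neq \mu_{\text{dark}}$).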

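NOTE: The first pair of hypotheses tests the variances, while the second pair tests the means. PASW Statistics tests the variances with Levene's Test for Equality of Variances, which appears in the same Independent-Samples Test output table; its result tells you which row of the t test results to read.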

NOTE: Variance is the average of the squared deviations from the mean (for a sample, the sum of squared deviations is divided by n - 1); it shows how far the individual observations are spread around the mean. If Levene's test is not significant (Sig. greater than .05), the variances can be treated as equal, and the row labeled Equal variances assumed is used. If the variances are not equal, the row labeled Equal variances not assumed is used instead, because it adjusts the t test for unequal variances.
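The variance definition in the NOTE, and the idea behind Levene's test, can be sketched as follows. This assumes plain lists of numbers; the function names are hypothetical. levene_f follows the classic mean-centered form of Levene's test (a one-way analysis of variance on the absolute deviations from each group mean) and is offered as an illustration; the p-value, which needs the F distribution, is not computed.

```python
def variance(values, sample=True):
    """Mean of the squared deviations from the mean.

    With sample=True the sum is divided by n - 1 (the usual sample
    variance); with sample=False it is divided by n.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - 1 if sample else n)


def levene_f(*groups):
    """Levene's F statistic (mean-centered) for two or more groups."""
    # Absolute deviation of each observation from its own group mean
    z = [[abs(v - sum(g) / len(g)) for v in g] for g in groups]
    k = len(z)
    n_total = sum(len(g) for g in z)
    group_means = [sum(g) / len(g) for g in z]
    grand_mean = sum(sum(g) for g in z) / n_total
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    return ((n_total - k) / (k - 1)) * between / within
```

A larger F means the spreads of the groups differ more than chance alone would suggest.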

To run the Independent-Samples T Test:
1. Locate and open the Seedlings.sav file.
2. In Data View, click the Analyze menu, point to Compare Means, and select Independent-Samples T Test. The Independent-Samples T Test dialog box opens (see Figure 27).
3. Select the Seedlings variable in the list box on the left.
4. Click the transfer arrow button to move the variable to the Test Variable(s): list box.
5. Select the Treatment variable in the list box on the left.
6. Click the transfer arrow button to move the variable to the Grouping Variable: list box.
7. Click the Define Groups button. The Define Groups dialog box opens (see Figure 28).
8. Enter [0] in the Group 1: box, enter [1] in the Group 2: box, and then click the Continue button.
9. Click the OK button. The Output Viewer window opens with several tables, including an Independent-Samples Test table (see Figure 29).
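The two rows of the Independent-Samples Test table correspond to two versions of the t statistic, sketched below under the assumption of plain lists with no missing values (the function name is hypothetical). The pooled version corresponds to Equal variances assumed; the Welch version, with Welch-Satterthwaite degrees of freedom, corresponds to Equal variances not assumed. The mean difference is Group 1 minus Group 2, so its sign depends on which code is entered as Group 1.

```python
import math
import statistics


def independent_t(group1, group2):
    """Return the mean difference plus pooled and Welch t tests for group1 vs. group2."""
    n1, n2 = len(group1), len(group2)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two cases")
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)  # n - 1 denominators
    diff = m1 - m2

    # Equal variances assumed: pooled variance estimate
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    t_pooled = diff / math.sqrt(pooled * (1 / n1 + 1 / n2))

    # Equal variances not assumed: Welch's t with Welch-Satterthwaite df
    se1, se2 = v1 / n1, v2 / n2
    t_welch = diff / math.sqrt(se1 + se2)
    df_welch = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))

    return {
        "mean_difference": diff,
        "equal_variances_assumed": (t_pooled, n1 + n2 - 2),
        "equal_variances_not_assumed": (t_welch, df_welch),
    }
```

When the two groups have equal sizes and equal variances, both rows give the same t, and only the degrees of freedom can differ.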




Figure 27 - Independent-Samples T Test Dialog Box

Figure 28 - Define Groups Dialog Box

The Answer to Research Question # 3
Is there a difference in the average number of seedlings grown in the light and those grown in the dark?

Figure 29 - Independent-Samples T Test Output

Answer: Yes
Explanation: The mean difference in seedlings sprouted between the two treatments (light and dark) was -2.900. The value of t, -3.179, was statistically significant (p = .005). Therefore, the null hypothesis is rejected.

Multiple Response Sets

Very often, a survey will contain questions where the respondent is allowed to select more than one answer. Such questions can be difficult to manage in PASW Statistics. Each response in a multiple response question should be coded as a separate variable, and these variables should then be grouped into a multiple response set. The multiple response set can then be analyzed using frequency counts or crosstabs.

To define a multiple response set of variables:
1. Locate and open the Airlines.sav file.
2. In Data View, click the Analyze menu, point to Multiple Response, and select Define Variable Sets (see Figure 30). The Define Multiple Response Sets dialog box opens (see Figure 31).




Figure 30 - Define Variable Sets from Analyze Menu

Figure 31 - Define Multiple Response Sets Dialog Box

3. Select the American, TWA, United, USAir, and Other airline variables and move them to the Variables in Set: list box.
4. Make sure the Dichotomies option is selected and enter [1] in the Counted value: box.
5. Type [Airlines] in the Name: box.
6. Type [Airline frequency of response] in the Label: box.
7. Click the Add button. The set is created as $Airlines and listed in the Multiple Response Sets: list box.
8. Click the Close button.
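Under the Dichotomies option, a case counts as selecting an airline when that airline's variable equals the counted value (1 here). A minimal sketch of that rule, assuming each case is a Python dict keyed by the variable names used above; the function name and the sample respondent are hypothetical.

```python
# Variables in the $Airlines set and the counted value entered in step 4
AIRLINE_SET = ["American", "TWA", "United", "USAir", "Other"]
COUNTED_VALUE = 1


def selected_members(case, variables=AIRLINE_SET, counted_value=COUNTED_VALUE):
    """Return the set members whose value equals the counted value for one case."""
    return [name for name in variables if case.get(name) == counted_value]


if __name__ == "__main__":
    # One hypothetical respondent who flew American and United
    respondent = {"American": 1, "TWA": 0, "United": 1, "USAir": 0, "Other": 0}
    print(selected_members(respondent))
```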

MULTIPLE RESPONSE FREQUENCIES

It is possible to answer the question below by running a separate frequency analysis for each airline variable. However, such an analysis only provides a raw frequency for each response and does not allow percentage comparisons between the different airlines. A frequency analysis that uses a multiple response set reports all of the airlines in one concise table, with percentages based on the total number of responses.

Research Question # 4
In a survey of airline passengers, which airline was selected as having been flown most often in the previous six months?

To analyze the frequency of response for each variable in a multiple response set:

1. Click the Analyze menu, point to Multiple Response, and select Frequencies. The Multiple Response Frequencies dialog box opens (see Figure 32).
2. Select the multiple response set labeled $Airlines and move it to the Table(s) for: list box.
3. Click the OK button. An Output Viewer window opens with the frequency analysis (see Figure 33).
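The counts and percentages in a multiple response frequency table can be sketched as below. This assumes the same hypothetical dict-per-case layout as the earlier sketch, and it assumes that a case with none of the set's variables equal to the counted value is left out of the case base (an assumption about how such cases are treated). Percent of responses divides each count by the total number of responses; percent of cases divides it by the number of valid cases. The function name is hypothetical.

```python
def mr_frequencies(cases, variables, counted_value=1):
    """Frequency table for a multiple dichotomy set.

    Returns (table, total_responses, valid_cases), where table maps each
    variable name to (count, percent of responses, percent of cases).
    """
    counts = {name: 0 for name in variables}
    valid_cases = 0
    for case in cases:
        hits = [name for name in variables if case.get(name) == counted_value]
        if hits:  # cases that selected nothing are treated as missing
            valid_cases += 1
        for name in hits:
            counts[name] += 1
    total = sum(counts.values())
    table = {}
    for name, count in counts.items():
        pct_responses = 100 * count / total if total else 0.0
        pct_cases = 100 * count / valid_cases if valid_cases else 0.0
        table[name] = (count, pct_responses, pct_cases)
    return table, total, valid_cases
```

With the totals reported in the answer to Research Question # 4, United's percent of responses is 100 * 12 / 44, or about 27.3%.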




Figure 32 - Multiple Response Frequencies Dialog Box

The Answer to Research Question # 4
In a survey of airline passengers, which airline was selected as having been flown most often in the previous six months?

Figure 33 - Airline Frequency Analysis Output

Answer: United
Explanation: As seen in the Output Viewer window, 18 people were surveyed and 44 total responses were generated. Of the 44 total responses, United was selected most often, with 12 responses (27.3%, the largest share of the total responses).

MULTIPLE RESPONSE CROSSTABS

Without a multiple response set, each airline would have to be analyzed separately against the variable that passengers used to identify themselves as afraid of flying, using a separate crosstab analysis for each airline. However, the overall results would not allow for easy comparison between the airlines. The best way to answer the question is to include the multiple response set in a crosstab analysis.

Research Question # 5
In a survey of airline passengers, which airline was selected most often by those passengers who identified themselves as afraid to fly?




To incorporate a multiple response set into a crosstab analysis:
1. Click the Analyze menu, point to Multiple Response, and select Crosstabs. The Multiple Response Crosstabs dialog box opens (see Figure 34).

Figure 34 - Multiple Response Crosstabs Dialog Box

2. Select the FearFactor variable as the Row(s): variable and the $Airlines multiple response set as the Column(s): variable.
3. Select the FearFactor variable after it is designated as the Row(s): variable. The Define Ranges button becomes active.
4. Click the Define Ranges button. The Multiple Response Crosstabs: Define Variable Ranges dialog box opens (see Figure 35).

Figure 35 - Multiple Response Crosstabs: Define Variable Ranges Dialog Box

5. Enter [0] in the Minimum: box and [1] in the Maximum: box for the FearFactor variable.
6. Click the Continue button.
7. Click the Options button. The Multiple Response Crosstabs: Options dialog box opens (see Figure 36).
8. Select the Cases option and then click the Continue button.
9. Click the OK button. The Output Viewer window opens with the crosstab results (see Figure 37).
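What the Multiple Response Crosstabs procedure tabulates can be sketched as follows, assuming the same hypothetical dict-per-case layout and assuming, as in the frequency sketch, that a case with no airline selected is excluded. The row range mirrors the Minimum and Maximum values from step 5, and the percentages are based on cases, as chosen in step 8. The function name is hypothetical.

```python
def mr_crosstab(cases, row_var, row_min, row_max, variables, counted_value=1):
    """Cross-tabulate a multiple dichotomy set against a row variable.

    Returns (counts, row_cases, row_percents). Percentages are based on
    cases: each count is divided by the number of valid cases in its row.
    """
    rows = range(row_min, row_max + 1)
    counts = {r: {name: 0 for name in variables} for r in rows}
    row_cases = {r: 0 for r in rows}
    for case in cases:
        r = case.get(row_var)
        if r not in counts:  # outside the defined range, or missing
            continue
        hits = [name for name in variables if case.get(name) == counted_value]
        if not hits:  # nothing selected: treated as missing for the set
            continue
        row_cases[r] += 1
        for name in hits:
            counts[r][name] += 1
    row_percents = {
        r: {
            name: (100 * c / row_cases[r] if row_cases[r] else 0.0)
            for name, c in counts[r].items()
        }
        for r in rows
    }
    return counts, row_cases, row_percents
```

Because one case can select several airlines, the percentages in a row can add up to more than 100.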




Figure 36 - Multiple Response Crosstabs: Options Dialog Box

The Answer to Research Question # 5
In a survey of airline passengers, which airline was selected most often by those passengers who identified themselves as afraid to fly?

Figure 37 - Multiple Response Crosstabs Output

Answer: USAir
Explanation: Of the 18 people surveyed, ten identified themselves as being afraid to fly. Within that group of survey respondents, USAir was the airline selected most often (seven times).

Data Manipulation

PASW Statistics also provides tools to make data manipulation a simple task.

COPYING AND PASTING VARIABLE PROPERTIES

Copying and pasting is very useful when the same properties need to be given to different variables.

To copy and paste variable properties:
1. Click the File menu, point to New, and select Data.
2. Click the Variable View tab at the lower left corner of the Data Editor window (see Figure 38).





3. Type [active] in the first cell under the Name column and press the [Enter] key.
4. Click in the first cell under the Decimals column and decrease the entry to 0.
5. Click in the first cell under the Values column and click the Ellipses button. The Value Labels dialog box opens (see Figure 39).
6. Type [1] in the Value: box.
7. Type [Strongly Disagree] in the Label: box.
8. Click the Add button.
9. Assign [2], [3], and [4] to [Disagree], [Agree], and [Strongly Agree], respectively, by repeating steps 6-8 for each value (see Figure 39).

Figure 39 - Value Labels Dialog Box

10. Click the OK button.
11. Switch back to Data View (see Figure 40).
12. Click the active variable heading to highlight the column.
13. Click the Edit menu and select Copy to copy the properties of the variable active.
14. Highlight the variables that should receive the same properties by clicking the header of the first variable and dragging the pointer across to the last header (see Figure 41 and Figure 42).
15. Click the Edit menu and select Paste. The copied properties of the variable active are applied to the target variables, and the Data View and Variable View change accordingly (see Figure 43 and Figure 44).

Figure 40 - Data View Tab

Figure 41 - Selected Variable




Figure 42 - Selecting Target Variables

Figure 43 - Data View Showing New Variables

Figure 44 - Variable View Showing New Variables

INSERTING VARIABLES AND CASES

By using Insert Variable and Insert Cases, variables and cases can be added at any location in the data file. Assume that one wants to insert a new variable named midterm between pretest and posttest and use it for test score data. The following instructions describe how to insert a new variable that holds Numeric data.

To insert a variable:
1. Switch to Data View.
2. Click the posttest variable heading to highlight the column.
3. Click the Edit menu and select Insert Variable. A new variable is inserted to the left of the highlighted variable (posttest).

NOTE: The new variable is created with a default name, VAR00001, which can be changed later.

4. To define the properties of the new variable, double-click the variable heading. Variable View is activated for the new variable.

5. Type [midterm] in the Name column of the new variable.
6. Change the variable type if desired.

In the same manner, it is possible to insert cases at a particular location in Data View. For instance, assume that a case should be inserted between cases 10 and 11 for a particular student's record. By following the instructions below, one case will be inserted after the 10th case.

To insert cases (example):

1. Switch to Data View.
2. Click row number 11 to highlight the case.
3. Click the Edit menu and select Insert Cases. A new case is inserted above case 11.




DELETING VARIABLES AND CASES

Variables and cases can be deleted by using the Clear command.

To delete a variable or case:
1. In Data View, click the variable heading or the case number to highlight what will be deleted.
2. Click the Edit menu and select Clear. The variable or case is deleted.

Merging Data Files

The Merge Files function is useful for users who store different topics in separate files and later need to combine them. It allows users to import data from one file into another as long as both files contain a common identifier for each of the cases to be combined.

An identifier has no meaning other than to distinguish each case from the others and to identify the corresponding cases in the additional data files. The identifier can be any unique value, number, or combination of letters applied to each case.

NOTE: The variables do not have to be the same across data files.

CREATING THE DATA FILE FOR MERGING

Scenario: A psychological focus group on campus needs to create files for a longitudinal study of ten students on campus. Each file will contain the same students but four different questions (one per month). Over the five-year span of the study, the ten students will be asked twelve questions each year (one a month), and the same questions will be asked each year. At the end of each year, the three files will be combined into an annual questionnaire file to be properly analyzed.

The merging data files function can be used to satisfy this requirement.

Inputting the Data in Variable View

Files must be created before they can be merged.

To create a data file for merging:
1. Click the File menu, point to New, and select Data.
2. Once the new file has been created, select the Variable View tab.
3. For the first variable, type [ID] as the name (this is the identifier variable), and press the [Enter] key.
4. Change the Type attribute by clicking the Ellipses button and selecting the String option in the Variable Type dialog box.
5. Change the width to [10] and click the OK button.
6. Click in the second variable cell, type [January], and press the [Enter] key.
7. Change the Type attribute to String.
8. In the Label attribute, type [What pet would you like to own?] (see Figure 45).
9. Repeat steps 6 through 8 to enter the variables listed in Table 4.




Figure 45 - Define Variables in Variable View

Table 4 - Variables for Case Study

Name (Month)   Type     Width   Label
February       String   10      What is your favorite shape?
March          String   12      It is 1:30pm, what are you eating?
April          String   12      What is your preferred beverage?

10. Once this information has been defined in Variable View, click the Data View tab to enter the corresponding case information.
11. Enter [Alfred] in case 1 of the ID variable, [Bethel] in case 2 of the ID variable, and so on down to [Jessie] in case 10 of the ID variable. Enter the corresponding information according to Table 5. See Figure 46 for the results.

Table 5 - Input Case Information

Case  ID         January    February   March         April
1     Alfred     Dog        Star       Pizza         Water
2     Bethel     Cat        Square     Fruit         Soda Pop
3     Chris      Cat        Triangle   Veggies       Grape Juice
4     Dante      Dog        Rectangle  Sandwich      Orange Juice
5     Erica      Tiger      Oval       Chips         Aloe Water
6     Fernando   Tarantula  Circle     Calzon        Beer
7     Grenadine  Dog        Octagon    Salad         White Wine
8     Harold     Bees       Polygon    Soup          Naked Juices
9     Isadora    Turtle     Rhombus    PandaExpress  V8 Juice
10    Jessie     Hamster    Oval       Egg Salad     Lemonade





Figure 46 - Input Case Information

12. Save the file by clicking the File menu and selecting Save. The Save Data As dialog box opens.
13. Select the Desktop as the destination and type [Merge 1] in the File name: text box.
14. Click the Save button.
15. Close the Output Viewer window.

MERGING THE DATA FILES

To merge data files, all files must have a common variable. The common variable in this case is ID.

To merge data files (first, make sure the files have the same IDs):
1. Open the files Merge 2 and Merge 3 and check that the IDs are consistent across all of the files.
2. Minimize the Merge 2 and Merge 3 data files.
3. Once back in the Merge 1 file, click the Data menu, point to Merge Files, and select Add Variables (see Figure 47).

Figure 47 - Data Menu When Selecting Add Variables




4. The Add Variables to Merge 1.sav dialog box opens. Select the An external PASW Statistics data file option and click the Browse button (see Figure 48).

Figure 48 - Add Variables to Merge 1.sav Dialog Box

5. Locate and select the Merge 2 data file and click the Open button.
6. Click the Continue button. The Add Variables from Merge 2.sav dialog box opens (see Figure 49).
7. Select the Match cases on key variables in sorted files check box.
8. From the Excluded Variables: list box, select ID>(+) (see Figure 49), and use the transfer arrow button to move it to the Key Variables: box.

Figure 49 - Add Variable from Merge 2.sav Dialog Box

9. Click the OK button. A warning message dialog box opens (see Figure 50).





Figure 50 - Sorting Warning Dialog Box

10. Click the OK button to close the warning message. The finished product should look like Figure 51.

Figure 51 - Merged 1 and 2 Files

11. Repeat steps 3-10 for the Merge 3 file.
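The effect of Add Variables with a key variable can be sketched as a match on ID. This is an illustration, not the program's own implementation: it assumes each file is a list of dicts, that each ID appears at most once per file, and that an ID present in only one file is still kept, with the other file's variables left empty (None). When both files contain a variable with the same name, the active file's value is kept, in the spirit of duplicate names being placed in the Excluded Variables list. The function name and the May answers in the example are hypothetical.

```python
def match_merge(active, external, key="ID"):
    """Combine two files case by case on a key variable (sketch)."""
    for label, rows in (("active", active), ("external", external)):
        keys = [row[key] for row in rows]
        if keys != sorted(keys):
            raise ValueError(f"{label} file is not sorted by {key}")
        if len(set(keys)) != len(keys):
            raise ValueError(f"{label} file has duplicate {key} values")

    # Variable order: the active file's variables first, then new ones.
    names = []
    for row in active + external:
        for name in row:
            if name not in names:
                names.append(name)

    by_key_active = {row[key]: row for row in active}
    by_key_external = {row[key]: row for row in external}
    merged = []
    for k in sorted(set(by_key_active) | set(by_key_external)):
        combined = {name: None for name in names}  # None stands in for missing
        combined.update(by_key_external.get(k, {}))
        combined.update(by_key_active.get(k, {}))  # active file wins on name clashes
        merged.append(combined)
    return merged


if __name__ == "__main__":
    merge1 = [{"ID": "Alfred", "January": "Dog"}, {"ID": "Bethel", "January": "Cat"}]
    merge2 = [{"ID": "Alfred", "May": "Red"}, {"ID": "Bethel", "May": "Blue"}]
    for row in match_merge(merge1, merge2):
        print(row)
```

The sort check mirrors the "sorted files" requirement that the warning in step 9 refers to.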




Appendix

QUESTIONNAIRE

This survey is designed to investigate relationships between Internet access and academic success. It consists of three parts: questions related to the background information of the respondent, questions about Internet use patterns, and several open-ended questions. Please select the answers that best describe your activities on the Internet as truthfully as possible. The results of this study will be used anonymously for the PASW Statistics Part 2: Test of Significance workshop.

Background Information

1. Age: ____________________________

2. Major: ___________________________

3. G.P.A.: __________________________

4. Monthly Income: __________________

Internet Access

5. Do you have a computer at home?

1. Yes 2. No

6. Where do you surf on the Internet? (You can circle more than one option for this question.)

1. At school 2. At home 3. At work 4. Other ____________

7. How long do you stay online per day?

1. Less than 30 minutes 2. 1-2 hours 3. More than two hours

Questions 8 through 19 are designed to investigate the frequency and types of activities on the Internet. These questions use a 4-point Likert scale ranging from strongly disagree to strongly agree. Please circle the option that best describes your activities on the Internet.

SD: Strongly Disagree
D: Disagree
A: Agree
SA: Strongly Agree

SD D A SA
8. I am a very active Internet surfer. 1 2 3 4

9. I surf the Internet to look for articles for research papers. 1 2 3 4




10. I surf the Internet to read current news. 1 2 3 4

11. I use the Internet only to e-mail my friends, family, and professors. 1 2 3 4

12. I surf the Internet to check movie schedules. 1 2 3 4

13. I surf the Internet to look for personal information (e.g., yellow pages). 1 2 3 4

14. I surf the Internet to look for job openings. 1 2 3 4

15. I use the Internet to play games. 1 2 3 4

16. I use the Internet to download forms and files (e.g., income tax forms). 1 2 3 4

17. I surf the Internet to improve my computer skills. 1 2 3 4

18. I surf the Internet to purchase books. 1 2 3 4

19. I surf the Internet to purchase other merchandise (e.g., video tapes, clothes, computers). 1 2 3 4

Question 20 is an open-ended question.

20. Are there any other Internet activities that are not included in this survey? If so, please describe them below.

____________________________________________________________________

____________________________________________________________________

____________________________________________________________________

____________________________________________________________________




Introduction Part 3

PASW stands for Predictive Analytics Software. This program can be used to analyze data collected from surveys, tests, observations, etc. It can perform a variety of data analyses and presentation functions, including statistical analysis and graphical presentation of data. Among its features are modules for statistical data analysis. These include 1) descriptive statistics, such as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for survey research, though it is by no means limited to this topic of exploration.

This handout (Regression Analysis) provides basic instructions on how to answer research questions and test hypotheses through the use of linear regression (a technique that examines the relationship between a dependent variable and a set of independent variables). The value of the dependent variable (e.g., a salesperson's total annual sales) can be predicted based on its relationship to the independent variables used in the analysis (e.g., age, education, and years of experience). The two research questions proposed for this workshop are as follows:

1. How much will each salesperson make this year?
2. Who will qualify for a $1,000 bonus?




Simple Regression

Simple regression estimates how the value of one dependent variable (Y) can be predicted based on the value of one independent variable (X). The linear equation for simple regression is as follows:

Y = aX + b

Simple regression can answer the following research question:

Research Question # 1
How much will each salesperson make this year?

SCATTER PLOT

A scatter plot displays the nature of the relationship between two variables. It is recommended to run a scatter plot before performing a regression analysis to determine whether there is a linear relationship between the variables. If there is no linear relationship (i.e., the points on the graph do not cluster around a straight line), there is no point in running a simple regression.




To run a scatter plot:
1. Start PASW Statistics 17.
2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
3. Locate and open the Regression.sav file.
4. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot (see Figure 52). The Scatter/Dot dialog box opens (see Figure 53).

NOTE: To estimate the relationship between two variables, select the Simple Scatter plot.

Figure 52 - Graphs Menu When Selecting Scatter/Dot

Figure 53 - Scatter/Dot Dialog Box

5. If necessary, select the Simple Scatter option, and then click the Define button (see Figure 53). The Simple Scatterplot dialog box opens (see Figure 54).

Figure 54 - Simple Scatterplot Dialog Box

6. Select the variable Last year sales [lastsale] from the list box on the left.




7. Click the first transfer arrow button to move the variable to the Y Axis: box.
8. Select the variable Years of experience [yearexpe] from the list box on the left.
9. Click the second transfer arrow button to move the variable to the X Axis: box.
10. Click the OK button. The Output Viewer window opens with a scatter plot of the variables (see Figure 55).

NOTE: A graph similar to Figure 55 will be displayed in the Output Viewer window. This scatter plot indicates that there is a linear relationship between the variables Last year sales and Years of experience. The next step is to find the line that best fits the pattern of points in this scatter plot.

The steps on how to enhance graph appearance are included in the last section of this handout.

Figure 55 - Scatter Plot

PREDICTING VALUES OF DEPENDENT VARIABLES

Since a linear relationship exists between the two variables, a regression analysis can be performed to predict this year's sales.

To run a simple regression analysis:
1. Switch to the Data Editor window.
2. Click the Analyze menu, point to Regression, and select Linear (see Figure 56). The Linear Regression dialog box opens.

Figure 56 - Analyze Menu When Selecting Linear




3. Select the variable Last year sales [lastsale] from the variable list box on the left and move it to the Dependent: box by clicking the first transfer arrow button (see Figure 57).

Figure 57 - Linear Regression Dialog Box

4. Select the variable Years of experience [yearexpe] from the variable list box on the left and move it to the Independent(s): box by clicking the second transfer arrow button.

5. Click the OK button.

The following tables present the results of a simple regression. R Square (.918) indicates that this model accounts for almost 92% of the total variation in Last year sales (see Figure 58).

Figure 58 - Model Summary Output




Figure 59 - Coefficients Output

The slope and the y-intercept shown in Figure 59 should be substituted into the linear equation Y = aX + b to predict this year's sales. In this case, the values of a, b, X, and Y are as follows:

a = 1954.658 (slope)
b = 440.987 (y-intercept)
X = Years of experience (values of the independent variable)
Y = Last year sales (values of the dependent variable)
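The slope a and intercept b in the Coefficients table are ordinary least-squares estimates. A minimal sketch of that calculation, including R Square, assuming two equal-length lists with no missing values (the function name is hypothetical):

```python
def simple_regression(x, y):
    """Least-squares fit of Y = aX + b; returns (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # y-intercept
    # R Square = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - ss_res / ss_tot
```

Applied to the yearexpe (X) and lastsale (Y) columns, an ordinary least-squares fit like this one should agree with the coefficients and R Square that PASW Statistics reports, up to rounding.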

PREDICTING THIS YEAR'S SALES WITH THE SIMPLE REGRESSION MODEL

To predict this year's sales for each salesperson, the values of a and b should be substituted into the following linear equation:

Y = aX + b

Last year sales = (a * yearexpe) + b
This year sales = (1954.658 * yearexp2) + 440.987

a = 1954.658
b = 440.987
X = Years of experience [yearexp2]
Y = This year sales

NOTE: The new independent variable, yearexp2, is used instead of yearexpe in order to predict this year's sales.

To predict this year's sales using the Compute function:
1. Switch to the Data Editor window.
2. Click the Transform menu and select Compute Variable. The Compute Variable dialog box opens (see Figure 60).
3. In the Target Variable: box, type [Simple].




Figure 60 - Compute Variable Dialog Box

4. In the Numeric Expression: box, enter the following equation by typing or selecting from the dialog box keypad:

[1954.658 * yearexp2 + 440.987]

NOTE: It is recommended to select the variable yearexp2 directly from the variable list box on the left of the Compute Variable dialog box to prevent typing mistakes.

5. Click the OK button. The results will be displayed in the Simple column in Data View (see Figure 61).
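The Compute Variable step applies the fitted equation to every case. The same calculation, together with a whole-dollar display similar to the $###,###,### format, can be sketched as follows; the coefficients come from Figure 59, while the function names and the sample yearexp2 values are hypothetical.

```python
SLOPE = 1954.658      # a, from the Coefficients table
INTERCEPT = 440.987   # b, from the Coefficients table


def predict_simple(yearexp2):
    """This year's predicted sales: Y = aX + b with X = yearexp2."""
    return SLOPE * yearexp2 + INTERCEPT


def as_dollars(value):
    """Whole-dollar display with thousands separators, e.g. $12,345."""
    return f"${value:,.0f}"


if __name__ == "__main__":
    # Hypothetical years-of-experience values for three salespeople
    for years in (2, 5, 10):
        print(years, as_dollars(predict_simple(years)))
```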

Figure 61 - Simple Regression Results

To change the data type for the new variable Simple:
1. Click the Variable View tab at the lower left corner of the Data Editor window (see Figure 62).





2. Locate the variable Simple and click the Ellipses button under the Type column. The Variable Type dialog box opens (see Figure 63).

3. Select the Dollar option, and then select the $###,###,### format (a width of 12 with 0 decimal places).

Figure 63 - Variable Type Dialog Box

4. Click the OK button, and then click the Data View tab.

Figure 64 - Simple Regression Prediction

NOTE: The predictions of this year's sales for each salesperson are computed under the new variable named Simple, as shown in Figure 64.

Multiple Regression

Multiple regression estimates the coefficients of a linear equation that uses more than one independent variable to predict the value of the dependent variable. For example, it is possible to predict a salesperson's total annual sales (the dependent variable) based on independent variables such as age, education, and years of experience. The linear equation for multiple regression with two independent variables is as follows:

Z = aX + bY + c

PREDICTING VALUES OF DEPENDENT VARIABLES

The previous section demonstrated how to predict this year's sales (the dependent variable) based on one independent variable (number of years of experience) by using simple regression analysis. Similarly, this year's sales can be predicted from more than one independent variable, such as Years of experience and Years of education, by using multiple regression analysis.

To run multiple regression analysis:
1. Click the Analyze menu, point to Regression, and select Linear. The Linear Regression dialog box opens (see Figure 65).
2. From the variable list box, select Last year sales [lastsale] as the dependent variable and move it to the Dependent: box by clicking the first transfer arrow button.
3. From the variable list box, select Years of experience [yearexpe] and Years of education [educatio] and move them to the Independent(s): box by clicking the second transfer arrow button.
4. Click the OK button.


NOTE: If there are variables in the Independent(s): or Dependent: boxes, click the Reset button before performing steps 2 and 3 above.

Figure 65 - Linear Regression Dialog Box

Figure 66 - Model Summary Output for Multiple Regression

NOTE: The table should look similar to Figure 66. R Square = .976 indicates that this model accounts for almost 98% of the variation in Last year sales.

Figure 67 - Multiple Regression Output




The slopes and the y-intercept shown in Figure 67 should be substituted into the linear equation Z = aX + bY + c to predict this year's sales.

In this case, the values of a, b, c, X, Y, and Z are as follows:

a = 1874.5
b = 609.391
c = -8510.838
X = Years of experience (independent variable)
Y = Years of education (independent variable)
Z = This year sales (dependent variable)

As indicated in the output table, the coefficient for Years of experience is 1874.5 and the coefficient for Years of education is 609.391.

PREDICTING THIS YEAR'S SALES WITH THE MULTIPLE REGRESSION MODEL

To predict this year's sales for each salesperson, the values of a, b, and c should be substituted in the following linear equation: Z = aX + bY + c

This year sales = 1874.5 * Years of experience + 609.391 * Years of education + (-8510.838)
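The prediction equation is plain arithmetic, so a single prediction can be checked by hand or with a short script. The sketch below uses the coefficients from Figure 67; the function name and the example salesperson (5 years of experience, 16 years of education) are hypothetical.

```python
# Coefficients as reported in the multiple regression output (Figure 67).
A_EXPERIENCE = 1874.5    # a: coefficient for Years of experience
B_EDUCATION = 609.391    # b: coefficient for Years of education
C_CONSTANT = -8510.838   # c: constant (y-intercept)


def predict_this_year_sales(years_experience, years_education):
    """Evaluate Z = aX + bY + c for one salesperson."""
    return A_EXPERIENCE * years_experience + B_EDUCATION * years_education + C_CONSTANT


# Hypothetical salesperson: 9372.5 + 9750.256 - 8510.838
print(round(predict_this_year_sales(5, 16), 3))  # prints 10611.918
```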

To predict this year's sales by multiple regression analysis:
1. Switch to the Data Editor window.
2. Click the Transform menu and select Compute Variable. The Compute Variable dialog box opens (see Figure 68).
3. Click the Reset button.
4. In the Target Variable: box, type [multiple].
5. In the Numeric Expression: box, enter the following equation by typing or selecting from the dialog box keypad:

[1874.5 * yearexp2 + 609.391 * educatio - 8510.838]

6. Click the OK button. The results will be displayed in the multiple column in Data View (see Figure 69).

Figure 69 - Multiple Regression Results

NOTE: The predictions of sales for each salesperson using two independent variables are listed under the new variable named multiple.

Data Transformation

Situations may arise where data transformation is useful. Most data transformations can be done with the Compute command, which creates new variables from existing ones so that the data file suits the statistical analysis at hand.

Research Question # 2

Who will earn a $1,000 bonus?

COMPUTING

Since each person's yearly sales have already been predicted, salespeople whose actual sales are $2,000 or more above the values predicted by the multiple regression analysis will receive a $1,000 bonus. Using the Compute command, the salespeople who meet this criterion can easily be located by comparing this year's actual sales with the predictions from the multiple regression analysis computed in the previous lesson.

The first step in determining who will receive a bonus is to calculate the difference between this year's actual sales and the predicted sales from the multiple regression analysis.

To predict who will qualify for the bonus:
1. Open the Bonus.sav file.
2. If the Save As dialog box opens, click the No button.
3. Click the Transform menu and select Compute Variable. The Compute Variable dialog box opens (see Figure 70).
4. In the Target Variable: box, type [bonus].
5. In the Numeric Expression: box, type [1000].





6. Click the If button. The Compute Variable: If Cases dialog box opens (see Figure 71).
7. Select the Include if case satisfies condition: option.
8. Enter the following expression by typing or selecting from the dialog box keypad:

[thissale - multiple >= 2000]

Figure 71 - Compute Variable: If Cases Dialog Box

NOTE: To prevent typing mistakes, select the variables and the >= sign directly from the variable list box and keypad provided in the dialog box.




9. Click the Continue button, and then click the OK button.
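The Compute Variable: If Cases logic above can be sketched in Python as follows. Cases that fail the condition get None, standing in for the system-missing value that PASW Statistics assigns to a new target variable when the If condition is not met. The function name, case layout, and sales figures are hypothetical.

```python
def bonus(this_sale, predicted, threshold=2000.0, amount=1000.0):
    """Return the bonus when actual sales beat the prediction by at least the threshold."""
    if this_sale - predicted >= threshold:  # same test as [thissale - multiple >= 2000]
        return amount
    return None  # stands in for system-missing


# Hypothetical cases: (name, thissale, multiple)
cases = [("A", 15000.0, 12500.0), ("B", 13000.0, 12500.0)]
for name, thissale, multiple in cases:
    print(name, bonus(thissale, multiple))
```

Using >= rather than > means a salesperson who beats the prediction by exactly $2,000 still qualifies, matching the expression entered in step 8.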

NOTE: Salespersons #49 Jason and #44 Ivett are two of the salespeople who qualify for the $1,000 bonus, because their actual sales exceeded their predicted sales from the previous lesson by at least $2,000 (see Figure 72).

Figure 72 - Bonus Results

Polynomial Regression

This type of regression involves fitting a dependent variable (Yi) to a polynomial function of a single independent variable (Xi). The regression model is as follows (see Table 6 for the meaning of the variables):

$$Y_i = a + b_1 X_i + b_2 X_i^2 + b_3 X_i^3 + \cdots + b_k X_i^k + e_i$$

Table 6 - Breakdown of the Variables

Variable   Meaning
a          Constant
b_j        The coefficient for the independent variable to the jth power
e_i        Random error term
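Once the coefficients a, b1, ..., bk are known, the fitted part of the model (everything except the random error term e_i) can be evaluated for any X. A minimal sketch using Horner's rule; the function name and the coefficient list in the usage line are hypothetical.

```python
def polynomial_value(coeffs, x):
    """Evaluate a + b1*x + b2*x**2 + ... + bk*x**k.

    coeffs = [a, b1, b2, ..., bk]; the random error term e_i is not included.
    """
    result = 0.0
    for c in reversed(coeffs):  # Horner's rule: (((bk*x + b(k-1))*x + ...)*x + a)
        result = result * x + c
    return result


# Hypothetical cubic: 1 + 2x - 0.5x^2 + 0.1x^3 evaluated at x = 2
print(polynomial_value([1.0, 2.0, -0.5, 0.1], 2.0))
```

Horner's rule avoids computing each power separately, which keeps the arithmetic short and numerically steady.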

REGRESSION ANALYSIS

To look at the growth relationship between weight and age:

1. Open the Growth.sav file.
2. Click the Analyze menu, point to Regression, and select Curve Estimation. The Curve Estimation dialog box opens to define the parameters of the analysis (see Figure 73).
3. Transfer the wght variable to the Dependent(s): box and the age variable to the Independent Variable: box.

NOTE: The weight (dependent) variable is what is being predicted using the age (independent) variable.

4. Deselect the Plot models check box.
5. Select the Display ANOVA table check box.
6. Under Models, deselect the Linear check box and select the Cubic check box.
7. Click the OK button.




Figure 73 - Curve Estimation Dialog Box

Analyzing the Results

This cubic model has an R² of 99.567% (see Figure 74). The F-ratio indicates a highly significant fit. The best-fitting cubic polynomial is given by the following equation:

(where Y_i is weight and X_i is age):

$$Y_i = 0.052 - 0.017 X_i + 0.010 X_i^2 - 0.001 X_i^3 + e_i$$

Multiple regression can be used to fit polynomials of higher order. If X is the independent variable, use the Transform and Compute options of the Data Editor (as discussed earlier in this lesson) to create new variables X2 = X*X, X3 = X*X2, X4 = X*X3, etc., then use these new variables (X, X2, X3, X4, etc.) as the set of independent variables for a multiple regression analysis.
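The power-column approach described above can be sketched in Python: build X, X2 = X*X, X3 = X*X2, ... exactly as the Compute steps would, then solve an ordinary least-squares problem with those columns as the independent variables. This is an illustrative sketch, not PASW Statistics' own code; the function names and the generated data are hypothetical.

```python
# Sketch: fit Y = a + b1*X + b2*X^2 + ... + bk*X^k by ordinary least squares,
# using power columns X, X2 = X*X, X3 = X*X2, ... as the predictors.


def power_columns(xs, degree):
    """Return rows [X, X2, ..., Xk], each power built from the previous one."""
    rows = []
    for x in xs:
        row = [x]
        for _ in range(degree - 1):
            row.append(row[-1] * x)  # next power = previous power * X
        rows.append(row)
    return rows


def solve(matrix, rhs):
    """Solve matrix * x = rhs using Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def fit_polynomial(xs, ys, degree):
    """Return [a, b1, ..., bk] from the normal equations of the power columns."""
    design = [[1.0] + row for row in power_columns(xs, degree)]  # constant first
    p = degree + 1
    dtd = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    dty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(p)]
    return solve(dtd, dty)


# Hypothetical data generated from Y = 1 + 2X - 0.5X^2 + 0.1X^3.
true_coeffs = [1.0, 2.0, -0.5, 0.1]
xs = [0, 1, 2, 3, 4, 5]
ys = [sum(c * x ** i for i, c in enumerate(true_coeffs)) for x in xs]
print(fit_polynomial(xs, ys, 3))  # approximately [1.0, 2.0, -0.5, 0.1]
```

Raising the degree only adds columns; the fitting step itself is unchanged, which is why ordinary multiple regression can handle polynomials of any order.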




Figure 74 - Polynomial Regression Summary Results

Chart Editing

During the final stage of research, enhancing the appearance of charts and figures can help readers understand what might otherwise seem to be confusing statistics. Editing charts directly in PASW Statistics saves the time and effort of copying and pasting a chart into another program to modify its features. The following steps explain some useful methods to enhance the appearance of a chart.

ADDING A LINE TO THE SCATTER PLOT

Adding a straight line to fit the scattered pattern of a data chart can help emphasize the linear relationship among the data.

To add a line to the scatter plot:
1. Click the Graphs menu, point to Legacy Dialogs, and select Scatter/Dot.
2. Select the Simple Scatter option, and then click the Define button.
3. Transfer the age variable to the X Axis: box and the wght variable to the Y Axis: box, and then click the OK button. A chart appears in the Output Viewer window.
4. Double-click the chart in the Output Viewer window to modify it. The Chart Editor window opens (see Figure 75).
5. Right-click a chart marker (see Figure 76) and select Add Fit Line at Total from the shortcut menu.
6. Under Fit Method, select the Cubic option, and then click the Apply button.
7. Close the Chart Editor window.

NOTE: The default linear fit line added by Add Fit Line at Total does not capture the way the data curve, whereas the cubic fit method is almost a perfect fit (see Figure 77).




Figure 75 - Chart Editor Window

Figure 76 - Chart Markers

Figure 77 - Adding a Fit Line to the Scatter Plot

MANIPULATING THE SCALES ON X- AND Y-AXES

The X-axis and Y-axis can be adjusted to enhance the overall appearance and readability of the chart. Various elements of the axes can be manipulated, such as scale, ticks and grids, number format, and axis label.

To manipulate the scales on the X-axis:
1. If necessary, open the Regression.sav file.
2. Run the scatter plot where the Y-axis is Last year sales and the X-axis is Years of experience.
3. Double-click the chart to open the Chart Editor window.
4. Click the Select the X axis button on the Standard toolbar to manipulate the X-axis. The Properties dialog box opens.
5. Select the Scale tab (see Figure 78).
6. Change the value in the Lower margin (%): box to 0.
7. Select the Labels & Ticks tab (see Figure 79).
8. In the Major Ticks section, select the Display ticks check box.
9. Click the Style arrow and select Inside from the list.




Figure 78 - X-axis Properties Dialog Box: Scale Tab

Figure 79 - X-axis Properties Dialog Box: Labels & Ticks Tab

10. Click the Show Grid Lines button on the Standard toolbar to open the Properties dialog box.

11. Select the Grid Lines tab, select the Major ticks only option, click the Apply button, and then click the Close button (see Figure 80).

12. Click the Select the Y axis button on the Standard toolbar to manipulate the Y-axis. The Properties dialog box opens.

13. Select the Scale tab (see Figure 81).

Figure 80 - Properties Dialog Box: Grid Lines Tab




Figure 81 - Y-axis Properties Dialog Box: Scale Tab

14. Change the value in the Lower margin (%): box to 0.
15. Click the Apply button, and then click the Close button.

Figure 82 - Before Manipulating the X-axis

Figure 83 - After Manipulating the X-axis

ADDING A TITLE TO THE CHART

Adding a title to the chart is a simple process that enhances the chart's appearance.

To add a title to a chart:
1. In the Chart Editor window, click in a blank area outside the first chart to select the whole chart, then move the mouse pointer to one of the selection handles until it becomes a two-headed arrow.
2. Drag the mouse pointer to reduce the chart size.
3. Click the Insert a text box button on the Standard toolbar. The text box appears above the chart and the Properties dialog box opens.
4. Type Relationship Between Last Year Sales and Years of Experience in the text box.
5. Click the border of the text box to select it.
6. Select the Text Style tab in the Properties dialog box, select a color for the title text, click the Apply button, and then click the Close button.
7. Click the Bold button on the Standard toolbar, and change the Font Size to 12.
8. Resize the text box to fit the text.
9. If necessary, resize the chart to display the title at the top of the chart (see Figure 84).

Figure 84 - Adding a Title to the Chart

ADDING COLORS TO THE CHART

All elements on the chart can be colored differently to add emphasis or distinguish between elements.

To add colors to a chart:
1. In the Chart Editor window, select the chart element to change or add color to, such as one of the plots (see Figure 85).
2. Click the Show Properties Window button on the Standard toolbar. The Properties dialog box opens (see Figure 86).
3. Select the Marker tab, and then select a color from the color palette.
4. To change the marker type, click the Type arrow in the Marker section and select a symbol from the menu.
5. View the changes in the Preview section.
6. Click the Apply button, and then click the Close button.




Figure 85 - Adding Color to the Chart

Figure 86 - Properties Dialog Box

FILLING A BACKGROUND COLOR

The background of the chart can also be filled with a color to make the chart stand out.

To fill in a background color:
1. Click inside a blank area of the chart to select the entire chart area (see Figure 87).
2. Click the Show Properties Window button on the Standard toolbar. The Properties dialog box opens.
3. Select the Fill swatch.
4. Click the Pattern arrow and select a background pattern.
5. Click the Apply button, and then click the Close button.

Figure 87 - Filling a Background Color




Introduction Part 4

PASW stands for Predictive Analytics Software. This program can be used to analyze data collected from surveys, tests, observations, etc. It can perform a variety of data analyses and presentation functions, including statistical analysis and graphical presentation of data. Among its features are modules for statistical data analysis. These include 1) descriptive statistics, such as frequencies, central tendency, plots, charts, and lists; and 2) sophisticated inferential and multivariate statistical procedures, such as analysis of variance (ANOVA), factor analysis, cluster analysis, and categorical data analysis. PASW Statistics is particularly well-suited for survey research, though it is by no means limited to this field.

This handout (Chi-Square and ANOVA) introduces basic skills for performing hypothesis tests using the Chi-Square test for Goodness-of-Fit and analysis of variance (ANOVA), which generalizes the pooled t test to more than two groups. The step-by-step instructions guide the user in performing tests of significance using PASW Statistics and help the user interpret the output for research questions.

Downloading the Data Files

This handout includes sample data files that can be used for hands-on practice. The data files are stored in a self-extracting archive, which must be downloaded and executed to extract them.



Chi-Square

The Chi-Square (χ²) test is a statistical tool used to examine differences between nominal or categorical variables. The Chi-Square test is used in two similar but distinct circumstances:

To estimate how closely an observed distribution matches an expected distribution (also known as the Goodness-of-Fit test).

To determine whether two random variables are independent.

CHI-SQUARE TEST FOR GOODNESS-OF-FIT

This procedure can be used to perform a hypothesis test about the distribution of a qualitative (categorical) variable or a discrete quantitative variable that has only a finite number of possible values. It analyzes whether the observed frequency distribution of a categorical or nominal variable is consistent with the expected frequency distribution.

With Fixed Expected Values

Research Question # 1

Can the hospital schedule discharge support staff evenly throughout the week?

A large hospital schedules discharge support staff assuming that patients leave the hospital at a fairly constant rate throughout the week. However, because of increasing complaints of staff shortages, the hospital administration wants to determine whether the number of discharges varies by the day of the week.




H0: Patients leave the hospital at a constant rate (there is no difference between the discharge rates for the days of the week).
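Under this null hypothesis every day has the same expected count, namely the total number of discharges divided by 7. The test statistic compares each observed count O with its expected count E as the sum of (O - E)²/E, with k - 1 degrees of freedom for k categories. A minimal sketch with a hypothetical function name and made-up weekly counts (not the chi-hospital.sav data):

```python
def chi_square_gof(observed, expected=None):
    """Return (chi2, df) for the sum of (O - E)**2 / E over all categories.

    With no expected counts given, every category is assumed equally likely (H0).
    """
    k = len(observed)
    if expected is None:
        total = sum(observed)
        expected = [total / k] * k  # equal expected count per category
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, k - 1


# Hypothetical discharges, Monday through Sunday
print(chi_square_gof([50, 45, 40, 42, 48, 30, 25]))
```

The statistic is then compared with a chi-square distribution with df degrees of freedom to obtain the significance level; outside PASW Statistics, SciPy's `scipy.stats.chisquare` performs the same computation and also returns a p-value.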

To perform the analysis:
1. Start PASW Statistics 17.
2. Click the Open button on the Data Editor toolbar. The Open Data dialog box opens.
3. Navigate to the data files folder, select the chi-hospital.sav file, and then click the Open button.

Before the Chi-Square test is run, the observed values need to be declared.

To declare the observed values:
1. Click the Data menu and select Weight Cases. The Weight Cases dialog box opens (see Figure 88).

Figure 88 - Weight Cases Dialog Box

2. Select the Weight cases by option.

3. Select the Average Daily Discharges [discharge] variable and transfer it to the Frequency Variable: box.
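Weighting cases by a frequency variable makes each row count as many times as its weight, so a file with one row per day behaves like a file with one row per discharge. A minimal Python sketch of that idea, assuming whole-number weights; the function names and numbers are hypothetical.

```python
from collections import Counter


def expand_weighted(values, weights):
    """Repeat each value by its whole-number frequency weight."""
    expanded = []
    for value, weight in zip(values, weights):
        expanded.extend([value] * int(weight))  # one copy per counted case
    return expanded


def weighted_counts(values, weights):
    """Category totals when each row counts `weight` times."""
    counts = Counter()
    for value, weight in zip(values, weights):
        counts[value] += weight
    return counts


# Hypothetical rows: one per day, weighted by a discharge count.
days = ["Mon", "Tue", "Wed"]
discharges = [3, 5, 2]
print(weighted_counts(days, discharges))
print(len(expand_weighted(days, discharges)))  # 10 cases after expansion
```

Both functions describe the same data: the weighted counts are exactly what a frequency table of the expanded cases would show.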


To perform the analysis:
1. Click the Analyze menu, point to Nonparametric Tests, and select Ch
