
Correlation to the nth Degree: Does Sample Size Matter?

Michael Guice
Psych. 4000
Dr. McGahan
October 20, 2015

[Cover figure: "Correlations for All Samples": Pearson r (-1 to 1) across trials 1-30 for n=3, n=7, n=30, and n=100]


Abstract

The age-old debate between free will and determinism still stands today. Everything boils down to one essential question: who or what is in control? Some individuals see themselves as completely in control of their lives, while others see themselves as being controlled by something else. For free will to exist, there cannot be a predetermined causal chain. Using Karl Pearson's product-moment correlation coefficient, we try to determine whether randomness exists and, following those lines, whether it varies as a function of sample size. This question permeates the study as its theme. Looking back at the Central Limit Theorem, we see that normal distributions tend to arise when an object is subjected to a large number of independent disturbances. With this in mind, a study was designed to measure the effect of sample size on randomness. Four conditions exist within this study; each condition has a sample size associated with it, and each condition was subjected to 30 trials apiece. When comparing the conditions we used aggregate means to allow for greater understanding. The means showed that as sample size increased, the standard deviations decreased, meaning the group became a tighter fit around the predicted mean of 50. Randomness showed its existence with a sample size of 100: it was predicted that as sample size increased we would fall more within the bounds of the normal distribution, yet condition 4 still showed signs of chance. Throughout the study only 7 Type I errors were committed, meaning seven null hypotheses were incorrectly rejected.


Correlation to the nth Degree: Does Sample Size Matter?

In one of Anthony Burgess's more famous works, A Clockwork Orange, the protagonist is put through a classical conditioning scenario. After the scenario is over, Alex, the protagonist, complains that he has lost his free will. What Alex was referring to was his inability to have control over his actions. Merriam-Webster defines free will as the ability to make choices that are not controlled by fate or God. Some believe that free will is a function of many "inputs, including genetic and environmental factors" [Bradley, 2012]. This type of free will is known as incompatibilist free will. Bradley referred to a coin-flipping machine when discussing this issue: if the machine had incompatibilist free will, "the exact moment and the manner of (a) release is inherently unpredictable". This unpredictability breaks a metaphorical causal chain of events. The causal chain that locked Alex and his sickness together was the same chain that removed his ability to steer his way through life. Without being able to steer, Alex was more or less dragged through life by something other than free will. But what was doing the dragging? Alex believed it to be determinism.

Determinism

The most basic meaning of determinism is that only one course of events is possible. James Bradley's work, "Randomness and God's Nature", elaborates on the various types of determinism. In it, Bradley defines determinism as "the philosophical position that ontological randomness does not exist in the physical world". Ontological randomness assumes that randomness is a property of the very nature of things; as a side note, Bradley also mentions epistemic randomness (apparent randomness, a function of human perception of things rather than their nature). In comparison, ontological randomness is "true randomness", while epistemic randomness is random only to the perceiver. With this said, determinism would hold that free will is imaginary even from an epistemic viewpoint. Applied to Darwin's theory of evolution, determinism would suggest that evolution is a causal chain of predetermined events. If anything or anyone could take complete control over any specific moment in life, the whole layout could change.

The idea of free will also runs through Darwin's theory of evolution. In this theory Darwin stresses the idea that "the most suited part will exist to reproduce". This departs from determinism in that there is the factor of being "most suited". Going back to Alex, we see that any time he thought of doing something "bad", a sickness fell over his entire body and forced him to stop. This sickness created a disadvantage for Alex. Without control, Alex believed he could not be held responsible for any action he did, or did not, commit. At this point he began to believe his fate was sealed. This deterministic mindset forced Alex into a depression and later an attempted suicide. The very people who "helped" the protagonist later tried to dispose of him. **Spoiler alert** It just so happens luck was on his side and he lived through it. The question is, was the luck predetermined, or was it a matter of "blind, purposeless chance" [Bradley, 2012]?

Randomness

Chance is usually mentioned in the study of randomness and proportionality. A popular conception of randomness is "not having a governing design, method, or purpose; unsystematic" [Bradley, 2012]. A fair die has six sides and, when thrown, has the probability of landing on one of those six sides. Yet much as participants in Neuringer's study "failed to produce random like-behavior", dice tend to fail to produce random-like behavior. For instance, when an individual plays a game of chance and rolls a die, they expect one of six possibilities. What happens when the die balances on a corner? What if the die rolls off the table and disappears forever? These questions give way to randomness. The question proposed now is: does randomness vary as sample size fluctuates? Will the die produce any response other than one of the six possibilities mentioned before? Seeing as chance is a function of randomness, this study was aimed at determining whether randomness exists and, if so, whether it varies as a function of sample size.

Correlation and Prediction

The Pearson product-moment correlation gives researchers the ability to detect linear relationships. In order to use the method, three assumptions must be met:

1. "The sample is independently and randomly selected from the population of interest."
2. "The population distributions of X and Y are such that their joint distribution (that is, their scatterplot) represents a bivariate normal distribution. This is called the assumption of bivariate normality and requires that the distribution of Y scores be normal in the population at every value of X."
3. "The variances of the Y scores are equal at every value of X in the population."

What does a correlation coefficient do? A correlation coefficient measures the strength of the association between two variables, if there is one. "The Pearson product-moment correlation coefficient measures the strength of the linear association between variables" [StatTrek]. The coefficient is bounded between negative one and positive one. This scale lets the researcher describe how change in one variable relates to change in the other. A correlation of negative one represents a perfect negative relationship between the two variables: as one variable is manipulated upward, the other moves incrementally away in the opposite direction, and vice versa. When two variables move together in the same direction as one is adjusted, we consider this a positive relationship. A correlation coefficient equal to zero does not represent "zero relationship between two variables; rather, it means zero linear relationship" [StatTrek]. What is a linear relationship?

A linear relationship is a relationship of proportionality. When plotted on a graph, the points form a straight line: any change in one variable, the independent variable, produces a corresponding change in the other, the dependent variable. Practice makes perfect: as an individual spends more time practicing, their ability within that activity should increase. Consequently, reducing the time spent practicing should diminish one's ability as well.
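To make the definition concrete, here is a minimal sketch that computes Pearson's r from its defining formula (covariance scaled by both standard deviations) and checks it against perfectly linear data. Python is used here purely for illustration; it is not part of the study's SPSS/Excel workflow.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation: covariance scaled by both SDs."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# A perfectly linear positive relationship yields r = 1.0 ...
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # 1.0
# ... and reversing the direction yields r = -1.0.
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # -1.0
```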

Galton: The concept of correlation was first discovered by Francis Galton in 1888, after his intensified research on heredity. Galton wanted to "reconcile an empirical test with a mathematical theorem" [Stigler, 1989]. He used a quincunx, an ingenious analogue computer, to help him with his correlation formulations. Although Galton used median deviations from the median, today we "express measurements in terms of the number of standard deviations units from the mean" [Stigler, 1989]. And although Galton is credited with the discovery of correlation, Karl Pearson is credited with the discovery of the Pearson r product-moment correlation.

Pearson: Before Pearson's discovery, Venn diagrams were used to determine linear relationships. Karl Pearson was quoted as saying that the "recess deserves a commemorative tablet as the birth place of the true conception of correlation" [Stigler, 1989]. It's believed that Pearson meant no disrespect toward Galton; more simply stated, the idea of correlation did not "click" for Galton until he took time to reflect.

Pearson was raised as a Quaker until the age of 24. While Pearson studied under Edward Routh at Cambridge he began to lose his religious faith. It's been said he began to adhere to agnosticism, or "freethought" [Britannica]. Freethought holds that positions regarding truth should be formed on the basis of logic, reason, and empiricism; the idea stands in direct opposition to authority, tradition, revelation, and other dogma. Once the correlational method became an integral part of science, it opened the door for a new zeitgeist.

Intelligence Testing and General Model of Reliability

Intelligence Testing: Intelligence testing was first introduced by Lewis Terman in 1916. Terman was a psychologist at Stanford University. At the time, the test was administered to participants two years old and older. These tests were individually administered and consisted of "an age-graded series of problems whose solutions involved arithmetical, memory, and vocabulary skills" [Britannica].

The most popular intelligence tests are the Stanford-Binet Intelligence Scale and the Wechsler scales. The Stanford-Binet is the American adaptation of the original French Binet-Simon intelligence test. An IQ, or intelligence quotient, is a concept first suggested by William Stern. It was originally computed as the "ratio of a person's mental age to his chronological (physical) age, multiplied by 100" [Britannica]. Today the mental age has fallen by the wayside. Test results still yield an IQ, but the concept is now configured on the basis of the statistical percentage of people who are expected to have a certain IQ.

Intelligence test scores follow an approximately "normal" distribution. This means most people score near the middle of the distribution curve, and scores drop off fairly rapidly in frequency as one moves away from the mean. Even though intelligence testing seems simple, there is still room for error.

Errors: There are two types of errors that plague the scientific community: Type I and Type II. A Type I error is incorrectly rejecting the null hypothesis. The scientific community is protected against overhasty rejection of the null hypothesis by statistical testing. Although statistics are never 100% accurate, this allows "some policing powers to members who would rather live by the law of small numbers" [Kahneman and Tversky, 1971]. Kahneman and Tversky went on to state that "there are no comparable safeguards against the risk of failing to confirm a valid research hypothesis (Type II error)".

Replication


For this study, 30 samples (trials) will be collected for each condition. This idea goes back to the article "Probable Error of a Correlation Coefficient", published in 1908. At the time, William Gosset was publishing under the pseudonym Student. Gosset, head brewer of Guinness at the time, proposed that "with samples of 30…the mean value (of the correlation coefficient) approaches the real value (of the population) comparatively rapidly". Why should we replicate in the first place?

Kahneman and Tversky (1971) stated that "the decision to replicate a once obtained finding often express(es) a great fondness for that finding and a desire to see it accepted by a skeptical community". In the same research, Kahneman and Tversky showed that 88% of this skeptical community believed the results of a single significant study were likely due to chance. This idea falls directly in line with the Central Limit Theorem (CLT). Stigler (1989) referenced the CLT, writing "that the normal distribution arises when an object is subjected to a larger number of independent disturbances, no few of them dominant".

Design to Test

One study with four conditions has been developed. The study requires the use of "true random numbers". These numbers will be retrieved from a random number generator, Random.org. Unlike pseudo-random numbers, which are created from an algorithm, Random.org generates numbers from atmospheric noise. Conditions are separated by sample size:

Condition 1: Sample size of 3
Condition 2: Sample size of 7
Condition 3: Sample size of 30
Condition 4: Sample size of 100

As the conditions change, so does the sample size. This follows Kahneman and Tversky's idea that "a replication sample should often be larger than the original".

Variable X and Variable Y were pooled from the environment. Due to the nature of randomness, Variable X should not predict Variable Y, and Variable Y should not predict Variable X. That being the case, Variable X and Variable Y should not be highly correlated: the Pearson product-moment correlation should be non-significant at an alpha level of .05. This alpha level was selected because it corresponds to a confidence level of 95% and is the most prominent alpha within the scientific community. Also, the aggregate mean of the Pearson product-moment correlations between the samples should not exceed the critical values. Critical values for the samples are listed below.

Sample size   Critical value   Degrees of freedom (two-tailed test)
N=3           0.997            1  (3-2=1)
N=7           0.754            5  (7-2=5)
N=30          0.361            28 (30-2=28)
N=100         0.197            98 (100-2=98)

Degrees of freedom are essential to this research, so they have been added to the table of critical values. The sketch below shows how these critical values follow from the t distribution.
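These critical values can be reproduced from the t distribution: a correlation is significant at the two-tailed .05 level when |r| exceeds t_crit / sqrt(t_crit^2 + df), with df = n - 2. A minimal Python sketch follows (assuming SciPy is available; SciPy is not mentioned in the original study):

```python
from math import sqrt
from scipy.stats import t

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of Pearson's r for a sample of size n."""
    df = n - 2                             # degrees of freedom for r
    t_crit = t.ppf(1 - alpha / 2, df)      # two-tailed t cutoff
    return t_crit / sqrt(t_crit**2 + df)

for n in (3, 7, 30, 100):
    print(f"N={n}: r_crit = {critical_r(n):.3f}")
# N=3: 0.997, N=7: 0.754, N=30: 0.361, N=100: 0.197 (matching the table)
```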

Degrees of freedom and accuracy of estimates tend to be positively related. As the sample size increases, the degrees of freedom increase, and with this the estimate (the aggregate Pearson product-moment correlation mean) should lie closer to the population mean. As the sample sizes increase, the aggregate means should group more tightly around zero, with zero understood as representing no linear relationship. This in turn should be reflected in smaller standard deviations. Each condition's standard deviation "represents an average deviation from the mean" [Jaccard, 2010]. In theory, as sample size increases one should be better able to predict the effect size of the variable in question. For this study the question posed is: does randomness vary as a function of sample size? The hypothesis for this study is that randomness does vary as a function of sample size.

To test this hypothesis, random numbers will be generated by Random.org. Two values will be drawn for each subject within each condition and formatted into two columns: Column 1 will be denoted Variable 1 and Column 2 will be denoted Variable 2. Each condition therefore uses a multiplier of 2 per participant, meaning a total of six values for a sample size of 3, fourteen for a sample size of 7, sixty for a sample size of 30, and two hundred for a sample size of 100.

Each condition will have thirty trials: one initial trial and twenty-nine replications. At the end of each trial, a test of correlation will be run to determine the relationship between the two variables. While testing for the correlation, standard deviations and means for each variable will be collected for later comparison. Once all 30 trials have been run for each condition, the Pearson correlation results will be averaged to determine the grand mean of the samples; "the mean of a sampling distribution of the mean will always be equal to the population mean (of the raw scores)" [Jaccard, 2010]. This process will also be completed for the standard deviations, as well as the means, of Variables 1 and 2 for all conditions. A simulation sketch of this design appears below.
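The design can be expressed as a short simulation. The following Python sketch stands in for the study's actual Random.org-plus-SPSS workflow: NumPy's pseudo-random generator substitutes for Random.org's atmospheric noise, which is an assumption worth keeping in mind. It runs 30 trials per condition and aggregates the correlation means and standard deviations.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)  # pseudo-random stand-in for Random.org
TRIALS = 30

for n in (3, 7, 30, 100):                    # the four conditions
    rs = []
    for _ in range(TRIALS):
        x = rng.integers(0, 101, size=n)     # Variable 1, range 0-100
        y = rng.integers(0, 101, size=n)     # Variable 2, range 0-100
        r, p = pearsonr(x, y)
        rs.append(r)
    rs = np.array(rs)
    print(f"n={n:3d}: mean r = {rs.mean():+.4f}, "
          f"SD of r = {rs.std(ddof=1):.5f}")
```

On repeated runs, the mean r hovers near zero for every condition while the standard deviation of r shrinks as n grows, which is exactly the pattern the study predicts.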

Sample Size

As mentioned before, sample size is a key variable for this study. In fact, it is the only variable, and thus is labeled the independent variable (IV).

N=3: Ever heard either of the following sayings: "Third time's the charm" or "All good/bad things come in threes"? Three seems to be a special number for people in general, but especially in religion. For instance, Christians tend to symbolize the Holy Trinity in terms of three (Father, Son, and Holy Spirit), and even God's attributes tend to be grouped in threes (omniscience, omnipresence, and omnipotence). The Christian faith is not the only faith where this trend of threes appears: Taoists believe in the Great Triad, which includes heaven, humanity, and Earth. Throughout the research, threes and religion go hand in hand, and even in everyday life the number three pervades our minds (past, present, and future). Due to its elusive presence in the natural, and supernatural, world, condition 1 will include a sample size of three; it only seems appropriate to study natural elusiveness when researching randomness. Also, this particular sample size produces one degree of freedom. If the accuracy of a prediction varies as a function of degrees of freedom, then condition 1 should have aggregate means much different from the other three conditions. This idea is supported by the Law of Large Numbers: the more samples you use, the closer your estimates will be to the true population values. Only when a self-correcting tendency is in place should a small sample be highly representative of, and similar to, a larger sample [Kahneman and Tversky, 1971]. Due to the nature of our data, self-correcting tendencies are not, and should not be, in place. Thus condition 1 should show the largest variance in comparison to the population, so long as the other conditions use larger sample sizes.

N=7: The number 3 is not the only significant number that appears throughout religion and our daily lives. It took seven days for God to create the Earth, and "seven is the second perfect number", losing only to three [Scripture]. In the book of Revelation, seven makes thirty-five (5 x 7 = 35) appearances. Similarly, "seven is a significant number in the natural world: mammals and birds have a gestation of multiples of 7" [Scripture]. These points alone make seven an attractive number to include within a study of randomness. Then there's Miller's "magical number seven, plus or minus two": Miller's research concluded that a person's memory span is limited to seven chunks, plus or minus two. For these reasons, condition 2 uses a sample size of seven.

N=30: A sample size of thirty was particularly interesting for this study. In "God Counts", W.E. Filmer translated Bible verses into their native tongue, Hebrew, and then associated a numerical value with each idea. This is known as Bible numerics. "As each idea is introduced the associated number echoes throughout all manner of divisions and classifications in a way which cannot be put down to mere chance" [Filmer]. This research became relevant upon further investigation of the value of thirty: Filmer shows that 30 seems to represent the idea of "Blood of Christ; Dedication". What was really striking were the degrees of freedom for a sample size of thirty. A sample size of thirty has twenty-eight degrees of freedom, and the idea associated with twenty-eight is "Eternal Life", according to Filmer. A sample size of thirty was a must-have for condition 3.

N=100: For the fourth condition, the study required a sample size significantly larger than the other three. Looking back to the Law of Large Numbers, Kahneman and Tversky stated that "(the law) guarantees that very large samples will indeed be highly representative of the population from which they are drawn" [1971]. What's larger than 100%? Although we can never really achieve 100% of almost anything in real life, the idea is attractive; in fact, scientists have acknowledged that we are unable to obtain a 100% accurate finding due to minimal levels of error. Another strange factoid: 100 is 3.33 times the size of condition 3 (all those threes!). For these reasons, condition 4 uses a sample size of 100.

Method

Rather than running the correlations by hand, I chose to test the data with statistical software. So long as the data is entered correctly and the correct boxes are checked, statistical software has a better chance of arriving at the correct outcome. Secondly, the software saves valuable time.

Generating random numbers: As mentioned before, Random.org generates random numbers through the use of "true randomness" [random.org]. Using this website's features allows us to test whether "true randomness" exists. Typing the web address, "random.org", into the URL bar and initiating the search should bring up the webpage. Welcome to Random.org! Now for the numbers that you will be using for data.

Near the top of the webpage there should be a strand of blue hyperlinked words. The one we are most concerned with here is the hyperlink entitled "Numbers". Hover over this hyperlink and a drop-down box should appear with a list of choices in the following order: integers, sequences, integer sets, Gaussian numbers, decimal fractions, and raw bytes. The choice most relevant to this study is "integers", because we want plain data. Now, click "integers".

When the new page finishes loading, you will be looking at the "Random Integer Generator". This generator will "allow you to generate random integers" produced by atmospheric noise [random.org]. Before you begin, you should become familiar with the layout of the webpage. There are two very important sections on this site: Part 1: The Integers and Part 2: Go! Part 1 is where you tell the generator how many integers to produce, the range from which the numbers should be drawn, and how many columns the generator should use to format your data.

For the first condition, a sample size of three, the generator should produce 6 random integers with values between 0 and 100. This scale was chosen because students tend to understand it, seeing as most grading rubrics use the 0-100 range. The data should also be formatted into two columns; by doing so, we produce three subjects with two variables each. Column one should be recognized as Variable 1 and column two as Variable 2. Once the form has been filled out correctly, the generating can begin! Direct your attention to Part 2: Go! At this point three options are available: Get Numbers, Reset Form, and Switch to Advanced Mode. For this experiment the first option, Get Numbers, will suffice; option 2 will clear all your hard work thus far, and option 3 is beyond the scope of this particular topic. Now generate.

As your generated numbers appear, you will see two columns with three rows each. If you generated a set of numbers that looks like Table 2.1 (below), you did it correctly. What this table tells its audience is that subject 1 had a Variable 1 value of 27; likewise, subject 3 had a response of 14 for its Variable 2. The pattern continues as you follow the flow of the table. In its entirety, Table 2.1 represents trial 1 for a sample size of 3. For each condition in this study, 30 trials will be run.

Returning to the website: near the bottom left of the webpage there are two options, option 1 (Again!) and option 2 (Go Back). The quota for the condition has not yet been met; 29 trials still remain, making option 2 an unlikely candidate for selection. With that said, option 1 should be selected. This action will prompt the generator to produce six more integers in the same format. (A sketch of how the same trial could be fetched programmatically appears below.)
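For readers who prefer scripting the generator, Random.org also exposes its integer generator over plain HTTP. The following Python sketch uses that interface; the /integers/ endpoint and its parameter names reflect the site's documented plain-text API, but they are worth verifying against Random.org's current documentation, and this scripted route is not part of the study's point-and-click procedure.

```python
# Sketch: fetch one trial's worth of integers from Random.org's plain-text
# integer generator. Endpoint and parameters per the site's documented HTTP
# interface; verify against random.org before relying on them.
from urllib.request import urlopen
from urllib.parse import urlencode

def fetch_trial(n_subjects):
    """Return [(var1, var2), ...] for one trial with n_subjects subjects."""
    params = urlencode({
        "num": 2 * n_subjects,   # two values per subject
        "min": 0, "max": 100,    # the 0-100 grading-rubric scale
        "col": 2,                # two columns: Variable 1 and Variable 2
        "base": 10, "format": "plain", "rnd": "new",
    })
    with urlopen(f"https://www.random.org/integers/?{params}") as resp:
        rows = resp.read().decode().strip().splitlines()
    return [tuple(int(x) for x in row.split()) for row in rows]

print(fetch_trial(3))  # e.g. [(27, 37), (92, 5), (16, 14)], as in Table 2.1
```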

Table 2.1
             Variable 1    Variable 2
Subject 1        27            37
Subject 2        92             5
Subject 3        16            14

Now for the other three conditions, which require sample sizes of 7, 30, and 100, respectively. The same process listed above for a sample size of 3 will be used, with a few exceptions. The range and the number of variables (columns) are controlled within the respective confines of this experiment; these controls allow the researcher to pinpoint the relationship between variability and sample size. For a sample size of 7, fourteen integers will be generated and divided into two separate columns, providing two variable responses for each of seven subjects. This method is continued for both sample sizes of 30 and 100: for the sample size of 30, sixty responses will be required, and, you guessed it, two hundred responses will be required for the sample size of 100. Remember: each condition requires thirty trials. Now what to do with this generated data?

Each data set should be kept in a safe place for further study. For this study we have chosen to enter our data into SPSS (Statistical Package for the Social Sciences) software. This program was chosen due to the nature of the data and the high utility of the software; it's worth mentioning that version 22 of SPSS was used for this particular study. Upon opening the SPSS software, the user is met with two windows: the first inquires about the user's needs, and the second is the dataset viewer. In the first window, near the top left, double-click "New Dataset" underneath the New Files tab. The program will then present a document usually entitled "Output 1"; this is where all of your formulations and equations will send their answers. For now, let's turn our attention to the other window now available, the "Data Editor".

Inside this window you should notice numerous columns and rows. The rows are numbered in ascending order, and the columns are labeled "var"; for instance, the first column will be listed as "VAR1". Near the bottom left-hand side of the window there should be two tabs, labeled "Data View" and "Variable View". Variable View allows the user to determine the scale of measurement for each variable. If there are any concerns on this matter, please see S.S. Stevens's article entitled "On the Theory of Scales of Measurement", in which Stevens describes the four scales of measurement (NOIR: nominal, ordinal, interval, ratio) and how each is used; any further elaboration is beyond the scope of this study. The "Data View" is where the user will spend most of their time. As mentioned before, there are two variables for each trial, and this theme permeates the entire experiment.

Inserting the Data:

After generating the numbers in the random generator, one must transfer them into the SPSS software. Column 1 created by the random generator will be inserted into the first available odd-numbered column inside SPSS; for instance, if VAR1 and VAR5 were both empty, VAR1 would be the home for the first column of the first round of data, and VAR2 would be the home for the second column of generated data. Each column of generated data should be entered consecutively. The goal is to create sixty VARs within each condition, since each condition requires 30 trials and each trial requires 2 variables. As the sample size increases, the number of variables (VARs) does not increase; only the number of participants under each column heading does. These steps have been taken to ensure accuracy within the calculations, as well as to avoid any confusing results.

Churning the Data:

All four conditions (sample sizes of 3, 7, 30, and 100) will be analyzed through SPSS. Although each condition must be analyzed in the same manner, each must be analyzed separately. This protects against misrepresentation of the data and later confusion about the results.


Once all the data has been inserted and saved into its respective files, the data can be "churned", or analyzed. Near the top-left corner of the SPSS window there is an option called "File". This particular item is unimportant for the time being, but its location is pivotal in a manner of speaking: to the right of "File" is "Edit", and continuing further right you will come across an item called "Analyze". Remember the location of this selection; it is a key pathway to analyzing your data. Now select "Analyze". A drop-down menu should appear, with "Reports" as the first item for selection. Continue down until you see the selection "Correlate". Placing the cursor over the "Correlate" function makes another drop-down box appear; select the option "Bivariate". This option is selected because our data involves two variables, Variable 1 and Variable 2. After selecting "Bivariate", a new window entitled "Bivariate Correlations" should pop up. This is where you select the variables you want to "churn".

The left side of the new window contains all the variables you have entered; again, for each condition there should be 60 variables listed. By highlighting a variable in the left column and either double-clicking it or manually inserting it with the arrow, you turn your available variables into variables of interest. Once the two variables, VAR1 and VAR2 (or VAR9 and VAR10), have been successfully moved to the right side, you can begin to tell the program what you need. The "Options…" tab allows the user to add other useful data to the output; for this experiment, "Means and standard deviations" underneath the Statistics heading should be selected. Once this has been selected, click "Continue" near the bottom of the page, which sends you back to the "Bivariate Correlations" page. While on this page, be sure to select the "Pearson" box under Correlation Coefficients; this lets the program know you are looking for the Pearson product-moment correlation, as mentioned before. The Test of Significance should have the option "Two-tailed" selected. Lastly, be sure to check the box "Flag significant correlations", which will make your work much easier when looking at the outputs. Once all the necessary steps have been completed, select "OK" at the bottom of the page. This process will need to be completed 30 times for each condition. After the first correlation is run there is no need to go through the option selection again; SPSS saves your selections so long as you do not exit the program. The key thing to remember when running these analyses is that the variables must be replaced with each replication: for instance, VAR1 and VAR2 must be removed from the right side so VAR3 and VAR4 can be analyzed. (A scripted equivalent of one such trial is sketched below.)
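For readers without SPSS, here is a rough scripted equivalent of one trial's "churn" in Python with SciPy. The output labels are illustrative, not SPSS's own, and the data are the hypothetical Table 2.1 values.

```python
import statistics
from scipy.stats import pearsonr

# One trial for the n=3 condition, using the Table 2.1 values.
var1 = [27, 92, 16]
var2 = [37, 5, 14]

r, p = pearsonr(var1, var2)  # Pearson r and two-tailed p value

print(f"Mean V1 = {statistics.mean(var1):.2f}, "
      f"SD V1 = {statistics.stdev(var1):.2f}")
print(f"Mean V2 = {statistics.mean(var2):.2f}, "
      f"SD V2 = {statistics.stdev(var2):.2f}")
# Mirror SPSS's "Flag significant correlations" at alpha = .05:
flag = "*" if p < 0.05 else ""
print(f"Pearson r = {r:.3f}{flag} (two-tailed p = {p:.3f})")
```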

SPSS: “Output”

After selecting "OK" on the Bivariate Correlations page, the program directs attention to the Output window, where the churned data can be viewed and analyzed. Numbers of interest on this page include the means of both VAR1 and VAR2, the standard deviations of both variables, and the Pearson correlation of VAR1 with VAR2. The Pearson correlation of VAR2 with VAR1 should be identical, due to the mirror symmetry of the matrix. These numbers should be entered and saved into a Microsoft Excel workbook. Data labels for each condition should be five columns wide; underneath them there should be five column labels listed horizontally as follows: "r, Mean V1, Std. Dev V1, Mean V2, Std. Dev V2". Along the vertical axis of the workbook should be the word "Trial" (with its respective number, 1-30). The data inserted into this workbook is derived from the previous application in SPSS.

Excel: Tables and Charts

Visual representations have the potential to help viewers follow the flow of the data and observe its physical nature. Once the new data is plugged into Excel, it can be used to create visual representations. Visuals of particular interest for this study include Pearson r scatter plots, Pearson r line charts, and line charts and scatter plots of the sample means from each condition.

A Pearson r scatter plot can be created by highlighting the correlation data for all four conditions and selecting the "Insert" option, located at the top of the Excel window to the right of "Home". Once the ribbon has adjusted, items such as "Recommended Charts", "PivotChart", and "Tables" should be visible, along with a diagram alluding to a scatter plot (an X and Y axis with small dots spatially organized). If this option does not appear, click any chart image and refine your selection on the right side of the pop-up window; if it does, click it. The resulting window allows you to manipulate the visual aspects of the chart itself. Pick the representation of your data that is neither misleading nor uninformative.

Pearson r line charts have the potential to be confusing, because all four conditions' correlations are laid on top of one another; if there is a lot of variation between the conditions, the chart can be hard to read. Nonetheless, the overlay lets the user differentiate between the conditions. This type of chart can be created by highlighting all the variables of interest and inserting them the same way the scatter plot was created, but selecting a line chart instead of a scatter plot. (A scripted sketch of such a chart appears below.)
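As a rough stand-in for the Excel steps, the same overlay chart can be drawn with Python's Matplotlib. This is an assumption-laden sketch rather than the study's actual Excel workflow: the per-trial r values here are simulated with a pseudo-random generator, not taken from the study's data.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
trials = list(range(1, 31))

plt.figure()
for n in (3, 7, 30, 100):
    # Simulated stand-ins for the 30 Pearson r values of each condition.
    rs = [pearsonr(rng.integers(0, 101, n), rng.integers(0, 101, n))[0]
          for _ in trials]
    plt.plot(trials, rs, label=f"n={n}")

plt.axhline(0, color="black", linewidth=0.8)  # r = 0: no linear relationship
plt.ylim(-1, 1)
plt.xlabel("Trial")
plt.ylabel("Pearson r")
plt.title("Correlations for All Samples")
plt.legend()
plt.show()
```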

Results

For each condition, one chart has been selected to represent both variables; Var1 from each condition will be used as a visual reference point.

The results of this study show that for condition 1 (n=3) the aggregate means for Var1 and Var2 are 47.4332 and 49.9444, respectively. Similarly, the aggregate mean of the standard deviation of Var1 is 20.16363, and of Var2, 19.26425. The grand means of the ranges for Var1 and Var2 are 83.33 and 71.00, respectively. The chart entitled "VAR00001" resembles the general sample of condition one. Notice how the x-axis ranges from 0-100; also notice the shape of the "normal bell curve" and how the histogram does not exactly "fit".

For condition 2 (n=7), the aggregate means of Var1 and Var2 are 50.0097 and 49.7048, respectively. The aggregate mean of the standard deviation for Var1 was 10.39435, while for Var2 it was 9.06379. The two grand means of the ranges were 38.43 and 35.86, respectively. The chart entitled "VAR00003" represents Var1 from condition 2. Notice that the x-axis is no longer bound to 0-100, but rather to roughly 20-70, revealing the tightening of the bounds. Also note the tighter grouping compared to the graph for condition 1: as the standard deviation decreases, the means begin to fall under the normal distribution.

For condition 3 (n=30), the aggregate means of Var1 and Var2 were 49.9722 and 49.8778, the aggregate means of the standard deviations were 5.11790 and 5.32132, and the grand means of the ranges equaled 19.77 and 20.17. The graph entitled "VAR00005" represents Var1 from condition 3. Notice once again how the x-axis bounds have decreased, now restricted to roughly 40-65. As the sample size increases, more of the means tend to fall under the normal distribution.


Lastly, for condition 4 (n=100), the aggregate means were 48.9833 and 50.0453, the aggregate means of the standard deviations were 2.66304 and 2.58452, respectively, and the grand means of the ranges equaled 9.71 and 10.09. The graph entitled "VAR00007" represents Var1 from condition 4. Yet again, as sample size increased, the bounds of the x-axis decreased. However, unlike the other graphs to this point, this graph's means tend to be more spread out from the curve; the pattern until now was a tighter, more compressed grouping. It would appear as though this sample size was affected by chance.

Correlation Results

              Critical Value    Mean      Standard Deviation    Range
Condition 1   0.997             -.1000    .60864                1.93
Condition 2   0.754              .0248    .49867                1.59
Condition 3   0.361             -.0344    .18345                 .72
Condition 4   0.197              .0009    .12054                 .43

The table entitled "Correlation Results" shows the correlation results for this study. The columns entitled "Mean", "Standard Deviation", and "Range" should all be read as aggregate means: the values from the 30 trials within each condition were collapsed into one average, saving valuable time and space for reader and researcher alike. The table shows that, on average, no condition's grand mean was greater than the critical value associated with its sample size; thus no significant results were found to disprove that randomness exists. Similarly, as predicted, as the sample size increased across the conditions, the aggregate mean of the standard deviations shrank incrementally, and the range followed suit. On the other hand, it is worth mentioning that although the aggregate means of the correlations did not exceed the critical values, some individual trials did.

For the entire study, only seven Type I errors were made. These errors are known to be errors due to the nature of the data: the data are random and should not show a significant correlation, as mentioned before. Note that this count is close to what chance alone predicts: with 120 independent tests (4 conditions x 30 trials) at an alpha of .05, about 120 x .05 = 6 significant results are expected by chance. The table below lists the number of Type I errors per condition:

Condition             Number of Type I Errors
Condition 1           1
Condition 2           4
Condition 3           1
Condition 4           1
Total Type I errors   7

Plots of the Pearson product-moment correlations for this study are shown and described below.

[Figure: "Correlations for All Samples": line chart of Pearson r (y-axis, -1 to 1) across trials 1-30 (x-axis) for the four conditions: n=3, n=7, n=30, n=100]

This image has appeared multiple times throughout this study; now it's time to dissect it. The blue line, n=3, shows the variation in the correlations. Notice how it seems to take greater leaps "of faith" as it crosses the flat horizontal line. The horizontal line can be read as 0, or no linear relationship. In all the chaos, it seems as though only two lines know exactly where they are headed: the grey and yellow lines, representing conditions 3 and 4, respectively. Although both navigate the chaos quite well, condition 4 hugs the horizontal line a little tighter. This falls right in line with the idea that two random variables should not significantly correlate with one another. As sample size increases, the degrees of freedom increase, and it becomes easier for statisticians, researchers, and students alike to predict linear relationships.

Conclusion: The age-old debate between free will and determinism is not a simple cut-and-dried scenario; if it were, it probably would have been resolved long ago. The monkey wrench thrown into the mix is something called randomness. Randomness behaves in a spurious way; it can be said that randomness is a function of chance. In this study we used the Pearson product-moment correlation to determine the effect of sample size on randomness. Four conditions were created with various sample sizes, and sample size was the independent variable for this study: condition 1 had a sample size of 3, condition 2 a sample size of 7, condition 3 a sample size of 30, and condition 4 a sample size of 100. Another key to this research was the number of replications; Kahneman and Tversky's advice from the law of small numbers was used to help predict our outcomes. We predicted that as sample size increased we would be better able to predict the nature of randomness. This turned out to be false: as sample size increased we were able to predict within a certain range, plus or minus a certain degree, but we were never able to predict the next number to be generated. Ideas for future research include, but are not limited to, a sample size of 1000, for which it is believed the aggregate means of the standard deviations would be relatively small in comparison to our data.


References:

Bradley, J. (2012). Randomness and God's nature. Perspectives on Science and Christian Faith, 64(2), 75-89.

Britannica. Intelligence test. http://www.britannica.com/science/intelligence-test

Britannica. Karl Pearson. http://www.britannica.com/biography/Karl-Pearson

Filmer, W. E. (1984). God Counts: Amazing Discoveries in Bible Numbers.

Jaccard, J., & Becker, M. (2010). Statistics for the Behavioral Sciences. Belmont, CA: Cengage Learning.

Neuringer, A. (2002). Operant variability and the power of reinforcement. The Behavior Analyst Today, 10(2), 319-343.

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103(2684), 677-680.

Stigler, S. (1989). Francis Galton's account of the invention of correlation. Statistical Science, 4(2), 73-86.

The Significance of Threes (1988). Agape Bible Study. [Cited in text as "Scripture".]

Tversky, A., & Kahneman, D. (1971). Belief in the law of small numbers. Psychological Bulletin, 76(2), 105-110.
