Examining Psychokinesis:
The Interaction of Human Intention with Random Number
Generators.
A Meta-Analysis
Submitted: August 19, 2004
Acknowledgments
(removed for blind review)
Radin comments inserted with change-tracking (rendered throughout in square brackets as [Radin: ...])
Abstract
Séance-room phenomena and apports have fascinated mankind for decades. Experimental research has reduced these phenomena to attempts to influence (i) the fall of dice and, later, (ii) the output of random number generators (RNGs). [Radin: This overlooks dozens of other PK experiments. It also overlooks the fact that most of the impetus for developing RNG experiments was not to simulate séance phenomena, but to test quantum observational theories, to tighten methodologies, and to reduce the need for special subjects.] The meta-analysis presented here combined 357 studies that assessed whether human intention could correlate with RNG output. The studies yielded a significant but very small effect. Study size was strongly and inversely related to effect size [Radin: Why the authors prefer to ignore the voluminous literature on this issue, I cannot fathom.]; this finding was consistent across all examined moderator and safeguard variables. A [Radin: not well specified] Monte Carlo simulation revealed that the small effect size, the relation between study size and effect size, and the extreme effect size heterogeneity might all be a result of publication bias.
The idea that individuals can influence inanimate objects by the power of their own minds is a relatively recent concept. [Radin: Huh? Isn't sympathetic magic one of the most ancient beliefs?] During the 1970s, Uri Geller reawakened mass interest in this putative ability through his demonstrations of spoon bending using his alleged psychic powers (Targ & Puthoff, 1977; Wilson, 1976), and he lays claim to this ability even now (e.g., Geller, 1998). Belief in this phenomenon is widespread. In a 1991 poll (Gallup & Newport, 1991), 17 percent of American adults professed belief in "the ability of the mind to move or bend objects using just mental energy" (p. 138), and seven percent even claimed that they had "seen somebody moving or bending an object using mental energy" (p. 141).
Unknown to most academics, a large amount of experimental data has
accrued testing the hypothesis of a direct connection between the human
mind and the physical world. It is one of the very few lines of research
where replication is the main and central target [Radin: initially perhaps, but surely not for the past 20 years], a commitment that some methodologists wish experimental psychologists in general would adopt (e.g., Cohen, 1994; Rosenthal & Rosnow, 1991). This article will trace the
development of the empirical evaluation of this alleged phenomenon and
will present a new meta-analysis of a large set of studies examining the
interaction between human intention and random number generators.
Psi research
Psi phenomena (Thouless, 1942; Thouless & Wiesner, 1946) can be split
into two main categories. Psychokinesis (PK) is the common label for the
apparent ability of humans to affect objects solely by the power of the
mind. Extra-sensory-perception (ESP), on the other hand, refers to the
apparent ability of humans to acquire information without the mediation
of the recognized senses or logical inference. Many researchers believe
that PK and ESP phenomena are idiosyncratic [Radin: from context I'm guessing they mean something like "identical," not "idiosyncratic"] (e.g., Pratt, 1949;
J. B. Rhine, 1946; Schmeidler, 1982; Stanford, 1978; Thouless & Wiesner,
1946). Nevertheless, the two phenomena have been treated very
differently right from the start of their scientific examination. For instance,
whereas J. B. Rhine and his colleagues at the Psychology Department at
Duke University published the results of their first ESP card experiments
right after they had been conducted (Pratt, 1937; Price & Pegram, 1937;
J.B. Rhine, 1934, 1936, 1937; L. E. Rhine, 1937), they withheld the results
of their first PK experiments for nine years (L. E. Rhine & J. B. Rhine, 1943)
even though they had been carried out at the same time as the ESP
experiments: Rhine and his colleagues did not want to undermine the
scientific credibility that they had gained through their pioneering
monograph on ESP (Pratt, J. B. Rhine, Smith, Stuart & Greenwood, 1940).
When L. E. Rhine & J. B. Rhine (1943) went public with their early dice
experiments, the evidence was based not only on above-chance results,
but primarily on a particular scoring pattern. In those early PK
experiments, the participants' task was to obtain combinations of given
die faces. The researchers discovered a decline of "success" during longer
series of experiments, a pattern suggestive of mental fatigue (Reeves &
Rhine, 1943; J. B. Rhine & Humphrey, 1944, 1945). This psychologically
plausible pattern of decline seemed to eliminate several counter-
hypotheses for the positive results obtained, such as die bias or trickery,
because they would not lead to such a systematic decline. However, as
experimental evidence grew, the decline pattern lost its impact in the
chain of evidence.
Verifying psi
Today, in order to verify the existence of psi phenomena, one of two
meta-analytic approaches is generally undertaken - either the "proof-
oriented" or the "process-oriented" meta-analytical approach. The proof-
oriented meta-analytical approach tries to verify the existence of psi
phenomena by establishing an overall effect. The process-oriented meta-
analytical approach tries to verify the existence of psi by establishing a
connection between results and moderator variables.
Alleged [Radin: probably the intended meaning here and elsewhere is "potential," since the legalistic term "alleged" is inappropriate] moderators
of PK, such as the distance between the participant and the target, and
various psychological variables, have never been investigated as
systematically as alleged moderators of ESP. So far, there have not been
any meta-analyses of PK moderators and the three main literature reviews
of PK moderators (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991;
Schmeidler, 1977) have come up with inconclusive results. On the other
hand, the three meta-analyses on ESP moderators established significant
correlations between ESP and extraversion (Honorton, Ferrari & Bem,
1998), ESP and belief in ESP (Lawrence, 1998), and ESP and
defensiveness (Watt, 1994). The imbalance between systematic reviews
of PK and ESP moderators reflects the general disparity between the
experimental investigations of the two categories of psi. From the very
beginning of experimental investigation into psi, researchers have focused
on ESP.
The imbalance between research in ESP and PK is also evident from the
proof-oriented meta-analytical approach. Only three (Radin & Ferrari,
1991; Radin & Nelson, 1989, 2002) of the 13 (Bem & Honorton, 1994;
Honorton, 1985; Honorton & Ferrari, 1989; Milton, 1993, 1997; Milton &
Wiseman, 1999a, 1999b; Radin & Ferrari, 1991; Radin & Nelson, 1989,
2002; Stanford & Stein, 1994; Steinkamp, Milton & Morris, 1998; Storm &
Ertel, 2001) meta-analyses on psi data address research on PK. Only two of the 13 provide no evidence for psi (Milton & Wiseman, 1999a, 1999b). [Radin: The point of discussing/claiming an "imbalance between research in ESP and PK" is not clear. It also may be incorrect in more recent years, depending on how one counts RV and Ganzfeld, neither of which are classical ESP, and both of which have a relatively small number of datapoints. In any case, if this issue is to be pursued, ESP should be defined -- the next section shows evidence of confounding very different approaches, e.g., experimental research, social observation, and anecdote.]
Psychology and psi
Psychological approaches to psi have also almost exclusively focused
on ESP. For example, there is a large amount of research [Radin: I disagree: certainly there is ample rhetoric, but systematic research, no] supporting
the hypothesis that alleged ESP experiences are the result of delusions
and misinterpretations (e.g., Alcock, 1981; Blackmore, 1992; Persinger,
2001). Personality-oriented research established connections between
belief in ESP and several personality variables (Irwin, 1993; see also,
Dudley, 2000; McGarry & Newberry, 1981; Musch & Ehrenberg, 2002).
Experience-oriented approaches to paranormal beliefs, which stress the
connection between paranormal belief and paranormal experiences (e.g.,
Alcock, 1981; Blackmore, 1992; Schouten, 1983) and media-oriented
approaches, which examine the connection between paranormal belief
and depictions of paranormal events in the media (e.g., Sparks, 1998;
Sparks, Hansen & Shah, 1994; Sparks, Nelson & Campbell, 1997) both
focus on ESP, although the paranormal belief scale most frequently used
in those studies also has some items on PK (Thalbourne, 1995).
The beginning of the experimental approach to Psychokinesis
Reports of séance room sessions during the late 19th century are filled
with claims of extraordinary movements of objects (e.g., Crookes, Horsley,
Bull, & Meyers, 1885), prompting some outstanding researchers of the
time to devote at least part of their career to determining whether the
alleged phenomena were real (e.g., Crookes, 1889; James, 1896; Richet,
1923). In these early days, as in psychology, case studies and field
investigations predominated. Hence, it is not surprising that in this era
experimental approaches and statistical analyses were used only
occasionally (e.g., Edgeworth, 1885, 1886; Fisher, 1924; Sanger, 1895;
Taylor, 1890). Even J.B. Rhine, the founder of the experimental study of
psi phenomena, abandoned case studies and field investigations as a
means of obtaining scientific proof only after he exposed several mediums
as frauds (e.g., J.B. Rhine & L.E. Rhine, 1927). However, after a period of
several years when he and his colleagues focused almost solely on ESP
research, their interest in PK was reawakened in 1937 when a gambler
visited the laboratory at Duke University and casually mentioned that
many gamblers believed they could mentally influence the outcome of a
throw of dice. This inspired J.B. Rhine to perform a series of informal
experiments using dice. Very soon experiments with dice became the
standard approach for investigating PK.
Difficulties in devising an appropriate methodology soon became
apparent and improvements in the experimental procedures were quickly
implemented. Standardized methods for throwing the dice were
developed. Dice-throwing machines were used to prevent participants
from manipulating their throw of the dice. Recording errors were
minimized by having experimenters either photograph the outcome of
each throw or having a second experimenter independently record the
results. Commercial, pipped dice were found to have sides of unequal
weight, with the sides with the larger number of excavated pips, such as
the 6, being lighter and hence more likely to land uppermost than lower
numbers, such as the 1. Consequently, studies required participants to
attempt to score seven with two dice, or used a balanced design in which
the target face alternated from one side of the die (e.g., 6) to the opposite
side (e.g., 1).
In 1962 Girden (1962a) published a comprehensive critique of dice
experiments in the Psychological Bulletin. Among other things, he
criticized the experimenters for pooling data as it suited them, and for
changing the experimental design once it appeared that results were not
going in a favorable direction. He concluded that the results from the
early experiments were largely due to the bias in the dice and that the
later, better-controlled studies were progressively tending toward non-
significant results. Although Murphy (1962) disagreed with Girden's
conclusion, he did concede that no "ideal" experiment had yet been
published that met all six quality criteria - namely one with (i) a
sufficiently large sample size; (ii) a standardized method of throwing the
dice; (iii) a balanced design; (iv) an objective record of the outcome of the
throw; (v) the hypothesis stated in advance; and (vi) with a prespecified
end point.
The controversy about the validity of the dice experiments continued
(e.g., Girden, 1962b; Girden & Girden, 1985; Rush, 1977). Over time,
experimental and statistical methods improved and, in 1991, Radin &
Ferrari undertook a meta-analysis of the dice experiments.
Dice Meta-Analysis
The dice meta-analysis comprised 148 experimental studies and 31
control studies published between 1935 and 1987. In the experimental
studies, 2,569 participants tried to mentally influence 2,592,817 die-casts.
In the control studies a total of 153,288 die-casts were made without any
attempt mentally to influence the dice. The experimental studies were
coded for various quality measures, including a number of those
mentioned by Girden (1962a). Table 1 provides the main meta-analytic
results.¹ [Radin: Given the importance of these calculations, it seems odd to relegate all this to a gigantic footnote.] The overall effect size, weighted by the inverse of the variance, is small but highly significant (π̄_t = .50610, z = 19.68). Radin & Ferrari calculated that approximately 18,000 null effect studies would be required to reduce the result to a non-significant level (Rosenthal, 1979). When the studies were weighted for quality, the effect size decreased considerably (z_Δ = 5.27, p = 1.34*10^-7), but was still significantly above chance.

Footnote 1: To compare the meta-analytic findings from the dice and previous RNG meta-analyses with those from our RNG meta-analysis, we converted all effect size measures to the proportion index π (Rosenthal & Rubin, 1989), which we use throughout the paper. This one-sample effect size ranges from 0 to 1, with .50 representing the null value (mean chance expectation, MCE). For two equally likely outcomes, e.g., when tossing a coin, π represents the proportion of "hits". For example, if heads win at a hit rate of 50%, the effect size π = .50 indicates that heads and tails came down equally often; if the hit rate for heads were 75%, the effect size would be π = .75. The most important property of π is that it converts all cases with more than two equally likely outcomes, e.g., tossing a die, to the proportional hit rate as if there were just two alternatives. In order to combine effect sizes from independent studies we used a fixed effect model, weighted by the inverse of the variance [Radin: how is the variance per study calculated?] (e.g., Shadish & Haddock, 1994). Because Dean Radin kindly provided us with the basic data files of the dice meta-analysis, we were able to compute the overall effect size π̄_o. However, although we were able to calculate the overall effect sizes π̄_o for all meta-analyses on the basis of the original data (see Table 2), the dice data provided did not enable us to carry out the specific subgroup analyses presented in the meta-analysis and summarized in Table 1. Consequently, in order to provide this information we transformed the published results, which used the effect size r = z/sqrt(n), using π̄_t = .5r̄ + .5. This transformation is accurate as long as the z-values of the individual studies are based on two equally likely alternatives (p = q = .5). However, the z-scores of most dice experiments are based on six equally likely alternatives (p = 1/6 and q = 5/6). [Radin: what about the position studies?] Consequently, π̄_o as computed from the original data and π̄_t as computed from the transformation formula diverge slightly, because r no longer remains within the limits of ±1. However, the difference between π̄_o and π̄_t is very small (< .05%) as long as the z-values are not extreme (z > 10, p < 1*10^-10). The difference is smaller the closer the value is to the null value of .50, which is the case for all effect sizes presented here. The difference between the two approaches can be seen when the results of the overall dice meta-analysis presented in Table 1 are compared with the results presented in Table 2. The difference between the two estimates is determined using z_Δ = (π̄_o - π̄_t) / sqrt(SE_o² + SE_t²). Although the difference is statistically significant (z_Δ = 4.12, p = 3.72*10^-5, two-tailed), the order of magnitude is the same.
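The conversions described in Footnote 1 can be made concrete with a short sketch. The proportion-index formula below follows the standard Rosenthal and Rubin (1989) definition (the footnote cites the index but does not spell out the formula), and the fail-safe estimate follows Rosenthal (1979); the function names are ours, not the authors'.

```python
import math

def proportion_index(hit_rate, k):
    """Rosenthal & Rubin's (1989) proportion index: converts a raw hit rate
    with k equally likely alternatives to pi, the hit rate 'as if' there
    were only two alternatives (pi = .50 at mean chance expectation)."""
    return hit_rate * (k - 1) / (1 + hit_rate * (k - 2))

def pi_from_z(z, n):
    """Transformation used for the published subgroup results:
    r = z / sqrt(n), then pi_t = .5 * r + .5 (accurate when the z-values
    rest on two equally likely alternatives, p = q = .5)."""
    return 0.5 * (z / math.sqrt(n)) + 0.5

def failsafe_n(z_scores, z_crit=1.645):
    """Rosenthal's (1979) file-drawer estimate: how many unpublished null
    studies would drag the combined result below z_crit."""
    return sum(z_scores) ** 2 / z_crit ** 2 - len(z_scores)

# A die study at chance (hit rate 1/6, k = 6) maps to pi = .50:
assert abs(proportion_index(1 / 6, 6) - 0.5) < 1e-12
```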
The authors found that there were indeed problems regarding die bias, with the effect size of the target face 6 being significantly larger than the effect size of any other target face. They concluded that this bias was sufficient to cast doubt on the whole database. They subsequently reduced their database to only those 69 studies that had correctly controlled for die bias (the "balanced database"). As shown in Table 1, the resultant overall effect size remained statistically highly significant. However, the effect sizes of the studies in the balanced database were statistically heterogeneous. When Radin & Ferrari trimmed the sample until the effect sizes in the balanced database became homogeneous, the effect size was reduced to only π̄_t = .50158, and it fell yet further to π̄_t = .50147 when the 59 studies were weighted for quality. Only 60 unpublished null effect studies (our calculation) [Radin: explain] are required to bring the balanced, homogeneous and quality-weighted studies down to a non-significant level. Ultimately, the dice meta-analysis did not advance
the controversy over the putative PK effect beyond the verdict of "not
proven", as mooted by Girden (1962b, p. 530) almost 30 years earlier.
Moreover, the meta-analysis has several limitations; Radin & Ferrari
neither examined the source(s) of heterogeneity in their meta-analysis,
nor addressed whether the strong correlation between effect size and
target face disappeared when they trimmed the 79 studies not using a
balanced design from the overall sample. The authors did not analyze
potential moderator variables and did not specify inclusion and exclusion
criteria. The studies included varied considerably regarding the type of
feedback given to participants. Some studies were even carried out totally
without feedback. The studies also differed substantially regarding the
participants who were recruited; some participants were psychic
claimants and others made no claims to having any "psychic powers" at
all. However, fundamentally as well as psychologically, the studies differed most in respect of the experimental instructions participants received and the time window within which participants had to try to influence the dice.
Although most experiments were real time, with the participant's task
being mentally to influence the dice as they were thrown, some
experiments were "precognition experiments" in which participants were
asked to predict what die face would land uppermost in a future die cast
thrown by someone other than the participant.
From Dice to Random Number Generator
With the arrival of computers, dice experiments were
slowly replaced by a new approach. Beloff & Evans (1961) were the first
experimenters to use radioactive decay as a source of randomness to be
influenced in a PK study. In the initial experiments, participants would try
mentally to slow down or speed up the rate of decay of a radioactive
source. The mean disintegration rate subjected to influence was then
compared with that of a control condition in which there was no attempt
at human influence.
Soon after this, experiments were devised in which the output from the
radioactive source was transformed into bits (1s or 0s) that could be
stored on a computer. These devices were known as random number
generators (RNGs). Later, RNGs used electronic noise or other truly
random origins as the source of randomness.
This line of PK research was, and continues to be, pursued by many
experimenters, but predominantly by Schmidt (e.g., 1969), and later by the Princeton Engineering Anomalies Research (PEAR) group at Princeton University (e.g., Jahn, Dunne & Nelson, 1980).
RNG Experiments
In a typical PK RNG-experiment, a participant presses a button to start
the accumulation of experimental RNG data. The participant's task is
mentally to influence the RNG to produce, say, more 1s than 0s for a
predefined number of bits. Participants are generally given real-time
feedback of their ongoing performance. The feedback can take a variety
of forms. For example, it may consist of the lighting of lamps "moving" in a clockwise or counterclockwise direction, or of clicks provided to the
right or left ear, depending on whether the RNG produces a 1 or a 0.
Today, feedback is generally implemented in software and is primarily
visual. If the RNG is based on a truly random source, it should generate 1s
and 0s an equal number of times. However, because small drifts cannot
be totally eliminated, experimental precautions such as the use of an XOR
filter, or a balanced experimental design are still required.
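As an illustration of the XOR precaution just mentioned, the sketch below shows the basic idea in Python. It is a schematic of the principle, not the circuitry of any particular RNG, and the alternating mask is one common choice.

```python
from itertools import cycle

def xor_filter(raw_bits):
    """XOR each raw bit with a deterministic alternating 0,1,0,1,... mask.
    A constant drift toward 1s (or 0s) in the physical source cancels out,
    because every second bit is inverted, while genuine randomness and the
    50/50 expectation are preserved."""
    return [b ^ m for b, m in zip(raw_bits, cycle([0, 1]))]
```

For example, a source emitting 60% ones still yields 50% ones in expectation after filtering, because half of its bits are inverted.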
The RNG studies have many advantages over the earlier dice
experiments, making it much easier to perform quality research with
much less effort. Computerization alone meant that many of Girden
(1962a) and Murphy's (1962) concerns about methodological quality could
be overcome. If we return to Murphy's list of six methodological criteria,
then (i) unlike with manual throws of dice, RNGs made it possible to conduct studies with large sample sizes [Radin: reflects a fundamental assumption for which there is no sound evidence: the bit is the sample] in a short space of time; (ii) the RNG was completely impersonal - unlike the
dice, it was not open to any classical (normal human) biasing of its output;
(iii) balanced designs were still necessary due to potential drifts in the
RNG; (iv) the output of the RNG could be stored automatically by
computer, thus eliminating recording errors that may have been present
in the dice studies; (v) like the dice studies, the hypotheses still had to be
formulated in advance; and (vi) like the dice studies, optional stopping
could still be a potential problem. Thus, RNG research entailed that, in
practical terms, researchers no longer had to be concerned about alleged
weak points (i), (ii) and (iv).
New Limits
From a methodological point of view, RNG experiments have many
advantages over the older dice studies. However, in respect of ecological
validity, the RNG studies have some failings. Originally, the PK effect to be assessed was macroscopic and visual. Experimentalists then reduced séance room PK, first to PK on dice, and then to PK on a random source in an RNG. [Radin: I don't think this historical sequence is correct. I doubt that dice and RNG tests were created to simulate macro effects.] But PK may not be reducible to a microscopic level (e.g., Braude, 1997). Moreover, a dice experiment is psychologically very different from an RNG study. Most people have played with dice, but few have had prior experience with RNGs. Additionally, an RNG is a technical gadget whose output must be computed before feedback can be presented. Nevertheless, the
ease with which PK data can be accumulated using an RNG has led to PK
RNG experiments forming a substantial proportion of available data. Three
related meta-analyses of these data have already been published.
Previous RNG Meta-Analyses
The first RNG meta-analysis was published by Radin & Nelson (1989) in
Foundations of Physics. This meta-analysis of 597 experimental studies
published between 1959 and 1987 found a small but significant effect of
π̄_o = .50018 (SE = .00003, z = 6.53, p < 1*10^-10).² The size of the effect did not diminish when the studies were weighted for quality or when they were trimmed by 101 studies to render the database homogeneous.

Footnote 2: The meta-analysis provided the overall effect size only in a figure (Fig. 3, p. 1506). Because its first author kindly provided us with the original data, we were able to calculate the overall effect size and the proper statistics.

The limitations of this meta-analysis are very similar to the limitations of the dice meta-analysis. The authors did not examine the source(s) of
heterogeneity and did not specify inclusion and exclusion criteria. [Radin: From our FoP paper: "Experiments selected for review examined the following hypothesis: The statistical output of an electronic RNG is correlated with observer intention in accordance with pre-specified instructions, as indicated by the directional shift of distribution parameters (usually the mean) from expected values." That seems pretty well specified to me!] Consequently, participants varied from humans to cockroaches, and the
feedback ranged from no feedback at all to the administration of an
electric shock. The meta-analysis included not only studies using true
RNGs, which are RNGs based on true random sources such as electronic
noise or radioactive decay, but also studies using pseudo RNGs (e.g.,
Radin, 1982), which are based on deterministic algorithms (with truly
random starting points). It might be argued that the authors simply took a
very inclusive approach. However, the authors did not discuss the
extreme variance in the distribution of the studies' z-scores [Radin: Again, from the FoP article: "Finally, following the practice of reviewers in the physical sciences (23,24), we deleted potential "outlier" studies to obtain a homogeneous distribution of effect sizes and to reduce the possibility that the calculated mean effect size may have been spuriously enlarged by extreme values." Certainly this is a rationale for what we did, although perhaps it is not a discussion.] and did not assess any potential moderator variables, which were also two weaknesses of the dice meta-analysis.
Nevertheless, this first RNG meta-analysis served to justify further
experimentation and analyses with the PK RNG approach.
Almost 10 years later, in his book aimed at a popular audience, Radin
(1997) recalculated the effect size of the first RNG meta-analysis claiming
that the "overall experimental effect, calculated per study, was about 51
percent" (p. 141). However, this newly calculated effect size is two orders
of magnitude larger than the effect size of the first RNG meta-analysis
(50.018%). [Radin: Doesn't this reflect the different aggregation unit, bit vs. experiment?] The increase has two sources. First, Radin removed the 258 PEAR studies included in the first meta-analysis (without discussing why) [Radin: because the whole purpose of that piece of the chapter was to address whether PEAR had been replicated, so it didn't make any sense to include it in the mix!] and, second, he presented simple mean values instead of the weighted means presented 10 years earlier. The use of simple mean values in meta-analyses is generally discredited (e.g., Shadish & Haddock, 1994), because it does not take into account that larger studies provide more accurate estimates of effect size. [Radin: ONLY assuming es is independent of N] In this case, the difference between computing an overall effect size
using mean values rather than weighted mean values is dramatic. The
removal of the PEAR studies effectively increased the impact of other
small studies that had very large effect sizes. The effect of small studies
on the overall outcome will be a very important topic in the current meta-
analysis.
Recently, Radin & Nelson (2002 [Radin: 2003]) published an update of their earlier (1989) RNG meta-analysis, adding a further 176 studies to their
database. In this update, the PEAR data were collapsed into a new, single
datapoint. The authors reported a simple mean effect size of 50.7%.
Presented as such, the data appear to suggest that this updated effect
size replicates that found in the first RNG meta-analysis. However, when
the weighted fixed-effect model is applied to the data, as was used in the
first RNG meta-analysis, the effect size of the updated database becomes
π̄_o = .50005, which is significantly smaller than the effect size of the original RNG meta-analysis (z_Δ = 4.27, p = 1.99*10^-5; see Table 2 for comparison). One reason for the difference is the increase in sample size of the more recent experiments, which also show a concomitant decline in effect size. [Radin: es vs. N issue again]
Like the other meta-analyses, the updated 2002 meta-analysis did not
investigate any potential moderator variables and no inclusion and
exclusion criteria were specified [Radin: I don't understand this latter business about not specifying the inclusion criteria]; it also did not include a
heterogeneity test of the database. All three meta-analyses were
conducted by related research teams and thus an independent replication
of their findings is lacking. The need for a more thoroughgoing meta-
analysis of PK RNG experiments is clear. [Radin: fair enough]
Human Intention Interacting with Random Number Generators:
A New Meta-Analysis
The meta-analysis presented here was part of a five-year consortium
project on RNG experiments. The consortium comprised research groups
from PEAR, USA; the University of Giessen, Germany; and the Institut für
Grenzgebiete der Psychologie und Psychohygiene [Institute for Border
Areas of Psychology and Mental Hygiene] in Freiburg, Germany. After all
three groups in the consortium failed in their appropriately powered [Radin: what was beta?] experiments [Radin: assuming an effect per bit model!!!!!] to replicate the mean shift of the PEAR group data (Jahn et al., 2000), which form one of the strongest and most influential datasets in psi research, the question about possible moderating variables in RNG experiments rose to the forefront. [Radin: Historically, and ironically, doing a M-A of the REG literature focusing on the moderating variable question was a task I assigned to my Freiburg team -- Boesch and Boller -- in 1996 and 1997.]
Consequently, a meta-analysis was conducted to determine whether the
existence of an anomalous interaction could be established between
direct human intention and the concurrent output of a true RNG, and if so,
whether there were moderators or other explanations that influenced the
apparent connection.
Method
Literature Search
The meta-analysis began with a search for any experimental studies
that examined the possibility of an anomalous connection between the
output of an RNG and the presence of a living being. This search was
designed to be as comprehensive as possible in the first instance, and to
be trimmed later in accordance with our prespecified inclusion and
exclusion criteria. Both published and unpublished manuscripts were
sought.
Manual searches were undertaken at the library and archives of the
Institut für Grenzgebiete der Psychologie und Psychohygiene in Freiburg,
Germany. They included searches of the following journals: Proceedings of
the Parapsychological Association Annual Convention (1968, 1977-1981,
1983-1999), Research in Parapsychology (1969-1976, 1982, 1984, 1985,
1988), Journal of Parapsychology (1959-1998), Journal of the Society for
Psychical Research (1959-1999), European Journal of Parapsychology
(1975-1998), The Journal of the American Society for Psychical Research
(1959-1998), Journal of Scientific Exploration (1987-1998), Subtle Energies
(1991-1997), Journal of Indian Psychology (1978-1999), Tijdschrift voor
Parapsychologie (1959-1999), International Journal of Parapsychology
(1959-1968), Cuadernos de Parapsicologia (1963-1996), Revue
Métapsychique (1960-1983), Australian Parapsychological Review (1983-
1991), Research letter of the Parapsychological Division of the
Psychological Laboratory of Utrecht (1971-1984), Bulletin PSILOG (1981-
1983), Journal of the Southern California Society for Psychical Research
(1979-1985), and the Arbeitsberichte Parapsychologie der technischen
Universität Berlin (1971-1980). [Radin: Why so many missing years in these journals?]
Electronic searches were conducted on the Psiline Database System
(Vers. 1999), a continuously updated specialized electronic resource of
parapsychologically-relevant writings (White, 1991). The key words used
to identify relevant articles in this specialized database were Random
Number Generator, RNG, Random Event Generator and REG. Electronic
searches were also conducted on six CDs of Dissertation Abstracts Ondisc
(Jan. 1961 - Sep. 1999) using four different search strategies. First, the
key words random number generator, RNG, random event generator,
REG, randomness, radioactive, parapsychology, perturbation,
psychokinesis, PK, extra-sensory perception, telepathy, precognition and
calibration were used. Second, the key words random and experiment
were combined with event, number, noise, anomalous, anomaly,
influence, generator, apparatus or binary. Third, the key word machine
was combined with man or mind. Fourth, the key word zener was
combined with diode.
To obtain as many relevant unpublished manuscripts as possible, visits
were made to three other prolific parapsychology research institutes: the
Rhine Research Center, Durham NC; PEAR at Princeton University; and the
Koestler Parapsychology Unit at Edinburgh University. Furthermore, a
request for unpublished studies was placed on an electronic mailing list
for professional parapsychologists (Parapsychology Research Forum - PRF). [Radin: Is this really a professional forum? I haven't read it for many years, and it seems to be open to anyone? It is, or was, indeed open to anyone.]
Finally, the reference sections of all retrieved journal articles, conference proceedings, reports and theses/dissertations were searched. The search covered a broad range of languages, including items in Dutch, English, French, German, Italian and Spanish, and was limited only by the lack of further available linguistic expertise.
Inclusion and Exclusion Criteria
The final database included only studies that examined the correlation
between direct human intention and the concurrent output of true RNGs.
Thus, after the comprehensive literature search was conducted we
excluded studies that: (a) involved, implicitly or explicitly, only an indirect
intention toward the RNG. For example, telepathy studies, in which a
receiver attempts to gain impressions about the sender's viewing of a
target that had been randomly selected by a true RNG, were excluded
(e.g., Tart, 1976). Here, the receiver's intention is presumably directed to
gaining knowledge about what the sender is viewing, rather than on
influencing the RNG; (b) used animals or plants as participants (e.g.,
Schmidt, 1970); (c) assessed the possibility of a non-intentional, or only
ambiguously intentional, effect. For instance, studies evaluating whether
hidden RNGs could be influenced when the participant's intention was
directed to another task or another RNG (e.g., Varvoglis & McCarthy,
1986) or studies that used babies as participants (e.g., Bierman, 1985);
(d) looked for an effect backwards in time or, similarly, in which
participants observed the same bits a number of times (e.g., Morris, 1982;
Schmidt, 1985); (e) evaluated whether there was an effect of human
intention on a pseudo RNG (e.g., Radin, 1982). [Radin: This list seems to exclude all Field REG studies.]
Additionally, studies were excluded if their outcome could not be transformed into the effect size π̄_o that was prespecified for this meta-analysis [Radin: if the tool we have is a hammer, we will only look at nails]. As a result, studies that compared the rate of radioactive decay in the presence of attempted human influence with that of the same element in the absence of human intention (e.g., Beloff & Evans, 1961) were excluded. The cut-off date for inclusion of studies in the meta-analysis was prespecified as 30 August 2000. [Radin: Then why do none of the searched literature sources go up to this date?]
Defining Studies
Some studies were reported in both published and unpublished forms,
or both as a full journal article and elsewhere as an abstract. In these
cases, all reports of the same study were used to obtain information for
the coding, but the report with the most details was classified as the
"main report". The main reports often contained more than one "study". A
study was the smallest experimental unit described that did not overlap
with other data in the report. This enabled the maximum amount of
information to be included. In cases where the same data could be split in two different ways (e.g., men vs. women, or morning sessions vs. afternoon sessions), the split used was the one that appeared to reflect the author's greatest interest in designing the study.
Many studies performed unattended randomness checks of the RNG to
ensure that the apparatus was functioning properly. These control runs
were coded in a separate "control" database. Data for these control runs,
like the experimental database, were split based on the smallest unit
described. In some experiments, data were gathered in the presence of a
participant with an instruction to the participant "not to influence" the
RNG (e.g., Jahn et al., 2000). These data were excluded from both experimental and control databases due to the inherent ambiguity of whether intention played an influential role. [Radin: So something called a control condition is ambiguous and can't be included in control? In fact, the PEAR lab did a huge amount of "unattended randomness checks" that are not included in this M-A. Now we see that even the baseline condition, which was explicitly designed to be compared against the bi-polar high-vs.-low intention conditions, is not included in controls?]
Moderator Variables
To identify potential moderators, all variables suggested by previous literature reviews were coded [Radin: blindly? No -- this coding was certainly not blind] (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991; Schmeidler, 1977). Additionally, several descriptive variables, together with variables explicitly or implicitly held responsible for the presence or absence of an anomalous correlation, were placed on the internet and discussed by researchers interested in the topic. After considering the feedback and making any
requisite revisions to the coding form, 20 papers were randomly selected
and independently pilot coded by FS and EB. Afterwards, the two sets of
coding were compared, coder disagreements were discussed, ambiguities
in the coding descriptions were clarified, and the coding form was
finalized.
The variables coded covered six main areas: (i) Basic information, such
as year of publication and study status (i.e., formal, pilot, mixed, control);
(ii) Participant information, such as selection criteria (e.g., none, psychic claimant, prior success in a psi experiment, ...); (iii) Experimenter information, such as whether the experimenter also acted as a participant (e.g., no, yes, partially); (iv) Experimental setting, such as type of feedback (visual, auditory, ...); (v) Statistical information variables, such as the total number of bits (sample size) [Radin: what about bits/effort, bits/subject, etc.?]; and (vi) Safeguard variables.
Examining Psychokinesis 25
The final coding form contained 67 variables. The comprehensive
coding was applied because, prior to coding the studies, it was not clear
which variables would provide enough data for a sensible moderator
variable analysis. However, because of the importance of the safeguard
variables, i.e., the moderators of quality, we prespecified that the impact
of the three safeguard variables would be examined independently of
their frequency distribution. The safeguards were: (1) RNG control, which recorded whether malfunction of the RNG had been ruled out by the study, either by using a balanced design or by performing control runs of the RNG [Radin: but note that real control data were excluded in some cases]; (2) all data reported, which addressed whether the final study size matched the planned size of the study or whether optional stopping or selective reporting may have occurred [Radin: note that PEAR REG studies do not report a planned size, but that optional stopping CANNOT have an effect on the outcome]; (3) split of data [Radin: define], which noted whether the split of data reported was explicitly planned or was potentially post hoc. All safeguards were ranked on a three-point scale (yes [2], earlier³/unclear [1], no [0]), with the intermediate value being coded either when it was unclear whether the study actually took the safeguard into account or where it was only partially taken into account. Because summary scores of safeguard variables are problematic if considered exclusively (e.g., Jüni, Witschi, Bloch, & Egger, 1999), we examined the influence of the safeguard variables separately and in conjunction.

Footnote 3: When authors referred to previous studies in which the RNG was tested, studies were coded as controlled "earlier".
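A minimal sketch of this three-point coding, with a hypothetical study record; the scale values mirror the text (yes = 2, earlier/unclear = 1, no = 0), and the resulting sum score ranges from 0 to 6.

```python
SCALE = {"yes": 2, "earlier": 1, "unclear": 1, "no": 0}

# Hypothetical study record for the three safeguards described above.
study = {"rng_control": "yes", "all_data_reported": "earlier", "split_of_data": "no"}

sum_score = sum(SCALE[rating] for rating in study.values())  # here: 3 of 6
```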
The main coding was undertaken by FS [Radin: blindly?]. For any potentially
controversial or difficult decisions, FS consulted with EB. If FS and EB
could not agree, the final decision fell to HB, who was blind as to who held
which opinion. Over time, HB's decisions generally supported FS and EB
equally, thus suggesting that HB served well as mediator.
Analyses
All analyses were performed using SPSS (Vers. 11.5) software. The
effect sizes of individual studies were combined into composite mean
weighted effect size measures as described in Footnote 1. To determine whether the π̄_o from each subsample (class) significantly differed from MCE, the standard error based on the within-study variance was calculated (Shadish & Haddock, 1994). [Radin: But in this database and M-A, the between-study variance within subsamples is at issue. The chosen SE is arguably not appropriate. It could only be correct if the subsamples are indeed homogeneous -- did the authors really identify all the moderators? One that is apparently not identified is subject differences, what PEAR called "operators" and showed to be a powerful moderator.] The resulting z-score indicates whether π̄_o differs from MCE. To determine whether the studies in each subsample shared a common effect size (i.e., were consistent across studies), the homogeneity statistic Q was calculated, which has an approximately χ² distribution with k - 1 degrees of freedom, where k is the number of effect sizes (Shadish & Haddock, 1994). [Radin: Given the importance of this, the method for calculating Q should be better specified. I'm guessing it is the sum of z scores.] The difference between two effect size estimates was determined using z_Δ = (π̄_1 - π̄_2) / sqrt(SE_1² + SE_2²).
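The fixed-effect machinery just described amounts to a few lines of code. The sketch below is our illustration of the Shadish and Haddock (1994) formulas, not the authors' SPSS syntax: it pools study effect sizes by inverse-variance weighting and returns the z against MCE and the homogeneity statistic Q, together with the z_Δ contrast between two estimates.

```python
import math

def fixed_effect(pis, ses):
    """Inverse-variance weighted mean effect size, its standard error,
    the z-score against MCE (.50), and the homogeneity statistic Q
    (approximately chi-square with k - 1 degrees of freedom)."""
    w = [1 / se ** 2 for se in ses]
    pi_bar = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
    se_bar = math.sqrt(1 / sum(w))
    q = sum(wi * (p - pi_bar) ** 2 for wi, p in zip(w, pis))
    return pi_bar, se_bar, (pi_bar - 0.50) / se_bar, q

def z_delta(pi1, se1, pi2, se2):
    """Difference between two effect size estimates."""
    return (pi1 - pi2) / math.sqrt(se1 ** 2 + se2 ** 2)
```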
As an initial, straightforward, sensitivity approach to estimating the
overall effect size, we trimmed the overall experimental sample until it
became homogeneous. This was done by applying an algorithm that successively excluded the study contributing most to the heterogeneity of the sample. The procedure stopped when the χ² heterogeneity statistic became non-significant (Hedges & Olkin, 1985). A
comparison between the resulting homogeneous sample and the studies
that had to be trimmed from the overall database (the "trimmed studies"
sample) allows one to assess the reliability of the estimated effect size
and to estimate the impact of aberrant values on the overall result.
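Read literally, the trimming procedure is the loop below: a sketch under the assumption that "contributes most to heterogeneity" means the largest weighted squared deviation from the pooled mean; scipy's chi-square tail probability stands in for the significance test.

```python
from scipy.stats import chi2

def trim_to_homogeneity(pis, ses, alpha=0.05):
    """Successively drop the study contributing most to Q until the
    chi-square heterogeneity test is non-significant (Hedges & Olkin, 1985)."""
    pis, ses = list(pis), list(ses)
    while len(pis) > 2:
        w = [1 / se ** 2 for se in ses]
        pi_bar = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
        contrib = [wi * (p - pi_bar) ** 2 for wi, p in zip(w, pis)]
        if chi2.sf(sum(contrib), df=len(pis) - 1) > alpha:
            break                                  # sample is homogeneous
        i = contrib.index(max(contrib))            # worst offender
        del pis[i], ses[i]
    return pis, ses
```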
In an attempt to explore the putative impact of moderator and
safeguard variables on the effect size and to determine the source(s) of
heterogeneity, a meta-regression analysis was carried out. Meta-
regression is a multivariate regression analysis with independent studies
as the unit of observation (e.g., Thompson & Higgins, 2002; Thompson &
Sharp, 1999). This analysis determines how the variables in the model
account for the heterogeneity of effect size. We applied a weighted
stepwise multiple regression analysis with the moderators as predictors
and effect size as the dependent variable.
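In code, such a meta-regression is ordinary weighted least squares with inverse-variance weights. The sketch below uses a plain numpy solve rather than the stepwise SPSS procedure the authors describe, so it illustrates the model, not their exact selection of predictors.

```python
import numpy as np

def meta_regression(moderators, pis, ses):
    """Weighted least squares of effect size on moderators (with intercept),
    using weights 1 / SE^2. moderators: (k, m) array; returns coefficients."""
    w = 1.0 / np.asarray(ses) ** 2
    X = np.column_stack([np.ones(len(pis)), moderators])
    sw = np.sqrt(w)                     # scale rows by sqrt of the weights
    beta, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(pis) * sw, rcond=None)
    return beta
```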
In the absence of homogeneity, and as a general sensitivity measure,
we additionally calculated a random-effect model (Shadish & Haddock,
1994), which takes into account the variance between studies (i.e.,
heterogeneity on the basis of the Q homogeneity statistic). [Radin: This should be better described. I'm not entirely clear on the difference between fixed and random effect models.] Because the standard error under a random-effects model is generally larger, the test statistic is consequently more conservative than the test statistic of the fixed-effect model. The z-score z(rnd) indicates whether π̄_o differs from MCE using a random-effects approach. However, even in the absence of homogeneity, the fixed-effect model is particularly appropriate in the context of the studies collected here: although the impact of some alleged moderator variables will be examined, no moderator has been established yet and the overall effect remains a matter of contention.
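One standard way to compute such a random-effects estimate is sketched below, in the DerSimonian-Laird style, which is one common reading of the Shadish and Haddock (1994) procedure; the text does not spell out the estimator used, so this is an assumption.

```python
import math

def random_effects(pis, ses):
    """Pooled effect under a random-effects model: the between-study
    variance tau^2, estimated from Q, widens each study's variance before
    inverse-variance pooling, giving a more conservative z against MCE."""
    w = [1 / se ** 2 for se in ses]
    k = len(pis)
    pi_fix = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
    q = sum(wi * (p - pi_fix) ** 2 for wi, p in zip(w, pis))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pi_rnd = sum(wi * p for wi, p in zip(w_star, pis)) / sum(w_star)
    se_rnd = math.sqrt(1 / sum(w_star))
    return pi_rnd, se_rnd, (pi_rnd - 0.50) / se_rnd   # z(rnd)
```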
Results
Study Characteristics
The literature search retrieved 155 main reports containing 712
experimental studies and 158 control studies. After applying the inclusion
and exclusion criteria, the meta-analysis included 114 reports containing
357 experimental studies and 142 control studies (see Appendix).
The basic study characteristics are summarized in Table 3. The heyday
of RNG experimentation was in the 1970s, when more than half of the
studies were published. A quarter of the studies were published in
conference proceedings and reports, but most of the studies were
published in journals. The number of participants in the studies varied
considerably. Approximately one quarter of studies were conducted with a
sole participant and another quarter with up to 10 participants. There
were only a few studies with more than 100 participants. The sample size
of the average study is 6,095,359 bits. However, most studies were much
smaller, as indicated by a median sample size of 6,400 bits (see Table 4).
The few very large studies considerably increased the average sample
size and resulted in an extremely right-skewed distribution of sample size.
This variable was therefore log10-transformed. Consequently, a significant
linear correlation or regression coefficient of sample size with another
variable would indicate an underlying exponential relationship.
Overall Analyses
When combined, the 357 experimental studies yielded a small, but
statistically significant effect (π̄_o = .500029, SE = .000011, z = 2.73, p = .003, one-tailed). The 142 control studies yielded a non-significant effect (π̄_o = .500026, SE = .000015, z = 1.76, p = .08) that was nevertheless comparable in size to the effect demonstrated in the experimental studies (z_Δ = 0.15, p = .87). However, because RNG
experiments do not follow a classical control group design, this
comparison is merely descriptive. The control studies had a much larger
median sample size than the experimental studies (50,000 vs. 6,400 bits, respectively).
The two samples differed considerably in respect of their effect size
distribution. Whereas the control data were distributed homogeneously (χ²(141) = 138.16, p = .55), the effect size distribution of the experimental studies was extremely heterogeneous (χ²(356) = 1442.89, p = 1.45*10^-130). We therefore conducted several sensitivity analyses on
the intentional data in an attempt to determine the source(s) of the
heterogeneity.
Trimmed Sample
As can be seen in Table 4, 70 studies had to be excluded before the χ² heterogeneity statistic became non-significant, a proportion (20%) that, although at the higher end of the span [Radin: what span?], is not uncommon in meta-analyses on psychological topics (Hedges, 1987). The homogeneous
sample of 287 studies had very similar characteristics to the original
overall sample. Both samples had comparable mean and median sample
sizes (numbers of bits per study), and comparable mean and median
number of participants (see Table 4). Their effect sizes were also similar
(z_Δ = .80, p = .42), although the effect in the smaller, homogenized
sample did not reach significance.
The heterogeneous sample that had been removed from the
experimental database differed considerably from both the overall and the
homogenized sample. The removed studies had generally been published
earlier and had been carried out with fewer participants than the other
two samples; they also contained fewer studies with large sample sizes
than the homogeneous subsample (see Table 4). [Radin: note the convergence with the observation that es is anticorrelated with N] The effect size, too, of the studies that had been removed was almost one order of magnitude larger than that of the overall sample (z_Δ = 3.33, p = 8.68*10^-4) and of the homogenized sample (z_Δ = 2.99, p = 2.77*10^-3).
Safeguard Variable Analyses
The majority of studies had adequately implemented the specified
safeguards (see Table 5). Almost 40% of the studies (n = 138) were given
the highest rating for each of the three safeguards.
Generally, inadequate safeguards did not appear to be a reason for the
significant effect found in the experimental studies. Studies implementing the RNG control or the all data reported safeguards did not have significantly lower effect sizes than studies that did not implement these safeguards (z_Δ = .14, p = .89; z_Δ = 1.13, p = .19; respectively). However, studies with a post-hoc split of data had a significantly larger effect size than studies that had preplanned their split of data (z_Δ = 3.30, p = 9.55*10^-4). The safeguard sum-score revealed that the mean effect size of the 138 studies with the maximum safeguard rating was tenfold larger than that in the overall experimental database (z_Δ = 2.59, p = 9.50*10^-3). These high-quality studies had a smaller mean sample size and were conducted slightly more recently than the average study in the overall database. [Radin: later studies are smaller?] However, it is not possible to draw any
conclusions about the impact of study quality on effect size because of the
uneven distribution of studies across the summary scale. Nevertheless,
the trend that study quality is connected with study size and publication
date is suggestive. As can be seen from Table 5, smaller and older studies
seem to be of lower quality. [Radin: and have smaller effect size?]
In summary, the heterogeneity in the database is not primarily due to
the contribution of misleadingly significant results from badly-designed
studies.
Moderator Variable Analyses
Besides sample size and year of publication, which are generally highly underrated potential moderators, only very few of the variables coded provided enough entries for us to carry out sensible moderator variable analyses. For instance, we were interested in whether participants filled in psychological questionnaires. Although this was the case in 96 studies, only 12 used an established measure. Therefore, besides sample size and year of publication, we focused on five primary variables for RNG experiments that provided enough data for a sensible moderator variable analysis.
The summary given in Table 4 compares the mean effect sizes
associated with the 5 potential moderators with those from the overall
and the trimmed sample. It is quite obvious that sample size is the most
important moderator of effect size. [Radin: a repeated refrain] The studies in the quartile comprising the smallest studies (Q1) have an effect size three orders of magnitude bigger than the effect size in the quartile with the largest studies (Q4). The difference is highly significant (z_Δ = 8.84, p < 1*10^-10). The trend is continuous: the smaller the sample size, the bigger the effect size; a connection that Sterne, Gavaghan, & Egger (2000) called the "small-study effect". The funnel plot (Figure 1) illustrates the effect. Whereas the bigger studies are distributed symmetrically around the overall effect size, the distribution of studies below 10,000 bits is increasingly asymmetrical. In respect of the mean year of publication, the largest studies (Q4) stand out from the other three, smaller-study quartiles. The largest studies were, on average, published 4-5 years later than the smaller studies. Most of the big studies, with very small effect sizes, have been published only recently (e.g., Jahn, Dunne, Dobyns, Nelson, & Bradish, 1999; Jahn et al., 2000; Nelson, 1994).
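A funnel plot of this kind can be reproduced with a few lines of matplotlib. The sketch below is our schematic of Figure 1 as described in the text (effect size against log10 sample size, with the MCE reference line at .50); the data arrays are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

def funnel_plot(pis, n_bits):
    """Scatter each study's effect size against log10(sample size in bits),
    with mean chance expectation (.50) marked as a reference line."""
    plt.scatter(np.log10(n_bits), pis, s=10, alpha=0.5)
    plt.axhline(0.50, linestyle="--", color="grey", label="MCE (.50)")
    plt.xlabel("log10(sample size in bits)")
    plt.ylabel("effect size (proportion index)")
    plt.legend()
    plt.show()
```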
The year of publication underscores the importance of sample size for the outcome of the studies (see Table 4). The oldest studies (Q1), which have the smallest mean sample size, have an effect size two orders of magnitude bigger than the effect size of the newest studies (z_Δ = 13.53, p < 1*10^-10), which have the largest mean sample size. [Radin: This seems to differ from statements on the preceding page.] However, the impact of sample size is not evident for the two middle quartiles. [Radin: Note that the largest studies, PEAR, use only unselected subjects, who evidently have a smaller average es. Is this accounted for in the present modeling and M-A?] The effect size of the older studies (Q2) is significantly larger (z_Δ = 2.26, p = .02) than the effect size of the newer studies (Q3), although the sample size of the older studies is larger. [Radin: old study N is larger?] Thus, time might play a role on its own: the experiments might have changed in ways other than the increase in sample size. [Radin: Isn't that what a moderator variable study is supposed to assess?]
Although causal connections are difficult to establish in meta-analyses,
we examined the interaction of sample size and year of publication and
their impact on effect size in order to understand how the two moderators
are linked to one another. The oldest studies (Q1) are particularly
interesting because their effect size is considerably bigger than the effect
size of the subsequent studies (Q2-Q4). The z-value of the oldest studies is
also the highest of the four quartiles and thus indicates that they were the
most successful (see Table 4). A median split of the oldest studies
according to sample size revealed that the effect sizes of the two halves
differ significantly from each other (z_Δ = 5.62, p = 5.62*10^-8). The half with the smaller sample sizes (n = 45, M = 982, Mdn = 500) has an effect size of π̄_o = .519137 (SE = .002484, z = 7.71, p < 1*10^-10), whereas the half with the bigger sample sizes (n = 45, M = 36,123, Mdn = 9,600) has an effect size of π̄_o = .505000 (SE = .000399, z = 12.53, p < 1*10^-10). The mean year of publication in both subsamples is the same (M = 1971) and does not differ from the mean year of publication of the whole quartile. The analysis suggests that sample size, and not year of publication, is the deciding moderator, a finding that the pure magnitude of the effect sizes in the quartiles of sample size and year of publication also suggests (see Table 4). [Radin: It would be good to report the same analysis on the newer studies.]
The number of participants in RNG experiments is clearly linked to
effect size (see Table 4). Studies with a sole participant (Q1) have an
effect size that is at least one order of magnitude bigger than studies with
more than one participant. However, this finding, too, is confounded by
sample size (see Table 4). Applying the same method to the quartile of studies with one participant (Q1) as before, we found that the studies with smaller sample sizes (n = 47, M = 1,817, Mdn = 960) have a bigger effect size (π̄_o = .510062, SE = .001784, z = 5.64, p = 1.70*10^-8) than the studies with larger sample sizes (n = 44, M = 239,197, Mdn = 24,124, π̄_o = .500368, SE = .00017, z = 2.19, p = .03). The difference is highly significant (z_Δ = 5.41, p = 6.32*10^-8). This analysis calls the apparent superiority of studies with a sole participant into question and identifies sample size as an important confounder. However, it can be argued that small studies in general, and small one-participant studies in particular, are fundamentally different from larger studies - an argument that cannot easily be dismissed. It is in fact one of the potential sources accounting for the small-study effect, which will be discussed later. [Radin: But the preceding analysis apparently discounts this with an explicit and relevant comparison.]
The current meta-analysis also seems to support the claim that selected
participants perform better than non-selected participants. The claim has
already been confirmed by an earlier meta-analysis (Honorton & Ferrari,
1989). As can be seen in Table 4, the effect size of studies with selected participants is one order of magnitude bigger than the effect size of studies that did not select their participants on grounds such as prior success in a psi experiment or being a psychic claimant. Studies with
selected participants are predominantly carried out with only one or very
few participants, as indicated by the mean and median number of
participants in Table 4. Studies with unselected populations are regularly
carried out with more participants than studies with selected participants.
However, this finding is confounded by sample size (and by number of participants): studies with unselected populations also have a larger sample size than studies with selected participants. The systematic
selection of participants might play an important role in RNG experiments,
and it is certainly not implausible that longer experiments (is this an
implicit equation of length of experiment with N of samples? It is not so
simple.) are tiring for participants and therefore might produce different
results. The argument is similar to that regarding the number of
participants in RNG experiments, where experiments with fewer
participants may be shorter and/or make participants feel more involved.
Study status is an important moderator in meta-analyses that include
both formal and pilot experiments. Pilot experiments are likely to
comprise a selective sample insofar as they tend to be published if they
yield significant results (and hence larger-than-usual effect sizes) and not
published if they yield unpromising directions for further study. However,
pilot and formal studies in this sample did not differ in respect of effect
size (zΔ = 0.68, p = 0.50). Although the effect size of the pilot studies was
bigger, it is not significantly different from the null value because its SE is
more than four times larger (see Table 4). Pilot experiments are, as one
would expect, smaller than formal experiments.
The type of feedback to the participant in RNG studies has been
regarded as an important issue in psi research from its very inception. The
majority of RNG experiments provide participants with visual and some
with auditory feedback. Besides the two main categories, the coding
resulted in a large "other" category with 101 studies, which used, for
example, alternating visual and auditory feedback, or no feedback at all.
The result is clear-cut: studies providing exclusively auditory feedback outperform not only the studies using visual feedback (zΔ = 6.12, p = 9.24 × 10⁻¹⁰) but also the studies in the "other" category (zΔ = 5.93, p = 2.01 × 10⁻⁹). However, this finding is based on a very small and very
heterogeneous sample of large studies (see Table 4), although the studies
using visual feedback are, on average, even larger. Nevertheless, the
auditory feedback studies were surprisingly comparable to the large
sample size studies (Q3) in terms of their mean numbers of participants,
year of study, and other variables (see Table 4).
The core (this isn't quite the right word) of all RNG studies is the
random source. Although the participants' intention is generally directed
(by the instructions given to them) to the feedback and not to the
technical details of the RNG, it is the sequence of random numbers that is compared with the theoretical expectation (e.g., the binomial distribution) and that is, therefore, allegedly influenced. RNGs are based on truly random physical sources: radioactive decay, Zener-diode noise, or, occasionally, thermal noise. As
shown in Table 4, the effect size of studies with RNGs based on
radioactive decay is two orders of magnitude larger than the effect size of
studies using noise (zΔ = 4.28, p = 1.87 × 10⁻⁵). However, this variable, too,
is confounded by sample size. Studies using radioactive decay are much
smaller than studies using noise (see Table 4). Chronologically, studies
with RNGs based on radioactive decay predominated in the very early
years of RNG experimentation, as indicated by their mean year of
publication, which is just two years later than the mean year of publication of the oldest studies in our sample (see Table 4).
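For orientation, the effect size used in this meta-analysis is the proportion of hits among the generated bits (π), tested against the binomial null of p = .5; under that null the standard error is 1/(2√N). A small sketch under that assumption, with hypothetical study figures:

```python
from math import sqrt

def rng_effect_size(hits: int, bits: int):
    """Hit proportion against the binomial null of p = .5 (one bit = one trial).
    Under H0, Var(pi_hat) = 0.25 / N, so SE = 1 / (2 * sqrt(N))."""
    pi_hat = hits / bits
    se = 1.0 / (2.0 * sqrt(bits))
    z = (pi_hat - 0.5) / se
    return pi_hat, se, z

# Hypothetical study: 505,000 hits in 1,000,000 generated bits.
pi_hat, se, z = rng_effect_size(505_000, 1_000_000)
print(f"pi = {pi_hat:.4f}, SE = {se:.5f}, z = {z:.2f}")  # pi = 0.5050, z = 10.00
```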
Meta-Regression Analysis
The meta-regression analysis included the seven moderator variables
(see Table 4) and the three safeguard variables (see Table 5) discussed
above, using effect size as the dependent variable and the moderators
and the safeguards as predictors. The three variables sample size, year of
publication, and number of participants, which were split into quartiles for
the previous moderator variable analyses (see Table 4), went into the
regression analysis with their nominal values. All other variables were
dummy coded. The analysis was weighted by the inverse of the within-
study variances (this again assumes that all important moderators have
been identified, and that probably is not the case as noted before). From
the 10 predictors only two, year of publication and auditory feedback,
entered the model (see Table 6)4. However, the model accounts for only
5% of the variance. This suggests that neither any single variable we entered into the regression analysis nor any combination of them accounts for the great variability in effect size we found in the data.
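The weighting scheme can be sketched as an inverse-variance weighted least squares fit. The data below are simulated placeholders (the real study-level dataset is not reproduced here), and the predictors are reduced to the two reported to enter the model:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
k = 357  # number of studies

# Simulated stand-ins for the real study-level data.
n_bits   = rng.lognormal(10, 2, k)                 # bits per study
year     = rng.integers(1959, 2001, k)             # year of publication
auditory = rng.integers(0, 2, k)                   # dummy-coded feedback type
es       = 0.5 + rng.standard_normal(k) / (2 * np.sqrt(n_bits))  # pi-hat under H0

weights = 4 * n_bits  # inverse within-study variance: 1 / (0.25 / N)
X = sm.add_constant(np.column_stack([year, auditory]))
fit = sm.WLS(es, X, weights=weights).fit()
print(fit.params, fit.rsquared)
```

Because the weights are essentially proportional to sample size, sample size itself cannot appear as a predictor in this weighted fit, which is the point made in footnote 4 below.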
Random-Effects Model
As can be seen in Table 4, the z-score (rnd) for the effect size based on
a random-effects model for all heterogeneous subsamples of studies
becomes non-significant. This measure also indicates, as all previous analyses have shown, that there is no simple overall effect. If there is an effect of human intention on the concurrent output of true RNGs, then at least one moderator must be involved. Of all the sensitivity analyses presented here, sample size seems to be the most promising candidate. However, the connection between sample size and effect size can have many different causes, as will be discussed in the next section.

4 The moderator variable analyses already clearly indicated that sample size is the most important predictor of effect size. The importance of the predictor was not confirmed by our weighted meta-regression analyses because the weighting is by the inverse of the within-study variance, which almost perfectly correlates with sample size. Therefore, sample size cannot enter the regression model. An unweighted stepwise multiple regression analysis with the same predictor variables clearly stresses the importance of sample size in this meta-analysis. Sample size enters the model first and accounts for 9% of the variance. After three more steps the model accounts for 20% of the variance, with sample size (β = -.26), RNG control earlier (β = .28), random source radioactive (β = .14), and split of data preplanned (β = -.11) as predictors. However, an unweighted regression analysis is difficult to interpret, especially when the effect size is so strongly connected with the study variance. Larger studies provide better estimates and should have more weight; otherwise the regression is dominated by the impact of smaller studies, which might be more prone to publication bias and other effects which will be discussed under the heading of the small-study effect in the discussion section.
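The paper does not name its random-effects estimator; the DerSimonian-Laird procedure is the standard choice and is sketched here under that assumption:

```python
import numpy as np

def dersimonian_laird(es, se, null=0.5):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2."""
    es, se = np.asarray(es, float), np.asarray(se, float)
    w = 1.0 / se ** 2                          # fixed-effect weights
    fe = np.sum(w * es) / np.sum(w)
    q = np.sum(w * (es - fe) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(es) - 1)) / c)   # between-study variance
    w_re = 1.0 / (se ** 2 + tau2)              # random-effects weights
    re = np.sum(w_re * es) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return re, se_re, (re - null) / se_re      # z against MCE

# Hypothetical subsample: three small positive studies, one large null study.
print(dersimonian_laird([0.52, 0.51, 0.515, 0.5001],
                        [0.010, 0.008, 0.012, 0.0005]))
```

Because τ² inflates every study's variance, heterogeneous subsamples lose the precision contributed by their largest studies, which is consistent with the loss of significance reported above.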
Discussion
Altogether, the meta-analysis divulged (an awkward word, perhaps
"indicated" is better) three main findings: (i) a very small but statistically
significant overall effect, (ii) a tremendous variability of effect size and (iii)
a highly visible (I'd say "suspected") small-study effect. (the tremendous
variability points to an inadequate model, and incomplete specification of
moderators. It cannot be appropriate to simply lay a random effects model
on such data. Moreover, this also points to an inadequate fixed effects
model.)
Statistical Significance
The meta-analysis replicated the finding of the previous meta-analyses
in the sense that the very small overall effect was significantly different
from the expected value. The mean effect size of the control studies did
not differ significantly from MCE, although the size of the effect was
comparable to that of the experimental studies. This does not necessarily
imply that the effect found in the experimental studies is spurious; RNG
experiments do not follow a standard test-control design to determine an
effect; rather, the test is always (not always) against MCE. Control data in
RNG experiments are simply used to demonstrate that the RNG output fits
the theoretical premise (e.g., the binomial distribution). (this is not true, or it
is an idiosyncratic, and inappropriate definition of control -- which I believe
is in fact presented early in this paper.) If the control data do not fit the
theoretical premise, the RNG will be revised or a different RNG will be
used. As a consequence, published control data are unequivocally (how do
they know?) subject to a positive selection process - that is, divergent
control data will generally not be published because they would cast
doubt on the experiment as a whole. (This seems to imply that studies
published without citing control data have withheld that data, which I
doubt is always, or even often, the case.) The fact that the experimental
studies reached statistical significance and the control studies did not is a
matter of statistical power - the sample size of the control studies is only
half as large as the sample size of the experimental studies (1.07 × 10⁹ bits vs. 2.17 × 10⁹ bits, respectively). (as noted earlier, by the authors'
definition, huge quantities of "control" data are not included in this
analysis, so any discussion of the es is misleading.) Moreover, the p-value
for the control studies was based on two-sided testing, whereas the
p-value for the intentional studies was one-sided.
The safeguard analyses demonstrated that the significance of the
experimental studies is not the result of low quality studies. Nevertheless,
the statistical significance, as well as the overall effect size, of the
combined experimental studies has dropped continuously from the first
meta-analysis to the one reported here. This is partially the result of the
more recent meta-analyses including newer, larger studies. However,
another difference between the current and the previous meta-analysis
lies in the application of unequivocal inclusion and exclusion criteria. We
focused exclusively on studies examining the alleged concurrent
interaction between direct human intention and RNGs. All previous meta-
analyses also included non-intentional (in the sense of FieldREG studies, no;
but we did include studies with implied intention) and non-human studies.
Although this difference might explain the reduction in effect size and
significance level, it cannot explain the extreme statistical heterogeneity
of the database. This topic was neglected in the previous RNG meta-
analyses (not really, we did look at the issue of homo- and heterogeneity;
we just didn't try to figure out where it came from, as that wasn't the
purpose of those MAs). We believe that the overall statistical significance
found in our meta-analysis cannot be unequivocally interpreted in favor of
an anomalous interaction as long as the tremendous variability of effect
size remains unexplained. (well, yeah -- so isn't that the purpose here?)
Variability of Effect Size
The variability of effect size in this meta-analysis is tremendous. We
took several approaches to address this variability. For instance, trimming
20% of the studies from the overall sample resulted in a homogeneous
subsample of studies. (but your other analyses indicated there was in fact
no justification for the trimming on the basis of, say, quality. Therefore, given the time and N vs es correlations, this trivially must weaken the database.) Although the effect size of the homogeneous sample did not
differ significantly from that of the overall sample, the outcome did not
differ significantly from MCE. However, although this indicates how
vulnerable the overall result is in terms of statistical significance, this
approach cannot explain what variables or (selection) processes are
responsible for the variability. The extreme variability does not seem to be
the result of any of the moderator variables examined - none of the
moderator variable subsamples was independently homogeneous, not
even sample size.
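The paper does not state its trimming rule; one plausible reconstruction (not necessarily the authors' procedure) is to drop, one study at a time, the study contributing most to Cochran's Q until the remainder is homogeneous:

```python
import numpy as np
from scipy.stats import chi2

def trim_to_homogeneity(es, se, alpha=0.05):
    """Drop the study contributing most to Cochran's Q until homogeneous."""
    es, se = list(es), list(se)
    dropped = 0
    while True:
        w = np.array([1.0 / s ** 2 for s in se])
        e = np.array(es)
        pooled = np.sum(w * e) / np.sum(w)
        q_terms = w * (e - pooled) ** 2
        q, df = np.sum(q_terms), len(es) - 1
        if df < 1 or chi2.sf(q, df) > alpha:   # homogeneous (or nothing left)
            return pooled, dropped
        worst = int(np.argmax(q_terms))
        es.pop(worst); se.pop(worst); dropped += 1
```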
The moderator variable analyses demonstrated that sample size was a
consistent and prominent moderator of effect size (variability). All of the
moderator variables we analyzed were confounded by sample size. The
Monte Carlo simulation of publication bias at the end of the next section
will demonstrate that even though variability of effect size and the small-
study effect are two separate concepts, they are in fact connected.
Small-Study Effect
For a similar class of studies (what sorts of studies are similar to these?)
it is generally assumed that effect size is independent of sample size.
(yes) However, from the sensitivity analyses it is evident that the effect
sizes in this meta-analysis strongly depend on sample size. (yes!) The
asymmetric distribution of effect sizes in the funnel plot (see Figure 1), as well as the continuous decline of effect size with increasing sample size across the sample size quartiles, illustrates this. How can this be explained?
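One standard way to quantify such funnel plot asymmetry is Egger's regression of each study's standard normal deviate on its precision; a nonzero intercept signals a small-study effect. This is offered as a diagnostic sketch, not as an analysis reported in the paper:

```python
import numpy as np
import statsmodels.api as sm

def egger_test(es, se, null=0.5):
    """Egger regression: SND vs. precision; intercept != 0 => asymmetry."""
    es, se = np.asarray(es, float), np.asarray(se, float)
    snd = (es - null) / se           # standard normal deviate per study
    precision = 1.0 / se
    fit = sm.OLS(snd, sm.add_constant(precision)).fit()
    return fit.params[0], fit.pvalues[0]  # intercept and its p-value
```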
Table 7 provides a list of potential sources for the small-study effect.
The sources fall into three main categories: (1) true heterogeneity, (2) data
irregularities, and (3) selection biases. (Is true heterogeneity the
representation of a better model? If so it is sensible, but needs to be
spelled out.) Chance, another possible explanation for a small-study
effect, seems very unlikely because of the magnitude of the effect and the
sample size of the meta-analysis.
True heterogeneity
The higher effect sizes of the smaller studies may be due to specific
differences in experimental design or setting in the smaller compared with
the larger studies. In other words, the small-study effect is seen to be the
result of certain moderator variable(s). For instance, smaller studies might
be more successful because the participant-experimenter relationship is
more intense. The routine of longer experimental series may make it
difficult for the experimenter to maintain enthusiasm in the study.
However, explanations such as these remain speculative as long as they
are not systematically investigated and meta-analyzed. (There is another
general category of potential explanation that is consistent with the N vs es finding -- the mechanism of the effect may not be correctly modeled by
assuming bit-wise effects.)
From the moderator variables investigated in this meta-analysis, the
hypotheses that smaller studies on average tested a different type of
participant and used a different form of feedback are the most interesting.
However, the moderator variable analyses showed that these two
variables, as well as all other variables examined, are linked to sample
size. For almost all moderator variables, the subsamples ("class" in
Table 4) with the smallest mean sample sizes had the biggest effect. This
suggests (but does not demonstrate. Given other issues, such as the
model assumptions, this conclusion should not be accepted.) that sample
size, and not the moderator variables, was the prime factor driving the
heterogeneity of the database. This view is also supported by the
heterogeneity of effect size observed in all of the moderator subsamples.
Empirically, true heterogeneity among studies cannot be eliminated as
a causal factor for the small-study effect, especially regarding complex
interactions, which we have disregarded here. However, the heterogeneity
of the moderator-variable subsamples and the outstanding importance of
sample size at all levels of analysis exclude true heterogeneity as the
main source accounting for the small-study effect. This ignores goal-
directedness, DAT, effort per time, etc. It also ignores the possibility that
PK operates on the sample size distribution, which would predict a perfect
sqrt(N) dependency.
Data irregularities
A small-study effect may be due to data irregularities threatening the
validity of the data. For example, smaller studies might be of poorer
methodological quality, thereby artificially raising their effect size
compared with that of larger studies. However, methodological quality
improves only marginally with increasing sample size (r(357) = .18, p = 8.86 × 10⁻⁴) and therefore cannot explain the small-study effect in this
meta-analysis. Another form of data irregularity, namely inadequate
analysis, is based on the assumption that smaller trials are generally
analyzed with less methodological rigor and therefore are more likely to
report "false-positive results". However, this potential source is excluded
here due to the straightforward and simple effect size measure used. A
general source of data irregularity might be that smaller studies are more
easily manipulated by fraud than larger studies because, for example,
fewer people are involved. However, the number of researchers that
would have to be implicated over the years renders this hypothesis very
unlikely. In general, none of the data irregularity hypotheses considered
can explain the small-study effect.
Selection biases
When the inclusion of studies in a meta-analysis is systematically biased in such a way that smaller studies with larger effect sizes, i.e., smaller p-values, are more likely to be included than larger studies with smaller effect sizes, a small-study effect may be the result. Several well-known biases such as
publication bias, selective reporting bias, foreign language bias, citation
bias and time lag bias may be responsible for a small-study effect (e.g.,
Egger, Dickersin, & Smith, 2001; Mahoney, 1985).
Biased inclusion criteria refer to biases on the side of the meta-analyst.
In this particular domain, the two most prominent biases are foreign
language bias and citation bias. Foreign language bias occurs when
significant results are published in well-circulated, high-impact journals in
English, whereas non-significant findings are published in small journals in
the authors' native language. Therefore a meta-analysis including studies
solely from journals in English may include a disproportionately large
number of significant studies. Citation bias refers to selective quoting.
Studies with significant p-values are quoted more often and are more
likely to be retrieved by the meta-analyst. However, the small-study effect
in this meta-analysis is probably not attributable to these biases, given the inclusion of non-English publications and a very comprehensive search strategy. The authors claimed to use a very comprehensive search to retrieve all known studies, including unpublished studies. In that case, where are all these missing studies coming from? With, say, only 20 different groups of researchers ever having done these sorts of studies, each group would have had to hide on average 70 file-drawer studies, given the authors' later estimate of 1400 file-drawer studies. This is implausible.
One of the most important selection biases to consider in any meta-
analysis is publication bias. Publication bias refers to the fact that the
probability of a study being published depends to some extent on its
p-value. Several independent factors affect the publication of a study.
Rosenthal's term "file drawer problem" (Rosenthal, 1979) focuses on the
author as the main source of publication bias, but there are other issues
too. Editors' and reviewers' decisions also affect whether a study is
published. The time lag from the completion of a study to its publication
might also depend on the p-value of the study (e.g., Ioannidis, 1998) and
additionally contribute to the selection of studies available. Since the
development of Rosenthal's "file drawer" calculation (1979), numerous
other methods have been developed to examine the impact of publication
bias on meta-analyses (e.g., Dear & Begg, 1992; Duval & Tweedie, 2000;
Hedges, 1992; Iyengar & Greenhouse, 1988; Sterne & Egger, 2001).
In an attempt to examine publication bias we ran a Monte Carlo
simulation based on Hedges's (1992) stepped weight function model and
simulated a simple selection process. According to this model, the
authors', reviewers', and editors' perceived conclusiveness of a p-value is
subject to certain "cliff effects" (Hedges, 1992) and this impacts on the
likelihood of a study getting published. Hedges (1992) estimates the
weights of the step function based on the available meta-analytical data.
Unlike Hedges, however, we used a predefined step-weight
function model, because we were primarily interested in seeing whether a
simple selection model may in principle account for the small-study effect
present in our meta-analytic data.
We assumed that 100% of studies with a p-value ≤ .01, 80% of studies with a p-value between p ≤ .05 and p > .01, 50% of studies with a p-value between p ≤ .10 and p > .05, 20% of studies with a p-value between p ≤ .50 and p > .10, and 10% of studies with a p-value > .50 (one-sided) are published5. From this assumption, we randomly generated uniformly distributed p-values (Surely this doesn't mean p = 0.01 is just as likely as p = 0.5. I think they must mean normally distributed around chance
expectation.) and calculated the effect sizes for all "published" studies
and counted the number of "not published" studies. That is, on the basis
of the sample size for each of the 357 studies, we simulated a selective
null-effect publication process (meaning, I think, the distribution of studies
averages at chance, or p = 0.5). The averaged results of the simulation of
1000 meta-analyses are shown in Table 8. As can be seen, the overall
effect size based on the Monte Carlo simulation perfectly matches the
overall effect size found in our meta-analysis (see Table 4). The simulated
data clearly replicated the small-study effect (see Table 8).
5 The term "published" is used here very broadly, to include conference proceedings and reports that, in terms of our literature search, were considered unpublished. Importantly, in our discussion of the Monte Carlo simulation, the term
"published" also refers to studies obtained by splitting reports into studies. For simplicity,
we assumed in the Monte Carlo simulation that the splitting of the 114 reports into 357
experimental studies was subject to the same selection process as the published reports
themselves.
The simulation also shows that, in order for these results to emerge, a total of 1453 studies had to be unpublished, i.e., for every published study four nonsignificant studies had to remain unpublished. That's awfully close to Rosenthal's criterion of "robust." I wonder how long they worked with their model parameters to produce these results. Also, the results for the small sample sizes are nearly significantly different from their simulation. I'll see if I can replicate their model to see how insensitive it is to the choice
of parameters. (also -- is the Hedges work on selective reporting in
ordinary psychology a good model for parapsychology?)
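The selection process described here is straightforward to reproduce. Note, regarding the comment above, that drawing uniformly distributed p-values under the null hypothesis is exactly equivalent to drawing standard normal z-scores, so the two readings coincide. The sketch below uses hypothetical sample sizes (the 357 real study sizes are not reproduced); with the stated weights the average acceptance probability under the null is .01(1.0) + .04(.8) + .05(.5) + .40(.2) + .50(.1) ≈ .197, so roughly 357 × (1/.197 − 1) ≈ 1,455 studies land in the file drawer, matching the order of the 1453 reported:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def publish_prob(p):
    """Predefined step-weight function from the text (Hedges-style cliffs)."""
    if p <= .01: return 1.0
    if p <= .05: return 0.8
    if p <= .10: return 0.5
    if p <= .50: return 0.2
    return 0.1

def simulate_once(sample_sizes):
    """Draw null studies per target study size until one gets 'published'."""
    published_es, unpublished = [], 0
    for n in sample_sizes:
        while True:
            z = rng.standard_normal()   # null effect: the bits are fair
            p = norm.sf(z)              # one-sided p-value (uniform under H0)
            if rng.random() < publish_prob(p):
                published_es.append(0.5 + z / (2 * np.sqrt(n)))
                break
            unpublished += 1
    return np.array(published_es), unpublished

sizes = rng.lognormal(9, 2.5, 357).astype(int) + 100   # hypothetical sizes
es, filedrawer = simulate_once(sizes)
weights = 4 * sizes                                    # inverse-variance weights
print("pooled pi:", np.sum(weights * es) / np.sum(weights),
      "| unpublished:", filedrawer)
```

Because small studies need much larger deviations to cross the p-value cliffs, the surviving small studies carry inflated effect sizes, reproducing the small-study effect by construction.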
A secondary finding, which additionally confirms the value of the
simulation, is that publication bias might be responsible not only for the
small overall effect and the small-study effect found, but also for a large
proportion of the effect size variability. The simulated overall sample as
well as the 4th quartile of the largest studies show a highly significant
effect size variability, replicating what was found in our meta-analytical
data. The effect size variability in the first three quartiles of the simulation
is certainly different from the effect size variability in our meta-analytical
data. However, this might be due to the highly idealized boundary
conditions of the simulation.
Conclusion
Altogether, the simulation results are in very good agreement with the
meta-analytical data (except for 3/4ths of it), especially regarding the
overall effect size and the small-study effect, which was the primary
objective of the simulation. The simulation must be considered at least
very suggestive, especially as it accounts for all three main findings
divulged in the meta-analysis: (i) a very small but statistically significant
overall effect, (ii) a tremendous variability of effect size (only in the large
study quartile) and (iii) a highly visible small-study effect. In comparison
with all other sources potentially accounting for the small-study effect
discussed here, publication bias clearly is the most far-reaching
explanation regarding the main findings of this meta-analysis. (this monte
carlo sounds like a simple case of carefully designed inputs to yield the
expected outputs, and even then glossing over the mis-fitting.)
Nevertheless, whether the simulation of publication bias is considered
conclusive evidence for publication bias in this meta-analysis strongly
depends on how many unpublished studies one believes it is reasonable
to assume there are. However, the number of unpublished studies must
be seen against the background of the enormous pressure to publish
significant results. It is clear that not all results get published and that
journals are filled with a selective sample of statistically significant studies
(e.g., Sterling, 1959; Sterling, Rosenbaum, & Weinkam, 1995; Rosenthal, 1979) (all references to non-parapsychology fields). Given that the
majority of published studies are underpowered, it is surprising that 95%
of articles in psychological journals and more than 85% of articles in
medical journals report statistically significant results (Sterling,
Rosenbaum, & Weinkam, 1995). Authors, reviewers, as well as editors, are
all involved in the selection process.
J.B. Rhine was the first editor of the Journal of Parapsychology (inception
in 1937), the leading journal for experimental work in parapsychology. He
initially believed "that little can be learned from a report of an experiment
that failed to find psi" (Broughton, 1987, p. 27). More than 25% (n = 96) of
the studies included in the current meta-analysis were published in this
journal. However, from 1975, the Council of the Parapsychological
Association rejected the policy of suppressing non-significant studies in
parapsychological journals (Broughton, 1987; Honorton, 1985). Whereas
48% of the studies in Q1 (1959 - 1973) were statistically significant, this
rate dropped to 19% (Q2), 8% (Q3) and 14% (Q4) in the subsequent
quartiles, indicating that the policy was implemented. (Interesting to
consider the likelihood of these percentages being significant -- as a
simple study-based model to compare with the bit-based model.) This
demonstrates not only that the publication rate of significant studies in
this domain is very different from the rate in conventional fields, but also,
and more importantly, that, at least in the early period of RNG experimentation, the publication process was highly selective in favor of statistically significant studies (Huh? If - as they state
- 95% of studies in standard psych journals are significant, yet for psi the
figures range from 8% to 48%, then surely this leads us away from
publication bias as a viable explanation, not towards!?).
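The study-level comparison hinted at in the comment above can be sketched with a two-proportion z-test. The counts below are rough reconstructions from the quoted percentages and an even split of the 357 studies into quartiles, so the result is only indicative:

```python
from math import sqrt
from scipy.stats import norm

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z-test with pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return z, 2 * norm.sf(abs(z))

# ~89 studies per quartile; 48% significant in Q1, mean of 19/8/14% afterwards.
z, p = two_prop_z(round(0.48 * 89), 89,
                  round((0.19 + 0.08 + 0.14) / 3 * 268), 268)
print(f"z = {z:.2f}, p = {p:.1e}")   # roughly z = 6.8, highly significant
```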
Statistically, significance is a matter of power, but, when conducting a
meta-analysis, it can also be a matter of artifacts like publication bias.
From this perspective, not only is the early period of RNG experimentation
in great danger of being a highly selective sample, but also the database
as a whole. A power analysis based on the overall effect size found in our
meta-analysis shows that, for an RNG study to have a power of 80%, its
sample size would have to be greater than 1,800,000,000 bits. (Ok, is this
just obstinacy, or are they wearing blinders, or what?) (I am afraid it is or
what!) None of the studies included in our meta-analysis comes even
close to this sample size. Therefore the number of significant studies is
highly questionable. (sigh)
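The required-sample-size claim can be checked with the usual normal-approximation power formula for a one-sided binomial test, N ≈ ((z_α + z_β) / (2(π − .5)))². The value π = .50003 below is an assumed round figure of the magnitude reported in this meta-analysis, chosen only to show that it reproduces the order of the quoted 1.8 × 10⁹ bits:

```python
from scipy.stats import norm

def required_bits(pi, alpha=0.05, power=0.80):
    """Bits needed for a one-sided z-test at `alpha` to detect a hit
    proportion `pi` with the given power (SD taken as 0.5 throughout)."""
    eps = pi - 0.5
    z_alpha = norm.isf(alpha)       # ~1.645
    z_beta = norm.isf(1 - power)    # ~0.842
    return ((z_alpha + z_beta) / (2 * eps)) ** 2

print(f"{required_bits(0.50003):.2e} bits")   # ~1.7e+09
```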
The studies published and collected here are probably a highly selective
sample. The Monte Carlo simulation indicated that there would have to be
1400 unpublished studies to account for the main findings presented here.
However, "real world" publication decisions do not simply fit to a single,
consistent set of threshold values. Publication decisions in respect of the
smallest studies might follow completely different and more complex
patterns. Consequently, far fewer (or far more!) studies might be needed
in order to replicate the main findings of our meta-analysis than indicated
by the Monte Carlo simulation. We believe that the 1400 unpublished
studies mark the upper limit of the range of unpublished studies (why?).
As a result, we doubt that the RNG database provides good evidence for
an anomalous connection between direct human intention and the
concurrent output of true RNGs. (what on earth justifies this conclusion?)
The limits of this conclusion are, of course, the assumptions to be
made. (well, yeah) One of the main assumptions in undertaking a meta-
analysis is the assumption that effect size is independent of sample size
(finally!). The independence of sample size actually defines effect size
measures (Not necessarily; there are all kinds of effect sizes). However,
there might be effects where sample size is related to effect size. In such
a case, the p-values would be independent of sample size and, e.g., be constant across studies. Although our data do not confirm this simple
model because the z-score distribution is far too heterogeneous, (explain
how this conclusion is justifiable. No such calculation is reported in this
paper.) other more complex models are conceivable. However, so far
meta-analysts in RNG research have not argued along these lines; they
have argued that there is a small but replicable constant effect. Another
assumption that is generally made is that intention affects the mean value
of the random sequence. (that is not an assumption, but the dependent
variable in most experimental designs.) Although other outcomes have
been suggested (e.g., Atmanspacher, Bösch, Boller, Nelson &
Scheingraber, 1999; Pallikari & Boller, 1999; Radin, 1989) they have been
used only occasionally. (Ignores virtually all of Schmidt's and Kennedy's
papers on this topic. Also, on p. 45 of our RNG MA paper in the Jonas
book, we say "This means the statistical effects observed in these
experiments are effectively independent of sample size, and cannot be
explained as simple, linear, force-like mechanisms…. Further indication
that a novel approach will be required to explain these effects are
experiments strongly resembling RNG studies, but involving pre-recorded
random bits rather than bits generated in real-time. Those studies show
significant cumulative results similar to those reported here (Bierman,
1998). This implies that some MMI effects, perhaps including those
claimed for distant healing, may involve acausal processes.")
Although we question the conclusions of our predecessors, we would
like to remind the reader that these experiments are highly refined
operationalizations of phenomena which have challenged mankind for a
very long period of time. Over a 100-year history of PK experiments, the dramatic anomalous PK effects reported in séance rooms have shrunk to experiments with electronic noise. This achievement is certainly humble.
However, further experiments will be conducted. They should be
registered. This is the most straightforward solution for determining with
any accuracy the rate of publication bias (e.g., Chalmers, 2001). It allows
subsequent meta-analysts to resolve more firmly the question of whether the overall effect in RNG experiments is an artifact of publication bias, as
we suspect. The PK effect in general is of great fundamental importance -
if genuine. However, until we know with any certainty just how many non-
significant, unpublished studies there are likely to be, we doubt that this
unique experimental approach will gain the status of being of scientific
value. Of course, such a pessimistic statement only serves to diminish
interest in this line of work, thus guaranteeing that no one will ever know
"the answer."
A quick look through their MA references shows they do not list the four studies below, all of which fit their inclusion criteria. This doesn't persuade me that they were as comprehensive as they claim. What else might they have missed? They also do not include the Jahn et al. publication of the 12-year database, and other PEAR publications that would fit, e.g., Ibison. And they do include something like the Nelson 1994 time normalization paper. Go figure.
Radin, D. I. & Utts, J. M. (1989). Experiments investigating the influence of intention on
random and pseudorandom events. Journal of Scientific Exploration, 3, 65-79.
Radin, D. I. (1993). Environmental modulation and statistical equilibrium in mind-matter
interaction. Subtle Energies, 4 (1), 1-30.
Radin, D. I. (1992). Beyond belief: Exploring interactions among mind, body and
environment. Subtle Energies, 2 (3), 1 - 40.
Radin, D. I. (1990). Testing the plausibility of psi-mediated computer system failures.
Journal of Parapsychology, 54, 1-19.
References
Alcock, J. E. (1981). Parapsychology: Science or magic? A psychological
perspective. Oxford: Pergamon.
Atmanspacher, H., Bösch, H., Boller, E., Nelson, R. D., & Scheingraber, H.
(1999). Deviations from physical randomness due to human agent
intention? Chaos, Solitons & Fractals, 10, 935-952.
Beloff, J., & Evans, L. (1961). A radioactivity test of psycho-kinesis. Journal
of the Society for Psychical Research, 41, 41-46.
Bem, D. J., & Honorton, C. (1994). Does psi exist? Replicable evidence for
an anomalous process of information transfer. Psychological Bulletin,
115 (1), 4-18.
Bierman, D. J. (1985). A retro and direct PK test for babies with the
manipulation of feedback: A first trial of independent replication using
software exchange. European Journal of Parapsychology, 5, 373-390.
Blackmore, S. J. (1992). Psychic experiences: Psychic illusions. Skeptical
Inquirer, 16, 367-376.
Braude, S. E. (1997). The limits of influence: Psychokinesis and the
philosophy of science. Lanham: University Press of America.
Broughton, R. S. (1987). Publication policy and the Journal of
Parapsychology. Journal of Parapsychology, 51, 21-32.
Chalmers, I. (2001). Using systematic reviews and registers of ongoing
trials for scientific and ethical trial design, monitoring, and reporting. In
M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health
care: Meta-analysis in context (pp. 429-443). London: BMJ Books.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49,
997-1003.
Crookes, W. (1889). Notes of seances with D. D. Home. Proceedings of the
Society for Psychical Research, 6, 98-127.
Crookes, W., Horsley, V., Bull, W. C., & Myers, A. T. (1885). Report on an
alleged physical phenomenon. Proceedings of the Society for Psychical
Research, 3, 460-463.
Dear, K. B. G., & Begg, C. B. (1992). An approach for assessing publication
bias prior to performing a meta-analysis. Statistical Science, 7, 237-245.
Dudley, R. T. (2000). The relationship between negative affect and
paranormal belief. Personality and Individual Differences, 28, 315-321.
Duval, S., & Tweedie, R. (2000). A nonparametric "trim and fill" method of
accounting for publication bias in meta-analysis. Journal of the American
Statistical Association, 95, 89-98.
Edgeworth, F. Y. (1885). The calculus of probabilities applied to psychical
research. Proceedings of the Society for Psychical Research, 3, 190-199.
Edgeworth, F. Y. (1886). The calculus of probabilities applied to psychical
research. II. Proceedings of the Society for Psychical Research, 4, 189-
208.
Egger, M., Dickersin, K., & Smith, G. D. (2001). Problems and limitations in
conducting systematic reviews. In M. Egger, G. D. Smith, & D. Altman
(Eds.), Systematic reviews in health care: Meta-analysis in context (pp.
43-68). London: BMJ Books.
Fisher, R. A. (1924). A method of scoring coincidences in tests with playing
cards. Proceedings of the Society for Psychical Research, 34, 181-185.
Gallup, G., & Newport, F. (1991). Belief in paranormal phenomena among
adult Americans. Skeptical Inquirer, 15, 137-146.
Geller, U. (1998). Uri Geller's little book of mind-power. London: Robson.
Girden, E. (1962a). A review of psychokinesis (PK). Psychological Bulletin,
59, 353-388.
Girden, E. (1962b). A postscript to "A Review of Psychokinesis (PK)".
Psychological Bulletin, 59, 529-531.
Girden, E., & Girden, E. (1985). Psychokinesis: Fifty years afterward. In P.
Kurtz (Eds.), A Skeptic's Handbook of Parapsychology (pp. 129-146).
Buffalo, NY: Prometheus Books.
Gissurarson, L. R. (1992). Studies of methods of enhancing and potentially
training psychokinesis: A review. Journal of the American Society for
Psychical Research, 86, 303-346.
Gissurarson, L. R. (1997). Methods of enhancing PK task performance. In
S. Krippner (Eds.), Advances in Parapsychological Research 8 (pp. 88-
125). Jefferson, North Carolina: McFarland & Company.
Gissurarson, L. R., & Morris, R. L. (1991). Examination of six
questionnaires as predictors of psychokinesis performance. Journal of
Parapsychology, 55, 119-145.
Hedges, L. V. (1987). How hard is hard science, how soft is soft science?
The empirical cumulativeness of research. American Psychologist, 42,
443-455.
Hedges, L. V. (1992). Modeling publication selection effects in meta-
analysis. Statistical Science, 7, 246-255.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis.
Orlando: Academic Press.
Honorton, C. (1985). Meta-analysis of psi ganzfeld research: A response to
Hyman. Journal of Parapsychology, 49, 51-91.
Honorton, C., & Ferrari, D. C. (1989). "Future telling": A meta-analysis of
forced-choice precognition experiments, 1935-1987. Journal of
Parapsychology, 53, 281-308.
Honorton, C., Ferrari, D. C., & Bem, D. J. (1998). Extraversion and ESP
performance: A meta-analysis and a new confirmation. Journal of
Parapsychology, 62, 255-276.
Irwin, H. J. (1993). Belief in the paranormal: A review of the empirical
literature. Journal of the American Society for Psychical Research, 87, 1-
39.
Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file
drawer problem. Statistical Science, 3, 109-117.
Ioannidis, J. P. (1998). Effect of the statistical significance of results on the
time to completion and publication of randomized efficacy trials. The
Journal of the American Medical Association, 279, 281-286.
Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J.
(1999). ArtREG: A random event experiment utilizing picture-preference
feedback. (Technical Note PEAR 99003). Princeton NJ 08544: PEAR
Laboratory, School of Engineering and Applied Science.
Jahn, R. G., Dunne, B. J., & Nelson, R. D. (1980). Princeton Engineering
Anomalies Research. Program Statement. (Technical Report). Princeton,
New Jersey: Princeton University, Princeton Engineering Anomalies
Research, School of Engineering/Applied Science.
Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H.,
Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., &
Walter, B. (2000). Mind/Machine interaction consortium: PortREG
replication experiments. Journal of Scientific Exploration, 14, 499-555.
James, W. (1896). Psychical research. Psychological Review, 3, 649-652.
Jüni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring
the quality of clinical trials for meta-analysis. The Journal of the
American Medical Association, 282, 1054-1060.
Lawrence, T. R. (1998). Gathering in the sheep and goats... A meta-
analysis of forced choice sheep-goat ESP studies, 1947-1993. In N. L.
Zingrone, M. J. Schlitz, C. S. Alvarado, & J. Milton (Eds.), Research in
Parapsychology 1993 (pp. 27-31). Lanham, MD: Scarecrow Press.
Mahoney, M. J. (1985). Open exchange and epistemic progress. American
Psychologist, 40, 29-39.
McGarry, J. J., & Newberry, B. H. (1981). Beliefs in paranormal phenomena
and locus of control: A field study. Journal of Personality and Social
Psychology, 41, 725-736.
Milton, J. (1993). A meta-analysis of waking state of consciousness, free
response ESP studies. Proceedings of Presented Papers: The
Parapsychological Association 36th Annual Convention, 87-104.
Milton, J. (1997). Meta-Analysis of free-response ESP studies without
altered states of consciousness. Journal of Parapsychology, 61, 279-319.
Milton, J., & Wiseman, R. (1999a). A meta-analysis of mass-media tests of
extrasensory perception. British Journal of Psychology, 90, 235-240.
Milton, J., & Wiseman, R. (1999b). Does psi exist? Lack of replication of an
anomalous process of information transfer. Psychological Bulletin, 125,
387-391.
Morris, R. L. (1982). Assessing experimental support for true precognition.
Journal of Parapsychology, 46, 321-336.
Murphy, G. (1962). Report on paper by Edward Girden on psychokinesis.
Psychological Bulletin, 59, 638-641.
Musch, J., & Ehrenberg, K. (2002). Probability misjudgement, cognitive
ability, and belief in the paranormal. British Journal of Psychology, 93,
169-177.
Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting
anomalies experiments. (Technical Note PEAR 94003). Princeton
University, Princeton, NJ 08544: Princeton Engineering Anomalies
Research, School of Engineering/ Applied Science.
Pallikari, F., & Boller, E. (1999). A rescaled range analysis of random
events. Journal of Scientific Exploration, 13, 35-40.
Persinger, M. A. (2001). The neuropsychiatry of paranormal experiences.
Journal of Neuropsychiatry & Clinical Neurosciences, 13, 515-523.
Pratt, J. G. (1937). Clairvoyant blind matching. Journal of Parapsychology,
1, 10-17.
Pratt, J. G. (1949). The meaning of performance curves in ESP and PK test
data. Journal of Parapsychology, 13, 9-22.
Pratt, J. G., Rhine, J. B., Smith, B. M., Stuart, C. E., & Greenwood, J. A.
(1940). Extra-sensory perception after sixty years: A critical appraisal of
the research in extra-sensory perception. New York: Henry Holt and
Company.
Price, M. M., & Pegram, M. H. (1937). Extra-sensory perception among the
blind. Journal of Parapsychology, 1, 143-155.
Radin, D. I. (1982). Experimental attempts to influence pseudorandom
number sequences. Journal of the American Society for Psychical
Research, 76, 359-374.
Radin, D. I. (1989). Searching for "signatures" in anomalous human-
machine interaction data: A neural network approach. Journal of
Scientific Exploration, 3, 185-200.
Radin, D. I. (1997). The conscious universe. San Francisco: Harper Edge.
Radin, D. I., & Ferrari, D. C. (1991). Effects of consciousness on the fall of
dice: A meta-analysis. Journal of Scientific Exploration, 5, 61-83.
Radin, D. I., & Nelson, R. D. (1989). Evidence for consciousness-related
anomalies in random physical systems. Foundations of Physics, 19,
1499-1514.
Radin, D. I., & Nelson, R. D. (2002). Meta-analysis of mind-matter
interaction experiments: 1959 to 2000. In W. B. Jonas (Eds.), Spiritual
Healing, Energy Medicine and Intentionality: Research and Clinical
Implications Edinburgh: Harcourt Health Sciences.
Reeves, M. P., & Rhine, J. B. (1943). The PK effect: II. A study in declines.
Journal of Parapsychology, 7, 76-93.
Rhine, J. B. (1934). Extrasensory perception. Boston: Boston Society for
Psychic Research.
Rhine, J. B. (1936). Some selected experiments in extra-sensory
perception. Journal of Abnormal and Social Psychology, 29, 151-171.
Rhine, J. B. (1937). The effect of distance in ESP tests. Journal of
Parapsychology, 1, 172-184.
Rhine, J. B. (1946). Editorial: ESP and PK as "psi phenomena". Journal of
Parapsychology, 10, 74-75.
Rhine, J. B., & Humphrey, B. M. (1944). The PK effect: Special evidence
from hit patterns. I. Quarter distribution of the page. Journal of
Parapsychology, 8, 18-60.
Rhine, J. B., & Humphrey, B. M. (1945). The PK effect with sixty dice per
throw. Journal of Parapsychology, 9, 203-218.
Rhine, J. B., & Rhine, L. E. (1927). One evening's observation on the
Margery mediumship. Journal of Abnormal and Social Psychology, 21,
421.
Rhine, L. E. (1937). Some stimulus variations in extra-sensory perception
with child subjects. Journal of Parapsychology, 1, 102-113.
Rhine, L. E., & Rhine, J. B. (1943). The psychokinetic effect: I. The first
experiment. Journal of Parapsychology, 7, 20-43.
Richet, C. (1923). Thirty years of psychical research: A treatise on metapsychics. New York: Macmillan Company.
Rosenthal, R. (1979). The "file drawer problem" and tolerance for null
results. Psychological Bulletin, 86, 638-641.
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research.
Methods and data analysis. (2 ed.). New York: McGraw-Hill Publishing.
Rosenthal, R., & Rubin, D. B. (1989). Effect size estimation for one-sample
multiple-choice type data: Design, analysis, and meta-analysis.
Psychological Bulletin, 106, 332-337.
Rush, J. H. (1977). Problems and methods in psychokinesis research. In S.
Krippner (Eds.), Advances in Parapsychological Research 1.
Psychokinesis (pp. 15-78). New York: Plenum.
Sanger, C. P. (1895). Analysis of Mrs. Verrall's card experiments.
Proceedings of the Society for Psychical Research, 11, 193-197.
Schmeidler, G. R. (1977). Research findings in psychokinesis. In S.
Krippner (Eds.), Advances in Parapsychological Research 1.
Psychokinesis (pp. 79-132). New York: Plenum.
Schmeidler, G. R. (1982). PK research: Findings and theories. In S.
Krippner (Eds.), Advances in Parapsychological Research 3 (pp. 115-
146). New York: Plenum Press.
Schmidt, H. (1969). Anomalous prediction of quantum processes by some
human subjects. Seattle, WA: Plasma Physics Laboratory. D1-82-0821:1-
38.
Schmidt, H. (1970). PK experiments with animals as subjects. Journal of
Parapsychology, 34, 255-261.
Schmidt, H. (1985). Addition effect for PK on pre-recorded targets. Journal
of Parapsychology, 49, 229-244.
Schouten, S. A. (1983). Personal experience and belief in ESP. The Journal
of Psychology, 114, 219-222.
Shadish, W. R., & Haddock, K. C. (1994). Combining estimates of effect
size. In L. V. Hedges & H. Cooper (Eds.), The handbook of research
synthesis (pp. 261-281). New York: Russell Sage Foundation.
Sparks, G. G. (1998). Paranormal depictions in the media: How do they
affect what people believe? Skeptical Inquirer, 22, 35-39.
Sparks, G. G., Hansen, T., & Shah, R. (1994). Do televised depictions of
paranormal events influence viewers' beliefs? Skeptical Inquirer, 18,
386-395.
Sparks, G. G., Nelson, C. L., & Campbell, R. G. (1997). The relationship
between exposure to televised messages about paranormal phenomena
and paranormal beliefs. Journal of Broadcasting & Electronic Media, 41,
345-359.
Stanford, R. G. (1978). Toward reinterpreting psi events. Journal of the
American Society for Psychical Research, 72, 197-214.
Stanford, R. G., & Stein, A. G. (1994). A meta-analysis of ESP studies
contrasting hypnosis and a comparison condition. Journal of
Parapsychology, 58, 235-269.
Steinkamp, F., Milton, J., & Morris, R. L. (1998). A meta-analysis of forced-
choice experiments comparing clairvoyance and precognition. Journal of
Parapsychology, 62, 193-218.
Sterling, T. D. (1959). Publication decisions and their possible effects on
inferences drawn from tests of significance - or vice versa. Journal of the
American Statistical Association, 54, 30-34.
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication
decisions revisited: The effect of the outcome of statistical tests on the
decision to publish and vice versa. American Statistician, 49, 108-112.
Sterne, J. A. C., & Egger, M. (2001). Funnel plots for detecting bias in
meta-analysis: Guidelines on choice of axis. Journal of Clinical
Epidemiology, 54, 1046-1055.
Sterne, J. A. C., Egger, M., & Smith, G. D. (2001). Investigating and dealing
with publication and other biases. In M. Egger, G. D. Smith, & D. Altman
(Eds.), Systematic reviews in health care: Meta-analysis in context (pp.
189-208). London: BMJ Books.
Sterne, J. A. C., Gavaghan, D., & Egger, M. (2000). Publication and related
bias in meta-analysis: Power of statistical tests and prevalence in the
literature. Journal of Clinical Epidemiology, 53, 1119-1129.
Storm, L., & Ertel, S. (2001). Does psi exist? Comments on Milton and
Wiseman's (1999) meta-analysis of ganzfeld research. Psychological
Bulletin, 127, 424-433.
Targ, R., & Puthoff, H. E. (1977). Mind-reach. Scientists Look at Psychic
Ability. Delacorte Press.
Tart, C. T. (1976). Effects of immediate feedback on ESP performance.
Research in Parapsychology 1975, 80-82.
Taylor, G. Le M. (1890). Experimental comparison between chance and
thought-transference in correspondence of diagrams. Proceedings of
the Society for Psychical Research, 6, 398-405.
Thalbourne, M. A. (1995). Further studies of the measurement and
correlates of belief in the paranormal. Journal of the American Society
for Psychical Research, 89, 233-247.
Thompson, S. G., & Higgins, J. P. T. (2002). How should meta-regression
analyses be undertaken and interpreted? Statistics in Medicine, 21,
1559-1574.
Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-
analysis: A comparison of methods. Statistics in Medicine, 18, 2693-
2708.
Thouless, R. H. (1942). The present position of experimental research into
telepathy and related phenomena. Proceedings of the Society for
Psychical Research, 47, 1-19.
Thouless, R. H., & Wiesner, B. P. (1946). The psi processes in normal and
"paranormal" psychology. Proceedings of the Society for Psychical
Research, 48, 177-196.
Varvoglis, M. P., & McCarthy, D. (1986). Conscious-purposive focus and PK:
RNG activity in relation to awareness, task-orientation, and feedback.
Journal of the American Society for Psychical Research, 80, 1-29.
Watt, C. A. (1994). Meta-analysis of DMT-ESP studies and an experimental
investigation of perceptual defense/vigilance and extrasensory
perception. In E. W. Cook & D. L. Delanoy (Eds.), Research in
Parapsychology 1991 (pp. 64-68). Metuchen, NJ: Scarecrow Press.
White, R. A. (1991). The psiline database system. Exceptional Human
Experience, 9, 163-167.
Wilson, C. (1976). The Geller phenomenon. London: Aldus Books.
Appendix
Studies Used in the Meta-Analysis
André, E. (1972). Confirmation of PK action on electronic equipment.
Journal of Parapsychology, 36, 283-293.
Berger, R. E. (1986). Psi effects without real-time feedback using a
PsiLab ][ video game experiment. The Parapsychological Association
29th Annual Convention, 111-128.
Bierman, D. J., & Houtkooper, J. M. (1975). Exploratory PK tests with a
programmable high speed random number generator. European Journal
of Parapsychology, 1, 3-14.
Bierman, D. J., De Diana, I. P. F., & Houtkooper, J. M. (1976). Preliminary
report on the Amsterdam experiments with Matthew Manning.
European Journal of Parapsychology, 1, 6-16.
Bierman, D. J., & Noortje, V. T. (1977). The performance of healers in PK
tests with different RNG feedback algorithms. Research in
Parapsychology 1976, 131-133.
Bierman, D. J., & Weiner, D. H. (1980). A preliminary study of the effect of
data destruction on the influence of future observers. Journal of
Parapsychology, 44, 233-243.
Bierman, D. J., & Houtkooper, J. M. (1981). The potential observer effect or
the mystery of irreproducibility. European Journal of Parapsychology, 3,
345-371.
Bierman, D. J. (1987). Explorations of some theoretical frameworks using a
PK-test environment. Proceedings of Presented Papers: The
Parapsychological Association 30th Annual Convention, 33-40.
Bierman, D. J. (1988). Testing the IDS model with a gifted subject.
Theoretical Parapsychology, 6, 31-36.
Bierman, D. J., & Van Gelderen, W. J. M. (1994). Geomagnetic activity and
PK on a low and high trial-rate RNG. Proceedings of Presented Papers:
The Parapsychological Association 37th Annual Convention, 50-56.
Braud, L., & Braud, W. (1977). Psychokinetic effects upon a random event
generator under conditions of limited feedback to volunteers and
experimenter. Proceedings of Presented Papers: The Parapsychological
Association 20th Annual Convention.
Braud, W., & Kirk, J. (1977). Attempt to observe psychokinetic influences
upon a random event generator by person-fish teams. European Journal
of Parapsychology, 2, 228-237.
Braud, W. (1981). Psychokinesis experiments with infants and young
children. Research in Parapsychology 1980, 30-31.
Braud, W. G., & Hartgrove, J. (1976). Clairvoyance and psychokinesis in
transcendental meditators and matched control subjects: A preliminary
study. European Journal of Parapsychology, 1, 6-16.
Braud, W. G. (1978). Recent investigations of microdynamic
psychokinesis, with special emphasis on the roles of feedback, effort
and awareness. European Journal of Parapsychology, 2, 137-162.
Braud, W. G. (1983). Prolonged visualization practice and psychokinesis: A
pilot study (RB). Research in Parapsychology 1982, 187-189.
Breederveld, H. (1988). Towards reproducible experiments in
psychokinesis IV. Experiments with an electronic random number
generator. Theoretical Parapsychology, 6, 43-51.
Breederveld, H. (1989). The Michels experiments: An attempted
replication. Journal of the Society for Psychical Research, 55, 360-363.
Breederveld, H. (2001). De Optimal Stopping Strategie XL PK-
experimenten met een random number generator. [The optimal
stopping strategy XL. PK experiments with a random number
generator]. SRU-Bulletin, 13, 22-23.
Broughton, R. S., & Millar, B. (1977). A PK experiment with a covert
release-of-effort test. Research in Parapsychology 1976, 28-30.
Broughton, R. S. (1979). An experiment with the head of Jut. European
Journal of Parapsychology, 2, 337-357.
Broughton, R. S., Millar, B., & Johnson, M. (1981). An investigation into the
use of aversion therapy techniques for the operant control of PK
production in humans. European Journal of Parapsychology, 3, 317-344.
Broughton, R. S., & Higgins, C. A. (1994). An investigation of micro-PK and
geomagnetism. Proceedings of Presented Papers: The
Parapsychological Association 37th Annual Convention, 87-94.
Broughton, R. S., & Alexander, C. H. (1997). Destruction testing DAT.
Proceedings of Presented Papers: The Parapsychological Association
40th Annual Convention, 100-104.
Crandall, J. E. (1993). Effects of extrinsic motivation on PK performance
and its relations to state anxiety and extraversion. Proceedings of
Presented Papers: The Parapsychological Association 36th Annual
Convention, 372-377.
Curry, C. K. (1978). A modularized random number generator: Engineering
design and psychic experimentation. Unpublished senior thesis,
Department of Electrical Engineering and Computer Science, School of
Engineering/Applied Science, Princeton University.
Dalton, K. S. (1994). Remotely influenced ESP performance in a computer
task: A preliminary study. Proceedings of Presented Papers: The
Parapsychological Association 37th Annual Convention, 95-103.
Davis, J. W., & Morrison, M. D. (1978). A test of the Schmidt model's
prediction concerning multiple feedback in a PK test. Research in
Parapsychology 1977, 163-168.
Debes, J., & Morris, R. L. (1982). Comparison of striving and nonstriving
instructional sets in a PK study. Journal of Parapsychology, 46, 297-312.
Gerding, J. L. F., Wezelman, R., & Bierman, D. J. (1997). The Druten
disturbances - Exploratory RSPK research. Proceedings of Presented
Papers: The Parapsychological Association 40th Annual Convention,
146-161.
Giesler, P. V. (1985). Differential micro-PK effects among Afro-Brazilian
cultists: Three studies using trance-significant symbols as targets.
Journal of Parapsychology, 49, 329-366.
Gissurarson, L. R. (1986). RNG-PK microcomputer "games" overviewed: An
experiment with the videogame "PSI INVADERS". European Journal of
Parapsychology, 6, 199-215.
Gissurarson, L. R., & Morris, R. L. (1990). Volition and psychokinesis:
Attempts to enhance PK performance through the practice of imagery
strategies. Journal of Parapsychology, 54, 331-370.
Gissurarson, L. R. (1990). Some PK attitudes as determinants of PK
performance. European Journal of Parapsychology, 8, 112-122.
Gissurarson, L. R., & Morris, R. L. (1991). Examination of six
questionnaires as predictors of psychokinesis performance. Journal of
Parapsychology, 55, 119-145.
Haraldsson, E. (1970). Subject selection in a machine precognition test.
Journal of Parapsychology, 34, 182-191.
Heseltine, G. L. (1977). Electronic random number generator operation
associated with EEG activity. Journal of Parapsychology, 41, 103-118.
Heseltine, G. L., & Mayer-Oakes, S. A. (1978). Electronic random generator
operation and EEG activity: Further studies. Journal of Parapsychology,
42, 123-136.
Heseltine, G. L., & Kirk, J. (1980). Examination of a majority-vote
technique. Journal of Parapsychology, 44, 167-176.
Hill, S. (1977). PK effects by a single subject on a binary random number
generator based on electronic noise. Research in Parapsychology 1976,
26-28.
Honorton, C. (1971). Group PK performance with waking suggestions for
muscle tension/ relaxation and active/ passive concentration.
Proceedings of the Parapsychological Association, 8, 14-15.
Honorton, C. (1971). Automated forced-choice precognition tests with a
"sensitive". Journal of the American Society for Psychical Research, 65,
476-481.
Honorton, C., & Barksdale, W. (1972). PK performance with waking
suggestions for muscle tension vs. relaxation. Journal of the American
Society for Psychical Research, 66, 208-214.
Honorton, C., Ramsey, M., & Cabibbo, C. (1975). Experimenter effects in
ESP research. Journal of the American Society for Psychical Research,
69, 135-139.
Honorton, C., & May, E. C. (1976). Volitional control in a psychokinetic task
with auditory and visual feedback. Research in Parapsychology 1975,
90-91.
Honorton, C. (1977). Effects of meditation and feedback on psychokinetic
performance: A pilot study with an instructor of Transcendental
Meditation. Research in Parapsychology 1976, 95-97.
Honorton, C., & Tremmel, L. (1980). Psitrek: A preliminary effort toward
development of psi-conducive computer software. Research in
Parapsychology 1979, 159-161.
Honorton, C., Barker, P., & Sondow, N. (1983). Feedback and participant-
selection parameters in a computer RNG study (RB). Research in
Parapsychology 1982, 157-159.
Honorton, C. (1987). Precognition and real-time ESP performance in a
computer task with an exceptional subject. Journal of Parapsychology,
51, 291-320.
Houtkooper, J. M. (1976). Psychokinesis, clairvoyance and personality
factors. Proceedings of Presented Papers: The Parapsychological
Association 19th Annual Convention, 1-15.
Houtkooper, J. M. (1977). A study of repeated retroactive psychokinesis in
relation to direct and random PK effects. European Journal of
Parapsychology, 1, 1-20.
Houtkooper, J. M., Schienle, A., Vaitl, D., & Stark, R. (1999). Atmospheric
electromagnetism: An attempt at replicating the correlation between
natural sferics and ESP. Proceedings of Presented Papers: The
Parapsychological Association 42nd Annual Convention, 123-135.
Jacobs, J. C., Michels, J. A. G., Millar, B., & Millar-De Bruyne, M.-L. F. L.
(1987). Building a PK trap: The adaptive trial speed method.
Proceedings of Presented Papers: The Parapsychological Association
30th Annual Convention, 348-370.
Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J.
(1999). ArtREG: A random event experiment utilizing picture-preference
feedback. (Technical Note PEAR 99003). Princeton NJ 08544: PEAR
Laboratory, School of Engineering and Applied Science.
Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H.,
Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., &
Walter, B. (2000). Mind/Machine interaction consortium: PortREG
replication experiments. Journal of Scientific Exploration, 14, 499-555.
Kelly, E. F., & Kanthamani, B. K. (1972). A subject's efforts toward
voluntary control. Journal of Parapsychology, 36, 185-197.
Kelly, E. F., & Lenz, J. (1976). EEG correlates of trial-by-trial performance in
a two-choice clairvoyance task: A preliminary study. Research in
Parapsychology 1975, 22-25.
Kugel, W., Bauer, B., & Bock, W. (1979). Versuchsreihe Telbin.
[Experimental series Telbin]. (Arbeitsbericht 7). Berlin: Technische
Universität Berlin.
Kugel, W. (1999). Amplifying precognition: Four experiments with roulette.
Proceedings of Presented Papers: The Parapsychological Association
42nd Annual Convention, 136-146.
Levi, A. (1979). The influence of imagery and feedback on PK effects.
Journal of Parapsychology, 43, 275-289.
Lignon, Y., & Faton, L. (1977). Le facteur psi s'exerce sur un appareil électronique. [The psi factor affects an electronic apparatus]. Psi-Réalité, 54-62.
Lounds, P. (1993). The influence of psychokinesis on the randomly-
generated order of emotive and non-emotive slides. Journal of the
Society for Psychical Research, 59, 187-193.
Lucadou, W. v. (1991). Locating Psi-bursts - correlations between
psychological characteristics of observers and observed quantum
physical fluctuations. Proceedings of Presented Papers: The
Parapsychological Association 34th Annual Convention, 265-281.
Mabilleau, P. (1982). Electronic dice: A new way for experimentation in
"psiology". Le Bulletin PSILOG, 2, 13-14.
Matas, F., & Pantas, L. (1971). A PK experiment comparing meditating vs.
nonmeditating subjects. Proceedings of the Parapsychological
Association, 8, 12-13.
May, E. C., & Honorton, C. (1976). A dynamic PK experiment with Ingo
Swann. Research in Parapsychology 1975, 88-89.
Michels, J. A. G. (1987). Consistent high scoring in self-test PK experiments
using a stopping strategy. Journal of the Society for Psychical Research,
54, 119-129.
Millar, B., & Broughton, R. (1976). A preliminary PK experiment with a novel computer-linked high speed random number generator. Research in Parapsychology 1975, 83-84.
Millar, B., & Mackenzie, P. (1977). A test of intentional vs unintentional PK.
Research in Parapsychology 1976, 32-35.
Millar, B., & Broughton, R. S. (1977). An investigation of the psi
enhancement paradigm of Schmidt. Research in Parapsychology 1976,
23-25.
Millar, B. (1983). Random bit generator experimenten. Millar-replicatie.
[Random bit generator experiments. Millar's replication]. SRU-Bulletin,
8, 119-123.
Morris, R., Nanko, M., & Phillips, D. (1978). Intentional observer influence
upon measurements of a quantum mechanical system: A comparison of
two imagery strategies. Program and Presented Papers:
Parapsychological Association 21st Annual Convention, 266-275.
Morris, R. L., & Reilly, V. (1980). A failure to obtain results with goal-
oriented imagery PK and a random event generator with varying hit
probability. Research in Parapsychology 1979, 166-167.
Morris, R. L., & Harnaday, J. (1981). An attempt to employ mental practice
to facilitate PK. Research in Parapsychology 1980, 103-104.
Morris, R. L., & Garcia-Noriega, C. (1982). Variations in feedback
characteristics and PK success. Research in Parapsychology 1981, 138-
140.
Morrison, M. D., & Davis, J. W. (1978). PK with immediate, delayed, and
multiple feedback: A test of the Schmidt model's predictions. Program
and Presented Papers: Parapsychological Association 21st Annual
Convention, 97-117.
Nanko, M. (1981). Use of goal-oriented imagery strategy on a
psychokinetic task with "selected" subjects. Journal of the Southern
California Society for Psychical Research, 2, 1-5.
Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting
anomalies experiments. (Technical Note PEAR 94003). Princeton
University, Princeton, NJ 08544: Princeton Engineering Anomalies
Research, School of Engineering/ Applied Science.
Palmer, J., & Perlstrom, J. R. (1986). Random event generator PK in
relation to task instructions: A case of "motivated" error? The
Parapsychological Association 29th Annual Convention, 131-147.
Palmer, J. (1995). External psi influence of ESP task performance. Proceedings of Presented Papers: The Parapsychological Association 38th Annual Convention, 270-282.
Palmer, J., & Broughton, R. S. (1995). Performance in a computer task with an exceptional subject: A failure to replicate. Proceedings of Presented Papers: The Parapsychological Association 38th Annual Convention, 289-294.
Palmer, J. (1998). ESP and REG PK with Sean Harribance: Three new
studies. Proceedings of Presented Papers: The Parapsychological
Association 41st Annual Convention, 124-134.
Pantas, L. (1971). PK scoring under preferred and nonpreferred conditions.
Proceedings of the Parapsychological Association, 8, 47-49.
Pare, R. (1983). Random bit generator experimenten. Pare-replicatie.
[Random bit generator experiments. Pare's replication]. SRU-Bulletin, 8,
123-128.
Randall, J. L. (1974). An extended series of ESP and PK tests with three
English schoolboys. Journal of the Society for Psychical Research, 47,
485-494.
Reinsel, R. (1987). PK performance as a function of prior stage of sleep
and time of night. Proceedings of Presented Papers: The
Parapsychological Association 30th Annual Convention, 332-347.
Schechter, E. I., Barker, P., & Varvoglis, M. P. (1983). A preliminary study
with a PK game involving distraction from the Psi task (RB). Research in
Parapsychology 1982, 152-154.
Schechter, E. I., Honorton, C., Barker, P., & Varvoglis, M. P. (1984).
Relationships between participant traits and scores on two computer-
controlled RNG-PK games. Research in Parapsychology 1983, 32-33.
Schmeidler, G. R., & Borchardt, R. (1981). Psi-scores with random and
pseudo-random targets. Research in Parapsychology 1980, 45-47.
Schmidt, H. (1969). Anomalous prediction of quantum processes by some
human subjects. (D1-82-0821). Plasma Physics Laboratory.
Schmidt, H. (1970). A PK test with electronic equipment. Journal of
Parapsychology, 34, 175-181.
Schmidt, H., & Pantas, L. (1972). Psi tests with internally different
machines. Journal of Parapsychology, 36, 222-232.
Schmidt, H. (1972). An attempt to increase the efficiency of PK testing by an increase in the generation speed. Proceedings of Presented Papers: Parapsychological Association 15th Annual Convention.
Schmidt, H. (1973). PK tests with a high-speed random number generator.
Journal of Parapsychology, 37, 105-118.
Schmidt, H. (1974a). PK effect on random time intervals. Research in Parapsychology 1973, 46-48.
Schmidt, H. (1974b). Comparison of PK action on two different random number generators. Journal of Parapsychology, 38, 47-55.
Schmidt, H. (1975). PK experiment with repeated, time displaced
feedback. Proceedings of Presented Papers: The Parapsychological
Association 18th Annual Convention.
Schmidt, H. (1976). PK effects on pre-recorded targets. Journal of the
American Society for Psychical Research, 70, 267-291.
Schmidt, H., & Terry, J. (1977). Search for a relationship between
brainwaves and PK performance. Research in Parapsychology 1976, 30-
32.
Schmidt, H. (1978). Use of stroboscopic light as rewarding feedback in a
PK test with pre-recorded and momentarily generated random events.
Program and Presented Papers: Parapsychological Association 21st
Annual Convention, 85-96.
Schmidt, H. (1990). Correlation between mental processes and external
random events. Journal of Scientific Exploration, 4, 233-241.
Schouten, S. A. (1977). Testing some implications of a PK observational
theory. European Journal of Parapsychology, 1, 21-31.
Stanford, R. G. (1981). "Associative activation of the unconscious" and
"visualization" as methods for influencing the PK target: A second study.
Journal of the American Society for Psychical Research, 75, 229-240.
Stanford, R. G., & Kottoor, T. M. (1985). Disruption of attention and PK-task performance. Proceedings of Presented Papers: The Parapsychological Association 28th Annual Convention, 117-132.
Stewart, W. C. (1959). Three new ESP test machines and some preliminary results. Journal of Parapsychology, 23, 44-48.
Talbert, R., & Debes, J. (1981). Time-displacement psychokinetic effects
on a random number generator using varying amounts of feedback.
Program and Presented Papers: Parapsychological Association 24th
Annual Convention.
Tedder, W., & Braud, W. (1981). Long-distance, nocturnal psychokinesis.
Research in Parapsychology 1980, 100-101.
Tedder, W. (1984). Computer-based long distance ESP: An exploratory
examination. Research in Parapsychology 1983, 100-101.
Thouless, R. H. (1971). Experiments on psi self-training with Dr. Schmidt's
pre-cognitive apparatus. Journal of the Society for Psychical Research,
46, 15-21.
Tremmel, L., & Honorton, C. (1980). Directional PK effects with a
computer-based random generator system: A preliminary study.
Research in Parapsychology 1979, 69-71.
Varvoglis, M. P. (1988). A "psychic contest" using a computer-RNG task in
a non-laboratory setting. Proceedings of Presented Papers: The
Parapsychological Association 31st Annual Convention, 36-52.
Verbraak, A. (1981). Onafhankelijke random bit generator experimenten -
Verbraak-replicatie. [Independent random bit generator experiments -
Verbraak's replication]. SRU-Bulletin, 6, 134-139.
Weiner, D. H., & Bierman, D. J. (1979). An observer effect in data analysis?
Research in Parapsychology 1978, 57-58.
Winnett, R. (1977). Effects of meditation and feedback on psychokinetic
performance: Results with practitioners of Ajapa Yoga. Research in
Parapsychology 1976, 97-98.
Table 1
Main Results of Radin & Ferrari's (1991) Dice Meta-Analysis

                                              N      π̄        SE        z
Dice-casts "Influenced"
  All studies                                148   0.50610   .00031   19.68***
  All studies, quality weighted              148   0.50362   .00036   10.18***
  Balanced studies                            69   0.50431   .00055    7.83***
  Balanced studies, homogeneous               59   0.50158   .00061    2.60**
  Balanced studies, homogeneous,
    quality weighted                          59   0.50147   .00063    2.33*
Dice-casts Control
  All studies                                 31   0.50047   .00128    0.36

Note. The z value is based on the null value of π̄ = .5.
* p < .05. ** p < .01. *** p < .001
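The z values above follow directly from the tabled mean effect sizes and standard errors, as the table note indicates. The following minimal Python sketch (illustrative only; values taken from the "All studies" row) shows the computation:

```python
# Minimal sketch: recomputing a z statistic from Table 1 out of the
# mean hit proportion and its standard error, under the null value
# pi = .5 stated in the table note.

def z_score(mean_pi: float, se: float, null: float = 0.5) -> float:
    """z = (mean proportion - null proportion) / standard error."""
    return (mean_pi - null) / se

# "All studies" row: N = 148, mean pi = 0.50610, SE = .00031
print(round(z_score(0.50610, 0.00031), 2))  # ~19.68, matching the table
```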
Table 2
Previous PK Meta-Analyses - Total Samples

                                        N     π̄       SE       z        Mean
Dice
  1991 Meta-analysis                   148  .50822  .00041  20.23***  .51105
RNG
  1989 First meta-analysis             597  .50018  .00003   6.53***  .50414
  1997 First MA without PEAR data      339  .50061  .00009   6.41***  .50701
  2000 Second meta-analysis            515  .50005  .00001   3.81***  .50568

Note. Mean = the unweighted averaged effect size of studies.
*** p < .001
Table 3
Basic Study Characteristics - Intentional Studies

Source of studies (n):
  Journal                     276
  Conference proceeding        63
  Report                       16
  Thesis/Dissertation           2

Year of publication (n):
  1960                          2
  1961 - 1970                  14
  1971 - 1980                 202
  1981 - 1990                 101
  1991 - 2000                  37
  > 2000                        1

Number of participants (n):
  1                            91
  >1 - 10                     101
  >10 - 20                     52
  >20 - 30                     33
  >30 - 40                     13
  >40 - 50                     12
  >50 - 60                     12
  >60 - 70                      2
  >70 - 80                      3
  >80 - 90                      2
  >90 - 100                     2
  >100                          7

Sample size (bit) (n):
  >10 - 100                             8
  >100 - 1,000                         74
  >1,000 - 10,000                     123
  >10,000 - 100,000                    91
  >100,000 - 1,000,000                 42
  >1,000,000 - 10,000,000               7
  >10,000,000 - 100,000,000             8
  >100,000,000 - 1,000,000,000          3
  >1,000,000,000 - 10,000,000,000       1
Table 4
Overall, Trimmed Sample and Moderator Variables Summary Statistics

Variable and class        n      π̄        SE        z        M bit      Mdn bit  M py  M sub.  Mdn sub.   χ²         z (rnd)

Intentional overall      357  .500029  .000011    2.73**     6095359      6400   1979    20      10     1442.89***    .06

Trimmed
  Homogeneous            287  .500017  .000011    1.54       6824190      6400   1981    22      10      323.96
  Heterogeneous           70  .500136  .000034    4.02***    3107153      6593   1976    14       4     1107.88***    .07

Sample size (bit)
  (Q1) Smallest           89  .523500  .002655    8.85***        446       320   1979    23      10      318.92***    .58
  (Q2) Small              91  .505519  .000914    6.03***       3683      4000   1978    19      10      269.52***    .45
  (Q3) Large              91  .503249  .000421    7.71***      16726     15360   1979    14      10      419.04***    .42
  (Q4) Largest            86  .500026  .000011    2.43*     25280771    200000   1983    25      11      262.79***    .13

Year of publication
  (Q1) Oldest             90  .505356  .000394   13.59***      18553      4364   1971    24       6      636.32***    .54
  (Q2) Old                97  .500178  .000147    1.21        120378      8000   1977    16      10      255.24***    .09
  (Q3) New                85  .500292  .000247    1.18         52993      7500   1981    16      12      122.17**     .18
  (Q4) Newest             85  .500024  .000011    2.23*     25390500     18000   1991    25       8      244.02***    .13

Number of participants
  (Q1) One                91  .500453  .000168    2.70**       116550      5000  1980     1       1      632.14***    .09
  (Q2) Few               101  .500043  .000045     .96        1232867      4800  1978     7       8      330.26***    .04
  (Q3) Several            57  .499986  .000039    -.35        2863035     12288  1980    16      16      168.60***   -.02
  (Q4) Many               81  .500024  .000012    2.10*      22955969     18000  1982    61      40      134.18***    .18
  Unknown                 27  .500627  .000117    5.35***      677453      7500  1979                    143.80***    .20

Participants
  Selected                55  .500628  .000185    3.40***      134727      5000  1977     4       1      579.62***    .08
  Unselected             244  .500027  .000011    2.53*       8881236     10321  1980    27      16      675.64***    .09
  Other                   58  .500237  .000408     .58          27790      1280  1980     8      10      176.85***    .05

Study status
  Formal                 192  .500027  .000011    2.46*      10744544      7727  1982    26      10      616.39***    .08
  Pilot                  152  .500061  .000049    1.26         698876      6400  1978    14       6      801.66***    .02
  Other                   13  .500202  .000191    1.06         527819      5000  1977     9       6       23.55*      .15

Feedback
  Visual                 221  .500017  .000012    1.49        8370435      5000  1979    19      10      814.63***    .04
  Auditory                35  .502357  .000382    6.18***       50385     17000  1976    14       6      254.83***    .39
  Other                  101  .500085  .000028    3.06**      3212016     10000  1983    24      10      331.14***    .16

Random sources
  Noise                  194  .500028  .000011    2.60**     11202736     11456  1983    15      10      852.46***    .07
  Radioactive             96  .502357  .000544    4.33***       10442      1600  1973    23       6      459.53***    .22
  Other                   67  .500766  .000389    1.97*         25523     10800  1979    30      11      109.01***    .29

Note. M bit/Mdn bit = mean/median sample size in bits; M py = mean year of publication; M sub./Mdn sub. = mean/median number of participants; z (rnd) = z-score random model.
* p < .05. ** p < .01. *** p < .001
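The pooled effect sizes, the z tests against the null value π = .5, and the χ² heterogeneity statistics above are standard inverse-variance (fixed-effect) quantities. A minimal Python sketch with toy data follows; the function and variable names are illustrative, not the authors' code:

```python
# Illustrative sketch (not the authors' code): fixed-effect pooled
# proportion, its z test against pi = .5, and Cochran's Q heterogeneity
# statistic (distributed as chi-square with k - 1 degrees of freedom).
import math

def fixed_effect_summary(pis, ses):
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    pooled = sum(w * p for w, p in zip(weights, pis)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    z = (pooled - 0.5) / se_pooled                   # test against null pi = .5
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, pis))
    return pooled, se_pooled, z, q

# toy data: three hypothetical studies
pis = [0.520, 0.501, 0.4995]
ses = [0.010, 0.002, 0.0005]
print(fixed_effect_summary(pis, ses))
```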
Table 5
Safeguard Variables Summary Statistics

Variable and class        n      π̄        SE        z        M bit      Mdn bit  M py  M sub.  Mdn sub.   χ²         z (rnd)

RNG control
  Yes (2)                246  .500031  .000011    2.82**     8448064     10000   1981    18      10      866.46***    .08
  Earlier (1)              7  .499996  .000052    -.08      13471208      1000   1982     3       2      286.75***    .00
  No (0)                 104  .499979  .000377    -.06         33856      4048   1977    26      10      289.22***    .00

All data reported
  Yes (2)                289  .500028  .000011    2.58**     7477090      6400   1980    21      10     1352.94***    .06
  Unclear (1)             11  .501074  .000537    2.00*        80726     37000   1976    30      10       16.75
  No (0)                  57  .500204  .000134    1.52        250459      7200   1978    18      10       67.71

Split of data
  Preplanned (2)         233  .500064  .000043    1.48        583819      6400   1980    19      10      734.71***    .05
  Unclear (1)             45  .500026  .000011    2.32*     45275647     25600   1983    14       8      173.27***    .14
  Not preplanned (0)      79  .501111  .000314    3.53***      33029      4400   1977    29      10      522.33***    .14

Safeguard sum-score
  Sum = 6 (highest)      138  .500285  .000098    2.90**      187695      6144   1982    21      10      455.05***    .13
  Sum = 5                 47  .500025  .000011    2.27*     45373113     48000   1983    18      10      210.59***    .13
  Sum = 4                104  .500178  .000137    1.30        143255      6400   1979    12       6      391.64***    .07
  Sum = 3                 46  .500303  .000339     .89         55091      1600   1976    43      17      105.72***    .09
  Sum = 2                  8  .517376  .002755    6.31***       4185      1472   1978     2       1      220.96***    .36
  Sum = 1                  4  .500000  .026352     .00           120       120   1977    10      10         .00
  Sum = 0 (lowest)        10  .503252  .001344    2.42*        13840     16000   1975    28      10        4.75

Note. z (rnd) = z-score random model; column abbreviations as in Table 4.
* p < .05. ** p < .01. *** p < .001
Table 6
Summary of Weighted Stepwise Linear Meta-Regression Analysis for Variables Predicting Effect Size of RNG Studies

Step and variable          B          SE B       β        t         R²

Step 1
  Year of publication   -.000025   .000007    -.180    -3.45***    .032
Step 2
  Year of publication   -.000021   .000007    -.153    -2.88**
  Auditory feedback      .001856   .000770     .128     2.41*      .048

* p < .05. ** p < .01. *** p < .001
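Table 6 summarizes a weighted stepwise linear meta-regression of effect size on candidate moderators. A minimal sketch of a single weighted least-squares step, assuming inverse-variance weights (the weighting scheme is an assumption) and simulated data:

```python
# Minimal sketch (assumptions: inverse-variance weights; effect size
# regressed on publication year): one step of a weighted linear
# meta-regression, solved as weighted least squares with numpy.
import numpy as np

def weighted_regression(y, x, se):
    w = np.sqrt(1.0 / se ** 2)               # square roots of inverse-variance weights
    X = np.column_stack([np.ones_like(x), x])
    # solve the weighted least-squares problem ||diag(w) (y - Xb)||^2
    b, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return b                                  # (intercept, slope B for the predictor)

rng = np.random.default_rng(0)
year = rng.uniform(1969, 2001, size=100)
se = rng.uniform(0.0005, 0.01, size=100)
pi = 0.55 - 0.000025 * year + rng.normal(0.0, se)  # toy data with a small negative trend
print(weighted_regression(pi, year, se))
```

In a forward-stepwise procedure, the predictor producing the largest significant improvement in weighted fit is entered at each step, which is how the two steps of Table 6 arise.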
Table 7
Potential Sources of the Small-Study Effect

True heterogeneity
  Different intensity/quality
  Different participants
  Different feedback
  Other moderator(s)
Data irregularities
  Poor methodological design
  Effect intrinsically irreplicable
  Inadequate analysis
  Fraud
Selection biases
  Biased inclusion criteria
  Publication bias
Chance

Note. Adapted from Sterne, Egger, & Smith (2001, p. 193).
Table 8
Stepped Weight Function Monte Carlo Simulation of Publication Bias

                     n      π̄         SE        z        Stud    zΔ      χ²

Overall             357  0.500027  0.000021   2.48*      1453   0.36   592.19***

Sample size
  (Q1) Smallest      89  0.514658  0.004908   5.80***     364   1.58   116.52*
  (Q2) Small         91  0.505118  0.001692   5.92***     367   0.21   115.21*
  (Q3) Large         91  0.502462  0.000794   6.07***     372   0.88   113.27*
  (Q4) Largest       86  0.500024  0.000021   2.22*       350   0.08   138.64***

Note. Stud = number of (simulated) studies which did not get published; zΔ = difference between the effect size of the simulated and the experimental data.
* p < .05. ** p < .01. *** p < .001
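The stepped-weight-function simulation is only loosely specified; the sketch below shows one plausible reading, in which studies are generated under the null hypothesis, significant results are always "published," and nonsignificant results are published with a fixed probability. All parameter values (the survival probability p_ns, the study-size distribution) are assumptions made for illustration:

```python
# Hedged sketch of a stepped-weight-function publication-bias
# simulation. Assumptions: true hit probability pi = .5, log-uniform
# study sizes, and a step function in which significant results
# (|z| > 1.96) are always "published" while nonsignificant results
# survive with probability p_ns.
import math
import random

def simulate(n_studies=357, p_ns=0.25, seed=1):
    random.seed(seed)
    published = []
    file_drawer = 0                               # unpublished studies
    while len(published) < n_studies:
        n_bits = int(10 ** random.uniform(2, 7))  # study size in bits (assumption)
        se = math.sqrt(0.25 / n_bits)             # SE of a proportion at pi = .5
        pi = 0.5 + random.gauss(0.0, se)          # normal approximation to the binomial
        z = (pi - 0.5) / se
        if abs(z) > 1.96 or random.random() < p_ns:
            published.append((pi, n_bits))
        else:
            file_drawer += 1
    mean_pi = sum(p for p, _ in published) / n_studies
    return mean_pi, file_drawer

print(simulate())  # (unweighted mean effect size, simulated file-drawer count)
```

Because mainly significant small studies survive the selection step, the published small studies show inflated effect sizes, which is how a simulation of this kind can reproduce a small-study effect and extreme heterogeneity in the absence of any true effect.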
Figure 1
Funnel Plot: Intentional Studies

[Figure: funnel plot of effect size (π) on the x-axis (0.35 to 0.65) against sample size (number of bits) on a logarithmic y-axis (10 to 10,000,000,000).]

Note. Three very extreme values (π > .70; n < 400) are omitted from the figure for the sake of better representation of all other values.
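A funnel plot in this format can be redrawn as follows. The sketch uses simulated null data rather than the study data, with matplotlib settings chosen to match the axes described above:

```python
# Sketch of a funnel plot in the format of Figure 1: per-study effect
# size (pi) on the x-axis against sample size in bits on a logarithmic
# y-axis. The data are simulated under the null, for illustration only.
import math
import random
import matplotlib.pyplot as plt

random.seed(2)
sizes = [int(10 ** random.uniform(1.5, 9)) for _ in range(357)]
pis = [0.5 + random.gauss(0.0, math.sqrt(0.25 / n)) for n in sizes]

plt.scatter(pis, sizes, s=8)
plt.yscale("log")                              # 10 up to 10,000,000,000
plt.xlim(0.35, 0.65)
plt.axvline(0.5, linestyle="--", linewidth=1)  # null value pi = .5
plt.xlabel("Effect size (pi)")
plt.ylabel("Sample size (number of bits)")
plt.title("Funnel plot, simulated studies")
plt.show()
```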