Examining Psychokinesis:
The Interaction of Human Intention with Random Number
Generators.
A Meta-Analysis
Submitted: August 19, 2004
Acknowledgments
(removed for blind review)
Radin comments inserted with change-tracking (rendered throughout in square brackets as [Radin: ...])
Abstract
Séance-room phenomena and apports have fascinated mankind for decades. Experimental research has reduced these phenomena to attempts to influence (i) the fall of dice and, later, (ii) the output of random number generators (RNGs). [Radin: This overlooks dozens of other PK experiments. It also overlooks the fact that most of the impetus for developing RNG experiments was not to simulate séance phenomena, but to test quantum observational theories, to tighten methodologies, and to reduce the need for special subjects.] The meta-analysis presented here combined 357 studies that assessed whether human intention could correlate with RNG output. The studies yielded a significant but very small effect. Study size was strongly and inversely related to effect size [Radin: Why the authors prefer to ignore the voluminous literature on this issue, I cannot fathom.]; this finding was consistent across all examined moderator and safeguard variables. A [Radin: not well specified] Monte Carlo simulation revealed that the small effect size, the relation between study size and effect size, and the extreme effect size heterogeneity might all be a result of publication bias.
The idea that individuals can influence inanimate objects by the power of their own minds is a relatively recent concept. [Radin: Huh? Isn't sympathetic magic one of the most ancient beliefs?] During the 1970s, Uri Geller reawakened mass interest in this putative ability through his demonstrations of spoon bending using his alleged psychic powers (Targ & Puthoff, 1977; Wilson, 1976), and he lays claim to this ability even now (e.g., Geller, 1998). Belief in this phenomenon is widespread. In a 1991 poll (Gallup & Newport, 1991), 17 percent of American adults professed belief in "the ability of the mind to move or bend objects using just mental energy" (p. 138), and seven percent even claimed that they had "seen somebody moving or bending an object using mental energy" (p. 141).
Unknown to most academics, a large amount of experimental data has
accrued testing the hypothesis of a direct connection between the human
mind and the physical world. It is one of the very few lines of research
where replication is the main and central target [Radin: initially perhaps, but surely not for the past 20 years], a commitment that some methodologists wish experimental psychologists in general would adopt (e.g., Cohen, 1994; Rosenthal & Rosnow, 1991). This article will trace the
development of the empirical evaluation of this alleged phenomenon and
will present a new meta-analysis of a large set of studies examining the
interaction between human intention and random number generators.
Psi research
Psi phenomena (Thouless, 1942; Thouless & Wiesner, 1946) can be split
into two main categories. Psychokinesis (PK) is the common label for the
apparent ability of humans to affect objects solely by the power of the
mind. Extra-sensory-perception (ESP), on the other hand, refers to the
apparent ability of humans to acquire information without the mediation
of the recognized senses or logical inference. Many researchers believe
that PK and ESP phenomena are idiosyncratic [Radin: from context I'm guessing they mean something like "identical," not "idiosyncratic"] (e.g., Pratt, 1949;
J. B. Rhine, 1946; Schmeidler, 1982; Stanford, 1978; Thouless & Wiesner,
1946). Nevertheless, the two phenomena have been treated very
differently right from the start of their scientific examination. For instance,
whereas J. B. Rhine and his colleagues at the Psychology Department at
Duke University published the results of their first ESP card experiments
right after they had been conducted (Pratt, 1937; Price & Pegram, 1937;
J.B. Rhine, 1934, 1936, 1937; L. E. Rhine, 1937), they withheld the results
of their first PK experiments for nine years (L. E. Rhine & J. B. Rhine, 1943)
even though they had been carried out at the same time as the ESP
experiments: Rhine and his colleagues did not want to undermine the
scientific credibility that they had gained through their pioneering
monograph on ESP (Pratt, J. B. Rhine, Smith, Stuart & Greenwood, 1940).
When L. E. Rhine & J. B. Rhine (1943) went public with their early dice
experiments, the evidence was based not only on above-chance results,
but primarily on a particular scoring pattern. In those early PK
experiments, the participants' task was to obtain combinations of given
die faces. The researchers discovered a decline of "success" during longer
series of experiments, a pattern suggestive of mental fatigue (Reeves &
Rhine, 1943; J. B. Rhine & Humphrey, 1944, 1945). This psychologically
plausible pattern of decline seemed to eliminate several counter-
hypotheses for the positive results obtained, such as die bias or trickery,
because they would not lead to such a systematic decline. However, as
experimental evidence grew, the decline pattern lost its impact in the
chain of evidence.
Verifying psi
Today, in order to verify the existence of psi phenomena, one of two
meta-analytic approaches is generally undertaken - either the "proof-
oriented" or the "process-oriented" meta-analytical approach. The proof-
oriented meta-analytical approach tries to verify the existence of psi
phenomena by establishing an overall effect. The process-oriented meta-
analytical approach tries to verify the existence of psi by establishing a
connection between results and moderator variables.
Alleged [Radin: probably the intended meaning here and elsewhere is "potential," since the legalistic term "alleged" is inappropriate] moderators
of PK, such as the distance between the participant and the target, and
various psychological variables, have never been investigated as
systematically as alleged moderators of ESP. So far, there have not been
any meta-analyses of PK moderators and the three main literature reviews
of PK moderators (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991;
Schmeidler, 1977) have come up with inconclusive results. On the other
hand, the three meta-analyses on ESP moderators established significant
correlations between ESP and extraversion (Honorton, Ferrari & Bem,
1998), ESP and belief in ESP (Lawrence, 1998), and ESP and
defensiveness (Watt, 1994). The imbalance between systematic reviews
of PK and ESP moderators reflects the general disparity between the
experimental investigations of the two categories of psi. From the very
beginning of experimental investigation into psi, researchers have focused
on ESP.
The imbalance between research in ESP and PK is also evident from the
proof-oriented meta-analytical approach. Only three (Radin & Ferrari,
1991; Radin & Nelson, 1989, 2002) of the 13 (Bem & Honorton, 1994;
Honorton, 1985; Honorton & Ferrari, 1989; Milton, 1993, 1997; Milton &
Wiseman, 1999a, 1999b; Radin & Ferrari, 1991; Radin & Nelson, 1989,
2002; Stanford & Stein, 1994; Steinkamp, Milton & Morris, 1998; Storm &
Ertel, 2001) meta-analyses on psi data address research on PK. Only two of the 13 provide no evidence for psi (Milton & Wiseman, 1999a, 1999b). [Radin: The point of discussing/claiming an "imbalance between research in ESP and PK" is not clear. It also may be incorrect in more recent years, depending on how one counts RV and Ganzfeld, neither of which are classical ESP, and both of which have a relatively small number of datapoints. In any case, if this issue is to be pursued, ESP should be defined -- the next section shows evidence of confounding very different approaches, e.g., experimental research, social observation, and anecdote.]
Psychology and psi
Psychological approaches to psi have also almost exclusively focused
on ESP. For example, there is a large amount of research [Radin: I disagree: certainly there is ample rhetoric, but systematic research, no] supporting
the hypothesis that alleged ESP experiences are the result of delusions
and misinterpretations (e.g., Alcock, 1981; Blackmore, 1992; Persinger,
2001). Personality-oriented research established connections between
belief in ESP and several personality variables (Irwin, 1993; see also,
Dudley, 2000; McGarry & Newberry, 1981; Musch & Ehrenberg, 2002).
Experience-oriented approaches to paranormal beliefs, which stress the
connection between paranormal belief and paranormal experiences (e.g.,
Alcock, 1981; Blackmore, 1992; Schouten, 1983) and media-oriented
approaches, which examine the connection between paranormal belief
and depictions of paranormal events in the media (e.g., Sparks, 1998;
Sparks, Hansen & Shah, 1994; Sparks, Nelson & Campbell, 1997) both
focus on ESP, although the paranormal belief scale most frequently used
in those studies also has some items on PK (Thalbourne, 1995).
The beginning of the experimental approach to Psychokinesis
Reports of séance room sessions during the late 19th century are filled
with claims of extraordinary movements of objects (e.g., Crookes, Horsley,
Bull, & Meyers, 1885), prompting some outstanding researchers of the
time to devote at least part of their career to determining whether the
alleged phenomena were real (e.g., Crookes, 1889; James, 1896; Richet,
1923). In these early days, as in psychology, case studies and field
investigations predominated. Hence, it is not surprising that in this era
experimental approaches and statistical analyses were used only
occasionally (e.g., Edgeworth, 1885, 1886; Fisher, 1924; Sanger, 1895;
Taylor, 1890). Even J.B. Rhine, the founder of the experimental study of
psi phenomena, abandoned case studies and field investigations as a
means of obtaining scientific proof only after he exposed several mediums
as frauds (e.g., J.B. Rhine & L.E. Rhine, 1927). However, after a period of
several years when he and his colleagues focused almost solely on ESP
research, their interest in PK was reawakened in 1937 when a gambler
visited the laboratory at Duke University and casually mentioned that
many gamblers believed they could mentally influence the outcome of a
throw of dice. This inspired J.B. Rhine to perform a series of informal
experiments using dice. Very soon experiments with dice became the
standard approach for investigating PK.
Difficulties in devising an appropriate methodology soon became
apparent and improvements in the experimental procedures were quickly
implemented. Standardized methods for throwing the dice were
developed. Dice-throwing machines were used to prevent participants
from manipulating their throw of the dice. Recording errors were
minimized by having experimenters either photograph the outcome of
each throw or having a second experimenter independently record the
results. Commercial, pipped dice were found to have sides of unequal
weight, with the sides with the larger number of excavated pips, such as
the 6, being lighter and hence more likely to land uppermost than lower
numbers, such as the 1. Consequently, studies required participants to
attempt to score seven with two dice, or used a balanced design in which
the target face alternated from one side of the die (e.g., 6) to the opposite
side (e.g., 1).
In 1962 Girden (1962a) published a comprehensive critique of dice
experiments in the Psychological Bulletin. Among other things, he
criticized the experimenters for pooling data as it suited them, and for
changing the experimental design once it appeared that results were not
going in a favorable direction. He concluded that the results from the
early experiments were largely due to the bias in the dice and that the
later, better-controlled studies were progressively tending toward non-
significant results. Although Murphy (1962) disagreed with Girden's
conclusion, he did concede that no "ideal" experiment had yet been
published that met all six quality criteria - namely one with (i) a
sufficiently large sample size; (ii) a standardized method of throwing the
dice; (iii) a balanced design; (iv) an objective record of the outcome of the
throw; (v) the hypothesis stated in advance; and (vi) with a prespecified
end point.
The controversy about the validity of the dice experiments continued
(e.g., Girden, 1962b; Girden & Girden, 1985; Rush, 1977). Over time,
experimental and statistical methods improved and, in 1991, Radin &
Ferrari undertook a meta-analysis of the dice experiments.
Dice Meta-Analysis
The dice meta-analysis comprised 148 experimental studies and 31
control studies published between 1935 and 1987. In the experimental
studies, 2,569 participants tried to mentally influence 2,592,817 die-casts.
In the control studies a total of 153,288 die-casts were made without any
attempt mentally to influence the dice. The experimental studies were
coded for various quality measures, including a number of those
mentioned by Girden (1962a). Table 1 provides the main meta-analytic
results.¹ [Radin: Given the importance of these calculations, it seems odd to relegate all this to a gigantic footnote.] The overall effect size, weighted by the inverse of the variance, is small but highly significant (π̄_t = .50610, z = 19.68). Radin & Ferrari calculated that approximately 18,000 null effect studies would be required to reduce the result to a non-significant level (Rosenthal, 1979). When the studies were weighted for quality, the effect size decreased considerably (z_Δ = 5.27, p = 1.34*10^-7), but was still significantly above chance.

Footnote 1: To compare the meta-analytic findings from the dice and previous RNG meta-analyses with those from our RNG meta-analysis, we converted all effect size measures to the proportion index π (Rosenthal & Rubin, 1989), which we use throughout the paper. This one-sample effect size ranges from 0 to 1, with .50 representing the null value (mean chance expectation, MCE). For two equally likely outcomes, e.g., when tossing a coin, π represents the proportion of "hits". For example, if heads win at a hit rate of 50%, the effect size π = .50 indicates that heads and tails came down equally often; if the hit rate for heads were 75%, the effect size would be π = .75. The most important property of π is that it converts all cases with more than two equally likely outcomes, e.g., tossing a die, to the proportional hit rate as if there were just two alternatives. In order to combine effect sizes from independent studies we used a fixed effect model, weighted by the inverse of the variance [Radin: how is the variance per study calculated?] (e.g., Shadish & Haddock, 1994). Because Dean Radin kindly provided us with the basic data files of the dice meta-analysis, we were able to compute the overall effect size π̄_o. However, although we were able to calculate the overall effect sizes π̄_o for all meta-analyses on the basis of the original data (see Table 2), the dice data provided did not enable us to carry out the specific subgroup analyses presented in the meta-analysis and summarized in Table 1. Consequently, in order to provide this information we transformed the published results, which used the effect size r = z/sqrt(n), using π̄_t = .5r̄ + .5. This transformation is accurate as long as the z-values of the individual studies are based on two equally likely alternatives (p = q = .5). However, the z-scores of most dice experiments are based on six equally likely alternatives (p = 1/6 and q = 5/6). [Radin: what about the position studies?] Consequently, π̄_o as computed from the original data and π̄_t as computed from the transformation formula diverge slightly, because r no longer remains within the limits of ±1. However, the difference between π̄_o and π̄_t is very small (< .05%) as long as the z-values are not extreme (z > 10, p < 1*10^-10). The difference is smaller the closer the value is to the null value of .50, which is the case for all effect sizes presented here. The difference between the two approaches can be seen when the results of the overall dice meta-analysis presented in Table 1 are compared with the results presented in Table 2. The difference between the two estimates is determined using z_Δ = (π̄_o - π̄_t) / sqrt(SE_o² + SE_t²). Although the difference is statistically significant (z_Δ = 4.12, p = 3.72*10^-5, two-tailed), the order of magnitude is the same.
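The conversions described in Footnote 1 can be made concrete with a short sketch. The proportion-index formula below follows the standard Rosenthal and Rubin (1989) definition (the footnote cites the index but does not spell out the formula), and the fail-safe estimate follows Rosenthal (1979); the function names are ours, not the authors'.

```python
import math

def proportion_index(hit_rate, k):
    """Rosenthal & Rubin's (1989) proportion index: converts a raw hit rate
    with k equally likely alternatives to pi, the hit rate 'as if' there
    were only two alternatives (pi = .50 at mean chance expectation)."""
    return hit_rate * (k - 1) / (1 + hit_rate * (k - 2))

def pi_from_z(z, n):
    """Transformation used for the published subgroup results:
    r = z / sqrt(n), then pi_t = .5 * r + .5 (accurate when the z-values
    rest on two equally likely alternatives, p = q = .5)."""
    return 0.5 * (z / math.sqrt(n)) + 0.5

def failsafe_n(z_scores, z_crit=1.645):
    """Rosenthal's (1979) file-drawer estimate: how many unpublished null
    studies would drag the combined result below z_crit."""
    return sum(z_scores) ** 2 / z_crit ** 2 - len(z_scores)

# A die study at chance (hit rate 1/6, k = 6) maps to pi = .50:
assert abs(proportion_index(1 / 6, 6) - 0.5) < 1e-12
```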
The authors found that there were indeed problems regarding die bias, with the effect size of the target face 6 being significantly larger than the effect size of any other target face. They concluded that this bias was sufficient to cast doubt on the whole database. They subsequently reduced their database to only those 69 studies that had correctly controlled for die bias (the "balanced database"). As shown in Table 1, the resultant overall effect size remained statistically highly significant. However, the effect sizes of the studies in the balanced database were statistically heterogeneous. When Radin & Ferrari trimmed the sample until the effect sizes in the balanced database became homogeneous, the effect size was reduced to only π̄_t = .50158, and it fell yet further to π̄_t = .50147 when the 59 studies were weighted for quality. Only 60 unpublished null effect studies (our calculation) [Radin: explain] are required to bring the balanced, homogeneous and quality-weighted studies down to a non-significant level. Ultimately, the dice meta-analysis did not advance
the controversy over the putative PK effect beyond the verdict of "not
proven", as mooted by Girden (1962b, p. 530) almost 30 years earlier.
Moreover, the meta-analysis has several limitations; Radin & Ferrari
neither examined the source(s) of heterogeneity in their meta-analysis,
nor addressed whether the strong correlation between effect size and
target face disappeared when they trimmed the 79 studies not using a
balanced design from the overall sample. The authors did not analyze
potential moderator variables and did not specify inclusion and exclusion
criteria. The studies included varied considerably regarding the type of
feedback given to participants. Some studies were even carried out totally
without feedback. The studies also differed substantially regarding the
participants who were recruited; some participants were psychic
claimants and others made no claims to having any "psychic powers" at
all. However, fundamentally as well as psychologically, the studies differed most in respect of the experimental instructions participants received and the time window within which participants had to try to influence the dice.
Although most experiments were real time, with the participant's task
being mentally to influence the dice as they were thrown, some
experiments were "precognition experiments" in which participants were
asked to predict what die face would land uppermost in a future die cast
thrown by someone other than the participant.
From Dice to Random Number Generator
With the arrival of computers, dice experiments were
slowly replaced by a new approach. Beloff & Evans (1961) were the first
experimenters to use radioactive decay as a source of randomness to be
influenced in a PK study. In the initial experiments, participants would try
mentally to slow down or speed up the rate of decay of a radioactive
source. The mean disintegration rate subjected to influence was then
compared with that of a control condition in which there was no attempt
at human influence.
Soon after this, experiments were devised in which the output from the
radioactive source was transformed into bits (1s or 0s) that could be
stored on a computer. These devices were known as random number
generators (RNGs). Later, RNGs used electronic noise or other truly
random origins as the source of randomness.
This line of PK research was, and continues to be, pursued by many
experimenters, but predominantly by Schmidt (e.g., 1969), and later by the Princeton Engineering Anomalies Research (PEAR) group at Princeton University (e.g., Jahn, Dunne & Nelson, 1980).
RNG Experiments
In a typical PK RNG-experiment, a participant presses a button to start
the accumulation of experimental RNG data. The participant's task is
mentally to influence the RNG to produce, say, more 1s than 0s for a
predefined number of bits. Participants are generally given real-time
feedback of their ongoing performance. The feedback can take a variety
of forms. For example, it may consist of the lighting of lamps "moving" in a clockwise or counterclockwise direction, or of clicks provided to the
right or left ear, depending on whether the RNG produces a 1 or a 0.
Today, feedback is generally implemented in software and is primarily
visual. If the RNG is based on a truly random source, it should generate 1s
and 0s an equal number of times. However, because small drifts cannot
be totally eliminated, experimental precautions such as the use of an XOR
filter, or a balanced experimental design are still required.
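As an illustration of the XOR precaution just mentioned, the sketch below shows the basic idea in Python. It is a schematic of the principle, not the circuitry of any particular RNG, and the alternating mask is one common choice.

```python
from itertools import cycle

def xor_filter(raw_bits):
    """XOR each raw bit with a deterministic alternating 0,1,0,1,... mask.
    A constant drift toward 1s (or 0s) in the physical source cancels out,
    because every second bit is inverted, while genuine randomness and the
    50/50 expectation are preserved."""
    return [b ^ m for b, m in zip(raw_bits, cycle([0, 1]))]
```

For example, a source emitting 60% ones still yields 50% ones in expectation after filtering, because half of its bits are inverted.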
The RNG studies have many advantages over the earlier dice
experiments, making it much easier to perform quality research with
much less effort. Computerization alone meant that many of Girden
(1962a) and Murphy's (1962) concerns about methodological quality could
be overcome. If we return to Murphy's list of six methodological criteria,
then (i) unlike with manual throws of dice, RNGs made it possible to conduct studies with large sample sizes [Radin: reflects a fundamental assumption for which there is no sound evidence: the bit is the sample] in a short space of time; (ii) the RNG was completely impersonal - unlike the
dice, it was not open to any classical (normal human) biasing of its output;
(iii) balanced designs were still necessary due to potential drifts in the
RNG; (iv) the output of the RNG could be stored automatically by
computer, thus eliminating recording errors that may have been present
in the dice studies; (v) like the dice studies, the hypotheses still had to be
formulated in advance; and (vi) like the dice studies, optional stopping
could still be a potential problem. Thus, RNG research entailed that, in
practical terms, researchers no longer had to be concerned about alleged
weak points (i), (ii) and (iv).
New Limits
From a methodological point of view, RNG experiments have many
advantages over the older dice studies. However, in respect of ecological
validity, the RNG studies have some failings. Originally, the PK effect to be assessed was macroscopic and visual. Experimentalists then reduced séance room PK, first to PK on dice, and then to PK on a random source in an RNG. [Radin: I don't think this historical sequence is correct. I doubt that dice and RNG tests were created to simulate macro effects.] But PK may not be reducible to a microscopic level (e.g., Braude, 1997). Moreover, a dice experiment is psychologically very different from an RNG study. Most people have played with dice, but few have had prior experience with RNGs. Additionally, an RNG is a technical gadget whose output must be computed before feedback can be presented. Nevertheless, the
ease with which PK data can be accumulated using an RNG has led to PK
RNG experiments forming a substantial proportion of available data. Three
related meta-analyses of these data have already been published.
Previous RNG Meta-Analyses
The first RNG meta-analysis was published by Radin & Nelson (1989) in
Foundations of Physics. This meta-analysis of 597 experimental studies
published between 1959 and 1987 found a small but significant effect of
π̄_o = .50018 (SE = .00003, z = 6.53, p < 1*10^-10).² The size of the effect did not diminish when the studies were weighted for quality or when they were trimmed by 101 studies to render the database homogeneous.

Footnote 2: The meta-analysis provided the overall effect size only in a figure (Fig. 3, p. 1506). Because its first author kindly provided us with the original data, we were able to calculate the overall effect size and the proper statistics.

The limitations of this meta-analysis are very similar to the limitations of the dice meta-analysis. The authors did not examine the source(s) of
heterogeneity and did not specify inclusion and exclusion criteria. [Radin: From our FoP paper: "Experiments selected for review examined the following hypothesis: The statistical output of an electronic RNG is correlated with observer intention in accordance with pre-specified instructions, as indicated by the directional shift of distribution parameters (usually the mean) from expected values." That seems pretty well specified to me!] Consequently, participants varied from humans to cockroaches, and the
feedback ranged from no feedback at all to the administration of an
electric shock. The meta-analysis included not only studies using true
RNGs, which are RNGs based on true random sources such as electronic
noise or radioactive decay, but also studies using pseudo RNGs (e.g.,
Radin, 1982), which are based on deterministic algorithms (with truly
random starting points). It might be argued that the authors simply took a
very inclusive approach. However, the authors did not discuss the
extreme variance in the distribution of the studies' z-scores [Radin: Again, from the FoP article: "Finally, following the practice of reviewers in the physical sciences (23,24), we deleted potential "outlier" studies to obtain a homogeneous distribution of effect sizes and to reduce the possibility that the calculated mean effect size may have been spuriously enlarged by extreme values." Certainly this is a rationale for what we did, although perhaps it is not a discussion.] and did not assess any potential moderator variables, which were also two weaknesses of the dice meta-analysis.
Nevertheless, this first RNG meta-analysis served to justify further
experimentation and analyses with the PK RNG approach.
Almost 10 years later, in his book aimed at a popular audience, Radin
(1997) recalculated the effect size of the first RNG meta-analysis claiming
that the "overall experimental effect, calculated per study, was about 51
percent" (p. 141). However, this newly calculated effect size is two orders
of magnitude larger than the effect size of the first RNG meta-analysis
(50.018%). [Radin: Doesn't this reflect the different aggregation unit, bit vs. experiment?] The increase has two sources. First, Radin removed the 258 PEAR studies included in the first meta-analysis (without discussing why) [Radin: because the whole purpose of that piece of the chapter was to address whether PEAR had been replicated, so it didn't make any sense to include it in the mix!] and, second, he presented simple mean values instead of the weighted means presented 10 years earlier. The use of simple mean values in meta-analyses is generally discredited (e.g., Shadish & Haddock, 1994), because it does not take into account that larger studies provide more accurate estimates of effect size. [Radin: ONLY assuming es is independent of N] In this case, the difference between computing an overall effect size
using mean values rather than weighted mean values is dramatic. The
removal of the PEAR studies effectively increased the impact of other
small studies that had very large effect sizes. The effect of small studies
on the overall outcome will be a very important topic in the current meta-
analysis.
Recently, Radin & Nelson (2002 [Radin: 2003]) published an update of their earlier (1989) RNG meta-analysis, adding a further 176 studies to their
database. In this update, the PEAR data were collapsed into a new, single
datapoint. The authors reported a simple mean effect size of 50.7%.
Presented as such, the data appear to suggest that this updated effect
size replicates that found in the first RNG meta-analysis. However, when
the weighted fixed-effect model is applied to the data, as was used in the
first RNG meta-analysis, the effect size of the updated database becomes
π̄_o = .50005, which is significantly smaller than the effect size of the original RNG meta-analysis (z_Δ = 4.27, p = 1.99*10^-5; see Table 2 for comparison). One reason for the difference is the increase in sample size of the more recent experiments, which also show a concomitant decline in effect size. [Radin: es vs. N issue again]
Like the other meta-analyses, the updated 2002 meta-analysis did not
investigate any potential moderator variables and no inclusion and
exclusion criteria were specified [Radin: I don't understand this latter business about not specifying the inclusion criteria]; it also did not include a
heterogeneity test of the database. All three meta-analyses were
conducted by related research teams and thus an independent replication
of their findings is lacking. The need for a more thoroughgoing meta-
analysis of PK RNG experiments is clear. [Radin: fair enough]
Human Intention Interacting with Random Number Generators:
A New Meta-Analysis
The meta-analysis presented here was part of a five-year consortium
project on RNG experiments. The consortium comprised research groups
from PEAR, USA; the University of Giessen, Germany; and the Institut für
Grenzgebiete der Psychologie und Psychohygiene [Institute for Border
Areas of Psychology and Mental Hygiene] in Freiburg, Germany. After all
three groups in the consortium failed in their appropriately powered [Radin: what was beta?] experiments [Radin: assuming an effect per bit model!!!!!] to replicate the mean shift of the PEAR group data (Jahn et al., 2000), which form one of the strongest and most influential datasets in psi research, the question about possible moderating variables in RNG experiments rose to the forefront. [Radin: Historically, and ironically, doing a M-A of the REG literature focusing on the moderating variable question was a task I assigned to my Freiburg team -- Boesch and Boller -- in 1996 and 1997.]
Consequently, a meta-analysis was conducted to determine whether the
existence of an anomalous interaction could be established between
direct human intention and the concurrent output of a true RNG, and if so,
whether there were moderators or other explanations that influenced the
apparent connection.
Method
Literature Search
The meta-analysis began with a search for any experimental studies
that examined the possibility of an anomalous connection between the
output of an RNG and the presence of a living being. This search was
designed to be as comprehensive as possible in the first instance, and to
be trimmed later in accordance with our prespecified inclusion and
exclusion criteria. Both published and unpublished manuscripts were
sought.
Manual searches were undertaken at the library and archives of the
Institut für Grenzgebiete der Psychologie und Psychohygiene in Freiburg,
Germany. They included searches of the following journals: Proceedings of
the Parapsychological Association Annual Convention (1968, 1977-1981,
1983-1999), Research in Parapsychology (1969-1976, 1982, 1984, 1985,
1988), Journal of Parapsychology (1959-1998), Journal of the Society for
Psychical Research (1959-1999), European Journal of Parapsychology
(1975-1998), The Journal of the American Society for Psychical Research
(1959-1998), Journal of Scientific Exploration (1987-1998), Subtle Energies
(1991-1997), Journal of Indian Psychology (1978-1999), Tijdschrift voor
Parapsychologie (1959-1999), International Journal of Parapsychology
(1959-1968), Cuadernos de Parapsicologia (1963-1996), Revue
Métapsychique (1960-1983), Australian Parapsychological Review (1983-
1991), Research letter of the Parapsychological Division of the
Psychological Laboratory of Utrecht (1971-1984), Bulletin PSILOG (1981-
1983), Journal of the Southern California Society for Psychical Research
(1979-1985), and the Arbeitsberichte Parapsychologie der technischen
Universität Berlin (1971-1980). [Radin: Why so many missing years in these journals?]
Electronic searches were conducted on the Psiline Database System
(Vers. 1999), a continuously updated specialized electronic resource of
parapsychologically-relevant writings (White, 1991). The key words used
to identify relevant articles in this specialized database were Random
Number Generator, RNG, Random Event Generator and REG. Electronic
searches were also conducted on six CDs of Dissertation Abstracts Ondisc
(Jan. 1961 - Sep. 1999) using four different search strategies. First, the
key words random number generator, RNG, random event generator,
REG, randomness, radioactive, parapsychology, perturbation,
psychokinesis, PK, extra-sensory perception, telepathy, precognition and
calibration were used. Second, the key words random and experiment
were combined with event, number, noise, anomalous, anomaly,
influence, generator, apparatus or binary. Third, the key word machine
was combined with man or mind. Fourth, the key word zener was
combined with diode.
To obtain as many relevant unpublished manuscripts as possible, visits
were made to three other prolific parapsychology research institutes: the
Rhine Research Center, Durham NC; PEAR at Princeton University; and the
Koestler Parapsychology Unit at Edinburgh University. Furthermore, a
request for unpublished studies was placed on an electronic mailing list
for professional parapsychologists (Parapsychology Research Forum - PRF). [Radin: Is this really a professional forum? I haven't read it for many years, and it seems to be open to anyone? It is, or was, indeed open to anyone.]
Finally, the reference sections of all retrieved journal articles, conference proceedings, reports and theses/dissertations were searched. The search covered a broad range of languages, including items in Dutch, English, French, German, Italian and Spanish, and was limited only by the lack of further available linguistic expertise.
Inclusion and Exclusion Criteria
The final database included only studies that examined the correlation
between direct human intention and the concurrent output of true RNGs.
Thus, after the comprehensive literature search was conducted we
excluded studies that: (a) involved, implicitly or explicitly, only an indirect
intention toward the RNG. For example, telepathy studies, in which a
receiver attempts to gain impressions about the sender's viewing of a
target that had been randomly selected by a true RNG, were excluded
(e.g., Tart, 1976). Here, the receiver's intention is presumably directed to
gaining knowledge about what the sender is viewing, rather than on
influencing the RNG; (b) used animals or plants as participants (e.g.,
Schmidt, 1970); (c) assessed the possibility of a non-intentional, or only
ambiguously intentional, effect. For instance, studies evaluating whether
hidden RNGs could be influenced when the participant's intention was
directed to another task or another RNG (e.g., Varvoglis & McCarthy,
1986) or studies that used babies as participants (e.g., Bierman, 1985);
(d) looked for an effect backwards in time or, similarly, in which
participants observed the same bits a number of times (e.g., Morris, 1982;
Schmidt, 1985); (e) evaluated whether there was an effect of human
intention on a pseudo RNG (e.g., Radin, 1982). [Radin: This list seems to exclude all Field REG studies.]
Additionally, studies were excluded if their outcome could not be transformed into the effect size π̄_o that was prespecified for this meta-analysis [Radin: if the tool we have is a hammer, we will only look at nails]. As a result, studies that compared the rate of radioactive decay in the presence of attempted human influence with that of the same element in the absence of human intention (e.g., Beloff & Evans, 1961) were excluded. The cut-off date for inclusion of studies in the meta-analysis was prespecified as 30 August 2000. [Radin: Then why do none of the searched literature sources go up to this date?]
Defining Studies
Some studies were reported in both published and unpublished forms,
or both as a full journal article and elsewhere as an abstract. In these
cases, all reports of the same study were used to obtain information for
the coding, but the report with the most details was classified as the
"main report". The main reports often contained more than one "study". A
study was the smallest experimental unit described that did not overlap
with other data in the report. This enabled the maximum amount of
information to be included. In cases where the same data could be split in two different ways (e.g., men vs. women, or morning sessions vs. afternoon sessions), the split used was the one that appeared to reflect the author's greatest interest in designing the study.
Many studies performed unattended randomness checks of the RNG to
ensure that the apparatus was functioning properly. These control runs
were coded in a separate "control" database. Data for these control runs,
like the experimental database, were split based on the smallest unit
described. In some experiments, data were gathered in the presence of a
participant with an instruction to the participant "not to influence" the
RNG (e.g., Jahn et al., 2000). These data were excluded from both experimental and control databases due to the inherent ambiguity of whether intention played an influential role. [Radin: So something called a control condition is ambiguous and can't be included in control? In fact, the PEAR lab did a huge amount of "unattended randomness checks" that are not included in this M-A. Now we see that even the baseline condition, which was explicitly designed to be compared against the bi-polar high-vs.-low intention conditions, is not included in controls?]
Moderator Variables
To identify potential moderators, all variables suggested by previous literature reviews were coded [Radin: blindly? No -- this coding was certainly not blind] (Gissurarson, 1992, 1997; Gissurarson & Morris, 1991; Schmeidler, 1977). Additionally, several descriptive variables, together with variables explicitly or implicitly held responsible for the presence or absence of an anomalous correlation, were placed on the internet and discussed by researchers interested in the topic. After considering the feedback and making any
requisite revisions to the coding form, 20 papers were randomly selected
and independently pilot coded by FS and EB. Afterwards, the two sets of
coding were compared, coder disagreements were discussed, ambiguities
in the coding descriptions were clarified, and the coding form was
finalized.
The variables coded covered six main areas: (i) Basic information, such
as year of publication and study status (i.e., formal, pilot, mixed, control);
(ii) Participant information, such as selection criteria (e.g., none, psychic claimant, prior success in a psi experiment, ...); (iii) Experimenter information, such as whether the experimenter also acted as a participant (e.g., no, yes, partially); (iv) Experimental setting, such as type of feedback (visual, auditory, ...); (v) Statistical information variables, such as the total number of bits (sample size) [Radin: what about bits/effort, bits/subject, etc.?]; and (vi) Safeguard variables.
Examining Psychokinesis 25
The final coding form contained 67 variables. The comprehensive
coding was applied because, prior to coding the studies, it was not clear
which variables would provide enough data for a sensible moderator
variable analysis. However, because of the importance of the safeguard
variables, i.e., the moderators of quality, we prespecified that the impact
of the three safeguard variables would be examined independently of
their frequency distribution. The safeguards were: (1) RNG control, which recorded whether malfunction of the RNG had been ruled out by the study, either by using a balanced design or by performing control runs of the RNG [Radin: but note that real control data were excluded in some cases]; (2) all data reported, which addressed whether the final study size matched the planned size of the study or whether optional stopping or selective reporting may have occurred [Radin: note that PEAR REG studies do not report a planned size, but that optional stopping CANNOT have an effect on the outcome]; (3) split of data [Radin: define], which noted whether the split of data reported was explicitly planned or was potentially post hoc. All safeguards were ranked on a three-point scale (yes [2], earlier³/unclear [1], no [0]), with the intermediate value being coded either when it was unclear whether the study actually took the safeguard into account or where it was only partially taken into account. Because summary scores of safeguard variables are problematic if considered exclusively (e.g., Jüni, Witschi, Bloch, & Egger, 1999), we examined the influence of the safeguard variables separately and in conjunction.

Footnote 3: When authors referred to previous studies in which the RNG was tested, studies were coded as controlled "earlier".
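A minimal sketch of this three-point coding, with a hypothetical study record; the scale values mirror the text (yes = 2, earlier/unclear = 1, no = 0), and the resulting sum score ranges from 0 to 6.

```python
SCALE = {"yes": 2, "earlier": 1, "unclear": 1, "no": 0}

# Hypothetical study record for the three safeguards described above.
study = {"rng_control": "yes", "all_data_reported": "earlier", "split_of_data": "no"}

sum_score = sum(SCALE[rating] for rating in study.values())  # here: 3 of 6
```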
The main coding was undertaken by FS [Radin: blindly?]. For any potentially
controversial or difficult decisions, FS consulted with EB. If FS and EB
could not agree, the final decision fell to HB, who was blind as to who held
which opinion. Over time, HB's decisions generally supported FS and EB
equally, thus suggesting that HB served well as mediator.
Analyses
All analyses were performed using SPSS (Vers. 11.5) software. The
effect sizes of individual studies were combined into composite mean
weighted effect size measures as described in Footnote 1. To determine whether the π̄_o from each subsample (class) significantly differed from MCE, the standard error based on the within-study variance was calculated (Shadish & Haddock, 1994). [Radin: But in this database and M-A, the between-study variance within subsamples is at issue. The chosen SE is arguably not appropriate. It could only be correct if the subsamples are indeed homogeneous -- did the authors really identify all the moderators? One that is apparently not identified is subject differences, what PEAR called "operators" and showed to be a powerful moderator.] The resulting z-score indicates whether π̄_o differs from MCE. To determine whether the studies in each subsample shared a common effect size (i.e., were consistent across studies), the homogeneity statistic Q was calculated, which has an approximately χ² distribution with k - 1 degrees of freedom, where k is the number of effect sizes (Shadish & Haddock, 1994). [Radin: Given the importance of this, the method for calculating Q should be better specified. I'm guessing it is the sum of z scores.] The difference between two effect size estimates was determined using z_Δ = (π̄_1 - π̄_2) / sqrt(SE_1² + SE_2²).
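The fixed-effect machinery just described amounts to a few lines of code. The sketch below is our illustration of the Shadish and Haddock (1994) formulas, not the authors' SPSS syntax: it pools study effect sizes by inverse-variance weighting and returns the z against MCE and the homogeneity statistic Q, together with the z_Δ contrast between two estimates.

```python
import math

def fixed_effect(pis, ses):
    """Inverse-variance weighted mean effect size, its standard error,
    the z-score against MCE (.50), and the homogeneity statistic Q
    (approximately chi-square with k - 1 degrees of freedom)."""
    w = [1 / se ** 2 for se in ses]
    pi_bar = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
    se_bar = math.sqrt(1 / sum(w))
    q = sum(wi * (p - pi_bar) ** 2 for wi, p in zip(w, pis))
    return pi_bar, se_bar, (pi_bar - 0.50) / se_bar, q

def z_delta(pi1, se1, pi2, se2):
    """Difference between two effect size estimates."""
    return (pi1 - pi2) / math.sqrt(se1 ** 2 + se2 ** 2)
```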
As an initial, straightforward, sensitivity approach to estimating the
overall effect size, we trimmed the overall experimental sample until it
became homogeneous. This was done by applying an algorithm that successively excluded the study contributing most to the heterogeneity of the sample. The procedure stopped when the χ² heterogeneity statistic became non-significant (Hedges & Olkin, 1985). A
comparison between the resulting homogeneous sample and the studies
that had to be trimmed from the overall database (the "trimmed studies"
sample) allows one to assess the reliability of the estimated effect size
and to estimate the impact of aberrant values on the overall result.
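Read literally, the trimming procedure is the loop below: a sketch under the assumption that "contributes most to heterogeneity" means the largest weighted squared deviation from the pooled mean; scipy's chi-square tail probability stands in for the significance test.

```python
from scipy.stats import chi2

def trim_to_homogeneity(pis, ses, alpha=0.05):
    """Successively drop the study contributing most to Q until the
    chi-square heterogeneity test is non-significant (Hedges & Olkin, 1985)."""
    pis, ses = list(pis), list(ses)
    while len(pis) > 2:
        w = [1 / se ** 2 for se in ses]
        pi_bar = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
        contrib = [wi * (p - pi_bar) ** 2 for wi, p in zip(w, pis)]
        if chi2.sf(sum(contrib), df=len(pis) - 1) > alpha:
            break                                  # sample is homogeneous
        i = contrib.index(max(contrib))            # worst offender
        del pis[i], ses[i]
    return pis, ses
```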
In an attempt to explore the putative impact of moderator and
safeguard variables on the effect size and to determine the source(s) of
heterogeneity, a meta-regression analysis was carried out. Meta-
regression is a multivariate regression analysis with independent studies
as the unit of observation (e.g., Thompson & Higgins, 2002; Thompson &
Sharp, 1999). This analysis determines how the variables in the model
account for the heterogeneity of effect size. We applied a weighted
stepwise multiple regression analysis with the moderators as predictors
and effect size as the dependent variable.
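In code, such a meta-regression is ordinary weighted least squares with inverse-variance weights. The sketch below uses a plain numpy solve rather than the stepwise SPSS procedure the authors describe, so it illustrates the model, not their exact selection of predictors.

```python
import numpy as np

def meta_regression(moderators, pis, ses):
    """Weighted least squares of effect size on moderators (with intercept),
    using weights 1 / SE^2. moderators: (k, m) array; returns coefficients."""
    w = 1.0 / np.asarray(ses) ** 2
    X = np.column_stack([np.ones(len(pis)), moderators])
    sw = np.sqrt(w)                     # scale rows by sqrt of the weights
    beta, *_ = np.linalg.lstsq(X * sw[:, None], np.asarray(pis) * sw, rcond=None)
    return beta
```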
In the absence of homogeneity, and as a general sensitivity measure,
we additionally calculated a random-effect model (Shadish & Haddock,
1994), which takes into account the variance between studies (i.e.,
heterogeneity on the basis of the Q homogeneity statistic). [Radin: This should be better described. I'm not entirely clear on the difference between fixed and random effect models.] Because the standard error under a random-effects model is generally larger, the test statistic is consequently more conservative than the test statistic of the fixed-effect model. The z-score z(rnd) indicates whether π̄_o differs from MCE using a random-effects approach. However, even in the absence of homogeneity, the fixed-effect model is particularly appropriate in the context of the studies collected here: although the impact of some alleged moderator variables will be examined, no moderator has been established yet and the overall effect remains a matter of contention.
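One standard way to compute such a random-effects estimate is sketched below, in the DerSimonian-Laird style, which is one common reading of the Shadish and Haddock (1994) procedure; the text does not spell out the estimator used, so this is an assumption.

```python
import math

def random_effects(pis, ses):
    """Pooled effect under a random-effects model: the between-study
    variance tau^2, estimated from Q, widens each study's variance before
    inverse-variance pooling, giving a more conservative z against MCE."""
    w = [1 / se ** 2 for se in ses]
    k = len(pis)
    pi_fix = sum(wi * p for wi, p in zip(w, pis)) / sum(w)
    q = sum(wi * (p - pi_fix) ** 2 for wi, p in zip(w, pis))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pi_rnd = sum(wi * p for wi, p in zip(w_star, pis)) / sum(w_star)
    se_rnd = math.sqrt(1 / sum(w_star))
    return pi_rnd, se_rnd, (pi_rnd - 0.50) / se_rnd   # z(rnd)
```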
Results
Study Characteristics
The literature search retrieved 155 main reports containing 712
experimental studies and 158 control studies. After applying the inclusion
and exclusion criteria, the meta-analysis included 114 reports containing
357 experimental studies and 142 control studies (see Appendix).
The basic study characteristics are summarized in Table 3. The heyday
of RNG experimentation was in the 1970s, when more than half of the
studies were published. A quarter of the studies were published in
conference proceedings and reports, but most of the studies were
published in journals. The number of participants in the studies varied
considerably. Approximately one quarter of studies were conducted with a
sole participant and another quarter with up to 10 participants. There
were only a few studies with more than 100 participants. The sample size
of the average study is 6,095,359 bits. However, most studies were much
smaller, as indicated by a median sample size of 6,400 bits (see Table 4).
The few very large studies considerably increased the average sample
size and resulted in an extremely right-skewed distribution of sample size.
This variable was therefore log10-transformed. Consequently, a significant
linear correlation or regression coefficient of sample size with another
variable would indicate an underlying exponential relationship.
Overall Analyses
When combined, the 357 experimental studies yielded a small, but
statistically significant effect (π̄_o = .500029, SE = .000011, z = 2.73, p = .003, one-tailed). The 142 control studies yielded a non-significant effect (π̄_o = .500026, SE = .000015, z = 1.76, p = .08) that was nevertheless comparable in size to the effect demonstrated in the experimental studies (z_Δ = 0.15, p = .87). However, because RNG
experiments do not follow a classical control group design, this
comparison is merely descriptive. The control studies had a much larger
median sample size than the experimental studies (50,000 vs. 6,400 bits, respectively).
The two samples differed considerably in respect of their effect size
distribution. Whereas the control data were distributed homogeneously (χ²(141) = 138.16, p = .55), the effect size distribution of the experimental studies was extremely heterogeneous (χ²(356) = 1442.89, p = 1.45*10^-130). We therefore conducted several sensitivity analyses on
the intentional data in an attempt to determine the source(s) of the
heterogeneity.
Trimmed Sample
As can be seen in Table 4, 70 studies had to be excluded before the χ² heterogeneity statistic became non-significant, a proportion (20%) that, although at the higher end of the span [Radin: what span?], is not uncommon in meta-analyses on psychological topics (Hedges, 1987). The homogeneous
sample of 287 studies had very similar characteristics to the original
overall sample. Both samples had comparable mean and median sample
sizes (numbers of bits per study), and comparable mean and median
number of participants (see Table 4). Their effect sizes were also similar
(z_Δ = .80, p = .42), although the effect in the smaller, homogenized
sample did not reach significance.
The heterogeneous sample that had been removed from the
experimental database differed considerably from both the overall and the
homogenized sample. The removed studies had generally been published
earlier and had been carried out with fewer participants than the other
two samples; they also contained fewer studies with large sample sizes
than the homogeneous subsample (see Table 4). [Radin: note the convergence with the observation that es is anticorrelated with N] The effect size, too, of the studies that had been removed was almost one order of magnitude larger than that of the overall sample (z_Δ = 3.33, p = 8.68*10^-4) and of the homogenized sample (z_Δ = 2.99, p = 2.77*10^-3).
Safeguard Variable Analyses
The majority of studies had adequately implemented the specified
safeguards (see Table 5). Almost 40% of the studies (n = 138) were given
the highest rating for each of the three safeguards.
Generally, inadequate safeguards did not appear to be a reason for the
significant effect found in the experimental studies. Studies implementing the RNG control or the all data reported safeguards did not have significantly lower effect sizes than studies that did not implement these safeguards (z_Δ = .14, p = .89; z_Δ = 1.13, p = .19; respectively). However, studies with a post-hoc split of data had a significantly larger effect size than studies that had preplanned their split of data (z_Δ = 3.30, p = 9.55*10^-4). The safeguard sum-score revealed that the mean effect size of the 138 studies with the maximum safeguard rating was tenfold larger than that in the overall experimental database (z_Δ = 2.59, p = 9.50*10^-3). These high-quality studies had a smaller mean sample size and were conducted slightly more recently than the average study in the overall database. [Radin: later studies are smaller?] However, it is not possible to draw any
conclusions about the impact of study quality on effect size because of the
uneven distribution of studies across the summary scale. Nevertheless,
the trend that study quality is connected with study size and publication
date is suggestive. As can be seen from Table 5, smaller and older studies
seem to be of lower quality. [Radin: and have smaller effect size?]
In summary, the heterogeneity in the database is not primarily due to
the contribution of misleadingly significant results from badly-designed
studies.
Moderator Variable Analyses
Besides sample size and year of publication, which are generally highly underrated potential moderators, only very few of the variables coded provided enough entries for us to carry out sensible moderator variable analyses. For instance, we were interested in whether participants filled in psychological questionnaires. Although this was the case in 96 studies, only 12 used an established measure. Therefore, besides sample size and year of publication, we focused on five primary variables for RNG experiments that provided enough data for a sensible moderator variable analysis.
The summary given in Table 4 compares the mean effect sizes
associated with the 5 potential moderators with those from the overall
and the trimmed sample. It is quite obvious that sample size is the most
important moderator of effect size. [Radin: a repeated refrain] The studies in the quartile comprising the smallest studies (Q1) have an effect size three orders of magnitude bigger than the effect size in the quartile with the largest studies (Q4). The difference is highly significant (z_Δ = 8.84, p < 1*10^-10). The trend is continuous: the smaller the sample size, the bigger the effect size; a connection that Sterne, Gavaghan, & Egger (2000) called the "small-study effect". The funnel plot (Figure 1) illustrates the effect. Whereas the bigger studies are distributed symmetrically around the overall effect size, the distribution of studies below 10,000 bits is increasingly asymmetrical. In respect of the mean year of publication, the largest studies (Q4) stand out from the other three, smaller-study quartiles. The largest studies were, on average, published 4-5 years later than the smaller studies. Most of the big studies, with very small effect sizes, have been published only recently (e.g., Jahn, Dunne, Dobyns, Nelson, & Bradish, 1999; Jahn et al., 2000; Nelson, 1994).
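A funnel plot of this kind can be reproduced with a few lines of matplotlib. The sketch below is our schematic of Figure 1 as described in the text (effect size against log10 sample size, with the MCE reference line at .50); the data arrays are placeholders.

```python
import numpy as np
import matplotlib.pyplot as plt

def funnel_plot(pis, n_bits):
    """Scatter each study's effect size against log10(sample size in bits),
    with mean chance expectation (.50) marked as a reference line."""
    plt.scatter(np.log10(n_bits), pis, s=10, alpha=0.5)
    plt.axhline(0.50, linestyle="--", color="grey", label="MCE (.50)")
    plt.xlabel("log10(sample size in bits)")
    plt.ylabel("effect size (proportion index)")
    plt.legend()
    plt.show()
```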
The year of publication underscores the importance of sample size for the outcome of the studies (see Table 4). The oldest studies (Q1), which have the smallest mean sample size, have an effect size two orders of magnitude bigger than the effect size of the newest studies (z_Δ = 13.53, p < 1*10^-10), which have the largest mean sample size. [Radin: This seems to differ from statements on the preceding page.] However, the impact of sample size is not evident for the two middle quartiles. [Radin: Note that the largest studies, PEAR, use only unselected subjects, who evidently have a smaller average es. Is this accounted for in the present modeling and M-A?] The effect size of the older studies (Q2) is significantly larger (z_Δ = 2.26, p = .02) than the effect size of the newer studies (Q3), although the sample size of the older studies is larger. [Radin: old study N is larger?] Thus, time might play a role on its own: the experiments might have changed in ways other than the increase in sample size. [Radin: Isn't that what a moderator variable study is supposed to assess?]
Although causal connections are difficult to establish in meta-analyses,
we examined the interaction of sample size and year of publication and
their impact on effect size in order to understand how the two moderators
are linked to one another. The oldest studies (Q1) are particularly
interesting because their effect size is considerably bigger than the effect
size of the subsequent studies (Q2-Q4). The z-value of the oldest studies is
also the highest of the four quartiles and thus indicates that they were the
most successful (see Table 4). A median split of the oldest studies
according to sample size revealed that the effect sizes of the two halves
differ significantly from each other (z_Δ = 5.62, p = 5.62*10^-8). The half with the smaller sample sizes (n = 45, M = 982, Mdn = 500) has an effect size of π̄_o = .519137 (SE = .002484, z = 7.71, p < 1*10^-10), whereas the half with the bigger sample sizes (n = 45, M = 36,123, Mdn = 9,600) has an effect size of π̄_o = .505000 (SE = .000399, z = 12.53, p < 1*10^-10). The mean year of publication in both subsamples is the same (M = 1971) and does not differ from the mean year of publication of the whole quartile. The analysis suggests that sample size, and not year of publication, is the deciding moderator, a finding that the pure magnitude of the effect sizes in the quartiles of sample size and year of publication also suggests (see Table 4). [Radin: It would be good to report the same analysis on the newer studies.]
The number of participants in RNG experiments is clearly linked to
effect size (see Table 4). Studies with a sole participant (Q1) have an
effect size that is at least one order of magnitude bigger than studies with
more than one participant. However, this finding, too, is confounded by
sample size (see Table 4). Applying the same method to the quartile of studies with one participant (Q1) as before, we found that the studies with smaller sample sizes (n = 47, M = 1,817, Mdn = 960) have a bigger effect size (π̄_o = .510062, SE = .001784, z = 5.64, p = 1.70*10^-8) than the studies with larger sample sizes (n = 44, M = 239,197, Mdn = 24,124, π̄_o = .500368, SE = .00017, z = 2.19, p = .03). The difference is highly significant (z_Δ = 5.41, p = 6.32*10^-8). This analysis calls the apparent superiority of studies with a sole participant into question and identifies sample size as an important confounder. However, it can be argued that small studies in general, and small one-participant studies in particular, are fundamentally different from larger studies - an argument that cannot easily be dismissed. It is in fact one of the potential sources accounting for the small-study effect, which will be discussed later. [Radin: But the preceding analysis apparently discounts this with an explicit and relevant comparison.]
The current meta-analysis also seems to support the claim that selected
participants perform better than non-selected participants. The claim has
already been confirmed by an earlier meta-analysis (Honorton & Ferrari,
1989). As can be seen in Table 4, the effect size of studies with selected participants is one order of magnitude bigger than the effect size of studies that did not select their participants on grounds such as prior success in a psi experiment or being a psychic claimant. Studies with
selected participants are predominantly carried out with only one or very
few participants, as indicated by the mean and median number of
participants in Table 4. Studies with unselected populations are regularly
carried out with more participants than studies with selected participants.
However, this finding is confounded by sample size (and by number of participants): studies with unselected populations also have a larger sample size than studies with selected participants. The systematic
selection of participants might play an important role in RNG experiments,
and it is certainly not implausible that longer experiments (is this an
implicit equation of length of experiment with N of samples? It is not so
simple.) are tiring for participants and therefore might produce different
results. The argument is similar to that regarding the number of
participants in RNG experiments, where experiments with fewer
participants may be shorter and/or make participants feel more involved.
Study status is an important moderator in meta-analyses that include
both formal and pilot experiments. Pilot experiments are likely to
comprise a selective sample insofar as they tend to be published if they
yield significant results (and hence larger-than-usual effect sizes) and not
published if they yield unpromising directions for further study. However,
pilot and formal studies in this sample did not differ in respect of effect
size (zΔ = 0.68, p = 0.50). Although the effect size of the pilot studies was
bigger, it is not significantly different from the null value because its SE is
more than four times larger (see Table 4). Pilot experiments are, as one
would expect, smaller than formal experiments.
The type of feedback to the participant in RNG studies has been
regarded as an important issue in psi research from its very inception. The
majority of RNG experiments provide participants with visual and some
with auditory feedback. Besides the two main categories, the coding
resulted in a large "other" category with 101 studies, which used, for
example, alternating visual and auditory feedback, or no feedback at all.
The result is clear-cut: studies providing exclusively auditory feedback outperform not only the studies using visual feedback (zΔ = 6.12, p = 9.24 × 10⁻¹⁰) but also the studies in the "other" category (zΔ = 5.93, p = 2.01 × 10⁻⁹). However, this finding is based on a very small and very
heterogeneous sample of large studies (see Table 4), although the studies
using visual feedback are, on average, even larger. Nevertheless, the
auditory feedback studies were surprisingly comparable to the large
sample size studies (Q3) in terms of their mean numbers of participants,
year of study, and other variables (see Table 4).
The core (this isn't quite the right word) of all RNG studies is the
random source. Although the participants' intention is generally directed
(by the instructions given to them) to the feedback and not to the
technical details of the RNG, it is the sequence of random numbers that is compared with the theoretical expectation (e.g., the binomial distribution) and that is, therefore, allegedly influenced. RNGs are based on truly random physical sources: radioactive decay, Zener-diode noise, or, occasionally, thermal noise. As
shown in Table 4, the effect size of studies with RNGs based on
radioactive decay is two orders of magnitude larger than the effect size of
studies using noise (zΔ = 4.28, p = 1.87 × 10⁻⁵). However, this variable, too,
is confounded by sample size. Studies using radioactive decay are much
smaller than studies using noise (see Table 4). Chronologically, studies
with RNGs based on radioactive decay predominated in the very early
years of RNG experimentation, as indicated by their mean year of
publication, which is just two years later than the mean year of publication of the oldest studies in our sample (see Table 4).
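For orientation, the effect size used in this meta-analysis is the proportion of hits among the generated bits (π), tested against the binomial null of p = .5; under that null the standard error is 1/(2√N). A small sketch under that assumption, with hypothetical study figures:

```python
from math import sqrt

def rng_effect_size(hits: int, bits: int):
    """Hit proportion against the binomial null of p = .5 (one bit = one trial).
    Under H0, Var(pi_hat) = 0.25 / N, so SE = 1 / (2 * sqrt(N))."""
    pi_hat = hits / bits
    se = 1.0 / (2.0 * sqrt(bits))
    z = (pi_hat - 0.5) / se
    return pi_hat, se, z

# Hypothetical study: 505,000 hits in 1,000,000 generated bits.
pi_hat, se, z = rng_effect_size(505_000, 1_000_000)
print(f"pi = {pi_hat:.4f}, SE = {se:.5f}, z = {z:.2f}")  # pi = 0.5050, z = 10.00
```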
Meta-Regression Analysis
The meta-regression analysis included the seven moderator variables
(see Table 4) and the three safeguard variables (see Table 5) discussed
above, using effect size as the dependent variable and the moderators
and the safeguards as predictors. The three variables sample size, year of
publication, and number of participants, which were split into quartiles for
the previous moderator variable analyses (see Table 4), went into the
regression analysis with their nominal values. All other variables were
dummy coded. The analysis was weighted by the inverse of the within-
study variances (this again assumes that all important moderators have
been identified, and that probably is not the case as noted before). From
the 10 predictors only two, year of publication and auditory feedback,
entered the model (see Table 6)4. However, the model accounts for only
5% of the variance. This suggests that neither any single variable we entered into the regression analysis nor any combination of them accounts for the great variability in effect size we found in the data.
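The weighting scheme can be sketched as an inverse-variance weighted least squares fit. The data below are simulated placeholders (the real study-level dataset is not reproduced here), and the predictors are reduced to the two reported to enter the model:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
k = 357  # number of studies

# Simulated stand-ins for the real study-level data.
n_bits   = rng.lognormal(10, 2, k)                 # bits per study
year     = rng.integers(1959, 2001, k)             # year of publication
auditory = rng.integers(0, 2, k)                   # dummy-coded feedback type
es       = 0.5 + rng.standard_normal(k) / (2 * np.sqrt(n_bits))  # pi-hat under H0

weights = 4 * n_bits  # inverse within-study variance: 1 / (0.25 / N)
X = sm.add_constant(np.column_stack([year, auditory]))
fit = sm.WLS(es, X, weights=weights).fit()
print(fit.params, fit.rsquared)
```

Because the weights are essentially proportional to sample size, sample size itself cannot appear as a predictor in this weighted fit, which is the point made in footnote 4 below.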
Random-Effects Model
As can be seen in Table 4, the z-score (rnd) for the effect size based on
a random-effects model for all heterogeneous subsamples of studies
becomes non-significant. This measure also indicates, as all previous analyses have shown, that there is no simple overall effect. If there is an effect of human intention on the concurrent output of true RNGs, then at least one moderator must be involved. Of all the sensitivity analyses presented here, sample size seems to be the most promising candidate. However, the connection between sample size and effect size can have many different causes, as will be discussed in the next section.

4 The moderator variable analyses already clearly indicated that sample size is the most important predictor of effect size. The importance of the predictor was not confirmed by our weighted meta-regression analyses because the weighting is by the inverse of the within-study variance, which almost perfectly correlates with sample size. Therefore, sample size cannot enter the regression model. An unweighted stepwise multiple regression analysis with the same predictor variables clearly stresses the importance of sample size in this meta-analysis. Sample size enters the model first and accounts for 9% of the variance. After three more steps the model accounts for 20% of the variance, with sample size (β = -.26), RNG control earlier (β = .28), random source radioactive (β = .14), and split of data preplanned (β = -.11) as predictors. However, an unweighted regression analysis is difficult to interpret, especially when the effect size is so strongly connected with the study variance. Larger studies provide better estimates and should have more weight; otherwise the regression is dominated by the impact of smaller studies, which might be more prone to publication bias and other effects which will be discussed under the heading of the small-study effect in the discussion section.
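The paper does not name its random-effects estimator; the DerSimonian-Laird procedure is the standard choice and is sketched here under that assumption:

```python
import numpy as np

def dersimonian_laird(es, se, null=0.5):
    """Random-effects pooled estimate with the DerSimonian-Laird tau^2."""
    es, se = np.asarray(es, float), np.asarray(se, float)
    w = 1.0 / se ** 2                          # fixed-effect weights
    fe = np.sum(w * es) / np.sum(w)
    q = np.sum(w * (es - fe) ** 2)             # Cochran's Q
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - (len(es) - 1)) / c)   # between-study variance
    w_re = 1.0 / (se ** 2 + tau2)              # random-effects weights
    re = np.sum(w_re * es) / np.sum(w_re)
    se_re = np.sqrt(1.0 / np.sum(w_re))
    return re, se_re, (re - null) / se_re      # z against MCE

# Hypothetical subsample: three small positive studies, one large null study.
print(dersimonian_laird([0.52, 0.51, 0.515, 0.5001],
                        [0.010, 0.008, 0.012, 0.0005]))
```

Because τ² inflates every study's variance, heterogeneous subsamples lose the precision contributed by their largest studies, which is consistent with the loss of significance reported above.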
Discussion
Altogether, the meta-analysis divulged (an awkward word, perhaps
"indicated" is better) three main findings: (i) a very small but statistically
significant overall effect, (ii) a tremendous variability of effect size and (iii)
a highly visible (I'd say "suspected") small-study effect. (the tremendous
variability points to an inadequate model, and incomplete specification of
moderators. It cannot be appropriate to simply lay a random effects model
on such data. Moreover, this also points to an inadequate fixed effects
model.)
Statistical Significance
The meta-analysis replicated the finding of the previous meta-analyses
in the sense that the very small overall effect was significantly different
from the expected value. The mean effect size of the control studies did
not differ significantly from MCE, although the size of the effect was
comparable to that of the experimental studies. This does not necessarily
imply that the effect found in the experimental studies is spurious; RNG
experiments do not follow a standard test-control design to determine an
effect; rather, the test is always (not always) against MCE. Control data in
RNG experiments are simply used to demonstrate that the RNG output fits
the theoretical premise (e.g., the binomial distribution). (this is not true, or it
is an idiosyncratic, and inappropriate definition of control -- which I believe
is in fact presented early in this paper.) If the control data do not fit the
theoretical premise, the RNG will be revised or a different RNG will be
used. As a consequence, published control data are unequivocally (how do
they know?) subject to a positive selection process - that is, divergent
control data will generally not be published because they would cast
doubt on the experiment as a whole. (This seems to imply that studies
published without citing control data have withheld that data, which I
doubt is always, or even often, the case.) The fact that the experimental
studies reached statistical significance and the control studies did not is a
matter of statistical power - the sample size of the control studies is only
half as large as the sample size of the experimental studies (1.07 × 10⁹ bits vs. 2.17 × 10⁹ bits, respectively). (as noted earlier, by the authors'
definition, huge quantities of "control" data are not included in this
analysis, so any discussion of the es is misleading.) Moreover, the p-value
for the control studies was based on two-sided testing, whereas the
p-value for the intentional studies was one-sided.
The safeguard analyses demonstrated that the significance of the
experimental studies is not the result of low quality studies. Nevertheless,
the statistical significance, as well as the overall effect size, of the
combined experimental studies has dropped continuously from the first
meta-analysis to the one reported here. This is partially the result of the
more recent meta-analyses including newer, larger studies. However,
another difference between the current and the previous meta-analysis
lies in the application of unequivocal inclusion and exclusion criteria. We
focused exclusively on studies examining the alleged concurrent
interaction between direct human intention and RNGs. All previous meta-
analyses also included non-intentional (in the sense of FieldREG studies, no;
but we did include studies with implied intention) and non-human studies.
Although this difference might explain the reduction in effect size and
significance level, it cannot explain the extreme statistical heterogeneity
of the database. This topic was neglected in the previous RNG meta-
analyses (not really, we did look at the issue of homo- and heterogeneity;
we just didn't try to figure out where it came from, as that wasn't the
purpose of those MAs). We believe that the overall statistical significance
found in our meta-analysis cannot be unequivocally interpreted in favor of
an anomalous interaction as long as the tremendous variability of effect
size remains unexplained. (well, yeah -- so isn't that the purpose here?)
Variability of Effect Size
The variability of effect size in this meta-analysis is tremendous. We
took several approaches to address this variability. For instance, trimming
20% of the studies from the overall sample resulted in a homogeneous
subsample of studies. (but your other analyses indicated there was in fact
no justification for the trimming on the basis of, say, quality. Therefore, given the time and N vs es correlations, this trivially must weaken the database.) Although the effect size of the homogeneous sample did not
differ significantly from that of the overall sample, the outcome did not
differ significantly from MCE. However, although this indicates how
vulnerable the overall result is in terms of statistical significance, this
approach cannot explain what variables or (selection) processes are
responsible for the variability. The extreme variability does not seem to be
the result of any of the moderator variables examined - none of the
moderator variable subsamples was independently homogeneous, not
even sample size.
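The paper does not state its trimming rule; one plausible reconstruction (not necessarily the authors' procedure) is to drop, one study at a time, the study contributing most to Cochran's Q until the remainder is homogeneous:

```python
import numpy as np
from scipy.stats import chi2

def trim_to_homogeneity(es, se, alpha=0.05):
    """Drop the study contributing most to Cochran's Q until homogeneous."""
    es, se = list(es), list(se)
    dropped = 0
    while True:
        w = np.array([1.0 / s ** 2 for s in se])
        e = np.array(es)
        pooled = np.sum(w * e) / np.sum(w)
        q_terms = w * (e - pooled) ** 2
        q, df = np.sum(q_terms), len(es) - 1
        if df < 1 or chi2.sf(q, df) > alpha:   # homogeneous (or nothing left)
            return pooled, dropped
        worst = int(np.argmax(q_terms))
        es.pop(worst); se.pop(worst); dropped += 1
```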
The moderator variable analyses demonstrated that sample size was a
consistent and prominent moderator of effect size (variability). All of the
moderator variables we analyzed were confounded by sample size. The
Monte Carlo simulation of publication bias at the end of the next section
will demonstrate that even though variability of effect size and the small-
study effect are two separate concepts, they are in fact connected.
Small-Study Effect
For a similar class of studies (what sorts of studies are similar to these?)
it is generally assumed that effect size is independent of sample size.
(yes) However, from the sensitivity analyses it is evident that the effect
sizes in this meta-analysis strongly depend on sample size. (yes!) The
asymmetric distribution of effect sizes in the funnel plot (see Figure 1), as well as the continuous decline of effect size with increasing sample size across the sample size quartiles, illustrates this. How can this be explained?
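One standard way to quantify such funnel plot asymmetry is Egger's regression of each study's standard normal deviate on its precision; a nonzero intercept signals a small-study effect. This is offered as a diagnostic sketch, not as an analysis reported in the paper:

```python
import numpy as np
import statsmodels.api as sm

def egger_test(es, se, null=0.5):
    """Egger regression: SND vs. precision; intercept != 0 => asymmetry."""
    es, se = np.asarray(es, float), np.asarray(se, float)
    snd = (es - null) / se           # standard normal deviate per study
    precision = 1.0 / se
    fit = sm.OLS(snd, sm.add_constant(precision)).fit()
    return fit.params[0], fit.pvalues[0]  # intercept and its p-value
```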
Table 7 provides a list of potential sources for the small-study effect.
The sources fall into three main categories: (1) true heterogeneity, (2) data
irregularities, and (3) selection biases. (Is true heterogeneity the
representation of a better model? If so it is sensible, but needs to be
spelled out.) Chance, another possible explanation for a small-study
effect, seems very unlikely because of the magnitude of the effect and the
sample size of the meta-analysis.
True heterogeneity
The higher effect sizes of the smaller studies may be due to specific
differences in experimental design or setting in the smaller compared with
the larger studies. In other words, the small-study effect is seen to be the
result of certain moderator variable(s). For instance, smaller studies might
be more successful because the participant-experimenter relationship is
more intense. The routine of longer experimental series may make it
difficult for the experimenter to maintain enthusiasm in the study.
However, explanations such as these remain speculative as long as they
are not systematically investigated and meta-analyzed. (There is another
general category of potential explanation that is consistent with the N vs es finding -- the mechanism of the effect may not be correctly modeled by
assuming bit-wise effects.)
From the moderator variables investigated in this meta-analysis, the
hypotheses that smaller studies on average tested a different type of
participant and used a different form of feedback are the most interesting.
However, the moderator variable analyses showed that these two
variables, as well as all other variables examined, are linked to sample
size. For almost all moderator variables, the subsamples ("class" in
Table 4) with the smallest mean sample sizes had the biggest effect. This
suggests (but does not demonstrate. Given other issues, such as the
model assumptions, this conclusion should not be accepted.) that sample
size, and not the moderator variables, was the prime factor driving the
heterogeneity of the database. This view is also supported by the
heterogeneity of effect size observed in all of the moderator subsamples.
Empirically, true heterogeneity among studies cannot be eliminated as
a causal factor for the small-study effect, especially regarding complex
interactions, which we have disregarded here. However, the heterogeneity
of the moderator-variable subsamples and the outstanding importance of
sample size at all levels of analysis exclude true heterogeneity as the
main source accounting for the small-study effect. This ignores goal-
directedness, DAT, effort per time, etc. It also ignores the possibility that
PK operates on the sample size distribution, which would predict a perfect
sqrt(N) dependency.
Data irregularities
A small-study effect may be due to data irregularities threatening the
validity of the data. For example, smaller studies might be of poorer
methodological quality, thereby artificially raising their effect size
compared with that of larger studies. However, methodological quality
improves only marginally with increasing sample size (r(357) = .18, p = 8.86 × 10⁻⁴) and therefore cannot explain the small-study effect in this
meta-analysis. Another form of data irregularity, namely inadequate
analysis, is based on the assumption that smaller trials are generally
analyzed with less methodological rigor and therefore are more likely to
report "false-positive results". However, this potential source is excluded
here due to the straightforward and simple effect size measure used. A
general source of data irregularity might be that smaller studies are more
easily manipulated by fraud than larger studies because, for example,
fewer people are involved. However, the number of researchers that
would have to be implicated over the years renders this hypothesis very
unlikely. In general, none of the data irregularity hypotheses considered
can explain the small-study effect.
Selection biases
When the inclusion of studies in a meta-analysis is systematically biased in such a way that smaller studies with larger effect sizes, i.e., smaller p-values, are more likely to be included than larger studies with smaller effect sizes, a small-study effect may be the result. Several well-known biases such as
publication bias, selective reporting bias, foreign language bias, citation
bias and time lag bias may be responsible for a small-study effect (e.g.,
Egger, Dickersin, & Smith, 2001; Mahoney, 1985).
Biased inclusion criteria refer to biases on the side of the meta-analyst.
In this particular domain, the two most prominent biases are foreign
language bias and citation bias. Foreign language bias occurs when
significant results are published in well-circulated, high-impact journals in
English, whereas non-significant findings are published in small journals in
the authors' native language. Therefore a meta-analysis including studies
solely from journals in English may include a disproportionately large
number of significant studies. Citation bias refers to selective quoting.
Studies with significant p-values are quoted more often and are more
likely to be retrieved by the meta-analyst. However, the small-study effect
in this meta-analysis is probably not attributable to these biases, given the inclusion of non-English publications and a very comprehensive search strategy. The authors claimed to use a very comprehensive search to retrieve all known studies, including unpublished studies. In that case, where are all these missing studies coming from? With, say, only 20 different groups of researchers ever having done these sorts of studies, each group would have had to hide on average 70 file-drawer studies, given the authors' later estimate of 1400 file-drawer studies. This is implausible.
One of the most important selection biases to consider in any meta-
analysis is publication bias. Publication bias refers to the fact that the
probability of a study being published depends to some extent on its
p-value. Several independent factors affect the publication of a study.
Rosenthal's term "file drawer problem" (Rosenthal, 1979) focuses on the
author as the main source of publication bias, but there are other issues
too. Editors' and reviewers' decisions also affect whether a study is
published. The time lag from the completion of a study to its publication
might also depend on the p-value of the study (e.g., Ioannidis, 1998) and
additionally contribute to the selection of studies available. Since the
development of Rosenthal's "file drawer" calculation (1979), numerous
other methods have been developed to examine the impact of publication
bias on meta-analyses (e.g., Dear & Begg, 1992; Duval & Tweedie, 2000;
Hedges, 1992; Iyengar & Greenhouse, 1988; Sterne & Egger, 2001).
In an attempt to examine publication bias we ran a Monte Carlo
simulation based on Hedges's (1992) stepped weight function model and
simulated a simple selection process. According to this model, the
authors', reviewers', and editors' perceived conclusiveness of a p-value is
subject to certain "cliff effects" (Hedges, 1992) and this impacts on the
likelihood of a study getting published. Hedges (1992) estimates the
weights of the step function based on the available meta-analytical data.
Unlike Hedges, however, we used a predefined step-weight
function model, because we were primarily interested in seeing whether a
simple selection model may in principle account for the small-study effect
present in our meta-analytic data.
We assumed that 100% of studies with a p-value ≤ .01, 80% of studies with a p-value between p ≤ .05 and p > .01, 50% of studies with a p-value between p ≤ .10 and p > .05, 20% of studies with a p-value between p ≤ .50 and p > .10, and 10% of studies with a p-value > .50 (one-sided) are published5. From this assumption, we randomly generated uniformly distributed p-values (Surely this doesn't mean p = 0.01 is just as likely as p = 0.5. I think they must mean normally distributed around chance
expectation.) and calculated the effect sizes for all "published" studies
and counted the number of "not published" studies. That is, on the basis
of the sample size for each of the 357 studies, we simulated a selective
null-effect publication process (meaning, I think, the distribution of studies
averages at chance, or p = 0.5). The averaged results of the simulation of
1000 meta-analyses are shown in Table 8. As can be seen, the overall
effect size based on the Monte Carlo simulation perfectly matches the
overall effect size found in our meta-analysis (see Table 4). The simulated
data clearly replicated the small-study effect (see Table 8).
5 The term "published" is used here very broadly, to include conference proceedings and reports that, in terms of our literature search, were considered unpublished. Importantly, in our discussion of the Monte Carlo simulation, the term
"published" also refers to studies obtained by splitting reports into studies. For simplicity,
we assumed in the Monte Carlo simulation that the splitting of the 114 reports into 357
experimental studies was subject to the same selection process as the published reports
themselves.
The simulation also shows that, in order for these results to emerge, a total of 1453 studies had to be unpublished, i.e., for every published study four nonsignificant studies had to remain unpublished. That's awfully close to Rosenthal's criterion of "robust." I wonder how long they worked with their model parameters to produce these results. Also, the results for the small sample sizes are nearly significantly different from their simulation. I'll see if I can replicate their model to see how insensitive it is to the choice
of parameters. (also -- is the Hedges work on selective reporting in
ordinary psychology a good model for parapsychology?)
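The selection process described here is straightforward to reproduce. Note, regarding the comment above, that drawing uniformly distributed p-values under the null hypothesis is exactly equivalent to drawing standard normal z-scores, so the two readings coincide. The sketch below uses hypothetical sample sizes (the 357 real study sizes are not reproduced); with the stated weights the average acceptance probability under the null is .01(1.0) + .04(.8) + .05(.5) + .40(.2) + .50(.1) ≈ .197, so roughly 357 × (1/.197 − 1) ≈ 1,455 studies land in the file drawer, matching the order of the 1453 reported:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

def publish_prob(p):
    """Predefined step-weight function from the text (Hedges-style cliffs)."""
    if p <= .01: return 1.0
    if p <= .05: return 0.8
    if p <= .10: return 0.5
    if p <= .50: return 0.2
    return 0.1

def simulate_once(sample_sizes):
    """Draw null studies per target study size until one gets 'published'."""
    published_es, unpublished = [], 0
    for n in sample_sizes:
        while True:
            z = rng.standard_normal()   # null effect: the bits are fair
            p = norm.sf(z)              # one-sided p-value (uniform under H0)
            if rng.random() < publish_prob(p):
                published_es.append(0.5 + z / (2 * np.sqrt(n)))
                break
            unpublished += 1
    return np.array(published_es), unpublished

sizes = rng.lognormal(9, 2.5, 357).astype(int) + 100   # hypothetical sizes
es, filedrawer = simulate_once(sizes)
weights = 4 * sizes                                    # inverse-variance weights
print("pooled pi:", np.sum(weights * es) / np.sum(weights),
      "| unpublished:", filedrawer)
```

Because small studies need much larger deviations to cross the p-value cliffs, the surviving small studies carry inflated effect sizes, reproducing the small-study effect by construction.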
A secondary finding, which additionally confirms the value of the
simulation, is that publication bias might be responsible not only for the
small overall effect and the small-study effect found, but also for a large
proportion of the effect size variability. The simulated overall sample as
well as the 4th quartile of the largest studies show a highly significant
effect size variability, replicating what was found in our meta-analytical
data. The effect size variability in the first three quartiles of the simulation
is certainly different from the effect size variability in our meta-analytical
data. However, this might be due to the highly idealized boundary
conditions of the simulation.
Conclusion
Altogether, the simulation results are in very good agreement with the
meta-analytical data (except for 3/4ths of it), especially regarding the
overall effect size and the small-study effect, which was the primary
objective of the simulation. The simulation must be considered at least
very suggestive, especially as it accounts for all three main findings
divulged in the meta-analysis: (i) a very small but statistically significant
overall effect, (ii) a tremendous variability of effect size (only in the large
study quartile) and (iii) a highly visible small-study effect. In comparison
with all other sources potentially accounting for the small-study effect
discussed here, publication bias clearly is the most far-reaching
explanation regarding the main findings of this meta-analysis. (this monte
carlo sounds like a simple case of carefully designed inputs to yield the
expected outputs, and even then glossing over the mis-fitting.)
Nevertheless, whether the simulation of publication bias is considered
conclusive evidence for publication bias in this meta-analysis strongly
depends on how many unpublished studies one believes it is reasonable
to assume there are. However, the number of unpublished studies must
be seen against the background of the enormous pressure to publish
significant results. It is clear that not all results get published and that
journals are filled with a selective sample of statistically significant studies
(e.g., Sterling, 1959; Sterling, Rosenbaum, & Weinkam, 1995; Rosenthal, 1979) (all references to non-parapsychology fields). Given that the
majority of published studies are underpowered, it is surprising that 95%
of articles in psychological journals and more than 85% of articles in
medical journals report statistically significant results (Sterling,
Rosenbaum, & Weinkam, 1995). Authors, reviewers, as well as editors, are
all involved in the selection process.
J.B. Rhine was the first editor of the Journal of Parapsychology (inception
in 1937), the leading journal for experimental work in parapsychology. He
initially believed "that little can be learned from a report of an experiment
that failed to find psi" (Broughton, 1987, p. 27). More than 25% (n = 96) of
the studies included in the current meta-analysis were published in this
journal. However, from 1975, the Council of the Parapsychological
Association rejected the policy of suppressing non-significant studies in
parapsychological journals (Broughton, 1987; Honorton, 1985). Whereas
48% of the studies in Q1 (1959 - 1973) were statistically significant, this
rate dropped to 19% (Q2), 8% (Q3) and 14% (Q4) in the subsequent
quartiles, indicating that the policy was implemented. (Interesting to
consider the likelihood of these percentages being significant -- as a
simple study-based model to compare with the bit-based model.) This
demonstrates not only that the publication rate of significant studies in
this domain is very different from the rate in conventional fields, but also,
and more importantly, that, at least in the early period of RNG experimentation, the publication process was highly selective in favor of statistically significant studies (Huh? If - as they state
- 95% of studies in standard psych journals are significant, yet for psi the
figures range from 8% to 48%, then surely this leads us away from
publication bias as a viable explanation, not towards!?).
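The study-level comparison hinted at in the comment above can be sketched with a two-proportion z-test. The counts below are rough reconstructions from the quoted percentages and an even split of the 357 studies into quartiles, so the result is only indicative:

```python
from math import sqrt
from scipy.stats import norm

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z-test with pooled variance."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)
    z = (p1 - p2) / sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return z, 2 * norm.sf(abs(z))

# ~89 studies per quartile; 48% significant in Q1, mean of 19/8/14% afterwards.
z, p = two_prop_z(round(0.48 * 89), 89,
                  round((0.19 + 0.08 + 0.14) / 3 * 268), 268)
print(f"z = {z:.2f}, p = {p:.1e}")   # roughly z = 6.8, highly significant
```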
Statistically, significance is a matter of power, but, when conducting a
meta-analysis, it can also be a matter of artifacts like publication bias.
From this perspective, not only is the early period of RNG experimentation
in great danger of being a highly selective sample, but also the database
as a whole. A power analysis based on the overall effect size found in our
meta-analysis shows that, for an RNG study to have a power of 80%, its
sample size would have to be greater than 1,800,000,000 bits. (Ok, is this
just obstinacy, or are they wearing blinders, or what?) (I am afraid it is or
what!) None of the studies included in our meta-analysis comes even
close to this sample size. Therefore the number of significant studies is
highly questionable. (sigh)
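The required-sample-size claim can be checked with the usual normal-approximation power formula for a one-sided binomial test, N ≈ ((z_α + z_β) / (2(π − .5)))². The value π = .50003 below is an assumed round figure of the magnitude reported in this meta-analysis, chosen only to show that it reproduces the order of the quoted 1.8 × 10⁹ bits:

```python
from scipy.stats import norm

def required_bits(pi, alpha=0.05, power=0.80):
    """Bits needed for a one-sided z-test at `alpha` to detect a hit
    proportion `pi` with the given power (SD taken as 0.5 throughout)."""
    eps = pi - 0.5
    z_alpha = norm.isf(alpha)       # ~1.645
    z_beta = norm.isf(1 - power)    # ~0.842
    return ((z_alpha + z_beta) / (2 * eps)) ** 2

print(f"{required_bits(0.50003):.2e} bits")   # ~1.7e+09
```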
The studies published and collected here are probably a highly selective
sample. The Monte Carlo simulation indicated that there would have to be
1400 unpublished studies to account for the main findings presented here.
However, "real world" publication decisions do not simply fit to a single,
consistent set of threshold values. Publication decisions in respect of the
smallest studies might follow completely different and more complex
patterns. Consequently, far fewer (or far more!) studies might be needed
in order to replicate the main findings of our meta-analysis than indicated
by the Monte Carlo simulation. We believe that the 1400 unpublished
studies mark the upper limit of the range of unpublished studies (why?).
As a result, we doubt that the RNG database provides good evidence for
an anomalous connection between direct human intention and the
concurrent output of true RNGs. (what on earth justifies this conclusion?)
The limits of this conclusion are, of course, the assumptions to be
made. (well, yeah) One of the main assumptions in undertaking a meta-
analysis is the assumption that effect size is independent of sample size
(finally!). The independence of sample size actually defines effect size
measures (Not necessarily; there are all kinds of effect sizes). However,
there might be effects where sample size is related to effect size. In such
a case, the p-values would be independent of sample size and, e.g., be constant across studies. Although our data do not confirm this simple
model because the z-score distribution is far too heterogeneous, (explain
how this conclusion is justifiable. No such calculation is reported in this
paper.) other more complex models are conceivable. However, so far
meta-analysts in RNG research have not argued along these lines; they
have argued that there is a small but replicable constant effect. Another
assumption that is generally made is that intention affects the mean value
of the random sequence. (that is not an assumption, but the dependent
variable in most experimental designs.) Although other outcomes have
been suggested (e.g., Atmanspacher, Bösch, Boller, Nelson &
Scheingraber, 1999; Pallikari & Boller, 1999; Radin, 1989) they have been
used only occasionally. (Ignores virtually all of Schmidt's and Kennedy's
papers on this topic. Also, on p. 45 of our RNG MA paper in the Jonas
book, we say "This means the statistical effects observed in these
experiments are effectively independent of sample size, and cannot be
explained as simple, linear, force-like mechanisms…. Further indication
that a novel approach will be required to explain these effects are
experiments strongly resembling RNG studies, but involving pre-recorded
random bits rather than bits generated in real-time. Those studies show
significant cumulative results similar to those reported here (Bierman,
1998). This implies that some MMI effects, perhaps including those
claimed for distant healing, may involve acausal processes.")
Although we question the conclusions of our predecessors, we would
like to remind the reader that these experiments are highly refined
operationalizations of phenomena which have challenged mankind for a
very long period of time. Over a 100-year history of PK experiments, the dramatic anomalous PK effects reported in séance rooms have shrunk to experiments with electronic noise. This achievement is certainly humble.
However, further experiments will be conducted. They should be
registered. This is the most straightforward solution for determining with
any accuracy the rate of publication bias (e.g., Chalmers, 2001). It allows
subsequent meta-analysts to resolve more firmly the question of whether the overall effect in RNG experiments is an artifact of publication bias, as
we suspect. The PK effect in general is of great fundamental importance -
if genuine. However, until we know with any certainty just how many non-
significant, unpublished studies there are likely to be, we doubt that this
unique experimental approach will gain the status of being of scientific
value. Of course, such a pessimistic statement only serves to diminish
interest in this line of work, thus guaranteeing that no one will ever know
"the answer."
A quick look through their MA references shows they do not list the four studies below, all of which fit their inclusion criteria. This doesn't persuade me that they were as comprehensive as they claim. What else might they have missed? They also do not include the Jahn et al. publication of the 12-year database, and other PEAR publications that would fit, e.g., Ibison. And they do include something like the Nelson 1994 time normalization paper. Go figure.
Radin, D. I. & Utts, J. M. (1989). Experiments investigating the influence of intention on
random and pseudorandom events. Journal of Scientific Exploration, 3, 65-79.
Radin, D. I. (1993). Environmental modulation and statistical equilibrium in mind-matter
interaction. Subtle Energies, 4 (1), 1-30.
Radin, D. I. (1992). Beyond belief: Exploring interactions among mind, body and
environment. Subtle Energies, 2 (3), 1 - 40.
Radin, D. I. (1990). Testing the plausibility of psi-mediated computer system failures.
Journal of Parapsychology, 54, 1-19.
References
Alcock, J. E. (1981). Parapsychology: Science or magic? A psychological
perspective. Oxford: Pergamon.
Atmanspacher, H., Bösch, H., Boller, E., Nelson, R. D., & Scheingraber, H.
(1999). Deviations from physical randomness due to human agent
intention? Chaos, Solitons & Fractals, 10, 935-952.
Beloff, J., & Evans, L. (1961). A radioactivity test of psycho-kinesis. Journal
of the Society for Psychical Research, 41, 41-46.
Bem, D. J., & Honorton, C. (1994). Does psi exist? Replicable evidence for
an anomalous process of information transfer. Psychological Bulletin,
115 (1), 4-18.
Bierman, D. J. (1985). A retro and direct PK test for babies with the
manipulation of feedback: A first trial of independent replication using
software exchange. European Journal of Parapsychology, 5, 373-390.
Blackmore, S. J. (1992). Psychic experiences: Psychic illusions. Skeptical
Inquirer, 16, 367-376.
Braude, S. E. (1997). The limits of influence: Psychokinesis and the
philosophy of science. Lanham: University Press of America.
Broughton, R. S. (1987). Publication policy and the Journal of
Parapsychology. Journal of Parapsychology, 51, 21-32.
Chalmers, I. (2001). Using systematic reviews and registers of ongoing
trials for scientific and ethical trial design, monitoring, and reporting. In
M. Egger, G. D. Smith, & D. Altman (Eds.), Systematic reviews in health
care: Meta-analysis in context (pp. 429-443). London: BMJ Books.
Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49,
997-1003.
Crookes, W. (1889). Notes of seances with D. D. Home. Proceedings of the
Society for Psychical Research, 6, 98-127.
Crookes, W., Horsley, V., Bull, W. C., & Myers, A. T. (1885). Report on an
alleged physical phenomenon. Proceedings of the Society for Psychical
Research, 3, 460-463.
Dear, K. B. G., & Begg, C. B. (1992). An approach for assessing publication
bias prior to performing a meta-analysis. Statistical Science, 7, 237-245.
Dudley, R. T. (2000). The relationship between negative affect and
paranormal belief. Personality and Individual Differences, 28, 315-321.
Duval, S., & Tweedie, R. (2000). A nonparametric "trim and fill" method of
accounting for publication bias in meta-analysis. Journal of the American
Statistical Association, 95, 89-98.
Edgeworth, F. Y. (1885). The calculus of probabilities applied to psychical
research. Proceedings of the Society for Psychical Research, 3, 190-199.
Edgeworth, F. Y. (1886). The calculus of probabilities applied to psychical
research. II. Proceedings of the Society for Psychical Research, 4, 189-
208.
Egger, M., Dickersin, K., & Smith, G. D. (2001). Problems and limitations in
conducting systematic reviews. In M. Egger, G. D. Smith, & D. Altman
(Eds.), Systematic reviews in health care: Meta-analysis in context (pp.
43-68). London: BMJ Books.
Fisher, R. A. (1924). A method of scoring coincidences in tests with playing
cards. Proceedings of the Society for Psychical Research, 34, 181-185.
Gallup, G., & Newport, F. (1991). Belief in paranormal phenomena among
adult Americans. Skeptical Inquirer, 15, 137-146.
Geller, U. (1998). Uri Geller's little book of mind-power. London: Robson.
Girden, E. (1962a). A review of psychokinesis (PK). Psychological Bulletin,
59, 353-388.
Girden, E. (1962b). A postscript to "A Review of Psychokinesis (PK)".
Psychological Bulletin, 59, 529-531.
Girden, E., & Girden, E. (1985). Psychokinesis: Fifty years afterward. In P.
Kurtz (Eds.), A Skeptic's Handbook of Parapsychology (pp. 129-146).
Buffalo, NY: Prometheus Books.
Gissurarson, L. R. (1992). Studies of methods of enhancing and potentially
training psychokinesis: A review. Journal of the American Society for
Psychical Research, 86, 303-346.
Gissurarson, L. R. (1997). Methods of enhancing PK task performance. In
S. Krippner (Eds.), Advances in Parapsychological Research 8 (pp. 88-
125). Jefferson, North Carolina: McFarland & Company.
Gissurarson, L. R., & Morris, R. L. (1991). Examination of six
questionnaires as predictors of psychokinesis performance. Journal of
Parapsychology, 55, 119-145.
Hedges, L. V. (1987). How hard is hard science, how soft is soft science?
The empirical cumulativeness of research. American Psychologist, 42,
443-455.
Hedges, L. V. (1992). Modeling publication selection effects in meta-
analysis. Statistical Science, 7, 246-255.
Hedges, L. V., & Olkin, I. (1985). Statistical methods for meta-analysis.
Orlando: Academic Press.
Honorton, C. (1985). Meta-analysis of psi ganzfeld research: A response to
Hyman. Journal of Parapsychology, 49, 51-91.
Honorton, C., & Ferrari, D. C. (1989). "Future telling": A meta-analysis of
forced-choice precognition experiments, 1935-1987. Journal of
Parapsychology, 53, 281-308.
Honorton, C., Ferrari, D. C., & Bem, D. J. (1998). Extraversion and ESP
performance: A meta-analysis and a new confirmation. Journal of
Parapsychology, 62, 255-276.
Irwin, H. J. (1993). Belief in the paranormal: A review of the empirical
literature. Journal of the American Society for Psychical Research, 87, 1-
39.
Iyengar, S., & Greenhouse, J. B. (1988). Selection models and the file
drawer problem. Statistical Science, 3, 109-117.
Ioannidis, J. P. (1998). Effect of the statistical significance of results on the
time to completion and publication of randomized efficacy trials. The
Journal of the American Medical Association, 279, 281-286.
Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J.
(1999). ArtREG: A random event experiment utilizing picture-preference
feedback. (Technical Note PEAR 99003). Princeton NJ 08544: PEAR
Laboratory, School of Engineering and Applied Science.
Jahn, R. G., Dunne, B. J., & Nelson, R. D. (1980). Princeton Engineering
Anomalies Research. Program Statement. (Technical Report). Princeton,
New Jersey: Princeton University, Princeton Engineering Anomalies
Research, School of Engineering/Applied Science.
Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H.,
Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., &
Walter, B. (2000). Mind/Machine interaction consortium: PortREG
replication experiments. Journal of Scientific Exploration, 14, 499-555.
James, W. (1896). Psychical research. Psychological Review, 3, 649-652.
Jüni, P., Witschi, A., Bloch, R., & Egger, M. (1999). The hazards of scoring
the quality of clinical trials for meta-analysis. The Journal of the
American Medical Association, 282, 1054-1060.
Lawrence, T. R. (1998). Gathering in the sheep and goats... A meta-
analysis of forced choice sheep-goat ESP studies, 1947-1993. In N. L.
Zingrone, M. J. Schlitz, C. S. Alvarado, & J. Milton (Eds.), Research in
Parapsychology 1993 (pp. 27-31). Lanham, MD: Scarecrow Press.
Mahoney, M. J. (1985). Open exchange and epistemic progress. American
Psychologist, 40, 29-39.
McGarry, J. J., & Newberry, B. H. (1981). Beliefs in paranormal phenomena
and locus of control: A field study. Journal of Personality and Social
Psychology, 41, 725-736.
Milton, J. (1993). A meta-analysis of waking state of consciousness, free
response ESP studies. Proceedings of Presented Papers: The
Parapsychological Association 36th Annual Convention, 87-104.
Milton, J. (1997). Meta-Analysis of free-response ESP studies without
altered states of consciousness. Journal of Parapsychology, 61, 279-319.
Milton, J., & Wiseman, R. (1999a). A meta-analysis of mass-media tests of
extrasensory perception. British Journal of Psychology, 90, 235-240.
Milton, J., & Wiseman, R. (1999b). Does psi exist? Lack of replication of an
anomalous process of information transfer. Psychological Bulletin, 125,
387-391.
Morris, R. L. (1982). Assessing experimental support for true precognition.
Journal of Parapsychology, 46, 321-336.
Murphy, G. (1962). Report on paper by Edward Girden on psychokinesis.
Psychological Bulletin, 59, 638-641.
Musch, J., & Ehrenberg, K. (2002). Probability misjudgement, cognitive
ability, and belief in the paranormal. British Journal of Psychology, 93,
169-177.
Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting
anomalies experiments. (Technical Note PEAR 94003). Princeton
University, Princeton, NJ 08544: Princeton Engineering Anomalies
Research, School of Engineering/ Applied Science.
Pallikari, F., & Boller, E. (1999). A rescaled range analysis of random
events. Journal of Scientific Exploration, 13, 35-40.
Persinger, M. A. (2001). The neuropsychiatry of paranormal experiences.
Journal of Neuropsychiatry & Clinical Neurosciences, 13, 515-523.
Pratt, J. G. (1937). Clairvoyant blind matching. Journal of Parapsychology,
1, 10-17.
Pratt, J. G. (1949). The meaning of performance curves in ESP and PK test
data. Journal of Parapsychology, 13, 9-22.
Pratt, J. G., Rhine, J. B., Smith, B. M., Stuart, C. E., & Greenwood, J. A.
(1940). Extra-sensory perception after sixty years: A critical appraisal of
the research in extra-sensory perception. New York: Henry Holt and
Company.
Price, M. M., & Pegram, M. H. (1937). Extra-sensory perception among the
blind. Journal of Parapsychology, 1, 143-155.
Radin, D. I. (1982). Experimental attempts to influence pseudorandom
number sequences. Journal of the American Society for Psychical
Research, 76, 359-374.
Radin, D. I. (1989). Searching for "signatures" in anomalous human-
machine interaction data: A neural network approach. Journal of
Scientific Exploration, 3, 185-200.
Radin, D. I. (1997). The conscious universe. San Francisco: Harper Edge.
Radin, D. I., & Ferrari, D. C. (1991). Effects of consciousness on the fall of
dice: A meta-analysis. Journal of Scientific Exploration, 5, 61-83.
Radin, D. I., & Nelson, R. D. (1989). Evidence for consciousness-related
anomalies in random physical systems. Foundations of Physics, 19,
1499-1514.
Radin, D. I., & Nelson, R. D. (2002). Meta-analysis of mind-matter
interaction experiments: 1959 to 2000. In W. B. Jonas (Eds.), Spiritual
Healing, Energy Medicine and Intentionality: Research and Clinical
Implications Edinburgh: Harcourt Health Sciences.
Reeves, M. P., & Rhine, J. B. (1943). The PK effect: II. A study in declines.
Journal of Parapsychology, 7, 76-93.
Rhine, J. B. (1934). Extrasensory perception. Boston: Boston Society for
Psychic Research.
Rhine, J. B. (1936). Some selected experiments in extra-sensory
perception. Journal of Abnormal and Social Psychology, 29, 151-171.
Rhine, J. B. (1937). The effect of distance in ESP tests. Journal of
Parapsychology, 1, 172-184.
Rhine, J. B. (1946). Editorial: ESP and PK as "psi phenomena". Journal of
Parapsychology, 10, 74-75.
Rhine, J. B., & Humphrey, B. M. (1944). The PK effect: Special evidence
from hit patterns. I. Quarter distribution of the page. Journal of
Parapsychology, 8, 18-60.
Rhine, J. B., & Humphrey, B. M. (1945). The PK effect with sixty dice per
throw. Journal of Parapsychology, 9, 203-218.
Rhine, J. B., & Rhine, L. E. (1927). One evening's observation on the
Margery mediumship. Journal of Abnormal and Social Psychology, 21,
421.
Rhine, L. E. (1937). Some stimulus variations in extra-sensory perception
with child subjects. Journal of Parapsychology, 1, 102-113.
Rhine, L. E., & Rhine, J. B. (1943). The psychokinetic effect: I. The first
experiment. Journal of Parapsychology, 7, 20-43.
Richet, C. (1923). Thirty years of psychical research: A treatise on metapsychics. New York: Macmillan Company.
Rosenthal, R. (1979). The "file drawer problem" and tolerance for null
results. Psychological Bulletin, 86, 638-641.
Rosenthal, R., & Rosnow, R. L. (1991). Essentials of behavioral research.
Methods and data analysis. (2 ed.). New York: McGraw-Hill Publishing.
Rosenthal, R., & Rubin, D. B. (1989). Effect size estimation for one-sample
multiple-choice type data: Design, analysis, and meta-analysis.
Psychological Bulletin, 106, 332-337.
Rush, J. H. (1977). Problems and methods in psychokinesis research. In S.
Krippner (Eds.), Advances in Parapsychological Research 1.
Psychokinesis (pp. 15-78). New York: Plenum.
Sanger, C. P. (1895). Analysis of Mrs. Verrall's card experiments.
Proceedings of the Society for Psychical Research, 11, 193-197.
Schmeidler, G. R. (1977). Research findings in psychokinesis. In S.
Krippner (Eds.), Advances in Parapsychological Research 1.
Psychokinesis (pp. 79-132). New York: Plenum.
Schmeidler, G. R. (1982). PK research: Findings and theories. In S.
Krippner (Eds.), Advances in Parapsychological Research 3 (pp. 115-
146). New York: Plenum Press.
Schmidt, H. (1969). Anomalous prediction of quantum processes by some
human subjects. Seattle, WA: Plasma Physics Laboratory. D1-82-0821:1-
38.
Schmidt, H. (1970). PK experiments with animals as subjects. Journal of
Parapsychology, 34, 255-261.
Schmidt, H. (1985). Addition effect for PK on pre-recorded targets. Journal
of Parapsychology, 49, 229-244.
Schouten, S. A. (1983). Personal experience and belief in ESP. The Journal
of Psychology, 114, 219-222.
Shadish, W. R., & Haddock, K. C. (1994). Combining estimates of effect
size. In L. V. Hedges & H. Cooper (Eds.), The handbook of research
synthesis (pp. 261-281). New York: Russell Sage Foundation.
Sparks, G. G. (1998). Paranormal depictions in the media: How do they
affect what people believe? Skeptical Inquirer, 22, 35-39.
Sparks, G. G., Hansen, T., & Shah, R. (1994). Do televised depictions of
paranormal events influence viewers' beliefs? Skeptical Inquirer, 18,
386-395.
Sparks, G. G., Nelson, C. L., & Campbell, R. G. (1997). The relationship
between exposure to televised messages about paranormal phenomena
and paranormal beliefs. Journal of Broadcasting & Electronic Media, 41,
345-359.
Stanford, R. G. (1978). Toward reinterpreting psi events. Journal of the
American Society for Psychical Research, 72, 197-214.
Stanford, R. G., & Stein, A. G. (1994). A meta-analysis of ESP studies
contrasting hypnosis and a comparison condition. Journal of
Parapsychology, 58, 235-269.
Steinkamp, F., Milton, J., & Morris, R. L. (1998). A meta-analysis of forced-
choice experiments comparing clairvoyance and precognition. Journal of
Parapsychology, 62, 193-218.
Sterling, T. D. (1959). Publication decisions and their possible effects on
inferences drawn from tests of significance - or vice versa. Journal of the
American Statistical Association, 54, 30-34.
Sterling, T. D., Rosenbaum, W. L., & Weinkam, J. J. (1995). Publication
decisions revisited: The effect of the outcome of statistical tests on the
decision to publish and vice versa. American Statistician, 49, 108-112.
Sterne, J. A. C., & Egger, M. (2001). Funnel plots for detecting bias in
meta-analysis: Guidelines on choice of axis. Journal of Clinical
Epidemiology, 54, 1046-1055.
Sterne, J. A. C., Egger, M., & Smith, G. D. (2001). Investigating and dealing
with publication and other biases. In M. Egger, G. D. Smith, & D. Altman
(Eds.), Systematic reviews in health care: Meta-analysis in context (pp.
189-208). London: BMJ Books.
Sterne, J. A. C., Gavaghan, D., & Egger, M. (2000). Publication and related
bias in meta-analysis: Power of statistical tests and prevalence in the
literature. Journal of Clinical Epidemiology, 53, 1119-1129.
Storm, L., & Ertel, S. (2001). Does psi exist? Comments on Milton and
Wiseman's (1999) meta-analysis of ganzfeld research. Psychological
Bulletin, 127, 424-433.
Targ, R., & Puthoff, H. E. (1977). Mind-reach. Scientists Look at Psychic
Ability. Delacorte Press.
Tart, C. T. (1976). Effects of immediate feedback on ESP performance.
Research in Parapsychology 1975, 80-82.
Taylor, G. Le M. (1890). Experimental comparison between chance and
thought-transference in correspondence of diagrams. Proceedings of
the Society for Psychical Research, 6, 398-405.
Thalbourne, M. A. (1995). Further studies of the measurement and
correlates of belief in the paranormal. Journal of the American Society
for Psychical Research, 89, 233-247.
Thompson, S. G., & Higgins, J. P. T. (2002). How should meta-regression
analyses be undertaken and interpreted? Statistics in Medicine, 21,
1559-1574.
Thompson, S. G., & Sharp, S. J. (1999). Explaining heterogeneity in meta-
analysis: A comparison of methods. Statistics in Medicine, 18, 2693-
2708.
Thouless, R. H. (1942). The present position of experimental research into
telepathy and related phenomena. Proceedings of the Society for
Psychical Research, 47, 1-19.
Thouless, R. H., & Wiesner, B. P. (1946). The psi processes in normal and
"paranormal" psychology. Proceedings of the Society for Psychical
Research, 48, 177-196.
Varvoglis, M. P., & McCarthy, D. (1986). Conscious-purposive focus and PK:
RNG activity in relation to awareness, task-orientation, and feedback.
Journal of the American Society for Psychical Research, 80, 1-29.
Watt, C. A. (1994). Meta-analysis of DMT-ESP studies and an experimental
investigation of perceptual defense/vigilance and extrasensory
perception. In E. W. Cook & D. L. Delanoy (Eds.), Research in
Parapsychology 1991 (pp. 64-68). Metuchen, NJ: Scarecrow Press.
White, R. A. (1991). The psiline database system. Exceptional Human
Experience, 9, 163-167.
Wilson, C. (1976). The Geller phenomenon. London: Aldus Books.
Appendix
Studies Used in the Meta-Analysis
André, E. (1972). Confirmation of PK action on electronic equipment.
Journal of Parapsychology, 36, 283-293.
Berger, R. E. (1986). Psi effects without real-time feedback using a
PsiLab ][ video game experiment. The Parapsychological Association
29th Annual Convention, 111-128.
Bierman, D. J., & Houtkooper, J. M. (1975). Exploratory PK tests with a
programmable high speed random number generator. European Journal
of Parapsychology, 1, 3-14.
Bierman, D. J., De Diana, I. P. F., & Houtkooper, J. M. (1976). Preliminary
report on the Amsterdam experiments with Matthew Manning.
European Journal of Parapsychology, 1, 6-16.
Bierman, D. J., & Noortje, V. T. (1977). The performance of healers in PK
tests with different RNG feedback algorithms. Research in
Parapsychology 1976, 131-133.
Bierman, D. J., & Weiner, D. H. (1980). A preliminary study of the effect of
data destruction on the influence of future observers. Journal of
Parapsychology, 44, 233-243.
Bierman, D. J., & Houtkooper, J. M. (1981). The potential observer effect or
the mystery of irreproducibility. European Journal of Parapsychology, 3,
345-371.
Bierman, D. J. (1987). Explorations of some theoretical frameworks using a
PK-test environment. Proceedings of Presented Papers: The
Parapsychological Association 30th Annual Convention, 33-40.
Bierman, D. J. (1988). Testing the IDS model with a gifted subject.
Theoretical Parapsychology, 6, 31-36.
Bierman, D. J., & Van Gelderen, W. J. M. (1994). Geomagnetic activity and
PK on a low and high trial-rate RNG. Proceedings of Presented Papers:
The Parapsychological Association 37th Annual Convention, 50-56.
Braud, L., & Braud, W. (1977). Psychokinetic effects upon a random event
generator under conditions of limited feedback to volunteers and
experimenter. Proceedings of Presented Papers: The Parapsychological
Association 20th Annual Convention.
Braud, W., & Kirk, J. (1977). Attempt to observe psychokinetic influences
upon a random event generator by person-fish teams. European Journal
of Parapsychology, 2, 228-237.
Braud, W. (1981). Psychokinesis experiments with infants and young
children. Research in Parapsychology 1980, 30-31.
Braud, W. G., & Hartgrove, J. (1976). Clairvoyance and psychokinesis in
transcendental meditators and matched control subjects: A preliminary
study. European Journal of Parapsychology, 1, 6-16.
Braud, W. G. (1978). Recent investigations of microdynamic
psychokinesis, with special emphasis on the roles of feedback, effort
and awareness. European Journal of Parapsychology, 2, 137-162.
Braud, W. G. (1983). Prolonged visualization practice and psychokinesis: A
pilot study (RB). Research in Parapsychology 1982, 187-189.
Breederveld, H. (1988). Towards reproducible experiments in
psychokinesis IV. Experiments with an electronic random number
generator. Theoretical Parapsychology, 6, 43-51.
Breederveld, H. (1989). The Michels experiments: An attempted
replication. Journal of the Society for Psychical Research, 55, 360-363.
Breederveld, H. (2001). De Optimal Stopping Strategie XL PK-
experimenten met een random number generator. [The optimal
stopping strategy XL. PK experiments with a random number
generator]. SRU-Bulletin, 13, 22-23.
Broughton, R. S., & Millar, B. (1977). A PK experiment with a covert
release-of-effort test. Research in Parapsychology 1976, 28-30.
Broughton, R. S. (1979). An experiment with the head of Jut. European
Journal of Parapsychology, 2, 337-357.
Broughton, R. S., Millar, B., & Johnson, M. (1981). An investigation into the
use of aversion therapy techniques for the operant control of PK
production in humans. European Journal of Parapsychology, 3, 317-344.
Broughton, R. S., & Higgins, C. A. (1994). An investigation of micro-PK and
geomagnetism. Proceedings of Presented Papers: The
Parapsychological Association 37th Annual Convention, 87-94.
Broughton, R. S., & Alexander, C. H. (1997). Destruction testing DAT.
Proceedings of Presented Papers: The Parapsychological Association
40th Annual Convention, 100-104.
Crandall, J. E. (1993). Effects of extrinsic motivation on PK performance
and its relations to state anxiety and extraversion. Proceedings of
Presented Papers: The Parapsychological Association 36th Annual
Convention, 372-377.
Curry, C. K. (1978). A modularized random number generator: Engineering
design and psychic experimentation. Unpublished senior thesis,
Department of Electrical Engineering and Computer Science, School of
Engineering/Applied Science, Princeton University.
Dalton, K. S. (1994). Remotely influenced ESP performance in a computer
task: A preliminary study. Proceedings of Presented Papers: The
Parapsychological Association 37th Annual Convention, 95-103.
Davis, J. W., & Morrison, M. D. (1978). A test of the Schmidt model's
prediction concerning multiple feedback in a PK test. Research in
Parapsychology 1977, 163-168.
Debes, J., & Morris, R. L. (1982). Comparison of striving and nonstriving
instructional sets in a PK study. Journal of Parapsychology, 46, 297-312.
Gerding, J. L. F., Wezelman, R., & Bierman, D. J. (1997). The Druten
disturbances - Exploratory RSPK research. Proceedings of Presented
Papers: The Parapsychological Association 40th Annual Convention,
146-161.
Giesler, P. V. (1985). Differential micro-PK effects among Afro-Brazilian
cultists: Three studies using trance-significant symbols as targets.
Journal of Parapsychology, 49, 329-366.
Gissurarson, L. R. (1986). RNG-PK microcomputer "games" overviewed: An
experiment with the videogame "PSI INVADERS". European Journal of
Parapsychology, 6, 199-215.
Gissurarson, L. R., & Morris, R. L. (1990). Volition and psychokinesis:
Attempts to enhance PK performance through the practice of imagery
strategies. Journal of Parapsychology, 54, 331-370.
Gissurarson, L. R. (1990). Some PK attitudes as determinants of PK
performance. European Journal of Parapsychology, 8, 112-122.
Gissurarson, L. R., & Morris, R. L. (1991). Examination of six
questionnaires as predictors of psychokinesis performance. Journal of
Parapsychology, 55, 119-145.
Haraldsson, E. (1970). Subject selection in a machine precognition test.
Journal of Parapsychology, 34, 182-191.
Heseltine, G. L. (1977). Electronic random number generator operation
associated with EEG activity. Journal of Parapsychology, 41, 103-118.
Heseltine, G. L., & Mayer-Oakes, S. A. (1978). Electronic random generator
operation and EEG activity: Further studies. Journal of Parapsychology,
42, 123-136.
Heseltine, G. L., & Kirk, J. (1980). Examination of a majority-vote
technique. Journal of Parapsychology, 44, 167-176.
Hill, S. (1977). PK effects by a single subject on a binary random number
generator based on electronic noise. Research in Parapsychology 1976,
26-28.
Honorton, C. (1971). Group PK performance with waking suggestions for
muscle tension/ relaxation and active/ passive concentration.
Proceedings of the Parapsychological Association, 8, 14-15.
Honorton, C. (1971). Automated forced-choice precognition tests with a
"sensitive". Journal of the American Society for Psychical Research, 65,
476-481.
Honorton, C., & Barksdale, W. (1972). PK performance with waking
suggestions for muscle tension vs. relaxation. Journal of the American
Society for Psychical Research, 66, 208-214.
Honorton, C., Ramsey, M., & Cabibbo, C. (1975). Experimenter effects in
ESP research. Journal of the American Society for Psychical Research,
69, 135-139.
Honorton, C., & May, E. C. (1976). Volitional control in a psychokinetic task
with auditory and visual feedback. Research in Parapsychology 1975,
90-91.
Honorton, C. (1977). Effects of meditation and feedback on psychokinetic
performance: A pilot study with an instructor of Transcendental
Meditation. Research in Parapsychology 1976, 95-97.
Honorton, C., & Tremmel, L. (1980). Psitrek: A preliminary effort toward
development of psi-conducive computer software. Research in
Parapsychology 1979, 159-161.
Honorton, C., Barker, P., & Sondow, N. (1983). Feedback and participant-
selection parameters in a computer RNG study (RB). Research in
Parapsychology 1982, 157-159.
Honorton, C. (1987). Precognition and real-time ESP performance in a
computer task with an exceptional subject. Journal of Parapsychology,
51, 291-320.
Houtkooper, J. M. (1976). Psychokinesis, clairvoyance and personality
factors. Proceedings of Presented Papers: The Parapsychological
Association 19th Annual Convention, 1-15.
Houtkooper, J. M. (1977). A study of repeated retroactive psychokinesis in
relation to direct and random PK effects. European Journal of
Parapsychology, 1, 1-20.
Houtkooper, J. M., Schienle, A., Vaitl, D., & Stark, R. (1999). Atmospheric
electromagnetism: An attempt at replicating the correlation between
natural sferics and ESP. Proceedings of Presented Papers: The
Parapsychological Association 42nd Annual Convention, 123-135.
Jacobs, J. C., Michels, J. A. G., Millar, B., & Millar-De Bruyne, M.-L. F. L.
(1987). Building a PK trap: The adaptive trial speed method.
Proceedings of Presented Papers: The Parapsychological Association
30th Annual Convention, 348-370.
Jahn, R. G., Dunne, B. J., Dobyns, Y. H., Nelson, R. D., & Bradish, G. J.
(1999). ArtREG: A random event experiment utilizing picture-preference
feedback. (Technical Note PEAR 99003). Princeton NJ 08544: PEAR
Laboratory, School of Engineering and Applied Science.
Jahn, R. G., Mischo, J., Vaitl, D., Dunne, B. J., Bradish, G. J., Dobyns, Y. H.,
Lettieri, A., Nelson, R. D., Boller, E., Bösch, H., Houtkooper, J. M., &
Walter, B. (2000). Mind/Machine interaction consortium: PortREG
replication experiments. Journal of Scientific Exploration, 14, 499-555.
Kelly, E. F., & Kanthamani, B. K. (1972). A subject's efforts toward
voluntary control. Journal of Parapsychology, 36, 185-197.
Kelly, E. F., & Lenz, J. (1976). EEG correlates of trial-by-trial performance in
a two-choice clairvoyance task: A preliminary study. Research in
Parapsychology 1975, 22-25.
Kugel, W., Bauer, B., & Bock, W. (1979). Versuchsreihe Telbin.
[Experimental series Telbin]. (Arbeitsbericht 7). Berlin: Technische
Universität Berlin.
Kugel, W. (1999). Amplifying precognition: Four experiments with roulette.
Proceedings of Presented Papers: The Parapsychological Association
42nd Annual Convention, 136-146.
Levi, A. (1979). The influence of imagery and feedback on PK effects.
Journal of Parapsychology, 43, 275-289.
Lignon, Y., & Faton, L. (1977). Le facteur psi s'exerce sur un appareil électronique. [The psi factor affects an electronic apparatus]. Psi-Réalité, 54-62.
Lounds, P. (1993). The influence of psychokinesis on the randomly-
generated order of emotive and non-emotive slides. Journal of the
Society for Psychical Research, 59, 187-193.
Lucadou, W. v. (1991). Locating Psi-bursts - correlations between
psychological characteristics of observers and observed quantum
physical fluctuations. Proceedings of Presented Papers: The
Parapsychological Association 34th Annual Convention, 265-281.
Mabilleau, P. (1982). Electronic dice: A new way for experimentation in
"psiology". Le Bulletin PSILOG, 2, 13-14.
Matas, F., & Pantas, L. (1971). A PK experiment comparing meditating vs.
nonmeditating subjects. Proceedings of the Parapsychological
Association, 8, 12-13.
May, E. C., & Honorton, C. (1976). A dynamic PK experiment with Ingo
Swann. Research in Parapsychology 1975, 88-89.
Michels, J. A. G. (1987). Consistent high scoring in self-test PK experiments
using a stopping strategy. Journal of the Society for Psychical Research,
54, 119-129.
Millar, B., & Broughton, R. (1976). A preliminary PK experiment with a novel computer-linked high speed random number generator. Research in Parapsychology 1975, 83-84.
Millar, B., & Mackenzie, P. (1977). A test of intentional vs unintentional PK.
Research in Parapsychology 1976, 32-35.
Millar, B., & Broughton, R. S. (1977). An investigation of the psi
enhancement paradigm of Schmidt. Research in Parapsychology 1976,
23-25.
Millar, B. (1983). Random bit generator experimenten. Millar-replicatie.
[Random bit generator experiments. Millar's replication]. SRU-Bulletin,
8, 119-123.
Morris, R., Nanko, M., & Phillips, D. (1978). Intentional observer influence
upon measurements of a quantum mechanical system: A comparison of
two imagery strategies. Program and Presented Papers:
Parapsychological Association 21st Annual Convention, 266-275.
Morris, R. L., & Reilly, V. (1980). A failure to obtain results with goal-
oriented imagery PK and a random event generator with varying hit
probability. Research in Parapsychology 1979, 166-167.
Morris, R. L., & Harnaday, J. (1981). An attempt to employ mental practice
to facilitate PK. Research in Parapsychology 1980, 103-104.
Morris, R. L., & Garcia-Noriega, C. (1982). Variations in feedback
characteristics and PK success. Research in Parapsychology 1981, 138-
140.
Morrison, M. D., & Davis, J. W. (1978). PK with immediate, delayed, and
multiple feedback: A test of the Schmidt model's predictions. Program
and Presented Papers: Parapsychological Association 21st Annual
Convention, 97-117.
Nanko, M. (1981). Use of goal-oriented imagery strategy on a
psychokinetic task with "selected" subjects. Journal of the Southern
California Society for Psychical Research, 2, 1-5.
Nelson, R. D. (1994). Effect size per hour: A natural unit for interpreting
anomalies experiments. (Technical Note PEAR 94003). Princeton
University, Princeton, NJ 08544: Princeton Engineering Anomalies
Research, School of Engineering/ Applied Science.
Palmer, J., & Perlstrom, J. R. (1986). Random event generator PK in
relation to task instructions: A case of "motivated" error? The
Parapsychological Association 29th Annual Convention, 131-147.
Palmer, J. (1995). External psi influence of ESP task performance. Proceedings of Presented Papers: The Parapsychological Association 38th Annual Convention, 270-282.
Palmer, J., & Broughton, R. S. (1995). Performance in a computer task with an exceptional subject: A failure to replicate. Proceedings of Presented Papers: The Parapsychological Association 38th Annual Convention, 289-294.
Palmer, J. (1998). ESP and REG PK with Sean Harribance: Three new
studies. Proceedings of Presented Papers: The Parapsychological
Association 41st Annual Convention, 124-134.
Pantas, L. (1971). PK scoring under preferred and nonpreferred conditions.
Proceedings of the Parapsychological Association, 8, 47-49.
Pare, R. (1983). Random bit generator experimenten. Pare-replicatie.
[Random bit generator experiments. Pare's replication]. SRU-Bulletin, 8,
123-128.
Randall, J. L. (1974). An extended series of ESP and PK tests with three
English schoolboys. Journal of the Society for Psychical Research, 47,
485-494.
Reinsel, R. (1987). PK performance as a function of prior stage of sleep
and time of night. Proceedings of Presented Papers: The
Parapsychological Association 30th Annual Convention, 332-347.
Schechter, E. I., Barker, P., & Varvoglis, M. P. (1983). A preliminary study
with a PK game involving distraction from the Psi task (RB). Research in
Parapsychology 1982, 152-154.
Schechter, E. I., Honorton, C., Barker, P., & Varvoglis, M. P. (1984).
Relationships between participant traits and scores on two computer-
controlled RNG-PK games. Research in Parapsychology 1983, 32-33.
Schmeidler, G. R., & Borchardt, R. (1981). Psi-scores with random and
pseudo-random targets. Research in Parapsychology 1980, 45-47.
Schmidt, H. (1969). Anomalous prediction of quantum processes by some
human subjects. (D1-82-0821). Plasma Physics Laboratory.
Schmidt, H. (1970). A PK test with electronic equipment. Journal of
Parapsychology, 34, 175-181.
Schmidt, H., & Pantas, L. (1972). Psi tests with internally different
machines. Journal of Parapsychology, 36, 222-232.
Schmidt, H. (1972). An attempt to increase the efficiency of PK testing by an increase in the generation speed. Proceedings of Presented Papers: Parapsychological Association 15th Annual Convention.
Schmidt, H. (1973). PK tests with a high-speed random number generator.
Journal of Parapsychology, 37, 105-118.
Schmidt, H. (1974a). PK effect on random time intervals. Research in Parapsychology 1973, 46-48.
Schmidt, H. (1974b). Comparison of PK action on two different random number generators. Journal of Parapsychology, 38, 47-55.
Schmidt, H. (1975). PK experiment with repeated, time displaced
feedback. Proceedings of Presented Papers: The Parapsychological
Association 18th Annual Convention.
Schmidt, H. (1976). PK effects on pre-recorded targets. Journal of the
American Society for Psychical Research, 70, 267-291.
Schmidt, H., & Terry, J. (1977). Search for a relationship between
brainwaves and PK performance. Research in Parapsychology 1976, 30-
32.
Schmidt, H. (1978). Use of stroboscopic light as rewarding feedback in a
PK test with pre-recorded and momentarily generated random events.
Program and Presented Papers: Parapsychological Association 21st
Annual Convention, 85-96.
Schmidt, H. (1990). Correlation between mental processes and external
random events. Journal of Scientific Exploration, 4, 233-241.
Schouten, S. A. (1977). Testing some implications of a PK observational
theory. European Journal of Parapsychology, 1, 21-31.
Stanford, R. G. (1981). "Associative activation of the unconscious" and
"visualization" as methods for influencing the PK target: A second study.
Journal of the American Society for Psychical Research, 75, 229-240.
Stanford, R. G., & Kottoor, T. M. (1985). Disruption of attention and PK-task performance. Proceedings of Presented Papers: The Parapsychological Association 28th Annual Convention, 117-132.
Stewart, W. C. (1959). Three new ESP test machines and some preliminary results. Journal of Parapsychology, 23, 44-48.
Talbert, R., & Debes, J. (1981). Time-displacement psychokinetic effects
on a random number generator using varying amounts of feedback.
Program and Presented Papers: Parapsychological Association 24th
Annual Convention.
Tedder, W., & Braud, W. (1981). Long-distance, nocturnal psychokinesis.
Research in Parapsychology 1980, 100-101.
Tedder, W. (1984). Computer-based long distance ESP: An exploratory
examination. Research in Parapsychology 1983, 100-101.
Thouless, R. H. (1971). Experiments on psi self-training with Dr. Schmidt's
pre-cognitive apparatus. Journal of the Society for Psychical Research,
46, 15-21.
Tremmel, L., & Honorton, C. (1980). Directional PK effects with a
computer-based random generator system: A preliminary study.
Research in Parapsychology 1979, 69-71.
Varvoglis, M. P. (1988). A "psychic contest" using a computer-RNG task in
a non-laboratory setting. Proceedings of Presented Papers: The
Parapsychological Association 31st Annual Convention, 36-52.
Verbraak, A. (1981). Onafhankelijke random bit generator experimenten -
Verbraak-replicatie. [Independent random bit generator experiments -
Verbraak's replication]. SRU-Bulletin, 6, 134-139.
Weiner, D. H., & Bierman, D. J. (1979). An observer effect in data analysis?
Research in Parapsychology 1978, 57-58.
Winnett, R. (1977). Effects of meditation and feedback on psychokinetic
performance: Results with practitioners of Ajapa Yoga. Research in
Parapsychology 1976, 97-98.
Table 1
Main Results of Radin & Ferrari's (1991) Dice Meta-Analysis

                                              N      π̄        SE        z
Dice-casts "Influenced"
  All studies                                148   0.50610   .00031   19.68***
  All studies, quality weighted              148   0.50362   .00036   10.18***
  Balanced studies                            69   0.50431   .00055    7.83***
  Balanced studies, homogeneous               59   0.50158   .00061    2.60**
  Balanced studies, homogeneous,
    quality weighted                          59   0.50147   .00063    2.33*
Dice-casts Control
  All studies                                 31   0.50047   .00128    0.36

Note. The z value is based on the null value of π̄ = .5.
* p < .05. ** p < .01. *** p < .001
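The z values above follow directly from the tabled mean effect sizes and standard errors, as the table note indicates. The following minimal Python sketch (illustrative only; values taken from the "All studies" row) shows the computation:

```python
# Minimal sketch: recomputing a z statistic from Table 1 out of the
# mean hit proportion and its standard error, under the null value
# pi = .5 stated in the table note.

def z_score(mean_pi: float, se: float, null: float = 0.5) -> float:
    """z = (mean proportion - null proportion) / standard error."""
    return (mean_pi - null) / se

# "All studies" row: N = 148, mean pi = 0.50610, SE = .00031
print(round(z_score(0.50610, 0.00031), 2))  # ~19.68, matching the table
```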
Table 2
Previous PK Meta-Analyses - Total Samples

                                        N     π̄       SE       z        Mean
Dice
  1991 Meta-analysis                   148  .50822  .00041  20.23***  .51105
RNG
  1989 First meta-analysis             597  .50018  .00003   6.53***  .50414
  1997 First MA without PEAR data      339  .50061  .00009   6.41***  .50701
  2000 Second meta-analysis            515  .50005  .00001   3.81***  .50568

Note. Mean = the unweighted averaged effect size of studies.
*** p < .001
Table 3
Basic Study Characteristics - Intentional Studies

Source of studies (n):
  Journal                     276
  Conference proceeding        63
  Report                       16
  Thesis/Dissertation           2

Year of publication (n):
  1960                          2
  1961 - 1970                  14
  1971 - 1980                 202
  1981 - 1990                 101
  1991 - 2000                  37
  > 2000                        1

Number of participants (n):
  1                            91
  >1 - 10                     101
  >10 - 20                     52
  >20 - 30                     33
  >30 - 40                     13
  >40 - 50                     12
  >50 - 60                     12
  >60 - 70                      2
  >70 - 80                      3
  >80 - 90                      2
  >90 - 100                     2
  >100                          7

Sample size (bit) (n):
  >10 - 100                             8
  >100 - 1,000                         74
  >1,000 - 10,000                     123
  >10,000 - 100,000                    91
  >100,000 - 1,000,000                 42
  >1,000,000 - 10,000,000               7
  >10,000,000 - 100,000,000             8
  >100,000,000 - 1,000,000,000          3
  >1,000,000,000 - 10,000,000,000       1
Table 4
Overall, Trimmed Sample and Moderator Variables Summary Statistics

Variable and class        n      π̄        SE        z        M bit      Mdn bit  M py  M sub.  Mdn sub.   χ²         z (rnd)

Intentional overall      357  .500029  .000011    2.73**     6095359      6400   1979    20      10     1442.89***    .06

Trimmed
  Homogeneous            287  .500017  .000011    1.54       6824190      6400   1981    22      10      323.96
  Heterogeneous           70  .500136  .000034    4.02***    3107153      6593   1976    14       4     1107.88***    .07

Sample size (bit)
  (Q1) Smallest           89  .523500  .002655    8.85***        446       320   1979    23      10      318.92***    .58
  (Q2) Small              91  .505519  .000914    6.03***       3683      4000   1978    19      10      269.52***    .45
  (Q3) Large              91  .503249  .000421    7.71***      16726     15360   1979    14      10      419.04***    .42
  (Q4) Largest            86  .500026  .000011    2.43*     25280771    200000   1983    25      11      262.79***    .13

Year of publication
  (Q1) Oldest             90  .505356  .000394   13.59***      18553      4364   1971    24       6      636.32***    .54
  (Q2) Old                97  .500178  .000147    1.21        120378      8000   1977    16      10      255.24***    .09
  (Q3) New                85  .500292  .000247    1.18         52993      7500   1981    16      12      122.17**     .18
  (Q4) Newest             85  .500024  .000011    2.23*     25390500     18000   1991    25       8      244.02***    .13

Number of participants
  (Q1) One                91  .500453  .000168    2.70**       116550      5000  1980     1       1      632.14***    .09
  (Q2) Few               101  .500043  .000045     .96        1232867      4800  1978     7       8      330.26***    .04
  (Q3) Several            57  .499986  .000039    -.35        2863035     12288  1980    16      16      168.60***   -.02
  (Q4) Many               81  .500024  .000012    2.10*      22955969     18000  1982    61      40      134.18***    .18
  Unknown                 27  .500627  .000117    5.35***      677453      7500  1979                    143.80***    .20

Participants
  Selected                55  .500628  .000185    3.40***      134727      5000  1977     4       1      579.62***    .08
  Unselected             244  .500027  .000011    2.53*       8881236     10321  1980    27      16      675.64***    .09
  Other                   58  .500237  .000408     .58          27790      1280  1980     8      10      176.85***    .05

Study status
  Formal                 192  .500027  .000011    2.46*      10744544      7727  1982    26      10      616.39***    .08
  Pilot                  152  .500061  .000049    1.26         698876      6400  1978    14       6      801.66***    .02
  Other                   13  .500202  .000191    1.06         527819      5000  1977     9       6       23.55*      .15

Feedback
  Visual                 221  .500017  .000012    1.49        8370435      5000  1979    19      10      814.63***    .04
  Auditory                35  .502357  .000382    6.18***       50385     17000  1976    14       6      254.83***    .39
  Other                  101  .500085  .000028    3.06**      3212016     10000  1983    24      10      331.14***    .16

Random sources
  Noise                  194  .500028  .000011    2.60**     11202736     11456  1983    15      10      852.46***    .07
  Radioactive             96  .502357  .000544    4.33***       10442      1600  1973    23       6      459.53***    .22
  Other                   67  .500766  .000389    1.97*         25523     10800  1979    30      11      109.01***    .29

Note. M bit/Mdn bit = mean/median sample size in bits; M py = mean year of publication; M sub./Mdn sub. = mean/median number of participants; z (rnd) = z-score random model.
* p < .05. ** p < .01. *** p < .001
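The pooled effect sizes, the z tests against the null value π = .5, and the χ² heterogeneity statistics above are standard inverse-variance (fixed-effect) quantities. A minimal Python sketch with toy data follows; the function and variable names are illustrative, not the authors' code:

```python
# Illustrative sketch (not the authors' code): fixed-effect pooled
# proportion, its z test against pi = .5, and Cochran's Q heterogeneity
# statistic (distributed as chi-square with k - 1 degrees of freedom).
import math

def fixed_effect_summary(pis, ses):
    weights = [1.0 / se ** 2 for se in ses]          # inverse-variance weights
    pooled = sum(w * p for w, p in zip(weights, pis)) / sum(weights)
    se_pooled = math.sqrt(1.0 / sum(weights))
    z = (pooled - 0.5) / se_pooled                   # test against null pi = .5
    q = sum(w * (p - pooled) ** 2 for w, p in zip(weights, pis))
    return pooled, se_pooled, z, q

# toy data: three hypothetical studies
pis = [0.520, 0.501, 0.4995]
ses = [0.010, 0.002, 0.0005]
print(fixed_effect_summary(pis, ses))
```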
Table 5
Safeguard Variables Summary Statistics

Variable and class        n      π̄        SE        z        M bit      Mdn bit  M py  M sub.  Mdn sub.   χ²         z (rnd)

RNG control
  Yes (2)                246  .500031  .000011    2.82**     8448064     10000   1981    18      10      866.46***    .08
  Earlier (1)              7  .499996  .000052    -.08      13471208      1000   1982     3       2      286.75***    .00
  No (0)                 104  .499979  .000377    -.06         33856      4048   1977    26      10      289.22***    .00

All data reported
  Yes (2)                289  .500028  .000011    2.58**     7477090      6400   1980    21      10     1352.94***    .06
  Unclear (1)             11  .501074  .000537    2.00*        80726     37000   1976    30      10       16.75
  No (0)                  57  .500204  .000134    1.52        250459      7200   1978    18      10       67.71

Split of data
  Preplanned (2)         233  .500064  .000043    1.48        583819      6400   1980    19      10      734.71***    .05
  Unclear (1)             45  .500026  .000011    2.32*     45275647     25600   1983    14       8      173.27***    .14
  Not preplanned (0)      79  .501111  .000314    3.53***      33029      4400   1977    29      10      522.33***    .14

Safeguard sum-score
  Sum = 6 (highest)      138  .500285  .000098    2.90**      187695      6144   1982    21      10      455.05***    .13
  Sum = 5                 47  .500025  .000011    2.27*     45373113     48000   1983    18      10      210.59***    .13
  Sum = 4                104  .500178  .000137    1.30        143255      6400   1979    12       6      391.64***    .07
  Sum = 3                 46  .500303  .000339     .89         55091      1600   1976    43      17      105.72***    .09
  Sum = 2                  8  .517376  .002755    6.31***       4185      1472   1978     2       1      220.96***    .36
  Sum = 1                  4  .500000  .026352     .00           120       120   1977    10      10         .00
  Sum = 0 (lowest)        10  .503252  .001344    2.42*        13840     16000   1975    28      10        4.75

Note. z (rnd) = z-score random model; column abbreviations as in Table 4.
* p < .05. ** p < .01. *** p < .001
Table 6
Summary of Weighted Stepwise Linear Meta-Regression Analysis for Variables Predicting Effect Size of RNG Studies

Step and variable          B          SE B       β        t         R²

Step 1
  Year of publication   -.000025   .000007    -.180    -3.45***    .032
Step 2
  Year of publication   -.000021   .000007    -.153    -2.88**
  Auditory feedback      .001856   .000770     .128     2.41*      .048

* p < .05. ** p < .01. *** p < .001
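Table 6 summarizes a weighted stepwise linear meta-regression of effect size on candidate moderators. A minimal sketch of a single weighted least-squares step, assuming inverse-variance weights (the weighting scheme is an assumption) and simulated data:

```python
# Minimal sketch (assumptions: inverse-variance weights; effect size
# regressed on publication year): one step of a weighted linear
# meta-regression, solved as weighted least squares with numpy.
import numpy as np

def weighted_regression(y, x, se):
    w = np.sqrt(1.0 / se ** 2)               # square roots of inverse-variance weights
    X = np.column_stack([np.ones_like(x), x])
    # solve the weighted least-squares problem ||diag(w) (y - Xb)||^2
    b, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return b                                  # (intercept, slope B for the predictor)

rng = np.random.default_rng(0)
year = rng.uniform(1969, 2001, size=100)
se = rng.uniform(0.0005, 0.01, size=100)
pi = 0.55 - 0.000025 * year + rng.normal(0.0, se)  # toy data with a small negative trend
print(weighted_regression(pi, year, se))
```

In a forward-stepwise procedure, the predictor producing the largest significant improvement in weighted fit is entered at each step, which is how the two steps of Table 6 arise.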
Table 7
Potential Sources of the Small-Study Effect

True heterogeneity
  Different intensity/quality
  Different participants
  Different feedback
  Other moderator(s)
Data irregularities
  Poor methodological design
  Effect intrinsically irreplicable
  Inadequate analysis
  Fraud
Selection biases
  Biased inclusion criteria
  Publication bias
Chance

Note. Adapted from Sterne, Egger, & Smith (2001, p. 193).
Table 8
Stepped Weight Function Monte Carlo Simulation of Publication Bias

                     n      π̄         SE        z        Stud    zΔ      χ²

Overall             357  0.500027  0.000021   2.48*      1453   0.36   592.19***

Sample size
  (Q1) Smallest      89  0.514658  0.004908   5.80***     364   1.58   116.52*
  (Q2) Small         91  0.505118  0.001692   5.92***     367   0.21   115.21*
  (Q3) Large         91  0.502462  0.000794   6.07***     372   0.88   113.27*
  (Q4) Largest       86  0.500024  0.000021   2.22*       350   0.08   138.64***

Note. Stud = number of (simulated) studies which did not get published; zΔ = difference between the effect size of the simulated and the experimental data.
* p < .05. ** p < .01. *** p < .001
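The stepped-weight-function simulation is only loosely specified; the sketch below shows one plausible reading, in which studies are generated under the null hypothesis, significant results are always "published," and nonsignificant results are published with a fixed probability. All parameter values (the survival probability p_ns, the study-size distribution) are assumptions made for illustration:

```python
# Hedged sketch of a stepped-weight-function publication-bias
# simulation. Assumptions: true hit probability pi = .5, log-uniform
# study sizes, and a step function in which significant results
# (|z| > 1.96) are always "published" while nonsignificant results
# survive with probability p_ns.
import math
import random

def simulate(n_studies=357, p_ns=0.25, seed=1):
    random.seed(seed)
    published = []
    file_drawer = 0                               # unpublished studies
    while len(published) < n_studies:
        n_bits = int(10 ** random.uniform(2, 7))  # study size in bits (assumption)
        se = math.sqrt(0.25 / n_bits)             # SE of a proportion at pi = .5
        pi = 0.5 + random.gauss(0.0, se)          # normal approximation to the binomial
        z = (pi - 0.5) / se
        if abs(z) > 1.96 or random.random() < p_ns:
            published.append((pi, n_bits))
        else:
            file_drawer += 1
    mean_pi = sum(p for p, _ in published) / n_studies
    return mean_pi, file_drawer

print(simulate())  # (unweighted mean effect size, simulated file-drawer count)
```

Because mainly significant small studies survive the selection step, the published small studies show inflated effect sizes, which is how a simulation of this kind can reproduce a small-study effect and extreme heterogeneity in the absence of any true effect.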
Figure 1
Funnel Plot: Intentional Studies

[Figure: funnel plot of effect size (π) on the x-axis (0.35 to 0.65) against sample size (number of bits) on a logarithmic y-axis (10 to 10,000,000,000).]

Note. Three very extreme values (π > .70; n < 400) are omitted from the figure for the sake of better representation of all other values.
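A funnel plot in this format can be redrawn as follows. The sketch uses simulated null data rather than the study data, with matplotlib settings chosen to match the axes described above:

```python
# Sketch of a funnel plot in the format of Figure 1: per-study effect
# size (pi) on the x-axis against sample size in bits on a logarithmic
# y-axis. The data are simulated under the null, for illustration only.
import math
import random
import matplotlib.pyplot as plt

random.seed(2)
sizes = [int(10 ** random.uniform(1.5, 9)) for _ in range(357)]
pis = [0.5 + random.gauss(0.0, math.sqrt(0.25 / n)) for n in sizes]

plt.scatter(pis, sizes, s=8)
plt.yscale("log")                              # 10 up to 10,000,000,000
plt.xlim(0.35, 0.65)
plt.axvline(0.5, linestyle="--", linewidth=1)  # null value pi = .5
plt.xlabel("Effect size (pi)")
plt.ylabel("Sample size (number of bits)")
plt.title("Funnel plot, simulated studies")
plt.show()
```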