J. Acoust. Soc. Jpn. (E) 4, 2 (1983)

Rhythm perception in repetitive sound sequence

Seishi Hibi

Research Institute of Logopedics and Phoniatrics, Faculty of Medicine, University of Tokyo, 7-3-1, Hongo, Bunkyo-ku, Tokyo, 113 Japan

(Received 30 April 1982)

In perception of a regular sequence of stimulus sounds that is neither too rapid nor too slow, we tend to perceive it as rhythmic. In the present study, we investigated the degree of temporal distortion that had to be introduced into the sequence for listeners to be able to report an irregularity, and we also investigated the timing patterns of the repetition of monosyllables produced in response to the regular sound sequence. The degree of distortion which afforded 50% judgments of detection was found to be about 6% in the region of rates slower than 3 times per second, and about 7.6 to 8.9%, depending upon the type of distortion, in the region of rates more rapid than 4 times per second. Another experiment showed that a negative correlation between neighboring time intervals, which suggested an adjusting mechanism, was found only in the region of rates slower than 3 times per second. From the results, the author tentatively concludes that the ongoing processing mechanism works in the region of rates slower than 3 times per second and the wholistic processing mechanism works in the region of rates more rapid than 3 times per second.

PACS numbers: 43.66.Mk, 43.66.Lj, 43.70.Dn

1. INTRODUCTION

There have been many studies which have elucidated both qualitative and quantitative characteristics of rhythmic activities. In perception of a regular sequence of stimuli that is neither too rapid nor too slow, we tend to perceive it as rhythmic.1,2) The time interval between pulses has to be greater than about 0.1 s in order to be heard as a succession of pulses, and the interval has to be less than about 3.0 s in order to be heard as a group of pulses.3) The perceptually "preferred" rate of succession, which is indicated by the time interval between successive clicks, is between 0.2 and 1.3 s.2) The discrimination of two empty intervals, each of which is bounded by a pair of clicks, is found to be most accurate at intervals of 0.6 and 0.8 s, and at these lengths the just noticeable relative difference is slightly less than 8% of the standard, and increases both above and below this middle region.2) However, if the temporal regularity of the series is distorted or perturbed, we hear some irregularity therein and it is perceived as arhythmic.

On production of rhythm, many studies have investigated the rate of succession of rhythmic beats. Several studies have shown that interstress intervals in stress-timed languages fall between 0.2 and 0.8 s.4,5) In French, the rate of succession of syllables is about 0.15 to 0.2 s per syllable, and the number of syllables in an utterance group is 2 to 11.6) In production of non-speech rhythm, the time interval between the key notes in a musical composition is statistically found to be between 0.15 and 0.9 s.1)

In terms of other characteristics of rhythmic action, various researchers have shown that the overall range of standard errors was about 3 to 11% of the length of the interval when the subject produced an even tempo.1,2,7-9) In speech, short segments have variability of about 10%, longer stretches of speech about 4%.3)

The argument so far is that we perceive some sequential sounds as rhythmic when they have regular sequential time intervals and we produce regular sequential time intervals when we act rhythmically. The rate of succession within a range of so-called perceptually "preferred" rates matches the rate of succession of rhythmic activities. Moreover, the variability of production of rhythmic activities is not far different from the most accurate level of discrimination of time intervals.

Since it is the general notion that production of rhythm and perception thereof are dynamically coupled, there should be a mechanism which governs the timing controls of temporal sequences in both perception and production of rhythm. The objective of the present study is, therefore, to explore an identical timing-related threshold which would be common to both perception and production of rhythm.

2. PERCEPTION OF RHYTHM

The problem for experimental investigation here concerns the ability of listeners to tell whether or not there is a distortion in a sequence. We therefore wish to know how large a temporal distortion must be introduced into a uniformly spaced sequence of sounds in order for a listener to be able to report an irregularity.

2.1 Experiment A

2.1.1 Method

(1) Stimulus materials

Uniformly spaced temporal sequences consisting of 15 tone bursts, each of which was 1 kHz in frequency and 5 ms in duration, and with different rates of succession ranging from 0.7 to 7.0 times per second, were used as basic (i.e. undistorted) sequences. The rates of succession were 0.7, 1.0, 1.3, 2.0, 2.5, 3.0, 4.0, 5.0, 6.0 and 7.0 times per second. Translated into a time interval measure, they were 1,500, 1,000, 750, 500, 400, 333, 250, 200, 167 and 143 ms, respectively. Distorted versions of each basic sequence were prepared as follows. One interval in the sequence was either lengthened or shortened by 2, 4, 6, 8, 10, 12, 14 and 16% of the basic interval, yielding 16 distorted versions of the sequence. Figure 1 shows schematically the examples of lengthening and shortening of the interval in the sequence.

The distorted time interval appeared in serial position 7, 8, or 9. Namely, the length of a time interval bounded by the 7th and 8th tone bursts, by the 8th and the 9th, or by the 9th and the 10th, was either lengthened or shortened. There were therefore a total of 480 distorted versions, depending upon the serial position of the distorted interval. With three copies of each basic sequence version included, there were therefore a total of 510 sequences in the experiment. The 510 sequences were assembled into three blocks, and the experimental conditions were assigned quasirandomly in such a fashion that experimental conditions were distributed evenly within and between blocks.

Fig. 1 Examples of distorted interval in Experiment A. The length of the time interval bounded by the 7th and 8th tone bursts is either lengthened (middle) or shortened (bottom) by ΔT.
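The stimulus design described above can be sketched in a few lines of Python. This is only an illustration of the counting and timing logic, not the original PDP 11/34 routine; the names `onsets_experiment_a`, `BASIC_INTERVALS_MS`, `DEGREES` and `POSITIONS` are hypothetical, and "serial position p" is assumed to mean the interval between the p-th and (p+1)-th tone bursts, as stated in the text.

```python
from itertools import product

# Values taken from the text; the variable names are illustrative only.
BASIC_INTERVALS_MS = [1500, 1000, 750, 500, 400, 333, 250, 200, 167, 143]
DEGREES = [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16]
POSITIONS = [7, 8, 9]   # interval between burst p and burst p + 1
N_BURSTS = 15


def onsets_experiment_a(interval_ms, degree=0.0, position=None, n_bursts=N_BURSTS):
    """Return burst onset times in ms.

    The interval following burst `position` is scaled by (1 + degree);
    a negative degree shortens it. All other intervals keep the basic length.
    """
    onsets = [0.0]
    for k in range(1, n_bursts):          # k-th interval: burst k -> burst k + 1
        step = interval_ms
        if position is not None and k == position:
            step = interval_ms * (1.0 + degree)
        onsets.append(onsets[-1] + step)
    return onsets


# Every distorted condition: rate x degree x sign (lengthen/shorten) x position.
distorted_conditions = [
    (t, sign * d, p)
    for t, d, sign, p in product(BASIC_INTERVALS_MS, DEGREES, (+1, -1), POSITIONS)
]
# Three undistorted copies of each basic sequence are added to the set.
n_sequences = len(distorted_conditions) + 3 * len(BASIC_INTERVALS_MS)
```

With these assumptions the condition list has 480 entries and `n_sequences` is 510, matching the totals given in the text.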

All of the sequences were prepared by means of PDP 11/34 computer routines as follows. Given three parameters, i.e. a basic time interval which corresponded to a rate of succession of the stimulus sequence, a degree* of distortion, and a serial position where the distortion was placed, a digitized tone was read out repeatedly with the aid of a built-in clock. The synthesized sequence of the digitized tones, which were sampled at 10 kHz, was digital-to-analog converted, and then passed through a low-pass filter. The cutoff frequency of the low-pass filter was 4 kHz. The tone burst sequences were recorded on videotape by a PCM recording system.

*The degree of distortion is defined as ΔT/T, where ΔT is the length of the time interval either added or subtracted, and T is the length of the basic time interval.

(2) Subjects and procedure

Three subjects were employed in the present experiment: two male adults who were skilled in listening tests of synthesized speech, and a female otolaryngologist. All had normal hearing. The subjects participated in a total of 12 sessions, each of which lasted approximately 40 min. There was a rest interval after every one-third of the session. During each session, the subjects sat in a sound-proof room and listened to the test stimuli through a loudspeaker. Tone bursts were set at 75 dB SPL.*

The subjects were required to judge whether there was a distortion in the sequence by writing either "detected" or "not detected." The choice between these two alternatives was forced. A pause of 7 s in duration was placed between each presentation of the sequence for the subjects' responses.

2.1.2 Results

(1) Data analysis

For each of the three subjects, the number of responses "detected" out of a total of 12 trials was recorded. Then the average percentage of judgments of the "detected" distortions for all subjects was plotted on normal probability paper, on the assumption that the function relating the probability that a subject would respond "detected" to the degree of distortion would be a normal ogive. One example of the values is presented in Fig. 2, where each asterisk (*) represents the mean relative frequency of judgments "detected" among 12 trials on the 3 subjects.

The abscissa of Fig. 2 shows the degree of distortion in percentage. The assumption that the relation between response probability and the degree of distortion would be a normal ogive appears tenable because the distributions for all of the different conditions appear to fall reasonably well on a straight line. The line in the figure represents straight-line fits to the data by means of the least-square solution using Müller-Urban weights.

The degree of distortion which afforded 50% judgments of detection (hereafter, D50) for either the lengthened or shortened sequence of each rate of succession was calculated. These values are presented in Fig. 3, where each filled circle (●) and open circle (○) represents the degree of distortion (D50) under the condition of "lengthened" and "shortened," respectively. The abscissa of the figure shows the rate of succession ranging from 0.7 to 7.0 times per second in logarithmic scale.
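The D50 estimate described above (a straight line fitted to the detection proportions on normal probability coordinates) might be sketched as follows. This is a minimal illustration, not the author's original computation: the function name `fit_d50` is hypothetical, and the weights are assumed to take the form φ(z)²/(p(1−p)), which is the form usually given for Müller-Urban weights; the original analysis may have differed in detail. Proportions of exactly 0 or 1 are dropped because their z-scores are infinite.

```python
from statistics import NormalDist


def fit_d50(degrees, proportions):
    """Weighted least-squares line on probit coordinates.

    degrees      -- degrees of distortion (e.g. in percent)
    proportions  -- relative frequency of "detected" at each degree
    Returns (d50, slope): the degree at which the fitted line crosses
    z = 0 (i.e. 50% detection) and the slope of the line in z per unit degree.
    """
    nd = NormalDist()
    pts = [(x, p) for x, p in zip(degrees, proportions) if 0.0 < p < 1.0]
    zs = [nd.inv_cdf(p) for _, p in pts]
    # Assumed Mueller-Urban-style weights: phi(z)^2 / (p * (1 - p)).
    ws = [nd.pdf(z) ** 2 / (p * (1.0 - p)) for z, (_, p) in zip(zs, pts)]

    sw = sum(ws)
    x_bar = sum(w * x for w, (x, _) in zip(ws, pts)) / sw
    z_bar = sum(w * z for w, z in zip(ws, zs)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, (x, _) in zip(ws, pts))
    sxz = sum(w * (x - x_bar) * (z - z_bar) for w, (x, _), z in zip(ws, pts, zs))

    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return -intercept / slope, slope
```

Calling `fit_d50` with the eight degrees of distortion (2 to 16%) and the pooled proportion of "detected" responses at each degree would give one D50 value per rate and distortion type, which is the quantity plotted in Fig. 3.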

Fig. 2 Probability that distortion will be detected as a function of the degree of distortion. The ordinate represents relative frequency of judgments "detected," while the abscissa shows the degree of distortion. One example of the results is presented here for "shortened" sequences of "6 times per second."

The "detected" or "not detected" data were submitted to analyses of variance using the above-mentioned D50s as cell entries, and using either subjects or serial positions of distortion as sampling units. Since almost all analyses yielded similar results, only mean values of the degree of distortion (D50) across subjects (serial positions of distortion as sampling units) are reported below.

The experimental factors in the overall analysis (see Table 1) were rate of succession (0.7 to 7.0 times per second), position of distortion (serial position 7, 8, or 9), type of distortion ("lengthened" or "shortened"), and interactions among these factors.

(2) Results

As Fig. 3 shows, the D50 value of the degree of distortion varied depending upon the rates of succession of the sequences. The results of statistical analyses of data show that the "rate of succession" factor was significant. The interaction between "rate" and "type of distortion" reached significance, although the "type of distortion" factor was not significant.

Since we could readily see from the figure that there was no great difference among the D50 values which were obtained at the lower rates of succession, that there seemed to be some difference among higher rate data, and that the difference in D50 values between lower and higher rate data was significant, the D50 data were divided into two groups in terms of rate of succession and analyses of variance were carried out for the data in each group separately. The following results were obtained by the local analyses of variance. First, the variations of D50 among the lower rates of succession, i.e. 1.0 to 2.5 times per second, were relatively small. (F ratio was just smaller than the significant F(0.05).) There was no significant difference between "lengthened" and "shortened" among these rates of succession. The mean value of D50s among these rates was 6.1%. Second, among the higher rates of succession, i.e. 4.0 to 7.0 times per second, the variations of D50 were smaller too (F ratio was much smaller than F(0.05)), but this time the difference in D50 values between the two types of distortion was significant. The mean values of D50s among these higher rates were 8.9% and 8.2% for "lengthened" and "shortened," respectively.

*The sound pressure level of the test stimulus is expressed in the case of continuously presented 1,000 Hz sound.

Table 1 Analysis of variance for the results obtained in Experiment A, where "rate of succession," "serial position," and "condition" were selected as main factors.

Fig. 3 Degree of distortion which afforded 50% judgment (D50) as a function of the rate of succession. The ordinate contains D50 values, while the abscissa shows the rate of succession. Each filled circle (●) and open circle (○) represents the degree of distortion (D50) under the condition of "lengthened" and "shortened," respectively.

Of particular interest here is a difference between "lengthened" and "shortened" at a rate of 3 times per second of succession. The values of D50 for the different types of distortion differed to a great extent. A more precise experiment will be introduced later in Experiment C.

2.2 Experiment B

2.2.1 Method (stimulus materials, subjects and procedure)

Whereas a temporal interval in the sequence was either lengthened or shortened in Experiment A, a temporal allocation of the tone burst was slightly changed for Experiment B. In other words, when the time interval just preceding a tone burst was lengthened or shortened, the time interval which succeeded this particular tone burst was shortened or lengthened by the same amount in order to keep the whole length of the sequence the same. Figure 4 shows this type of temporal distortion, i.e. lengthening/shortening or shortening/lengthening, of the sequence schematically. The distorted temporal allocation of a tone burst appeared in serial position 7, 8, or 9 in this experiment too. Namely, the temporal allocation of the 7th, 8th, or 9th tone burst was moved either forward or backward by ΔT.

The method for preparing the tone burst sequence, subjects and procedure were the same as those described for Experiment A.

Fig. 4 Examples of distorted temporal allocation in Experiment B. The temporal allocation of the 8th tone burst is moved either backward (middle) or forward (bottom) by ΔT.
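The Experiment B manipulation differs from Experiment A in that a single tone burst is displaced while all others stay on the regular grid, so the total duration is unchanged. A minimal sketch, with the hypothetical function name `onsets_experiment_b` and the assumption that a positive degree delays the burst (preceding interval lengthened, following interval shortened):

```python
def onsets_experiment_b(interval_ms, degree, position, n_bursts=15):
    """Return burst onset times in ms with one burst displaced.

    The burst at serial `position` (1-based) is moved by degree * interval_ms.
    A positive degree delays it ("lengthened/shortened"); a negative degree
    advances it ("shortened/lengthened"). All other bursts stay on the grid.
    """
    onsets = [k * float(interval_ms) for k in range(n_bursts)]
    onsets[position - 1] += degree * interval_ms
    return onsets


def intervals(onsets):
    """Successive onset-to-onset intervals."""
    return [b - a for a, b in zip(onsets, onsets[1:])]
```

For example, `intervals(onsets_experiment_b(250, 0.08, 8))` has a lengthened interval before the 8th burst and an equally shortened one after it, while the last onset stays where it would be in the undistorted sequence.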

2.2.2 Results

The raw data were processed in the same way as described for Experiment A, and the degrees of distortion which afforded 50% judgment of detection (D50) were calculated. These values are presented in Fig. 5, where each filled square (■) and open square (□) represents the D50 value of degree of distortion under the condition of "lengthened/shortened" and "shortened/lengthened," respectively. The abscissa in the figure shows the rate of succession in logarithmic scale.

It was found that the D50 value of degree of distortion varied depending upon the rate of succession in this experiment too. In the analysis of variance (see Table 2), the "rate of succession" factor was also significant here. The difference between "types of distortion" ("lengthened/shortened" vs. "shortened/lengthened") was not significant. Variance for this factor was negligibly small, so that the D50s did not differ between "types of distortion" at all. This result was sound, holding across all rates of succession except for 3.0 times per second. Although variance for this factor was very small, the interaction between rate and type was significant.

Fig. 5 Degree of distortion which afforded 50% judgment (D50) as a function of the rate of succession. The ordinate contains D50 values, while the abscissa shows the rate of succession. Each filled square (■) and open square (□) represents the degree of distortion (D50) under the condition of "lengthened/shortened" and "shortened/lengthened," respectively.

Since we could readily see from the figure that there seemed to be no particular variations among D50 values obtained in the lower regions of rate of succession (i.e. 1.0 to 2.5 times per second), or among those obtained in the higher regions of rate (i.e. 4.0 to 7.0 times per second), the D50 data were divided into two groups in terms of rate of succession and analyses of variance were carried out for the data in each group separately. The local analyses of variance showed the following. The variations of D50 among the lower rates of succession, i.e. 1.0 to 2.5 times per second, and those among the higher rates of succession, i.e. 4.0 to 7.0 times per second, were small. There was no significant difference between types of distortion in either the lower or the higher region of rate of succession. The mean values of D50 were 6.0% and 7.6% for the lower rates and the higher rates of succession, respectively. Almost the same values of D50 for the lower rates of succession were found in both Experiments A and B.

Table 2 Analysis of variance for the results obtained in Experiment B, where "rate of succession," "serial position," and "condition" were selected as main factors.

2.3 Experiment C

Of particular interest in the results of Experiment A is, as mentioned earlier, the difference between the D50 values which were obtained by lengthening and shortening at a rate of 3 times per second.

In addition to Experiment A, the present Experiment C was carried out in order to explore a threshold by which "detected" or "not detected" judgment would be divided into two categories. The stimulus materials were prepared in the same way as described in Experiment A. The intervals of the stimulus sequence were, however, set at 370, 350, 330, 310 and 290 ms this time. The other conditions were exactly the same as in Experiment A.

Fig. 6 Degree of distortion which afforded 50% judgment (D50) as a function of the rate of succession. The ordinate contains D50 values, while the abscissa shows the rate of succession. The figure shows the results obtained in Experiment C, and it additionally carries the same results as those shown in Fig. 3.

The results are shown in Fig. 6. The D50 value for either lengthened or shortened sequences of each rate of succession is represented by a filled circle (●) and open circle (○), respectively. The abscissa of the figure shows the basic rate of succession in logarithmic scale. The figure shows the results of Experiment C, and it additionally carries the same data as those shown in Fig. 3. In the figure, we can see the difference in D50 values between "lengthened" and "shortened" at around the rate of 3 times per second. The lengthened interval might act as the slower rate of succession and the shortened interval vice versa, even if the rest of the intervals were left unchanged. Figure 7 is a redrawn version that shows more clearly what happens in this region. The abscissa of the figure shows, in logarithmic scale, the particular time interval which is either lengthened or shortened by the above-mentioned degree. The ordinate contains D50 values. The figure shows that the time interval of about 330 ms forms the boundary between the two categories.

Fig. 7 Degree of distortion which afforded 50% judgment (D50) as a function of the length of the distorted interval. The ordinate contains D50 values, while the abscissa shows the length of the distorted interval. Each filled circle (●) and open circle (○) represents the degree of distortion (D50) under the condition of "lengthened" and "shortened," respectively.

3. PRODUCTION OF RHYTHM

The problem for experimental investigation here concerns the ability of speakers to produce a rhythmic sequence with as few fluctuations as possible. This time, we therefore wish to know how accurately we can produce a temporal sequence.

3.1 Experiment D

3.1.1 Method

The stimulus signals were uniformly spaced temporal sounds which had a rate of repetition ranging from 1 to 6 times per second. A sequence consisting of 100 tone bursts, each of which was 1 kHz in frequency and 5 ms in duration, was recorded in advance on audio tape to serve as signal stimuli.

Nine subjects participated in the present experiment: seven male and two female adults, between 25 and 45 years old. The subjects were requested to repeat the monosyllable /pa/ at least 50 times in time with the pre-recorded signal stimuli, and their utterances and stimulus signals were recorded on a 2-channel tape recorder simultaneously. The stimulus signals were presented to the subjects in a sound-proof room through headphones at 60 dB SPL.

The recorded utterances and stimulus signals were played back and stored digitally on data files in a laboratory computer. The digital sound waves of the utterances were displayed on a graphic terminal, from which the vowel onset of each utterance /pa/ was determined visually. The time interval between each utterance was thus obtained from the inter-vowel-onset interval, based on which statistical analyses were carried out.

3.1.2 Results

(1) Mean and standard deviation

The mean and standard deviation of the time intervals were calculated from about 40 reproduced utterances for each different rate of succession of stimulus signal.

The mean time interval of the repetition of the temporal sequence, as responses in time with stimulus signals, was synchronous with the interval of the stimulus signals for every rate of succession (Fig. 8). The differences between the mean time intervals and the intervals of stimulus signals were within 2%. In Fig. 8, the ordinate shows the mean time interval of the reproduction, and the abscissa shows the interval of the stimulus signals. Both are indicated in logarithmic scale. Standard deviations were small; namely, coefficients of variation (CV) fell between 3 and 9%, almost independent of the rate of succession of the stimulus sequence (Fig. 9). In Fig. 9, the ordinate and abscissa show the coefficient of variation in percentage and the rates of succession of the stimulus sequence, respectively.

Fig. 8 The relation between the mean time interval of the reproduction of the sequence as response in time with the stimulus sequence (Tres) and the time interval of the stimulus signals (Tsig). The mean values were calculated from among all of 9 subjects.

Fig. 9 The relation between the coefficient of variation and the rate of succession of the stimulus signals. The results of 9 subjects are shown by triangles.
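The summary statistics used here (mean inter-onset interval, its deviation from the stimulus interval, and the coefficient of variation) can be computed from a list of vowel-onset times roughly as follows. The function name `interval_stats` is hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def interval_stats(onsets_ms, stimulus_interval_ms):
    """Summarise reproduced intervals from successive vowel-onset times.

    Returns (mean_interval_ms, cv_percent, deviation_percent), where
    deviation_percent is the relative difference between the mean reproduced
    interval and the stimulus interval.
    """
    ivs = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]
    m = mean(ivs)
    cv_percent = 100.0 * stdev(ivs) / m          # sample SD: an assumption
    deviation_percent = 100.0 * (m - stimulus_interval_ms) / stimulus_interval_ms
    return m, cv_percent, deviation_percent
```

Applied to the roughly 40 onsets measured per subject and rate, the second return value corresponds to the CV plotted in Fig. 9 and the third to the "within 2%" comparison in Fig. 8.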

(2) Autocorrelation in reproduction time intervals

The mean time interval of reproduction of the temporal sequence was clearly maintained to be synchronous with that of the stimulus signal, although each time interval of reproduction deviated to some extent. It is, therefore, plausible that there should be some adjusting mechanism which works to maintain this synchronization. This possibility is suggestive of a negative correlation between the time intervals of reproduction.

To investigate the possibility, an autocorrelation function was introduced. On the assumption that the response time sequence would be a regular stochastic process, an autocorrelation function of the sequence R(j) is defined as follows.

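First, the covariance of the sequence is

$$C(j) = \frac{1}{n-j} \sum_{i=1}^{n-j} (T_i - \bar{T})(T_{i+j} - \bar{T}),$$

where $T_i$ is the i-th time interval, $T_{i+j}$ is the (i+j)-th time interval, $\bar{T}$ is the mean time interval, and n is the total number of time intervals. Of course, when j = 0,

$$C(0) = \frac{1}{n} \sum_{i=1}^{n} (T_i - \bar{T})^2$$

is the variance of the sequence. Then, R(j) is obtained from C(j) which is normalized by the variance C(0):

$$R(j) = \frac{C(j)}{C(0)}.$$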

This parameter represents the correlation between time intervals which are separated by a distance j. Figure 10 shows examples of R(j) for the reproductions of the sequence in subject A, at rates of 2 times per second and 5 times per second, respectively. In the figure, the ordinate contains R(j) values while the abscissa shows j, which is the distance between the time intervals whose correlation is shown by R(j). R(j) values show a damped oscillation pattern in Fig. 10(a). In other words, the absolute values of R(j) decrease as j increases. Of particular interest here, R(1) and R(3) in Fig. 10(a) take negative values, so that the neighboring time intervals have negative correlation. But in Fig. 10(b), we can no longer see either the damped oscillation pattern or the negative correlation in R(j) values. R(j) values appeared randomly around zero. The time intervals come to be independent of each other, and the distribution of time intervals is non-correlative.

Fig. 10 The autocorrelation function R(j) as a function of j, where j is the distance between two time intervals whose relation is shown by R(j). R(j) values are for the reproduction of the time sequence at rates of 2 times per second (a) and 5 times per second (b). Both sets of R(j) values were obtained from the data of subject A.
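A normalized autocorrelation of this kind can be sketched as below. The function name `autocorrelation` is hypothetical, and the lag-j covariance is assumed to be averaged over the n − j available pairs; a 1/n normalisation is equally common and would only rescale the values at nonzero lags.

```python
def autocorrelation(intervals, max_lag):
    """Return [R(0), R(1), ..., R(max_lag)] for a sequence of time intervals.

    R(j) = C(j) / C(0), where C(j) is the mean product of deviations from the
    overall mean for intervals j apart (averaged over n - j pairs: an assumption).
    """
    n = len(intervals)
    t_bar = sum(intervals) / n

    def cov(j):
        pairs = range(n - j)
        return sum((intervals[i] - t_bar) * (intervals[i + j] - t_bar)
                   for i in pairs) / (n - j)

    c0 = cov(0)
    return [cov(j) / c0 for j in range(max_lag + 1)]
```

A sequence in which a long interval is always followed by a compensating short one gives R(1) close to −1, the extreme case of the negative neighbor correlation discussed above; independent intervals give values scattered around zero for every j > 0.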

To clarify the trends, joint histograms are shown in Fig. 11, where the abscissa shows a time interval which is normalized by the mean interval of the sequence (Ti/T) and the ordinate shows the time interval distanced by j, also normalized by the mean (Ti+j/T). Namely, each dot in the figure has horizontal and vertical coordinates of (Ti/T, Ti+j/T).

In the figure, the value of j is limited to the range from 1 to 4, in order to examine the correlations between the neighboring time intervals. When there exist some negative correlations between the neighboring time intervals, most dots fall in the second and fourth quadrants and the joint histogram shows an oval shape which inclines toward the left. The non-correlative distribution of the time intervals, on the contrary, does not result in such an oval shape. The results for subject A when he produced the time sequence at a rate of 2 times per second are, for example, shown in Fig. 11(a), in which we can see the oval shape. The results for the same subject when he produced the time sequence at a rate of 5 times per second are also, for example, shown in Fig. 11(b), in which we find only non-correlative distributions.
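The joint-histogram inspection can be approximated numerically by forming the normalized pairs and counting how many fall in the second and fourth quadrants. This is a sketch under two assumptions not stated in the text: the quadrants are taken relative to the point (1, 1), i.e. the mean interval on both axes, and the function names `lagged_pairs` and `opposite_quadrant_fraction` are hypothetical.

```python
def lagged_pairs(intervals, j):
    """Return the points (T_i / mean, T_{i+j} / mean) plotted in a joint histogram."""
    t_bar = sum(intervals) / len(intervals)
    return [(intervals[i] / t_bar, intervals[i + j] / t_bar)
            for i in range(len(intervals) - j)]


def opposite_quadrant_fraction(pairs):
    """Fraction of points in the 2nd and 4th quadrants around (1, 1).

    Points lying exactly on either axis through (1, 1) are ignored.
    Values well above 0.5 suggest a negative correlation at that lag.
    """
    informative = [(x, y) for x, y in pairs if x != 1.0 and y != 1.0]
    if not informative:
        return 0.0
    opposite = sum(1 for x, y in informative if (x - 1.0) * (y - 1.0) < 0)
    return opposite / len(informative)
```

For uncorrelated intervals the fraction should hover near 0.5 at every lag, whereas a compensating adjustment between neighbors would push it above 0.5 at j = 1.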

Table 3 shows the results of inspection by autocorrelation and the joint histogram studies, where each circle indicates that the negative correlation between the neighboring time intervals seemed to exist. In almost all of the subjects, the negative correlations were found only in the region of the lower rates of reproduction (i.e. 1.0 to 3.0 times per second).

Fig. 11 Joint histograms for the reproduction of the time sequence at rates of 2 times per second (a) and 5 times per second (b). Both figures were obtained from the data of subject A.

Table 3 Results of inspection by means of joint histogram and autocorrelation studies, where each circle indicates that the adjustment mechanism seemed to work.

4. DISCUSSION

In the present experiments the author has attempted to examine the ability of human beings both in perception and in production of the simplest rhythm which has an even tempo. Figures 3, 5 and 6 show the results in terms of the Weber ratio ΔT/T, where ΔT is the increment or decrement necessary to give an average performance of 50% detection when added to a basic duration T. In Experiments A, B, and C, Weber's law has, in a sense, been found to hold approximately for discrimination of the irregularity in the rhythmic sequence, but only in the restricted regions of the rate of succession. Namely, the ability of listeners to report whether there was a distortion in a sequence as a function of the rate of succession did not vary in the regions of from 1 to 2.5 times per second and from 4 to 7 times per second. However, the difference in D50 values in terms of the ratio ΔT/T between these two regions was significant.
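As a rough worked illustration of what these ratios mean in absolute terms, the mean D50 values reported for Experiment A can be converted to milliseconds via ΔT = D50 · T. The choice of 2 and 5 times per second as representative rates is an assumption made here for illustration only; the reported means are averages over each rate region, not values measured at these particular rates.

$$\Delta T = D_{50}\,T$$

$$T = 500\ \text{ms (2 per second)},\ D_{50} = 6.1\%:\quad \Delta T \approx 0.061 \times 500\ \text{ms} = 30.5\ \text{ms}$$

$$T = 200\ \text{ms (5 per second)},\ D_{50} = 8.9\%\ (\text{lengthened}):\quad \Delta T \approx 0.089 \times 200\ \text{ms} = 17.8\ \text{ms}$$

$$T = 200\ \text{ms (5 per second)},\ D_{50} = 8.2\%\ (\text{shortened}):\quad \Delta T \approx 0.082 \times 200\ \text{ms} = 16.4\ \text{ms}$$

So although the relative threshold is higher in the rapid region, the absolute deviation needed for detection is still smaller there than in the slow region.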

The picture emerging from these results, then, is of the subjects' accommodating to the temporal sequences which have different rates of succession ranging from 0.7 to 7.0 times per second. After several sessions, the author asked the subjects how they perceived the distorted part, or in which way they paid attention to the sequence. All answered that, in perception of time sequences of the lower rates (i.e. from 1.0 to 2.5 times per second), they listened to one tone burst at a time and assigned the time orientation1) of the next. Then, they attempted to look for a discrepancy between the actual timing and the predicted one. In perception of sequences of the higher rates of succession (i.e. from 4.0 to 7.0 times per second), on the other hand, they heard a group of tone bursts in which the distorted part might cause an impression of irregularity. In fact, some subjects did write down "detected" when they were only under the impression that there was a "stumbling" or a "slip" in the sequence.

There might be different sorts of processing in perception of temporal sequences, namely, one might be an ongoing or one-by-one processing in the region of lower rates of succession, and another a cumulative or wholistic processing in the region of higher rates. This possibility led to the formulation of a modeling hypothesis, based on the results of both experiments in perception and in production of the temporal sequences.

It is inherent in the concept of rhythm that perception of preceding events in a temporal sequence generates expectancies concerning forthcoming events.10) This mechanism appears to work both in an ongoing processing and in a cumulative processing of the temporal sequence, although it plays a different role in each processing.

4.1 Ongoing Processing

In an ongoing processing, one might first predict the timing of the next coming event, and then check the actual event against the predicted one. The subject then reports "detected" if he finds a discrepancy in this checking process. The whole routine, therefore, appears to be an ongoing processing in the sense that the event is processed one-by-one.
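The predict-then-check routine described above can be sketched as follows. This is only an illustration under stated assumptions: the prediction rule (previous onset plus previous interval), the fixed 6% criterion, and the function name `ongoing_detect` are hypothetical choices made here, and a hard criterion is a simplification of the 50% detection point measured in the experiments.

```python
def ongoing_detect(onsets_ms, threshold=0.06):
    """Return the index of the first onset that departs from its prediction.

    Each onset is predicted as the previous onset plus the previous interval
    (an assumed rule). The relative discrepancy is compared with a fixed
    criterion; None is returned when no onset exceeds it.
    """
    for i in range(2, len(onsets_ms)):
        previous_interval = onsets_ms[i - 1] - onsets_ms[i - 2]
        predicted = onsets_ms[i - 1] + previous_interval
        discrepancy = abs(onsets_ms[i] - predicted) / previous_interval
        if discrepancy > threshold:
            return i  # events are checked one by one, so stop at the first
    return None


# Hypothetical sequences at 2 times per second (T = 500 ms).
regular = [0, 500, 1000, 1500, 2000]
lengthened = [0, 500, 1000, 1550, 2050]  # third interval lengthened by 10%

print(ongoing_detect(regular))
print(ongoing_detect(lengthened))
```

The loop stops at the first discrepancy, which mirrors the idea that only the first distorted interval encountered matters in this mode of listening.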

The most direct evidence for the ongoing processing mechanism is that all types of distortion in both Experiments A and B yielded similar values of D50 when the rate was set at 1.0 to 2.5 times per second. The ongoing processing mechanism should yield similar values of D50 both in Experiments A and B because the listeners were required to detect only the first distorted interval which they encountered in perception of the sequence. In Experiment B, once a lengthened (or shortened) interval was detected, the following shortened (or lengthened) interval could play only a role in confirming the detection for the listeners. In Experiment A, the "lengthening" and "shortening" of the same degree could give rise to the same amount of discrepancy between the actually perceived timing and the predicted one. It was found, therefore, that the "ongoing processing" mechanism can work only in perception of the lower rates of succession (i.e. from 1.0 to 2.5 times per second).

As mentioned earlier in Results of Experiment D, the negative correlation among neighboring time intervals tended to restore the timing of the following utterances to their original regular relationship with the preceding utterances only in reproduction at the lower rates (i.e. 1.0 to 3.0 times per second). The ongoing processing mechanism accounts for the experimental results. In order to readjust a discrepancy between the timing of utterance and that of stimulus signal in an adjacent or neighboring utterance, one has to detect the discrepancy in an "ongoing" manner.
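As a rough illustration of how a negative correlation between neighboring intervals can arise, the sketch below computes the lag-1 autocorrelation for two simulated interval series. Both generators are assumptions introduced here and are not the analysis used in Experiment D: in the first, every utterance is timed against a regular stimulus with independent jitter, so a late onset lengthens one interval and shortens the next; in the second, intervals are drawn independently with no correction.

```python
import random


def lag1_autocorrelation(intervals):
    """Correlation between each interval and the one that follows it."""
    n = len(intervals)
    mean = sum(intervals) / n
    variance = sum((x - mean) ** 2 for x in intervals)
    covariance = sum((intervals[i] - mean) * (intervals[i + 1] - mean)
                     for i in range(n - 1))
    return covariance / variance


def locked_intervals(n, period_ms, jitter_ms, rng):
    """Intervals between onsets that are each timed against a regular stimulus."""
    onsets = [i * period_ms + rng.gauss(0.0, jitter_ms) for i in range(n + 1)]
    return [b - a for a, b in zip(onsets, onsets[1:])]


def free_intervals(n, period_ms, jitter_ms, rng):
    """Intervals drawn independently, with no adjustment between neighbors."""
    return [period_ms + rng.gauss(0.0, jitter_ms) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the simulated data are reproducible

# Hypothetical parameters: 2 per second with adjustment, 5 per second without.
adjusted = locked_intervals(2000, 500.0, 20.0, rng)
unadjusted = free_intervals(2000, 200.0, 10.0, rng)

print(lag1_autocorrelation(adjusted))    # expected to be clearly negative
print(lag1_autocorrelation(unadjusted))  # expected to be close to zero
```

In the first series the theoretical lag-1 value is -0.5, since each timing error enters two successive intervals with opposite signs.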

4.2 Wholistic Processing

In perception of sequences of the higher rates of succession (i.e. from 4.0 to 7.0 times per second), on the other hand, the detection of the distorted interval might require the following two steps. First, it is necessary to postulate a regular temporal pattern, that is, the pattern of time intervals that would yield an even tempo, and then to detect a departure from these regular values in the observed intervals by pattern-matching or a similar routine. The mechanism of detection of the distorted interval, which requires a two-stage analysis with the pattern-matching stage depending upon the preceding postulating stage, bears some resemblance to the analysis-by-synthesis model of speech perception.11) In this model, the determination of the time interval pattern is realized by means of a "wholistic" processing.

An intuitive interpretation for wholistic processing is that one can hardly process the ongoing event in a period of time shorter than 250 ms, although he can predict the timing pattern of the forthcoming events. In wholistic processing, one might first predict the timing pattern of the forthcoming events, but he can hardly check the ongoing event. He perceives and cumulates the successive events, and then attempts to compare the actually perceived pattern with the predicted one. Therefore, the routine appears to be a wholistic processing in the sense that the sequence is processed as a whole.
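The two steps described above, postulating a regular pattern and then matching the accumulated intervals against it, can be sketched as follows. The use of the median interval as the postulated even tempo, the fixed 8% criterion, and the function name `wholistic_detect` are all assumptions made for illustration; the text does not specify how the regular pattern is formed.

```python
from statistics import median


def wholistic_detect(onsets_ms, threshold=0.08):
    """Return True if any interval departs from the postulated regular interval.

    The whole group of onsets is accumulated first. The even-tempo interval is
    then postulated as the median of the observed intervals (an assumed rule),
    and every interval is matched against it.
    """
    intervals = [b - a for a, b in zip(onsets_ms, onsets_ms[1:])]
    regular = median(intervals)
    return any(abs(x - regular) / regular > threshold for x in intervals)


# Hypothetical sequences at 5 times per second (T = 200 ms).
regular_sequence = [0, 200, 400, 600, 800]
distorted_sequence = [0, 200, 400, 620, 820, 1020]  # one interval lengthened by 10%

print(wholistic_detect(regular_sequence))
print(wholistic_detect(distorted_sequence))
```

Unlike a one-by-one check, this routine gives the same answer whatever the position or order of the distorted intervals within the group.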

One piece of evidence for the ongoing processing mechanism was, as mentioned earlier, that both types of distortion, i.e. lengthening and shortening, in Experiment A yielded similar values of D50 among the lower rates of succession. In contrast, a greater degree of distortion was needed in order to afford 50% judgments in the case of lengthening (8.9%) than in the case of shortening (8.2%) among the higher rates in Experiment A. The results of the experiment suggest that the ongoing processing mechanism could hardly work among these rates of succession. The more direct evidence for the wholistic processing mechanism is that both lengthening/shortening and shortening/lengthening in Experiment B yielded similar values of D50 when the rate was set at 4.0 to 7.0 times per second, while the "lengthening" and "shortening" did not yield similar values in Experiment A. The wholistic processing mechanism should yield similar values of D50 both in the "lengthened/shortened" and the "shortened/lengthened" without regard to the order of the distortion.

Table 4 The calculated probabilities that the distortion was detected at the lengthened interval and/or at the shortened interval from the results obtained in Experiment A, when the degree of distortion which afforded 50% judgment in Experiment B was placed in the sequence.

Further evidence for wholistic listening is that different values of D50 were obtained in Experiments A and B. As the D50 values obtained in Experiment B were not far different from those obtained in Experiment A among the lower rates of succession, the succeeding parts of distortion seemed to have little effect on the discrimination among these rates in Experiment B. In contrast, since the D50 values obtained in Experiment B were smaller than those obtained in Experiment A among the higher rates of succession, it could be interpreted that the succeeding parts enhanced the detectability of distortion among these rates in Experiment B. The probabilities that the distortion was detected at the lengthened interval and/or at the shortened interval were calculated from the results obtained in Experiment A, when the degree of distortion which afforded the 50% judgment in Experiment B was placed in the sequence. The probabilities were calculated on the assumption that the detection of the lengthened (or shortened) interval and that of the shortened (or lengthened) interval were independent phenomena, which can occur simultaneously. Let P1 and P2 denote the probabilities of the "lengthened" and the "shortened" intervals being detected in Experiment A, respectively. The probability that neither the "lengthened" nor the "shortened" interval would be detected was given by (1-P1)×(1-P2), and then the probability that the distortion would be detected at the "lengthened" and/or at the "shortened" interval was obtained by 1-(1-P1)×(1-P2) (see Table 4). The calculated probabilities were nearly equal to 0.5 and appeared to match the observed ones. The distortion in Experiment B was, therefore, detected at the lengthened (or shortened) and/or the shortened (or lengthened) intervals. The present results which show the difference between Experiments A and B thus suggest a "wholistic" processing in the higher rates of succession.
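The calculation of the combined probability under the independence assumption can be written out as a short sketch. The values of P1 and P2 used below are hypothetical and are not taken from Table 4; they only show how two individual detection probabilities well below 0.5 can combine to a value near 0.5.

```python
def detection_probability(p_lengthened, p_shortened):
    """Probability of detection at the lengthened and/or the shortened interval.

    Assumes, as in the text, that the two detections are independent events:
    1 - (1 - P1) * (1 - P2).
    """
    return 1.0 - (1.0 - p_lengthened) * (1.0 - p_shortened)


# Hypothetical P1 and P2, chosen for illustration only.
p1, p2 = 0.30, 0.28
print(round(detection_probability(p1, p2), 3))  # 0.496
```

The combined probability is never smaller than the larger of P1 and P2, which is consistent with the smaller D50 values found in Experiment B at the higher rates.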

As mentioned earlier in Results of Experiment D, the negative correlation among neighboring time intervals disappeared in reproduction at the higher rates of succession (i.e. 4.0 to 6.0 times per second). Although the rate of reproduction was fixed at almost the same as that of the signal stimuli, the detection of a time-lead or a time-lag by means of ongoing processing was no longer found. However, there seemed to be another kind of adjusting mechanism which contributed toward the synchronization between the reproduction and the signal stimuli, since the wholistic processing mechanism could analyze a mismatch of the pattern in due course.

The discussion so far makes it clear that different processing mechanisms govern the timing-related controls at the lower and higher rates of succession both in perception and in production of a rhythmic sequence. However, the measurements of D50 value at a rate of 3 times per second fell into two categories, namely, the D50 for the "lengthened" sequence showed a value which was similar to those obtained in the lower rate group, and the D50 for the "shortened" sequence showed the value of the higher rate group. The more precise experiment in this region, i.e. Experiment C, showed that the time interval of 330 ms formed the boundary between these two categories.

The results of other experiments in the literature with different methods and purposes are consistent with those of the present experiments. In perception of nearly identical temporal spacing, the prediction-comparison routine can be interpreted as a "rehearsal" of the sequence. In reproduction of the temporal sequence, on the other hand, the reproduction in time with the stimulus sequence can be interpreted as a "shadowing" of the sequence. Various authors have reported that we are able to rehearse or shadow the time sequence up to 3 to 6 times per second.12) In terms of time intervals, every "rehearsal" or "shadowing" requires about 170 to 330 ms, which would be interpreted as the time needed in order to read out an output from the short-term memory and then to rewrite it into the same memory for the rehearsal routine, and as that needed in order to read out an output from the short-term memory and then to effect an operation for the shadowing routine.
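The conversion from a rate of rehearsal or shadowing to the time available per item is simple arithmetic, shown here as a minimal sketch. The function name `interval_ms` is an assumption introduced for illustration.

```python
def interval_ms(rate_per_second):
    """Time available per item, in milliseconds, at a given rate of succession."""
    return 1000.0 / rate_per_second


# The limits of 3 and 6 times per second quoted in the text.
for rate in (3, 6):
    print(f"{rate} times per second -> {interval_ms(rate):.0f} ms per item")
```

The results, about 333 ms and 167 ms, correspond to the range of about 170 to 330 ms given above, and the upper value is close to the 330 ms boundary found in Experiment C.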

5. CONCLUDING REMARKS

There were three specific aims in the present study. They were:

1) to know how greatly a temporal distortion must intervene in uniformly spaced members of sounds in order for a listener to be able to report an irregularity,

2) to know how accurately one can produce a regular temporal sequence, and

3) to explore an identical timing-related threshold which would be common both in perception and production of rhythm.

First, as concerns 1), it was found that the degree of distortion which afforded 50% judgments of detection varied depending upon the rate of succession of the sequence. Weber's law was, however, found to hold approximately for discrimination of irregularity in sequences in the regions of the lower and higher rates of succession. In Experiment A, the mean value of D50s among the lower rates of succession, i.e. 1.0 to 2.5 times per second, was 6.1%. The mean values were 8.9% and 8.2% for the "lengthened" and the "shortened," respectively, in the regions of the higher rates of succession (i.e. 4.0 to 7.0 times per second). In Experiment B, the mean values were 6.0% and 7.6% in the regions of the lower and higher rates of succession, respectively. We may, at this stage, assume that there are two types of processing mechanisms which govern the perception of the temporal sequence.

Second, as concerns 2), it was found in Experiment D that it is possible to reproduce the regular temporal sequence synchronously with the stimulus signals. The variabilities of the reproduced time intervals were found to be 3 to 9% of their mean intervals, almost independent of the rate of succession of the stimulus sequence. The autocorrelation study showed that an adjusting mechanism worked only in the region of the lower rates of succession (i.e. 1.0 to 3.0 times per second). Here again, we may assume that there are two types of processing mechanism which govern the perception of the temporal sequence; one allows the adjusting mechanism to work in reproduction of the sequence and the other does not.

Third, as concerns 3), it was found in Experiment C that a time interval of about 330 ms was a threshold which distinguished one region from another.

From the results of the present study, we may tentatively conclude that the ongoing processing mechanism works in the region of rates slower than 3 times per second and the wholistic processing mechanism works, on the contrary, in the region of rates more rapid than 3 times per second.

The present study has revealed some characteristics of man's ability in perception and in reproduction of the temporal sequence which have not been explicitly shown so far.

REFERENCES

1) P. Fraisse, The Psychology of Time (Harper and Row, New York, 1963).
2) H. Woodrow, "Time Perception," Handbook of Experimental Psychology, ed. by S. S. Stevens (Wiley, New York, 1951), pp. 1224-1236.
3) G. D. Allen, "Speech rhythm: Its relation to performance universals and articulatory timing," J. Phonetics 3, 75-86 (1975).
4) I. Abe, "English sentence rhythm and synchronism," Bull. Phonetic Soc. Jpn. 125, 9-11 (1967).
5) G. D. Allen, "The location of rhythmic stress beats in English speech, Part I and II," Language Speech 15, 72-100, 179-195 (1972).
6) A. Malecot, R. Johnston, and P. A. Kizziar, "Syllabic rate and utterance length in French," Phonetica 26, 235-251 (1972).
7) J. Michon, "Timing in temporal tracking," Inst. for Perception, Soesterberg, the Netherlands (1967).
8) R. Teranishi, "Some features of synchronized shadowing to rhythmical acoustic sequence," Reports of the 1979 Autumn Meeting of the Acoust. Soc. Jpn., 595-596 (1979).
9) S. Hibi, "On the rhythm patterns of repetitive utterances," Ann. Bull. RILP 14, 85-90 (1980).
10) J. G. Martin, "Rhythmic (hierarchical) versus serial structure in speech and other behavior," Psychol. Rev. 79, 487-507 (1972).
11) M. Halle and K. N. Stevens, "Speech recognition: A model and a program for research," The Structure of Language: Readings in the Philosophy of Language, ed. by J. A. Fodor and J. J. Katz (Prentice Hall, Englewood Cliffs, N. J., 1964).
12) D. A. Norman, Human Information Processing (Academic Press, New York, 1977).
