University of Iowa
Iowa Research Online
Theses and Dissertations
2010
Musical time and information theory entropy
Sarah Elizabeth Culpepper
University of Iowa
Copyright 2010 Sarah Elizabeth Culpepper
This dissertation is available at Iowa Research Online: http://ir.uiowa.edu/etd/659
Follow this and additional works at: http://ir.uiowa.edu/etd
Part of the Music Commons
Recommended Citation
Culpepper, Sarah Elizabeth. "Musical time and information theory entropy." MA (Master of Arts) thesis, University of Iowa, 2010. http://ir.uiowa.edu/etd/659.
MUSICAL TIME AND INFORMATION THEORY ENTROPY
by
Sarah Elizabeth Culpepper
A thesis submitted in partial fulfillment of the requirements for the Master of
Arts degree in Music in the Graduate College of
The University of Iowa
July 2010
Thesis Supervisor: Assistant Professor Robert C. Cook
Graduate College The University of Iowa
Iowa City, Iowa
CERTIFICATE OF APPROVAL
_______________________
MASTER'S THESIS
_______________
This is to certify that the Master's thesis of
Sarah Elizabeth Culpepper
has been approved by the Examining Committee for the thesis requirement for the Master of Arts degree in Music at the July 2010 graduation.
Thesis Committee: ___________________________________ Robert C. Cook, Thesis Supervisor
___________________________________
Nicole Biamonte
___________________________________
Jerry Cain
Is their wish so unique
To anthropomorphize the inanimate
With a love that masquerades as pure technique?

Donald Justice, "Nostalgia of the Lakefronts"
TABLE OF CONTENTS
LIST OF TABLES ............................................................................................................. iv
LIST OF FIGURES ........................................................................................................... vi
CHAPTER
I. INTRODUCTION ............................................................................................1
II. INFORMATION THEORY ENTROPY ..........................................................6
III. EXISTING MUSIC-THEORETIC SCHOLARSHIP ON INFORMATION THEORY ENTROPY ........................................................21
IV. ALPHABETS FOR ENTROPY-BASED ANALYSIS ..................................36
Interval Entropy ..............................................................................................36
CSEG Entropy ................................................................................................46
PC-Set Entropy ...............................................................................................61
V. INFORMATION AND TIME ........................................................................67
VI. ANALYSES ...................................................................................................80
Op. 16, no. 1: Christus factus est ................................................................80
Op. 5, no. 4 .....................................................................................................97
VII. CONCLUSION.............................................................................................115
BIBLIOGRAPHY ............................................................................................................117
LIST OF TABLES
Table
3.1. Pitch entropies from Youngblood .............................................................................21
4.1. Pitch entropies in Webern works, compared with Babbitt and Schubert .................36
4.2. Interval class entropies comparing serial and non-serial works ...............................38
4.3. Vertical and horizontal entropy on one serial and one non-serial work ...................39
4.4. Registrally-ordered interval class entropy in Webern and Babbitt ...........................40
4.5. Ordered directional interval class entropy in serial and non-serial works ................42
4.6. Interval entropy in Webern and Babbitt ...................................................................44
4.7. CSEG entropies for random and motivic strings ......................................................49
4.8. CSEG entropies, random string versus Webern, op. 5, no. 1 ...................................51
4.9. CSEGs, random string versus Webern, op. 5, no. 1 .................................................52
4.10. CSEG entropies, op. 5, no. 1, versus op. 18 .............................................................54
4.11. CSEG entropies, op. 5, no. 1, versus op. 15 .............................................................56
4.12. CSEG entropies for serial works ..............................................................................60
4.13. Pc-set entropies in op. 16 and op. 25 using discrete segmentation algorithm ..........62
4.14. Pc-set entropies in op. 16 and op. 25 using window algorithm ................................64
4.15. Vertical pc-set entropy in op. 16 and op. 25, no. 1 ...................................................65
6.1. Pitch class entropy in op. 16, no. 1 ...........................................................................87
6.2. Pitch class entropy in the vocal line of op. 16, no. 1. ...............................................88
6.3. CSEG entropies in op. 16, no. 1 ...............................................................................89
6.4. Interval class entropy in op. 16, no. 1 .......................................................................91
6.5. Discrete pc-set entropies in op. 16, no. 1 ..................................................................91
6.6. Pitch entropy in sections of op. 5, no. 4 ..................................................................104
6.7. Interval class entropies in op. 5, no. 4 ....................................................................105
6.8. Discrete pc-set entropies in op. 5, no. 4 ..................................................................106
6.9. Pitch entropy in op. 5, no. 4, A and B.....................................................................108
6.10. Interval entropies in op. 5, no. 4, A and B ..............................................................109
LIST OF FIGURES
Figure
2.1. A corrupted tonal work .........................................................................................16
2.2. A corrupted contextual work .................................................................................16
3.1. 95% confidence intervals for Youngblood's entropy calculations ...........................23
3.2. Passage with pitch class entropy 2.52 .......................................................................28
3.3. Passage with pitch class entropy 2.52 .......................................................................28
4.1. Interval class entropies comparing serial and non-serial works ...............................38
4.2. Registrally-ordered interval class entropy in Webern and Babbitt ...........................41
4.3. Ordered directional interval class entropy in serial and non-serial works ...............42
4.4. Interval entropy in Webern and Babbitt ...................................................................44
4.5. A randomly-generated string of pitches....................................................................48
4.6. A motivic string of pitches........................................................................................49
4.7. CSEG entropies for random and motivic strings ......................................................50
4.8. CSEG entropies, random string versus Webern, op. 5, no. 1 ...................................51
4.9. CSEGs, random string versus Webern, op. 5, no. 1 .................................................53
4.10. CSEG entropies, op. 5, no. 1, versus op. 18 .............................................................55
4.11. CSEG entropies, op. 5, no. 1, versus op. 15 .............................................................56
4.12. Melody generated using the CSEG distribution of Webern, op. 5, no. 1 .................57
4.13. Melody generated using the CSEG distribution of a string of random pitches. .......58
4.14. CSEG entropies for serial works ..............................................................................60
4.15. Op. 27, no. 1, mm. 20-21 ..........................................................................................61
4.16. Pc-set entropies in op. 16 and op. 25 using discrete segmentation algorithm ..........63
4.17. Pc-set entropies in op. 16 and op. 25 using window algorithm ................................64
4.18. Vertical pc-set entropy in op. 16 and op. 25, no. 1 ...................................................65
6.1. Vertical ic1s in op. 16, no. 1 ....................................................................................83
6.2. Pitch class entropy in op. 16, no. 1 ...........................................................................88
6.3. Pitch class entropy in the vocal line of op. 16, no. 1 ................................................89
6.4. CSEG entropies in op. 16, no. 1 ...............................................................................90
6.5. Interval class entropy in op. 16, no. 1 .......................................................................91
6.6. Discrete pc-set entropies in op. 16, no. 1 ..................................................................92
6.7. Lewin's depiction of the three flyaway motives ........................................................98
6.8. Pc-set analysis of op. 5, no. 4 .................................................................................100
6.9. Clampitt's analysis of op. 5, no. 4, mm. 1-6 ...........................................................101
6.10. Pitch entropy in op. 5, no. 4 ....................................................................................104
6.11. Interval class entropy in op. 5, no. 4. ......................................................................105
6.12. Registrally-ordered interval class entropy in op. 5, no. 4 .......................................106
6.13. Discrete pc-set entropies in op. 5, no. 4 ..................................................................107
6.14. Pitch entropy in op. 5, no. 4, A and B.....................................................................108
6.15. Interval class entropy in op. 5, no. 4, A and B........................................................109
6.16. Registrally-ordered interval class entropy in op. 5, no. 4, A and B ........................110
CHAPTER I
INTRODUCTION
In the conclusion of The Time of Music (1988), Jonathan Kramer gives two anecdotes of his personal experiences with what he calls musical timelessness. The first
recalls a performance of the middle movement of Satie's Pages mystiques, a collection of
phrases repeated 840 times in succession:
For a brief time I felt myself getting bored, becoming imprisoned by a hopelessly repetitious piece. Time was getting slower and slower, threatening to stop. But then I found myself moving into a different listening mode. I was entering the vertical time of the piece. My present
expanded, as I forgot about the music's past and future.... After what seemed forty minutes I left. My watch told me that I had listened for three hours. I felt exhilarated, refreshed, renewed.1
The second anecdote concerns the opposite condition, a happening dense enough to
induce sensory overload:
The production began at 7:00 p.m. The noise level was consistently high, and the visual panorama was dizzying. I found myself, although performing, focusing my attention on one layer, then another, and then various combinations of layers.... After what seemed to be a couple of hours, everyone spontaneously agreed that it was time to stop... I loaded my tape and slides into my car. Only then did I glance at my watch. It was not yet 8:00! What had seemed like a two-hour performance must have lasted under 25 minutes by the clock.2
Kramer attributes the disparity between these temporal experiences to the amount and
density of information each performance contained. Music that is predictable and easily
chunked, he argues, takes up less mental storage space and seems shorter than music
that is less predictable: "Thus a two-minute pop tune will probably seem shorter than a two-minute Webern movement."3

1 Jonathan Kramer, The Time of Music (New York: Schirmer, 1988): 379.

2 Ibid., 380.
The connection between musical predictability and perception of musical time is a
common one. Kramer characterizes musical temporalities as directed, multiply-directed,
and non-directed based on their movement towards a predictable goal.4 Re-ordered
temporal progressions, such as the misplaced closing gestures Levy finds in Haydn and
the evolving themes Hatten finds in Beethoven, draw power from their violation of
listener expectations.5 Although complicating factors abound (the audience's familiarity
with a musical idiom; the tendency to disengage from overly predictable works; how
comfortable the chairs are), the existence of a connection between time and predictability
is clear.
This thesis examines the relationship between time and predictability through the
lens of information theory entropy. Just as traditional entropy speaks to the degree of
randomness in a system, information theory entropy speaks to the randomness of a
message or, alternately, to that message's predictability. Although information theory
entropy was initially developed to determine the most efficient way to encode a message
for radio transmission, it has since been adopted as an analytical tool by a variety of
fields, including linguistics, literary criticism, and music theory.
In particular, information theory entropy seems relevant to Webern's music.
Adorno refers to Webern's work as possessing a skeletal simplicity, a comparative
economy of musical materials that seems well-suited for analysis in terms of information
3 Ibid., 337.
4 Ibid., 16ff.
5 Janet Levy, "Gesture, Form, and Syntax in Haydn's Music," in Haydn Studies: Proceedings of the International Haydn Conference (New York: Norton, 1981), 355-362; Robert Hatten, "The Troping of Temporality in Music," in Approaches to Meaning in Music, ed. Byron Almén and Edward Pearsall (Bloomington: Indiana University Press, 2006), 66ff.
theory in the sense that no pitch or gesture seems superfluous or reducible, as though its
omission would not have a marked effect on the passage, or as though it had only been
added to fill space before the beginning of the next phrase.6 (In Adornos words: Every single note in Webern fairly crackles with meaning.7) Literary applications of information theory entropy speak meaningfully to this economy as a feature of poetry, as
will be shown in a later chapter; I believe entropy can speak to these same qualities in
Webern's work.
Webern's music is also of interest to this project because of the relationship between information content and the listener's perception of time, as will be discussed in
chapter 5. Certainly perception of time is salient to analysis of Webern's work. As
Stockhausen writes, "If we realise, at the end of a piece of music... that we have 'lost all
sense of time', then we have in fact been experiencing time most strongly. This is how we
always react to Webern's music."8 In a different vein, Ligeti describes Webern's music as
"the spatialization of time."9 Perception and analysis of time in Webern is, at the very least,
complicated, but entropy provides a useful metaphor for its description and a useful tool
for its examination.
In the 2009 article "Number Fetishism," Vanessa Hawes criticizes music-theoretic
use of information theory as... well, as number fetishism: as a component of the claim
6 Theodor Adorno, "The Aging of the New Music," in Essays on Music, ed. Richard Leppert, trans. Susan Gillespie (Berkeley, Los Angeles: University of California Press, 2002), 187.

7 Theodor Adorno, Quasi una Fantasia: Essays on Modern Music, trans. Rodney Livingstone (New York: Verso, 1998), 180.

8 Karlheinz Stockhausen, "Structure and Experiential Time," Die Reihe 2 (Bryn Mawr, PA: Presser, 1959), 65.

9 György Ligeti, "Metamorphoses of Musical Form," Die Reihe 7 (Bryn Mawr, PA: Presser, 1965), 16.
that music theorists can consider themselves scientists who refute or uphold hypotheses
based on empirical evidence, a notion she depicts as quaint and outdated.10 Indeed, early
uses of information theory often relied upon questionable assumptions, as Hessert (1971) claims, and were often divorced from diachronic perception of music.11 Nevertheless,
insofar as information theory entropy measures predictability a very salient factor in
diachronic perception of music it can be a relevant lens for the examination of musical
time.
Using information theory to quantify subjective musical temporality would be questionable indeed, but using information theory to analyze and discuss temporality
seems much less problematic. Writing about traditional entropy, Eddington clarifies the
situation:
Suppose that we were asked to arrange the following in two categories: distance, mass, electric force, entropy, beauty, melody....
I think there are the strongest grounds for placing entropy alongside beauty and melody, and not with the first three. Entropy is only found when the parts are viewed in association, and it is by viewing or hearing the parts in association that beauty and melody are discerned. All three are features of arrangement. It is a pregnant thought that one of these three associates should be able to figure as a commonplace quantity of science. The reason why this stranger can pass itself off among the aborigines of the physical world is that it is able to speak their language, viz., the language of arithmetic.12
Entropy is discussed in terms of number but is not the fetishism of number; rather, it is a
10 Vanessa Hawes, "Number Fetishism: The History of the Use of Information Theory as a Tool for Musical Analysis," in Music's Intellectual History, ed. Zdravko Blazekovic and Barbara Dobbs Mackenzie (New York: RILM, 2009), esp. 836-838.

11 Norman Hessert, "The Use of Information Theory in Musical Analysis" (Ph.D. diss., Indiana University, 1971).

12 A. Eddington, The Nature of the Physical World (Ann Arbor: University of Michigan Press, 1935), 105.
powerful and elegant principle that can be expressed quantitatively. Similarly,
information theory entropy need not be a formula divorced from musical experience, but
can instead be an analytical tool and metaphor for the discussion of something deeply
experiential and even, as Meyer (1957) claims, a way to approach the question of musical meaning.13
This thesis begins with an explication of information theory entropy (chapter 2) and a history of its use in music theory (chapter 3). In chapter 4, a variety of alternative approaches to entropy are developed, including entropy calculations based on CSEGs and
pc-sets (as opposed to single pitch classes). Chapter 5 makes a more in-depth argument for the relationship between information theory entropy and time, recasting analyses of
temporality in Webern in terms of entropy. Finally, in chapter 6, information theory
entropy will be used to analyze time in the first of the Fünf Canons, op. 16, and the fourth of the Fünf Sätze, op. 5, two movements in which form is created by perceptible shifts among differing depictions of temporality, shifts prompted by varying degrees of
predictability in a variety of musical domains.
13 Leonard Meyer, "Meaning in Music and Information Theory," Journal of Aesthetics and Art Criticism 15, no. 4 (1957): 412-424.
CHAPTER II
INFORMATION THEORY ENTROPY
Information theory entropy is based on the idea that in most alphabets, some
letters communicate more information than other letters do, because they occur less
frequently. If a word has been corrupted during transmission and all that remains is q _ _
_ k, the recipient can easily guess what the original word was, since there are very few
words that contain both a q and a k. By contrast, if all that remains of the word is _ _ i c
_, the original word is much more difficult to guess. Since q and k are uncommon, they
communicate more information about the original message than more common letters
can.14
In general, the more unequal the frequencies of letters in an alphabet are, the
easier it is to determine what letters have been corrupted. If an alphabet only has two
letters, A and B, but the former occurs 90% of the time and the latter occurs 10% of the
time, the message recipient has an excellent chance of guessing any letters that have been
corrupted (since there is a 90% chance any given letter will be an A). By contrast, if A and B each appear 50% of the time, our ability to guess a missing letter is diminished.
From the perspective of a person sending a telegram, the former language is very
inefficient. Assume, for simplicity, that any message in this language must contain exactly
90% As and 10% Bs (although in a real language, these would be averages). If the
14 Some more in-depth sources on information theory entropy:
A. Khinchin, Mathematical Foundations of Information Theory (New York: Dover, 1957); Abraham Moles, Information Theory and Esthetic Perception, trans. Joel Cohen (Urbana and London: University of Illinois Press, 1966); Lawrence Rosenfield, Aristotle and Information Theory (Paris, The Hague: Mouton, 1971); Claude Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal 27 (1948): 379-423; Claude Shannon and Warren Weaver, The Mathematical Theory of Communication (Urbana: University of Illinois Press, 1949). Information in this chapter is drawn heavily from these sources, as well as from the music-theoretic sources cited in Chapter 3.
transmitter is limited to ten characters, there are exactly ten words s/he can send:
AAAAAAAAAB, AAAAAAAABA, AAAAAAABAA, and so on. The letter A is so
common that it is practically meaningless; only the position of the less common letter
differentiates between words, but it occurs very rarely. By contrast, in a language that is
50% A and 50% B, the transmitter would have 2^10 or 1024 word choices. By creating
an alphabet in which all letters occur with the same frequency, the efficiency of
transmission is maximized.
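The two word counts above can be verified directly (a short Python sketch, not part of the original text):

```python
import math

# Language 1: a ten-character message must contain exactly nine As and
# one B, so the only freedom is where the single B goes: C(10, 1) words.
print(math.comb(10, 1))  # 10

# Language 2: A and B are unconstrained, so each of the ten positions is
# a free binary choice: 2**10 words.
print(2 ** 10)  # 1024
```

In information terms, a ten-character message in the second language distinguishes log2(1024) = 10 bits' worth of alternatives, against only log2(10) ≈ 3.3 bits in the first.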
Of course, in addition to being more efficient, the latter alphabet is less resistant
to corruption. Ideally, one must find a balance between the most efficient language
possible and the most robust language possible, to be sure that the message arrives at its
recipient intact but without wasting time or resources during transmission. Finding this
balance (generally for the purpose of data compression or encoding) was one of the
first goals of the field of information theory, pioneered by Bell Labs engineer Claude
Shannon in the late 1940s.
The inequality of the amount of information contributed by each letter in an
alphabet is called the Shannon entropy of that alphabet. If Shannon entropy is low, the
language is inefficient but robust; a few letters occur very frequently and the rest are
uncommon. If Shannon entropy is high, the language is efficient; each letter occurs with
roughly the same frequency and therefore each letter conveys the most information
possible.
The Shannon entropy of a message or an alphabet is given by the following formula:

H = -Σ p(x) log2 p(x)

where the sum is taken over every letter x of the alphabet. Here, p(x) is the probability that a given event occurs; p(x=6) denotes the probability that
a randomly selected pitch will be an F#, for example.
The example of bits illustrates the purpose of the logarithm in this formula. Each
bit presents two choices; given six bits, the number of combinations that can be
communicated is two to the sixth power. The entropy formula can be seen as taking the
number of possible choices (here, expressed in terms of probability) and returning the number of bits that would be necessary to communicate that much information.15
(Log base two is necessary to express these results in terms of bits. Another log base would create meaningful data if used consistently, but these data would be in terms of
other units of measurement.) Effectively, the use of logarithms in this formula ensures that the highest entropy is created when each possible outcome has an equal probability
of occurring, and that the lowest entropy is created when one event has a very high
probability of occurring. Consider an alphabet that has one letter, A, that occurs 100% of
the time. The entropy for this language is

H = -(1.0 × log2 1.0) = 0

that is, since we are absolutely certain every letter will be an A, the language has an
uncertainty of zero and an entropy of zero. The closer any probability gets to 1, the
smaller the language's entropy becomes. For example, if this language had three letters
instead, in which A occurred 98% of the time, and B and C each occurred 1% of the time,
the entropy of the language would be

H = -(0.98 log2 0.98 + 0.01 log2 0.01 + 0.01 log2 0.01) ≈ 0.16

15 See Khinchin or Shannon and Weaver for more information.

The logarithmic expression makes the contribution of the first term very small, whereas
the small probabilities make the contributions of the second and third terms very small as
well. By contrast, if each option occurs with roughly equal frequency, the entropy of the
language is

H = -(3 × (1/3) × log2 (1/3)) = log2 3 ≈ 1.58

which is the highest possible entropy for an alphabet with three letters. Of course, the
more letters in an alphabet, the higher the maximal entropy becomes. If this same
equally-weighted alphabet had eight letters, its entropy would be

H = -(8 × (1/8) × log2 (1/8)) = log2 8 = 3
An alphabet with twenty-six letters has a maximal entropy of 4.7; an alphabet with a
hundred letters has a maximal entropy of 6.64.
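The worked examples above can be reproduced with a few lines of code (a Python sketch; the entropy function here is mine, written to match the formula as described, not code from the thesis):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: the sum of -p * log2(p), skipping zero terms."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

# One letter occurring 100% of the time: no uncertainty, entropy 0.
print(round(entropy([1.0]), 2))               # 0.0

# Three letters at 98% / 1% / 1%: entropy close to zero.
print(round(entropy([0.98, 0.01, 0.01]), 2))  # 0.16

# Equal frequencies maximize entropy at log2 of the alphabet size.
print(round(entropy([1/3] * 3), 2))           # 1.58
print(round(entropy([1/8] * 8), 2))           # 3.0
print(round(entropy([1/26] * 26), 2))         # 4.7
print(round(entropy([1/100] * 100), 2))       # 6.64
```

Swapping log2 for another base would scale every figure consistently, as the parenthetical above notes; base two is used so that the results are in bits.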
It is clear from these examples that entropy is most useful for comparisons. The
claim that an alphabet with a hundred letters has a maximal entropy of 6.64 is not terribly
meaningful on its own; it only takes on meaning when paired with the statement that an
alphabet with three letters has a maximal entropy of 1.58, or with other entropy
calculations from hundred-letter alphabets.
To allow more meaningful comparisons between entropies of alphabets with
different cardinalities, we introduce the concept of relative entropy, which expresses
entropy values (as computed above) as a percentage of the maximal possible entropy for an alphabet of that cardinality. For example, the relative entropies of the cardinality three
and cardinality eight alphabets discussed above are

1.58 / 1.58 = 1.00

and

3.00 / 3.00 = 1.00

respectively. Thus, we can think of these two alphabets as having equivalent entropies,
even if their absolute entropies are not equal.
Relative entropy also allows entropy calculations to reflect unused letters in a
passage. Intuitively, a passage of English text that uses only thirteen letters should not
have the same entropy as a passage of Hawaiian. One imagines the former would seem
more stilted, more restricted than the latter, since a listener would hear it in the context of
a twenty-six letter alphabet, rather than a thirteen-letter alphabet. Similarly, a piece that
only uses the pitches C, C#, Eb, G#, A, A#, and B with given frequencies is very different
from a passage of chant that uses each of its seven tones with the same frequencies as the
above piece. While it is likely that the former piece will be heard as using a restricted
subset of a twelve-pitch alphabet, the latter piece exhausts its alphabet and would not be
heard as restricted in its materials in the same way as the former. The traditional entropy
formula is unable to reflect this distinction, because any unused letters carry with them a
probability of 0, effectively canceling out any entropic contribution from those letters, but
these unused letters are relevant to the computation of relative entropy through their
inclusion in the maximal entropy for an alphabet of a given cardinality.
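As a sketch of this idea (Python, with function names of my own choosing), relative entropy divides the absolute entropy by the maximum for the assumed alphabet; the chant-versus-chromatic-subset example then looks like this:

```python
import math

def entropy(probs):
    """Shannon entropy in bits over a probability distribution."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def relative_entropy(probs, alphabet_size):
    """Entropy as a fraction of the maximum for an alphabet of this size.

    Unused letters contribute nothing to the absolute entropy (their
    probability is 0), but they raise the maximum, log2(alphabet_size),
    and so lower the ratio.
    """
    return entropy(probs) / math.log2(alphabet_size)

# Seven pitches used with equal frequency, heard against two alphabets:
seven_equal = [1/7] * 7

# ...as a chant exhausting its own seven-tone alphabet:
print(round(relative_entropy(seven_equal, 7), 2))   # 1.0

# ...as a restricted subset of the twelve-tone chromatic alphabet:
print(round(relative_entropy(seven_equal, 12), 2))  # 0.78
```

The same distribution yields a markedly lower relative entropy when heard against the twelve-tone alphabet, capturing the intuition that the chromatic piece sounds restricted while the chant does not.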
Nevertheless, the use of relative entropy requires caution. A piece of music that
uses three pitches with equal frequency is much more predictable, mathematically and
aurally, than a piece of music that uses twelve pitches with equal frequencies, even
though their relative entropies are equal. In other words, although relative entropy
allows for comparison between alphabets of different cardinalities, such a comparison
must always be considered alongside the alphabets' respective absolute entropies. In this
paper, relative entropy will only be invoked in the presence of corresponding absolute
entropy figures or some sort of intuitive justification for hearing these alphabets as perceptually similar.
It is also clear from these examples that the entropy of a message (at least,
entropy computed on the literal letters of an alphabet) is independent of the meaning of
that message. Entropy only reflects characteristics of the language in which that message
is written or encoded. However, the meaning of a message may become relevant if the
'alphabet' in question is not a literal alphabet. For years, literary critics (especially
modernists and post-modernists, and in particular those interested in the work of Thomas
Pynchon) have completed information theory analyses of texts using words or images,
instead of literal letters, as the letters of an alphabet. In this case, the most commonly
occurring letters are connective words like articles and prepositions. Consider, for
example, a corrupted block of text from F. Scott Fitzgerald's The Great Gatsby, from
which every eighth word has been removed:
When we pulled out into the winter and the real snow, our snow, began stretch out beside us and twinkle against windows, and the dim lights of small Wisconsin echoed by, a sharp wild brace came into the air. We drew in deep of it as we walked back from through the cold vestibules,
unutterably aware of identity with this country for one hour before we melted indistinguishably into it again.
Although the result is disjointed in places, it is certainly still intelligible; in places the reader cannot tell the message has been corrupted at all. Every image found in this
excerpt is repeated; if the word 'winter' were corrupted, 'snow' and 'cold' would still
convey its meaning. Additionally, the passage contains many connective words and
when these words are missing (as in "our snow began stretch out beside us"), the blanks can be filled in easily. We conclude that this passage has low word-based entropy,
regardless of any entropy figures computed on the basis of individual letter frequency.
For comparison, an excerpt from Flann O'Brien's At Swim-Two-Birds (considered the first Irish post-modern novel) has been similarly corrupted below.
I will relate, said Finn. Till a man accomplished twelve books of poetry, the same is not for want of poetry but is forced away. man is taken till a black hole is in the world to the depth of his oxters and he put into it to gaze it with his lonely head and nothing to but his shield and a stick of. Then must nine warriors fly their at him, one with the other and together.
From this, we can gather that we are listening to a narrator named Finn; word repetition
clues us in that poetry and war are somehow involved, but there is little else we can say
about this passage. This same lack of repetition makes the original, non-corrupted
passage more difficult to understand than the non-corrupted Fitzgerald.
I will relate, said Finn. Till a man has accomplished twelve books of poetry, the same is not taken for want of poetry but is forced away. No man is taken till a black hole is hollowed in the world to the depth of his two oxters and he put into it to gaze from it with his lonely head and nothing to him but his shield and a stick of hazel. Then must nine warriors fly their spears at him, one with the other and together.
In examining the original passage, we are in fact examining different sources of
corruption. The Fitzgerald is robust against the corruption of readers lacking context, or
readers being sleepy; in the presence of these forms of corruption the passage is still
readable. The O'Brien is much less robust by comparison. We conclude the passage has
higher entropy.
Alternately, we can conclude that the O'Brien passage is more efficient than the
Fitzgerald, since each individual word communicates more information. If the reader can
easily guess the meaning of a missing word, as in the Fitzgerald, then that word has a
very low information content; with these words removed, the passage becomes less
elegant but not much less intelligible. This is the same quality that makes this passage
easy to summarize. Since removing words from the O'Brien limits the reader's ability to comprehend the passage, we can conclude that the missing words had a higher information content, that overall there are fewer redundant or repeated words, and that, therefore, the O'Brien is a more efficient communication.
Finally, we examine a passage from "Todtnauberg" by Paul Celan.16
Arnica, Eyebright, the drink from the with the star-die on top,
in the
into the book whose name did it in before mine? the line written into this about a hope, today, for a thinker's (un- coming) word in the
The result is nearly unintelligible; the reader cannot guess the original narrator, subject, or purpose of this passage. What remains is interpretable, certainly, but the reader cannot
16 Although this passage is shorter than the others, the same percentage of words has been removed in each case.
be certain of the original text based on this excerpt. Consequently, this passage has high
entropy.
As expected, lack of repeated words and lack of connective words contribute to
higher entropy. Shared meaning contributes to lower entropy as well, as seen in the
Fitzgerald example dealing with 'snow,' 'winter,' and 'cold.' From these examples, though,
we can also see that clear syntactical structures reduce entropy. If the reader can perceive
the sentence structure underlying We drew in deep ____ of it as we walked back from
____ through the cold vestibules, the reader can make more educated guesses as to what
the missing words could be. The second missing word appears to be some sort of place;
the first missing word is a noun that can be an object of the verb 'to draw in,' so perhaps the missing word is 'breaths' or 'gasps' or something along those lines. The general import
of the sentence is still clear. Similarly, in the O'Brien sentence Then must nine warriors
fly their ____ at him, as long as the reader can parse that warriors are throwing things at
an unhappy target, the meaning of the sentence is clear.
By contrast, poetry, especially the works of Paul Celan, is characterized by
economy of words and imagery, in that every word contributes a great deal of meaning to
a passage. This is why the Celan example is the least intelligible of the above: there are
few redundant words, and the associations between words are specifically designed to be
unexpected and novel. In other words, each word is intended to convey the greatest
possible amount of information.
One could conceptualize this new, more meaning-sensitive interpretation of
entropy as occurring on a higher level than entropy computed based on literal letters of an
alphabet. If this Fitzgerald sample were encoded in a different alphabet (if it were written in binary, or encrypted for secure transmission) without changing its vocabulary, its low-level, alphabet-based entropy would be quite different, but its higher-level, word-based entropy would be the same. To achieve a word-based equivalent to encryption, one
would need a paraphrase of this text by another author, or a similar text that
communicates the same images (snowfall; evening; solitude) or the same themes (introspection; nostalgia; the notion that a person's actions and mindsets are influenced by that person's home17) using thriftier vocabulary.
These corrupted blocks of text can be seen as analogous to hearing music in a
static-filled radio broadcast. Listening to a Haydn string quartet in such a broadcast, one
would still be able to identify the key, the time signature, and the instrumentation; one
could make an educated guess as to which movement the quartet was playing, and
probably one could even hum the missing notes. By contrast, listening to such a broadcast
of the Webern Concerto, op. 24, one might not even be able to determine the
orchestration of the piece, let alone guess the missing notes. One can imagine a similar
corruption of the original musical signal being created by a poor ensemble; in this
situation, the Haydn can be considered to have a higher entropy because ensemble
mistakes, whether wrong notes or dynamic mismatches or harmonic misalignments, are
generally much more recognizable than the corresponding mistakes would be in the
Webern. Because the listener is (usually) able to form more confident predictions for upcoming events in the Haydn, violations of these predictions (including mistakes) are more striking.18
Alternately, consider the (comically) corrupted piece of music shown in Figure 2.1.
17 From the next paragraph: "I see now that this has been a story of the West, after all, Tom and Gatsby, Daisy and Jordan and I, were all Westerners, and perhaps we possessed some deficiency in common which made us subtly unadaptable to Eastern life."
18 This is a generalization, of course. Many Webern compositions can be considered to have low entropy in terms of dynamics, in which case a mistake in terms of dynamics would be immediately recognizable as such.
Figure 2.1: A corrupted tonal work
Despite the corruption, the identity of this piece is readily apparent. Even a listener who
had never heard this piece before could make a reasonable guess at every missing note,
based on typical harmonic progressions, repetition, and motive. By comparison, a
similarly corrupted, non-tonal work, shown in Figure 2.2, is less easy to identify.
Figure 2.2: A corrupted contextual work
A listener already familiar with the piece might be able to identify this as the third
movement of Webern's Variations for Piano, op. 27, but a listener unfamiliar with the
piece would not even be able to guess which of the corrupted objects were pitches and which were rests. A listener who expects a serial work based on a derived row may be
able to fill in the blanks, surmising in retrospect that the first missing pitch must be a Bb (creating the ordered interval series to match the of the inverted row form beginning in m. 5), but probably not on first hearing without a score, and certainly
not as readily as in the Bach. In other words, the second work is more efficient, more
condensed. Because the missing pitches cannot be determined easily based on the
surrounding material, these pitches carry a high information content.
Other potential sources of corruption (beyond literal transmission factors like radio static, a corrupted score, or poor acoustics, and figurative transmission factors like poor performance) raise larger questions about the nature of entropy in music. One can
interpret an imprecise piano reduction of an orchestral work as a corruption of that
orchestral work, in roughly the same way one could consider a poorly executed English
translation of a German text a corruption of the original. However, if one considers
corruption as something that can happen within the music itself, as opposed to something
imposed upon the music by external factors (things like radio static or performers' mistakes), it becomes difficult to decide which musical features are the original signal and which are corruption: is a theme an original signal and its variations corruption? Is the original A section of a ternary form an original signal and its altered A corruption?
Since entropy has been characterized here in terms of a message's ability to resist corruption, what can entropy be
said to measure in these cases? It may be meaningful to say that a theme "resists variation" or that a melody "resists ornamentation," if the former is not very memorable or if the latter is already very elaborate, but these states may or may not coincide with
entropy figures generated for these passages (in that a very elaborate melody may still be very predictable and therefore have a low entropy, for example).
More to the point, this approach makes questionable implications about the nature
of musical meaning in such a work. Is it reasonable to consider a Stokowski transcription
as necessarily subsidiary to the work it transcribes, as opposed to an independent work in
its own right even if the aesthetic of the transcription is meaningfully different from the
aesthetic of the original? If so, is it still reasonable to consider a Webern transcription of
Bach, or for that matter a Wendy Carlos performance of Bach, in the same light? In cases
of music not governed by a score, which performance is the canonical performance and
which is the corrupted performance?
Meyer also raises the issue of "cultural noise": corruption that occurs in transmission as the result of "a time-lag between the habit responses which the audience actually possess and those which the more adventurous composer envisages for it."19 This
can be understood as avant-garde music whose language an audience has not yet
internalized, or as pre-modern music heard differently by modern or post-modern
audiences. In this case, the music is not corrupted by any external factors, but the
audience's perception is; the issue is not signal transmission, but signal reception.
It seems most reasonable, for the purposes of this project, to consider each score as an uncorrupted signal, accepting publisher and performer mistakes as corruption but
accepting changes that arise through arrangement as part of an original signal. (That is to say, this project accepts Shelley's philosophy of translation: that a translation is, or should be, a new artwork unto itself rather than a derivative work dependent upon an original.20)
19 Meyer, "Meaning in Music and Information Theory," 420.
20 Percy Shelley, A Defence of Poetry and Other Essays (1840; Project Gutenberg, 2005), http://www.gutenberg.org/etext/5428. See Part I.
The issue of cultural noise is important, because it is important in every work of analysis;
an information-theoretic analysis cannot assume an audience will hear a work the way an
ideal listener would, but nor can any other kind of analysis that wishes to reflect a
practical perceptual reality.
In any case, the factors that lead to high or low entropy in a musical example are
the same as in the excerpted Fitzgerald, O'Brien, and Celan texts. If we analyzed these
texts using literal letters as an alphabet, we would be able to identify the texts as English,
and we would probably be able to make general statements about the author's style: for example, one could determine the average entropy for a passage saturated with Latinate vocabulary and the average entropy for Anglo-Saxon vocabulary, based on which letters occur the most frequently and which letters do not occur at all (such as w and j in Latin), and from this make predictions about the loftiness or folksiness of the author's writing
style. Similarly, if we accept pitch as an alphabet, we can make predictions about how
diatonic or how chromatic a musical excerpt is, based on which pitches occur the most
frequently. However, loftiness of vocabulary does not result from avoiding the letters w
and j, any more than tonality results from using scale degrees 1 and 5 frequently. Low entropy (on a pitch-by-pitch basis) is generally symptomatic of tonality, but does not speak to the harmonic progressions that bring tonality into being.
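The letter-frequency idea can be made concrete in a few lines of code. The sketch below (a minimal illustration in Python; the sample strings are arbitrary, not the Fitzgerald or O'Brien excerpts) computes zero-order entropy over literal letters:

```python
from collections import Counter
from math import log2

def letter_entropy(text):
    """Zero-order entropy over letters, ignoring case and non-letters."""
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    n = len(letters)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# A string drawn from few distinct letters scores lower than one
# spread across most of the alphabet.
low = letter_entropy("banana bandana")
high = letter_entropy("the quick brown fox jumps over the lazy dog")
```

As the paragraph above argues, a figure like this can suggest broad stylistic tendencies, but it says nothing about syntax, word choice, or meaning.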
It may be inappropriate to claim that entropy created by pitches is directly
analogous to low-level, letter-based entropy in text. In some contexts a pitch may be
operating as a part of a word (for example, a single pitch within an arpeggiation), while in other contexts that pitch may be a word unto itself. For this reason, pitch-based entropy
may be more relevant to musical analysis than letter-based entropy is to literary analysis.
Nevertheless, it seems reasonable to claim that the analysis of more complex musical
alphabets may strengthen the link between musical style or predictability and entropy
calculations, creating something more broadly comparable to word-based entropy in
text. In both music and in text, entropy (as perceived intuitively by the listener or reader) is lowered by the presence of connective material (arpeggiations, passing tones, parsimonious voice leading), repetition (motivic material, canons, imitation), and larger structures (a T-P-D-T phrase structure, a serial row). If alphabets are built that can address the existence or nonexistence of these elements and structures, a more intuitive
interpretation of entropy will result.
Generally speaking, entropy is less of a commentary on musical meaning than it is
a commentary on musical style, and the degree of redundancy or predictability with
which that meaning is communicated. With that said, though, it is impossible to divorce
the two concepts, just as the meaning of a text cannot be separated from the words with which it is conveyed, or, arguably, from the audience's interpretative creation of
meaning. As Meyer writes,
Both meaning and information are thus related through probability to uncertainty. For the weaker the probability of a particular consequent in any message, the greater the uncertainty (and information) involved in the antecedent-consequent relationship.21
Earlier, Meyer highlights this same relationship as the source of musical meaning:
"Musical meaning arises when an antecedent situation, requiring an estimate as to the probable modes of pattern continuation, produces uncertainty as to the temporal-tonal nature of the expected consequent."22 Although this relationship has not always been the
focus of music theory's use of information theory entropy, Meyer's comments imply that
information theory entropy has potential insight into musical meaning as well as musical
style.
21 Meyer, "Meaning in Music and Information Theory," 416.
22 Ibid.
CHAPTER III

EXISTING MUSIC-THEORETIC SCHOLARSHIP ON INFORMATION THEORY ENTROPY
Use of entropy in music theory is generally thought to begin with Youngblood's 1958 article "Style as Information," in which entropies are calculated for eight songs from Schubert's Die schöne Müllerin, six arias from Mendelssohn's St. Paul, and six songs from Schumann's Frauenliebe und -leben. Only melodies in major keys are considered. In each case, a modified system of scale degrees is used as an alphabet: 1 indicates tonic, 2 indicates a raised tonic or a lowered supertonic, and so forth up to 12. His
zero-order results for these composers can be summarized as follows:
Composer        Zero-order Entropy    Zero-order Relative Entropy
Mendelssohn     3.03                  84.60%
Schumann        3.05                  85.00%
Schubert        3.13                  87.00%
Table 3.1: Pitch entropies from Youngblood
Youngblood finds the Mendelssohn sample to have the lowest entropy (or, alternately, the greatest redundancy/inefficiency) of the three, although he finds all three composers to have very similar entropies overall.23
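Youngblood's zero-order figures are mechanical to reproduce. The sketch below (with a made-up scale-degree string standing in for an actual melody, not data from any of Youngblood's samples) computes entropy and relative entropy for his twelve-letter alphabet:

```python
from collections import Counter
from math import log2

def zero_order_entropy(symbols):
    """H = -sum(p * log2(p)) over the relative frequency of each letter."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def relative_entropy(symbols, alphabet_size):
    """Entropy as a proportion of the maximum possible, log2(alphabet_size)."""
    return zero_order_entropy(symbols) / log2(alphabet_size)

# A hypothetical melody encoded in Youngblood's twelve-letter
# scale-degree alphabet (1 = tonic, 2 = raised tonic, ... up to 12).
melody = [1, 3, 5, 5, 6, 5, 3, 1, 8, 5, 3, 1]
h = zero_order_entropy(melody)
hr = relative_entropy(melody, 12)
```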
Youngblood also compares the entropy values for these composers to the
23 Joseph Youngblood, Style as Information, Journal of Music Theory 2, no. 1 (1958): 24-35.
entropies of a collection of randomly chosen Mode I chants. When these chants are
considered as representatives of a seven-note alphabet, they are found to have a much
higher relative entropy than the lieder and arias (HR=96.7%). Youngblood attributes this to the chants' more regular use of non-final and non-tenor tones, as compared to the
lieder's marked preference for diatonic pitches over chromatic ones. Of course, when
considered as representative of a twelve-note alphabet, the chant selections have a lower
entropy than the works of all three later composers (H=2.72, HR=76%).24

Knopoff and Hutchinson question Youngblood's non-chant results for statistical reasons, claiming that Youngblood's sample size is too small for the differences he finds in Mendelssohn's and Schubert's entropies to be significant. In support of this argument,
they construct confidence intervals for Youngblood's data, shown in Figure 3.1. When
Youngblood says the entropy of his Mendelssohn sample is 3.03, he makes the implicit
claim that this sample is representative of all of Mendelssohn: that if an analyst were to
compute a total entropy for all extant Mendelssohn works, that result would be fairly
close to Youngblood's. Confidence intervals measure how certain we are that the sample's
entropy is comparable to that of Mendelssohn's complete body of work. Figure 3.1 shows Knopoff and Hutchinson's confidence intervals for Youngblood's entropy calculations.
In this example, Knopoff and Hutchinson state with 95% confidence that
Mendelssohn's entropy falls between 2.895 and 3.183, and that Schubert's entropy falls
between 3.016 and 3.244. Since the confidence intervals overlap, one cannot conclude
based on this data that Schubert's and Mendelssohn's total entropies differ; it is entirely
possible, based on this data, that Schubert's total entropy is in fact lower than
24 Although most listeners probably hear chant in terms of a seven-note alphabet, one can imagine factors that would lead the listener to hear chant in terms of a twelve-note alphabet, such as placement of the chant between or within tonal works (as in the fragment of chant that concludes Bruckner's Os Justi), or a listener's lack of familiarity with the repertoire.
Mendelssohn's, or that the two are equal.25
For simple random samples, confidence intervals are generally computed using
some variant of the following formula:

    x̄ − 1.96(s/√n) ≤ μ ≤ x̄ + 1.96(s/√n)
25 Leon Knopoff and William Hutchinson, "Entropy as a Measure of Style: The Influence of Sample Length," Journal of Music Theory 27, no. 1 (1983): 75-97.
Figure 3.1: 95% confidence intervals for Youngblood's entropy calculations
where x̄ is the mean of the sample (in this case, the sample's entropy), s is the sample's standard deviation, n is the sample size, and μ is the quantity we wish to
establish: the predicted entropy for the musical style or composer in question.26 As is
clear from this formula, there are two factors that influence the size of a confidence
interval: sample size (Knopoff and Hutchinson's focus) and sample variance (the focus of a 1990 Snyder article). The former is reasonably intuitive; a very small sample could be a fluke, but if a large sample of Mendelssohn's work supports the conclusion that his total
entropy is 3.03, then it seems much more probable that Mendelssohn's overall entropy
really is close to 3.03. Snyder adds that variance within the sample can also make us
more or less confident. If an analyst looks at four Mendelssohn samples of comparable
length and finds them to have entropy values of 2, 4.98, 3.6, and 1.4, that analyst would
have difficulty predicting Mendelssohn's total entropy, since the samples are so disparate.
By contrast, if the first four samples came back as 3.01, 3.08, 2.97, and 2.93, the
conclusions drawn about these data would seem much more reasonable, even if the
sample were smaller.27
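The interval arithmetic itself is straightforward. The sketch below implements the generic simple-random-sample formula (not the binomial-proportion method Knopoff and Hutchinson actually use), and the example values are hypothetical:

```python
from math import sqrt

def confidence_interval(sample_mean, sample_std, n, z=1.96):
    """z-based interval for the population mean: mean ± z * (s / sqrt(n)).

    z = 1.96 gives roughly a 95% interval; z = 1.645 roughly 90%.
    """
    margin = z * sample_std / sqrt(n)
    return (sample_mean - margin, sample_mean + margin)

def overlaps(a, b):
    """Two intervals overlap when neither lies wholly below the other."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical values only: a sample entropy of 3.03 with an assumed
# standard deviation and sample size, not Youngblood's actual data.
low, high = confidence_interval(3.03, 0.45, 40)
```

Overlapping intervals, as in Knopoff and Hutchinson's critique, mean the data cannot distinguish the two composers' total entropies.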
In addition to statistical concerns, Snyder and Knopoff and Hutchinson highlight
26 The multiplier 1.96 specifies a 95% confidence interval that is, if we take 100 samples, the means of 95 of the samples will fall within this interval. Multiplying by 1.645 instead would return a 90% confidence interval. This formula is provided only to illustrate the concept; confidence intervals in this paper were calculated for binomial proportions, taking propagation of error into account. See Appendix A of Knopoff and Hutchinson, 1993, for more information.
27 John Snyder, Entropy as a Measure of Musical Style: The Influence of A Priori Assumptions, Music Theory Spectrum 12, no. 1 (1990): 121-160.
several methodological problems, the clearest of which is modulation. In a piece that
modulates from C minor to Eb major, one would expect the pitches C, G, Eb, and Bb to occur with the greatest frequency, which increases that piece's entropy quite sharply,
since such a piece contains four pitches that occur frequently, instead of the two such
pitches found in a nonmodulatory work. (This logic could be expanded to include scale degree 7 of each key as well as scale degree 5, but the result would be the same.) In a piece that modulates from C minor to G major, the shift to a new diatonic collection would result in a higher entropy, as well. Youngblood's analyses make no accommodation
for this; although he computes entropies based on a scale-degree system, these scale
degrees are never adjusted for modulations. He notes that this lack of regard for modulation may have disguised the differences between his Schumann and Schubert samples, observing that (at least in these samples) Schubert's chromatic pitches tended to arise from modulation, whereas chromatic pitches in his Schumann samples tended to be more ornamental: very different phenomena that lead to similar results.28
In their analyses, Knopoff and Hutchinson compensate for modulations by
normalizing all passages to C major or A minor, although this normalization is only initiated by changes in written key signature. Snyder finds this disregard of implied
modulations quite problematic, as well as the implied prioritization of la-minor. Since a
modulation between relative keys is never accompanied by a change in key signature, a
piece that modulates from, say, F major to D minor would register as having higher entropy insofar as the latter tonal area deviates from its la-tonic. Snyder also questions
the assumption that modulations should be normalized away, arguing that a piece that
begins and ends in distantly related keys ought to have a higher entropy than a piece that
28 Youngblood, 78.
begins and ends in the same key.29 Alternately, one could argue that a piece that
modulates from I to V ought to have a lower entropy than a piece that modulates from,
say, I to bII: that the predictability (and perhaps even the smoothness) of a modulation ought to be a factor in that piece's entropy calculations.
Unfortunately, there are few solutions to the problem of accurate representation of
modulations in entropy. Arguably, Youngblood's system does associate distant keys with
higher entropies, since a modulation from C major to G major would contribute much less to a piece's entropy than a modulation from C major to Ab major would, based on the number of pitches held in common between the two respective diatonic collections. One
could imagine combining this method with a weighting system, in which pitches
belonging to passages in non-tonic keys contribute less to the piece's total entropy than
pitches in the tonic key do. Ideally, these weights would be determined in part by the
amount of time spent in the new key (as the listener's ability to remember the home key diminishes over time), but any such system would almost certainly be criticized as arbitrary.
The debate over how best to represent modulation in entropy calculations for
tonal repertoire highlights the concern that underlies most if not all entropy-based
analyses: what alphabet best reflects listeners' perceptions of musical language?
Uninterpreted pitch or pitch class is rejected as an alphabet because it is a poor reflection of listeners' interpretative hearings, since it shows no connection between the roles of C
and G in C minor and Eb and Bb in Eb major. Similarly, when Snyder adopts a twenty-eight-letter alphabet in which enharmonic spellings are taken as separate, up to double flats (of scale degrees 7, 3, 6, and 2) and double sharps (of scale degrees 1, 4, and 5), his motivation is the notion that listeners hear F and E# as distinct pitches in certain contexts, rather than the creation of an exhaustive system.

29 Snyder, 126-128. While the average listener may not realize that a piece has ended in C# major instead of C major, this same listener would probably notice if the piece begins in C major and ends in F# minor, if only for reasons of mode and register: a distinction that cannot be made within this system.
The variety of options available to analysts (even in terms of pitch alone) speaks to the expressive potential of these alphabets, since they can be altered to best reflect
listeners' perceptions of any given repertoire. This same flexibility can limit the analyst's
ability to compare samples from sufficiently different styles, though. It would seem
unfair to compare Wagner's entropy within a twenty-eight-note system with late serial
Schoenberg's, for example, since Schoenberg's disuse of double sharps does not speak to
any increased predictability in his music as compared to Wagner's, nor would it be fair to
say the listener finds Schoenberg's style more constricted because these letters are
omitted. One of the unstated goals of such analysis, then, is the selection of an alphabet
that is sensitive to perceptual concerns for specific repertoires but also general enough in
its applicability that its use on music from other repertoires seems reasonable.
This challenge is even greater for contextual music. The most common and most
universally applicable alphabet, either pitch names or scale degrees accepting octave and
enharmonic equivalence, is all but useless for serial music or any sort of music that
exhausts the aggregate regularly. Any such piece will have maximal entropy for that
alphabet cardinality, regardless of whether the piece is based on a derived row or an all-
interval row and, indeed, regardless of whether or not the piece is atonal at all. This
entropic equality implies that Webern's Variations for Piano is exactly as predictable as
Boulez's Piano Sonata no. 2, which would be in turn just as predictable as the first few bars of Coltrane's Giant Steps: an unintuitive claim, to say the least.
One potential solution is the incorporation of higher-order entropies, often
accomplished through the guise of Markov chains. Such constructs would allow the
analyst to look for patterns in the ordering of pitches, rather than relying on their
frequency alone. With a simple pitch alphabet, Markov chains could not differentiate
between serial rows, but could at least distinguish between a serial piece and a non-serial
piece that happens to use each pitch equally. Higher-order constructs have even clearer
applicability in entropic analyses of tonal music, since they measure predictability of succession, something of particular importance if entropy is taken to be a measure of tonality, since entropy on its own is order-blind. Thus, from the perspective of zero-order
entropy, the progressions in Figures 3.2 and 3.3 are exactly the same, although certainly
one is more predictable than the other within a tonal paradigm, and certainly one is more
tonal than the other. By contrast, Markov chains could differentiate between these two
strings easily.
Figure 3.2: Passage with pitch-class entropy 2.52
Figure 3.3: Passage with pitch-class entropy 2.52
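The order-blindness at issue can be demonstrated directly. In the sketch below (using arbitrary pitch-class strings, not the actual contents of Figures 3.2 and 3.3), two sequences share identical zero-order statistics, but a first-order alphabet of consecutive pairs distinguishes them:

```python
from collections import Counter
from math import log2

def entropy(symbols):
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

# Two arbitrary pitch-class strings with identical zero-order
# statistics (eight each of 0, 4, and 7) but different orderings.
ordered = [0, 4, 7] * 8
scrambled = [0, 4, 7, 7, 0, 4, 4, 7, 0, 0, 7, 4,
             7, 4, 0, 0, 4, 7, 7, 4, 0, 4, 0, 7]

# Zero-order entropy is order-blind: both strings score the same.
assert entropy(ordered) == entropy(scrambled)

# A first-order alphabet of consecutive pairs captures ordering:
# the repeating pattern now scores lower than the scrambled string.
pairs_ordered = list(zip(ordered, ordered[1:]))
pairs_scrambled = list(zip(scrambled, scrambled[1:]))
assert entropy(pairs_ordered) < entropy(pairs_scrambled)
```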
In his 1958 analyses, Youngblood computes entropies on first-order combinations,
in addition to entropies based on zero-order data. That is, rather than accepting C, D, and
E as the most basic units of music, Youngblood accepts C followed by C, C followed by
C#, C followed by D, and so forth as individual letters, creating an alphabet with 144
letters. However, Hessert notes that the continued effectiveness of this strategy is limited;
an alphabet built from consecutive pitch pairs is almost reasonable at 144 letters, but
three consecutive letters lead to 1728 possibilities, which leads to unwieldy
calculations.30 One can only imagine the complexity of a higher-order alphabet that does
not accept octave equivalence.
Hessert advocates the use of an alphabet based on intervals as a potential solution
to this problem, since a computation based on intervals is effectively first-order without
requiring any first-order computations. He also notes that an alphabet based on intervals
avoids the issue of modulation quite nicely, while reflecting motivic content more
accurately than pitch-based analysis can and potentially allowing for more meaningful
comparisons across disparate repertoire.31 Rhodes advocates a similar solution: an
alphabet that combines each pitch with its preceding interval.32 Potentially, such an
alphabet would allow the analyst to distinguish between typical and non-typical
resolutions of dissonant tones; a piece in which any scale degree can be left by any
interval is probably less tonal than a piece in which certain scale degrees (4 and 7, e.g.) can usually only be left by certain intervals (down by step and up by step, respectively). Of course, in the eyes of this computation, a composer who always resolves 7 to b5
30 Hessert, 16ff.
31 Ibid., 43-44.
32 James Rhodes, "Musical Data as Information: A General-Systems Perspective on Musical Analysis," Computing in Musicology 10 (1995-1996): 165-180.
would be no more or less predictable than a composer who always resolves 7 to 1, or
even a composer whose 7s can resolve anywhere but whose b3s always resolve to b6.
Through its reliance on pitch, this sort of analysis nullifies many of the benefits Hessert
ascribes to intervallic analysis.
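Hessert's claim that an interval alphabet sidesteps modulation can be illustrated in a few lines. In the sketch below (the melody is an arbitrary assumption, not an example from Hessert or Rhodes), restating a melody a perfect fourth higher inflates pitch-based entropy far more than interval-based entropy:

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Zero-order entropy over any alphabet of hashable symbols."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in counts.values())

def intervals(pitches):
    """Melodic intervals in semitones between consecutive pitches."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

# An arbitrary melody as MIDI-style pitch numbers, then the same
# melody restated a perfect fourth higher, as a crude "modulation."
melody = [60, 62, 64, 65, 67, 65, 64, 62, 60]
modulating = melody + [p + 5 for p in melody]

# Under a pitch alphabet, the restatement introduces new letters and
# inflates the entropy; under an interval alphabet, it contributes
# almost nothing new, so the increase is much smaller.
pitch_increase = entropy(modulating) - entropy(melody)
interval_increase = entropy(intervals(modulating)) - entropy(intervals(melody))
assert interval_increase < pitch_increase
```

Rhodes's variant would instead pair each pitch with its preceding interval, reintroducing the pitch-dependence the paragraph above criticizes.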
Lewin conceptualizes the importance of higher-order analytical capacity in terms
of charge, defined as the listener's degree of uncertainty as to what an upcoming
interval will be based on the intervals directly preceding it. His analysis goes up to sixth-
order strings (that is, fifth-order computations based on intervals), but it occurs within a highly idealized environment: a twelve-tone row independent of musical context, and
therefore independent of irregularities (e.g., partial presentations or reorderings of a row) or complications (e.g., the division of a row into verticalities, leading to the creation of melodic intervals not present in the original row) that would make such higher-order analysis impractical.33
Based on this ideal environment, Lewin determines that if a listener is able to
remember the previous five intervals of Schoenberg's String Quartet, no. 4, the listener can predict the sixth interval with complete certainty (assuming the row form in question has not been altered or truncated). This certainty is not an accurate reflection of listeners' perceptions of this row, though, even under ideal circumstances; if it were, Lewin argues,
the associated musical experience would be quite dull. Therefore, he concludes, the
listener probably only hears back two or three intervals (perhaps more or fewer, depending on motivic structure, complexity of the line's presentation, repetitiveness of the line, and other factors), but probably not six. Thus, even if such higher-order analyses
were practical, they may not be a reasonable reflection of the listener's experience.
33 David Lewin, "Some Applications of Communication Theory to the Study of Twelve-Tone Music," Journal of Music Theory 12, no. 1 (1968): 50-84.
Of course, one can imagine situations in which a less literal sixth-order analysis
would be appropriate. Although it seems unreasonable to expect a listener to remember
and consider six successive intervals, it seems quite reasonable for a listener to remember
a six-element contour, or six intervals expressed in the form of two or three verticalities.
To date, this form of entropy "chunking" has had almost no mention in the relevant
literature.
Hessert also raises the question of duration, finding it problematic that a half-note
chord root C is treated as equal to a sixteenth-note neighbor tone D in most entropic
analyses.34 Hiller and Bean reflect this same concern in their 1966 analyses of sonata
expositions, in which longer notes are weighted more heavily than shorter notes, but
Hessert criticizes this approach for its lack of attention to attack, arguing that sixteen
sixteenth-note Cs are quite different perceptually from a single whole-note C.35 One
imagines that any method of computation that addresses both concerns would be
prohibitively complex; it seems most likely that any interested analyst must choose
whichever approach seems least inappropriate for that analyst's particular repertoire.
In any case, the most salient of Hessert's concerns (that an ornamental tone is treated as equal to a chord tone) seems to be more an issue of interval than of duration, since ornamental tones are approached by step more often than not (and since an ornamental tone approached by a large leap is probably aurally surprising enough that it ought to contribute as much to the piece's entropy as the chord tone it ornaments). It seems perceptually reasonable to claim that a whole step is a whole step, regardless of whether that whole step connects a C and a passing D or a chord tone C and an adjacent
34 Hessert, 68.
35 Lejaren Hiller and Calvert Bean, "Information Theory Analyses of Four Sonata Expositions," Journal of Music Theory 10, no. 1 (1966): 96-137.
chord tone D. From a pitch-based perspective, the distance a line must travel to arrive at the next pitch may be more relevant to musical predictability than how long the line stays on that pitch; and even in the absence of pitch, it seems the primary determinant of predictability is not the duration of each individual pitch but rather the rhythmic pattern in which these pitches present themselves, or the presence or absence of attacks at certain metric positions.
Hessert cites one example of entropy calculations based on rhythmic patterns, an
unpublished 1959 Master's thesis by John Brawley (Indiana University). He finds this analysis problematic, since it relies upon an implicit invocation of an alphabet of
infinite cardinality, which makes the computation of relative entropy and redundancy
impossible. Additionally, Brawley sets forth no predetermined limits to what constitutes a
pattern. Does a dotted quarter followed by an eighth note constitute a rhythmic pattern?
If this configuration begins on a weak beat or is preceded by an eighth note, is it the same
pattern? Is a pattern perceptually the same at M.M.=160 as it is at M.M.=40?36
Snyder advocates the exploration of duration-sensitive entropy calculations, but
he notes that such calculations almost necessarily conflate clock time with perceptual
time.37 In other words, by basing calculations on the notated tempo we implicitly privilege the former, a choice that is difficult to defend given the degree to which entropy-based analysis is meant to measure listeners' perceptions of predictability. Of course, any analysis that claims to reflect perceptual time must almost certainly encompass multiple musical domains beyond rhythm, tempo, and duration. In privileging longer notes over shorter ones, we run the risk of (for example) privileging extended neighbor notes over the shorter chord tones they ornament.
36 Ibid., 45-50.
37 Snyder, 125-126.
Other than Rhodes, few analysts have attempted to deal with more than one
musical alphabet simultaneously. A notable exception is Hiller and Fuller's 1967
analysis of the op. 21 Symphony, in which pitch (not pitch class) is combined with the number of eighth notes between successive attacks. Entropies are also computed on
various types of intervals. These entropy calculations are then used to draw conclusions about each formal section of the first movement. When pitch is considered alone, results from zero-order entropy and from first- or higher-order chains are inconclusive; although the development is (as one would expect) the least predictable in terms of individual pitches, its higher-order results are more predictable than those of either the exposition or the recapitulation.38 These inconsistencies carry over into interval-based and attack-point-based entropies.39
As mentioned, entropy is the quantity of information (measured in the number of bits the message would require to store or transmit) that each letter of an alphabet conveys. Hiller and Fuller also express their entropy in terms of bits per second based on the notated tempo; that is, they examine entropy in terms of the rate at which information is presented. Their hope is to distinguish between the listener's experience of a great deal of information presented quickly and the same amount of information presented over a longer timespan. Interpreting entropy in terms of bits per second does not change the entropy results for op. 21, but the idea bears investigation: that the speed with which information is presented influences the audience's perception of its complexity.
Unfortunately, this measure cannot describe how evenly information is distributed across a passage; it cannot distinguish, for example, a burst of information followed by silence
38 Lejaren Hiller and Ramon Fuller, "Structure and Information in Webern's Symphonie, op. 21," Journal of Music Theory 11, no. 1 (Spring 1967): 78.
39 Ibid., 84ff.
from a passage with a continuous information rate. Of course, the accuracy of Webern's
notated tempos is problematic in any case, and the frequent ritardandos in his music make
it less plausible that a calculation of this type could be relevant to a performance. Despite
any practical limitations, though, the fact that entropy was considered in terms of the rate
at which information is received hints at an early connection between entropy and
diachronic analysis, and arguably an early connection between entropy and time as well.
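Hiller and Fuller's bits-per-second figure can be sketched in a few lines (a hypothetical illustration with invented values, not their actual procedure or data): an information rate is simply entropy per symbol multiplied by the attack rate implied by the notated tempo.

```python
# Sketch of an information rate: bits per symbol times symbols per second.
# Tempo and note values below are invented for illustration.
def bits_per_second(entropy_per_symbol, notes_per_beat, beats_per_minute):
    symbols_per_second = notes_per_beat * beats_per_minute / 60.0
    return entropy_per_symbol * symbols_per_second

# The same 3.5 bits per note reads very differently at different tempos:
print(bits_per_second(3.5, 2, 60))    # eighth notes at M.M.=60  -> 7.0 bits/s
print(bits_per_second(3.5, 2, 120))   # eighth notes at M.M.=120 -> 14.0 bits/s
```

The sketch also makes Snyder's objection concrete: the figure depends entirely on notated clock time, so a ritardando or an elastic performance tempo changes the rate while leaving the score, and the entropy per symbol, untouched.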
Hessert gives four criteria for effective entropy-based analyses:
1. An alphabet should be finite;
2. Elements in an alphabet should be discrete;
3. Sample sizes should be as large as possible;
4. Analysis should be based on as many musical domains as possible.
The first two are basic criteria without which entropy calculations are impossible; the latter two are desiderata rather than strict requirements. To these one can add that
entropy can most effectively analyze samples with low variances, since entropy is in
some sense a decontextualized measure of central tendency. Smaller sample sizes may
assist in analysis, if they serve to reduce variance; it is more effective to analyze a small
sample that possesses a given characteristic uniformly than to combine this sample with
another sample lacking this characteristic. Imagine a bimodal grade distribution in which
many students have a 90% average and many have a 60% average. Considering these
students in terms of two smaller sample sizes allows one to generalize about the data
easily, but combining the two samples yields both an unhelpful overall average and a
much higher degree of uncertainty. The same logic ought to apply to musical domains.
Considering data across multiple domains is useful, but considering multiple domains simultaneously (that is, combining entropies of different domains into a single entropy measure) may disguise tendencies in the data.
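The grade-distribution analogy can be made concrete (the scores below are invented): each cluster is easy to summarize on its own, but the pooled sample has an unhelpful mean and a far larger spread.

```python
# Two internally uniform samples versus their pooled combination.
import statistics

high = [90, 91, 89, 90, 92, 88]   # invented scores clustered near 90
low = [60, 59, 61, 60, 58, 62]    # invented scores clustered near 60

print(statistics.mean(high), round(statistics.stdev(high), 2))  # mean 90, small spread
print(statistics.mean(low), round(statistics.stdev(low), 2))    # mean 60, small spread

pooled = high + low
# the pooled mean sits near neither mode, and the spread grows roughly tenfold
print(statistics.mean(pooled), round(statistics.stdev(pooled), 2))
```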
The cautions one can draw from the history of entropy in music are, for the most
part, no different from the cautions that apply to all analysis. In particular, entropy-based
analyses are problematic when they do not reflect musical experience. If one accepts that all music analysis is necessarily metaphor, and that quantitative analyses are simply a different way of exploring metaphor, then the most important caution is that these metaphors must be apt, rather than relying upon their quantitative nature to make their arguments. If an analyst is careful to ensure that conclusions based on entropy reflect musical experience and perception, whether diachronic or synchronic, then entropy can prove a useful tool for analysis.
CHAPTER IV
ALPHABETS FOR ENTROPY-BASED ANALYSIS
Interval Entropy
As discussed previously, pitch class entropy is rarely useful for analysis of post-
tonal music. Table 4.1 below gives pitch class entropy figures for a collection of post-tonal vocal works; Youngblood's results for Schubert's pitch entropy provide a baseline from the tonal repertoire.
Work | Style | Pitch class entropy | Relative entropy
Webern, op. 15 (Fünf geistliche Lieder), without no. 5 40 | Freely atonal | 3.58 | 100%
Webern, op. 16 (Fünf Canons) and op. 15, no. 5 | Freely atonal canons | 3.57 | 99.7%
Webern, op. 25 (Drei Lieder) | Serial, based on a derived row | 3.58 | 100%
Babbitt, "Widow's Lament in Springtime" | Serial, based on an all-interval row | 3.58 | 100%
Youngblood's Schubert sample | Tonal | 3.13 | 87.4%

Table 4.1: Pitch entropies in Webern works, compared with Babbitt and Schubert
No measures of statistical significance are necessary to interpret these results.
40 Since op. 15, no. 5 is a canon, it is included with the op. 16 canons throughout this section.
Although pitch entropy is able to distinguish Schubert from Webern, it is unable to
distinguish between serial and freely atonal works, or derived rows and all-interval rows.
Even canons are seen as maximally unpredictable, although one imagines the second and
third voices are quite predictable indeed.
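A minimal sketch of the zero-order computation underlying Table 4.1 (the sample is invented, not tallied from any of these scores): a line that uses all twelve pitch classes equally often reaches the 3.58-bit ceiling, and therefore 100% relative entropy, regardless of how it orders them.

```python
# Zero-order entropy of a symbol sequence, plus relative entropy against
# the alphabet's maximum. The flat sample is invented for illustration.
import math
from collections import Counter

def entropy(symbols):
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

flat_sample = list(range(12)) * 10    # each pitch class appears equally often
h = entropy(flat_sample)
h_max = math.log2(12)                 # ~3.585 bits: the ceiling for a 12-letter alphabet

print(round(h, 2))           # 3.58
print(round(h / h_max, 3))   # relative entropy 1.0
```

Because the measure ignores order entirely, a strict canon, a serial row, and a random cycling of the aggregate all hit the same ceiling, which is precisely the failure the table exhibits.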
Intuitively, it seems entropy based on interval class should be able to distinguish
between these styles. Pitch class entropy can only recognize canons iterated at the same
pitch level, but a canon interpreted as a series of intervals should be recognizable at any
pitch level. Although the order-blindness of entropy somewhat limits its effectiveness for
canons, interval class entropy can at least distinguish a canon from a non-canonic piece in
the same style. The same logic applies to serial works; a serial work will generally have
lower entropy than a freely atonal work since any interval appearing in the row would be
repeated many times, while any interval not appearing in the row would be heard very
infrequently. (Similarly, a serial work based on an all-interval row should contain roughly equal numbers of all interval classes, whereas a work based on a derived row would have roughly proportionate numbers of a few interval classes and very few of any others.) Such a measure would be unable to distinguish between a serial work based on a derived row and a freely atonal work saturated with the pitch class set that forms the basis of the former's derived row, or between a serial work based on an all-interval row and a freely atonal work that simply exhausts the aggregate of interval classes regularly; but arguably, most listeners would not be able to make these distinctions, either.
These intuitions are somewhat flawed, in that they assume an idealized linear
presentation of a serial row. Vertical presentation of a portion of a row or the division of a
row amongst several voices will almost certainly create new intervals not represented in
the original row. Nevertheless, entropy is at heart a measure of predictability, and it seems
reasonable that it should reflect the listener's surprise at hearing an interval not linearly
present in the row, even if reflection of this surprise comes at the expense of the
construct's ability to identify a work as serial or non-serial.
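The alphabet behind this analysis can be sketched as follows (the melodic fragment is invented): successive pitches reduce to interval classes 0-6, with octave and inversional equivalence folding, say, a major seventh and a minor ninth both into interval class 1.

```python
# Reduce a pitch line (MIDI note numbers) to interval classes 0-6.
def interval_classes(pitches):
    ics = []
    for a, b in zip(pitches, pitches[1:]):
        i = abs(b - a) % 12         # fold out octaves and direction
        ics.append(min(i, 12 - i))  # fold out inversion: 11 -> 1, 7 -> 5, ...
    return ics

# An ascending semitone, an ascending major seventh, and two descending
# perfect fifths all reduce to just two interval classes:
print(interval_classes([60, 61, 72, 65, 58]))  # -> [1, 1, 5, 5]
```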
Horizontal interval class analysis of the same works from Table 4.1 provides the
results shown in Table 4.2 and Figure 4.1.
Work | Interval class entropy | Deviation at a 95% confidence level | Relative entropy
Webern, op. 15 | 2.57 | .04 | 91.5%
Webern, op. 16 | 2.48 | .04 | 88.3%
Webern, op. 25 | 2.35 | .05 | 83.6%
Babbitt, "Widow's Lament" | 2.72 | .06 | 96.8%

Table 4.2: Interval class entropies comparing serial and non-serial works
Figure 4.1: Interval class entropies comparing serial and non-serial works
These data indicate that interval class entropy is able to distinguish between
derived and all-interval rows, and between canons and non-canons from approximately
the same period. These are both important tests of the construct's effectiveness; its ability
to make these distinctions speaks toward its ability to reflect musical saturation and
predictability.
These distinctions are not retained when vertical intervals are included.
Work | Entropy (vertical and horizontal intervals) | Deviation | Relative entropy
op. 16 | 3.42 | .05 | 95.5%
op. 25 | 3.34 | .06 | 93.3%

Table 4.3: Vertical and horizontal interval entropy on one serial and one non-serial work
Although op. 25's entropy is still lower than op. 16's, the difference is no longer
significant. In other words, based on these calculations we cannot posit a distinction
between Webern's use of verticalities in op. 16 and op. 25; if both pieces were played as
block chords, it is unlikely the listener would be able to distinguish between them based
solely on intervallic content.
Returning to the question of horizontal intervals, then, we find that removing
inversional equivalence eliminates many of the distinctions between these works, as
shown in Table 4.4 and Figure 4.2. Without inversional equivalence, relative entropies are
higher across the board since what was originally an emphasis on interval class 1
becomes a dual emphasis on registrally-ordered interval classes 1 and 11. Variances
increase for the same reason, which makes statistically significant distinctions less likely.
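The contrast between interval class and registrally-ordered interval class entropy can be illustrated with a toy line (invented, not drawn from Webern): under inversional equivalence every semitone is interval class 1, but registral ordering splits ascending from descending semitones into rics 1 and 11, raising the entropy.

```python
# Interval class (inversionally equivalent) versus registrally-ordered
# interval class entropy on an invented line of rising and falling semitones.
import math
from collections import Counter

def entropy(symbols):
    total = len(symbols)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(symbols).values())

line = [60, 61, 60, 61, 60, 72, 71, 72, 71]
steps = [b - a for a, b in zip(line, line[1:])]

ics = [min(abs(s) % 12, 12 - abs(s) % 12) for s in steps]  # alphabet 0-6
rics = [s % 12 for s in steps]                             # alphabet 0-11

print(round(entropy(ics), 2), round(entropy(rics), 2))
print(entropy(ics) < entropy(rics))  # True: splitting ic 1 raises entropy
```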
Nevertheless, registrally-ordered interval class entropy can still distinguish
meaningfully between canons and non-canons (Webern op. 15 vs. op. 16) and between derived rows and all-interval rows (Webern op. 25 vs. Babbitt). The most interesting difference between Table 4.2 and Table 4.4 is op. 25, which has a lower interval class
entropy than op. 16 but a higher registrally-ordered interval class (ric) entropy. This distinction speaks to a fundamentally different approach to inversion between these two
works. In op. 16, an ric1 is not the same as an ric11, since a melodic ric1 in the clarinet
line could not be answered with an ric11 in the vocal line without breaking the canon.
Assumptions of inversional equivalence seem much more reasonable in op. 25, since the
juxtaposition of prime rows with inversional rows leads the listener to hear intervals and their inversions as at least related, if not equivalent.
Work | Registrally-ordered interval class entropy | Deviation | Relative entropy
Webern, op. 15 | 3.40 | .06 | 95.0%
Webern, op. 16 | 3.24 | .06 | 90.5%
Webern, op. 25 | 3.34 | .06 | 93.3%
Babbitt, "Widow's Lament in Springtime" | 3.50 | .07 | 97.8%

Table 4.4: Registrally-ordered interval class entropy in Webern and Babbitt
The remaining oddity in these data is the similarity between Webern op. 15 and
Babbitt. To investigate this similarity, we expand intervallic entropy into interval entropy
(-72 < x < 72) and ordered directional interval class entropy (-12 < x < 12).41
Ordered directional interval class entropy bears few surprises. The data in Table 4.5 and
Figure 4.3 show the expected distinction between Webern and Babbitt, but from these
data no conclusions can be drawn about any of the Webern works examined, almost the opposite of the results generated by registrally-ordered interval class entropy.
41 72 (or six octaves) is a number chosen out of convenience: the distance between the highest and lowest pitch in any of these pieces, rounded up to the nearest octave.
Figure 4.2: Registrally-ordered interval class entropy in Webern and Babbitt
Work | Ordered directional interval class entropy | Deviation | Relative entropy
Webern, op. 15 | 3.71 | .08 | 82.1%
Webern, op. 16 | 3.64 | .09 | 80.1%
Webern, op. 25 | 3.52 | .11 | 77.9%
Babbitt, "Widow's Lament" | 4.03 | .12 | 89.2%

Table 4.5: Ordered directional interval class entropy in serial and non-serial works
Figure 4.3: Ordered directional interval class entropy in serial and non-serial works
The inferences to be made from this apparent inconsistency (either that ordered interval class is a less relevant structure in these Webern works, or that Webern's predictability in terms of ordered interval class remains consistent across a variety of post-tonal styles) are at first alarming. Either conclusion makes suspect Hessert's claim that interval-based entropy is capable of dealing meaningfully with works from disparate periods and styles, given that registrally-ordered interval class entropy lacks the generality to distinguish between Babbitt and freely atonal Webern, while ordered directional interval class entropy lacks the generality to distinguish between Webern works of different styles and time periods. Perhaps the more useful claim to draw from this perceived lack of generality is that any invocation of intervallic entropy must be nuanced: in computing intervallic entropy we make implicit assumptions about a given composer's approach to the interval, assumptions that should be examined and argued.
One must also keep in mind that although statistically significant differences
between works imply differences in style, the lack of statistically significant differences
does not imply stylistic similarities. The lack of distinction between Webern's op. 15 and
Babbitt's "Widow's Lament in Springtime" in terms of registrally-ordered interval class entropy does
not imply a fundamental similarity between these works' use of registrally-ordered
interval classes; rather, the differences between the works are simply not profound
enough for us to be certain that they imply a genuine stylistic difference. In short, an
unexpected significant difference between two works is noteworthy, but an unexpected
similarity need not be.
At the very least, these results demonstrate the utility of examining repertoire
from multiple perspectives on the interval. These results also hint at the possibility of
using various types of intervallic entropy as evidence in an argument against, for
example, accepting inversional equivalence as a given in analysis of a particular work.
Entropy computations for pure intervals, as opposed to interval classes, provide
the following results:
Work | Interval entropy | Deviation
Webern, op. 15 | 4.92 | .11
Webern, op. 16 | 4.75 | .11
Webern, op. 25 | 4.98 | .16
Babbitt, "Widow's Lament" | 4.91 | .16

Table 4.6: Interval entropy in Webern and Babbitt
Figure 4.4: Interval entropy in Webern and Babbitt
Relative entropy is omitted here because the maximal entropy of a 144-letter alphabet is extraordinarily large. As a result, these works would show extraordinarily small relative entropies, giving an impression of predictability not audible in the music.
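The arithmetic behind the omission is simple (using the op. 15 figure from Table 4.6 for illustration): the ceiling for a 144-letter alphabet is log2 144, so even observed values near 4.9 bits would translate into relative entropies under 70%.

```python
# Maximal entropy of a 144-letter interval alphabet, and the relative
# entropy that the op. 15 figure from Table 4.6 would yield against it.
import math

h_max = math.log2(144)
print(round(h_max, 2))           # 7.17 bits
print(round(4.92 / h_max, 3))    # ~0.686, far below the 90%+ figures above
```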
Although op. 16 seems to have a much smaller entropy than all other works
considered, this deviation is not statistically significant. Even if it were, the conclusions
drawn would be slightly problematic. One could not conclude even from significantly
smaller interval entropy that Webern op. 16 relies more upon smaller interval