The MP3 Book

Date post: 08-Oct-2014
Upload: david-e-weekly
Description:
A "v0.2" (unfinished rev) of a book about the MP3 audio technology. Please give feedback if you liked it and want me to finish it.

The MP3 Book

David E. Weekly

April 6, 2000

v0.2

Preface

As a senior at Stanford in the fall of 1999, I was asked by a publisher to author an authoritative book on MP3 as both a movement and technology. I had written what was likely the first layman’s description of the MP3 technology and was deeply involved in promulgating it, so it seemed appealing. Unfortunately, I only got a few chapters into it before I realized I was unlikely to actually graduate if I didn’t buckle down and focus on my classes. In early 2012 as I was dusting off the decrepit corners of my old website, it occurred to me to publish what I had written as a “v0.2” to get some feedback to see if I should continue working on it. This is that work - please get in touch with me at [email protected] if you have feedback on this text.

My Story

I’ve been involved with MP3s since early 1997. At the time, I was composing music in the Amiga .MOD format, an early software wavetable synthesis system, and I was talking with a friend about it. He asked me if I had released anything in “MP3” format. I had not; I didn’t know what an MP3 was! So I surfed the Web that night looking for information about this new format. (As it turned out later, MP3s are not new at all, but we’ll get to this.) I found a few scattered websites with sparse information on the topic, making allusions to this way of compressing sound. I managed to find an audio player and a few MP3 files and I clicked “Play.” I was absolutely blown away by what then happened. Music started playing from my speakers that sounded like it came from a CD. I had downloaded a small file from someone else’s server that was now playing absolutely gorgeous music from my computer. I was dumbfounded by the implications: you didn’t need to be able to hold a piece of music to hear it; you could share music with friends without giving it up yourself; record stores were no longer necessary!

I wanted to share this discovery with everyone. I showed it to everyone in my dorm, and people were soon blasting their freshly downloaded music out of their stereos. In late February 1997, I set up a website that explained what the format was in clear and straightforward terms (possibly the first such explanation on the web), had links to all of the latest players (about 15 or so), encoders, and other MP3 sites, and I also put up 120 of my favorite songs from other sites to demonstrate the quality of the format. There were around a dozen other MP3 websites at the time, so I sent off emails to the webmasters to introduce myself. One guy had to take down his website due to the amount of traffic he was getting, so I suggested that he redirect visitors to my site. I watched my site traffic go from 5 hits per hour to over 100 hits per hour the minute he put the code on his site. I was, naturally, quite excited. I now had a popular website! I tweaked the server to focus on web serving and put a realtime graphic on the screen to show how many people were currently on the site. At night, I learned to go to sleep to the throbbing and clicking of the hard drive. I hope my roommate did, too, God bless his soul. This went on for about a week and a half. Then I got two phone calls.

The first call was from Residential Networking. It went something like this: “Hello, David. I’m one of the network administrators at Stanford. I don’t know how familiar you are with how we work here, but we keep a ‘Top 20’ list of sorts that tracks the twenty computers putting the most data out from Stanford to the Internet. Now you should understand that normally dorm computers don’t make it onto this list, just our main servers like www.stanford.edu. Well, David, your computer is on this list. In fact, it’s been on there for a while, in the #1 slot. Your computer is currently responsible for 80% of the outgoing traffic from this campus.” I gulped. “We’re just curious,” they asked, “What are you doing?” So I told them. Amused, they hung up. They had mainly been worried that I was running some kind of commercial service, which I was not.

The second phone call was not as kindly. It was from Network Security and on the same day as the first. Apparently they had gotten a phone call from Geffen Records advising them that a student was distributing copyrighted music on a webpage and that it would be in Stanford’s best interests to shut the site down quickly and quietly. The man on the other end of the line was impervious to my plea as to why this was the ultimate boon for artists. He was not very happy about the idea of Stanford getting sued and demanded the site be down in five minutes. What could I do? I took the music offline.

But I was upset. I had just stumbled across the greatest tool an artist could ever have for distributing their music online and now the industry most responsible for helping artists was shutting it down. That day, Geffen shut down the twenty or so major MP3 websites. It was not a difficult task: we were not trying to hide ourselves! We listed ourselves on search engines and linked to each other. We had been proud to be showing people a new technology. We were not trying to hide, by any means.

I went back to my website to shut it down, and as I did, I went to the chat room that I had set up on the page. Someone was there, so I clicked on him: it was Jim Griffin of Geffen Records. Jim left the chat room before he saw me log in, but I had just enough time to grab his email address. Now I had the email address of my oppressor! I sent a two-page letter explaining why I thought that he had erred in shutting my site down and how much I thought that this medium could benefit artists. I didn’t really expect to hear back.

But sure enough, Jim wrote me an email back, with his phone number. He wanted me to call. I was amazed! I called him.

It turned out that he was not as anti-technology as I had thought he would be. In fact, he agreed with nearly all of the points I had to make about online distribution being the future of the music industry. As it turned out, that week the RIAA (the Recording Industry Association of America, a group representing the largest US record labels) had held an emergency meeting in New York to discuss the suddenly exploding issue of online music piracy. They had, among other things, demonstrated my site and had discussed possible “legal remedies.” Jim just didn’t want me to get sued; he had done me a favor without me even knowing it.

Realizing that there might be a possibility of cooperation between the music and technology worlds, I was encouraged. I contacted the other webmasters whose sites had been shut down and we began to talk about what we could do to promote MP3 as a legal, cool way to share music. None of us really had a patent interest in illegally copying music; we were simply blown away by the “cool factor” of the new medium. We decided to form an official forum for discussion of these issues, called The MP3 Audio Consortium. (It was actually originally called “The MPEG-3 Audio Consortium,” until Tristan Savatier of mpeg.org pointed out to us that MPEG-3 didn’t exist and that MP3 really meant MPEG Audio Layer 3!) One of our members drew up a logo, and I set up a website and a mailing list. Nicknamed M3C, we grew quickly.

Within two weeks of formation, we had over 100 members from the Internet, audio, and technology worlds. List traffic was flowing furiously, full of ideas for how we could make MP3 viable. I contacted ASCAP and BMI about web licenses; first told they didn’t exist, I was then sent ASCAP’s web license soon after it came out. I scanned the documents and put them online. Discovering that an ASCAP license alone did not grant sufficient rights to broadcast music, I began to put up information on the legal aspects of distributing music online. I maintained my list of audio players and included news briefs on what was going on in the MP3 world. Needless to say, it was becoming more difficult to focus on schoolwork.

Through the discussion on the list, I personally had come to the conclusion that the best way to get Internet audio into the ears of the masses would be to start a company and showcase artists on a web site, selling some pieces of music and giving away other parts. Together with fellow Stanford student Steve Oskoui, I set out to transform the music industry, naïve and hopeful. Our first mission was to lock down the technology that we’d need. Knowing that were we to succeed, we’d require ungodly amounts of storage and bandwidth, I began asking around for solutions that could store terabytes (trillions of characters of text) and deliver gigabits (billions of ones and zeroes) per second of bandwidth. After all, a gigabit per second was only about 10,000 people listening to CD-quality music at once.
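
As a rough check on the 10,000-listener figure, here is a back-of-the-envelope sketch in Python. The per-listener bitrates are assumptions on my part: 128 kbps is the rate described later in this book as giving excellent layer 3 stereo sound, and 100 kbps is simply the rate at which the 10,000 figure comes out exactly.

```python
# Back-of-the-envelope: how many simultaneous listeners fit in one gigabit per second?
GIGABIT_PER_SECOND = 1_000_000_000  # bits per second


def listeners_per_gigabit(stream_kbps):
    """Whole number of streams at the given bitrate that fit in 1 Gbps."""
    return GIGABIT_PER_SECOND // (stream_kbps * 1000)


# Assumed per-listener bitrates, in thousands of bits per second.
for kbps in (100, 128):
    print(f"{kbps} kbps per listener -> {listeners_per_gigabit(kbps):,} listeners")
```

Under these assumptions the estimate lands between roughly 7,800 and 10,000 listeners, so "about 10,000" is the right order of magnitude.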

At the same time as investigating the technical backend to our servers, I was in the middle of organizing a conference for mid-August ’97. It was going to be the first conference focused on MP3s. I was going to make it cheap, $50, instead of the lucrative $3000+ charged by the other conferences. I was going to bring in panels of lawyers, artists, techies, and record label execs to duke it out, showcase new technologies, and talk about what direction MP3s should go. It was to be called NetWave. I lined up speakers, companies, and financial backing. I reserved a hotel and conference rooms. But alas, my funding from the RIAA fell through and Liquid Audio took too long to pony up their sponsorship, and the date for the down payment on the hotel came and went. I had to cancel. Conference planning had left me too busy to keep up the M3C site, which was rapidly becoming outdated. Desperately trying to delegate its maintenance to other volunteers, I was unable to sustain the site and was forced to bring it down. I was sad, but at least I could now focus on the company.

Steve and I worked out the pricing and figured we could pay for bandwidth by embedding 5-second audio advertisements in front of the music. If we could get even a penny a song, we’d be able to turn a good profit, due to the massive volumes of people downloading MP3s. On purchased music, we’d offer songs for a dollar and an album for six or seven. It seemed reasonable enough. We dreamed of becoming Silicon Valley superstars. We dubbed the company Universal Digital Media and incorporated ourselves as a Limited Liability Company. All we needed now was pop content and we’d be all set.

I contacted Mr. Lippman, head of Lippman Entertainment, through his son Josh who also went to Stanford. Lippman Entertainment managed big artists, like Guns N’ Roses. They were interested in promoting Matchbox 20’s new 3am single on the web as a bit of a publicity stunt: they wanted to be a “first,” catch a bit of press for it, and hopefully drive up album sales. Matchbox 20’s Push was already playing on the Top 10 lists across America, so I knew I had a hit on my hands. The traffic to our site would be tremendous; other acts would look to follow with us; we’d be a trusted name in the industry. Josh talked it over with the band and they gave it their thumbs up. We went down to LA to meet with Mr. Lippman and demo our solution for him.

It turned out that Mr. Lippman didn’t have an Internet connection, so we had to arrange the demonstration for a painfully slow 28.8k modem on an America Online dialup account. America Online decided to be finicky that day and the dialup was not working. It’s not easy to explain to someone that their connection
is at fault and that yes, this really was something that hundreds of millions of people did easily, trivially, every day.

But we managed to come through okay, and Mr. Lippman saw the potential for distribution. He realized that getting another couple million people to hear his music would mean new sales, and that if his company was viewed as innovative and cutting-edge for adopting new technologies, artists might be more “hip” to sign with him. He agreed and we shook on it. 3am was going to be distributed on the web for ten cents a download. All we had to do was formalize the agreement.

Steve drew up a contract and we mailed it to LA. We were told we’d hear back in two days, at which time we’d launch the site. We waited patiently, and two days turned into three, turned into a week, and then two weeks. We called. Apparently, Atlantic Records was the organization that actually had control over what Matchbox 20 did, and they hadn’t so much as read the contract. Why? Because they weren’t “quite looking at getting into Net audio just yet.”

During this whole time, the press had contacted us and was very curious about what was going on. They were intrigued by the story of a “pirate turning entrepreneur,” and articles flew left and right: USA Today and then Fortune, Forbes, Wired Magazine, Red Herring, and The New York Times. It felt weird: I had almost never been in the press before and now I was fielding phone calls like an operator after an earthquake.

But the deal didn’t go through. Atlantic had turned us down. Then Geffen, after eagerly promising us pop content, also came back empty handed. “Dumb lawyers,” they shrugged, “what can you do?” We didn’t have content.

It was clear that without major label support, the vast majority of our content would have to come from small, unsigned (and unknown) bands. Our company would not, it turned out, be a technology company after all, but just a small label with a very high-tech website. This is not what we had come for. Both of us had interest in and experience with technical work, and neither of us was ready or willing to drop out of school to undertake full-time work in the music industry. At this point, around February of 1998, we decided not to go through with Universal Digital Media.

I suggested to Michael Robertson of MP3.Com that he start a “NetWave” of his own. I laid out my ideas for the conference with him, and in June of 1998 Michael hosted the First Annual MP3 Summit, doing a fantastic job. Michael asked me to write a report afterwards and I did; it was subsequently posted on MP3.Com’s
web site. Apparently, people liked the informal, tongue-in-cheek reporting, and I was asked to write columns for various websites.

I continued to write articles in the space and also started consulting companies as to positive directions for their Internet audio strategies. I took a class and figured out how to write my own audio codecs. Prima Publishing contacted me and here I am, writing my own book! These surely are exciting times when such things can befall a hapless 20-year-old.

The Hype

A History of MP3

What is MP3? Microsoft DOS and Windows computers usually name files with a 3-letter extension. MP3 was chosen as the extension for MPEG Audio Layer 3 files. MPEG, the Moving Picture Experts Group, is an international body formed to create standards for audio and video compression.

MPEG audio layers 1 and 2, which provided only basic audio compression, had been defined in the mid-1980s: the former for digital audio tapes, and the latter for use in laserdiscs. But these algorithms did not compress audio enough.

Around this time in Europe, there came a strong demand for radio stations to be able to broadcast the same show in many places at once. In this way, a popular show in Milan could be broadcast in Paris at the same time and allow for syndication of radio talent. Unfortunately, single-channel ISDN (Integrated Services Digital Network) was the primary way that radio stations had for sending data to each other, and it carried a mere 64 thousand bits per second (64 kbps). A new method for compressing audio would have to be invented.

In 1987, Fraunhofer, a German commercial research institution, began work on a more effective method for compressing audio. Led by Karlheinz Brandenburg, by 1989 Fraunhofer had created a more complex and effective encoder that allowed files to be compressed nearly twice as much as layer 2 files and three times as much as layer 1. Using layer 3 audio, standardized in 1991 through the International Organization for Standardization (ISO), one could achieve reasonable stereo sound at 64kbps and excellent sound at 128kbps, using two channels of an ISDN line at once: one for the right stereo channel, and one for the left. The complex algorithms required special, dedicated hardware to compress and decompress the audio on the fly. Home PCs of the time did not have enough processing power to be able to play back these files, much less to compress them. Audio compression quietly stagnated as the Internet began to explode.
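
To put those bitrates in perspective, the following Python sketch compares them with uncompressed CD audio. The CD parameters (44,100 samples per second, 16 bits per sample, two channels) are standard figures that are not stated above, and the four-minute song length is an arbitrary assumption chosen only for illustration.

```python
# Standard CD audio parameters (an assumption brought in from outside this text).
CD_SAMPLE_RATE = 44_100      # samples per second, per channel
CD_BITS_PER_SAMPLE = 16
CD_CHANNELS = 2
CD_BITS_PER_SECOND = CD_SAMPLE_RATE * CD_BITS_PER_SAMPLE * CD_CHANNELS


def compression_ratio(target_kbps):
    """How many times smaller a stream at target_kbps is than raw CD audio."""
    return CD_BITS_PER_SECOND / (target_kbps * 1000)


def size_megabytes(bits_per_second, seconds):
    """Size in megabytes (millions of bytes) of audio at the given bitrate."""
    return bits_per_second * seconds / 8 / 1_000_000


SONG_SECONDS = 4 * 60  # hypothetical four-minute song

print(f"Raw CD audio: {CD_BITS_PER_SECOND:,} bits per second, "
      f"{size_megabytes(CD_BITS_PER_SECOND, SONG_SECONDS):.1f} MB per song")
for kbps in (64, 128):
    print(f"Layer 3 at {kbps} kbps: about {compression_ratio(kbps):.0f}x smaller, "
          f"{size_megabytes(kbps * 1000, SONG_SECONDS):.2f} MB per song")
```

At 128 kbps the song shrinks from roughly 42 MB to under 4 MB, which is what made downloading music over the connections of the time plausible at all.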

Audio on the Internet was pretty awful around mid-1996. With RealAudio 1.0 playing barely recognizable music, and poorly recorded 8-bit Sun .AU files floating around the Net, it’s no wonder few saw a genuine potential for Internet audio.[1] A bright soul within Fraunhofer realized that modern personal computers, Pentium 100’s at the time, were becoming sufficiently powerful to play back layer 3 files, and released as shareware the now famous ‘L3ENC’ MP3 encoder and their ‘WinPlay3’ player. A few college students began to catch on to the possibilities and started telling their friends about this new format that could let you reproduce CD-quality audio in small, compressed files. Some programmers took a look and figured out how to optimize the code that Fraunhofer had provided for playback, allowing slower, cheaper, computers to also play MP3 files.

[1] I must here credit my father for mentioning to me in 1993 that the person who figured out how to compress music to allow it to download quickly would likely make a good deal of money.

Tomislav Uzelac was one such programmer. At school in Croatia in 1997, Tomislav felt pressed to create an optimized MP3 player so he could listen to MP3 files on some of the slower computers he had access to. After weeks of hard work, he released the AMP MP3 decoding engine. Several players instantly sprang up, using the AMP engine as their core technology, including Nullsoft’s WinAMP, now the most popular MP3 player in the world and since acquired by America Online.

With MP3 players rapidly advancing in stability, usability, customization, and user experience, and music download sites springing up daily, MP3s began to explode in the spring of 1997. Geffen Records shut down my own site, which served to explain MP3 audio technologies, for allowing public access to their copyrighted songs.

When it became clear that “out in the open” distribution would not work, people started sharing music more discreetly, taking MP3s from the Internet to their intranets. Even today, this is how the vast majority of digital music distribution is performed: a study by MusicDish.com on June 21, 1999, revealed that only 5% of their respondents who listened to MP3s obtained them from a public web site. Intranet sharing is simple. All Windows users have to do is right click a folder and select “Sharing” to begin sharing the contents of that folder with all of their friends on the local network. Clearly, even if one could take down all of the major public MP3 sites, this was only a tiny fraction of what was going on.

The record companies to this day hold to the belief that it is possible to “educate consumers,” i.e., make them believe that sharing music is a fundamentally immoral act. “SoundByting” was the code name and web site for the RIAA’s first attempt at this. Needless to say, it only served as a point of discussion for how misguided the RIAA’s attempts to thwart MP3s were.

Microsoft caught on to the fact that MP3s were becoming very popular and moved quickly to enter the space. They quietly added MP3 support (including encoding!) to version 3.0 of their NetShow media playback application in 1997. Windows 98 came bundled with MP3 playback and encoding support, although few were aware of this fact at the time, as Microsoft was not looking to upset the record labels. Microsoft then renamed NetShow “Windows Media 4.0” and came out with a new codec of their own invention, titled “MS Audio 4.0” and claimed to offer twice MP3’s quality. Of course, you could only play back these files on a Windows platform, and you needed to have NT Server to stream them. In reality, MS Audio 4 does not offer the scalability of MP3, as covered later in this book.

Other companies also looked to leverage MP3’s popularity to make a quick buck. Several websites opened up to search for illegal MP3 files on the Internet, thus questionably staying legal themselves while providing access to MP3s that were not likely licensed from their respective copyright holders. Lycos in 1998 licensed such technologies from Fast Search & Transfer of Norway to create an MP3 search engine, and was subsequently sued by the RIAA.

Michael Robertson, who at the time was running a file search engine called “filez,” kept tabs on search terms and noticed that a lot of people were searching for ‘MP3.’ Without knowing what ‘MP3’ was, Michael jumped to register MP3.com in late 1997. After a little more poking about the space, Michael discovered what he had got himself into and started the site, bringing on smalltime artists, writing articles about the space that usually focused on anti-industry rhetoric, and linking to MP3 players and encoders. By virtue of the site’s easy to remember URL and the vast popularity of MP3 files, the site gets its good share of visitors. Michael is also particularly adept at playing to the media, and at one point was getting nearly two articles a week about MP3.com on C|Net’s news.com. Unsurprisingly, he filed to take MP3.com public in mid-1999. Digital audio distribution was maturing and coming into the public limelight.

Current Exposure

Since its inception, the scene has shifted from a software focus to a hardware and content focus. This makes sense: software is easy to rapidly deploy to millions; once the user base and technology infrastructure are there, the long-term hardware and content industries move in. The 1998 MP3 Summit showcased a single, prototype portable player from an obscure Korean manufacturer. The 1999 Summit saw five new portable players about to enter the market, three MP3 set-top box manufacturers, and even a cabling company that focused exclusively on making connectors for portable MP3 players. The scene has also seen a profusion of “online record label” startups, usually seeking to provide non-exclusive online label-like services to artists.

As of mid-1999, it is estimated that there are many tens of millions of people using MP3s to listen to their music. They certainly have taken their place in college culture: an informal study at Stanford revealed that more students listened to MP3s in their dorm room than listened to the radio. MP3 has become
a regular keyword in the press, with even such mainstream publications as TIME, Newsweek, and USA Today reporting regularly on the field.

Artists are beginning to push their labels for digital distribution rights. Alanis Morissette, Tori Amos, and the Beastie Boys all are pursuing MP3 distribution as a promotional channel. Public Enemy, a long-standing and popular rap group, recently broke off from their record label when Def Jam refused to let them post MP3s of tracks from their upcoming album “Bring Tha Noise 2000” to their website. They are now focusing on digital distribution channels.

Why People Use It

People use online digital audio for a number of reasons, but the most prevalent one is usually the large, free archives of high-quality audio that are readily available. Some use this as a tool to discover new bands and end up purchasing vast quantities of music from it: since it costs them nothing to sample bands and genres whose CDs they would not have bought, the consumer feels more free to experiment and try out new artists. Those that she likes, she’ll go out and buy. Another percentage of those using the Internet to listen to digital audio do so as a replacement for CD purchases, decreasing their music spending dramatically. Both groups seek the same goal: to surround themselves with music that they love nearly continuously, to have an “invisible orchestra” capable of playing any tune accompany them.

Fundamentally, music is exciting and universal. Few people are not stirred by some genre of music. Many people, myself included, first used the technology as a novelty. We are not used to being able to send CD quality music to a friend on the other side of the planet! But the meshing of something as core to humanity as music with something as key to our work lives as a computer will help make our lives more humane. My personal goal in the digital music revolution is that people everywhere will get to listen to all kinds of music they really like whenever they want.

One interesting critique that people often have of digital music is that it is intangible. I was once teaching a class of high school students in New England about new media and what the Internet was going to do to music online. When I showed the class the new Diamond Rio player with no moving parts, a girl raised her hand. She explained that she would never use such a device, because she could not touch the music and wanted to be able to hold the CD or tape in her hand. Surprised, I paused. I had not thought about consumer intuition regarding music.

Later, it became clear to me what the girl’s issue was. She had grown up in an era where the vast majority of music she purchased came from tangible media,
assuming that most people only go to live concerts so often. For her, music was intuitively about the packaging.

But the irony was killing me! I realized that the girl had been complaining about how unnatural digital audio was while defending the CD! She did not realize that even the act of encapsulating music, of recording and packaging it, is unnatural. People were shocked by Thomas Edison’s turn of the century “phonograph” that could record speech and song in grooves on a wax cylinder. Surely, this was not the “natural” form for music.

The answer is that intuitions about music and its delivery will have to change slowly as the culture adapts to it. To today’s younger children, music from a device like a Rio that could play all of their favorite music might make far more sense than a clumsy CD or tape player that can only play whichever group of songs they happen to have with them in physical form.

Future Trends

It is my belief that technology is going to make broadcasting on the Internet as easy as pushing a button. Artists will be able to cheaply produce, perform, distribute, and sell their music to a global audience: a folk artist in Oklahoma might develop a strong following in Sri Lanka and fans could virtually attend live concerts on a tour.

With technology as an enabler of publication, distribution, and sales, one major issue remains: marketing. For as of now, this is the biggest ally of the record label. Large record labels know how to get a song played on the radio, put in large racks at Tower Records, and push music out to every corner of the world. Small bands and “indy” labels have found it very difficult to get this kind of coverage, but this is not necessarily because their music is bad. Instead, it is because they do not have enough resources to “carpet-bomb” the planet with their content in the same way the majors can.

Few have realized it, but technology can obsolete this issue, too -- for the real purpose of marketing is to make everybody who would be interested in a given product aware of it and to create an intention to purchase the product. There exists a technology called “collaborative filtering” (implemented in pieces of software called “recommendation engines”) that can make those likely to enjoy a product aware of it. Here is how it works:

As a user goes to a site and listens to some pieces of music the site keeps track of what music she has listened to and compares it to the pieces of music that other users have listened to. As it turns out, people tend to group well: there are quite likely a lot of people out there who have similar tastes in music, however
eccentric or esoteric they may think their tastes. So if I notice a user who hates country music but loves certain alternative bands, I will group them with other users who fit the same profile: if I find that a lot of people in that group are really enjoying this one reggae piece, I may suggest it to the user, even if the user didn’t normally listen to that genre of music.
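
The grouping idea described here can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular recommendation engine works: the listener names and track names are made up, similarity between two listeners is measured by the overlap of the sets of tracks they liked (the Jaccard index, one common choice among many), and the function names `jaccard` and `recommend` are hypothetical.

```python
from collections import Counter


def jaccard(a, b):
    """Overlap between two sets of liked tracks: 0.0 (nothing shared) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, likes, neighbors=2):
    """Suggest tracks liked by the listeners most similar to `target`."""
    mine = likes[target]
    # Rank every other listener by how closely their taste matches the target's.
    ranked = sorted(
        (user for user in likes if user != target),
        key=lambda user: jaccard(mine, likes[user]),
        reverse=True,
    )
    # Let the closest listeners "vote" for tracks the target has not heard yet.
    votes = Counter()
    for user in ranked[:neighbors]:
        votes.update(likes[user] - mine)
    return [track for track, _ in votes.most_common()]


# Made-up listening histories: who liked which tracks.
likes = {
    "ana": {"alt_rock_1", "alt_rock_2", "reggae_1"},
    "ben": {"alt_rock_1", "alt_rock_2", "alt_rock_3", "reggae_1"},
    "cho": {"country_1", "country_2"},
    "dee": {"alt_rock_1", "alt_rock_2"},
}

print(recommend("dee", likes))
```

Here the listener "dee" has only played alternative rock, yet the reggae track comes out as the top suggestion because both of the listeners whose taste most resembles hers enjoyed it. The country fan has nothing in common with her and contributes no suggestions.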

If I were to create a personalized radio station that streamed audio to people and learned what music they did and did not like based on their clicking “next song” or not, I could start to stream to them only the music I would be relatively confident that they liked. This kind of one-to-one marketing could offer relatively unknown bands rapid popularity without spending a dollar on advertising: “automated word-of-mouth” is another way to perceive this technology. New music in this way gets rapidly exposed to those most amenable to it.

So what’s left for a record company? Independent “web labels” are signing non-exclusive contracts with artists left and right, giving free and widespread exposure to the Net audience. Without the need for expensive studio time, and in possession of their own distribution and sales mechanisms, artists will depend on labels much less than they do now.

I find it likely that in the future, there will be a plethora of small record labels, all serving up a vast number of artists, most of which can also be found on other sites. Consumers will go to the site they are most comfortable with and browse music, usually taking a peek to see what their “music advisor” has suggested for the week. People could just leave the “web radio” playing in the background, occasionally clicking to skip a song they don’t like or to buy a song that they did like. (Buying a song here means purchasing the right to play a given song whenever you wish for the rest of your life.)

The large record labels will continue to exist in the future, but they will not thrive unless they wholeheartedly embrace online audio distribution, can provide services truly above and beyond the “web labels,” and can continue to exclusively attract top musical talent to sell to consumers.

The Guts of Music Technology

In this section and the ones following, things are going to get increasingly technical. I’m going to start off pretty simple and slowly ramp up to some considerably involved topics, so please feel free to skip the parts that you already know to get to the juicy stuff. It’s possible that you may find some parts overwhelming. Don’t worry yourself too much about it; just feel free to simply skim. To make this easy for you, I’ve bolded the key definitions throughout the text. And if you get bored? Just go to the next chapter. Nobody’s quizzing you on this!

Digital Audio Basics

Computers work by passing small charges through aluminum trenches etched in silicon and shoving these charges through various gates: IF this charge is here AND that one is too THEN the chip will create a charge in another place. The computer does all of its computations in ones and zeroes. Integers, like -4, 15, 0, or 3, can be represented with combinations of ones and zeroes in an arithmetic system called binary. Humans normally use a “decimal” system with ten symbols per space: we count 1, 2, 3, ..., 8, 9, 10, 11, and so on. In the binary system there are only two symbols per space: one counts 1, 10, 11, 100, 101, 110, 111, 1000, etc.!
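
The binary counting pattern can be reproduced with a few lines of Python. This is an illustrative sketch only: the helper name `to_binary` is made up, and it handles only zero and positive integers (how a computer stores negative numbers such as -4 is a separate topic).

```python
def to_binary(n):
    """Write a non-negative integer using only the symbols 0 and 1."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next digit, right to left
        n //= 2                    # move on to the next "space" to the left
    return "".join(reversed(digits))


# Count from one to eight in decimal and binary side by side.
for n in range(1, 9):
    print(n, "->", to_binary(n))
```

Each time every space holds a 1 (as in 1, 11, or 111), the next number needs one more space, just as 9 rolls over to 10 and 99 rolls over to 100 in decimal.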

If the computer is to understand how to store music, music must be represented as a series of ones and zeroes. How can we do this? Well, one thing to keep in mind throughout all of this discussion is that we’re going to be focusing on making music for humans to hear. While that may sound trite, it will allow us to “cheat” and throw out the parts of the music that people can’t hear: a dog might not be able to appreciate Mozart as much after we’re done with things, but if it sounds just the same to an average Jane, then we’ve accomplished our true mission - to have realistic music come from a computer!

We first need to understand what sound is. When you hear a sound, like a train whistle or your favorite hip-hop artist, your eardrum is getting squished in and out by air. Speakers, whistles, voices, and anything else that makes sound repeatedly squishes air and then doesn’t. When the sound gets to your ear, it pushes your eardrum in and out. If the air gets squished in and out at a constant rate, like 440 times a second, you’ll hear a constant tone, like when someone whistles a single note. The faster the air gets squished in and out, the higher the tone you hear; likewise, the low bass tones of a drum squish the air in and out very slowly, about 50 times a second. Engineers use the measurement Hertz, abbreviated Hz, to mean “number of times per second” and kilohertz, or kHz, to mean “thousands of times per second.” Some people with very good hearing can hear sounds as low as 20Hz and as high as 20kHz. Also, the more violently the air is compressed and decompressed, the louder the signal is.

Now we can understand what a microphone does. A microphone consists of a thin diaphragm that acts a lot like your eardrum: as music is being played, the diaphragm of the microphone gets pushed in and out. The more pushed in the diaphragm is, the more electrical charge the microphone sends back to the device into which you’ve plugged your mic. What if you plug the mic into your computer?

The computer is good at dealing with discrete numbers, also known as digital information, but the amount that the microphone is being compressed is always changing; it is analog information. There is a small piece of hardware in a computer that allows it to record music from a microphone: it is called an Analog to Digital Converter, or ADC for short. It is impossible for us to record a smooth signal as ones and zeroes and reproduce it perfectly on a computer. The ADC does not attempt to perfectly record the signal. Instead, several thousand times a second it takes a peek at how squished in the microphone is. The rate at which it checks on the microphone is called the sampling rate. If the microphone is 100% squished in, we’ll give it the number 64,000. If the microphone is not squished in at all, we’ll give it a 0, and we’ll assign it a number correspondingly for in-between values: halfway squished in would merit a 32,000. We call these values samples.

The Nyquist Theorem says that as long as our sampling rate is at least twice the frequency of the highest tone we want to record, we’ll be able to accurately reproduce the tone. Since humans can’t hear anything higher than 22kHz, if we sample the microphone 44,000 times a second, we’ll be able to reproduce the highest tones that people can hear. In fact, CDs sample at 44.1kHz and, as suggested above, store the amount the microphone was squished as a number between 0 and 65,535, using 16 ones and zeroes, or bits, for every sample. In this way, we’d say that CDs have a sample resolution of 16 bits.
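To make the sampling idea concrete, here is a minimal Python sketch. It assumes the simplified model used in this section, where a completely relaxed microphone maps to 0 and a fully squished one maps to the largest 16-bit number. The function name and the sine wave standing in for a real microphone are my own illustration, not how any particular ADC works.

```python
import math

SAMPLE_RATE = 44_100      # samples per second, as on a CD
MAX_LEVEL = 2**16 - 1     # largest value that fits in 16 bits (65,535)

def sample_tone(freq_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Return 16-bit samples of a pure tone (hypothetical stand-in for a mic)."""
    count = round(duration_s * sample_rate)
    samples = []
    for n in range(count):
        t = n / sample_rate                          # time of this "peek"
        # How squished the diaphragm is, from 0.0 (not at all) to 1.0 (fully)
        squish = 0.5 + 0.5 * math.sin(2 * math.pi * freq_hz * t)
        samples.append(round(squish * MAX_LEVEL))    # nearest whole number
    return samples

samples = sample_tone(440, 0.01)   # one hundredth of a second of an A
print(len(samples), min(samples), max(samples))
```

Even a hundredth of a second of sound turns into several hundred numbers, which hints at how quickly the data piles up.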

All of this data ends up taking a great deal of space: if we sample a left and a right channel for stereo sound at 44.1kHz, using 16 bits for every sample, that’s about 1.4 million bits for every second of music! On a 28.8 modem, it would take you nearly 50 seconds to transmit a single second of uncompressed music to a friend! We clearly need a way to use fewer bits to transmit the music.
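The arithmetic behind those figures is easy to check. This short Python sketch assumes an idealized modem that delivers exactly 28,800 bits per second with no overhead; the constant and function names are mine.

```python
CHANNELS = 2              # left and right
SAMPLE_RATE = 44_100      # samples per second, per channel
BITS_PER_SAMPLE = 16
MODEM_BPS = 28_800        # an idealized "28.8" modem, ignoring protocol overhead

def cd_bits_per_second(channels=CHANNELS, rate=SAMPLE_RATE, bits=BITS_PER_SAMPLE):
    """Bits needed for one second of uncompressed audio."""
    return channels * rate * bits

bps = cd_bits_per_second()
print(bps)                # 1411200 bits for each second of music
print(bps / MODEM_BPS)    # 49.0 seconds of transfer per second of music
```

A real modem spends some of its capacity on error correction and protocol overhead, so the actual wait would be somewhat longer than this best case.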

Those of you comfortable with computers may suggest we use a compression program like WinZip or StuffIt Deluxe to reduce the size of these music files. Unfortunately, this does not work very well. These compression programs were designed largely with text in mind. These programs were also designed to perfectly reproduce every bit: if you compress a document to put it on a floppy, it had better not be missing anything when you decompress it on a friend’s machine! Compression algorithms work best when they know what they are compressing. Specialized algorithms can squish down video to a hundredth of its original size, and people routinely use the GIF and JPEG formats to reduce the size of pictures on the web. These formats can be lossy; that is to say, they destroy some data (JPEG discards fine detail, and GIF limits a picture to 256 colors). If you scan in a beautifully detailed picture and squish it down to a small JPEG file, you will see that there are noticeable differences between the original and the compressed versions, but in general the compression throws away the information that is less important for your eye to see in order to understand what the picture is about.

In the same way, we will get much better compression of sound if we use an algorithm that understands the way that people hear and destroys the parts of the sound that we cannot perceive. Already, we have done this in a small way by ignoring any sounds above 22kHz. We might have done things differently if we were making an audio system for a dog or a whale. We have already exploited some knowledge of the human ear to our advantage; now it is time for us to use this knowledge further to compress the sound.

Understanding Fourier

In order to compress the sound, we need to understand what parts are okay to throw away; that is to say, what the least important parts of the sound are. That way, we can keep the most important parts of the sound so we can stream them live through, say, a 28.8k modem.

Now as it turns out, sound is very tonal. This means that sounds tend to maintain their pitch for periods of time: a trumpet will play a note for half a second; a piano will sound a chord, etc. If I were to whistle an ‘A’ for a second, your eardrum may be wiggling in and out very quickly, but the tone stays constant. While recording the “wiggling” of the signal going in and out would take a great many numbers to describe, in this case it would be much simpler to record the tone and how long it went for, i.e., “440Hz (that’s A!) for 1.0 seconds.” In this way, I’ve replaced tens of thousands of numbers with two numbers.

While clearly most signals are not so compressible, the concept applies: sound pressure, or the amount that your eardrum is compressed, changes very rapidly (tens of thousands of times a second), while frequency information, or the tones that are present in a piece of music, tend not to change very frequently (32 notes per second is pretty fast for a pianist!). If we only had a way to look at sound in the frequency domain, we could probably get excellent compression.

Luckily for us, Jean-Baptiste Joseph Fourier, a 19th-century mathematician, came up with a nifty way of transforming a chunk of samples into their respective frequencies. While describing the method in detail has occupied many graduate-level electrical engineering books, the concept is straightforward: if I take a small chunk of audio samples from the microphone as you are whistling, I take the discrete numbers that describe the microphone’s state and run them through a Discrete Fourier Transform, also known as a DFT. What I get out is a set of numbers that describe what frequencies are present in the signal and how strong they are, i.e., “There is a very loud tone playing an A# and there is a quiet G flat, too.” I call the chunk of samples that I feed the DFT my input window.
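For the curious, here is a minimal sketch of a DFT in Python, written the slow, direct way straight from the textbook definition. The 8kHz sampling rate, the 800-sample window, and the two test tones are assumptions I picked so that the example runs quickly and both tones land exactly on a frequency bin; real encoders use much faster methods.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Naive Discrete Fourier Transform; returns the strength of each frequency bin."""
    n = len(samples)
    mags = []
    for k in range(n // 2):          # bins above n/2 mirror the ones below
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        mags.append(abs(total) / n)
    return mags

SAMPLE_RATE = 8000    # assumed rate, kept low so the slow loop finishes quickly
WINDOW = 800          # input window: a tenth of a second of samples

# A loud A (440Hz) plus a much quieter tone at 1000Hz
signal = [1.0 * math.sin(2 * math.pi * 440 * t / SAMPLE_RATE)
          + 0.2 * math.sin(2 * math.pi * 1000 * t / SAMPLE_RATE)
          for t in range(WINDOW)]

mags = dft_magnitudes(signal)
bin_width = SAMPLE_RATE / WINDOW     # each bin covers 10Hz here
loudest = max(range(len(mags)), key=lambda k: mags[k])
print(loudest * bin_width)           # 440.0
```

The 800 samples that went in come back as a short list of strengths in which only two entries are noticeably above zero: one for each tone.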

There is an interesting tradeoff here: if I take a long input window, meaning I record a long chunk of audio from the microphone and run it all through the DFT at once, I’ll be able to pick out what tone a user was whistling with great precision. But, just like with people, if I only let the computer hear a sound for a short moment, it will have poor frequency resolution, i.e., it will be difficult for it to tell what tone was whistled. The reverse holds for timing. If I’m trying to nail down exactly when a user begins to whistle into a microphone and I take short windows, I’ll be able to pick out close to the exact time when they started to whistle; but if I take very long windows, the Fourier transform won’t tell me when a tone began, only how loud it is. I’d have trouble nailing down when it began and could be said to have poor time resolution. Frequency resolution and time resolution work against each other: the more precisely you need to know when a sound happened, the less you know what tone it is; the more precisely you need to know what frequencies are present in a signal, the less precisely you know the time at which those frequencies started or stopped.
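The tradeoff can be put into numbers. In this small Python sketch (the function name is mine), the time resolution is taken to be the length of the window and the frequency resolution the spacing between neighboring DFT bins, which is a common rule of thumb rather than the whole story.

```python
SAMPLE_RATE = 44_100

def resolutions(window_samples, sample_rate=SAMPLE_RATE):
    """Return (time resolution in seconds, frequency resolution in Hz)."""
    time_res = window_samples / sample_rate    # the window spans this much time
    freq_res = sample_rate / window_samples    # spacing between DFT bins
    return time_res, freq_res

for n in (64, 512, 4096):
    t, f = resolutions(n)
    print(f"{n:5d} samples: {t * 1000:6.2f} ms window, bins {f:7.2f} Hz apart")
```

Whatever window length you pick, the two numbers multiply out to one: halving the spacing between bins means doubling the stretch of time that each window smears together.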

As a real-world example of where this is applicable, Microsoft’s MS Audio 4 codec uses very long windows. As a result, music encoded in that format is bright and properly captures the tone of music, but quick, sharp sounds like hand claps, hi-hats, or cymbals sound mushy and drawn out. These kinds of quick bursts of sound are called transients in the audio compression world. Later on, we’ll learn how MP3 deals with this. (AAC and AC-3 use similar techniques to MP3.)

In 1965, two programmers, J. Tukey and J. Cooley, invented a way to perform Fourier transforms a lot faster than had been done before. They decided to call this algorithm the Fast Fourier Transform, or FFT. You will likely hear this term used quite a bit in compression literature to refer to the Fourier transform (the process of looking at what tones are present in a sound).

The Biology of Hearing

Now that we understand how computers listen to sounds and how frequencies work, we can begin to understand how the human ear actually hears sound. So I’m going to take a bit of a “time out” from all of this talk about computer technology to explain some of the basics of ear biology.

As I mentioned before, when sound waves travel through the air, they cause the eardrum to vibrate, pushing in and out of the ear canal. The back of the eardrum is attached to an assembly of the three smallest bones in your body, known as the hammer, anvil, and stirrup. These three bones are pressed up against an oval section of a spiral fluid cavity in your inner ear shaped like a snail shell, known as the cochlea. (Cochlea is actually Latin for “snail shell!”) The vibrations from the bones pushing against the oval window of the cochlea cause hairs within the cochlea to vibrate. Depending on the frequency of the vibrations, different sets of hairs in the cochlea vibrate: high tones excite the hairs near the base of the cochlea, while low tones excite the hairs at the center of the cochlea. When the hairs vibrate, they send electrical signals to the brain; the brain then perceives these signals as sound.

The astute reader may notice that this means that the ear is itself performing a Fourier transform of sorts! The incoming signal (the vibrations of the air waves) is broken up into frequency components and transmitted to the brain. This means that thinking about sound in terms of frequency is not only useful because of the tonality of music, but also because it corresponds to how we actually perceive sound!

The sensitivity of the cochlear hairs is mind-boggling. The human ear can sense as little as a picowatt of sound power per square meter, but can take up to a full watt before starting to feel pain. Visualize dropping a grain of sand on a huge sheet and being able to sense it. Now visualize dropping an entire beachful of sand onto the same sheet, without the sheet tearing and also being able to sense that. This absurdly large range of scales necessitated the creation of a new system of acoustic measurement, called the bel, named after the inventor of the telephone, Alexander Graham Bell. If one sound is a bel louder than another, it is ten times louder. If a sound is two bels louder than another, it is a hundred times louder than the first. If a sound is three bels louder than another, it is a thousand times louder. Get it? A bel corresponds roughly to however many digits there are after the first digit. A sound 100,000 times louder than another would mean there were 5 bels of difference. This system lets us deal with manageably small numbers that can represent very large numbers. Mathematicians call these logarithmic numbering systems.

People traditionally have used “tenths of bels,” or decibels (dB) to describe relative sound strengths. In this system, one sound that was 20dB louder than another would be 2 bels louder, which means it is actually 100 times louder than the other. People are comfortable with sounds that are a trillion times louder than the quietest sounds they can hear! This corresponds to 12 bels, or 120dB of difference. [INSERT CHART OF COMMON DECIBEL LEVELS HERE]
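If you would like to convert between “times louder” and decibels yourself, the relationship is a base-ten logarithm. This Python sketch uses function names of my own choosing and treats “times louder” as a ratio of sound power, as the text does.

```python
import math

def ratio_to_db(power_ratio):
    """How many decibels correspond to one sound being power_ratio times stronger."""
    return 10 * math.log10(power_ratio)

def db_to_ratio(db):
    """How many times stronger a sound is, given a difference in decibels."""
    return 10 ** (db / 10)

print(round(ratio_to_db(100), 6))      # 20.0  (2 bels)
print(round(ratio_to_db(1e12), 6))     # 120.0 (12 bels, a trillion times)
print(round(db_to_ratio(30), 6))       # 1000.0
```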

If a set of hairs is excited, it impairs the ability of nearby hairs to pick up detailed signals; we’ll cover this in the next section. It’s also worth noting that our brain groups these hairs into 25 frequency bands, called critical bands: this was discovered by (?Austrian?) acoustic researchers Zwicker, Flottorp, and Stevens in 1957. We’ll review critical bands a bit later on. Now, equipped with a basic knowledge of the functioning of the ear, we can tackle understanding the parts of a sound less important to the ear.

Psychoacoustic Masking

Your ear adapts to the sounds in the environment around you. If all is still and quiet, you can hear a twig snap hundreds of feet away. But when you’re at a concert with rock music blaring, it can be difficult to hear your friend, who is shouting right into your ear. This is called masking, because the louder sounds mask the quieter sounds. There are several different kinds of masking that occur in the human ear.

Normal Masking

Your ear obviously has certain inherent thresholds: you can’t hear a mosquito buzzing 5 miles away even in complete silence, even though, theoretically, it might be possible to detect it with sufficiently sensitive instrumentation. The human ear is also more sensitive to some frequencies than to others: our best hearing is around 4000Hz, unsurprisingly not too far from the frequency range of most speech.

If you were to plot a curve graphing the quietest tone a person can hear versus frequency, as is done to the right, it would look like a “U,” with a little downwards notch around 4000Hz. Interestingly enough, people who have listened to too much loud music have a lump in this curve at 4000Hz, where they should have a notch. This is why it’s hard to hear people talk right after a loud concert. Continued exposure to loud music will actually permanently damage your cochlear hair cells, and unlike the hair on your head, cochlear hairs never grow back.

This curve, naturally, varies from person to person, and the range of sounds a person can hear gets smaller the older the subject is, especially in the higher frequencies. Translation: old people usually have trouble hearing. Theoretically, this variance could be used to create custom compression for a given person’s hearing capability, but this would require a great deal of CPU horsepower for a server delivering 250 custom streams at once!

Tone Masking

Pure tones, like a steady whistle, mask out nearby tones: if I were to whistle a C very loudly and you were to whistle a C# very softly, an onlooker (or “on-listener,” really) would not be able to hear the C#. If, however, you were to whistle an octave or two above me, I might have a better chance of noticing it. The farther apart the two tones are, the less they mask each other. The louder a tone is, the more surrounding frequencies it masks out.

Noise Masking

Noise often encompasses a large number of frequencies. When you hear static on the radio, you’re hearing a whole slew of frequencies at once. Noise actually masks out sounds better than tones: it’s easier to whisper to someone at even a loud classical music concert than it is under a waterfall.

Critical Bands, Prioritization, and Quantization

As mentioned in our brief review of the biology of hearing, frequencies fall into one of 25 human psychoacoustic “critical bands.” This means that we can treat frequencies within a given band in a similar manner, allowing us to have a simpler mechanism for computing what parts of a sound are masked out.

So how do we use all of our newly acquired knowledge about masking to compress data? Well, we first grab a window of sound, usually about 1/100th of a second-worth, and we take a look at the frequencies present. Based on how strong the frequency components are, we compute what frequencies will mask out what other frequencies. We then assign a priority based on how much a given frequency pokes up above the masking threshold: a pure sine wave in quiet would receive nearly all of our attention, whereas with noise all of our attention would be spread around the entire signal. Giving more “attention” to a given frequency means allocating more bits to that frequency than others. In this way, I describe exactly how much energy is at that frequency with greater precision than for other frequencies.
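Here is one way the “attention” idea could be sketched in Python. This is a simplified greedy scheme of my own, not the procedure from any MPEG standard: it assumes we already know how far each band pokes above its masking threshold (in dB, made-up numbers below) and relies on the rule of thumb that each extra bit of precision lowers the quantization noise by about 6dB.

```python
DB_PER_BIT = 6.02   # each extra bit lowers quantization noise by roughly 6dB

def allocate_bits(smr_db, total_bits):
    """Greedily hand out bits to the bands whose noise would be most audible.

    smr_db: how far each band pokes above its masking threshold, in dB
            (hypothetical inputs; a real encoder derives these from its
            masking model).
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Audible noise left in each band after the bits granted so far
        remaining = [s - DB_PER_BIT * b for s, b in zip(smr_db, bits)]
        worst = max(range(len(remaining)), key=lambda i: remaining[i])
        if remaining[worst] <= 0:
            break               # everything left is already masked
        bits[worst] += 1
    return bits

print(allocate_bits([30.0, 12.0, -5.0, 3.0], total_bits=8))   # [5, 2, 0, 1]
```

The band that sits 5dB below the masking threshold gets no bits at all, since nobody would hear it anyway, while the band that stands out the most soaks up most of the budget.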

How are the numbers encoded with different resolutions? That is to say, how can I use more bits to describe one number than another? The answer involves a touch of straightforward math. Do you remember scientific notation? Like 4.02 x 10^32? In this case, 4.02 is the mantissa and 32 is the scale factor. Since frequencies in the same critical band are treated similarly by our ear, we give them all the same scale factor and allocate a certain (fixed) number of bits to the mantissa of each. For example, let’s say I had the numbers 149.32, -13.29, and 0.12: I’d set a scale factor of 3, since 10^3 = 1000 and our largest number is 0.14932 x 10^3. In this way, I’m guaranteed that all of my mantissas will be between -1 and 1. I would encode the numbers above as 0.14932, -0.01329, and 0.00012 using a special algorithm known as fixed-point quantization.
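The shared scale factor can be sketched in a few lines of Python. To match the example in the text, this sketch assumes powers of ten; the function name is mine, and real codecs generally work with powers of two instead.

```python
import math

def scale_block(values):
    """Pick one power-of-ten scale factor for a block; return (scale, mantissas)."""
    largest = max(abs(v) for v in values)
    # Smallest exponent such that largest / 10**scale ends up below 1
    scale = math.floor(math.log10(largest)) + 1 if largest > 0 else 0
    mantissas = [v / 10**scale for v in values]
    return scale, mantissas

scale, mantissas = scale_block([149.32, -13.29, 0.12])
print(scale)
print([round(m, 5) for m in mantissas])   # [0.14932, -0.01329, 0.00012]
```

Notice that the smallest number in the block ends up with several leading zeroes: sharing one scale factor is cheap, but it costs precision on the quiet values that sit next to loud ones.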

Have you ever played the game where someone picks a number between 1 and 100 and you have to guess what it is, but are told if your guess is high or low? Everybody knows that the best way to play this game is to first guess 50, then 25 or 75 depending, etc., each time halving the possible numbers left. Fixed-point quantization works in a very similar fashion. The best way to describe it is to walk through the quantization of a number, like 0.65. Since we start off knowing the number is between -1 and 1, we should record a 0 if the number is greater than or equal to 0, and a 1 if it is less than 0. Our number is greater than zero, so we record 0. Now we know the number is between 0 and 1, so we record a 0 if the number is greater than or equal to 0.5, and a 1 otherwise. Being greater, we record 0 again, narrowing the range to between 0.5 and 1. On the next step, we note that our number (0.65) is less than 0.75 and record a 1, bringing our total number to 001. You can see here how with each successive “less-than, greater-than” decision we record a one or a zero and come twice as close to the answer. The more decisions I am allowed, the more precisely I may know a number. We can use a lot of fixed-point quantization decisions on the frequencies that are most important to our ears and only a few on those that are less important. In this way, we “spend” our bits wisely.

We can reconstruct a number by reversing the process: with 001, we first see that the number is between 0 and 1, then that it is between 0.5 and 1, and finally that it is between 0.5 and 0.75. Once we’re at the end, we’ll guess the number to be in the middle of the range of numbers we have left: 0.625 in this case. While we didn’t get it exactly right, our quantization error is only 0.025 – not bad for three ones and zeroes to match a number so closely! Naturally, the more ones and zeroes that are given, the smaller the quantization error.
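The guessing game translates almost word for word into code. This Python sketch follows the convention of the walkthrough (a 0 for the upper half of the remaining range, a 1 for the lower half); the function names are mine, and a production encoder would pack real bits rather than building a string.

```python
def quantize(value, bits):
    """Encode a value between -1 and 1 as a string of yes/no decisions.

    The first bit is the sign (0 for non-negative). Each later bit is 0
    when the value lies in the upper half of the remaining range and 1
    when it lies in the lower half.
    """
    low, high = (0.0, 1.0) if value >= 0 else (-1.0, 0.0)
    code = "0" if value >= 0 else "1"
    for _ in range(bits - 1):
        mid = (low + high) / 2
        if value >= mid:
            code += "0"
            low = mid
        else:
            code += "1"
            high = mid
    return code

def dequantize(code):
    """Replay the decisions and guess the middle of the range that is left."""
    low, high = (0.0, 1.0) if code[0] == "0" else (-1.0, 0.0)
    for bit in code[1:]:
        mid = (low + high) / 2
        if bit == "0":
            low = mid
        else:
            high = mid
    return (low + high) / 2

code = quantize(0.65, 3)
print(code)                # 001
print(dequantize(code))    # 0.625
```

Calling quantize(0.65, 8) instead spends five more decisions on the same number and shrinks the error accordingly.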

The above technique roughly describes the MPEG Layer 2 codec (techie jargon for compression / decompression algorithm) and is the basis for more advanced codecs, like Layer 3, AAC, and AC-3, all of which incorporate their own extra tricks, like predicting what the audio is going to do in the next second based on the past second. At this point you understand the basic foundations of modern audio compression and are getting comfortable with the language used; it is time to move to a comprehensive review of modern audio codecs.

Modern Audio Codecs

In this section, we’ll take a close look at modern audio codecs, including their history, how they work, how well they perform in certain conditions, and their best areas of application.

MPEG-1 Audio Layer 2 (MP2)

This format, like MP3, is known best by its file extension, MP2. MP2 is the audio format used on Video CDs and in digital radio and television broadcasting. MP2 was designed for high-bitrate, high-quality applications. Despite its age (it was developed in the mid-80’s), many audiophiles consider MP2 the cleanest, highest-quality codec. At about 320kbps, MP2 is considered transparent. Source code for the algorithm is widely available and there exist many free or low-cost, high-quality MP2 encoders, players, and hardware implementations. To the best of my knowledge, no patents are enforced on the MP2 encoding or decoding algorithms and any company or organization is free to use it. MP2 does not perform well at lower bitrates, becoming very difficult to listen to at anything beneath 128kbps; as such, it is not recommended for streaming over telephone or ISDN lines. MP2 is a relatively straightforward algorithm and runs quickly even on very old computers (e.g., 386-486s).

MPEG-1, 2, and 2.5 Audio Layer 3 (MP3)

MP3 is perhaps the best known of all these formats. As mentioned in the introduction, MP3 was invented in 1989 at Fraunhofer IIS in Erlangen, Germany. MP3 was designed for streaming applications at lower bitrates than MP2, and thus was optimized for performance at 128kbps. At 160-192kbps, MP3 audio becomes transparent, but many people are comfortable with near-CD quality 128kbps files. Fraunhofer IIS and Thomson, a French consumer electronics company, share the patents on both MP3 encoding and decoding. Thomson generally handles all MP3-related hardware licensing and Fraunhofer covers MP3 software encoder licensing. Even if you were to write your own MP3 encoder from scratch, you’d still have to get a license from Fraunhofer to sell it: licenses are reasonably priced, as evidenced by a number of low-cost MP3 encoding programs on the market. Fraunhofer decided not to enforce its software decoding patent, allowing companies like Nullsoft (acquired by America Online (AOL) in early 1999) to produce free MP3 players.

Xing Technology, acquired by RealNetworks in mid-1999, implemented a special MP3 compression algorithm that could guess how complex a piece of music was and change the encoding rate accordingly. Just as we learned in the last chapter how these algorithms can give more bits to the more important frequencies for a given chunk of time, this algorithm gives more bits to chunks of time that are more important. If a song was very quiet for a second, the encoder would only use a few bits; if the song became loud and intricate, the encoder would use more bits for each chunk of music. This technology is called Variable Bitrate Encoding (VBR). While this makes streaming difficult, since the server would be continuously changing the number of bits it was sending, it makes relatively small, high-quality files for storage on your hard drive. Studies have yet to show exactly how much space VBR saves over constant-rate encoding techniques.

MP3 audio quality starts to degrade below 56kbps, but because of the format’s popularity, a number of software packages have been developed to stream MP3 files to modem users at 32, 24, and 16kbps. Of particular note is the Shoutcast streaming architecture, developed by Nullsoft, that allows a user to stream their MP3 collection song-by-song to a service that broadcasts the stream to hundreds of listeners. Jack Moffitt, then attending [???] University, was peeved that a Linux version was not offered at the get-go and set out to write his own version, to which he would release the source code. Dubbed “icecast,” it has since become extremely popular in the broadcasting community as developers have added new features and made the software robust. Jack’s bio and interview can be found in Chapter 9.

MP3 is considerably more complex than MP2. As such, a Pentium 90 or better is required for most MP3 players, although there are reports of extremely optimized MP3 players running on very fast 486s. Several custom Digital Signal Processor (DSP) chips exist for MP3 playback, the most notable of which is Micronas Intermetall’s MAS3501C [double check chip name].

Dolby Audio Coder 3 (AC-3)

Dolby Laboratories has been doing sound quality research for longer than almost anyone. Their noise-reduction and surround-sound circuitry can be found in nearly every stereo system in the world. Part of their research has targeted new forms of audio compression, specifically so that extremely high-quality audio could be encoded on the edge of a film reel to allow for rich sound in movies.

The third version of their compression algorithm, known as Audio Coder 3, or AC-3, allows for multichannel encoding. Most audio is either monaural (one sound signal) or stereo (left & right signals), but by using more speakers, it’s possible to surround yourself with sound. Movie theaters do this today to make you feel like monsters are sneaking up behind you. It’s also useful to have a separate audio channel just for a subwoofer to carry chest-thumping beats and explosions: since a subwoofer is only playing back the bass parts, it only needs a tenth of the information that a regular speaker does. AC-3 allocates a front channel for dialogue, left and right channels, left and right “surround” speakers, and a subwoofer channel, making it a 5.1 channel encoding scheme. Other schemes exist with 7.2, 10.2, and 12.2 channels, but these are exclusively for high-end digital movie theaters. The vast majority of North American DVDs on the market today use 5.1 channel AC-3 to encode their soundtracks.

In the mid-90’s, RealNetworks, then Progressive Networks, was unsatisfied with the performance of the first two versions of RealAudio and came to Dolby for help. Dolby repackaged AC-3 for Internet streaming applications and dubbed it Dolbynet. RealAudio versions 3, 4, and 5 were subsequently centered on the Dolbynet codec and Real continued to dominate the audio streaming space. Not long after Real licensed Dolbynet for streaming, upstart Liquid Audio inked a deal with Dolby to give them exclusive rights over Internet use of high-bitrate Dolbynet, barring Real from entering into the hifi realm of digital downloading.

Advanced Audio Coding (AAC) / MPEG-4

After MPEG Layer 3 was developed, Fraunhofer continued to advance their compression research. As the complexity of the new techniques increased, it became clear that it would be nigh impossible to make the next-generation audio coder backwards compatible with the previous sound layers. As such, the prototype coder was codenamed NBC for “Not Backwards Compatible.” (Engineers are not, perhaps, hired for their creative naming talents.) NBC was perfected over several years and eventually finalized as Advanced Audio Coding, or AAC for short. Fraunhofer certainly did not work alone on NBC & AAC: notable contributions to the coder were made by nearly every active party in digital acoustics! AT&T, Thomson, Lucent (separately from AT&T), Dolby, and academic researchers all pooled their finest work. As a result, AAC developed into the most powerful codec currently in existence. Fraunhofer and AT&T continue to refine the psychoacoustic models and improve upon its quality.

Unfortunately, AAC is not widely available. This is largely because the record companies were not pleased at how easy Fraunhofer’s MP3 technology had made it to copy their music and had subsequently made several less-than-kindly visits to Germany along these lines. Few technology companies wished to place themselves in this sort of negative light and consequently the world’s best compression has stayed far, far away from the hands of consumers. AAC is used in AT&T’s a2bmusic site that serves to demonstrate AT&T’s proprietary music protection system that, it would seem, nobody else has adopted. Unsurprisingly, half of their key staff jumped ship mid-1999 to go join various companies that were actually interested in making a product that people would use.

When designing MPEG-4, the MPEG committee adopted AAC for high-bandwidth transmissions and VQF (covered below) for low-bandwidth transmissions. [more info on mpeg4 from MIT sources later]

RealAudio G2

RealNetworks was not very pleased about having been locked out of using Dolby’s higher bitrates; additionally, they desired better compression and adaptive streaming (where a transmission can change bitrates on-the-fly to adjust to dynamic network conditions), features that Dolbynet was unable to provide. Consequently, Real spent a number of years developing their own robust streaming codec, known as G2. As of late 1999, Real has no plans to license this technology. Despite high per-stream server costs and a proprietary encoder/server/client trio, Real has maintained its market leadership in the streaming space and a very large percentage of computers on the Internet have RealAudio installed. RealAudio G2 is of roughly the same quality as MP3.

TwinVQ (VQF)

Like AT&T, Japan’s Nippon Telegraph and Telephone (NTT) has had a vested interest in finding new and effective ways to compress audio. If NTT, or AT&T for that matter, could squeeze a phone conversation down to take half as much information as it currently does, they could put twice as many phone calls over the same wires that they already have, saving the cost of laying new cables. NTT has thus been vigorously pursuing research into new compression techniques.

NTT invented Transform-domain Weighted Interleave Vector Quantization and dubbed it TwinVQ for short; it is also known as VQF, after its file extension. Vector quantization works differently from the “scale-factor” technology described in Chapter 2. [continue description of vector quantization here]

NTT has been relatively restrictive in licensing TwinVQ: only Kobe Steel and Yamaha were licensed to use it as of late 1999. Both Yamaha and NTT have players and encoders for TwinVQ, and input plug-ins have been created for Winamp and a few other media players. Despite this, TwinVQ has not seen extensive acceptance in the end-user community, possibly due to a lack of media exposure, restrictive licensing, and a lack of compatible software and hardware. TwinVQ is also considerably more CPU-intensive than MP3, but less so than QDesign’s codec (see below) or AAC.

QDesign Music Codec, version 2

QDesign’s original codec was used in Apple’s third version of QuickTime; their new v2 codec (QDMC2) is included with QuickTime 4. Unfortunately, due to Apple’s restrictive licensing, no consumer playback applications exist for QDMC2 other than QuickTime 4. QDMC2 is extremely time-consuming to encode: on my Pentium 166, encoding a 30 second audio clip took over 8 minutes! QuickTime is also relatively slow-loading and a bit of a memory hog itself, causing the QDMC2 music experience to be less than optimal. Neither QuickTime 4 nor QDMC2 has seen extensive use in the audio-only arena, although QuickTime 4 video has become relatively popular.


The New Pipeline

Digital production and Internet distribution have enabled a new pipeline for the music experience: an idea can quickly become a tune that is recorded and then shared or sold. Artists can get a much bigger piece of the pie than was ever possible with traditional distribution; consumers can get cheap access to a nearly infinite collection of music and can own music “permanently,” set up their own Internet radio stations, remix their favorite hits, and easily share music with their friends. The new pipeline shatters barriers and lets everyone participate in the music revolution: no longer are a handful of executives responsible for the listening habits of a nation. The age of broadcast is dead.

“Wait just a second!” you may exclaim to yourself after that last sentence. Why is broadcast dead? The answer: because we are moving back to narrowcast. Narrowcast is defined as “narrowly targeted broadcast.” A conversation with a person is narrowcasting, because you are delivering a message to a very small audience (of one!). Newspapers, television, radio, and most static websites (e.g., cnn.com or news.com) transmit a single message to many people and thus are broadcasting. But how often do you want to be reading the exact same information as the next guy? With obvious exceptions (national disasters, war, international events), most people just want to see the news that pertains to them. More relevant to the discussion at hand, most people want to listen to the music that they like, not some generic preselected junk.

People buy “pop” music because they know what they are getting. When a person walks into a record store, only the bands that are played incessantly on the radio are known entities: everything else could be weird, bad music, and maybe some decent percentage of it is. As such, a good percentage of the population would rather go ahead and buy something that they know about and like 80% than spend a great deal of time, money, and effort buying bad CDs to try to find something that they like 100%.


DEW: Feb 8, 2012

The book ends here as it was never finished. If you’d like to see me finish the book or just want to say thank you, please email or PayPal me at [email protected].

