An Introduction to the “Thor-like” Power of Ogg Vorbis! Robert W. Ferguson III January 30, 2003.

Xiphophorus

Xiphophorus is a freshwater fish genus comprising 23 species.

Since the 1920s it has been known that hybrids between the different species are easy to produce. In some cases, one simply had to place one Xiphophorus species next to another in an aquarium, and they would reproduce.

XIPH.COM

Xiphophorus is a non-profit organization responsible for the Ogg project.

Xiphophorus is GPL.

All cool companies have an X to start their name.

What Is Ogg Vorbis

The Ogg project is an open-source alternative to proprietary and patented codecs for digital media (for both audio and video).

The Vorbis project is responsible for the creation of a perceptual audio encoder similar to famous, inherently evil, proprietary codecs popularized by global, illegal file sharing.

It Is Not MP3

Vorbis is in the same category as MPEG-4 (AAC)

And similar to, but higher performance than:

MPEG-1/2 audio layer 3 (MP3)

MPEG-4 audio (TwinVQ)

WMA (Windows Media Audio)

PAC

Classification

Vorbis I is a forward-adaptive monolithic transform CODEC based on the Modified Discrete Cosine Transform.

The codec is structured to allow addition of a hybrid wavelet filter bank in Vorbis II to offer better transient response and reproduction using a transform better suited to localized time events.

Packets

Vorbis uses free-form packets that have no minimum size, maximum size, or fixed/expected size. Packets are designed so that they may be truncated (or padded) and remain decodable.

Error Detection

Vorbis provides none of its own protection against errors.

It is solely a method of accepting input audio, dividing it into individual frames and compressing these frames into raw, unformatted 'packets'.

ATH – Absolute Threshold of Hearing

Most codecs assume volume is fixed during playback. Vorbis assumes that volume can be adjusted.

Tone Masking

Tone masking is when louder frequencies mask out adjacent quieter ones.

Most codecs use a psychoacoustic model to calculate what’s left as best as possible within given bit-rate limits.

Vorbis approximates the same thing using as many bits as it takes.

Coupling

Most sounds consist of many channels and have redundancy between these channels. This is exploited to lower the bit-rate if the channels are encoded in some joint representation.

The simplest example is to encode the average and the difference between channels (for a stereo sound) – this is called mid/side representation and it requires fewer bits for sections that are close to mono.
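The mid/side idea can be sketched in a few lines of Python. This is a minimal illustration, assuming two equal-length lists of samples; the function names are made up for the example, and it shows the generic average/difference transform described above, not necessarily the coupling scheme Vorbis itself uses.

```python
def to_mid_side(left, right):
    """Return (mid, side): per-sample average and difference of two channels."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [l - r for l, r in zip(left, right)]
    return mid, side


def from_mid_side(mid, side):
    """Invert to_mid_side and recover the original left/right samples."""
    left = [m + s / 2 for m, s in zip(mid, side)]
    right = [m - s / 2 for m, s in zip(mid, side)]
    return left, right


left = [1.0, 2.0, 3.0]
right = [1.0, 2.0, 5.0]
mid, side = to_mid_side(left, right)
restored = from_mid_side(mid, side)
```

Wherever the two channels agree, the side value is zero, which is why near-mono sections need fewer bits.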

Channel Support

Vorbis supports up to 255 channels.

At the moment the encoder knows to use coupling for 2-channel files only, but eventually it will scale.

Vector Quantization

Vector Quantization (VQ) is a lossy data compression method where vectors are rounded off into encoding regions.

Basically if you group together numbers describing different channels, your channels become automatically coupled (normally a group would be picked from data describing a single channel, so channels would be approximated independently).

Vector Quantization…

The process of VQ introduces some vector quantization noise: the difference between the original group of numbers and its approximation (only a limited number of approximations can be chosen).

All codecs suffer from quantization problems. VQ should suffer less.
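As a rough illustration of the rounding step and the noise it leaves behind, the sketch below picks the nearest entry from a tiny, hypothetical two-dimensional codebook. The codebook values and function names are assumptions made for the example; real Vorbis codebooks are read from the setup header.

```python
def quantize(vector, codebook):
    """Return the index of the codebook entry closest to vector."""
    def squared_distance(entry):
        return sum((v - e) ** 2 for v, e in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: squared_distance(codebook[i]))


def quantization_noise(vector, entry):
    """Per-component difference between the original and its approximation."""
    return [v - e for v, e in zip(vector, entry)]


# Hypothetical codebook: each entry describes one (left, right) pair,
# so choosing a single entry codes both channels at once.
codebook = [(0.0, 0.0), (1.0, 1.0), (1.0, -1.0)]
vector = (0.9, 0.8)
index = quantize(vector, codebook)
noise = quantization_noise(vector, codebook[index])
```

Only the index needs to be stored; the decoder looks the entry up in the same codebook.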

Memory Usage

The vector codebooks used in the first stage of decoding are packed, in their entirety, into the Vorbis bit-stream headers.

In packed form, these codebooks occupy only a few kilobytes; the extent to which they are pre-decoded into a cache is the dominant factor in decoder memory usage.

Following the Standard

Any file that a conforming decoder can decode follows the standard, regardless of the encoding method used to produce it.

Headers

Identification Header

The identification header identifies the bitstream as Vorbis and gives the Vorbis version and the simple audio characteristics of the stream, such as sample rate and number of channels.

Comment Header

The comment header includes user text comments ["tags"] and a vendor string for the application/library that produced the bitstream.

Setup Header

The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.
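To make the identification header concrete, here is a hedged Python sketch of a parser. The byte layout (packet type 1, the string "vorbis", version, channels, sample rate, three bitrate fields, two 4-bit block size exponents, framing bit) is assumed from the Vorbis I specification and should be checked against it; the function name and returned keys are illustrative.

```python
import struct

# Assumed little-endian layout of the 30-byte identification header packet.
ID_HEADER = struct.Struct("<B6sIBIiiiBB")


def parse_identification_header(packet):
    """Parse an identification header packet into a dict of basic fields."""
    (packet_type, magic, version, channels, sample_rate,
     bitrate_max, bitrate_nominal, bitrate_min,
     blocksizes, framing) = ID_HEADER.unpack_from(packet)
    if packet_type != 1 or magic != b"vorbis":
        raise ValueError("not a Vorbis identification header")
    if not framing & 1:
        raise ValueError("framing bit not set")
    return {
        "version": version,
        "channels": channels,
        "sample_rate": sample_rate,
        "bitrate_nominal": bitrate_nominal,
        # The low nibble is the short block exponent, the high nibble the long one.
        "blocksize_0": 1 << (blocksizes & 0x0F),
        "blocksize_1": 1 << (blocksizes >> 4),
    }


# Hand-built example packet: stereo, 44.1 kHz, block sizes 256 and 2048.
packet = ID_HEADER.pack(1, b"vorbis", 0, 2, 44100, 0, 128000, 0, 0xB8, 1)
info = parse_identification_header(packet)
```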

Decoding Procedure

The decoding and synthesis procedure for all audio packets is fundamentally the same.

1. decode packet type flag

2. decode mode number

3. decode window shape [long windows only]

4. decode floor

5. decode residue into residue vectors

6. inverse channel coupling of residue vectors

7. generate floor curve from decoded floor data

Decoding Procedure...

8. compute dot product of floor and residue, producing audio spectrum vector

9. inverse monolithic transform of audio spectrum vector, always an MDCT in Vorbis I

10. overlap/add left-hand output of transform with right-hand output of previous frame

11. store right-hand data from transform of current frame for future lapping.

12. if not first frame, return results of overlap/add as audio result of current frame

Rearrangement of the synthesis arithmetic is possible.
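Steps 8 and 10 to 12 can be sketched as follows. This is a simplified illustration, assuming equal-sized frames and reading the "dot product" of step 8 as an element-by-element multiply (the result is a vector, not a scalar). The class and function names are hypothetical, and the inverse transform of step 9 is left out.

```python
def audio_spectrum(floor_curve, residue):
    """Step 8: multiply floor and residue element by element."""
    return [f * r for f, r in zip(floor_curve, residue)]


class Lapper:
    """Steps 10-12: overlap/add successive transform outputs."""

    def __init__(self):
        self.saved_right = None  # right-hand half of the previous frame

    def push(self, frame):
        half = len(frame) // 2
        left, right = frame[:half], frame[half:]
        result = None  # the first frame produces no audio
        if self.saved_right is not None:
            # Step 10: add this frame's left half to the saved right half.
            result = [a + b for a, b in zip(self.saved_right, left)]
        self.saved_right = right  # Step 11: keep for future lapping
        return result  # Step 12


spectrum = audio_spectrum([2.0, 0.5], [3.0, 4.0])
lapper = Lapper()
first_out = lapper.push([1.0, 2.0, 3.0, 4.0])
second_out = lapper.push([10.0, 20.0, 30.0, 40.0])
```

Each call after the first returns half a frame of finished audio, which matches the 50% overlap of the transform.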

Controversy

The entire probability model of the codec, the Huffman and VQ codebooks, is packed into the bitstream header along with extensive CODEC setup parameters (often several hundred fields).

As a consequence, it’s impossible to embed a simple frame type flag in each audio packet, or begin decode at any frame in the stream without having previously fetched the codec setup header.

However, Vorbis can initiate decode at any arbitrary packet within a bitstream so long as the codec has been initialized/setup with the setup headers.

Window Shape Decode

Vorbis frames use one of two PCM sample sizes specified during codec setup. In Vorbis I, legal frame sizes are powers of two from 64 to 8192 samples. Aside from coupling, Vorbis handles channels as independent vectors and these frame sizes are in samples per channel.
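The frame size rule is easy to state in code. A small sketch, with a hypothetical helper name:

```python
def is_legal_frame_size(size):
    """True if size is a power of two between 64 and 8192 inclusive."""
    return 64 <= size <= 8192 and (size & (size - 1)) == 0


# Every legal Vorbis I frame size, in samples per channel.
legal_sizes = [1 << exponent for exponent in range(6, 14)]
```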

Overlapping Windows

Vorbis uses an overlapping transform, namely the MDCT, to blend one frame into the next, avoiding most inter-frame block boundary artifacts. The MDCT output of one frame is windowed according to MDCT requirements, overlapped 50% with the output of the previous frame and added. The window shape assures seamless reconstruction.
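The "seamless reconstruction" claim can be checked numerically. The sketch below assumes the window formula given in the Vorbis I specification, w(n) = sin(π/2 · sin²(π(n + 0.5)/size)), and verifies that the squares of the two overlapping halves always sum to one. The function name is illustrative.

```python
import math


def vorbis_window(size):
    """Window of the given length, using the formula assumed above."""
    return [math.sin(0.5 * math.pi * math.sin((n + 0.5) / size * math.pi) ** 2)
            for n in range(size)]


window = vorbis_window(256)
half = len(window) // 2
# With 50% overlap, sample n of one frame meets sample n + half of the previous one.
power_sums = [window[n] ** 2 + window[n + half] ** 2 for n in range(half)]
```

Because the window is applied once before the forward transform and once after the inverse, it is the squared values that must add up to one across the overlap.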

Dealing with Windows

Lapping is slightly more complex in the case of overlapping unequal-sized windows.

Inverse Monolithic Transform

The audio spectrum is converted back into time domain PCM audio via an inverse modified discrete cosine transform (MDCT). A detailed description of the MDCT is available in the paper “The use of multirate filter banks for coding of high quality digital audio”, by T. Sporer, K. Brandenburg and B. Edler.
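For readers who want to see why the inverse transform needs the overlap/add step, here is a naive O(N²) sketch of the MDCT and its inverse in Python. It uses the common textbook definition with 1/N scaling and no window, which is an assumption made for the example; a real decoder uses a fast algorithm and applies the Vorbis window.

```python
import math


def mdct(block):
    """Forward MDCT: 2N time samples in, N coefficients out."""
    n_coeffs = len(block) // 2
    return [sum(x * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
                for n, x in enumerate(block))
            for k in range(n_coeffs)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients in, 2N time-aliased samples out."""
    n_coeffs = len(coeffs)
    return [sum(c * math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))
                for k, c in enumerate(coeffs)) / n_coeffs
            for n in range(2 * n_coeffs)]


N = 4
signal = [float((7 * i) % 11) for i in range(12)]
first = imdct(mdct(signal[0:8]))    # frame covering samples 0..7
second = imdct(mdct(signal[4:12]))  # frame covering samples 4..11
# Neither frame alone is correct, but the aliasing cancels in the sum.
overlap = [a + b for a, b in zip(first[N:], second[:N])]
```

Each frame on its own comes back mixed with a time-reversed copy of itself; adding the overlapping halves of consecutive frames cancels that copy and recovers samples 4 to 7.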
