
Data Compression Notes1


Transcript
  • Slide 1/32

    Eric Dubois

  • Slide 2/32

    [Block diagram: Information Source → (signal) → Encoder → (binary stream) → Channel → (binary stream) → Decoder → (signal) → Information Receiver]

  • Slide 3/32

    [Block diagram as on slide 2: Information Source → (signal, aka data) → Encoder → (binary stream) → Channel → (binary stream) → Decoder → (signal) → Information Receiver]

  • Slide 4/32

    [Block diagram as on slide 2, with an error measure comparing the source signal (aka data) to the decoded signal]

  • Slide 5/32

    Speech
    Image
    Video
    Text file
    Music
    Radiograph
    Binary executable computer program
    Computer graphics primitives
    Weather radar map

  • Slide 6/32

    Airwaves (EM radiation)
    Cable
    Telephone line
    Hard disk
    CD, DVD
    Flash memory device
    Optical path
    Internet

  • Slide 7/32

    TV screen and viewer
    Audio system and listener
    Computer file
    Image printer and viewer
    Compute engine

  • Slide 8/32

    No errors permitted (lossless coding)
    Numerical measures of error, e.g. mean-squared error (MSE), signal-to-noise ratio (SNR)
    Numerical measures of perceptual difference
    Mean opinion scores from human users
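    Not part of the notes, but as a concrete illustration of the two numerical measures listed above, here is a minimal Python sketch; the test signal and noise level are arbitrary choices.

```python
import numpy as np

def mse(x, x_hat):
    """Mean-squared error between a reference signal x and a reconstruction x_hat."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return float(np.mean((x - x_hat) ** 2))

def snr_db(x, x_hat):
    """Signal-to-noise ratio in dB: signal power divided by error power."""
    x = np.asarray(x, dtype=float)
    return 10.0 * np.log10(np.mean(x ** 2) / mse(x, x_hat))

# Toy usage: a sine wave reconstructed with a small amount of additive error.
t = np.arange(1000)
x = np.sin(2 * np.pi * t / 100)
x_hat = x + 0.01 * np.random.randn(t.size)
print(mse(x, x_hat), snr_db(x, x_hat))
```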

  • Slide 9/32

    Data rate (bits per second)
    Transmission time (seconds)
    File size (bytes)
    Average number of bits per source symbol

  • Slide 10/32

    There is usually a natural representation for the source data at a given level of fidelity and sampling rate. Examples:
    8 bits per character in ASCII data
    24 bits per RGB color pixel
    16 bits per audio signal sample

    This natural representation leads to a certain raw channel rate (which is generally too high).

    Compression involves reducing the channel rate for a given level of distortion (which may be zero for lossless coding).

  • Slide 11/32

    compression ratio = raw channel rate / compressed channel rate

    Example: HDTV, 1080i
    Raw channel rate: 1493 Mbit/s (1920 × 1080 × 30 × 24)
    Compressed channel rate: ~20 Mbit/s
    Compression ratio: ~75
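    A quick sanity check of the arithmetic on this slide (illustrative only; the 20 Mbit/s figure is the approximate value quoted above):

```python
# Raw channel rate for the 1080-line HDTV example: width * height * frames/s * bits/pixel.
width, height, frames_per_s, bits_per_pixel = 1920, 1080, 30, 24
raw_rate = width * height * frames_per_s * bits_per_pixel   # bits per second

compressed_rate = 20e6   # ~20 Mbit/s after compression

print(raw_rate / 1e6)               # ~1493 Mbit/s raw
print(raw_rate / compressed_rate)   # compression ratio ~75
```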

  • Slide 12/32

    Categories of sources:
    continuous time or domain: x(t), x(h,v)
    discrete time or domain: x[n], x[m,n]
    continuous amplitude or value: x ∈ ℝ
    discrete amplitude or value: x ∈ A = {a1, a2, ..., aM}

    We will only consider discrete domain sources. We assume that continuous domain signals can be sampled with negligible loss. This is not considered in this course.

    We will mainly concentrate on one-dimensional signals such as text, speech, audio, etc. Extensions to images are covered in ELG5378.

    A source signal is a sequence of values drawn from a source alphabet A: x[1], x[2], ..., x[n] ∈ A

  • Slide 13/32

    A source coder transforms a source sequence into a coded sequence whose values are drawn from a code alphabet G: u[1], u[2], ..., u[i] ∈ G

    Normally G = {0,1}, and we will limit ourselves to this case.

    Note that the time indexes for the source sequence x[n] and the coded sequence u[i] do not correspond.

    The decoder must estimate the source signal on the basis of the received coded sequence û[i]. This may be different from u[i] if there are transmission errors. We will generally assume that there are no transmission errors.
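    A tiny concrete example of this index mismatch (the alphabet and the variable-length code are invented for illustration, not taken from the notes):

```python
# Source alphabet A = {a, b, c}; code alphabet G = {0, 1}.
# An invented variable-length code over G:
code = {"a": "0", "b": "10", "c": "11"}

x = ["a", "c", "a", "b"]                 # source sequence x[1..4]
u = "".join(code[s] for s in x)          # coded sequence u[1..6] = "011010"

print(len(x), len(u), u)                 # 4 source symbols become 6 code symbols
```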

  • Slide 14/32

    Lossless coding: The source sequence has discrete values, and these must be reproduced without error. Examples where this is required are text, data, executables, and some quantized signals such as X-rays.

    Lossy coding: The source sequence may be either continuous or discrete valued. There exists a distortion criterion. The decoded sequence may be mathematically different from the source sequence, but the distortion should be kept sufficiently small. Examples are speech and images. Often a perceptual distortion criterion is desired.

    Lossless coding methods are often a component of a lossy coding system.

  • Slide 15/32

    There are two variants of the compression problem:

    1. For a given source and distortion measure, minimize the channel rate for a given level of distortion D0 (which can be zero).

    2. For a given source and distortion measure, minimize the distortion (or maximize the quality) for a given channel rate R0.
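    Written out in standard constrained-optimization notation (the symbols R, D, D0 and R0 are those used on these slides; the formulation itself is a common way to state the problem, not a quotation from the notes):

```latex
\[
\textbf{Variant 1: } \min_{\text{coder}} R \quad \text{subject to } D \le D_0
\qquad\qquad
\textbf{Variant 2: } \min_{\text{coder}} D \quad \text{subject to } R \le R_0
\]
```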

  • Slide 16/32

    [Figure: rate-distortion curve, rate R versus distortion D]

    In a coding system, there is typically a tradeoff between rate and distortion.

  • Slide 17/32

    [Figure: the same rate-distortion curve, with the distortion constraint D0 marked on the D axis]

    In a coding system, there is typically a tradeoff between rate and distortion.

  • Slide 18/32

    [Figure: the same rate-distortion curve, with the rate constraint R0 marked on the R axis]

    In a coding system, there is typically a tradeoff between rate and distortion.

  • Slide 19/32

    1. When there is statistical redundancy.

    For example, for a sequence of outcomes of a fair 16-sided die, every outcome is equally likely and independent of the others, so we need 4 bits to represent each outcome and no compression is possible.

    In English text, some letters occur far more often than others. We can assign shorter codes to the common ones and longer codes to the uncommon ones and achieve compression (e.g., Morse code).
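    A toy illustration of this idea (not from the notes: the letter probabilities and codewords below are invented; a Huffman code, covered later in the course, would construct such a table automatically):

```python
# Shorter codewords for more frequent symbols, in the spirit of Morse code.
probs = {"e": 0.5, "t": 0.25, "q": 0.15, "z": 0.10}
code = {"e": "0", "t": "10", "q": "110", "z": "111"}   # a prefix code

fixed_bits = 2                                          # 4 symbols -> 2 bits each
avg_bits = sum(p * len(code[s]) for s, p in probs.items())

print(fixed_bits, avg_bits)   # 2 vs 1.75 bits/symbol on average
```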

  • Slide 20/32

    There are many types of statistical redundancy.

    For example, in English text, we are pretty sure that the next letter after a Q will be a U, so we can exploit it.

    The key to successful compression will be to formulate models that capture the statistical redundancy in the source.
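    For a sense of how much such context is worth, here is a small sketch using the entropy formula that comes later in the course; the probabilities are invented for illustration only:

```python
import math

# A conditional model exploits context: if the letter following "q" is "u" with
# very high probability, the average number of bits needed to code that letter
# is far below log2(26).
p_next_given_q = {"u": 0.99, "other": 0.01}

bits_without_context = math.log2(26)
bits_with_context = -sum(p * math.log2(p) for p in p_next_given_q.values())

print(bits_without_context, bits_with_context)   # ~4.70 vs ~0.08 bits
```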

  • Slide 21/32

    2. When there is irrelevancy.

    In many cases, the data is specified more precisely than it needs to be for the intended purpose.

    The data may be oversampled, or quantized more finely than it needs to be, either everywhere, or in some parts of the signal.

    This particularly applies to data meant only for consumption and not further processing.

  • Slide 22/32

  • Slide 23/32

    Change of representation
    Quantization (not for lossless coding)
    Binary code assignment

    All will depend on good models of the source and the receiver.
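    A minimal end-to-end sketch of these three operations on a toy signal (all of the design choices here, the difference transform, the step size of 0.5, and the fixed-length 4-bit code, are arbitrary placeholders, not the methods developed later in the course):

```python
import numpy as np

x = np.array([10.0, 10.4, 10.9, 11.3, 11.8, 12.5])

# 1. Change of representation: code differences between successive samples
#    (a simple decorrelating transform).
d = np.diff(x, prepend=x[0])

# 2. Quantization (the lossy step, skipped for lossless coding): uniform step 0.5.
step = 0.5
q = np.round(d / step).astype(int)

# 3. Binary code assignment: a trivial fixed-length 4-bit code per index;
#    a real system would use an entropy code (Huffman, arithmetic, ...).
bits = ["{:04b}".format(int(v) & 0xF) for v in q]

print(q, bits)
```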

  • Slide 24/32

  • Slide 25/32

    Eric Dubois

    CBY A-512

    Tel: 562-5800 x6400
    [email protected]
    www.eecs.uottawa.ca/~edubois/courses/ELG5126

  • Slide 26/32

    Textbook: K. Sayood, Introduction to Data Compression, third edition, Morgan Kaufmann Publishers, 2006.

    http://www.sciencedirect.com/science/book/9780126208627
  • Slide 27/32

    Basic probability and signal processing as typically obtained in an undergraduate Electrical Engineering program
    (e.g., at uOttawa, ELG3125 Signal and System Analysis and ELG3126 Random Signals and Systems)

  • Slide 28/32

    The objective of this course is to present the fundamental principles underlying data and waveform compression.

    The course begins with the study of lossless compression of discrete sources. These techniques are applicable to compression of text, data, programs and any other type of information where no loss is tolerable. They also form an integral part of schemes for lossy compression of waveforms such as audio and video signals, which is the topic of the second part of the course.

  • Slide 29/32

    The main goal of the course is to provide an understanding of the basic techniques and theories underlying popular compression systems and standards such as ZIP, FAX, MP3, JPEG, MPEG and so on, as well as the principles underlying future systems.

    Some of the applications will be addressed in student projects.

  • Slide 30/32

    Lossless coding: Discrete sources, binary codes, entropy, Huffman and related codes, Markov models, adaptive coding.

    Arithmetic coding: Principles, coding and decoding techniques, implementation issues.

    Dictionary techniques: Principles, static dictionary, adaptive dictionary.

    Waveform coding: Distortion measures, rate-distortion theory and bounds, models.

  • Slide 31/32

    Quantization: Formulation, performance, uniform and non-uniform quantizers, quantizer optimization, vector quantization.

    Predictive coding: Prediction theory, differential coding (DPCM), adaptive coding.

    Transform and subband coding: Change of basis, block transforms and filter banks, bit allocation and quantization.

    Applications (student projects)

  • Slide 32/32

    20% Assignments: Several assignments, to be handed in during class on the due date specified. There will be a 5% penalty for each day late, and no assignment will be accepted after one week.

    30% Project: An individual project on an application of data compression involving some experimental work. A project report and presentation at the end of the course will be required. More details will follow early in the course.

    20% Midterm exam: Closed-book exam, 80 minutes in length.

    30% Final exam: Closed-book exam, 3 hours in length, covering the whole course.

