Page 1

Gzip Compression Using Altera OpenCL

Mohamed Abdelfattah (University of Toronto)

Andrei Hagiescu

Deshanand Singh

Page 2

Gzip

Widely used lossless compression program

Gzip = LZ77 + Huffman

Big data needs fast (gigabyte-per-second) compression

Lower disk space in data centers

Less power on communication networks

Page 3

LZ77 Compression Example

This sentence is an easy sentence to compress.

1. Scan file byte by byte

2. Look for matches

3. Replace with a reference to previous occurrence

Page 9

LZ77 Compression Example

This sentence is an easy sentence to compress.

1. Scan file byte by byte

2. Look for matches

• Match length

• Match offset

3. Replace with a reference to previous occurrence

Page 10

LZ77 Compression Example

This sentence is an easy sentence to compress.

1. Scan file byte by byte

2. Look for matches

• Match length = 2

• Match offset

3. Replace with a reference to previous occurrence

Page 11

LZ77 Compression Example

This sentence is an easy sentence to compress.

1. Scan file byte by byte

2. Look for matches

• Match length = 3

• Match offset

3. Replace with a reference to previous occurrence

Page 12

LZ77 Compression Example

This sentence is an easy sentence to compress.

1. Scan file byte by byte

2. Look for matches

• Match length = 8

• Match offset

3. Replace with a reference to previous occurrence

Match offset = 20 bytes

Page 13

LZ77 Compression Example

This sentence is an easy sentence to compress.

1. Scan file byte by byte

2. Look for matches

• Match length = 8

• Match offset = 20

3. Replace with a reference to previous occurrence

Match offset = 20 bytes

Page 14

LZ77 Compression Example

This sentence is an easy @(8,20) to compress.

1. Scan file byte by byte

2. Look for matches

• Match length = 8

• Match offset = 20

3. Replace with a reference to previous occurrence

• Marker, length, offset

Page 15

LZ77 Compression Example

This sentence is an easy sentence to compress.

This sentence is an easy @(8,20) to compress.

1. Scan file byte by byte

2. Look for matches

• Match length = 8

• Match offset = 20

3. Replace with a reference to previous occurrence

• Marker, length, offset

Saved 5 bytes! (the 8-byte match becomes a 3-byte reference)
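The three steps above can be sketched as a brute-force longest-match search in C. This is a minimal illustration, not the FPGA design: it compares the current position against every earlier position, whereas the accelerator described later uses hashed dictionaries. The names lz77_match and lz77_longest_match are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* A candidate LZ77 match: length 0 means "no match found". */
typedef struct {
    size_t length; /* number of matching bytes */
    size_t offset; /* distance back from pos to the earlier occurrence */
} lz77_match;

/* Brute-force search: compare the bytes starting at pos with the bytes
 * starting at every earlier position and keep the longest run. The match
 * may run into the bytes being matched (start + len >= pos), which LZ77
 * allows, but never past the end of the buffer. */
static lz77_match lz77_longest_match(const unsigned char *buf, size_t n,
                                     size_t pos)
{
    lz77_match best = {0, 0};
    assert(pos <= n);
    for (size_t start = 0; start < pos; start++) {
        size_t len = 0;
        while (pos + len < n && buf[start + len] == buf[pos + len])
            len++;
        if (len > best.length) { /* strictly longer: keep the first one found */
            best.length = len;
            best.offset = pos - start;
        }
    }
    return best;
}
```

On the example sentence, searching from the second "sentence" (byte 25) returns length 9 and offset 20: the byte-by-byte comparison also matches the space after the word, one byte more than the word-level match (8, 20) used on the slides.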

Page 16

Altera OpenCL Compiler for FPGAs

    kernel void simple(global const int *input, int size, global int *output)
    {
        // Loop bound keeps input[i + 1] in range
        for (int i = 0; i < size - 1; i++) {
            int x = input[i];
            int y = input[i + 1];
            int z = x + y;
            output[i] = z;
        }
    }

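OpenCL Single-threaded Code

Diagram: Altera's OpenCL Compiler turns the single-threaded code into an FPGA Accelerator pipeline (Load x, Load y, Store z) attached to DDRx Memory; the Host CPU runs the host code and talks to the accelerator over PCIe.

Host Code:

    // host code
    // Enqueue buffer
    // Enqueue kernel(s)
    // Dequeue buffers
    …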

Page 20

FPGAs can be VERY Custom

Diagram: the same Host CPU and FPGA Accelerator system (PCIe, Load x / Load y / Store z pipeline, DDRx Memory), extended with:

• IO Channels

• Different memory types: QDR? RLD?

• ARM Host on the FPGA chip

Page 21

Implementation Overview

1. Shift In New Data

2. Dictionary Lookup/Update

3. Match Search & Filtering

4. Write to output

Page 22

1. Shift In New Data

Current Window ← Input from DDR memory

Page 23

1. Shift In New Data

Current Window (with cycle boundary), e.g.: o l d _ t e x t

Input from DDR memory: sample_text

Page 24

1. Shift In New Data

Current Window (with cycle boundary), e.g.: o l d _ t e x t

Input from DDR memory: sample_text

VEC = 4

We use text in our example, but the data can be anything

Page 25

1. Shift In New Data

Current Window (with cycle boundary), e.g.: t e x t

Input from DDR memory: sample_text

Page 26

1. Shift In New Data

Current Window (with cycle boundary), e.g.: t e x t s a m p

Input from DDR memory: le_text
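The shift-in step can be modelled as a small sliding window. This sketch assumes, from the example, a window of 2 * VEC bytes that advances by VEC bytes per cycle; shift_in is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

#define VEC 4 /* bytes shifted in per cycle, as in the slide example */

/* Advance the 2 * VEC-byte window by one cycle: the newer half (last VEC
 * bytes) moves down to become the older half, and VEC fresh input bytes
 * fill the newer half. Keeping 2 * VEC bytes gives each of the VEC
 * substrings that start in the older half its full VEC bytes. */
static void shift_in(unsigned char window[2 * VEC], const unsigned char *input)
{
    assert(window != NULL && input != NULL);
    memmove(window, window + VEC, VEC); /* the halves do not overlap; memmove is simply safe */
    memcpy(window + VEC, input, VEC);
}
```

Starting from o l d _ t e x t, shifting in s a m p gives t e x t s a m p, and shifting in l e _ t next gives s a m p l e _ t.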

Page 27

Implementation Overview

1. Shift In New Data

2. Dictionary Lookup/Update

3. Match Search & Filtering

4. Write to output

Page 28

2. Dictionary Lookup/Update

Current Window: t e x t s a m p

1. Compute hash

2. Look for match in 4 dictionaries

3. Update dictionaries

Dictionary 0, Dictionary 1, Dictionary 2, Dictionary 3

Dictionaries buffer the text that we have already processed, e.g.: e x t s, x t s a, t s a m, t e x t
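The three steps above can be sketched in C for one cycle. This is a minimal software model under stated assumptions, not the authors' kernel: the hash function, the dictionary size (DICT_BITS) and the rule that substring i is written into dictionary i are hypothetical choices. All lookups happen before the update, so a substring never matches itself within the same cycle.

```c
#include <assert.h>
#include <string.h>

#define VEC 4        /* substrings per cycle = number of dictionaries */
#define DICT_BITS 10 /* hypothetical: 1024 entries per dictionary */
#define DICT_SIZE (1u << DICT_BITS)

typedef struct {
    unsigned char bytes[VEC]; /* substring seen earlier */
    unsigned pos;             /* where it started in the input */
} dict_entry;

/* One dictionary (memory bank) per substring slot. */
static dict_entry dict[VEC][DICT_SIZE];

/* Hypothetical hash: the slides do not give the real function. */
static unsigned hash_substring(const unsigned char *s)
{
    unsigned h = 0;
    for (int i = 0; i < VEC; i++)
        h = (h << 3) ^ s[i];
    return h & (DICT_SIZE - 1);
}

/* One cycle of "Dictionary Lookup/Update" for a 2 * VEC-byte window whose
 * first byte is at input position pos. candidates[i][d] receives what
 * dictionary d holds for the hash of substring i. */
static void dictionary_step(const unsigned char window[2 * VEC], unsigned pos,
                            dict_entry candidates[VEC][VEC])
{
    unsigned h[VEC];
    assert(window != NULL && candidates != NULL);
    /* 1. Compute one hash per substring window[i .. i + VEC - 1]. */
    for (int i = 0; i < VEC; i++)
        h[i] = hash_substring(window + i);
    /* 2. Look up every substring in every dictionary (VEC * VEC reads). */
    for (int i = 0; i < VEC; i++)
        for (int d = 0; d < VEC; d++)
            candidates[i][d] = dict[d][h[i]];
    /* 3. Update: substring i is written into dictionary i (VEC writes). */
    for (int i = 0; i < VEC; i++) {
        memcpy(dict[i][h[i]].bytes, window + i, VEC);
        dict[i][h[i]].pos = pos + (unsigned)i;
    }
}
```

Giving each dictionary exactly one write per cycle while every substring reads all of them is what keeps the per-cycle port count fixed, which suits on-chip memories.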

Page 32

2. Dictionary Lookup/Update

Current Window: t e x t s a m p

Each substring is hashed and looked up in Dictionary 0, 1, 2 and 3. Possible matches from history (dictionaries):

t e x t → t a n _, t e x t, t e x l, t e e n

e x t s → e a t e, e a r s, e e p s, e n t e

x t s a → x a n t, x y l o, x e l y, x i r t

t s a m → t e e n, t e a l, t a n _, t a m e

Page 33

2. Dictionary Lookup/Update

Current Window: t e x t s a m p

Diagram: the substrings t e x t, e x t s, x t s a and t s a m each pass through the Hash and access Dictionary 0, 1, 2 and 3

Page 34

2. Dictionary Lookup/Update

Current Window: t e x t s a m p

Dictionary 0: write port W0; read ports RD00, RD01, RD02, RD03

Dictionary 1: write port W1; read ports RD10, RD11, RD12, RD13

Dictionary 2: write port W2; read ports RD20, RD21, RD22, RD23

Dictionary 3: write port W3; read ports RD30, RD31, RD32, RD33

e.g. t e x t → t a n _, t e x t, t e x l, t e e n

Generate exactly the read/write ports that we need, at exactly the width we need

For the full design (VEC = 16): 256 read ports, 16 write ports, 128 bits wide
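Counting the ports in the diagram as functions of VEC reproduces the numbers on the slide. This is a reading of the figure, assuming each of the VEC substrings reads every dictionary, each dictionary receives one write per cycle, and each entry holds one VEC-byte substring:

```latex
\begin{aligned}
\text{read ports}  &= \mathrm{VEC} \times \mathrm{VEC} = 16 \times 16 = 256 \\
\text{write ports} &= \mathrm{VEC} = 16 \\
\text{port width}  &= 8 \cdot \mathrm{VEC}\ \text{bits} = 8 \times 16 = 128\ \text{bits}
\end{aligned}
```

With VEC = 4, as in the example window, the same formulas give the 16 read ports (RD00 to RD33) and 4 write ports (W0 to W3) drawn above.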

Page 35

Implementation Overview

1. Shift In New Data

2. Dictionary Lookup/Update

3. Match Search & Filtering

4. Write to output

Page 36

3. Match Search & Filtering

Current Windows (the substrings): t e x t, e x t s, x t s a, t s a m

Comparison Windows (a set of candidate matches for each incoming substring):

t e x t: t a n _, t e x t, t e x l, t e e n

e x t s: e a t e, e a r s, e e p s, e n t e

x t s a: x a n t, x y l o, x e l y, x i r t

t s a m: t e e n, t e a l, t a n _, t a m e

Compare each current window against each of its 4 comparison windows

Page 37

3. Match Search & Filtering

Current Window: t e x t

Comparison Windows: t a n _, t e x t, t e x l, t e e n

Comparators: compare each byte

Match Length: 1, 4, 3, 2

We have another 3 of those, one for each of the other current windows

Page 38

3. Match Search & Filtering

Current Window: t e x t

Comparison Windows: t a n _, t e x t, t e x l, t e e n

Comparators → Match Length: 1, 4, 3, 2

Match Reduction → Best Length: 4

Page 42

3. Match Search & Filtering

Typical C code

Fixed loop bounds, so the compiler can unroll the loop
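A sketch of the comparator and match-reduction loops in plain C, assuming VEC = 4 as in the example. The original kernel code is not reproduced here; match_length and best_length are hypothetical names. The point is that every loop bound is a compile-time constant, which is what lets an FPGA compiler unroll the loops into parallel comparators (the Altera SDK also accepts a #pragma unroll hint on such loops).

```c
#include <assert.h>

#define VEC 4 /* bytes per window, as in the slide example */

/* Number of leading bytes on which cur and cmp agree (0..VEC).
 * The byte comparisons are independent; only a cheap running AND links
 * them, so an unrolled version maps to VEC parallel comparators. */
static int match_length(const unsigned char cur[VEC], const unsigned char cmp[VEC])
{
    int len = 0;
    int still_matching = 1;
    for (int k = 0; k < VEC; k++) {
        still_matching = still_matching && (cur[k] == cmp[k]);
        len += still_matching;
    }
    return len;
}

/* Match reduction: best length over the VEC comparison windows
 * (one candidate per dictionary); *best_index gets the winning window. */
static int best_length(const unsigned char cur[VEC],
                       const unsigned char cmp[VEC][VEC], int *best_index)
{
    int best = 0;
    assert(best_index != 0);
    *best_index = 0;
    for (int d = 0; d < VEC; d++) {
        int len = match_length(cur, cmp[d]);
        if (len > best) {
            best = len;
            *best_index = d;
        }
    }
    return best;
}
```

On the slides' example (current window t e x t against t a n _, t e x t, t e x l, t e e n) the lengths are 1, 4, 3, 2 and the reduction returns 4.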

Page 43

3. Match Search & Filtering

One bestlength value is associated with each current_window

t e x t

e x t s

x t s a

t s a m

3

3

4

3

3

1

t e x t s a m p

Page 44

3. Match Search & Filtering

3

t e x t s a m p

Cycle boundary

1 3 4

Matches

0

1

2

4

0 1 2 3

Best lengths:

Select the best combination of matches from the set of candidate matches

1. Remove matches that are longer when encoded than the original

2. From the remaining set, select the best ones

• Last-fit (a bin-packing heuristic)

3. Compute “first valid position” for the next step

Page 45

3. Match Search & Filtering

3

t e x t s a m p

Cycle boundary

1 3 4

Matches

0

1

2

4

0 1 2 3

Best lengths:

Too short

Last-fit

Overlap

Last-fit

Select the best combination of matches from the set of candidate matches

1. Remove matches that are longer when encoded than the original

2. From the remaining set, select the best ones

• Last-fit (a bin-packing heuristic)

3. Compute “first valid position” for the next step

Page 46

3. Match Search & Filtering

3

t e x t s a m p

Cycle boundary

1 3 4

Matches

0

4

0 1 2 3

Best lengths:

Last-fit

1

2

Too short

Overlap

Last-fit

Select the best combination of matches from the set of candidate matches

1. Remove matches that are longer when encoded than the original

2. From the remaining set, select the best ones

• Last-fit (a bin-packing heuristic)

3. Compute “first valid position” for the next step

Page 47

3. Match Search & Filtering

3

t e x t s a m p

Cycle boundary

1 3 4

Matches:

0 1 2 3

Select the best combination of matches from the set of candidate matches

1. Remove matches that are longer when encoded than the original

2. From the remaining set, select the best ones

• Last-fit (a bin-packing heuristic)

3. Compute “first valid position” for the next step

Best lengths:

Last-fit

First Valid position

next cycle 0 1 2 3 3
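The three filtering steps can be sketched as a single pass over the VEC positions of one cycle. Assumptions: a reference costs 3 bytes (the short form described under 4. Writing to Output), so matches shorter than 3 bytes are dropped, and overlaps are resolved by a simple left-to-right greedy choice. That greedy pass is a swapped-in stand-in, not the last-fit heuristic named on the slides; filter_matches is a hypothetical name.

```c
#include <assert.h>

#define VEC 4            /* positions per cycle, as in the slide example */
#define MIN_USEFUL_LEN 3 /* assumption: cost of a short (3-byte) reference */

/* Choose non-overlapping matches for one cycle and return the first valid
 * position for the next cycle. best_len[i] is the best match length found
 * for the substring starting at position i; first_valid carries over the
 * positions already covered by a match from the previous cycle.
 * NOTE: simple left-to-right greedy stand-in, not the slides' last-fit. */
static int filter_matches(const int best_len[VEC], int first_valid,
                          int selected[VEC])
{
    int covered_until = first_valid; /* positions < this are already encoded */
    assert(first_valid >= 0 && first_valid <= VEC);
    for (int i = 0; i < VEC; i++) {
        selected[i] = 0;
        if (i < covered_until)
            continue; /* overlaps an earlier match */
        if (best_len[i] < MIN_USEFUL_LEN)
            continue; /* encoded form would be longer than the bytes it replaces */
        selected[i] = 1;
        covered_until = i + best_len[i];
    }
    /* Positions beyond this cycle's window are skipped in the next cycle. */
    return covered_until > VEC ? covered_until - VEC : 0;
}
```

For example, with best lengths {3, 1, 3, 4} and nothing carried over, the pass keeps the matches at positions 0 and 3, and since the second one runs three bytes past the cycle boundary it reports 3 as the first valid position for the next cycle.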

Page 48

Implementation Overview

1. Shift In New Data

2. Dictionary Lookup/Update

3. Match Search & Filtering

4. Write to output

Page 49

4. Writing to Output

Marker, length, offset:

• Length is limited by VEC (= 16 in our case), so it fits in 4 bits

• Offset is limited to 0x40000 = 262144 (a larger offset doesn't make sense), so it fits in 21 bits

Use either 3 or 4 bytes for this:

• Offset < 2048: 3 bytes, fields MARKER, LENGTH, OFFSET, OFFSET

• Offset = 2048 .. 262144: 4 bytes, fields MARKER, LENGTH, OFFSET, OFFSET, OFFSET
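The two reference sizes can be sketched as a small encoder. The slides give only the field names and limits, so the marker value and bit layout below are hypothetical: a 1-bit flag selects the short or long form, the length is stored as length - 1 so that 1..16 fits in 4 bits, and the remaining bits carry the offset (11 bits in the 3-byte form, 19 in the 4-byte form, enough for 0x40000). Escaping of literal bytes equal to the marker is left out; encode_reference is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

#define MARKER 0xFFu /* hypothetical marker byte value */

/* Writes one (length, offset) reference into out and returns its size.
 * Hypothetical layout:
 *   byte 0: MARKER
 *   byte 1: [1-bit long-form flag][4 bits: length - 1][3 high offset bits]
 *   3-byte form: byte 2 = low 8 offset bits       (11-bit offset, < 2048)
 *   4-byte form: bytes 2..3 = low 16 offset bits  (19-bit offset) */
static size_t encode_reference(unsigned length, unsigned long offset,
                               unsigned char out[4])
{
    assert(length >= 1 && length <= 16);
    assert(offset >= 1 && offset <= 0x40000ul);
    out[0] = (unsigned char)MARKER;
    if (offset < 2048) {
        out[1] = (unsigned char)(((length - 1) << 3) | (offset >> 8));
        out[2] = (unsigned char)(offset & 0xFF);
        return 3;
    }
    out[1] = (unsigned char)(0x80 | ((length - 1) << 3) | (offset >> 16));
    out[2] = (unsigned char)((offset >> 8) & 0xFF);
    out[3] = (unsigned char)(offset & 0xFF);
    return 4;
}
```

With this layout the LZ77 example reference (length 8, offset 20) takes the 3-byte form, which is where the 5-byte saving on the 8-byte match comes from.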

Page 50

Results

Page 51

Comparison against CPU/Verilog – Best Gzips out there!

Page 52

Comparison against CPU/Verilog

• Best implementation of Gzip on CPU

• By Intel Corporation

• On Intel Core i5 (32nm) processor

• 2013

• Compression Speed: 338 MB/s

• Compression ratio: 2.18X

Page 53

Comparison against CPU/Verilog

• Best implementation on ASICs

• AHA Products Group

• Coming up Q2 2014

• Compression Speed: 2.5 GB/s

Page 54

Comparison against CPU/Verilog

• Best implementation on FPGAs

• Verilog

• IBM Corporation

• Nov. 2013 ICCAD

• Altera Stratix-V A7

• Compression Speed: 3 GB/s

Page 55

Comparison against CPU/Verilog

• OpenCL design example

• Altera Stratix-V A7

• Developed in 1 month

• Compression speed ?

• Compression Ratio ?

Page 56

Comparison against CPU/Verilog

Compression speed (chart):

• CPU (Intel Core i5): 0.3 GB/s

• ASIC (AHA): 2.5 GB/s

• OpenCL design example (Stratix-V A7): 2.7 GB/s

• Verilog (IBM, Stratix-V A7): 3 GB/s

Page 57

Comparison against CPU

Same compression ratio

12X better performance/Watt

Page 58

Comparison against Verilog

12% more resources

Much lower design effort and design time

Days instead of months

10% Slower

Page 59

Thank You