
A Hybrid Caching Strategy for Streaming Media Files

Jussara M. Almeida Derek L. Eager Mary K. Vernon

University of Wisconsin-Madison, University of Saskatchewan

November 2001

Outline

• Characteristics of Streaming Media (SM) files

• Delivery of SM files

• Hypothesis and Assumptions

• Previous Caching Policies

• New Policy Performance Comparison

• New Caching Policies

• Conclusions and Future Work

Characteristics of SM Files

• Large file size

– cache on disk

• Sustained I/O bandwidth

– inserting and reading new content

• Clients access partial files

– initial portion

– favored segment

– base + variable number of layers of layered encoding

Delivery of SM Files

• Unicast streaming:

– server bandwidth is linear in client request rate

– goal: maximize byte hit ratio

• Multicast streaming

– save bandwidth

– cost sharing introduces new tradeoffs

Multicast

[Figure: required server bandwidth vs. client request rate (request-rate axis from 1 to 1000, bandwidth axis from 0 to 20)]

• Example: 10 distributed proxy servers, each serving a local region; on average, 100 requests arrive per region during the playback of a given popular video

– need 7 streams per region, or 12 streams at the remote server
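The arithmetic behind this example can be spelled out in a few lines. This is only a sketch: the stream counts (7 and 12) are taken directly from the slide, and the variable names are illustrative.

```python
# Numbers taken from the example above; variable names are illustrative.
regions = 10              # distributed proxy servers, one per region
streams_per_region = 7    # multicast streams needed at each proxy
streams_at_remote = 12    # streams needed if the remote server serves everyone

# Serving the video from every proxy costs the sum of the regional streams.
total_proxy_streams = regions * streams_per_region

# Serving it once from the remote server shares each stream across regions.
streams_saved = total_proxy_streams - streams_at_remote

print(total_proxy_streams, streams_at_remote, streams_saved)  # 70 12 58
```

The gap between 70 and 12 streams is the cost-sharing effect that caching has to be weighed against.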

Caching for Multicast Streams: Tradeoffs


• caching popular content reduces the load on the remote server and network

• delivering popular content from the remote server amortizes the cost of a stream over more clients

• earlier portions of a popular video require more bandwidth and have less cost-sharing than later portions

New Caching Policies Research

• Hypothesis: popularity-based strategy will outperform replacement-based strategy

– significant fraction of requests to uncached files may be for files that are accessed very sporadically

• Assumptions:

– limited disk space implies limited disk bandwidth

– proxy bandwidth for delivering cached streams is equal to the minimum of the proxy disk bandwidth and the proxy network bandwidth (call this the proxy disk bandwidth)

Current Web Caching Policies

• Replacement based (cache on each miss)

• Top replacement candidate is an ad-hoc combination of:

– large files

– least recently accessed or lower access frequency

– miss penalty (server latency, bandwidth)

• Cache whole file or none

• Unicast

• Ignore limited disk bandwidth

Previous SM Caching Policies

• Interval Caching [DaSi93, KaRT95]

• Resource Based Caching (RBC) [TVDS98]

• Least Frequently Used (LFU)

• Block-based insertion and deletion [AcSm00]

• Popularity-based caching for layered encoding [RYHE00]

• Prefix and Segment Caching for smoothing [SeRT99, WZDS98]

Interval Caching

• Cache smallest intervals

• Target: memory caches (lots of insertions)

[Figure: timelines from 0 to T for file f, showing streams S1, S2 and S3]
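The "cache smallest intervals" rule can be sketched as a greedy selection. This is a minimal illustration, not the published algorithm: it assumes a uniform delivery rate (so an interval's size is just the time gap between two consecutive streams of the same file), and the function name `choose_intervals` and its inputs are hypothetical.

```python
from typing import Dict, List, Tuple

def choose_intervals(starts: Dict[str, List[float]],
                     capacity: float) -> List[Tuple[str, float, float]]:
    """Greedily pick the smallest intervals between consecutive streams.

    starts maps a file name to the start times of its active streams;
    capacity is the cache size in the same time units (assumes a
    uniform delivery rate, which is an assumption of this sketch).
    """
    intervals = []
    for name, times in starts.items():
        ordered = sorted(times)
        # Each pair of consecutive streams forms one candidate interval.
        for lead, follow in zip(ordered, ordered[1:]):
            intervals.append((follow - lead, name, lead, follow))
    intervals.sort()  # smallest gap first

    chosen, used = [], 0.0
    for gap, name, lead, follow in intervals:
        if used + gap <= capacity:
            chosen.append((name, lead, follow))
            used += gap
    return chosen

picked = choose_intervals({"f": [0.0, 2.0, 10.0], "g": [1.0, 4.0]}, capacity=6.0)
print(picked)  # [('f', 0.0, 2.0), ('g', 1.0, 4.0)]
```

The stream that follows in each chosen interval can then be served from the cache instead of the remote server.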

Resource Based Caching

• Cache entire files and intervals/runs

• Goal: efficiently utilize the limited resource

– limited space: cache smallest space requirement

– limited bandwidth: cache smallest write overhead

• Pre-allocate bandwidth to each cached entity

• Complex algorithm

– Complex implementation

– High time complexity

RBC Algorithm

[Equations: the two selection metrics used in Step 1, written in terms of Wi,x, Ri,x and Si,x]

Step 1: Selecting entity x {interval, run, file} of file i

1) If Ubw > Uspace +

Choose the entity with lowest

2) If Uspace > Ubw +

Choose the entity with minimum space requirement Si,x

3) If Uspace - < Ubw < Uspace +

Choose the entity with largest

Step 2: Caching decision for entity x

1) If enough unallocated space and unallocated bandwidth:

Cache entity x

2) If enough unallocated space but bandwidth constrained:

Use bandwidth goodness list to select candidates for eviction

3) If enough unallocated bandwidth but space constrained:

Use space goodness list to select candidates for eviction

4) If both bandwidth and space constrained:

Walk on both lists: at each step, remove entity from bandwidth goodness list or from space goodness list.

Step 3: Allocate space and bandwidth for entity x
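The four cases of Step 2 amount to a small decision table. The sketch below only classifies which case applies; it does not implement the goodness lists or the eviction walk, and the function name, parameter names, and returned labels are all hypothetical.

```python
def caching_decision(need_space: float, need_bw: float,
                     free_space: float, free_bw: float) -> str:
    """Return which Step 2 case applies to a candidate entity.

    need_* are the entity's requirements; free_* are the unallocated
    space and bandwidth. The labels are illustrative names for the
    four cases, not identifiers from the original algorithm.
    """
    space_ok = need_space <= free_space
    bw_ok = need_bw <= free_bw
    if space_ok and bw_ok:
        return "cache"                          # case 1
    if space_ok:
        return "evict-by-bandwidth-goodness"    # case 2: bandwidth constrained
    if bw_ok:
        return "evict-by-space-goodness"        # case 3: space constrained
    return "evict-from-both-lists"              # case 4: both constrained

print(caching_decision(need_space=2, need_bw=1, free_space=5, free_bw=0))
# evict-by-bandwidth-goodness
```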

Least Frequently Used

• Different implementation options:

– What to do on the first access to an object?

– How to estimate frequency?

• Version studied: Currently Most Popular (CMP)

– Insert only the most frequently accessed files or segments

– On-line popularity estimate: future research
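A minimal sketch of the CMP idea with known popularities: rank files by access frequency and keep the top ones. It assumes whole-file caching, uniform file size, and no disk bandwidth limit, so the byte hit ratio equals the fraction of requests that hit cached files. The name `cmp_select` and the sample frequencies are hypothetical.

```python
from typing import List, Set, Tuple

def cmp_select(freqs: List[float], cache_files: int) -> Tuple[Set[int], float]:
    """Cache the cache_files most frequently accessed files.

    Assumes uniform file size and unlimited disk bandwidth (assumptions
    of this sketch), so byte hit ratio = share of requests to cached files.
    """
    ranked = sorted(range(len(freqs)), key=lambda i: freqs[i], reverse=True)
    cached = set(ranked[:cache_files])
    hit_ratio = sum(freqs[i] for i in cached) / sum(freqs)
    return cached, hit_ratio

cached, ratio = cmp_select([5, 40, 10, 25, 20], cache_files=2)
print(sorted(cached), ratio)  # [1, 3] 0.65
```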

Previous Comparison: RBC vs. CMP [TVDS98]

• Fixed file access frequencies

• RBC outperforms CMP for all parameter values studied

• Limited design space

– e.g.: total cache size 16 GB

• Inconsistent results

New Performance Comparison

• Re-evaluate byte hit ratio of CMP and RBC

– Simulation with synthetic workload

– Broad design space

• New Pooled RBC

• New simple hybrid CMP/interval caching (CMP/IC) policy

System Assumptions

• Arrivals: Poisson(λ)

– extra experiments with Pareto(α, k)

• File access frequency: Zipf(θ)

• Perfect knowledge of file popularity

– extra experiments with approximate file popularity

• Uniform file size and delivery rate

– extra experiments with variable file size and delivery rate

• Load balanced across multiple disks
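A synthetic workload matching these assumptions could be generated roughly as follows. This is a sketch, not the simulator used in the study: it assumes Poisson arrivals (exponential inter-arrival times) and a Zipf form where file i is requested with probability proportional to 1/i^(1-theta); that parameterization, the function names, and the seed are assumptions made for illustration.

```python
import random
from typing import List, Tuple

def zipf_probs(n: int, theta: float = 0.0) -> List[float]:
    """Access probabilities, assuming p_i proportional to 1 / i**(1 - theta)."""
    weights = [1.0 / (rank ** (1.0 - theta)) for rank in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def generate_requests(n_files: int, rate: float, duration: float,
                      theta: float = 0.0, seed: int = 1) -> List[Tuple[float, int]]:
    """Return (arrival_time, file_index) pairs for a Poisson arrival process."""
    rng = random.Random(seed)
    probs = zipf_probs(n_files, theta)
    files = list(range(n_files))
    t, trace = 0.0, []
    while True:
        t += rng.expovariate(rate)      # exponential gap => Poisson arrivals
        if t >= duration:
            return trace
        trace.append((t, rng.choices(files, weights=probs)[0]))

# 450 requests on average per unit of time (one average file duration).
trace = generate_requests(n_files=100, rate=450.0, duration=1.0)
print(len(trace), trace[:3])
```

Feeding such a trace to each policy and counting the bytes served from cache gives the byte hit ratio that the comparison relies on.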

System Parameters

• n : number of files

• θ : Zipf parameter

• N : arrival rate (average number of requests per average file duration T), N = λT

• C : cache size (fraction of media data accessed)

• B : normalized disk bandwidth (disk bandwidth as a fraction of the average number of simultaneous streams needed to deliver the data that CMP caches)

• B depends on N, θ, n, C and disk technology

• Relative performance of policies depends mainly on B

• B = 1.0 : CMP system is bandwidth balanced

• B < 1.0 : CMP system is bandwidth deficient

• B > 1.0 : CMP system is bandwidth abundant

Normalized Disk Bandwidth (B): Example

• Ultrastar 72ZX disk:

– disk space: 116.76 hours of MPEG-1 video (73.4 GB)

– disk bandwidth: 108 MPEG-1 streams (22-37 MB/s)

• Assume: 100 requests/hour for cached files

• If cache contains 2-hour movies:

– Need 200 streams

– B = 108/200 = 0.54

• If cache contains 30-minute TV shows:

– Need 50 streams for cache content

– B = 108/50 = 2.16
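The two cases in this example follow from one calculation: the average number of simultaneous streams is the request rate times the file duration, and B is the disk's stream capacity divided by that number. The sketch below reproduces the slide's figures; the function names are hypothetical.

```python
def normalized_bandwidth(disk_streams: float, requests_per_hour: float,
                         duration_hours: float) -> float:
    """B = streams the disk can sustain / average simultaneous streams needed."""
    needed_streams = requests_per_hour * duration_hours
    return disk_streams / needed_streams

def regime(b: float) -> str:
    """Classify a CMP system by its normalized disk bandwidth."""
    if b < 1.0:
        return "bandwidth deficient"
    if b > 1.0:
        return "bandwidth abundant"
    return "bandwidth balanced"

movies = normalized_bandwidth(108, 100, 2.0)   # 2-hour movies
shows = normalized_bandwidth(108, 100, 0.5)    # 30-minute TV shows
print(round(movies, 2), regime(movies))  # 0.54 bandwidth deficient
print(round(shows, 2), regime(shows))    # 2.16 bandwidth abundant
```

The same disk is bandwidth deficient for long files and bandwidth abundant for short ones, which is why B rather than raw disk speed drives the comparison.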

RBC vs. CMP

• CMP outperforms RBC if B ≤ 1.0

• RBC slightly outperforms CMP if B > 1.0 and the cache is small

[Figure: byte hit ratio vs. cache size for RBC and CMP; panels B = 0.75, B = 1.0 and B = 2.0; N = 450, n = 100, θ = 0]

Files Cached by RBC

• Average fraction of each file cached by RBC (N = 450, n = 100, C = 0.25)

[Figure: fraction cached vs. file; panels B = 0.75, B = 1.0 and B = 2.0]

Space and Bandwidth Utilization

[Figure: utilization vs. cache size; curves: CMP bandwidth utilization, RBC bandwidth utilization, RBC space utilization, RBC write bandwidth; panels B = 0.75, B = 1.0 and B = 2.0]

Pooled RBC

• Three improvements over RBC

– simpler rule to select entity to cache

– can keep cached intervals when deleting a full file

– pool of pre-allocated bandwidth

• Complexity similar to RBC

Pooled RBC, RBC and LFU

[Figure: byte hit ratio vs. cache size for CMP, RBC and Pooled RBC; panels B = 0.75, B = 1.0 and B = 2.0; CMP and Pooled RBC share a label in the B = 0.75 and B = 1.0 panels, RBC and Pooled RBC share a label in the B = 2.0 panel]

• Pooled RBC ≥ CMP

• BUT, Pooled RBC is much more complex than CMP

N = 450, n = 100, θ = 0

Hybrid CMP/IC Policies

• Do interval caching on a separate (small) cache

– Interval Cache in Main Memory: CMP/ICmem and Pooled RBC/ICmem

– Interval Cache on Disk: CMP/ICdisk

• e.g. 5% of disk cache

CMP/ICmem vs. Pooled RBC/ICmem

• Memory cache improves CMP and Pooled RBC

• B 1.0 : greater improvement for CMP

[Figure: byte hit ratio vs. cache size for CMP/ICmem and Pooled RBC/ICmem; panels B = 0.75, B = 1.0 and B = 2.0; N = 450, n = 100, θ = 0]

CMP/ICdisk vs. Pooled RBC

• CMP/ICdisk ≈ Pooled RBC ≥ CMP

[Figure: byte hit ratio vs. cache size for CMP, CMP/ICdisk and Pooled RBC; panels B = 0.75, B = 1.0 and B = 2.0; CMP/ICdisk and CMP share a label in the B = 0.75 and B = 1.0 panels, CMP/ICdisk and Pooled RBC share a label in the B = 2.0 panel; N = 450, n = 100, θ = 0]

Conclusions

• Simple CMP

– simple to implement

– performance similar to Pooled RBC, CMP/ICdisk (static file popularities)

• Hybrid CMP/IC policy

– performance ≈ Pooled RBC

– simple to implement

– possibly more robust (imperfect and dynamic popularity measures)

Future Work

• Develop on-line estimate of file popularity

• Server log analysis

– client behavior and workloads (NOSSDAV’01 paper)

– More logs!

• Caching policies for multicast streams

– popular file has greater cost-sharing if not cached

– determine cache content that minimizes per-client cost

– caching principles / on-line policy

– (coming up soon)

• Prototype, experimental (live) workloads

