
STACS: Storage Access Coordination of Tertiary Storage for High Energy Physics Applications

Date post: 14-Jan-2016
STACS: Storage Access Coordination of Tertiary Storage for High Energy Physics Applications. Arie Shoshani, Alex Sim, John Wu, Luis Bernardo*, Henrik Nordberg*, Doron Rotem*. Scientific Data Management Group, Computing Science Directorate, Lawrence Berkeley National Laboratory.
Transcript
Page 1: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

STACS: Storage Access Coordination of Tertiary Storage for High Energy Physics Applications

Arie Shoshani, Alex Sim, John Wu, Luis Bernardo*, Henrik Nordberg*, Doron Rotem*

Scientific Data Management Group
Computing Science Directorate
Lawrence Berkeley National Laboratory

* no longer at LBNL

Page 2: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Outline

• Short High Energy Physics overview (of data handling problem)

• Description of the Storage Coordination System
• File tracking
• The Query Estimator (QE)
  — Details of the bit-sliced index
• The Query Monitor
  — coordination of “file bundles”
• The Cache Manager
  — tertiary storage queuing and tape coordination
  — transfer time for query estimation

Page 3: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Optimizing Storage Management for High Energy Physics Applications

Data Volumes for planned HENP experiments

Collaboration   # members/institutions   Date of first data   # events/year   Total data volume/year (TB)
STAR            350/35                   2000                 10^7-10^8       300
PHENIX          350/35                   2000                 10^9            600
BABAR           300/30                   1999                 10^9            80
CLAS            200/40                   1997                 10^10           300
ATLAS           1200/140                 2004                 10^9            2000

STAR: Solenoidal Tracker At RHIC
RHIC: Relativistic Heavy Ion Collider

Page 4: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Particle Detection Systems

STAR detector at RHIC Phenix at RHIC

Page 5: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Result of Particle Collision (event)

Page 6: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Typical Scientific Exploration Process

• Generate large amounts of raw data
  — large simulations
  — collect from experiments
• Post-processing of data
  — analyze data (find particles produced, tracks)
  — generate summary data
    • e.g. momentum, no. of pions, transverse energy
    • Number of properties is large (50-100)
• Analyze data
  — use summary data as guide
  — extract subsets from the large dataset
    • Need to access events based on partial properties specification (range queries)
    • e.g. ((0.1 < AVpT < 0.2) ^ (10 < Np < 20)) v (N > 6000)
  — apply analysis code

Page 7: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Size of Data and Access Patterns

• STAR experiment
  — 10^8 events over 3 years
  — 1-10 MB per event: reconstructed data
  — events organized into 0.1 - 1 GB files
  — 10^15 bytes total size
  — 10^6 files, ~30,000 tapes (30 GB tapes)
• Access patterns
  — Subsets of events are selected by region in high-dimensional property space for analysis
  — 10,000 - 50,000 out of total of 10^8
  — Data is randomly scattered all over the tapes
• Goal: Optimize access from tape systems
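The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below is only a back-of-envelope check; it assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes) and uses the upper bound of 10 MB per event, neither of which the slide states explicitly.

```python
# Back-of-envelope check of the STAR figures quoted on the slide.
# Assumption: decimal units (1 MB = 10**6 bytes, 1 GB = 10**9 bytes).
EVENTS = 10**8                  # events over 3 years
MB, GB = 10**6, 10**9
TAPE_BYTES = 30 * GB            # 30 GB tapes
N_FILES = 10**6                 # number of files quoted on the slide

low_total = EVENTS * 1 * MB     # 1 MB per event
high_total = EVENTS * 10 * MB   # 10 MB per event

tapes_needed = high_total / TAPE_BYTES
avg_file_bytes = high_total / N_FILES

print(f"total size: {low_total:.0e} to {high_total:.0e} bytes")
print(f"tapes at 30 GB each: {tapes_needed:,.0f}")
print(f"average file size: {avg_file_bytes / GB:.1f} GB")
```

At the upper bound this gives 10^15 bytes, roughly 33,000 tapes and an average file size of 1 GB, in line with the quoted "~30,000 tapes" and "0.1 - 1 GB files".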

Page 8: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

EXAMPLE OF EVENT PROPERTY VALUES

I event 1
I N(1) 9965
I N(2) 1192
I N(3) 1704
I Npip(1) 2443
I Npip(2) 551
I Npip(3) 426
I Npim(1) 2480
I Npim(2) 541
I Npim(3) 382
I Nkp(1) 229
I Nkp(2) 30
I Nkp(3) 50
I Nkm(1) 209
I Nkm(2) 23
I Nkm(3) 32
I Np(1) 255
I Np(2) 34
I Np(3) 24
I Npbar(1) 94
I Npbar(2) 12
I Npbar(3) 24
I NSEC(1) 15607
I NSEC(2) 1342
I NSECpip(1) 638
I NSECpip(2) 191
I NSECpim(1) 728
I NSECpim(2) 206
I NSECkp(1) 3
I NSECkp(2) 0
I NSECkm(1) 0
I NSECkm(2) 0
I NSECp(1) 524
I NSECp(2) 244
I NSECpbar(1) 41
I NSECpbar(2) 8
R AVpT(1) 0.325951
R AVpT(2) 0.402098
R AVpTpip(1) 0.300771
R AVpTpip(2) 0.379093
R AVpTpim(1) 0.298997
R AVpTpim(2) 0.375859
R AVpTkp(1) 0.421875
R AVpTkp(2) 0.564385
R AVpTkm(1) 0.435554
R AVpTkm(2) 0.663398
R AVpTp(1) 0.651253
R AVpTp(2) 0.777526
R AVpTpbar(1) 0.399824
R AVpTpbar(2) 0.690237
I NHIGHpT(1) 205
I NHIGHpT(2) 7
I NHIGHpT(3) 1
I NHIGHpT(4) 0
I NHIGHpT(5) 0

54 Properties, as many as 10^8 events

Page 9: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Opportunities for optimization

• Prevent / eliminate unwanted queries
  => query estimation (fast estimation index)
• Read only events qualified for a query from a file (avoid reading irrelevant events)
  => exact index over all properties
• Share files brought into cache by multiple queries
  => look ahead for files needed and cache management
• Read files from same tape when possible
  => coordinating file access from tape

Page 10: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

The Storage Access Coordination System (STACS)

[Figure: STACS architecture. Components: User’s Application, Query Estimator (QE), Bit-Sliced index, Query Monitor (QM), Caching Policy Module, File Catalog (FC), Cache Manager (CM), Disk Cache. Labelled interactions: query estimation / execution requests; file caching request; file caching; file purging; open, read, close.]

Page 11: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

A typical SQL-like Query

SELECT *
FROM star_dataset
WHERE 500<total_tracks<1000 & energy<3

-- The index will generate a set of files {F6 : E4,E17,E44, F13 : E6,E8,E32, …, F1036 : E503,E3112} that the query needs

-- The files can be returned to the application in any order
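To make the shape of the index result concrete, the sketch below evaluates the same predicate over a handful of made-up rows and groups the qualifying events by file. It is a plain scan, not the bit-sliced index described later; the row layout and all values are hypothetical.

```python
from collections import defaultdict

# Hypothetical per-event summary rows: (event_id, file_id, total_tracks, energy).
events = [
    ("E4", "F6", 640, 2.1),
    ("E5", "F6", 1200, 2.5),
    ("E17", "F6", 910, 1.7),
    ("E6", "F13", 520, 2.9),
    ("E7", "F13", 700, 3.4),
]

def files_for_query(rows):
    """Return {file_id: [event_id, ...]} for 500 < total_tracks < 1000 and energy < 3."""
    result = defaultdict(list)
    for event_id, file_id, total_tracks, energy in rows:
        if 500 < total_tracks < 1000 and energy < 3:
            result[file_id].append(event_id)
    return dict(result)

plan = files_for_query(events)
print(plan)
```

Because the result is keyed by file, the files can be handed to the application in whatever order they become available in the cache.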

Page 12: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Tracking (1)

Page 13: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Tracking (2)

Page 14: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Tracking

[Figure: file tracking display with markers "query 1 start", "query 2 start", "query 3 start" and a region labelled "All 3 queries"]

Page 15: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Typical Processing Flow

[Figure: processing flow diagram. Participants: User Code, Query Object, Query Estimator, Index, Event Iterators, Query Monitor, Policy Module, Cache Manager, File Catalog, Local Disk. Numbered steps as labelled on the slide:]
1 new Query / Quick Estimate
2 Execute / Full Estimate
3 execute
4 request
7 stage
8 file info
9 File Caching Request
10 File Caching
11 staged
12 retrieve
14 release
15 purge
16 purge
17 purged
18 done
[Steps 5, 6 and 13: no label recovered. Other labels on the diagram: whichFileToCache, FileIDToCache.]

Page 16: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

The Storage Access Coordination System (STACS)

[Figure: STACS architecture diagram, repeated from Page 10]

Page 17: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Bit-Sliced Index: used by Query Estimator

• Index size
  — property space: 10^8 events x 100 properties x 4 bytes = 40 GB
  — index requirements:
    • range queries (10 < Np < 20) ^ (0.1 < AVpT < .2)
    • number of properties involved is small: 3-5
• Problem
  — how to organize property space index

Page 18: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

indexing over all properties

• Multi-dimensional index methods
  — partitioning MD space (KD-trees, n-QUAD-trees, ...)
  — for high dimensionality - either fanout or tree depth too large
  — e.g. symmetric n-QUAD-trees require 2^100 fanout
  — non-symmetric solutions are order dependent

[Figure: QUAD-tree and KD-tree examples]

Page 19: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Partitioning property space

• One possible solution
  — partition property space into subsets
  — e.g. 7 dimensions at a time
• Performance
  — good for non-partial range queries (full hypercube)
  — bad if only few of the dimensions in each partition are involved in query
• S. Berchtold, C. Bohm, H. Kriegel, The Pyramid-Technique: Towards Breaking the Curse of Dimensionality, SIGMOD 1998
  — best for non-skewed (random) data
  — best for full hypercube queries
  — for partial range (e.g. 3 out of 100) close to sequential scan

Page 20: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Bit-Sliced Index

Solution: take advantage of the fact that the index only needs to be append-only
• partition each property into bins
  — (e.g. for 0<Np<300, have 20 equal size bins)
• for each bin generate a bit vector
• compress each bit vector (run length encoding)

[Figure: blocks of 0/1 bit patterns labelled property 1, property 2, ..., property n]
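The binning step can be sketched in a few lines. This is a minimal illustration and not the STACS implementation: it assumes equal-width bins, values that lie within [low, high], and plain Python lists as uncompressed bit vectors. The function name and the sample Np values are made up for the example.

```python
def build_bit_sliced_index(values, low, high, n_bins):
    """Return one bit vector (list of 0/1) per bin.

    Bit e of bin b is 1 when the value of event e falls into bin b.
    Assumes low <= value <= high for every value.
    """
    width = (high - low) / n_bins
    vectors = [[0] * len(values) for _ in range(n_bins)]
    for event, value in enumerate(values):
        # Clamp so that value == high lands in the last bin.
        b = min(int((value - low) / width), n_bins - 1)
        vectors[b][event] = 1
    return vectors

# Hypothetical Np values, one per event, binned as in the slide's example
# (range 0..300, 20 equal-size bins of width 15).
np_values = [255, 34, 24, 150, 299, 0, 14]
index = build_bit_sliced_index(np_values, low=0, high=300, n_bins=20)

for b, vector in enumerate(index):
    if any(vector):
        print(f"bin {b:2d}: {''.join(map(str, vector))}")
```

Each event sets exactly one bit per property, so new events only append bits at the end of every vector, which is what makes the append-only assumption useful.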

Page 21: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Run Length Compression

Uncompressed:
0000000000001111000000000 ...... 0000001000000001111111100000000 .... 000000

Compressed:
12, 4, 1000, 1, 8, 1000

Store very short sequences as-is

Advantage: can perform AND, OR, COUNT operations on compressed data
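The claim that AND and COUNT work directly on compressed data can be illustrated with a small sketch. The encoding used here is an assumption made for the example: a list of run lengths that alternates between runs of 0s and runs of 1s and always starts with a (possibly empty) run of 0s. The slide does not spell out the exact format used in STACS, and the rule of storing short sequences as-is is not modelled.

```python
def rle_encode(bits):
    """Encode a 0/1 sequence as run lengths: 0-run, 1-run, 0-run, ..."""
    runs, current, length = [], 0, 0
    for bit in bits:
        if bit == current:
            length += 1
        else:
            runs.append(length)
            current, length = bit, 1
    runs.append(length)
    return runs

def rle_count(runs):
    """COUNT: the 1-runs sit at odd positions, so just add them up."""
    return sum(runs[1::2])

def rle_and(a, b):
    """AND two run-length vectors of equal total length without expanding them."""
    pieces = []                      # [bit, length] pieces of the result
    ia = ib = 0
    rem_a, rem_b = a[0], b[0]
    while ia < len(a) and ib < len(b):
        step = min(rem_a, rem_b)     # overlap of the two current runs
        if step:
            bit = (ia % 2) & (ib % 2)
            if pieces and pieces[-1][0] == bit:
                pieces[-1][1] += step
            else:
                pieces.append([bit, step])
        rem_a -= step
        rem_b -= step
        if rem_a == 0:               # advance to the next run of a
            ia += 1
            rem_a = a[ia] if ia < len(a) else 0
        if rem_b == 0:               # advance to the next run of b
            ib += 1
            rem_b = b[ib] if ib < len(b) else 0
    runs = [0] if pieces and pieces[0][0] == 1 else []
    runs.extend(length for _, length in pieces)
    return runs or [0]

x = [0, 0, 1, 1, 1, 0, 1]
y = [1, 0, 1, 0, 1, 1, 1]
both = rle_and(rle_encode(x), rle_encode(y))
print(both, rle_count(both))
```

The AND loop touches each run once, so its cost grows with the number of runs rather than with the number of events.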

Page 22: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Bit-Sliced Index

Advantages:
• space for index very small - can fit in memory
• Need only touch properties involved in queries (vertical partitioning)
• Need only touch bins involved

Query Estimation in memory only !!

[Figure: two blocks of 0/1 bit patterns, with the label "min-max"]

Page 23: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Inner Bins vs. Edge Bins

[Figure: events plotted as points over Range(x) and Range(y), with edge bins labelled]
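The distinction between inner and edge bins is what yields a min-max estimate from the in-memory index alone. The sketch below works on per-bin event counts for a single property; it assumes half-open bins and a half-open query range (lo <= value < hi). The function name and the sample numbers are hypothetical.

```python
def estimate_count(bin_edges, bin_counts, lo, hi):
    """Return (min_count, max_count) of events with lo <= value < hi.

    bin_edges holds len(bin_counts) + 1 ascending boundaries; bin i
    covers the half-open interval [bin_edges[i], bin_edges[i + 1]).
    """
    min_count = max_count = 0
    for i, count in enumerate(bin_counts):
        left, right = bin_edges[i], bin_edges[i + 1]
        if right <= lo or left >= hi:
            continue                   # bin lies entirely outside the range
        max_count += count             # any overlapping bin may contribute
        if lo <= left and right <= hi:
            min_count += count         # inner bin: all of its events qualify
    return min_count, max_count

edges = [0, 15, 30, 45, 60]
counts = [10, 20, 30, 40]
print(estimate_count(edges, counts, lo=10, hi=45))
```

Here the bin [0, 15) is an edge bin, so its 10 events appear only in the upper bound; resolving them exactly would require looking at the actual property values.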

Page 24: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

[Figure: query evaluation data flow. Input: properties range conditions. Bit-Sliced index (in memory, ~ 50-100 MB) yields events in bin 1, events in bin 2, ... and events in edge bins; vertical partitions (on disk, ~ 20-40 GB) are consulted for the events in edge bins. Output: event list, i.e. file list and events that qualify in each file.]

Page 25: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Experimental Results on Index

• Simulated dataset (hijing)
  — 10 million events
  — 70 properties
• property space
  — BSI: 2.5 GB
  — Oracle: 3.5 GB
• index size
  — BSI: 280 MB (4 MB/property)
  — Oracle: 7 GB (100 MB/property)
• index creation time
  — BSI: 3 hours (~2.5 min / property)
  — Oracle: 47 hours (~40 min / property)

Page 26: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Experimental Results on Index

• Run a “count” query (preliminary)
  — BSI:
    • 1 property: 14 - 70 sec (depending on size of range)
    • 2 properties: 90 sec (both about half the range)
    • => linear with # of bins touched
  — Oracle:
    • 1 property: comparable (counts only)
    • 2 properties: > 2 hours !
      — use one index, loop on table
    • => Need to tune Oracle
      — run “analyze” on indexes, choose policy
      — bitmap index - did not help
    • => After tuning: 12 min

Page 27: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

The Storage Access Coordination System (STACS)

[Figure: STACS architecture diagram, repeated from Page 10]

Page 28: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Bundles: Multiple Event Components

[Figure: events e1 to e9, each with a Component A and a Component B. Files of Component A and Files of Component B are shown as File 1, File 2, File 3 and File 4.]

File Bundles: (F1,F2: e1,e2,e3,e5), (F3,F2: e4,e7), (F3,F4: e6,e8,e9)
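One way to read the bundle list: events that need exactly the same combination of files form one bundle. The sketch below groups events by that combination. The event-to-file placement is reconstructed from the bundles listed on the slide (the figure itself is not available), and the function name is made up for illustration.

```python
from collections import defaultdict

# Event-to-file placement per component, reconstructed from the slide's bundles.
component_a = {"e1": "F1", "e2": "F1", "e3": "F1", "e5": "F1",
               "e4": "F3", "e6": "F3", "e7": "F3", "e8": "F3", "e9": "F3"}
component_b = {"e1": "F2", "e2": "F2", "e3": "F2", "e4": "F2",
               "e5": "F2", "e7": "F2",
               "e6": "F4", "e8": "F4", "e9": "F4"}

def file_bundles(events, *components):
    """Group events by the tuple of files holding their requested components."""
    bundles = defaultdict(list)
    for event in events:
        key = tuple(component[event] for component in components)
        bundles[key].append(event)
    return dict(bundles)

bundles = file_bundles(sorted(component_a), component_a, component_b)
for files, events in bundles.items():
    print(files, events)
```

All files of a bundle have to be in cache at the same time before any of its events can be processed, which is why bundles rather than single files become the unit of scheduling.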

Page 29: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

A typical SQL-like Query for Multiple Components

SELECT Vertices, Raw
FROM star_dataset
WHERE 500<total_tracks<1000 & energy<3

-- The index will generate a set of bundles {[F7, F16: E4,E17,E44], [F13, F16: E6,E8,E32], …} that the query needs

-- The bundles can be returned to the application in any order

-- Bundle: the set of files that need to be in cache at the same time

Page 30: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Weight Policy for Managing File Bundles

• File weight (bundle) = 1 if it appears in a bundle, = 0 otherwise

• Initial file weight = SUM (all bundles for each query) over all queries

\[
IFW(F_k) = \sum_{\text{all queries } i} \; \sum_{\text{all bundles } j \text{ in query } i} FW_{ij}(F_k),
\qquad
FW_{ij}(F_k) =
\begin{cases}
1 & \text{if } F_k \text{ is in bundle } j \\
0 & \text{otherwise}
\end{cases}
\]

• Example:
  — query 1: file F_K appears in 5 bundles
  — query 2: file F_K appears in 3 bundles
  Then, IFW(F_K) = 8

Page 31: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

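File Weight Policy for Managing File Bundles (cont’d)

• Dynamic file weight: the file weight for a file in a bundle that was processed is decremented by 1

\[
DFW(F_k) = IFW(F_k) - \sum_{\text{all queries } i} \; \sum_{\text{processed bundles } j \text{ in query } i} FW_{ij}(F_k),
\qquad
FW_{ij}(F_k) =
\begin{cases}
1 & \text{if } F_k \text{ is in bundle } j \\
0 & \text{otherwise}
\end{cases}
\]

• Dynamic bundle weight

\[
DBW(B_i) = \sum_{\text{all files } k \text{ in bundle } i} DFW(F_k)
\]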
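The weight definitions translate almost directly into code. The following sketch computes the initial file weight, the dynamic file weight after a bundle has been processed, and the dynamic bundle weight. The workload (two queries with two bundles each) and the function names are invented for illustration.

```python
from collections import Counter

# Hypothetical workload: each query is a list of bundles,
# each bundle a set of file names.
queries = {
    "q1": [{"F1", "F2"}, {"F3", "F2"}],
    "q2": [{"F3", "F4"}, {"F1", "F2"}],
}

def initial_file_weights(queries):
    """IFW(F_k): number of (query, bundle) pairs in which file F_k appears."""
    weights = Counter()
    for bundles in queries.values():
        for bundle in bundles:
            weights.update(bundle)   # each file in the bundle counts once
    return weights

def process_bundle(dfw, bundle):
    """Decrement the dynamic weight of every file in a processed bundle by 1."""
    for f in bundle:
        dfw[f] -= 1

def bundle_weight(dfw, bundle):
    """DBW(B_i): sum of the dynamic weights of the files in the bundle."""
    return sum(dfw[f] for f in bundle)

ifw = initial_file_weights(queries)
dfw = Counter(ifw)                       # dynamic weights start at IFW
process_bundle(dfw, queries["q1"][0])    # q1 finished bundle {F1, F2}
print(dict(ifw), dict(dfw), bundle_weight(dfw, {"F3", "F2"}))
```

A file whose dynamic weight reaches 0 is no longer needed by any pending bundle, which makes it a natural candidate for purging.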

Page 32: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

How file weights are used for caching and purging

• Bundle caching policy
  — For each query, in turn, cache the bundle with the most files in cache
  — In case of a tie, select the bundle with the highest weight
  — Ensures that bundles that include files needed by other bundles/queries have priority
• File purging policy
  — No file purging occurs till space is needed
  — Purge the file not in use with the smallest weight
  — Ensures that files needed in other bundles stay in cache
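Both policies reduce to a selection over weights. The sketch below is one possible reading: bundles are compared first by the number of files already cached and then by bundle weight, and the purge victim is the cached file with the smallest weight that is not in use. The function names and sample data are hypothetical, and ties between purge candidates are broken arbitrarily.

```python
def choose_bundle(bundles, cache, dfw):
    """Pick the bundle with the most files in cache; break ties by bundle weight."""
    return max(bundles, key=lambda b: (len(b & cache), sum(dfw[f] for f in b)))

def choose_purge_victim(cache, in_use, dfw):
    """Pick the cached, unused file with the smallest weight, or None."""
    candidates = cache - in_use
    return min(candidates, key=lambda f: dfw[f]) if candidates else None

dfw = {"F1": 1, "F2": 2, "F3": 2, "F4": 1}       # hypothetical dynamic file weights
cache = {"F1", "F2", "F3"}
bundles = [frozenset({"F1", "F2"}), frozenset({"F3", "F2"}), frozenset({"F3", "F4"})]

next_bundle = choose_bundle(bundles, cache, dfw)
victim = choose_purge_victim(cache, in_use={"F3"}, dfw=dfw)
print(sorted(next_bundle), victim)
```

In this example the first two bundles both have two files in cache, so the tie is broken by weight (3 versus 4) and the bundle {F2, F3} is chosen.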

Page 33: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Other policies

• Pre-fetching policy
  — queries can request pre-fetching of bundles subject to a limit
  — Currently, limit set to two bundles
  — multiple pre-fetching useful for parallel processing
• Query service policy
  — queries serviced in Round Robin fashion
  — queries that have all their bundles cached and are still processing are skipped

Page 34: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Managing the queues

[Figure: queue structures: Query Queue, Bundle Set, File Set, Files Being Processed, Files in Cache]

Page 35: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Tracking of Bundles

[Figure: file tracking display for bundles, with annotations "Query 1 starts here", "Query 2 starts here", "Bundle was found in cache", "Bundle shared by two queries" and "Bundle (3 files) formed, then passed to query"]

Page 36: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Summary

• The key to managing bundle caching and purging policies is weight assignment
  — caching - based on bundle weight
  — purging - based on file weight
• Other file weight policies are possible
  — e.g. based on bundle size
  — e.g. based on tape sharing
• Proving which policy is best - a hard problem
  — can test in real system - expensive, need stand alone
  — simulation - too many parameters in query profile can vary: processing time, inter-arrival time, number of drives, size of cache, etc.
  — model with a system of queues - hard to model policies
  — we are working on last two methods

Page 37: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

The Storage Access Coordination System (STACS)

[Figure: STACS architecture diagram, repeated from Page 10]

Page 38: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Queuing File Transfers

• Number of PFTPs to HPSS is limited
  — limit set by a parameter - NoPFTP
  — parameter can be changed dynamically
• CM is multi-threaded
  — issues and monitors multiple PFTPs in parallel
• All requests beyond PFTP limit are queued
• File Catalog used to provide for each file
  — HPSS path/file_name
  — Disk cache path/file_name
  — File size
  — tape ID

Page 39: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Queue Management

• Goal
  — minimize tape mounts
  — still respect the order of requests
  — do not postpone unpopular tapes forever
• File clustering parameter - FCP
  — If the file at top of queue is in Tape_i and FCP > 1 (e.g. 5) then up to 4 files from Tape_i will be selected to be transferred next
  — then, go back to file at top of queue
• Parameter can be set dynamically

[Figure: request queue with four files marked F(Ti); positions numbered 1 to 5]

Page 40: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Queue Management

• Goal
  — minimize tape mounts
  — still respect the order of requests
  — do not postpone unpopular tapes forever
• File clustering parameter - FCP
  — If the file at top of queue is in Tape_i and FCP > 1 (e.g. 4) then up to 4 files from Tape_i will be selected to be transferred next
  — then, go back to file at top of queue
• Parameter can be set dynamically

[Figure: request queue with files F1(Ti), F2(Ti), F3(Ti), F4(Ti); "Order of file service" numbered 1 to 5]
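The clustering rule can be sketched as a reordering of the request queue. The two versions of this slide differ in how FCP relates to the number of extra files, so the sketch has to pick one reading: after the file at the top of the queue is taken, up to FCP - 1 further files from the same tape are pulled forward, which makes FCP = 1 plain first-come-first-served. That reading, the function name and the sample queue are all assumptions.

```python
def service_order(queue, tape_of, fcp):
    """Return the order in which queued files are transferred.

    Assumption: the file at the top of the queue is served first, then up
    to fcp - 1 further files on the same tape are pulled forward, then the
    scheduler returns to the (new) top of the queue.
    """
    pending = list(queue)
    order = []
    while pending:
        head = pending.pop(0)
        order.append(head)
        extra = max(fcp - 1, 0)
        same_tape = [f for f in pending if tape_of[f] == tape_of[head]][:extra]
        for f in same_tape:
            pending.remove(f)
            order.append(f)
    return order

# Hypothetical request queue; the letter of each file name encodes its tape.
queue = ["a1", "b1", "a2", "c1", "a3", "b2"]
tape_of = {f: "T" + f[0] for f in queue}

for fcp in (1, 2, 3):
    print(fcp, service_order(queue, tape_of, fcp))
```

Returning to the top of the queue after each cluster is what keeps request order broadly respected and prevents a popular tape from starving the others.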

Page 41: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

File Caching Order for different File Clustering Parameters

[Figure: two panels, File Clustering Parameter = 1 and File Clustering Parameter = 10]

Page 42: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Transfer Rate (Tr) Estimates

• Need Tr to estimate total time of a query
• Tr is average over recent file transfers from the time PFTP request is made to the time transfer completes. This includes:
  — mount time, seek time, read to HPSS RAID, transfer to local cache over network
• For dynamic network speed estimate
  — check total bytes for all files being transferred over small intervals (e.g. 15 sec)
  — calculate moving average over n intervals (e.g. 10 intervals)
• Using this, actual time in HPSS can be estimated
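The dynamic network speed estimate is a moving average over fixed sampling intervals. A minimal sketch follows, using the slide's example values (15-second intervals, 10 intervals) as defaults; the class and method names are made up, and how the byte counts are actually sampled is left out.

```python
from collections import deque

class TransferRateEstimator:
    """Moving average of bytes per second over the last n sampling intervals."""

    def __init__(self, interval_s=15, n_intervals=10):
        self.interval_s = interval_s
        # A bounded deque drops the oldest sample once n_intervals is reached.
        self.samples = deque(maxlen=n_intervals)

    def record_interval(self, bytes_moved):
        """Record the total bytes moved by all active transfers in one interval."""
        self.samples.append(bytes_moved)

    def rate(self):
        """Return the average rate in bytes per second, or None before any sample."""
        if not self.samples:
            return None
        return sum(self.samples) / (len(self.samples) * self.interval_s)

estimator = TransferRateEstimator(interval_s=15, n_intervals=3)
for moved in (150, 300, 450):
    estimator.record_interval(moved)
first_rate = estimator.rate()
estimator.record_interval(600)      # pushes the oldest sample (150) out
second_rate = estimator.rate()
print(first_rate, second_rate)
```

A short window follows changes in network load quickly, while a longer one smooths out bursts; the slide leaves n as a tunable value.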

Page 43: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Dynamic Display of Various Measurements

Page 44: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Query Estimate

• Given: transfer rate Tr.
• Given a query for which:
  — X files are in cache
  — Y files are in the queue
  — Z files are not scheduled yet
• Let s(file_set) be the total byte size of all files in file_set
• If Z = 0, then
  — QuEst = s(Y’)/Tr
• If Z ≠ 0, then
  — QuEst = (s(T) + q.s(Z))/Tr
  where q is the number of active queries

[Figure: queue diagram with labels F1(Y), F2(Y), F3(Y), F4(Y) and T]
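The two cases of the estimate can be written as one small function. The slide does not define Y’ or T in text; the sketch therefore takes their byte sizes as plain inputs rather than computing them, and reads them, as an assumption based on the figure labels, as the part of the queue that must drain before the query's last queued file arrives (Y’) and the whole current queue (T). The function and parameter names are hypothetical.

```python
def query_estimate(s_y_prime, s_t, s_z, q, tr):
    """Estimated time to complete a query, in seconds.

    s_y_prime: bytes counted for the query's queued files, s(Y')
    s_t:       bytes counted for T, s(T)
    s_z:       bytes of the query's files not scheduled yet, s(Z)
    q:         number of active queries
    tr:        transfer rate Tr in bytes per second
    """
    if tr <= 0:
        raise ValueError("transfer rate must be positive")
    if s_z == 0:                       # Z = 0: everything is already queued
        return s_y_prime / tr
    return (s_t + q * s_z) / tr        # Z != 0: unscheduled files share the queue

all_queued = query_estimate(2e9, 5e9, 0, q=4, tr=1e7)
some_unscheduled = query_estimate(2e9, 5e9, 1e9, q=4, tr=1e7)
print(all_queued, some_unscheduled)
```

Scaling s(Z) by q reflects that files not yet scheduled will be interleaved with requests from the other active queries.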

Page 45: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Reason for q.s(Z)

[Figure: two panels]
• 20 queries of length ~20 minutes launched 20 minutes apart: estimate pretty close
• 20 queries of length ~20 minutes launched 5 minutes apart: estimate bad - requests accumulate in queue

Page 46: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Error Handling

• 5 generic errors
  — file not found
    • return error to caller
  — limit PFTP reached
    • can’t login
    • re-queue request, try later (1-2 min)
  — HPSS error (I/O, device busy)
    • remove part of file from cache, re-queue
    • try n times (e.g. 3), then return error “transfer_failed”
  — HPSS down
    • re-queue request, try repeatedly till successful
    • respond to File_status request with “HPSS_down”
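The handling of an HPSS I/O or device-busy error (clean up the partial file, retry, give up after n tries) can be sketched as a bounded retry loop. The exception class, the function name and the "ok" return value are invented for the example; only the "transfer_failed" status comes from the slide. The actual re-queuing and wait between attempts are omitted.

```python
class HpssIoError(Exception):
    """Hypothetical stand-in for an HPSS I/O or device-busy error."""

def transfer_with_retries(transfer, cleanup, max_tries=3):
    """Call transfer() up to max_tries times.

    After each failed attempt, cleanup() removes the partial file from the
    cache before the request is tried again.
    """
    for _ in range(max_tries):
        try:
            transfer()
            return "ok"
        except HpssIoError:
            cleanup()
    return "transfer_failed"

# Usage: a transfer that fails twice and then succeeds.
state = {"attempts": 0, "cleanups": 0}

def flaky_transfer():
    state["attempts"] += 1
    if state["attempts"] < 3:
        raise HpssIoError("device busy")

def remove_partial_file():
    state["cleanups"] += 1

status = transfer_with_retries(flaky_transfer, remove_partial_file)
print(status, state)
```

Keeping the retry logic in one place is what lets the application see a single generic outcome instead of each transient HPSS failure.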

Page 47: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Summary

• HPSS Hierarchical Resource Manager (HRM)
  — insulates applications from transient HPSS and network errors
  — limits concurrent PFTPs to HPSS
  — manages queue to minimize tape mounts
  — provides file/query time estimates
  — handles errors in a generic way
• Same API can be used for any MSS, such as Unitree, Enstore, etc.

Page 48: STACS: Storage Access Coordination  of Tertiary Storage  for High Energy Physics Applications

STACS

Web pointers

• http://gizmo.lbl.gov/stacs
• http://gizmo.lbl.gov/~arie/download.papers.html -- to download papers
• http://gizmo.lbl.gov/stacs/stacs.slides/index.htm -- a STACS presentation

