Tactical Storage: Simple, Secure, and Semantic Access to Remote Data
Prof. Douglas Thain
University of Notre Dame
http://www.cse.nd.edu/~dthain

As of 25 April 2006...
Condor Worldwide:
– 56,682 CPUs / ??? TB / 1758 sites
Teragrid
– 15,328 CPUs / 220 TB / 6 sites
Open Science Grid
– 21,156 CPUs / 83 TB / 61 sites
EGEE Grid
– Lots???
http://www.cs.wisc.edu/condor/map
Plentiful Computing Power

Complex Ecology of Storage
[Figure: a shared filesystem backed by shared disks, next to independent cluster disks (one private disk per node). Access protocols shown: HTTP, FTP, RFIO, gLite, SRB, SCP, RSYNC, ...]

Problems Accessing Data
Large Burden on the User
– User may not be able/willing to state files in advance.
– Different services/protocols available at different sites.
– Programs not modified to take advantage of services.
Different access modes for different purposes.
– File transfer: preparing system for intended use.
– File system: access to data for running jobs.
Resources go unused.
– Disks on each node of a cluster.
– Unorganized resources in a department/lab.
– Would like to combine disks into larger structures.
A global file system can’t satisfy everyone!
– (Global means different things to different people.)
– Both a technical and social problem.

What’s the Problem?
We often assume that the site administrator is responsible for making the site comfortable for the user. (Not possible on the grid!)
Rather, the user should be able to bring along a mechanism to access multiple independent (remote?) data sources.
Of course, we have to make it easy!

Tactical Storage Systems (TSS)
A TSS allows any node to serve as a file server or as a file system client.
All components can be deployed without special privileges, but with security.
Users can build up complex structures.
– Filesystems, databases, caches, ...
– Admins need not know/care about larger structures.
Two Independent Concepts:
– Resources: the raw storage to be used.
– Abstractions: the organization of storage.
[Figure: seven UNIX machines, each exporting its filesystem through a file server. Apps reach them in three ways: as a simple filesystem, through Parrot and a distributed filesystem abstraction, or through Parrot and a distributed database abstraction. File transfer (3PT) is also shown. The cluster administrator controls policy on all storage in the cluster; workstation owners control policy on each machine.]

Key Properties
Tactical Storage is Simple:
– Appears as an ordinary filesystem.
– Applies to unmodified applications and data w/out code changes, relinking, kernel modules, etc...
Tactical Storage is Secure:
– Authentication with standard GSI or Kerberos.
– Rich distributed access control system.
Tactical Storage is Semantic:
– Name data by meaning, not by location.
– Supports external name resolution mechanisms.

Access Control in File Servers
Unix Security is not Sufficient
– No global user database possible/desirable.
– Mapping external credentials to Unix gets messy.
Instead, Make External Names First-Class
– Perform access control on remote, not local, names.
– Types: Globus, Kerberos, Unix, Hostname, Address
Each directory has an ACL:
globus:/O=NotreDame/CN=DThain RWLA
kerberos:[email protected] RWL
hostname:*.cs.nd.edu RL
address:192.168.1.* RWLA
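
The per-directory ACL above can be illustrated with a small sketch. This is not the file server's actual code: it assumes that `*` is a shell-style wildcard, that each letter is an independent right, and that a caller who matches several entries receives the union of their rights. The function names are hypothetical.

```python
from fnmatch import fnmatchcase

# ACL entries copied from the slide. The Kerberos entry is omitted because
# its principal is obscured in the source.
ACL_TEXT = """\
globus:/O=NotreDame/CN=DThain RWLA
hostname:*.cs.nd.edu RL
address:192.168.1.* RWLA
"""


def parse_acl(text):
    """Return a list of (subject_pattern, rights) pairs."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The rights are the last whitespace-separated field.
        pattern, rights = line.rsplit(None, 1)
        entries.append((pattern, set(rights)))
    return entries


def rights_for(entries, subjects):
    """Union the rights of every entry that matches one of the caller's names.

    A caller may carry several external names at once, for example a
    Globus subject plus the hostname it connected from (an assumption).
    """
    granted = set()
    for pattern, rights in entries:
        if any(fnmatchcase(subject, pattern) for subject in subjects):
            granted |= rights
    return granted


def allowed(entries, subjects, needed):
    """True if every right in `needed` is granted to the caller."""
    return set(needed) <= rights_for(entries, subjects)
```

For example, `allowed(parse_acl(ACL_TEXT), ["hostname:lab3.cs.nd.edu"], "RL")` is true, while asking for `"W"` with the same name is refused. No Unix account is consulted at any point, which is the point of making external names first-class.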

Distributed Group ACLs
[Figure: seven UNIX machines, each exporting its filesystem through a file server. Group lists for Physics, Chemistry, and Lab 5 are stored on separate servers. One data directory, used by two Apps, has the ACL "Lab 5 RW, Chemistry R". Another data directory, used by two other Apps, has the ACL "Physics RW, Lab 5 R".]

Semantic Data Access
/usr/local = /chirp/host5.nd.edu/software
/tmp = /chirp/host9.nd.edu/scratch
/data = /gsiftp/ftp.nd.edu/mydata
/db = resolver:find_db
[Figure: Appl runs on Parrot with the name mappings above. Parrot sends /usr/local to host5, /tmp to host9, and /data to an FTP server. For /db it asks the external resolver find_db "Where is /db/dir/523?" and is told "It’s at /ftp/ftp.infn.it/db/xz".]
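
A minimal sketch of how such a name mapping might be applied, assuming longest-prefix matching and a resolver modelled as a plain Python callable. The `find_db` function below is a hypothetical stand-in that only knows the one answer shown in the figure; the real resolver is an external program whose interface is not described here.

```python
MOUNTLIST = """\
/usr/local = /chirp/host5.nd.edu/software
/tmp = /chirp/host9.nd.edu/scratch
/data = /gsiftp/ftp.nd.edu/mydata
/db = resolver:find_db
"""


def parse_mountlist(text):
    """Return (logical_prefix, target) pairs, longest prefix first."""
    mounts = []
    for line in text.splitlines():
        if "=" not in line:
            continue
        logical, target = (part.strip() for part in line.split("=", 1))
        mounts.append((logical.rstrip("/") or "/", target))
    # More specific mounts must be tried before shorter ones.
    mounts.sort(key=lambda mount: len(mount[0]), reverse=True)
    return mounts


def resolve(path, mounts, resolvers):
    """Translate a logical path into the physical name to open."""
    for prefix, target in mounts:
        base = prefix.rstrip("/")
        # Match whole path components only, so /tmpfile does not hit /tmp.
        if path == prefix or path.startswith(base + "/"):
            if target.startswith("resolver:"):
                name = target[len("resolver:"):]
                return resolvers[name](path)
            return target.rstrip("/") + path[len(base):]
    return path  # No mapping applies: leave the name alone.


def find_db(path):
    """Hypothetical resolver holding the single lookup from the figure."""
    table = {"/db/dir/523": "/ftp/ftp.infn.it/db/xz"}
    return table[path]
```

With `mounts = parse_mountlist(MOUNTLIST)`, the call `resolve("/usr/local/bin/gcc", mounts, {"find_db": find_db})` yields `/chirp/host5.nd.edu/software/bin/gcc`, and `/db/dir/523` is handed to the resolver. The application never sees either translation.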

Remote Database Access
[Figure: a script, sim.exe, and libdb.so run on Parrot and reach a file server across the WAN with GSI authentication. The file server exports a simple filesystem that holds the DB data.]
HEP Simulation Needs Direct DB Access
– App linked against Objectivity DB.
– Objectivity accesses filesystem directly.
– How to distribute application securely?
Solution: Remote Root Mount via Parrot:
parrot -M /=/chirp/fileserver/rootdir
DB code can read/write/lock files directly.
Credit: Sander Klous @ NIKHEF

Remote Application Loading
[Figure: appl runs on Parrot and loads liba.so, libb.so, and libc.so over HTTP, through proxies, from the filesystem of an HTTP server. It selects several MB from 60 GB of libraries.]
Modular Simulation Needs Many Libraries
– Devel. on workstations, then ported to grid.
– Selection of library depends on analysis tech.
– Constraint: Must use HTTP for file access.
Solution: Dynamic Link with TSS+HTTP:
– /home/cdfsoft -> /http/dcaf.fnal.gov/cdfsoft
Credit: Igor Sfiligoi @ Fermi National Lab

Technical Problem
HTTP is not a filesystem! (No directories)
– Advantages: Firewalls, caches, admins.
[Figure: Appl calls opendir(/home). Parrot's HTTP module sends "GET /home HTTP/1.0" to the HTTP server, whose tree is root -> etc, home, bin and home -> alice, cms, babar. The reply is an HTML page (<HTML><HEAD> ... <H1> ...), not a directory listing.]

Technical Problem
Solution: Turn the directories into files.
– Can be cached in ordinary proxies!
– Hierarchical SHA1 integrity check.
[Figure: makehttpfs writes a .dir file into each directory on the HTTP server (root -> etc, home, bin and home -> alice, cms, babar). When Appl calls opendir(/home), Parrot's HTTP module sends "GET /home/.dir HTTP/1.0" and receives the listing: alice, babar, cms.]
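
The idea of turning directories into files can be sketched as follows. This is an illustration, not makehttpfs itself: the listing format (`kind checksum name`, one entry per line) is an assumption, and so is the way checksums are chained. Each file is listed with the SHA-1 of its contents and each subdirectory with the SHA-1 of its own listing, so the checksum of the top-level listing covers the whole tree.

```python
import hashlib
import os


def sha1_of_file(path):
    """Return the hex SHA-1 of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_dir_listings(root, listing_name=".dir"):
    """Write a listing file into every directory under root.

    Works bottom-up and returns the SHA-1 of root's own listing.
    Assumes the tree contains no symbolic-link cycles.
    """
    lines = []
    for name in sorted(os.listdir(root)):
        if name == listing_name:
            continue  # Never list a previously generated listing.
        path = os.path.join(root, name)
        if os.path.isdir(path):
            # A directory is identified by the checksum of its listing.
            checksum = make_dir_listings(path, listing_name)
            kind = "D"
        else:
            checksum = sha1_of_file(path)
            kind = "F"
        lines.append(f"{kind} {checksum} {name}\n")
    listing_path = os.path.join(root, listing_name)
    with open(listing_path, "w", encoding="utf-8") as handle:
        handle.writelines(lines)
    return sha1_of_file(listing_path)
```

A client that trusts the top-level checksum can then verify each listing and file it fetches, even when the bytes come from an untrusted proxy cache. Because the listings are ordinary static files, the HTTP server and the proxies need no changes.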

Logical Access to Bio Data
Many databases of biological data in different formats around the world:
– Archives: Swiss-Prot, TrEMBL, NCBI, etc...
– Replicas: Public, Shared, Private, ???
Users and applications want to refer to data objects by logical name, not location!
– Access the nearest copy of the non-redundant protein database, don’t care where it is.
Solution: EGEE data management system maps logical names (LFNs) to physical names (SFNs).
Credit: Christophe Blanchet, Bioinformatics Center of Lyon, CNRS IBCP, France
http://gbio.ibcp.fr/cblanchet, [email protected]

Logical Access to Bio Data
[Figure: the user runs BLAST on LFN://ncbi.gov/nr.data under Parrot, which has RFIO, gLite, HTTP, and FTP drivers. BLAST calls open(LFN://ncbi.gov/nr.data). Parrot asks the EGEE File Location Service "Where is LFN://ncbi.gov/nr.data?" and is told "Find it at: FTP://ibcp.fr/nr.data". Parrot then performs open(FTP://ibcp.fr/nr.data) and sends RETR nr.data to the FTP server. The servers shown are a Chirp server, an FTP server, and a gLite server, with three copies of nr.data.]
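
The lookup in the figure can be sketched as a two-step open: ask a catalog for the physical names of a logical name, then pick one the client has a driver for. The dictionary below is a hypothetical stand-in for the EGEE file location service, and taking the first usable replica is a simplification; choosing the nearest copy is not modelled.

```python
from urllib.parse import urlsplit

# Hypothetical catalog holding the single mapping shown in the figure.
CATALOG = {
    "LFN://ncbi.gov/nr.data": ["FTP://ibcp.fr/nr.data"],
}

# Protocols the client can speak, after the driver list in the figure.
SUPPORTED = ("rfio", "glite", "http", "ftp")


def locate(lfn, catalog, supported=SUPPORTED):
    """Return the first physical name (SFN) the client can access."""
    for sfn in catalog.get(lfn, []):
        # urlsplit lowercases the scheme, so "FTP" and "ftp" compare equal.
        if urlsplit(sfn).scheme in supported:
            return sfn
    raise FileNotFoundError(f"no usable replica for {lfn}")
```

Here `locate("LFN://ncbi.gov/nr.data", CATALOG)` returns `FTP://ibcp.fr/nr.data`, which is the name the adapter would open on behalf of BLAST.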

Performance of Bio Apps on EGEE
[Figure: runtime (sec, axis 0 to 450) versus protein database size (sequences, axis 0 to 1,200,000). Series: BLAST+Parrot, FastA+Parrot, SSearch+Parrot, BLAST+copy, FastA+copy, SSearch+copy.]

Expandable Filesystem for Experimental Data
Project GRAND http://www.nd.edu/~grand
[Figure: data arrives on a buffer disk at 2 GB/day today (could be lots more!) and is written to daily tapes that make up a 30-year archive. The analysis code can only analyze the most recent data.]
Credit: John Poirier @ Notre Dame Astrophysics Dept.

Expandable Filesystem for Experimental Data
Project GRAND http://www.nd.edu/~grand
[Figure: data arrives on a buffer disk at 2 GB/day today (could be lots more!) and is written to daily tapes that make up a 30-year archive. Four file servers are combined into a distributed shared filesystem. Through an adapter, the analysis code can analyze all data over large time scales.]
Credit: John Poirier @ Notre Dame Astrophysics Dept.

Current Work
Now that we can easily use any storage...
– Much easier to arrange data/jobs arbitrarily.
– Idea: combine cluster storage / cluster comp!
– Goal: keep jobs close to data that they need.
– PINS: Processing in Storage
Example: GEMS Distributed Databank
– Facility for creating, storing, and analyzing molecular dynamics data in a cluster.
– Goal: Be able to easily scale both CPU and storage capacity by adding commodity nodes.
Credit: Jesus Izaguirre and Aaron Striegel @ Notre Dame
[Figure: seven UNIX machines, each exporting its filesystem through a file server, joined by a distributed filesystem abstraction that an App reaches through an adapter. A meta-data database answers queries such as (Mol=="CH4") && (T>300K). Data sets D1 to D4 are spread over the nodes, with D1 stored more than once, and jobs J1 to J4 run on the nodes. For a function F applied to D1, two steps are labelled: "Fetch D1" and "Compute F(D1)".]
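
One way to picture "keep jobs close to data" is a placement step that runs after the metadata query. The sketch below is an assumption about how such a step could look, not a description of GEMS: the record fields, node names, and the least-loaded tie-break are all hypothetical.

```python
from collections import Counter

# Hypothetical metadata records: properties plus the nodes holding a copy.
RECORDS = [
    {"name": "D1", "mol": "CH4", "temp_k": 310, "replicas": ["node1", "node2"]},
    {"name": "D2", "mol": "CH4", "temp_k": 280, "replicas": ["node2"]},
    {"name": "D3", "mol": "H2O", "temp_k": 350, "replicas": ["node3"]},
    {"name": "D4", "mol": "CH4", "temp_k": 400, "replicas": ["node4"]},
]


def query(records, predicate):
    """Select the data sets whose metadata satisfies the predicate."""
    return [record for record in records if predicate(record)]


def place_jobs(datasets):
    """Assign one job per data set to a node that already stores it.

    Among the nodes holding a replica, pick the one with the fewest
    jobs assigned so far; ties are broken by node name.
    """
    load = Counter()
    placement = {}
    for dataset in datasets:
        node = min(dataset["replicas"], key=lambda n: (load[n], n))
        load[node] += 1
        placement[dataset["name"]] = node
    return placement


# The query from the figure: (Mol == "CH4") && (T > 300K).
selected = query(RECORDS, lambda r: r["mol"] == "CH4" and r["temp_k"] > 300)
placement = place_jobs(selected)
```

With these records the query selects D1 and D4, and each job is sent to a node that already holds its input, so F(D1) is computed in place instead of fetching D1 across the network.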

More Open Problems
Resource Management
– How to prevent overcommitment -> badput?
Security
– How to easily express complex policies for sharing and controlling combined cpu/disk?
Reliability
– How to deal with disconnection, erasure, rejection, unexpected performance, etc...
Garbage Collection
– What’s to prevent me from filling every disk everywhere with computations that I might need?
Debugging
– How do we dig out of numerous, noisy, distributed logs the state relevant to a complex workflow?

Conclusion
Tactical storage allows end users to build large structures out of simple building blocks without getting stuck on the ugly details.

Acknowledgments
Science Collaborators:
– Christophe Blanchet
– Patrick Flynn
– Sander Klous
– Peter Kunzst
– Erwin Laure
– John Poirier
– Igor Sfiligoi
CS Collaborators:
– Jesus Izaguirre
– Aaron Striegel
CS Students:
– Paul Brenner
– James Fitzgerald
– Jeff Hemmes
– Paul Madrid
– Chris Moretti
– Gerhard Niederwieser
– Phil Snowberger
– Justin Wozniak

For more information...
Cooperative Computing Lab
http://www.cse.nd.edu/~ccl
Cooperative Computing Tools
http://www.cctools.org
Douglas Thain
– [email protected]
– http://www.cse.nd.edu/~dthain

Problem: Shared Namespace
[Figure: a file server directory with the ACL "globus:/O=NotreDame/* RWLAX" holds everyone's files side by side: a.out, test.c, test.dat, cms.exe.]
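
Solution: Reservation (V) Right
[Figure: the file server's top-level directory has the ACL "O=NotreDame/CN=* V(RWLA)", which allows mkdir only. When /O=NotreDame/CN=Monk calls mkdir, the new directory gets the ACL "/O=NotreDame/CN=Monk RWLA" and holds Monk's a.out and test.c. When /O=NotreDame/CN=Ted calls mkdir, the new directory gets the ACL "/O=NotreDame/CN=Ted RWLA" and holds Ted's a.out and test.c.]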
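
The reservation right can be read as a rule applied at mkdir time: a caller who matches an entry of the form V(rights) may create a directory, and the new directory starts with an ACL that names only the caller and grants the rights in parentheses. The sketch below encodes that reading. The function name and data layout are hypothetical, and the pattern is written with a leading slash to match the subject names on the slide.

```python
import re
from fnmatch import fnmatchcase


def reserve_mkdir(parent_acl, subject):
    """Return the initial ACL of a directory created by `subject`.

    `parent_acl` is a list of (subject_pattern, rights) pairs. Only an
    entry whose rights have the form V(...) permits the creation.
    """
    for pattern, rights in parent_acl:
        match = re.fullmatch(r"V\(([A-Z]+)\)", rights)
        if match and fnmatchcase(subject, pattern):
            # The new directory belongs to the caller alone.
            return [(subject, match.group(1))]
    raise PermissionError(f"{subject} holds no reservation right here")


# Top-level ACL from the slide (leading slash assumed).
PARENT_ACL = [("/O=NotreDame/CN=*", "V(RWLA)")]
```

Calling `reserve_mkdir(PARENT_ACL, "/O=NotreDame/CN=Monk")` returns `[("/O=NotreDame/CN=Monk", "RWLA")]`, and Ted gets a separate directory in the same way. Neither user can read or write the other's files, which removes the shared-namespace problem without giving anyone rights on the parent beyond mkdir.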