Post on 06-Jan-2016
slide 1
PS1 Prototype Systems Design
Jan Vandenberg, JHU
Early PS1 Prototype
slide 2
Engineering Systems to Support the Database Design
• Raw data size
• Index size
• Most end-user operations I/O-bound
• Loading/ingest more CPU-bound, though we still need solid write performance
• Time to do full table scans
• Time to do index scans
• Need to do most work where the data is; can’t sling TBs over the network quickly
– …though we can brute-force past 1 Gbit Ethernet if necessary
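The scan-versus-network trade-off above is back-of-envelope arithmetic; a minimal Python sketch, where the 10 TB table size, 400 MB/s scan rate, and 1 GbE link speed are illustrative assumptions rather than actual PS1 figures:

```python
# Back-of-envelope timing for "do the work where the data is".
# All sizes and rates below are hypothetical placeholders.

def scan_hours(table_tb, seq_mb_per_s):
    """Hours to stream a table of table_tb TB at seq_mb_per_s MB/s."""
    return table_tb * 1e6 / seq_mb_per_s / 3600

def network_copy_hours(table_tb, gbit):
    """Hours to push the same table over a gbit-Gbit link."""
    return table_tb * 8e6 / (gbit * 1e3) / 3600

if __name__ == "__main__":
    # A 10 TB table scanned locally at 400 MB/s vs. shipped over 1 GbE:
    print(f"local scan:  {scan_hours(10, 400):.1f} h")
    print(f"1 GbE copy:  {network_copy_hours(10, 1):.1f} h")
```

Even this toy model shows the network copy taking several times longer than a local scan, which is the case for shipping queries to the data rather than the reverse.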
slide 3
Fibre Channel, SAN
• Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
• Expensive switch
• Potentially very flexible
• Industrial-strength manageability
• Little control over RAID controller bottlenecks
slide 4
SATA
• Fast
• Cheap
• Ugly, spooky
– <cabling pic>
• Tough to manage
– <dlmsdb/sdssdb drive bay map>
slide 5
SAS
• For our purposes, it’s SATA without the ugliness
• Fast: 12 Gbit/s FD building blocks
• Cheap: PS1 prototype MD1000 pricing versus Newegg media costs
• Not ugly: IB cables versus rats’ nest
• Industrial-strength manageability: pretty blinking lights and mgmt apps versus downtime plus white knuckles
• <cabling pic>
slide 6
I/O Performance of Dell SAS Systems in the PS1 Prototype
slide 7
SAS Performance, Gory Details
• SAS v. SATA differences
• <chart: Native SAS v. SATA performance, MB/s versus number of disks (1–7), 0–500 MB/s scale; roughly a 20% gap>
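The scaling behavior in the chart above can be approximated with a toy model; the 60 MB/s per-disk rate, 500 MB/s controller ceiling, and flat 20% SAS advantage are illustrative assumptions, not measured values from the prototype:

```python
# Toy model of aggregate sequential throughput versus disk count,
# with native SAS assumed ~20% faster per spindle than SATA.
# Per-disk rate and controller ceiling are hypothetical numbers.

def aggregate_mb_s(disks, per_disk_mb_s, ceiling_mb_s):
    """Throughput scales linearly until the controller/bus ceiling."""
    return min(disks * per_disk_mb_s, ceiling_mb_s)

sata = [aggregate_mb_s(n, 60, 500) for n in range(1, 8)]
sas = [aggregate_mb_s(n, 60 * 1.2, 500) for n in range(1, 8)]
```

Under these assumptions both curves climb linearly until the SAS curve hits the controller ceiling first, which is the shape the chart suggests.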
slide 8
Per-Controller Performance
Luckily, one controller is fast enough for one SATA disk box
<performance chart>
slide 9
Resulting PS1 Prototype I/O Topology
<topo diagram> <aggregate performance chart>
slide 10
RAID-5 v. RAID-10?
• Primer, anyone?
• RAID-5 probably feasible with a contemporary controller…
– …though tough to predict real-world effects of latency…
– …and not a ton of redundancy
• But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway!
– Remember sub-Newegg media costs
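The capacity side of the RAID-5 versus RAID-10 trade-off is simple arithmetic; a quick sketch, where the 14-disk array and 0.5 TB disk size are illustrative, not the prototype's actual configuration:

```python
# Usable capacity under the two RAID layouts being compared.
# Disk count and disk size are hypothetical example values.

def raid5_usable(disks, disk_tb):
    """RAID-5 loses one disk's worth of capacity to parity."""
    return (disks - 1) * disk_tb

def raid10_usable(disks, disk_tb):
    """RAID-10 mirrors every disk, halving usable capacity."""
    return disks // 2 * disk_tb

if __name__ == "__main__":
    print(raid5_usable(14, 0.5))   # survives any single disk failure
    print(raid10_usable(14, 0.5))  # survives one failure per mirror pair
```

RAID-10 gives up roughly half the raw capacity, which is why the slide's point matters: once the disk count is driven by performance targets rather than capacity, the mirroring overhead is already paid for.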
slide 11
RAID-10 Performance
Executive summary: RAID0/2 (roughly half RAID-0) throughput for single-threaded reads, full RAID-0 throughput for 2-user/2-thread workloads, and RAID0/2 throughput for writes.
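One reading of the "RAID0/2" shorthand above is half-of-RAID-0 throughput; a toy model under that assumption (the 400 MB/s baseline is an arbitrary example, not a measurement):

```python
# Simplified RAID-10 throughput model relative to an equivalent
# RAID-0 stripe, under the "RAID0/2 = half RAID-0" reading.

def raid10_read_mb_s(raid0_mb_s, threads):
    # One thread streams from one side of each mirror pair; two
    # concurrent threads can be served from both sides, approaching
    # full RAID-0 rates.
    return raid0_mb_s if threads >= 2 else raid0_mb_s / 2

def raid10_write_mb_s(raid0_mb_s):
    # Every write must land on both mirrors, so roughly half RAID-0.
    return raid0_mb_s / 2
```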
slide 12
PS1 Prototype Servers
<diagram of server roles plus storage and network interconnects>
slide 13
PS1 Prototype Servers
<iron photo (w/Will?)>
slide 14
Projected PS1 Systems Design
<diagram of 8-slice triply-replicated systems> <plus geoplex?>
slide 15
Backup/Recovery/Replication Strategies
No formal backup
• …except maybe for mydb’s, f(cost*policy)

3-way replication
• Replication != backup
– Little or no history
– Replicas can be a bit too cozy: must notice badness before replication propagates it
• Replicas provide redundancy and load balancing…
• Fully online: zero time to recover
• Replicas needed for happy production performance plus ingest, anyway

Off-site geoplex
• Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage)
– <lava pic?>
• Could help balance trans-Pacific bandwidth needs (serve continental traffic locally)
slide 16
Why No Traditional Backups?
• Not super pricey…
• …but not very useful relative to a replica for our purposes
– Time to recover
• Money no object? …do traditional backups too!
• Synergy, economy of scale with other collaboration needs (IPP?)? …do traditional backups too!
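The "time to recover" point is the crux; an illustrative comparison, where the 10 TB database size, 100 MB/s tape restore rate, and 5-minute replica cutover are all hypothetical numbers chosen only to show the orders of magnitude involved:

```python
# Time-to-recover arithmetic behind "not very useful relative to a
# replica". All sizes and rates are hypothetical example values.

db_tb = 10                # database size, TB
tape_restore_mb_s = 100   # assumed aggregate tape restore rate
replica_cutover_min = 5   # assumed time to redirect load to a replica

restore_hours = db_tb * 1e6 / tape_restore_mb_s / 3600

if __name__ == "__main__":
    print(f"tape restore: {restore_hours:.0f} h "
          f"vs. replica cutover: {replica_cutover_min} min")
```

A restore measured in days versus a cutover measured in minutes is the gap that makes traditional backups a luxury rather than the recovery path here.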
slide 17
Failure Scenarios
Easy, zero-downtime:
• Disks
• Power supplies
• Fans

Not so spooky, maybe some downtime and manual replica cutover:
• System board (rare)
• Memory (rare, and usually proactively detected and handled via scheduled maintenance)
• Disk controller (rare, potentially minimal downtime via cold-spare controller)
• CPU (not utterly uncommon, can be tough and time-consuming to diagnose correctly)

More spooky:
• Database mangling by human or pipeline error
– Gotta catch this before replication propagates it everywhere
– Can’t replicate too aggressively
– (and so off-the-shelf near-realtime replication tools don’t help us)
• Catastrophic loss of datacenter
– Have the geoplex
– …but we’re dangling by a single copy ’till recovery complete
– …but are we still screwed? Depending on colo scenarios, did we also lose the IPP and flatfile archive?

Terrifying:
• Unrecoverable badness fully replicated before detection
• Catastrophic loss of datacenter without geoplex
• Can we ever catch back up with the data rate if we need to start over?
– At some point in the survey, the answer likely becomes “no”.
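The "can we ever catch back up?" question reduces to whether the reload rate exceeds the survey's ingest rate; a minimal sketch, where the backlog and both rates are hypothetical numbers, not PS1 figures:

```python
# Catch-up feasibility after a start-over: clearing the backlog is
# only possible while reload outpaces ongoing ingest.
# All rates and sizes are hypothetical example values.

def catch_up_days(backlog_tb, reload_tb_day, ingest_tb_day):
    """Days to clear the backlog, or None if we fall further behind."""
    net = reload_tb_day - ingest_tb_day
    return backlog_tb / net if net > 0 else None

if __name__ == "__main__":
    print(catch_up_days(100, 5, 1))  # net gain: finite catch-up time
    print(catch_up_days(100, 5, 5))  # no net gain: the answer is "no"
```

As the survey accumulates data, the backlog grows while the reload rate stays fixed, so the finite case eventually degenerates into the `None` case, which is the slide's point.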
slide 18
State Diagram for Replicas?
• Loading
• Replicating
• Load balancing
• Failing
• Recovering
• Possibly repeat-loading
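The states listed above can be sketched as a small transition table; the edges here are a plausible guess at how the states connect, not a specified design from the deck:

```python
# Minimal sketch of a replica lifecycle state machine.
# The transition edges are assumptions, not a documented design.

TRANSITIONS = {
    "loading":        {"replicating", "failing"},
    "replicating":    {"load-balancing", "failing"},
    "load-balancing": {"failing"},
    "failing":        {"recovering"},
    "recovering":     {"loading", "load-balancing"},  # possibly repeat-loading
}

def can_transition(src, dst):
    """True if the sketched state machine allows src -> dst."""
    return dst in TRANSITIONS.get(src, set())
```

Making the legal transitions explicit like this is one way to answer the slide's question: an operator tool can refuse to put a replica into load balancing before it has finished replicating.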
slide 19
Operating Systems, DBMS?
• SQL Server 2005 EE x64
– Why?
– Why not DB2, Oracle RAC, PostgreSQL, MySQL, <insert your favorite>?
• (Win2003 EE x64) <Why EE?>
• Platform rant from JVV available over beers
– <JVV/beer graphic?>
slide 20
Systems/Database Management
• Active Directory infrastructure
• Windows patching tools, methodology
• Linux patching tools, methodology
• Monitoring
• Staffing requirements
slide 21
Facilities/Infrastructure Projections for PS1
• Cooling
• Rack space
• Network ports
• (plus AD/WSUS/monitoring infrastructure above)
slide 22
Operational Handoff to UofH
Mahalo! (See Ya, Hon!)