Post on 06-Jan-2016
slide 1
PS1 Prototype Systems Design
Jan Vandenberg, JHU
Early PS1 Prototype
slide 2
Engineering Systems to Support the Database Design
• Raw data size
• Index size
• Most end-user operations I/O-bound
• Loading/ingest more CPU-bound, though we still need solid write performance
• Time to do full table scans
• Time to do index scans
• Need to do most work where the data is; can’t sling TBs over the network quickly
– …though we can brute-force past 1 Gbit Ethernet if necessary
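The scan-versus-network trade-off above is back-of-envelope arithmetic; a minimal Python sketch, where the 10 TB table size, 400 MB/s scan rate, and 1 GbE link speed are illustrative assumptions rather than actual PS1 figures:

```python
# Back-of-envelope timing for "do the work where the data is".
# All sizes and rates below are hypothetical placeholders.

def scan_hours(table_tb, seq_mb_per_s):
    """Hours to stream a table of table_tb TB at seq_mb_per_s MB/s."""
    return table_tb * 1e6 / seq_mb_per_s / 3600

def network_copy_hours(table_tb, gbit):
    """Hours to push the same table over a gbit-Gbit link."""
    return table_tb * 8e6 / (gbit * 1e3) / 3600

if __name__ == "__main__":
    # A 10 TB table scanned locally at 400 MB/s vs. shipped over 1 GbE:
    print(f"local scan:  {scan_hours(10, 400):.1f} h")
    print(f"1 GbE copy:  {network_copy_hours(10, 1):.1f} h")
```

Even this toy model shows the network copy taking several times longer than a local scan, which is the case for shipping queries to the data rather than the reverse.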
slide 3
Fibre Channel, SAN
• Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
• Expensive switch
• Potentially very flexible
• Industrial-strength manageability
• Little control over RAID controller bottlenecks
slide 4
SATA
• Fast
• Cheap
• Ugly, spooky
– <cabling pic>
• Tough to manage
– <dlmsdb/sdssdb drive bay map>
slide 5
SAS
• For our purposes, it’s SATA without the ugliness
• Fast: 12 Gbit/s FD building blocks
• Cheap: PS1 prototype MD1000 pricing versus Newegg media costs
• Not ugly: IB cables versus rats’ nest
• Industrial-strength manageability: pretty blinking lights and mgmt apps versus downtime plus white knuckles
• <cabling pic>
slide 6
I/O Performance of Dell SAS Systems in the PS1 Prototype
slide 7
SAS Performance, Gory Details
• SAS v. SATA differences
• <chart: Native SAS v. SATA performance, MB/s versus number of disks (1–7), 0–500 MB/s scale; roughly a 20% gap>
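The scaling behavior in the chart above can be approximated with a toy model; the 60 MB/s per-disk rate, 500 MB/s controller ceiling, and flat 20% SAS advantage are illustrative assumptions, not measured values from the prototype:

```python
# Toy model of aggregate sequential throughput versus disk count,
# with native SAS assumed ~20% faster per spindle than SATA.
# Per-disk rate and controller ceiling are hypothetical numbers.

def aggregate_mb_s(disks, per_disk_mb_s, ceiling_mb_s):
    """Throughput scales linearly until the controller/bus ceiling."""
    return min(disks * per_disk_mb_s, ceiling_mb_s)

sata = [aggregate_mb_s(n, 60, 500) for n in range(1, 8)]
sas = [aggregate_mb_s(n, 60 * 1.2, 500) for n in range(1, 8)]
```

Under these assumptions both curves climb linearly until the SAS curve hits the controller ceiling first, which is the shape the chart suggests.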
slide 8
Per-Controller Performance
Luckily, one controller is fast enough for one SATA disk box
<performance chart>
slide 9
Resulting PS1 Prototype I/O Topology
<topo diagram> <aggregate performance chart>
slide 10
RAID-5 v. RAID-10?
• Primer, anyone?
• RAID-5 probably feasible with a contemporary controller…
– …though tough to predict real-world effects of latency…
– …and not a ton of redundancy
• But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway!
– Remember sub-Newegg media costs
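The capacity side of the RAID-5 versus RAID-10 trade-off is simple arithmetic; a quick sketch, where the 14-disk array and 0.5 TB disk size are illustrative, not the prototype's actual configuration:

```python
# Usable capacity under the two RAID layouts being compared.
# Disk count and disk size are hypothetical example values.

def raid5_usable(disks, disk_tb):
    """RAID-5 loses one disk's worth of capacity to parity."""
    return (disks - 1) * disk_tb

def raid10_usable(disks, disk_tb):
    """RAID-10 mirrors every disk, halving usable capacity."""
    return disks // 2 * disk_tb

if __name__ == "__main__":
    print(raid5_usable(14, 0.5))   # survives any single disk failure
    print(raid10_usable(14, 0.5))  # survives one failure per mirror pair
```

RAID-10 gives up roughly half the raw capacity, which is why the slide's point matters: once the disk count is driven by performance targets rather than capacity, the mirroring overhead is already paid for.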
slide 11
RAID-10 Performance
Executive summary: RAID0/2 (roughly half RAID-0) throughput for single-threaded reads, full RAID-0 throughput for 2-user/2-thread workloads, and RAID0/2 throughput for writes.
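One reading of the "RAID0/2" shorthand above is half-of-RAID-0 throughput; a toy model under that assumption (the 400 MB/s baseline is an arbitrary example, not a measurement):

```python
# Simplified RAID-10 throughput model relative to an equivalent
# RAID-0 stripe, under the "RAID0/2 = half RAID-0" reading.

def raid10_read_mb_s(raid0_mb_s, threads):
    # One thread streams from one side of each mirror pair; two
    # concurrent threads can be served from both sides, approaching
    # full RAID-0 rates.
    return raid0_mb_s if threads >= 2 else raid0_mb_s / 2

def raid10_write_mb_s(raid0_mb_s):
    # Every write must land on both mirrors, so roughly half RAID-0.
    return raid0_mb_s / 2
```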
slide 12
PS1 Prototype Servers
<diagram of server roles plus storage and network interconnects>
slide 13
PS1 Prototype Servers
<iron photo (w/Will?)>
slide 14
Projected PS1 Systems Design
<diagram of 8-slice triply-replicated systems> <plus geoplex?>
slide 15
Backup/Recovery/Replication Strategies
No formal backup
• …except maybe for mydb’s, f(cost*policy)

3-way replication
• Replication != backup
– Little or no history
– Replicas can be a bit too cozy: must notice badness before replication propagates it
• Replicas provide redundancy and load balancing…
• Fully online: zero time to recover
• Replicas needed for happy production performance plus ingest, anyway

Off-site geoplex
• Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage)
– <lava pic?>
• Could help balance trans-Pacific bandwidth needs (serve continental traffic locally)
slide 16
Why No Traditional Backups?
• Not super pricey…
• …but not very useful relative to a replica for our purposes
– Time to recover
• Money no object? …do traditional backups too!
• Synergy, economy of scale with other collaboration needs (IPP?)? …do traditional backups too!
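The "time to recover" point is the crux; an illustrative comparison, where the 10 TB database size, 100 MB/s tape restore rate, and 5-minute replica cutover are all hypothetical numbers chosen only to show the orders of magnitude involved:

```python
# Time-to-recover arithmetic behind "not very useful relative to a
# replica". All sizes and rates are hypothetical example values.

db_tb = 10                # database size, TB
tape_restore_mb_s = 100   # assumed aggregate tape restore rate
replica_cutover_min = 5   # assumed time to redirect load to a replica

restore_hours = db_tb * 1e6 / tape_restore_mb_s / 3600

if __name__ == "__main__":
    print(f"tape restore: {restore_hours:.0f} h "
          f"vs. replica cutover: {replica_cutover_min} min")
```

A restore measured in days versus a cutover measured in minutes is the gap that makes traditional backups a luxury rather than the recovery path here.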
slide 17
Failure Scenarios
Easy, zero-downtime:
• Disks
• Power supplies
• Fans

Not so spooky, maybe some downtime and manual replica cutover:
• System board (rare)
• Memory (rare, and usually proactively detected and handled via scheduled maintenance)
• Disk controller (rare, potentially minimal downtime via cold-spare controller)
• CPU (not utterly uncommon, can be tough and time-consuming to diagnose correctly)

More spooky:
• Database mangling by human or pipeline error
– Gotta catch this before replication propagates it everywhere
– Can’t replicate too aggressively
– (and so off-the-shelf near-realtime replication tools don’t help us)
• Catastrophic loss of datacenter
– Have the geoplex
– …but we’re dangling by a single copy ’till recovery complete
– …but are we still screwed? Depending on colo scenarios, did we also lose the IPP and flatfile archive?

Terrifying:
• Unrecoverable badness fully replicated before detection
• Catastrophic loss of datacenter without geoplex
• Can we ever catch back up with the data rate if we need to start over?
– At some point in the survey, the answer likely becomes “no”.
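The "can we ever catch back up?" question reduces to whether the reload rate exceeds the survey's ingest rate; a minimal sketch, where the backlog and both rates are hypothetical numbers, not PS1 figures:

```python
# Catch-up feasibility after a start-over: clearing the backlog is
# only possible while reload outpaces ongoing ingest.
# All rates and sizes are hypothetical example values.

def catch_up_days(backlog_tb, reload_tb_day, ingest_tb_day):
    """Days to clear the backlog, or None if we fall further behind."""
    net = reload_tb_day - ingest_tb_day
    return backlog_tb / net if net > 0 else None

if __name__ == "__main__":
    print(catch_up_days(100, 5, 1))  # net gain: finite catch-up time
    print(catch_up_days(100, 5, 5))  # no net gain: the answer is "no"
```

As the survey accumulates data, the backlog grows while the reload rate stays fixed, so the finite case eventually degenerates into the `None` case, which is the slide's point.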
slide 18
State Diagram for Replicas?
• Loading
• Replicating
• Load balancing
• Failing
• Recovering
• Possibly repeat-loading
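The states listed above can be sketched as a small transition table; the edges here are a plausible guess at how the states connect, not a specified design from the deck:

```python
# Minimal sketch of a replica lifecycle state machine.
# The transition edges are assumptions, not a documented design.

TRANSITIONS = {
    "loading":        {"replicating", "failing"},
    "replicating":    {"load-balancing", "failing"},
    "load-balancing": {"failing"},
    "failing":        {"recovering"},
    "recovering":     {"loading", "load-balancing"},  # possibly repeat-loading
}

def can_transition(src, dst):
    """True if the sketched state machine allows src -> dst."""
    return dst in TRANSITIONS.get(src, set())
```

Making the legal transitions explicit like this is one way to answer the slide's question: an operator tool can refuse to put a replica into load balancing before it has finished replicating.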
slide 19
Operating Systems, DBMS?
• SQL Server 2005 EE x64
– Why?
– Why not DB2, Oracle RAC, PostgreSQL, MySQL, <insert your favorite>?
• (Win2003 EE x64) <Why EE?>
• Platform rant from JVV available over beers
– <JVV/beer graphic?>
slide 20
Systems/Database Management
• Active Directory infrastructure
• Windows patching tools, methodology
• Linux patching tools, methodology
• Monitoring
• Staffing requirements
slide 21
Facilities/Infrastructure Projections for PS1
• Cooling
• Rack space
• Network ports
• (plus AD/WSUS/monitoring infrastructure above)
slide 22
Operational Handoff to UofH
Mahalo! (See Ya, Hon!)