
slide 1

PS1 Prototype Systems Design
Jan Vandenberg, JHU

Early PS1 Prototype

slide 2

Engineering Systems to Support the Database Design

• Raw data size
• Index size
• Most end-user operations I/O bound
• Loading/ingest more CPU-bound, though we still need solid write performance
• Time to do full table scans (see the sketch after this list)
• Time to do index scans
• Need to do most work where the data is; can't sling TBs over the network quickly
– …though we can brute-force past 1 Gbit Ethernet if necessary
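A back-of-envelope sketch of the scan-time arithmetic behind these bullets; the table size and throughput figures are illustrative assumptions, not PS1 measurements:

```python
# Scan-time arithmetic: all numbers below are assumed, not measured.
TABLE_SIZE_TB = 10          # assumed raw table size
SEQ_READ_MB_S = 400         # assumed aggregate local sequential read rate
GBIT_ETHERNET_MB_S = 125    # theoretical ceiling of a 1 Gbit link

def hours_to_scan(size_tb: float, rate_mb_s: float) -> float:
    """Time for one sequential pass over size_tb at rate_mb_s."""
    return size_tb * 1024 * 1024 / rate_mb_s / 3600

print(f"local scan:  {hours_to_scan(TABLE_SIZE_TB, SEQ_READ_MB_S):.1f} h")   # ~7.3 h
print(f"over 1 GbE:  {hours_to_scan(TABLE_SIZE_TB, GBIT_ETHERNET_MB_S):.1f} h")  # ~23.3 h
```

With these assumed numbers, shipping the table over gigabit Ethernet triples the scan time, which is why the work has to run where the data lives.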

slide 3

Fibre Channel, SAN

• Expensive but not-so-fast physical links (4 Gbit, 10 Gbit)
• Expensive switch
• Potentially very flexible
• Industrial-strength manageability
• Little control over RAID controller bottlenecks

slide 4

SATA

• Fast
• Cheap
• Ugly, spooky
– <cabling pic>
• Tough to manage
– <dlmsdb/sdssdb drive bay map>

slide 5

SAS

• For our purposes, it's SATA without the ugliness
• Fast: 12 Gbit/s FD building blocks
• Cheap: PS1 prototype MD1000 pricing versus Newegg media costs
• Not ugly: IB cables versus rats' nest
• Industrial-strength manageability: pretty blinking lights and mgmt apps versus downtime plus white knuckles
• <cabling pic>

slide 6

I/O Performance of Dell SAS Systems in the PS1 Prototype

slide 7

SAS Performance, Gory Details

SAS v. SATA differences

<chart: Native SAS v. SATA Performance; x-axis: Disks (1–7), y-axis: MB/s (0–500); annotation: 20%>

slide 8

Per-Controller Performance

Luckily, one controller is fast enough for one SATA disk box

<performance chart>

slide 9

Resulting PS1 Prototype I/O Topology

<topo diagram> <aggregate performance chart>

slide 10

RAID-5 v. RAID-10?

• Primer, anyone?
• RAID-5 probably feasible with a contemporary controller…
– …though tough to predict real-world effects of latency…
– …and not a ton of redundancy
• But after we add enough disks to meet performance goals, we have enough storage to run RAID-10 anyway! (See the capacity sketch below.)
– Remember sub-Newegg media costs
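A quick capacity comparison behind that trade-off; this is a sketch, and the disk count and size are assumptions rather than the PS1 parts list:

```python
# Usable capacity and fault tolerance of the two layouts.
N_DISKS, DISK_TB = 14, 0.5   # assumed: 14 spindles of 500 GB each

raid5_usable  = (N_DISKS - 1) * DISK_TB   # one disk's worth lost to parity
raid10_usable = N_DISKS // 2 * DISK_TB    # every disk mirrored

print(f"RAID-5:  {raid5_usable:.1f} TB usable, survives any single disk failure")
print(f"RAID-10: {raid10_usable:.1f} TB usable, survives one failure per mirror pair")
```

Once the spindle count is set by throughput rather than capacity, the extra mirrored copies in RAID-10 come almost for free.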

slide 11

RAID-10 Performance

Executive summary: about half of RAID-0 throughput (RAID-0/2) for single-threaded reads, full RAID-0 performance for 2-user/2-thread workloads, and RAID-0/2 throughput for writes, since every block is written to both mirrors. (See the toy model below.)
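A toy model of that summary; the spindle count and per-disk streaming rate are assumptions:

```python
# RAID-10 throughput model: mirrors let two concurrent readers
# keep both halves of each mirror pair busy.
DISKS, MB_S_PER_DISK = 14, 70   # assumed spindle count and streaming rate
raid0 = DISKS * MB_S_PER_DISK   # ideal striped aggregate

print(f"1-thread reads : ~{raid0 / 2:.0f} MB/s (one side of each mirror)")
print(f"2-thread reads : ~{raid0:.0f} MB/s (both mirror halves busy)")
print(f"writes         : ~{raid0 / 2:.0f} MB/s (every block written twice)")
```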

slide 12

PS1 Prototype Servers

<diagram of server roles plus storage and network interconnects>

slide 13

PS1 Prototype Servers

<iron photo (w/Will?)>

slide 14

Projected PS1 Systems Design

<diagram of 8-slice triply-replicated systems> <plus geoplex?>

slide 15

Backup/Recovery/Replication Strategies

• No formal backup
– …except maybe for mydb's, f(cost*policy)
• 3-way replication
– Replication != backup (see the sketch after this list)
  – Little or no history
  – Replicas can be a bit too cozy: must notice badness before replication propagates it
– Replicas provide redundancy and load balancing…
– Fully online: zero time to recover
– Replicas needed for happy production performance plus ingest, anyway
• Off-site geoplex
– Provides continuity if we lose HI (local or trans-Pacific network outage, facilities outage)
  – <lava pic?>
– Could help balance trans-Pacific bandwidth needs (service continental traffic locally)
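In miniature, why replication is not backup; this is purely illustrative and models no PS1 tooling:

```python
# A bad write propagates to every replica, while a point-in-time
# backup keeps the pre-error state. Values are made up.
primary = {"objID": 42, "mag": 19.7}
backup = dict(primary)            # point-in-time copy, kept aside

primary["mag"] = -999.0           # human/pipeline error mangles the row
replicas = [dict(primary) for _ in range(2)]   # replication propagates it

assert all(r["mag"] == -999.0 for r in replicas)  # every replica is now bad
assert backup["mag"] == 19.7                      # only the backup has history
```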

slide 16

Why No Traditional Backups?

• Not super pricey…
– …but not very useful relative to a replica for our purposes
– Time to recover
• Money no object? …do traditional backups too!!!
• Synergy, economy of scale with other collaboration needs (IPP?) …do traditional backups too!!!

slide 17

Failure Scenarios

• Easy, zero-downtime:
– Disks
– Power supplies
– Fans
• Not so spooky, maybe some downtime and manual replica cutover:
– System board (rare)
– Memory (rare, and usually proactively detected and handled via scheduled maintenance)
– Disk controller (rare; potentially minimal downtime via a cold-spare controller)
– CPU (not utterly uncommon; can be tough and time-consuming to diagnose correctly)
• More spooky:
– Database mangling by human or pipeline error
  – Gotta catch this before replication propagates it everywhere
  – Can't replicate too aggressively
  – (and so off-the-shelf near-realtime replication tools don't help us)
– Catastrophic loss of datacenter
  – Have the geoplex
  – …but we're dangling by a single copy 'til recovery is complete
  – …but are we still screwed? Depending on colo scenarios, did we also lose the IPP and flatfile archive?
• Terrifying:
– Unrecoverable badness fully replicated before detection
– Catastrophic loss of datacenter without geoplex
– Can we ever catch back up with the data rate if we need to start over? (See the arithmetic sketch after this list.)
  – At some point in the survey, the answer likely becomes "no".
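The catch-up arithmetic behind that last point, with assumed rates:

```python
# "Can we catch back up?" The rates are illustrative assumptions.
INGEST_TB_PER_DAY = 1.0     # assumed survey data rate
RELOAD_TB_PER_DAY = 3.0     # assumed bulk reload rate when starting over

def days_to_catch_up(backlog_tb: float) -> float:
    """Days to reload backlog_tb while new data keeps arriving."""
    net = RELOAD_TB_PER_DAY - INGEST_TB_PER_DAY
    if net <= 0:
        return float("inf")   # the "no" case late in the survey
    return backlog_tb / net

print(days_to_catch_up(100.0))   # 50 days with these assumptions
```

As the survey accumulates data, the backlog grows while the net reload margin stays fixed, so eventually recovery from scratch stops being feasible.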

slide 18

State Diagram for Replicas?

• Loading
• Replicating
• Load balancing
• Failing
• Recovering
• Possibly repeat-loading (state-machine sketch below)
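A minimal sketch of that lifecycle as a state machine; the transition table is an assumption about which moves make sense, not a documented PS1 design:

```python
from enum import Enum, auto

class ReplicaState(Enum):
    LOADING = auto()
    REPLICATING = auto()
    LOAD_BALANCING = auto()
    FAILING = auto()
    RECOVERING = auto()

# Assumed legal transitions; LOAD_BALANCING -> LOADING is the
# "repeat-loading" case from the slide.
TRANSITIONS = {
    ReplicaState.LOADING:        {ReplicaState.REPLICATING, ReplicaState.FAILING},
    ReplicaState.REPLICATING:    {ReplicaState.LOAD_BALANCING, ReplicaState.FAILING},
    ReplicaState.LOAD_BALANCING: {ReplicaState.LOADING, ReplicaState.FAILING},
    ReplicaState.FAILING:        {ReplicaState.RECOVERING},
    ReplicaState.RECOVERING:     {ReplicaState.LOADING},
}

def step(state: ReplicaState, target: ReplicaState) -> ReplicaState:
    """Move a replica to target, rejecting illegal transitions."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state.name} -> {target.name}")
    return target
```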

slide 19

Operating Systems, DBMS?

• SQL Server 2005 EE x64
– Why?
– Why not DB2, Oracle RAC, PostgreSQL, MySQL, <insert your favorite>?
• (Windows Server 2003 EE x64) <Why EE?>
• Platform rant from JVV available over beers
– <JVV/beer graphic?>

slide 20

Systems/Database Management

• Active Directory infrastructure
• Windows patching tools, methodology
• Linux patching tools, methodology
• Monitoring
• Staffing requirements

slide 21

Facilities/Infrastructure Projections for PS1

• Cooling
• Rack space
• Network ports
• (plus AD/WSUS/monitoring infrastructure above)

slide 22

Operational Handoff to UofH

Mahalo! (See Ya, Hon!)