Windows NT Scalability
Jim Gray
Microsoft [email protected]
http://www.research.microsoft.com/~Gray/talks/
Outline
• Scalability: What & Why?
• Scale UP: NT SMP scalability
• Scale OUT: NT Cluster scalability
• Key Message:
– NT can do the most demanding apps today.
– Tomorrow will be even better.
[Figure: Scale Up, Scale Down, Scale Out (Server Cluster)]
What is Scalability?
• Grow without limits
– Capacity
– Throughput
• Do not add complexity
– Design
– Administer
– Operate
– Use
[Figure: Scale Down (Win Term, NetPC, Handheld, Portable, TV) and Scale Up (PC, Workstation, Server, Super Server)]
Scale UP & OUT Focus Here
• Grow without limits
– SMP: 4, 8, 16, 32 CPUs
– 64-bit addressing
– Huge storage
• Cluster Requirements
– Auto manage
– High availability
– Transparency
– Programming tools & apps
[Figure: Scale Up (Server, Super Server) and Scale Out (Server Cluster)]
Scalability is Important
• Automation benefits growing
– ROI of 1 month....
• Slice price going to zero
– Cyberbrick costs 5k$
• Design, Implement & Manage cost going down
– DCOM & Viper make it easy!
– NT Clusters are easy!
• Billions of clients imply millions of HUGE servers.
• Thin clients imply huge servers.
Q: Why Does Microsoft Care?
A: Billions of clients need millions of servers
Expect Microsoft to work hard on Scaleable Windows NT and Scaleable BackOffice.
Key technique: INTEGRATION.
[Figure: Servers shipped per year, 1994 to 2001 (97-01 are MS estimates), for Unix, Windows NT Server, and NetWare; vertical axis 0 to 2,700]
How Scaleable is NT??
The Single Node Story
• 64 bit file system in NT 1, 2, 3, 4, 5
• 8 node SMP in NT 4.E, 32 node OEM
• 64 bit addressing in NT 5
• 1 Terabyte SQL Databases (PetaByte capable)
• 10,000 users (TPC-C benchmark)
• 100 Million web hits per day (IIS)
• 50 GB Exchange mail store (next release designed for 16 TB)
• 50,000 POP3 users on Exchange (1.8 M messages/day)
• And more coming…
Windows NT Server Enterprise Edition
• Scalability
– 8x SMP support (32x in OEM kit)
– Larger process memory (3GB Intel)
– Unlimited Virtual Roots in IIS (web)
• Transactions
– DCOM transactions (Viper TP mon)
– Message Queuing (Falcon)
• Availability
– Clustering (WolfPack)
– Web, File, Print, DB … servers fail over.
What Happened?
• Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks)
• New Economics: Commodity
  class          price/mips ($/mips)   software (k$/year)
  mainframe      10,000                100
  minicomputer   100                   10
  microcomputer  10                    1
• GUI: Human - computer tradeoff: optimize for people, not computers
[Figure: price vs. time for mainframe, mini, and micro]
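A rough sketch of what "4x better every 3 years" compounds to over other time spans. The function name `growth_factor` and its parameters are illustrative and not from the talk; the only input taken from the slide is the 4x-per-3-years rate.

```python
def growth_factor(years, factor=4.0, period_years=3.0):
    """Improvement multiple after `years`, assuming `factor`x every `period_years`."""
    return factor ** (years / period_years)


if __name__ == "__main__":
    # Print the compounded improvement for a few horizons.
    for years in (1, 3, 10):
        print(f"{years:>2} years: {growth_factor(years):.1f}x")
```

Under this assumption the yearly improvement is about 1.6x, and a decade gives roughly two orders of magnitude.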
Billions Of Clients Need Millions Of Servers
• All clients networked to servers
– May be nomadic or on-demand
• Fast clients want faster servers
• Servers provide
– Shared Data
– Control
– Coordination
– Communication
[Figure: Clients (mobile clients, fixed clients) connected to Servers (server, super server)]
Thesis: Many little beat few big
• Smoking, hairy golf ball
• How to connect the many little parts?
• How to program the many little parts?
• Fault tolerance?
• 1 M SPECmarks, 1 TFLOP
• 10^6 clocks to bulk ram
• Event-horizon on chip
• VM reincarnated
• Multiprogram cache, On-Chip SMP
[Figure: Mainframe, Mini, Micro, Nano at $1 million, $100 K, $10 K; disk sizes 14", 9", 5.25", 3.5", 2.5", 1.8"]
[Figure: storage hierarchy around a Pico Processor (1 MM 3); labels: 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacities: 1 MB, 100 MB, 10 GB, 1 TB, 100 TB]
Future Super Server: 4T Machine
• Array of 1,000 4B machines
– 1 bps processors
– 1 BB DRAM
– 10 BB disks
– 1 Bbps comm lines
– 1 TB tape robot
• A few megabucks
• Challenge:
– Manageability
– Programmability
– Security
– Availability
– Scaleability
– Affordability
• As easy as a single system
• Future servers are CLUSTERS of processors, discs
• Distributed database techniques make clusters work
[Figure: Cyber Brick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]
The Hardware Is In Place… And then a miracle occurs?
• SNAP: scaleable network and platforms
• Commodity-distributed OS built on:
– Commodity platforms
– Commodity network interconnect
• Enables parallel applications
Thesis: Scaleable Servers
• Scaleable Servers
– Commodity hardware allows new applications
– New applications need huge servers
– Clients and servers are built of the same “stuff”
  • Commodity software and
  • Commodity hardware
• Servers should be able to
– Scale up (grow node by adding CPUs, disks, networks)
– Scale out (grow by adding nodes)
– Scale down (can start small)
• Key software technologies
– Objects, Transactions, Clusters, Parallelism
Scaleable Servers: BOTH SMP And Cluster
• Grow up with SMP; 4xP6 is now standard
• Grow out with cluster
• Cluster has inexpensive parts
[Figure: personal system, departmental server, SMP super server, cluster of PCs]
SMPs Have Advantages
• Single system image: easier to manage, easier to program (threads in shared memory, disk, Net)
• 4x SMP is commodity
• Software capable of 16x
• Problems:
– >4 not commodity
– Scale-down problem (starter systems expensive)
• There is a BIGGEST one
[Figure: personal system, departmental server, SMP super server]
TPC-C Web-Based Benchmarks
• Client is a Web browser (9,200 of them!)
• Submits Order, Invoice, and Query to server via Web page interface
• Web server translates to DB
• SQL does DB work
• Net:
– easy to implement
– performance is GREAT!
[Figure: browsers connect via HTTP to IIS (= Web), which connects via ODBC to SQL]
What Happens in 10 Years?
• 1987: 256 tps
– $14 million computer
– A dozen people
– Two rooms of machines
• 1997: 1,250 tps
– 50 k$ computer
– One person
– 1 micro-dollar per transaction (1,000x cheaper)
Ready for the next 10 years?
1988: DB2 + CICS Mainframe, 65 tps
• IBM 4391
• Simulated network of 800 clients
• 2m$ computer
• Staff of 6 to do benchmark
[Figure: refrigerator-sized CPU; 2 x 3725 network controllers; 16 GB disk farm (4 x 8 x .5GB)]
NT vs UNIX SMPs
• NT traditionally ran on 1 to 4 cpus
– Scales near-linear on them
• UNIX boxes: 32-64 way SMPs
– They do 3x more tpmC
– They cost 10x more.
• 10 way NT machines are available
– They cost more
– They are faster
• My view (shared by many)
– Need clusters for availability
– Cluster commodity servers to make huge systems
– a la Tandem, Teradata, VMScluster, IBM Sysplex, IBM SP2
– Clusters reduce need for giant SMPs
[Figure: tpmC vs Time, Jan-95 to Jan-97, 0 to 35,000 tpmC, for Unix and NT]
Transaction Throughput TPC-C
• On comparable hardware: NT scales better!
• SQL Server & NT Improving 250% per year
• NT has best Price Performance (2x cheaper)
[Figure: tpmC on Intel CPUs; 0 to 10 CPUs, 0 to 14,000 tpmC, for NT and UNIX]
[Figure: tpmC vs Intel CPUs; 0 to 10 CPUs, 0 to 14,000 tpmC, for NT all, NT Best, Unix best]
NT Scales Better Than Solaris
• Microsoft SQL/NT/Intel scales to 6x
• Beats Sybase/Solaris/UltraSPARC up to 11-way
[Figure: tpmC (0 to 20,000) vs cpus (0 to 20) for Sybase/Solaris/UltraSPARC and MS SQL/NT/Intel]
New News: WOW! HPUX-HPPA-Sybase
• Sybase on HP 16x SMP scales to 40 ktpmC!
• Price/Performance is flat (no diseconomy)
[Figure: Sybase & HP tpmC vs CPUs; 0 to 20 cpus, 0 to 45,000 tpmC]
[Figure: HP + Sybase $/tpmC vs tpmC; 0 to 40,000 tpmC, $0 to $140 per tpmC]
TPC Price/tpmC
$/tpmC by component:
System                                  processor  disk  software  net  total/10
Sun Oracle 52 k tpmC @ 134$/tpmC               41    38        50    6        13
HP-Sybase 39K tpmC @ 96$/tpmC                  37    33        16    7         9
SUN-Sybase 11.6 ktpmC @ 57$/tpmC               20    21        10    6         6
Unisys-Microsoft 12 ktpmC @ 39$/tpmC           10    17         6    6         4
Low end More Competitive
• premium on CPUs, disks, & Oracle
Only NT Has Economy of Scale
• NT is 2x less expensive: 40$/tpmC vs 110$/tpmC
• Only NT has economy of scale
• Unix has dis-economy of scale
[Figure: Transactions/k$ by vendor; tpmC/k$ (0 to 25) vs tpmC (0 to 40,000) for DB2/Unix, Sybase/Unix, Informix/Unix, Microsoft/NT, Oracle/Unix]
TPC-D Decision Support Benchmark
• NT has good performance and price/performance.
[Figure: TPC-D 100 GB results; Price/Perf ($/QthD, 0 to 3,000) vs Performance (0 to 1,600); arrows mark lower price and more throughput; three NT results labeled]
Scaleup To Big Databases?
• NT 4 and SQL Server 6.5
– DBs up to 1 Billion records,
– 100 GB
– Covers most (80%) data warehouses
• SQL Server 7.0
– Designed for Terabytes
• Hundreds of disks per server.
• SMP parallel search
– Data Mining and Multi-Media
• TerraServer is good MM example
[Figure: database sizes: Excel spreadsheet, Manhattan phone book (15MB), Human Genome (3GB), Dayton-Hudson sales records (300GB), satellite photos of Earth (1 TB)]
Database Scaleup: TerraServer™
• Demo NT and SQL Server scalability
• Stress test SQL Server 7.0
• Requirements
– 1 TB
– Unencumbered (put on www)
– Interesting to everyone everywhere
– And not offensive to anyone anywhere
• Loaded
– 1.1 M place names from Encarta World Atlas
– 1 M Sq Km from USGS (1 meter resolution)
– 2 M Sq Km from Russian Space agency (2 m)
• Will be on web (world’s largest atlas)
• Sell images with commerce server.
• USGS CRDA: 3 TB more coming.
TerraServer System
• DEC Alpha 4100 (4x smp) +
• 324 StorageWorks Drives (1.4 TB)
• RAID 5 Protected
• SQL Server 7.0
• USGS 1-meter data (30% of US)
• Russian Space data: two meter resolution images (2 M km2, 2% of earth)
[Figure: SPIN-2]
http://msrlab/terraserver
Demo
Manageability Windows NT 5.0 and Windows 98
• Active Directory tracks all objects in net
• Integration with IE 4
– Web-centric user interface
• Management Console
– Component architecture
• Zero Admin Kit and Systems Management Server
• PlugNPlay, Instant On, Remote Boot, ...
• Hydra and Intelli-Mirroring
Thin Client Support: TSO comes to NT
• Lower per-client costs
• Best of PC and centralized computing advantages
[Figure: Windows NT Server with “Hydra” Server serving dedicated Windows terminals, existing desktop PCs, MS-DOS, UNIX, and Mac clients, and Net PCs]
Windows NT 5.0 IntelliMirror™
• Extends CMU Coda File System ideas
• Files and settings mirrored on client and server
• Great for disconnected users
• Facilitates roaming
• Easy to replace PCs
• Optimizes network performance
Outline
• Scalability: What & Why?
• Scale UP: NT SMP scalability
• Scale OUT: NT Cluster scalability
• Key Message:
– NT can do the most demanding apps today.
– Tomorrow will be even better.
[Figure: Scale Up, Scale Down, Scale Out]
Scale OUT: Clusters Have Advantages
• Fault tolerance:
– Spare modules mask failures
• Modular growth without limits
– Grow by adding small modules
• Parallel data search
– Use multiple processors and disks
• Clients and servers made from the same stuff
– Inexpensive: built with commodity CyberBricks
How scaleable is NT??
The Cluster Story
• 16-node Tandem Cluster
– 64 cpus
– 2 TB of disk
– Decision support
• 45-node Compaq Cluster
– 140 cpus
– 14 GB DRAM
– 4 TB RAID disk
– OLTP (Debit Credit)
– 1 B tpd (14 k tps)
microsoft.com
• Production
– Windows NT 4 and IIS 3
  • 20 HTTP
  • 3 download
  • 3 FTP
  • 5 SQL 6.5
  • Index Server + 3 search
• Stagers
– Site Server for content
– DCOM Publishing wizard
• Network
– 6 DS3
– 4 TB/day download capacity
• Replicas in UK and Japan
• 90m hits/day
– 17m page views
– #4 site on Internet
• 900k visitors per day
• Not cheap
– Data Centers
– Bandwidth
– 27 people on content
– 22 people on systems
Tandem 2 Ton
• 2 TB SQL database
• 1.2 TB user data
• 16 node cluster
• 64 cpus, 480 disks
• Decision support parallel data-mining
• Will be Wolf Pack aware
• Demoed at DB Expo in
• ServerNet™ interconnect
Billion Transactions per Day Project
• Built a 45-node Windows NT Cluster (with help from Intel & Compaq), > 900 disks
• All off-the-shelf parts
• Using SQL Server & DTC distributed transactions
• DCOM & ODBC clients on 20 front-end nodes
• DebitCredit Transaction
• Each server node has 1/20th of the DB
• Each server node does 1/20th of the work
• 15% of the transactions are “distributed”
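A minimal sketch of the partitioning idea, not the project's actual code. It assumes range partitioning of account numbers over the 20 server nodes and treats a transaction as "distributed" when its account lives on a different node than the branch that issued it. `ACCOUNTS_PER_NODE`, `home_node`, `make_transaction`, and `is_distributed` are hypothetical names; only the 20-way split and the 15% figure come from the slide.

```python
import random

NODES = 20                 # from the slide: each server node holds 1/20th of the DB
ACCOUNTS_PER_NODE = 1000   # hypothetical size, chosen only for illustration


def home_node(account_id):
    """Node that stores this account, assuming contiguous ranges per node."""
    return account_id // ACCOUNTS_PER_NODE


def make_transaction(rng, distributed_fraction=0.15):
    """Return (branch_node, account_id); some accounts are deliberately remote."""
    branch_node = rng.randrange(NODES)
    if rng.random() < distributed_fraction:
        # Pick any node except the branch's own node.
        other = rng.randrange(NODES - 1)
        account_node = other if other < branch_node else other + 1
    else:
        account_node = branch_node
    account_id = account_node * ACCOUNTS_PER_NODE + rng.randrange(ACCOUNTS_PER_NODE)
    return branch_node, account_id


def is_distributed(branch_node, account_id):
    """True when the transaction needs a second node (and a distributed commit)."""
    return home_node(account_id) != branch_node


if __name__ == "__main__":
    rng = random.Random(1)
    txs = [make_transaction(rng) for _ in range(20000)]
    share = sum(is_distributed(*tx) for tx in txs) / len(txs)
    print(f"distributed share: {share:.1%}")
```

With a uniform choice of branch, each node sees about 1/20th of the load, which is the property the two "1/20th" bullets describe.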
Type                                 nodes                    CPUs  DRAM    ctlrs  disks                      RAID space
Workflow MTS                         20 Compaq Proliant 2500  20x2  20x128  20x1   20x1                       20x2 GB
SQL Server                           20 Compaq Proliant 5000  20x4  20x512  20x4   20x(36x4.2GB + 7x9.1GB)    20x130 GB
Distributed Transaction Coordinator  5 Compaq Proliant 5000   5x4   5x256   5x1    5x3                        5x8 GB
TOTAL                                45                       140   13 GB   105    895                        3 TB
Billion Transactions Per Day Hardware
• 45 nodes (Compaq Proliant)
• Clustered with 100 Mbps Switched Ethernet
• 140 cpu, 13 GB, 3 TB (RAID 1, 5).
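As a cross-check, the TOTAL row of the hardware table can be recomputed from the per-node figures. This is only a sketch: the dictionary layout and the name `cluster_totals` are illustrative, and the DRAM figures are assumed to be megabytes, since the table gives no unit.

```python
# Per-node figures copied from the hardware table.
# Assumption: DRAM values (128, 512, 256) are MB per node.
TIERS = {
    "Workflow MTS": {"nodes": 20, "cpus": 2, "dram_mb": 128,
                     "ctlrs": 1, "disks": 1, "raid_gb": 2},
    "SQL Server": {"nodes": 20, "cpus": 4, "dram_mb": 512,
                   "ctlrs": 4, "disks": 36 + 7, "raid_gb": 130},
    "Distributed Transaction Coordinator": {"nodes": 5, "cpus": 4, "dram_mb": 256,
                                            "ctlrs": 1, "disks": 3, "raid_gb": 8},
}


def cluster_totals(tiers):
    """Sum nodes, and nodes x per-node value for every other column."""
    totals = {"nodes": sum(t["nodes"] for t in tiers.values())}
    for field in ("cpus", "dram_mb", "ctlrs", "disks", "raid_gb"):
        totals[field] = sum(t["nodes"] * t[field] for t in tiers.values())
    return totals


if __name__ == "__main__":
    for field, value in cluster_totals(TIERS).items():
        print(f"{field:>8}: {value}")
```

Under the MB assumption the sums come to 45 nodes, 140 CPUs, 105 controllers, and 895 disks, matching the table exactly; DRAM (14,080 MB) and RAID space (2,680 GB) agree with the rounded 13 GB and 3 TB figures.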
Cluster Architecture
[Figure: a Switch connects the Driver, Database, DTC, and Control nodes]
VIPDC2 VIPDC3 VIPDC4 VIPDC5 VIPDC6 VIPDC7 VIPDC8 VIPDC9 VIPDC10 VIPDC11
VIPDC12 VIPDC13 VIPDC14 VIPDC15 VIPDC16 VIPDC17 VIPDC18 VIPDC19 VIPDC20 VIPDC21
VIPDTC1 VIPDTC2 VIPDTC3 VIPDTC4 VIPDTC5
VIPDC42 VIPDC43 VIPDC44 VIPDC45 VIPDC46 VIPDC47 VIPDC48 VIPDC49 VIPDC50 VIPDC51
Local Debit Credit
[Figure: numbered call sequence (steps 1 to 14) among DebitCreditDriver, DebitCreditComponent, and DatabaseDriverThread, including Loop, Run (3), Init (6), and DebitCredit (10, 11)]
Distributed Debit Credit - Same DTC
[Figure: numbered call sequence (steps 11 to 29) among DebitCredit, Database1, Database2, and a single DTC, including UpdateAcct (22)]
Distributed Debit Credit - Different DTC
[Figure: numbered call sequence (steps 11 to 35) among DebitCredit, Database1, Database2, DTC1, and DTC2, including UpdateAcct]
1.2 B tpd
• 1 B tpd ran for 24 hrs.
• Out-of-the-box software
• Off-the-shelf hardware
• AMAZING!
• Sized for 30 days
• Linear growth
• 5 micro-dollars per transaction
[Figure: Millions of Transactions Per Day (Mtpd) for 1 Btpd, Visa, ATT, BofA, and NYSE, shown on a log scale (0.1 to 1,000) and on a linear scale (0 to 1,000)]
How Much Is 1 Billion Tpd?
• 1 billion tpd = 11,574 tps ~ 700,000 tpm (transactions/minute)
• ATT
– 185 million calls per peak day (worldwide)
• Visa ~20 million tpd
– 400 million customers
– 250K ATMs worldwide
– 7 billion transactions (card+cheque) in 1994
• New York Stock Exchange
– 600,000 tpd
• Bank of America
– 20 million tpd checks cleared (more than any other bank)
– 1.4 million tpd ATM transactions
• Worldwide Airlines Reservations: 250 Mtpd
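The conversion behind these figures is plain arithmetic: a day has 86,400 seconds. A small sketch, with illustrative names (`tpd_to_tps`, `WORKLOADS_TPD`) and the daily volumes copied from the bullets above:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tpd_to_tps(transactions_per_day):
    """Average transactions per second for a given daily volume."""
    return transactions_per_day / SECONDS_PER_DAY


# Daily volumes as quoted on the slide.
WORKLOADS_TPD = {
    "1 B tpd cluster": 1_000_000_000,
    "Airline reservations": 250_000_000,
    "ATT calls (peak day)": 185_000_000,
    "Visa": 20_000_000,
    "Bank of America checks": 20_000_000,
    "NYSE": 600_000,
}

if __name__ == "__main__":
    for name, tpd in WORKLOADS_TPD.items():
        tps = tpd_to_tps(tpd)
        print(f"{name:<24} {tps:>10,.0f} tps {tps * 60:>12,.0f} tpm")
```

These are averages over 24 hours; peak rates for real workloads would be higher than the average.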
1 B tpd: So What?
• Shows what is possible, easy to build
– Grows without limits
• Shows scaleup of DTC, MTS, SQL…
• Shows (again) that shared-nothing clusters scale
• Next task: make it easy.
– auto partition data
– auto partition application
– auto manage & operate
Parallelism: The OTHER aspect of clusters
• Clusters of machines allow two kinds of parallelism
– Many little jobs: online transaction processing
  • TPC-A, B, C…
– A few big jobs: data search and analysis
  • TPC-D, DSS, OLAP
• Both give automatic parallelism
Kinds of Parallel Execution
• Pipeline
• Partition: outputs split N ways, inputs merge M ways
[Figure: copies of "Any Sequential Program" arranged as a pipeline and as partitions]
Data Rivers
Split + Merge Streams
• N producers, M consumers, N x M data streams
• Producers add records to the river; consumers consume records from the river
• Purely sequential programming
• River does flow control and buffering
• River does partition and merge of data records
• River = Split/Merge in Gamma = Exchange operator in Volcano
[Figure: River]
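A toy, single-machine sketch of the river idea using threads and bounded queues. It is not how Gamma, Volcano, or SQL Server implement it; the `River` class, `run_river`, and the partition function are all illustrative names. Producers and consumers stay purely sequential, and the river does the partitioning, merging, buffering, and flow control.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of a consumer's stream


class River:
    """Routes records from N producers into M bounded consumer streams."""

    def __init__(self, consumers, partition, capacity=64):
        self._streams = [queue.Queue(maxsize=capacity) for _ in range(consumers)]
        self._partition = partition

    def put(self, record):
        index = self._partition(record) % len(self._streams)
        self._streams[index].put(record)  # blocks when full: flow control

    def close(self):
        for stream in self._streams:
            stream.put(_DONE)

    def stream(self, index):
        """Sequential iterator over the records merged into stream `index`."""
        while True:
            record = self._streams[index].get()
            if record is _DONE:
                return
            yield record


def run_river(sources, consumers, partition, consume):
    """Run one producer thread per source and one thread per consumer."""
    river = River(consumers, partition)
    results = [None] * consumers

    def produce(source):
        for record in source:
            river.put(record)

    def drain(index):
        results[index] = consume(river.stream(index))

    producers = [threading.Thread(target=produce, args=(s,)) for s in sources]
    drains = [threading.Thread(target=drain, args=(i,)) for i in range(consumers)]
    for thread in producers + drains:
        thread.start()
    for thread in producers:
        thread.join()
    river.close()  # all producers finished, so end every stream
    for thread in drains:
        thread.join()
    return results


if __name__ == "__main__":
    # 2 producers, 3 consumers; each consumer sums the records routed to it.
    print(run_river([range(0, 50), range(50, 100)], 3, lambda r: r, sum))
```

Here `consume` is ordinary sequential code (`sum` over an iterator); it has no knowledge of how many producers exist or how records were split.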
Partitioned Execution
[Figure: a table partitioned by key range (A...E, F...J, K...N, O...S, T...Z); each partition runs its own Count, and a final Count combines the partial results]
Spreads computation and IO among processors
Partitioned data gives NATURAL parallelism
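A small sketch of partitioned execution for the Count example, assuming the five key ranges shown on the slide and records keyed by a name that starts with a letter A to Z. The thread pool stands in for separate processors and disks; `partition_of` and `partitioned_count` are illustrative names.

```python
from bisect import bisect_left
from concurrent.futures import ThreadPoolExecutor

# Upper bound of each key range: A...E, F...J, K...N, O...S, T...Z
UPPER_BOUNDS = ["E", "J", "N", "S", "Z"]


def partition_of(name):
    """Index of the key range holding `name` (assumes it starts with A-Z)."""
    return bisect_left(UPPER_BOUNDS, name[0].upper())


def partitioned_count(names):
    """Count each partition independently, then combine the partial counts."""
    partitions = [[] for _ in UPPER_BOUNDS]
    for name in names:
        partitions[partition_of(name)].append(name)
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = list(pool.map(len, partitions))
    return partial_counts, sum(partial_counts)


if __name__ == "__main__":
    table = ["Alice", "Betty", "Frank", "Kim", "Nora", "Oscar", "Tom", "Zed"]
    print(partitioned_count(table))
```

The final step only adds five small numbers, so nearly all of the work happens inside the partitions, which is what makes the parallelism "natural".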
N x M way Parallelism
[Figure: partitions A...E, F...J, K...N, O...S, T...Z each feed a Sort and a Join; the Join outputs flow into three Merge operators]
N inputs, M outputs, no bottlenecks.
Partitioned Data
Partitioned and Pipelined Data Flows
Clusters (Plumbing)
• Single system image
– naming
– protection/security
– management/load balance
• Fault Tolerance
– Wolfpack
• Hot Pluggable hardware & Software
Windows NT clusters
• Key goals:
– Easy: to install, manage, program
– Reliable: better than a single node
– Scaleable: added parts add power
• Microsoft & 60 vendors defining NT clusters
– Almost all big hardware and software vendors involved
• No special hardware needed - but it may help
• Enables
– Commodity fault-tolerance
– Commodity parallelism (data mining, virtual reality…)
– Also great for workgroups!
• Initial: two-node failover
– Beta testing since December 96
– SAP, Microsoft, Oracle giving demos.
– File, print, Internet, mail, DB, other services
– Easy to manage
– Each node can be 4x (or more) SMP
• Next (NT5) “Wolfpack” is modest size cluster
– About 16 nodes (so 64 to 128 CPUs)
– No hard limit, algorithms designed to go further
So, What’s New?
• When slices cost 50k$, you buy 10 or 20.
• When slices cost 5k$ you buy 100 or 200.
• Manageability, programmability, usability become key issues (total cost of ownership).
• PCs are MUCH easier to use and program
[Figure: MPP Vicious Cycle: each New MPP & New OS needs a New App; No Customers!]
[Figure: CP/Commodity Virtuous Cycle: Standard platform, Apps, Customers; standards allow progress and investment protection]
Thesis: Scaleable Servers
• Scaleable Servers
– Commodity hardware allows new applications
– New applications need huge servers
– Clients and servers are built of the same “stuff”
  • Commodity software and
  • Commodity hardware
• Servers should be able to
– Scale up (grow node by adding CPUs, disks, networks)
– Scale out (grow by adding nodes)
– Scale down (can start small)
• Key software technologies
– Objects, Transactions, Clusters, Parallelism
WolfPack Cluster: IIS & SQL Failover Demo
[Figure: a Browser connects to a two-node cluster (Alice and Betty) hosting the Web site and the Database, with their Web site files and Database files]
Summary
• SMP Scale UP: OK but limited
• Cluster Scale OUT: OK and unlimited
• Manageability:
– fault tolerance OK & easy!
– more needed
• CyberBricks work
• Manual Federation now
• Automatic in future
[Figure: Scale Up, Scale Down, Scale Out]
Scalability Research Problems
• Automatic everything
• Scaleable applications
– Parallel programming with clusters
– Harvesting cluster resources
• Data and process placement
– auto load balance
– dealing with scale (thousands of nodes)
• High-performance DCOM
– active messages meet ORBs?
• Process pairs, other FT concepts?
• Real time: instant failover
• Geographic (WAN) failover