Data Management in a Highly Connected World
James Hamilton
[email protected]
Microsoft SQL Server
March 3, 2000

Agenda
  Client Tier
    Number of devices
    Device interconnect fabric
    Standard programming infrastructure
    Client tier database issues
      Resource requirements
      Implementation language
      Administrative cost implications
      Development cost implications
  Middle Tier
  Server Tier
  Summary

How Many Clients?
  1998 WWW users (IDC)
    US: 51M; worldwide: 131M
  2001 estimates:
    Worldwide: 319M users
    515M connected devices
  ½ billion connected clients
    Conservative estimate based upon conventional device counts

Other Device Types
  TVs, VCRs, stoves, thermostats, microwaves, CD players, computers, garage door openers, lights, sprinklers, appliances, driveway de-icers, security systems, refrigerators, health monitoring, etc.
  Sony evangelizing IEEE 1394 Interconnect
    http://www.sel.sony.com/semi/iee1394wp.html
  Microsoft & consortium evangelizing Universal Plug & Play
    www.upnp.org
  WAP: Wireless Application Protocol
    http://www.wap.net/

Device Interconnect Infrastructure
  Power line control
    X10: http://www.x10.org
    Sunbeam Thalia: http://www.thaliaproducts.com/

Why Connect These Devices?
  TV guide & auto VCR programming
  CD label info & song list download
  Sharing data & resources
  Set clocks (flashing 12:00)
  Fire and burglar alarms
  Persist thermometer settings
  Feedback & data sharing based systems:
    Temperature control & power blind interaction
    Occupancy directed heating and lighting

Device Communication Implications
  The need is there
  Infrastructure is going in:
    Wireless
    Power line communications
    Unused twisted pair (phone) bandwidth
  Connectable devices & infrastructure arriving & being deployed
  On order of billions of client devices

Device Interconnect Example
  [Diagram labels: X10 backbone; Ethernet backbone; Ethernet hub; Windows NT Server; 56k bps line; 660 gallon marine aquarium; two 130 gallon F/W aquariums; filtration plant; home sprinklers; bedroom; living room; den; deck]

Improvements For Example
  Cooperation of lighting, A/C and power blind systems
  Alarms and remote notification for failures in:
    Circulation pump
    Heating & cooling
    Salinity & other water chemistry changes
    Filtration system
  Feedback directed systems

Palmtop Resource Trends
  Palmtops I’ve purchased through the years
  All about same cost & physical size
  [Chart: palmtop RAM vs. Moore’s Law, log scale from 0.1 to 100, years 1990 to 2002. Data points:]
    Sharp IQ 7000 (0.125M)
    Sharp IQ 8300 (0.25M)
    HP 95LX (0.5M)
    HP 100LX (1M)
    HP 200LX (2M)
    Everex A20 (4M)
    Casio E105 (32M)

O/S Memory Requirements
  Windows memory requirements over time
  [Chart: desktop RAM vs. Moore’s Law, log scale from 0.1 to 100, years 1985 to 1999, with a 128M label. Data points:]
    Windows 1.0 (256K)
    Windows 2.0 (512K)
    Windows 3.0 (2M)
    WFW 3.1 (3M)
    Windows95 (4M)
    Windows98 (16M)
    Windows 2000 (64M)

Smartcard Resource Trends
  [Chart: smartcard memory size (bits), years 1990 to 2004. Axis labels: 3 K, 10 K, 1 M, 300 M. Marker: “You are here”]
  Source: Denis Roberson PIN/Card -Tech/ NCR

Devices Smaller Than PDAs
  Qualcomm PDQ
    2 MB total memory
  Nokia 9000il
    8 MB total memory
  Same memory curve as PDAs … just 2 to 5 years behind

Digital Cameras
  Make        Model                 Memory
  Agfa        CL30                  60 to 360MB
  Canon       PowerShot S20         6 to 176MB
  Epson       PhotoPC 850Z          10 to 120MB
  Kodak       DC-280                32 to 245MB
  Olympus     D-340R                18 to 120MB
  Panasonic   Palmcam PV-SD4090     450 to 1,500MB
  Sanyo       VPC-SX500             19 to 120MB

Resource Trend Implications
  Device resources at constant cost are growing at super-Moore rates
    Same but 2 to 3 yrs behind desktop system growth
  Same is true of each class of devices
    Telephones trail PDAs but again grow at the same rate
  Memory growth is not the problem
    However devices always smaller than desktops
    Devices more specialized so resource consumption less … can still run standard vertical app slice

Standard Infrastructure at Client
  Clearly specialized user interface S/W needed
  But we have the memory resources to support:
    Standard communications stack (TCP/IP)
    Standard O/S software
    Standard data management S/W with query
    Transparent replication
  Symmetric multi-tiered infrastructure S/W:
    Leverage best development environments
    No need to rewrite millions of redundant lines of code
    More heavily used & tested so fewer bugs
    Better productivity in programming to richer platform
  A full DBMS at client both practical & useful

Client-Side Database Issues
  “Honey I shrunk the database” (SIGMOD99):
    DB footprint
    Implementation language
  Both issues either largely irrelevant or soon to be:
    Resource availability trends support standard infrastructure S/W
    Dominant costs: admin, operations & user training, and programming
    Vertical slice of standard apps rather than full custom infrastructure

DB Implementation Language
  Special DB implementation language (Java) argument:
    centers on auto-installation of S/W infrastructure
  Auto-install is absolutely vital, but independent of implementation language
  Auto-install not enough: client should be a cache of recently used S/W and data
    Full DBMS at client
    Client-side cache of recently accessed data
    Optimizer selected access path choice:
      driven by accuracy & currency requirements
      balanced against connectivity state & communications costs
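
The access path choice described on this slide can be sketched as a small decision function. This is only an illustration under stated assumptions: `CachedCopy`, `choose_access_path`, and the returned labels are hypothetical names, and communications cost is reduced to a single connected/disconnected flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CachedCopy:
    """Hypothetical descriptor for a locally cached copy of the data."""
    age_seconds: float


def choose_access_path(cached: Optional[CachedCopy],
                       max_staleness_seconds: float,
                       connected: bool) -> str:
    """Pick where to read from, given currency needs and connectivity."""
    fresh_enough = (cached is not None
                    and cached.age_seconds <= max_staleness_seconds)
    if fresh_enough:
        return "local-cache"        # meets the currency requirement, no network cost
    if connected:
        return "server"             # pay the communication cost for current data
    if cached is not None:
        return "local-cache-stale"  # disconnected: best available answer
    return "unavailable"


print(choose_access_path(CachedCopy(age_seconds=50), 10, connected=False))
```

A real optimizer would weigh estimated costs rather than use fixed rules; the point is only that currency requirements and connectivity state are inputs to plan selection.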

Admin Costs Still Dominate
  60’s large system mentality still prevails:
    Optimizing precious machine resources is false economy
  Admin & education costs more important
    TCO education from the PC world repeated
    Each app requires admin and user training … much cheaper to roll out 1 infrastructure across multiple form factors
  Sony PlayStation has 3 MB RAM & Flash
  Nokia 9000IL phone has 8 MB RAM
  Trending towards 64M palmtop in 2001
  Vertical app slice resource requirement can be met

Dev Costs Over Memory Costs
  Specialty RTOS weak dev environments
  Quality & quantity of apps driven by:
    Dev environment quality
    Availability of trained programmers
  Requirement for custom client development & configuration greatly reduces deployment speed
  Same apps have wide range of device form factors
  Symmetric client/server execution environ.
    DB components and data treated uniformly
    Both replicated to client as needed

Client Side Summary
  On order of billions connected client devices
    Most are non-conventional computing devices
  All devices include standard DB components
  Standard physical & logical device interconnect standards will emerge
  DB implementation language irrelevant
  Device DB resource consumption much less important than ease of:
    Installation
    Administration
    Programming
  Symmetric client/server execution environment

Agenda
  Client Tier
  Middle Tier
    High availability via redundant data & metadata
    Fault isolation domains
    XML
    Mid-tier caching
  Server Tier
  Summary

High Availability is Tough
  Availability   Annual lost data access   Number of nines
  90%            ~5 weeks                  1
  99%            <4 days                   2
  99.9%          <9 hours                  3
  99.99%         ~1 hour                   4
  99.999%        ~5 min                    5
  99.9999%       ~30 sec                   6
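
The relationship between the number of nines and annual lost access is plain arithmetic, and a short sketch makes it easy to check. The function name and the 365-day year are assumptions made for illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a 365-day year


def annual_downtime_seconds(nines: int) -> float:
    """Seconds per year of lost access at an availability of `nines` nines."""
    unavailability = 10.0 ** -nines  # e.g. 3 nines -> 0.001
    return SECONDS_PER_YEAR * unavailability


for n in range(1, 7):
    hours = annual_downtime_seconds(n) / 3600
    print(f"{n} nines: {hours:10.3f} hours/year")
```

Each additional nine cuts the allowed downtime by a factor of ten.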

Server Availability: Heisenbugs
  Industry good at finding functional errors
  Multi-user & application interactions hard:
    Sequences of statistically unlikely events
    Heisenbugs (http://research.microsoft.com/~gray/talks)
    Testing for these is exponentially expensive
    Server stack is nearing 100 MLOC
    Long testing and beta cycles delay software release
  System size & complexity growth inevitable:
    Re-try operation (Microsoft Exchange)
    Re-run operation against redundant data copy (Tandem)
    Fail fast design approach is robust but only acceptable with redundant access to redundant copies of data

The Inktomi Lesson
  Inktomi web search engine (Brewer, SIGMOD’98)
  Quickly evolving software:
    Memory leaks, race conditions, etc. considered normal
    Don’t attempt to test & beta until quality high
  System availability of paramount importance
    Individual node availability unimportant
  Shared nothing cluster
  Exploit ability to fail individual nodes:
    Automatic reboots avoid memory leaks
    Automatic restart of failed nodes
    Fail fast: fail & restart when redundant checks fail
    Replace failed hardware weekly (mostly disks)
  Dark machine room
    No panic midnight calls to admins
  Mask failures rather than futile attempt to avoid

Apply to High Value TP Data?
  Inktomi model:
    Scales to 100’s of nodes
    S/W evolves quickly
    Low testing costs and no-beta requirement
  Exploits ability to lose individual node without impacting system availability
  Ability to temporarily lose some data w/o significantly impacting query quality
  Can’t lose data availability in most TP systems
    Redundant data allows node loss w/o data availability lost
  Inktomi model with redundant data & metadata a potential solution

Redundant Data & Metadata
  TP point access to data nearly solved problem
    TP systems scale with user number, people on planet, or business size
    All trending at sub-Moore rates
  Data analysis systems growing far faster than Moore’s Law:
    Greg’s law: 2x every 9 to 12 months (SIGMOD98, Patterson)
    Seriously super-Moore implying that no single system can scale sufficiently: clusters are the only solution
  Storage trending to free with access speed limiting factor
    Detailed data distribution statistics need to be maintained
  Improve access speed & availability using redundant data (indexes, materialized views, etc.)
    Async update for stats, indexes, mat views
    Data path choice based upon need for currency & accuracy
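
To see why a super-Moore growth rate rules out a single system, it helps to compound the doubling periods. The sketch below takes the slide's "9 to 12" as months and assumes the commonly quoted 18-month doubling period for Moore's Law; both readings are assumptions, as is the five-year horizon.

```python
def growth_factor(years: float, doubling_months: float) -> float:
    """Multiple reached after `years` when size doubles every `doubling_months`."""
    return 2.0 ** (12.0 * years / doubling_months)


YEARS = 5
for label, months in [("Moore's Law (assumed 18 months)", 18),
                      ("Greg's law, slow end (12 months)", 12),
                      ("Greg's law, fast end (9 months)", 9)]:
    print(f"{label}: {growth_factor(YEARS, months):.0f}x in {YEARS} years")
```

Under these assumptions the data outgrows a single machine's capacity several times over within a few years, which is the slide's argument for clusters.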

Affordable Availability
  Web-enabled direct access model driving high availability requirements:
    recent high profile failures at eTrade and Charles Schwab
  Web model enabling competition in information access
    Drives much faster server side software innovation which negatively impacts quality
  “Dark machine room” approach requires auto-admin and data redundancy
    Inktomi model (Eric Brewer, SIGMOD’98)
  42% of system failures admin error (Gray)
    Paging admin at 2am to fix problem is dangerous

Connection Model/Architecture
  [Diagram labels: Client, Server Node, Server Cloud]
  Redundant data & metadata
  Shared nothing
  Single system image
  Symmetric server nodes
  Any client connects to any server
  All nodes SAN-connected

Compilation & Execution Model
  [Diagram labels: Client, Server Cloud]
  Server thread:
    Lex analyze
    Parse
    Normalize
    Optimize
    Code generate
    Query execute
  Query execution on many subthreads synchronized by root thread

Node Loss/Rejoin
  [Diagram labels: Client, Server Cloud, Execution in progress]
  Lose node:
    Recompile
    Re-execute
  Rejoin:
    Node local recovery
    Rejoin cluster
    Recover global data at rejoining node
    Rejoin cluster
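
The "lose node: recompile, re-execute" path can be sketched as a retry loop around plan compilation. Everything here is hypothetical scaffolding: `NodeLost`, `run_query`, and the `compile_plan` and `execute` callables stand in for whatever the real server would use.

```python
class NodeLost(Exception):
    """Raised by the (hypothetical) executor when a node drops out mid-query."""

    def __init__(self, node):
        super().__init__(f"node {node} lost")
        self.node = node


def run_query(sql, live_nodes, compile_plan, execute, max_attempts=3):
    """Compile against the current membership; on node loss, recompile and re-run."""
    nodes = set(live_nodes)
    for _ in range(max_attempts):
        if not nodes:
            break
        plan = compile_plan(sql, frozenset(nodes))  # plan only uses surviving nodes
        try:
            return execute(plan)
        except NodeLost as lost:
            nodes.discard(lost.node)                # drop the failed node, then retry
    raise RuntimeError("query could not be completed on the surviving nodes")
```

This only works because redundant copies of the data let the recompiled plan avoid the lost node entirely.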

Redundant Data Update Model
  [Diagram labels: Client, Server Cloud]
  Updates are standard parallel query plans
  Optimizer manages redundant access paths
  Query plan responsible for access plan management:
    No significant new technology
    Similar to materialized view & index updates today

Fault Isolation Domains
  Trade single-node perf for redundant data checks:
    Complex error recovery more likely to be wrong than original forward processing code
    Many redundant checks are compiled out of “retail versions” when shipped
  Fail fast rather than attempting to repair:
    Bring down node for mem-based data structure faults
    Don’t patch inconsistent data … copies keep system available
  If anything goes wrong “fire” the node and continue:
    Attempt node restart
    Auto-reinstall O/S, DB and recreate DB partition
    Mark node “dead” for later replacement
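
The three-step escalation for a "fired" node can be sketched as follows. The names `NodeState` and `fire_node` and the two callables are hypothetical; each callable stands in for a recovery action and is assumed to return True on success.

```python
from enum import Enum
from typing import Callable


class NodeState(Enum):
    HEALTHY = "healthy"
    DEAD = "dead"      # left in place for later replacement


def fire_node(restart: Callable[[], bool],
              reinstall: Callable[[], bool]) -> NodeState:
    """Escalate through recovery steps; never try to patch state in place."""
    for step in (restart, reinstall):   # cheapest recovery first
        if step():
            return NodeState.HEALTHY
    return NodeState.DEAD


# Example: restart fails, reinstall succeeds.
print(fire_node(lambda: False, lambda: True))
```

Because the rest of the cluster continues on redundant copies, nothing in this path is time-critical.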

Data Structure Matters
  Most internet content is unstructured text
    restricted to simple Boolean search techniques
  Docs have structure, just not explicit
  Yahoo hand categorizes content
    indexing limited & human involvement doesn’t scale well
  XML is a good mix of simplicity, flexibility, & potential richness
    Structure description language of internet
    DBMSs need to support as first class datatype
  Too few librarians in world
    so all information must be self-describing

Relational to XML
  SELECT … FOR XML
    FOR XML RAW (return an XML rowset)
    FOR XML AUTO (exploit RI, name matching, etc.)
    FOR XML EXPLICIT (maximal control)
  Annotated Schema
    Mapping between XML and relational schema expressed in XML
  Templates
    Encapsulated parameterized query
    XSL/T support
    XPATH support
  Direct URL access (SQL “owned” virtual root)
    SELECT … FOR XML
    Annotated schema
    Templates

XML to Relational
  XML bulk load
  Templates and Annotated Schema
  SQL Server hosted XML tree
    Directly insert document into SQL Server hosted XML tree
    Select from server hosted XML tree rowset & insert into SQL tables
  XML data type support
  Hierarchical full text search

XML: Example
  http://SRV1/nwind?sql=SELECT+DISTINCT+ContactTitle+FROM+Customers+WHERE+ContactTitle+LIKE+'Sa%25'+ORDER+bY+ContactTitle+FOR+XML+AUTO
  Result set:
    <Customers ContactTitle="Sales Agent"/>
    <Customers ContactTitle="Sales Associate"/>
    <Customers ContactTitle="Sales Manager"/>
    <Customers ContactTitle="Sales Representative"/>
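
As a client-side illustration of the URL query above, the sketch below builds the same query string with the Python standard library and parses the result fragment from the slide. No network call is made. Wrapping the fragment in a `<root>` element is an assumption needed because the result set has no single root, and the encoded URL differs from the slide's only in the case of `BY`.

```python
from urllib.parse import quote_plus
import xml.etree.ElementTree as ET

sql = ("SELECT DISTINCT ContactTitle FROM Customers "
       "WHERE ContactTitle LIKE 'Sa%' ORDER BY ContactTitle FOR XML AUTO")
# Spaces become '+', '%' becomes '%25'; quotes are kept literal as on the slide.
url = "http://SRV1/nwind?sql=" + quote_plus(sql, safe="'")
print(url)

# Result fragment copied from the slide; wrapped so it parses as one document.
fragment = """
<Customers ContactTitle="Sales Agent"/>
<Customers ContactTitle="Sales Associate"/>
<Customers ContactTitle="Sales Manager"/>
<Customers ContactTitle="Sales Representative"/>
"""
root = ET.fromstring(f"<root>{fragment}</root>")
titles = [row.get("ContactTitle") for row in root.iter("Customers")]
print(titles)
```

With FOR XML AUTO each row arrives as an element named after the table, with columns as attributes, so the client needs no schema knowledge beyond the attribute name.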

Mid-Tier Cache Requirements
  Non-proprietary multi-lingual programming
    Symmetric mid-tier & server programming model
  Non-connected, stateless programming model
    High scale thread pool based
  Efficient main memory DB support
  Full query over local cache
    Query over just cached data, or
    Query over full corpus (server interaction reqd)
    Ability to handle network partitions & server failure
  Support for life-time attributed data:
    Transactional (possibly multi-server)
    Near real time
    Every N time units
    Read only
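
One way to picture life-time attributed data is a cache whose entries carry a freshness class. The sketch below is a simplified illustration, not a description of any shipping product: `Lifetime`, `MidTierCache`, and `fetch_from_server` are hypothetical, and "near real time" is modeled as the periodic class with a small refresh interval.

```python
import time
from enum import Enum


class Lifetime(Enum):
    TRANSACTIONAL = "transactional"  # always go back to the server
    PERIODIC = "periodic"            # refresh every N time units
    READ_ONLY = "read_only"          # never expires once loaded


class MidTierCache:
    def __init__(self, fetch_from_server, clock=time.monotonic):
        self._fetch = fetch_from_server
        self._clock = clock
        self._entries = {}           # key -> (value, loaded_at)

    def get(self, key, lifetime, refresh_seconds=0.0):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and lifetime is not Lifetime.TRANSACTIONAL:
            value, loaded_at = entry
            if lifetime is Lifetime.READ_ONLY or now - loaded_at < refresh_seconds:
                return value         # cached copy satisfies its lifetime class
        value = self._fetch(key)     # miss, expired, or transactional read
        self._entries[key] = (value, now)
        return value
```

A caller would pass something like `cache.get("price:42", Lifetime.PERIODIC, refresh_seconds=60)`, where the key format is made up for the example.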

Agenda
  Client Tier
  Middle Tier
  Server Tier
    Affordable computing by the slice
    Everything online
    Disks are actually getting slower
    Processing moves to storage
    Approximate answers quickly
    Semi-structured storage support
    Administrative issues
  Summary

Server-Side Changes
  Server databases more functionally rich than often required
  Trend reversal:
    Less at the server-tier with richer mid-tier
  Focus at back-end shifts to:
    Reliability, Availability, and Scalability
    Reducing administrative costs
  Server side trends:
    Scalability over single-node performance
    Everything online
    Affordable availability in high scale systems

Compaq/Microsoft TPC-C Benchmark
  [Bar chart of tpmC, axis 0 to 250,000, shown here as a table]
  System                  O/S            Database           tpmC      Total cost    $/tpmC
  ProLiant 8500 Cluster   Windows 2000   SQL Server 2000    227,079   $4,341,603    $19.12
  ProLiant 8500 Cluster   Windows 2000   SQL Server 2000    152,207   $2,880,431    $18.93
  IBM RS/6000 S80         AIX 4.3.3      Oracle v8.1.6      135,815   $7,156,910    $52.70
  Escala EPC2400          AIX 4.3.3      Oracle v8.1.6      135,815   $7,462,215    $54.94
  Enterprise 6500         Solaris 2.6    Oracle 8i v8.1.6   135,461   $13,153,324   $97.10
  These are the top 5 benchmarks as of Feb 17, 2000.
  NOTE: All TPC-C results reported as of February 17, 2000

Computing by the Slice
  Source: TPC report executive summary

Just Save Everything
  Able to store all info produced on earth (Lesk):
    Paper sources: less than 160 TB
    Cinema: less than 166 TB
    Images: 520,000 TB
    Broadcasting: 80,000 TB
    Sound: 60 TB
    Telephony: 4,000,000 TB
  These data yield 5,000 petabytes
    Others estimate upwards of 12,000 petabytes
  Worldwide 1998 storage production: 13,000 petabytes
  No need to manage deletion of old data
  Most data never accessed by a human
    Access aggregations & analysis, not point fetch
  More storage than data allows for greater redundancy:
    indexes, materialized views, statistics, & other metadata
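
As a quick sanity check on the totals, the categories listed on this slide can be summed directly. Treating the "less than" entries as their upper bounds and using 1 PB = 1,000 TB are assumptions; the dictionary name is made up for the example.

```python
# Per-category estimates from the slide, in terabytes.
LESK_ESTIMATES_TB = {
    "paper": 160,             # "less than 160 TB", taken at the bound
    "cinema": 166,            # "less than 166 TB", taken at the bound
    "images": 520_000,
    "broadcasting": 80_000,
    "sound": 60,
    "telephony": 4_000_000,
}

total_tb = sum(LESK_ESTIMATES_TB.values())
total_pb = total_tb / 1000    # assuming 1 PB = 1,000 TB
print(f"{total_pb:,.0f} PB listed, vs. 13,000 PB of 1998 storage production")
```

The listed categories come to roughly 4,600 PB, the same order as the slide's 5,000 PB figure and well under one year of storage production, with telephony dominating the total.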

Disks Are Becoming Black Holes
  Seagate Cheetah 73
    Fast: 10k RPM, 5.6 ms access, 16 MB cache
    But very large: 73.4 GB
  Result? Black hole: 2.4 accesses/sec/GB
    Large data caches required
    Employ redundant access paths
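
The "black hole" figure follows from the drive numbers on this slide: one random access per 5.6 ms, spread over 73.4 GB. A minimal sketch of the arithmetic, with a made-up function name:

```python
def accesses_per_sec_per_gb(access_time_ms: float, capacity_gb: float) -> float:
    """Random accesses per second available to each gigabyte stored."""
    accesses_per_sec = 1000.0 / access_time_ms   # one access at a time
    return accesses_per_sec / capacity_gb


cheetah_73 = accesses_per_sec_per_gb(access_time_ms=5.6, capacity_gb=73.4)
print(f"{cheetah_73:.1f} accesses/sec/GB")
```

Capacity grows much faster than access time shrinks, so this ratio keeps falling with each drive generation.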

Processing Moves Towards Storage
  Trends:
    I/O bus bandwidth is bottleneck
    Switched serial nets support very high bandwidth
    Processor/memory interface is bottleneck
    Growing CPU/DRAM perf gap leading to most CPU cycles in stalls
  Combine CPU, serial network, memory, & disk in single package
    E.g. David Patterson ISTORE project

Processing Moves Towards Storage
  Each disk forms part of multi-thousand node cluster
    Redundant data masks failure
    RAID-like approach
  Each cyberbrick commodity H/W and S/W
    O/S, database, and other server software
  Each “slice” plugged in & personality set
    (e.g. database or SAP app server)
    No other configuration required
  On failure of S/W or H/W, redundant nodes pick up workload
    Replace failed components at leisure
    Predictive failure models

Approximate Answers Quickly
  DB systems focus on absolute correctness
    As size grows, correct answer increasingly expensive
    Text search systems depend upon quick approx answer
  Approx answer with statistical confidence bound:
    Steadily improve result until user satisfied
    “Ripple Joins for Online Aggregation” (Hellerstein, SIGMOD99)
  Allows rapid exploration of large search spaces:
    Conventional full accuracy only when needed
  Run query on incomplete mid-tier cache?
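
The idea of an answer that improves until the user is satisfied can be shown with a single-table running average. This is not the ripple join algorithm from the cited paper, only a simplified sketch of online aggregation: rows are read in random order, and a normal-approximation confidence interval (z = 1.96, an assumption) is reported as it narrows. All names are hypothetical.

```python
import math
import random


def online_mean(values, tolerance, z=1.96, min_rows=30, seed=0):
    """Scan rows in random order; stop once the interval half-width <= tolerance.

    Returns (estimate, half_width, rows_read).
    """
    rows = list(values)
    random.Random(seed).shuffle(rows)      # each prefix is then a random sample
    n, mean, m2 = 0, 0.0, 0.0
    half_width = math.inf
    for x in rows:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)           # Welford's running sum of squares
        if n >= min_rows:
            std_err = math.sqrt(m2 / (n - 1) / n)
            half_width = z * std_err
            if half_width <= tolerance:
                break                      # good enough: stop early
    return mean, half_width, n


data = [float(i % 100) for i in range(10_000)]
estimate, half_width, rows_read = online_mean(data, tolerance=2.0)
print(f"mean ~ {estimate:.1f} +/- {half_width:.1f} after {rows_read} of {len(data)} rows")
```

Tightening `tolerance` makes the scan read more rows, which is the trade between speed and accuracy the slide describes.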

Semi-Structured Storage Support
  Example applications:
    Directory systems (e.g. Microsoft Active Directory)
    Document management systems
  Storage characteristics:
    Flexible & sparse schema support
    Fine grained security
    Recursive query
    Notification based extensibility common
    XML support important
  Particularly difficult to support when native SQL access is also allowed
  Important area for RDBMS expansion

Examples: Performance W/O Admin
  Multiple cached plans for different parameter marker sub-domains
  Async statistics gathering
  Async optimization
  Feedback-directed techniques:
    Adapting number of histogram buckets
    Re-optimizing when cardinality errors discovered during execution
    Re-optimize with additional data distribution info gained during previous execution
  Optimizer-created indexing structures:
    Add indexes when needed (Exchange & AS/400)
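
The first item, multiple cached plans per parameter sub-domain, can be sketched as a plan cache keyed by statement and parameter range. `PlanCache`, its boundary list, and the `compile_plan` callable are hypothetical; a real optimizer would derive the sub-domain boundaries from data distribution statistics rather than take them as input.

```python
import bisect


class PlanCache:
    """Keeps one compiled plan per (statement, parameter sub-domain)."""

    def __init__(self, boundaries, compile_plan):
        self._boundaries = sorted(boundaries)  # values that split the parameter domain
        self._compile = compile_plan
        self._plans = {}

    def plan_for(self, statement, param):
        sub_domain = bisect.bisect_right(self._boundaries, param)
        key = (statement, sub_domain)
        if key not in self._plans:             # compile once per sub-domain
            self._plans[key] = self._compile(statement, param)
        return self._plans[key]
```

Parameter values that fall in the same range reuse one plan, while a value from a very different range (for example one that matches far more rows) gets its own.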

Summary
  After 30 years, DB technology more relevant than ever:
    Database innovations required at all tiers
    All devices run standard DB components
    Symmetric multi-tier programming model
    Hierarchical caching model
    Administration including installation disappears
    All info online & machine accessible
    Symmetric programming model on all tiers
    Redundant data for availability & performance
    Increased dependence on approximate answers
    Support for semi-structured apps
  Mid-tier & Client: data moves to the processors
  Server-Tier: Processing moves to data