5/10/2002 (c) Microsoft Corporation 1
Self-Tuning Database Systems: The AutoAdmin Experience
Surajit Chaudhuri
Data Management and Exploration Group
Microsoft Research
http://research.microsoft.com/users/surajitc
Research Group Overview
Data Management, Exploration and Mining Group
Formed in 1999 by fusing two projects: AutoAdmin and DB support for DM
Research with technology transfer
  Project-oriented
  Close partnership with SQL Server
6 researchers, 5 developers
  A junior-heavy team
  Strong internship program
Current Projects
AutoAdmin: Self-Tuning Database Systems
Data Cleaning
Exploratory Projects
  Approximate Query Processing
  Documents + Structured Data
  XML2SQL
Past project: SQL-aware Data Mining
Self-Tuning Database Systems: The AutoAdmin Experience
The Black Art of Database Tuning...
[Diagram labels: Applications, DBS, Workload, Performance, Tuning Guru, System Parameters]
AutoAdmin: Motivation
Started in summer 1996 at Microsoft Research with a team of 2
Our Goal:
  Make database systems self-tuning and self-administering
  Analogy: Cars
Reduce TCO (total cost of ownership)
Vision of a Self Tuning System
Manager sets goals, policy, and the budget
System does the rest
Everyone is a CIO
Build a system
  Used by millions of people each day
  Administered and managed by a ½ time person
  On hardware fault, order replacement part
  On overload, order additional equipment
  Upgrade hardware and software automatically
“What Next? A dozen remaining IT problems”
Turing Award Lecture, FCRC, May 1999
Jim Gray, Microsoft
Physical Design Impacts Query Execution
SELECT Name FROM Employees WHERE Age < 40 AND Salary > 200K
Execution Plan A:
  Filter (Age < 40 AND Salary > 200K)
  Table Scan (Employees)
Execution Plan B:
  Filter (Age < 40)
  Table Lookup (Employees) by Salary
Effect of Workload on Physical Design
Which column(s) should we index?
Right answer may be:
  Salary
  Age
  Both
  Neither!
Depends on the workload, and requires knowledge of statistics
SELECT Name FROM Employees WHERE Age < 40 AND Salary > 200K
SELECT Name FROM Employees WHERE Age < 20 AND Salary > 50K
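To make the point about workload and statistics concrete, here is a minimal sketch of a toy cost model. The cost formula, the row count, and the selectivities are all assumptions chosen for illustration; they are not taken from the talk or from SQL Server.

```python
def best_access_path(n_rows, sel_age, sel_salary, indexes, lookup_cost=4.0):
    """Pick the cheapest access path for WHERE Age < a AND Salary > s.

    Toy cost model (an assumption, not from the talk): a table scan costs
    1 unit per row; an index lookup costs `lookup_cost` units per row that
    satisfies the predicate on the indexed column.
    """
    costs = {"table scan": float(n_rows)}
    if "Age" in indexes:
        costs["index on Age"] = lookup_cost * n_rows * sel_age
    if "Salary" in indexes:
        costs["index on Salary"] = lookup_cost * n_rows * sel_salary
    return min(costs, key=costs.get)


both = {"Age", "Salary"}
# Made-up selectivities standing in for the statistics an optimizer would use.
q1 = best_access_path(100_000, sel_age=0.50, sel_salary=0.01, indexes=both)
q2 = best_access_path(100_000, sel_age=0.02, sel_salary=0.60, indexes=both)
q3 = best_access_path(100_000, sel_age=0.50, sel_salary=0.60, indexes=both)
print(q1, q2, q3, sep=" | ")
```

Under these assumed numbers the first query favors an index on Salary, the second an index on Age, and the third neither, which is why the choice cannot be made without the workload and its statistics.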
AutoAdmin: Key Contributions
A what-if architecture for exploring the space of hypothetical designs (SIGMOD 98)
  Workload driven
Integrated physical database design tool (VLDB 97, VLDB 00)
  Recommends indexes and materialized views
  Part of Microsoft SQL Server product since 1998
Statistics selection (ICDE 00, SIGMOD 02)
Execution feedback driven statistics building (SIGMOD 99, SIGMOD 01)
“What-If” Architectures
“What-If” Architecture Overview
[Diagram labels: Query Optimizer (Extended), Database Server, Workload, AutoAdmin, Recommendation, “What-if”, Application]
“What-If” Analysis of Physical Design
Estimate quantitatively the impact of physical design on workload
  e.g., if we add an index on T.c, which queries benefit and by how much?
Without making actual changes to physical design
  Time consuming
  Resource intensive
Search efficiently the space of hypothetical designs
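The what-if question for an index on T.c can be sketched as follows. This is only an illustration under stated assumptions: `estimated_cost` is a hypothetical stand-in for the extended optimizer, the cost formula is a toy, and the workload selectivities are invented.

```python
def estimated_cost(selectivity, n_rows, indexes, lookup_cost=4.0):
    """Toy optimizer estimate: cheapest of a scan or any usable index lookup."""
    costs = [float(n_rows)]
    for column in indexes:
        if column in selectivity:
            costs.append(lookup_cost * n_rows * selectivity[column])
    return min(costs)


def what_if(workload, n_rows, current, hypothetical):
    """Estimated saving per query if the `hypothetical` indexes existed.

    Nothing is built: the hypothetical indexes are only names handed to the
    cost estimator, which works from selectivity estimates alone.
    """
    savings = {}
    for name, selectivity in workload.items():
        before = estimated_cost(selectivity, n_rows, current)
        after = estimated_cost(selectivity, n_rows, current | hypothetical)
        savings[name] = before - after
    return savings


# Hypothetical workload: query name -> {column: estimated selectivity}.
workload = {
    "Q1": {"c": 0.01},  # selective predicate on T.c
    "Q2": {"c": 0.60},  # unselective predicate on T.c
    "Q3": {"d": 0.05},  # no predicate on T.c at all
}
report = what_if(workload, 100_000, current=set(), hypothetical={"c"})
```

Only Q1 shows a positive estimated saving here; Q2 and Q3 are unaffected, and no index was created to find that out.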
Workload-driven Physical Design for Databases
Physical Database Design: Problem Statement
Workload
  Queries and updates
Configuration
  A set of indexes, materialized views from a search space
  Cost obtained by “what-if” realization of the configuration
Constraints
  Upper bound on storage space for indexes
Search: Pick a configuration that is of “lowest” cost for the given database and workload (VLDB 1997)
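One simple way to search under a storage bound is a greedy heuristic, sketched below. It is an illustration only and is not claimed to be the search used in the tuning wizard; `workload_cost` is a hypothetical callable standing in for the what-if cost of a configuration, and all names, sizes, and savings are invented.

```python
def greedy_configuration(candidates, sizes, workload_cost, storage_budget):
    """Repeatedly add the structure that lowers estimated workload cost most.

    candidates:     names of candidate indexes / materialized views
    sizes:          name -> estimated storage
    workload_cost:  callable(frozenset of names) -> estimated cost ("what-if")
    storage_budget: upper bound on total storage of the chosen structures
    """
    chosen = frozenset()
    used = 0
    best_cost = workload_cost(chosen)
    remaining = set(candidates)
    while remaining:
        best = None
        for cand in sorted(remaining):  # sorted only for determinism
            if used + sizes[cand] > storage_budget:
                continue  # would violate the storage constraint
            cost = workload_cost(chosen | {cand})
            if cost < best_cost and (best is None or cost < best[1]):
                best = (cand, cost)
        if best is None:
            break  # nothing fits or nothing helps
        chosen = chosen | {best[0]}
        used += sizes[best[0]]
        best_cost = best[1]
        remaining.discard(best[0])
    return chosen, best_cost


# Hypothetical inputs: independent savings, so cost = 1000 - sum of savings.
savings = {"idx_age": 200, "idx_salary": 500, "mv_dept": 300}
sizes = {"idx_age": 40, "idx_salary": 50, "mv_dept": 80}
config, cost = greedy_configuration(
    savings, sizes,
    workload_cost=lambda cfg: 1000 - sum(savings[s] for s in cfg),
    storage_budget=100,
)
```

A greedy pass like this makes one what-if call per candidate per round, but it can miss the best configuration when physical design choices interact, which is the difficulty the Search Space slide raises.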
Architecture of Tuning Wizard in Microsoft SQL Server
[Diagram labels: Workload, Candidate Selection, Configuration Enumeration, Recommendation, Microsoft SQL Server, Server Extensions]
Search Space
Large search space for indexes
  Many columns to choose from
  Kinds of indexes
Explosive search space for materialized views
Query optimizers use physical design in novel ways
Physical design choices interact
AutoAdmin Milestones
Started in late summer 1996
SQL Server 7.0: Ships index tuning wizard (1998)
SQL Server 2000: Integrated recommendations for indexes and materialized views
Shared research results widely
Workload Driven Statistics Management
Example
SELECT * FROM lineitem, orders WHERE l_orderkey = o_orderkey AND l_shipdate = '01-02-99' AND o_orderdate = '01-01-99'
[Figure: two plans joining orders and lineitem, one using Index Nested Loop Join and one using Merge Join]
With stats: Cost = 25
Without stats: Cost = 112
Essential Set of Statistics
“Chicken-and-egg” problem
  Cannot tell if additional statistics are necessary until we actually build them!
  Need a test for equivalence without having to build any statistics in (C – S)
[Diagram labels: S, C]
Example
SELECT E.EmployeeName, D.DeptName FROM Employees E, Department D WHERE E.DeptId = D.DeptID AND E.Age < 40 AND E.Salary > 200K
Statistics on E.Age are missing
May not need statistics on E.Age if predicate E.Salary > 200K is very selective
Formalizing Essential Statistics
Our Goal: Find a subset that is “as good” as having all statistics but avoid price of maintaining all
  For given workload
What is “as good”?
  t-Optimizer-Cost equivalence
  Cost(Q, C) and Cost(Q, S) are within t% of each other
Publication: IEEE ICDE 2000
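One way to write the t-Optimizer-Cost equivalence test as a formula is given below. Measuring the difference relative to Cost(Q, C) and quantifying over every query Q in a workload W are assumptions; the slide only says the two costs are within t% of each other for a given workload.

```latex
\[
  S \subseteq C
  \quad\text{and}\quad
  \frac{\lvert \mathrm{Cost}(Q, S) - \mathrm{Cost}(Q, C) \rvert}{\mathrm{Cost}(Q, C)}
  \;\le\; \frac{t}{100}
  \qquad \text{for every query } Q \in W .
\]
```

Here Cost(Q, X) is the optimizer-estimated cost of Q when exactly the statistics in X are available.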
Essential Statistics (IEEE ICDE 2000)
In the absence of statistics:
  Query optimizers use “magic numbers” for selectivity of predicates
    For Age < 40, assume selectivity = 0.30
    Data distribution independent
MNSA (Magic Number Sensitivity Analysis)
  Set magic numbers to a few different values
  If varying selectivity does not affect plan
    ⇒ additional statistics will not help
  Else
    ⇒ select a “promising” statistic to build
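The MNSA test can be sketched with a toy optimizer. Everything below is an assumption made for illustration: the cost formula, the three magic-number values, and the premise that indexes on both Salary and Age exist. It follows the slide's rule of checking whether the chosen plan changes.

```python
def toy_plan(n_rows, sel_salary, sel_age, lookup_cost=4.0):
    """Toy optimizer: return the name of the cheapest access path."""
    costs = {
        "table scan": float(n_rows),
        "index on Salary": lookup_cost * n_rows * sel_salary,
        "index on Age": lookup_cost * n_rows * sel_age,
    }
    return min(costs, key=costs.get)


def statistics_may_help(plan_for, magic_numbers=(0.01, 0.30, 0.90)):
    """MNSA-style check for one predicate whose statistics are missing.

    plan_for: callable(assumed selectivity) -> plan chosen by the optimizer.
    Returns False when the plan is the same for every magic number, meaning
    statistics on that predicate cannot change the optimizer's choice.
    """
    return len({plan_for(s) for s in magic_numbers}) > 1


# Statistics on Salary exist; statistics on Age are missing.
selective_salary = statistics_may_help(lambda s: toy_plan(100_000, 0.001, s))
unselective_salary = statistics_may_help(lambda s: toy_plan(100_000, 0.50, s))
```

When the Salary predicate is very selective the plan never changes, so statistics on Age are not worth building; when it is not, the plan flips with the assumed Age selectivity and Age becomes a candidate statistic.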
Statistics on Queries
Reduce optimizer error by building statistics on query expressions (SIT)
A very promising idea
Like materialized views – a manageability challenge
Recent work from AutoAdmin (SIGMOD 2002)
Execution Feedback Driven Statistics Building
Self-Tuning Statistics
Think Maps
  Why care about maps for Greenland?
  Need detailed maps for areas you visit
  Make maps more detailed each time you visit
Idea: Start with “uniformity” assumption
  Progressively refine with execution feedback
  Single and multidimensional histograms
SIGMOD 99, SIGMOD 2001
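The refinement idea can be sketched as a one-dimensional histogram that starts uniform and is nudged by observed result sizes. This is a simplified illustration in the spirit of the slide, not the algorithm from the cited papers; the class name, the damping factor, and the proportional update rule are assumptions.

```python
class SelfTuningHistogram:
    """Equi-width histogram refined from observed result sizes."""

    def __init__(self, lo, hi, n_buckets, total_rows):
        self.lo = lo
        self.width = (hi - lo) / n_buckets
        # Uniformity assumption: every bucket starts with the same frequency.
        self.freq = [total_rows / n_buckets] * n_buckets

    def _overlap(self, i, a, b):
        """Fraction of bucket i covered by the range [a, b)."""
        left = self.lo + i * self.width
        right = left + self.width
        return max(0.0, min(b, right) - max(a, left)) / self.width

    def estimate(self, a, b):
        """Estimated number of rows with a <= value < b."""
        return sum(f * self._overlap(i, a, b) for i, f in enumerate(self.freq))

    def refine(self, a, b, actual_rows, damping=0.5):
        """Move the buckets touched by [a, b) toward the observed row count."""
        shares = [f * self._overlap(i, a, b) for i, f in enumerate(self.freq)]
        total = sum(shares)
        if total == 0:
            # All touched buckets are empty: split the correction by overlap.
            shares = [self._overlap(i, a, b) for i in range(len(self.freq))]
            total = sum(shares)
        if total == 0:
            return  # the range lies outside the histogram
        error = damping * (actual_rows - self.estimate(a, b))
        for i, share in enumerate(shares):
            self.freq[i] = max(0.0, self.freq[i] + error * share / total)


hist = SelfTuningHistogram(0, 100, n_buckets=4, total_rows=1000)
before = hist.estimate(0, 25)
hist.refine(0, 25, actual_rows=650)  # feedback from an executed query
after = hist.estimate(0, 25)
```

Only the buckets a query actually touches get more accurate, which is the map analogy: regions that are never queried keep their coarse uniform estimate. This sketch does not renormalize the total row count after an update.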
More on Self-Tuning Database Systems
More at Microsoft
  SQL Server 7.0 introduced several auto-tuning features
IBM Almaden
  Work by Mario and Shel
  LEO at IBM ARC has goals similar to AutoAdmin's
Rethinking Database Systems
Featurism hurts Self-Tuning
Featurism has turned into a curse
  Yet another indexing smart/join method/optimizer transformation added
  Abusing extensibility
  Eliminate all second-order optimizations
Turning into black magic
  Hard to abstract principles
  Cannot educate next generation of engineers
  Performance is unpredictable
Self-Tuning is difficult
Role Models
Ex. 1: Aircraft with many subsystems (engine, fuselage, electrical control, etc.)
Ex. 2: RISC hardware
No single engineer understands entire system
  Local theories for individual subsystems and reasonable understanding of interactions
Few points of interaction with stable and narrow interfaces
Built-in system support for debugging subcomponents (incl. performance tuning)
RISC Philosophy for DBMS
Details in VLDB 2000 vision paper
Package as components with simplified functionality
Enforce
  Layered approach
  Strong limits on interaction (narrow APIs)
  Multiple consumers for a component
Components must have manageable complexity
Encapsulation must include predictable performance and self-tuning
Not a new idea – but an idea worth revisiting
Final Words
DBMS has to be self-tuning to be a good software component
AutoAdmin
  Exploit workload and execution feedback richly for enabling self-tuning
  Demonstrated through technology incorporated in Microsoft SQL Server
Despite advances, self-tuning remains a very formidable challenge
  Need to think “self-tuning” globally by paying attention “locally”
  RISC DBMS architectures – worth revisiting?
More Information
Data Management, Exploration and Mining Group Homepage
http://research.microsoft.com/dmx
Microsoft SQL Server white papers on Self-Tuning technology
My contacts
http://research.microsoft.com/users/[email protected]