Page 1: Deep Dive on new Search Features in Denali CTP1

Deep Dive on new Search Features in Denali CTP1

Naveen Garg, Principal Program Manager

Microsoft Corporation

Page 2

Search Improvements

FullText Search

• Revamped Codebase for Significant Performance and Scale Improvement

• New Property-scoped Search

• Customizable NEAR Search

Page 3

Performance & Scale Goals

• Scale up to 350M documents

• Queries orders of magnitude faster than the 2008 release

• Worst-case query response time < 3 sec

• On par with or better than key DBMS players

Page 4

Code Investments - Versioned Updates

• SQL 2008 Design Issue

– Queries block updates to the internal table that maintains the state of the index with respect to document updates (such as for auto change tracking)

• Denali Investment

– Track batch commits in order

– Track lowest timestamp below which all batches are committed

– Select index data for query and merge from below this timestamp

– Queries now block only the update of the lowest timestamp (instead of index updates)
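The bookkeeping on this slide can be sketched in a few lines of Python. This is an illustration only, not SQL Server's implementation: the names `CommitTracker`, `begin_batch`, `commit_batch`, `watermark` and `visible_rows` are hypothetical, and timestamps are assumed to be increasing integers handed out when a batch starts.

```python
class CommitTracker:
    """Tracks index-update batches and the lowest timestamp below which
    every batch is committed (hypothetical names, illustrative only)."""

    def __init__(self):
        self._next_ts = 1      # timestamp handed to the next batch
        self._open = set()     # timestamps of batches not yet committed

    def begin_batch(self):
        ts = self._next_ts
        self._next_ts += 1
        self._open.add(ts)
        return ts

    def commit_batch(self, ts):
        self._open.remove(ts)

    def watermark(self):
        # Every batch with a timestamp below this value is committed.
        return min(self._open) if self._open else self._next_ts


def visible_rows(index_rows, tracker):
    """A query reads only index data stamped below the watermark."""
    wm = tracker.watermark()
    return [row for ts, row in index_rows if ts < wm]


tracker = CommitTracker()
a, b, c = tracker.begin_batch(), tracker.begin_batch(), tracker.begin_batch()
rows = [(a, "doc-a"), (b, "doc-b"), (c, "doc-c")]
tracker.commit_batch(a)
tracker.commit_batch(c)        # c commits before b, so the watermark stays at b
snapshot = visible_rows(rows, tracker)
```

In this model a query only needs a stable value of the watermark while it runs; batches above the watermark can keep committing without being blocked.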

Page 5

Code Investments – Single STVF for Query

Goals

• Improve query execution performance

• Lower costs, better plans and hints

• Better code organization

Code Changes

• Query Preparation

– Rewrite CONTAINS/FREETEXT in terms of CONTAINSTABLE/FREETEXTTABLE during binding

– Rewrite it as SELECT [TOP N] key, score FROM STVF [ORDER BY score] during QP prepare

• Compilation

– Parse parameters to a tree and bind specific columns, word breaking

– Tree expansion with appropriate AND/OR, noise word filtering

– Tree reduction, load stats

• Execution

– Transform to execution tree including ranking function

– Iterate to produce resulting rows

Page 6

Code Investments - Predicate Folding

• Multiple CONTAINS, FREETEXT predicates folded together, e.g.

– CT(1) AND CT(2) AND CT(3) => CT(1 AND 2 AND 3)

– CT(1) AND CT(2) OR CT(3) => CT(1 AND 2 OR 3)

– CT(1) AND NOT (NOT CT(2) OR CT(3)) => CT(1 AND 2 AND NOT 3)

– CT(1) AND (CT(2) OR i=10 OR CT(3)) => CT(1) AND (CT(2 OR 3) OR i=10)

• Except (no folding)

– CT(1) AND NOT (CT(2) AND CT(3)) => CT(1) AND (NOT CT(2) OR NOT CT(3))
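To make the AND/OR folding cases concrete, here is a minimal Python sketch over a toy predicate tree. It is not the optimizer's actual algorithm. It assumes every CONTAINS predicate targets the same column(s), represents nodes as tuples (`("CT", text)`, `("PRED", text)`, `("AND", ...)`, `("OR", ...)`, all hypothetical), and deliberately leaves out the NOT cases.

```python
def _flatten(kind, node):
    """Yield the operands of nested nodes that use the same operator."""
    if node[0] == kind:
        for child in node[1:]:
            yield from _flatten(kind, child)
    else:
        yield node


def _paren(expr):
    # Parenthesize compound search conditions so precedence stays explicit.
    return f"({expr})" if " " in expr else expr


def fold(node):
    """Merge sibling CONTAINS leaves under AND/OR into one CONTAINS leaf."""
    kind = node[0]
    if kind in ("CT", "PRED"):
        return node
    if kind not in ("AND", "OR"):
        raise ValueError(f"unsupported node: {kind}")
    children = [fold(child) for child in _flatten(kind, node)]
    terms = [c[1] for c in children if c[0] == "CT"]
    others = [c for c in children if c[0] != "CT"]
    folded = []
    if len(terms) == 1:
        folded.append(("CT", terms[0]))
    elif terms:
        folded.append(("CT", f" {kind} ".join(_paren(t) for t in terms)))
    result = folded + others
    return result[0] if len(result) == 1 else (kind, *result)


all_and = fold(("AND", ("AND", ("CT", "1"), ("CT", "2")), ("CT", "3")))
mixed = fold(("AND", ("CT", "1"),
              ("OR", ("CT", "2"), ("PRED", "i=10"), ("CT", "3"))))
```

Here `all_and` becomes a single `CT` leaf, while `mixed` keeps the relational predicate `i=10` outside the folded `CT(2 OR 3)`, mirroring the fourth example on the slide.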

Page 7

Code Investments – Query Parallelism

Goals

• Retain basic assumptions to avoid complete rewrite

• Scale to 1.6x latency reduction for doubling the cores

• Work well on both NUMA and UMA architectures

Changes

• Query Optimizer and Execution updates to allow fulltext query parallelism

• Fulltext STVF updates to support multiple threads per query

– Use DocID histogram to slice doc ranges for each thread

– Rebuild Autostats as part of background/master merge
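One way to picture "use DocID histogram to slice doc ranges for each thread" is the following Python sketch. The histogram layout (a list of `(last_doc_id, doc_count)` buckets), the function name `slice_doc_ranges`, and the assumption that DocIDs start at 1 are all illustrative choices, not details taken from the product.

```python
def slice_doc_ranges(histogram, threads):
    """Split DocID buckets into contiguous ranges of similar document count.

    histogram: list of (last_doc_id, doc_count) buckets in DocID order.
    Returns at most `threads` (first_doc_id, last_doc_id) ranges.
    """
    total = sum(count for _, count in histogram)
    target = total / threads           # ideal number of documents per thread
    ranges, start, acc = [], 1, 0      # assumes DocIDs start at 1
    for i, (last_id, count) in enumerate(histogram):
        acc += count
        is_last = i == len(histogram) - 1
        # Close the current range once it is heavy enough, always keeping
        # one slot free for the final range that ends at the last bucket.
        if is_last or (acc >= target and len(ranges) < threads - 1):
            ranges.append((start, last_id))
            start, acc = last_id + 1, 0
    return ranges


histogram = [(100, 40), (200, 10), (300, 10), (400, 40)]
two_way = slice_doc_ranges(histogram, 2)
```

Each range could then be handed to one worker thread; because the cut points come from document counts rather than raw DocID values, skewed DocID distributions still give roughly balanced work.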

Page 8

Summary Of Code Improvements

• Faster Execution

  – Numerous code and data layout improvements

  – No blocking during high index update workloads

  – Improved mixed relational query processing

  – Optimize Top N by Rank

    • 10x: Select top 1K by score for keyword in 1M docs (250ms -> 28ms)

• Leverage CPU

  – Cache for Operators and Core Algorithms

    • Batch decompression and rank computation, virtual functions

  – Vector CPU instructions (SSE*) for scalar computations

    • Ranking, TOP N, and Stale Test as major beneficiaries

• Leverage multicore

  – Parallel Query execution

  – Parallel Master Merge

* SSE: Streaming SIMD (Single Instruction Multiple Data) Extensions
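The slide does not say how the Top N by Rank optimization works. As a generic illustration of why selecting the top N can be much cheaper than ranking and sorting every match, here is a bounded-heap sketch in Python; the function name and the `(doc_id, score)` input shape are assumptions.

```python
import heapq


def top_n_by_score(scored_docs, n):
    """Return the n highest-scoring (doc_id, score) pairs, best first."""
    heap = []                                  # min-heap of (score, doc_id)
    for doc_id, score in scored_docs:
        if len(heap) < n:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # Replace the current weakest candidate with a better one.
            heapq.heapreplace(heap, (score, doc_id))
    return [(doc_id, score) for score, doc_id in sorted(heap, reverse=True)]


matches = [(1, 0.2), (2, 0.9), (3, 0.5), (4, 0.7)]
best_two = top_n_by_score(matches, 2)
```

The heap never holds more than n entries, so memory and per-row work stay bounded no matter how many documents match the keyword.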

Page 9

Query Throughput on 350M Documents

[Chart: Throughput (qps) with DML. x-axis: CPU (0 to 35); y-axis: Throughput (qps). Series: SQL Server Denali, SQL Server 2005. Data labels: 3.014, 10.501, 43.775, 119.017 and 0.597, 0.534, 0.385, 0.853, 1.364.]

[Chart: Throughput (qps) without DML. x-axis: CPU (0 to 35); y-axis: Throughput (qps). Series: SQL Server Denali, SQL Server 2005. Data labels: 3.009, 13.571, 64.825, 157.93 and 4.772, 8.147, 17.102, 48.27, 61.374.]

Page 10

Throughput & Execution Time on a Customer Workload

[Chart: Throughput/#Connections. x-axis: Number of Connections (0 to 2500); y-axis: Queries/Second. Series: SQL Server Denali, SQL Server 2005.]

[Chart: AvgExecTime (ms)/#connections. x-axis: Number of Connections (0 to 2500); y-axis: AvgExecTime (ms). Series: SQL Server Denali, SQL Server 2005.]

Page 11

Query Throughput on another Customer Workload

2X Query performance improvement compared with SQL Server 2005

[Chart: Scaling Queries/Seconds. x-axis: Users (0 to 450); y-axis: Queries/Seconds. Series: SQL Server Denali, SQL Server 2005.]

Page 12

Performance & Scale Summary

• Index and query tested at a scale of up to 350 million documents with < ~2 sec response

– ~3X better throughput without DML and ~9X better with DML

– Scales easily with increasing number of connections

• TAP customers are already reporting significant performance improvements on their workloads

Page 13

Property Scoped Search

• Load Office filters (needed once per database instance)

EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'restart_all_fdhosts';

• Create a property list

CREATE SEARCH PROPERTY LIST p1;

• Add properties to be extracted

ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
  (PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9',
   PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');

• Create/alter the fulltext index to specify the property list to be extracted

ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];

• Query for properties

SELECT * FROM fttable WHERE
  CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');

Page 14

Identifying Property GUIDs

• Commonly known property GUIDs are documented in MSDN

• For the rest…

– Enable trace flag 7603

– Create and fully populate a fulltext index with property search

– Check the error log for property GUIDs

– Recreate the index with the required properties

• Or use FiltDump.exe (Windows SDK)

– Get property details

Attribute = {F29F85E0-4FF9-1068-AB91-08002B27B3D9}\2 (System.Title)
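Turning an attribute line like the sample above into an `ALTER SEARCH PROPERTY LIST` statement is mechanical, so a small helper can do it. The Python sketch below assumes every line of interest has exactly the shape of the sample on this slide (`Attribute = {GUID}\id (name)`); real FiltDump output may differ, and the function names are hypothetical.

```python
import re

# Matches lines shaped like: Attribute = {GUID}\<int id> (<property name>)
ATTRIBUTE_RE = re.compile(
    r"Attribute\s*=\s*\{(?P<guid>[0-9A-Fa-f-]{36})\}"
    r"\\(?P<int_id>\d+)\s*\((?P<name>[^)]+)\)"
)


def parse_attribute(line):
    """Return (guid, int_id, name) for an attribute line, or None."""
    m = ATTRIBUTE_RE.search(line)
    if m is None:
        return None
    return m["guid"].lower(), int(m["int_id"]), m["name"]


def add_property_statement(list_name, guid, int_id, name):
    """Format the T-SQL that adds the property to a search property list."""
    return (
        f"ALTER SEARCH PROPERTY LIST [{list_name}] ADD N'{name}' "
        f"WITH (PROPERTY_SET_GUID = '{guid}', PROPERTY_INT_ID = {int_id});"
    )


sample = r"Attribute = {F29F85E0-4FF9-1068-AB91-08002B27B3D9}\2 (System.Title)"
parsed = parse_attribute(sample)
statement = add_property_statement("p1", *parsed)
```

For the sample line this yields the same property-set GUID used for `System.Author` on the previous slide, with integer ID 2 instead of 4.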

Page 15

Indexing Properties with Keywords

• Stored along with keywords but with additional Internal Property ID(s)
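A toy inverted index shows what storing an internal property ID next to each keyword buys at query time. This Python sketch is a conceptual model only: the class `PropertyAwareIndex`, the `BODY` and `AUTHOR` IDs, and the choice to let an unscoped search match keywords under any property ID are all assumptions made for illustration.

```python
from collections import defaultdict

BODY = 0     # assumed internal ID for text outside any registered property
AUTHOR = 1   # hypothetical internal ID assigned to System.Author


class PropertyAwareIndex:
    def __init__(self):
        # (keyword, internal property ID) -> set of document IDs
        self._postings = defaultdict(set)

    def add(self, doc_id, text, property_id=BODY):
        for word in text.lower().split():
            self._postings[(word, property_id)].add(doc_id)

    def search(self, keyword, property_id=None):
        keyword = keyword.lower()
        if property_id is not None:
            # Property-scoped search: only postings tagged with that ID.
            return set(self._postings.get((keyword, property_id), ()))
        hits = set()
        for (word, _), docs in self._postings.items():
            if word == keyword:
                hits |= docs
        return hits


index = PropertyAwareIndex()
index.add(1, "quarterly report")
index.add(1, "fernlope", property_id=AUTHOR)
index.add(2, "reviewed by fernlope")
scoped = index.search("fernlope", property_id=AUTHOR)
unscoped = index.search("fernlope")
```

Document 2 mentions the name only in its body, so it appears in the unscoped result but not in the property-scoped one.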

Page 16

Customizable ‘NEAR’ operator

• NEAR (( { <simple_term> | <phrase> | <prefix_term> } [,…n] ) [, <maximum_distance> [, <match_order> ] ] )

<maximum_distance> ::= { integer | MAX }

<match_order> ::= { TRUE | FALSE }

• E.g., resumes in the human resources DB containing the term "SQL Server" within no more than 5 words of "expertise":

SELECT candidate_name FROM Candidates
WHERE CONTAINS(Resume, 'NEAR(("SQL Server", expertise), 5, FALSE)');

Customize Maximum Gap between terms/phrases when using NEAR operator
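When the NEAR condition is assembled in application code, a small helper keeps the nested parentheses and quoting straight. The following Python sketch follows the grammar on this slide; the function name `near_condition` is hypothetical, terms containing spaces are treated as phrases, and no escaping of embedded quotes is attempted.

```python
def near_condition(terms, max_distance=None, match_order=False):
    """Build a custom NEAR search condition such as
    NEAR(("SQL Server", expertise), 5, FALSE)."""
    if len(terms) < 2:
        raise ValueError("NEAR needs at least two search terms")

    def quote(term):
        # Multi-word terms are phrases and need double quotes.
        return f'"{term}"' if " " in term else term

    inner = ", ".join(quote(t) for t in terms)
    if max_distance is None:
        return f"NEAR(({inner}))"
    distance = "MAX" if max_distance == "MAX" else str(int(max_distance))
    order = "TRUE" if match_order else "FALSE"
    return f"NEAR(({inner}), {distance}, {order})"


condition = near_condition(["SQL Server", "expertise"], 5)
```

The resulting string is meant to be passed to CONTAINS as a query parameter rather than concatenated into the SQL text.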

Page 17

Customizable NEAR

• Search for documents with two words a distance apart

Old NEAR Usage

SELECT * FROM fttable WHERE CONTAINS(*, 'test NEAR Space')

New NEAR Usages

• Specify distance

SELECT * FROM fttable WHERE CONTAINS(*, 'NEAR((test, Space), 5, FALSE)')

• Reduce distance

SELECT * FROM fttable WHERE CONTAINS(*, 'NEAR((test, Space), 2, FALSE)')

• Mandate order of words

SELECT * FROM fttable WHERE CONTAINS(*, 'NEAR((test, Space), 5, TRUE)')
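The effect of the distance and order arguments in the three queries above can be modelled with a short Python function. This is a simplified two-term model, assuming the distance counts the words that sit between the two search terms and that matching is case-insensitive; it ignores phrases, prefixes, word breaking and noise words, and the name `near_match` is hypothetical.

```python
def near_match(words, first, second, max_distance, match_order=False):
    """True if `first` and `second` occur with at most `max_distance`
    other words between them (and in that order if match_order is set)."""
    words = [w.lower() for w in words]
    pos_first = [i for i, w in enumerate(words) if w == first.lower()]
    pos_second = [i for i, w in enumerate(words) if w == second.lower()]
    for a in pos_first:
        for b in pos_second:
            if match_order and b < a:
                continue                  # wrong order for an ordered match
            gap = abs(b - a) - 1          # words strictly between the terms
            if 0 <= gap <= max_distance:
                return True
    return False


doc = "test one two three Space".split()
within_five = near_match(doc, "test", "Space", 5)
within_two = near_match(doc, "test", "Space", 2)
```

With three words between the terms, the document qualifies at distance 5 but not at distance 2; reversing the document would still match unordered, but not with `match_order=True`.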

