Post on 12-May-2015
Taking SQL Server Beyond Relational Into the Realm of Unstructured Data Management
Michael Rys, Principal Program Manager, @SQLServerMike
Unstructured Data in SQL Server
80% of all data is not stored in databases! Most of it is “unstructured”
Make SQL Server the preferred choice for managing Unstructured Data and allow building Rich Application Experience on top
Rich Unstructured Data in SQL Server 2012
Address important customer requests for Capabilities and rich services for Rich Unstructured Data (RUDS)
Scale Up for storage and search to 100m to 500m documents
Easy use/access to Unstructured data from all applications
Rich insight into unstructured data to make better decisions
Rich Unstructured Data & Services Ecosystem
[Diagram: Rich Services (Fulltext Search, Semantic Similarity Search) sit on top of the Database. Scale-up Solutions: multiple containers spread across Disk1, Disk2, and Disk3. Database Applications and SQL Apps get transactional access to Blobs, DB FileStreams, and FileTable, with integrated Backup/Replication/AlwaysOn and integrated administration. Windows Apps get streaming Win32 access through the FileStream API and through SMB share files/folders.]
Remote BLOB Storage
[Diagram: the Customer Application calls the SQL RBS API, which uses the Azure lib, Centera lib, or SQL FILESTREAM lib to reach the Azure, Centera, or SQL DB store.]
RBS Example Workflow
[Diagram: the Application uses the RBS Client Library and a BLOB Store Provider Library to reach the BLOB Store across a machine boundary; SQL Server holds the row shown below.]
ClaimID ClaimDate PhotoRef
4390 6/5/2007 <Binary(20)>
1. Write BLOB (Photo)
2. Return Blob ID
3. Write Blob ID to PhotoRef field
RBS Services:
• Create
• Fetch
• GC
• Delete
RBS – Create and Read Blob
// Store a new blob.
byte[] myBlobId;
SqlRemoteBlobContext blobContext = new SqlRemoteBlobContext(sqlConn);
using (SqlRemoteBlob newBlob = blobContext.CreateNewBlob()) {
    // Write to a System.IO.Stream object.
    newBlob.Write(…);
    // Alternative way to write (newBlob is only in scope inside this block):
    // newBlob.WriteFromStream(inputStream);
    newBlob.Close();
    myBlobId = newBlob.BlobId;
}
RBS – Create and Read Blob (Continued)
// Add a new row including the blob ID to the database
// table.
// Fetch the blob.
using (SqlRemoteBlob existingBlob = blobContext.OpenBlob(myBlobId)) {
    // Read from System.IO.Stream object.
    existingBlob.Read(...);
    // Alternative way to read (existingBlob is only in scope inside this block):
    // existingBlob.ReadToStream(outputStream);
}
Filestream
Storage Attribute on VARBINARY(MAX)
Works with integrated FTS
Unstructured data stored directly in the file system (requires NTFS)
Dual Programming Model
TSQL (Same as SQL BLOB)
Win32 Streaming APIs with T-SQL transactional semantics
Data Consistency
Integrated Manageability
Back Up/Restore
Administration
Size limit is the file system volume size
SQL Server Security Stack
[Diagram: Store BLOBs in DB + File System; the Application reaches both the BLOB and the DB.]
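As an illustration of the storage attribute described above, here is a minimal T-SQL sketch of a table with a FILESTREAM column. It assumes FILESTREAM is already enabled on the instance and that the current database has a FILESTREAM filegroup. The table layout is hypothetical: ClaimID, ClaimDate, and ClaimImage echo names used elsewhere in the deck, while ClaimRowId is invented for the example.

```sql
-- A table with a FILESTREAM column needs a UNIQUE, NOT NULL
-- uniqueidentifier ROWGUIDCOL column.
CREATE TABLE dbo.Claims
(
    ClaimRowId uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(), -- hypothetical name
    ClaimID    int  NOT NULL PRIMARY KEY,
    ClaimDate  date NOT NULL,
    ClaimImage varbinary(max) FILESTREAM NULL  -- stored in the file system, queried like a BLOB
);
GO

-- T-SQL access works the same way as for an ordinary varbinary(max) column.
INSERT INTO dbo.Claims (ClaimID, ClaimDate, ClaimImage)
VALUES (4390, '20070605', CAST('placeholder image bytes' AS varbinary(max)));
GO

SELECT ClaimID, DATALENGTH(ClaimImage) AS image_bytes
FROM dbo.Claims;
```

Small values like the one inserted here are fine for a smoke test; large files are the case where the Win32 streaming path pays off.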
TSQL FILESTREAM API
-- New TSQL Function:
-- GET_FILESTREAM_TRANSACTION_CONTEXT()
-- (returns NULL unless called inside a transaction)
SELECT GET_FILESTREAM_TRANSACTION_CONTEXT()
-- New TSQL Function:
-- PathName()
--
SELECT ClaimImage.PathName()
FROM Insurancedb..Claims
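The two functions above are normally used together inside one transaction. A minimal sketch of that pattern follows, assuming the Claims table has a ClaimID column (the WHERE clause and the value 4390 are illustrative, not from the slide).

```sql
BEGIN TRANSACTION;

-- Both values are handed to the client, which passes them to the
-- streaming API (for example the SqlFileStream constructor).
SELECT ClaimImage.PathName()                AS path,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS txnContext
FROM Insurancedb..Claims
WHERE ClaimID = 4390;   -- hypothetical filter

-- ... client opens, reads or writes, and closes the stream here ...

COMMIT TRANSACTION;
```

PathName() returns NULL when the FILESTREAM column itself is NULL, so a row usually needs at least an empty value (0x) before it can be opened for writing.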
Managed SqlFileStream: READ
// New SqlFileStream Class in VS08 SP1
//
// path comes from PathName(), txnId from GET_FILESTREAM_TRANSACTION_CONTEXT().
using (SqlFileStream sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Read))
// output file to read into
using (System.IO.FileStream fs = new System.IO.FileStream("c:\\output2.jpg", System.IO.FileMode.Create))
{
    byte[] buffer = new byte[512 * 1024];
    int cbBytesRead;
    // Read returns 0 at end of stream; a short read alone does not mean end of file.
    while ((cbBytesRead = sfs.Read(buffer, 0, buffer.Length)) > 0)
    {
        fs.Write(buffer, 0, cbBytesRead);
    }
}
Managed SqlFileStream: WRITE
using (SqlFileStream sfs = new SqlFileStream(path, txnId, System.IO.FileAccess.Write))
using (System.IO.Stream res = Pictures.GetResourceStream(HealthCare.MRI.JoeSmith)) {
    byte[] buffer = new byte[512 * 1024];
    int cbBytesRead;
    // Copy until the source stream reports end of stream (0 bytes read).
    while ((cbBytesRead = res.Read(buffer, 0, buffer.Length)) > 0) {
        sfs.Write(buffer, 0, cbBytesRead);
    }
}
// commit SQL transaction and close SQL connection.
Integrated Management of documents in SQL Server 2012
demo
FILETABLE Overview
FileTable: A Table of Files/Directories
User created Table with a fixed schema
contains FILESTREAM and File Attributes
Each row represents a File or a Directory
System defined constraints maintain the tree integrity
File/Directory hierarchy view through a Windows Share
Supports Win32 APIs for File/Directory Management
DB Storage is Transparent to Win32 applications
SMB level of application compatibility
Virtual network name (VNN) path support for transparent Win32 application failover
[Diagram: FileTable folder hierarchy. The FILESTREAM share MSSQLSERVER contains the database directories Private Docs (Database1) and Office Docs (Database2). Below them are the FileTable directories LogFiles, Documents, and Media, each holding a user-defined directory structure. Example path: \\my_machine\MSSQLSERVER\Office Docs\Documents]
Creating a FileTable
Pre-requisites
Enable FILESTREAM
Create FILESTREAM Share and Filegroup
Enable non-transactional access at the DB level
ALTER DATABASE Contoso SET FILESTREAM (non_transacted_access = FULL, directory_name = N'Contoso')
Create FileTable
CREATE TABLE Contoso..Documents AS FILETABLE
WITH (filetable_directory = N'Document Library')
Access at \\<machine name>\<FILESTREAM share>\Contoso\Document Library\
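The first two pre-requisites are listed above without their T-SQL. A minimal sketch of them follows, assuming the Contoso database already exists. The filegroup name ContosoFS, the logical file name ContosoFS1, and the path D:\FSData are hypothetical. FILESTREAM must also be enabled for the service in SQL Server Configuration Manager, which this script cannot do.

```sql
-- Instance level: 2 = allow both T-SQL and Win32 streaming access.
EXEC sp_configure N'filestream access level', 2;
RECONFIGURE;
GO

-- Hypothetical filegroup and container names. The parent folder
-- (D:\FSData) must exist; the container folder itself must not.
ALTER DATABASE Contoso ADD FILEGROUP ContosoFS CONTAINS FILESTREAM;
GO
ALTER DATABASE Contoso
ADD FILE (NAME = N'ContosoFS1', FILENAME = N'D:\FSData\ContosoFS1')
TO FILEGROUP ContosoFS;
GO

-- After SET FILESTREAM has been run, confirm the database-level settings.
SELECT DB_NAME(database_id) AS database_name,
       non_transacted_access_desc,
       directory_name
FROM sys.database_filestream_options
WHERE database_id = DB_ID(N'Contoso');
```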
FileTable Schema
File Attribute Name | Type | Purpose
path_locator | hierarchyid | Represents position of this node in the hierarchical FileNamespace.
parent_path_locator | hierarchyid | Represents the hierarchyID of the parent directory (a computed column)
stream_id | uniqueidentifier | UniqueId for Filestream Data
file_stream | varbinary(max) filestream | Filestream data
file_type | nvarchar(255) | Type of the file. Can be used for fulltext index creation
cached_file_size | bigint | Size of the filestream (cached value)
name | nvarchar(255) | File/Folder Name (e.g. foo.txt)
creation_time | datetime2 | Creation Time
last_write_time | datetime2 | LastWrite Time
last_access_time | datetime2 | LastAccess Time
is_directory | bit | TRUE for directories.
is_offline | bit | Offline attribute
is_hidden | bit | Hidden attribute
is_readonly | bit | Read Only attribute
is_archive | bit | Archive attribute
is_system | bit | System attribute
is_temporary | bit | Temporary attribute
Modifying a FileTable
FileTable has a fixed schema
Columns, system defined constraints cannot be altered/dropped
Allows user defined indexes/constraints/triggers
Disabling/Enabling FileTable Namespace
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
Disables all system-defined constraints and Win32 access to FileTable
Useful for bulk-loading/re-organization of data
FileTable can be dropped similar to any other table
Catalog views can be used for obtaining metadata
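The disable/enable cycle and the catalog views mentioned above can be sketched as follows. This assumes the Documents FileTable from the earlier slide and is run in the database that owns it; the bulk-load step is only indicated by a comment.

```sql
-- Turn off the system-defined constraints and Win32 access before a bulk load.
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE;
GO

-- ... bulk load or reorganize rows here ...

-- Re-enabling re-checks the namespace constraints, so it can fail
-- if the loaded rows do not form a valid file/directory tree.
ALTER TABLE Documents ENABLE FILETABLE_NAMESPACE;
GO

-- Catalog view with one row per FileTable in the current database.
SELECT OBJECT_NAME(object_id) AS filetable_name,
       is_enabled,
       directory_name
FROM sys.filetables;
```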
Data Access – File system Access
FileTable hierarchy is visible through the FILESTREAM share
\\machine\<FILESTREAMshare>\<Database_directory>\<FileTable_Directory>\...
Provides transparent Win32 API & File/Directory Management capabilities
e.g. MS Word can create/open/save files; xcopy for copying directory trees into the database
Win32 API operations are non-transactional
Operations cannot be part of any user transactions
Win32 operations are intercepted by SQL Server at the file system level
e.g. File/Directory creation/deletion => insert/delete into FileTable
Full locking/concurrency semantics with other accesses
Allows in-place update of file stream data/File attributes
Transactional FILESTREAM APIs can also be used.
Data Access – T-SQL Access
Normal Insert/Update/Delete allowed for the FileTable manipulation
FileTable Namespace integrity constraints enforced
Set based operations on the File-attributes – value add
Built-in functions
GetFileNamespacePath() – UNC path for a file/directory
FileTableRootPath() – UNC path to the FileTable root
GetPathLocator() – path_locator value for a file/directory
DDL/DML Triggers are supported
DML triggers on a FileTable cannot update any FileTables
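The T-SQL access points listed above can be sketched together in one short script. It assumes a FileTable named dbo.Documents (the dbo schema is an assumption); the file names, the seven-day window, and the row limit are illustrative only.

```sql
-- Create a directory and a small file purely through T-SQL.
INSERT INTO dbo.Documents (name, is_directory)
VALUES (N'Specs', 1);

INSERT INTO dbo.Documents (name, file_stream)
VALUES (N'readme.txt', CAST('FileTable row created from T-SQL' AS varbinary(max)));

-- Set-based query over file attributes: the ten largest files
-- written in the last seven days, with their full UNC paths.
SELECT TOP (10)
       name,
       cached_file_size,
       last_write_time,
       file_stream.GetFileNamespacePath(1) AS full_unc_path  -- 1 = full path
FROM dbo.Documents
WHERE is_directory = 0
  AND last_write_time >= DATEADD(DAY, -7, SYSDATETIMEOFFSET())
ORDER BY cached_file_size DESC;

-- UNC path of the FileTable root, and the path_locator of the new directory.
SELECT FileTableRootPath(N'dbo.Documents') AS root_path;
SELECT GetPathLocator(FileTableRootPath(N'dbo.Documents') + N'\Specs') AS specs_locator;
```

The second query is the kind of operation that would need a directory walk on a plain file server; here it is an ordinary indexed query.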
Programming Pattern
Windows applications work using normal Win32 APIs using the logical UNC paths
e.g. Search files by using the FindFirstFile, FindNextFile, FindClose pattern
Move a directory using MoveFile or MoveFileEx, etc.
New Hybrid Applications using DB and FileTable:
File I/O APIs start by obtaining a handle using FileNamespace Path
-- T-SQL: get the FileNamespace path (1 = full UNC path; the default is a relative path)
DECLARE @path nvarchar(max)
SELECT @path = file_stream.GetFileNamespacePath(1) FROM DocumentStore WHERE name = 'MySpec.doc';
// Win32: open a file handle on the path returned by the query above
handle = CreateFile(path, GENERIC_WRITE, 0, NULL, CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
Managing FileTable
DB Backup/Restore operations include FileTable data
'Point in time Restore' may contain more recent FILESTREAM data due to non-transactional updates during backup
FileTables are secured similar to any other user tables
Same security is enforced for Win32 access also
Data Loading
Windows tools like xcopy/robocopy OR drag-drop operations through Windows Explorer can be used
BCP operations are supported for direct T-SQL data inserts
SSMS supports FileTable creation/exploration
Managing FileTable – High Availability
SQL Server 2012 AlwaysOn is fully supported
Transparent data failover
FileTables can be configured with multiple secondary nodes
Both sync and async data replication is supported
File and metadata is available in the secondary in case of failover
Transparent application failover
Virtual network name (VNN) path support for transparent Win32 application failover
Applications use \\VNN\Share\db\... Path
Applications are automatically redirected to the secondary in case of failover
Restrictions
FileTables cannot participate in “Read-only” replicas.
Managing FileTable – Trouble shooting
DMV to show all open non-transactional file handles
sys.dm_filestream_non_transacted_handles
Stored Procedure to terminate open file handles
sp_kill_filestream_non_transacted_handles
X-events/Perf counters for trouble shooting
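A minimal sketch of how the DMV and the stored procedure above might be used together. The table name Documents is taken from earlier slides; the selected columns are a subset of what the DMV exposes, chosen for illustration.

```sql
-- List open non-transactional (Win32) handles and who holds them.
SELECT DB_NAME(database_id)                AS database_name,
       OBJECT_NAME(object_id, database_id) AS filetable_name,
       handle_id,
       opened_file_name,
       login_name,
       open_time
FROM sys.dm_filestream_non_transacted_handles;

-- Close every open handle on one FileTable (run in that table's database).
-- Adding @handle_id = <value from the query above> closes a single handle.
EXEC sp_kill_filestream_non_transacted_handles @table_name = N'Documents';
```

Killing handles can cost the application that owns them its unsaved changes, so it is usually a last resort before maintenance operations that need exclusive access.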
FileTable Restrictions
FileTables cannot be partitioned
Merge/Transactional replications are not supported
Under RCSI/Snapshot isolation mode, Win32 applications cannot modify file stream data in FileTables
Win32 Application compatibility
Memory mapped files, Directory notifications, links are not supported
Some FileStream/FileTable performance tips
Reading bigger buffers gives better performance
Volumes hosting FILESTREAM/FILETABLE data should have 8.3 name generation and LastAccessTime disabled
FILESTREAM/FILETABLE containers should reside on dedicated volumes
Having one volume per FILESTREAM/FILETABLE container enables space management at volume level
“Magic” SMB buffer size = ~60KB Another “good” value is 480KB
ROWGUID unique index for aligned partitioning for FILESTREAM
AntiVirus programs should be configured not to delete infected files but to quarantine them
If using compressed volumes, use cluster size 4 KB
Unstructured Data Scale-up
Multiple Containers for FILESTREAM data
SQL 2008 R2
Only one storage container/FILESTREAM filegroup
Limits storage capacity scaling and I/O scaling
SQL Server 2012
Support for multiple storage containers/filegroup.
DDL Changes to Create/Alter Database statements
Ability to set max_size for the containers
DBCC Shrinkfile Emptyfile support
Scaling Flexibility
Storage scaling by adding additional storage drives
I/O scaling with multiple spindles
Unstructured Data : Multiple containers
Use of multiple spindles for achieving better I/O Scalability
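The SQL Server 2012 DDL changes for multiple containers can be sketched as follows. The database name Contoso comes from earlier slides; the filegroup name ContosoFS, the container names ContosoFS1 and ContosoFS2, the drive path, and the size cap are all hypothetical.

```sql
-- Add a second container on another drive to an existing FILESTREAM
-- filegroup, capped with MAXSIZE.
ALTER DATABASE Contoso
ADD FILE (NAME = N'ContosoFS2',
          FILENAME = N'E:\FSData\ContosoFS2',
          MAXSIZE = 200GB)
TO FILEGROUP ContosoFS;
GO

USE Contoso;
GO

-- Move the contents of the first container into the remaining ones.
-- This relies on the FILESTREAM garbage collector, so checkpoints and
-- log backups may be needed before the container is actually empty.
DBCC SHRINKFILE (N'ContosoFS1', EMPTYFILE);
```

Placing each container on its own volume matches the performance tips earlier in the deck and lets new FILESTREAM data spread across the spindles.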
RUDS Scale-up: FileStream Perf/Scale
Improved performance of T-SQL and File I/O access
Various enhancements to improve read/write throughput
5 fold increase in Read throughput
Linear scaling with large number of concurrent threads
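Unstructured Storage In SQL Server 2008 & 2012
[Table: the columns are File Stores / External Blob Stores (CAS), SQL BLOBs, Remote Blob API, FILESTREAM, and FILETABLE. The rows are Streaming Performance, Win32 App Compat, Link Level Consistency, Data Level Consistency, Integrated Query & Management, Non-local Windows File Servers, and External Blob Stores. Most cell values were graphical and did not survive extraction. The surviving text cells read "Depends on external store" (two cells each in the Streaming Performance and Win32 App Compat rows) and "n/a" (Non-local Windows File Servers and External Blob Stores rows).]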
Feature Comparison
Features | FileServer+DB Solution | SQL 2008 FILESTREAM | SQL 2012 FileTable
Integrated Admin operations for Relational and File data (Backup/Restore, HA/Mirroring) | No | Yes | Yes
Integrated Services for Relational and File data (Text/Semantic Search, Reports, Query etc.) | No | Yes | Yes
Integrated Security Model | No | Yes | Yes
In-place update of Filestream data (non-transacted) | Yes | No | Yes
Fully Transacted update of Filestream data | No | Yes | Yes
File/Directory hierarchy in DB | No | No | Yes
Win32 App compatibility | Yes | No | Yes
Relational access to File Attributes | No | No | Yes
Summary: FileTable
Application Compatibility for Windows Applications
Windows applications run on top of files stored in FileTables with no modifications
Relational Value Proposition
Provide Integrated Administration and Services
Backup, Log Shipping, HA-DR, Full text and Semantic search, …
T-SQL orthogonality
File/Folder attributes surfaced through relational columns
Power of set based operations, Policy Management, Reporting etc.
FileNamespace Hierarchy management
Full Text Search Improvements in SQL Server 2012
Improved Performance and Scale:
Scale-up to 350M documents
iFTS query perf 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times < 3 sec for corpus
At par or better than main database search competitors
New Functionality:
Property Search
Customizable NEAR
New Wordbreakers: update existing WB, add Czech and Greek
Innovation in Search: Semantic Similarity Search
Full Text Search Performance & Scale Improvements
Architectural Improvements
Improved internal implementation
Queries no longer block Index updates
Improved Query Plans: Better Plans for common queries
Fulltext predicate folding
Parallel Plan execution
Index and Query tested on scale up to 350Million documents with < ~2 Sec Response
~3X better w/o DML and ~9X better with DML throughput
Scale easily with increasing number of connections
Scale-up: Full-Text Search
Queries over a 350M-document database with random DMLs running in the background. SQL Server 2012 beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput
[Chart: 2005/8 vs 2012]
Scale-up: Full-Text Search
Query avgExecTime (ms) under various numbers of connections (50 ~ 2000 users) for customer playback benchmark
[Chart: 2005/8 vs 2012]
New FullText Search Capabilities in SQL Server 2012
demo
FullText Property Scoped Search
• Setup once per database instance to load the office filters
exec sp_fulltext_service 'load_os_resources', 1
go
exec sp_fulltext_service 'restart_all_fdhosts'
go
• Create a property list
CREATE SEARCH PROPERTY LIST p1;
• Add properties to be extracted
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
(PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9', PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');
• Create/Alter Fulltext index to specify property list to be extracted
ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];
• Query for properties
SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');
New Search Filter for Document Properties
CONTAINS (PROPERTY ( { column_name }, 'property_name' ),
'contains_search_condition' )
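When a property-scoped query returns nothing, it can help to check what was actually registered. The sketch below assumes the property list p1 and the table fttable from the steps above and only reads catalog views.

```sql
-- Properties registered in the search property list p1.
SELECT l.name AS property_list,
       p.property_name,
       p.property_set_guid,
       p.property_int_id
FROM sys.registered_search_property_lists AS l
JOIN sys.registered_search_properties AS p
  ON p.property_list_id = l.property_list_id
WHERE l.name = N'p1';

-- Which property list, if any, the full-text index on fttable uses.
SELECT OBJECT_NAME(object_id) AS table_name,
       property_list_id
FROM sys.fulltext_indexes
WHERE object_id = OBJECT_ID(N'fttable');
```

Properties are extracted during population, so documents indexed before the property list was attached need to be repopulated before they match a PROPERTY(...) query.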
Full-Text Customizable Near
OLD NEAR SYNTAX
select * from fttable where contains(*, 'test near Space')
NEW NEAR USAGES
• SPECIFY DISTANCE
select * from fttable where contains(*, 'near((test, Space), 5,false)')
• REDUCE DISTANCE
select * from fttable where contains(*, 'near((test, Space), 2,false)')
• ORDER OF WORDS IS SPECIFIED AS IMPORTANT
select * from fttable where contains(*, 'near((test, Space), 5,true)')
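The customizable NEAR term also works with CONTAINSTABLE, which returns a rank for each match. A minimal sketch, assuming fttable has a full-text key column named id (a hypothetical name, since the slides do not show the table definition):

```sql
-- Rank rows where "test" and "Space" occur within 5 terms of each other,
-- in that order; closer matches generally rank higher.
SELECT t.*, k.[RANK]
FROM fttable AS t
JOIN CONTAINSTABLE(fttable, *, 'NEAR((test, Space), 5, TRUE)') AS k
  ON t.id = k.[KEY]        -- id: hypothetical full-text key column
ORDER BY k.[RANK] DESC;

-- MAX removes the distance limit while still requiring both terms.
SELECT t.*
FROM fttable AS t
WHERE CONTAINS(*, 'NEAR((test, Space), MAX)');
```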
Statistical Semantic Search
Semantic Insight into textual content
Uses language models to find most important keywords in document
No need to build brittle ontologies!
Statistically Prominent Keywords
Autogenerated tag clouds
Potentially Related Content based on extracted Keywords, such as
Similar Products (based on description)
Similar Jobs or Applicants
Similar Support Incidents (based on call logs)
Potential Solutions (based on similar incidents)
First class usage experience
Efficient linear algorithms
Integrated with FTS and SQL
New Rowset functions for all results using SQL query
Semantic Extraction and Relationships
FullText Search in SQL Server 2012
demo
Semantic Similarity
Input: Text such as varchar, Office, PDF, HTML, email…
Output: Rowset functions with standard SQL queries
Illustrating example:
Source Table
Key | Title | Document
D1 | Annual Budget | …
D2 | Corporate Earnings | …
D3 | Marketing Reports | …
… | … | …
Keyword Index (Full-Text)
ID | Keyword | Colid | … | compDocid | CompOc | CompPid
K1 | revenue | 1 | … | 10,23,123 | (1,4),(5,8),(1,34) | 2,5,6,8,4,3
K2 | growth | 1 | … | 10,23,123 | (1,5),(5,9),(1,34) | 2,5,6,8,5,4
… | … | … | … | … | … | …
Keyphrases
ID | Keyword
T1 | revenue
T2 | growth
T3 | Windows
T4 | Azure
… | …
KeyphraseDocuments
ID | DocID
T1 (revenue) | D1 (Annual Budget)
T2 (growth) | D2 (Corporate Earnings)
T3 (Windows) | D3 (Marketing Reports)
… | …
T1 (revenue) | D7 (Finance Report)
… | …
T3 (Windows) | D11 (Azure Strategy)
T4 (Azure) | D11 (Azure Strategy)
DocumentSimilarity
DocID | MatchedDocID
D1 (Annual Budget) | D2 (Corporate Earnings)
D1 (Annual Budget) | D7 (Finance Report)
D3 (Marketing Reports) | D11 (Azure Strategy)
… | …
Full-Text and Semantic Processing
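[Diagram: full-text and semantic processing in numbered steps 1, 2a, 2b, and 3; extracted keywords such as quarter, record, revenue… are combined with Language Models.]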
Functional Surface: Initiate Semantics
Create / Alter Full-Text with Semantics
Makes internal design dependency on FTS explicit
CREATE FULLTEXT INDEX ON Production.Document (
    Title LANGUAGE 1033,
    Document
        TYPE COLUMN FileExtension   -- TYPE COLUMN comes before LANGUAGE in the column definition
        LANGUAGE 1033
        STATISTICAL_SEMANTICS
)
KEY INDEX PK_Document_DocumentID
ON documents_catalog
WITH CHANGE_TRACKING OFF, NO POPULATION;
ALTER FULLTEXT INDEX ON Production.Document
ALTER COLUMN Document
ADD STATISTICAL_SEMANTICS
WITH NO POPULATION;
…
…
ALTER FULLTEXT INDEX ON Production.Document
START FULL POPULATION;
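Once the population has finished, the semantic results are read through the rowset functions. A minimal sketch, assuming Production.Document has an integer key column DocumentID behind PK_Document_DocumentID (an assumption based on the index name) and that a document with key 1 exists:

```sql
DECLARE @DocID int = 1;   -- hypothetical document key

-- Statistically prominent key phrases of one document.
SELECT TOP (10) keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Production.Document, Document, @DocID)
ORDER BY score DESC;

-- Documents most similar to that document.
SELECT TOP (10) d.Title, s.score
FROM SEMANTICSIMILARITYTABLE(Production.Document, Document, @DocID) AS s
JOIN Production.Document AS d
  ON d.DocumentID = s.matched_document_key
ORDER BY s.score DESC;
```

SEMANTICSIMILARITYDETAILSTABLE takes a source and a matched key and returns the key phrases the two documents share, which is useful for explaining a match to a user.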
Semantic Extraction: End-2-End Experience
Downloadable Language Statistical Database with registration stored procedure
Setup along with Full-Text
Metadata / Catalog views
System level DMVs for progress state and usage
Manageability through SSMS and SMO
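The registration step and the metadata views above can be sketched as follows. The file paths are hypothetical and depend on where the downloaded language statistics database was extracted; the database name semanticsdb is the conventional one, but any name works.

```sql
-- Attach the downloaded language statistics database (hypothetical paths).
CREATE DATABASE semanticsdb
ON (FILENAME = N'C:\SemanticDB\semanticsdb.mdf'),
   (FILENAME = N'C:\SemanticDB\semanticsdb_log.ldf')
FOR ATTACH;
GO

-- Register it with the instance.
EXEC sp_fulltext_semantic_register_language_statistics_db @dbname = N'semanticsdb';
GO

-- Catalog views: the registered database and the languages it supports.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
SELECT * FROM sys.fulltext_semantic_languages;

-- DMV: progress of semantic similarity population.
SELECT * FROM sys.dm_fts_semantic_similarity_population;
```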
Key Takeaways
SQL Server’s unstructured data support is:
targeting non-traditional database workloads that are growing rapidly in the enterprise. Example: Content and Collaboration apps
targeting key ISV asks in fast growing markets such as eDiscovery, Healthcare, Document management etc.
a key strategy to enable you to build complex data applications that go beyond relational data!
Related Content
SQL Server 2012 Whitepapers and information: http://www.sqlserverlaunch.com
Channel 9 DataBound Episode 2: http://channel9.msdn.com
MySemanticsSearch Demo: http://mysemanticsearch.codeplex.com
More demo data sets and demo scripts: http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
Microsoft Virtual Academy Recording: http://www.microsoftvirtualacademy.com/tracks/breakthrough-insights-using-microsoft-sql-server-2012-scalable-data-warehouse
Find Me Later…
• On Twitter: @SQLServerMike
• Blog: http://sqlblog.com/blogs/michael_rys