Date posted: 07-Dec-2014
Category: Technology
Uploaded by: chris-price
MAKING BUSINESS INTELLIGENT www.pragmaticworks.com
Into the Wild: Taming Unstructured Data with Semantic Search
Chris Price, Senior BI Consultant
@BluewaterSQL
Intro
Chris Price, Senior BI Consultant with Pragmatic Works
@BluewaterSQL http://bluewatersql.wordpress.com/
Outline
- Data Gone Wild
- FileStream -> FileTable
- Full-Text
- FileTable/Full-Text Integration
- SQL Server 2012 Enhancements
- Semantic Search
- Search Scenarios
Data Gone Wild!
Data by any other name...
- Structured: Tabular, CSV & Fixed Width
- Semi-Structured: HTML, XML & JSON
- Unstructured: Images, Videos, PDF & Email
80% of this stuff is not found in a DB
- Difficult to integrate
- Hard to manage
Key Objective
SQL Server 2012 is a great choice for integrating and managing structured, semi-structured & unstructured data
FileStream
- Introduced in SQL Server 2008
- Integrates the DB Engine with the NTFS file system
- VARBINARY(MAX) columns stored on the file system
- Dual programming model:
  - Transact-SQL (no partial updates with .WRITE())
  - Win32 streaming (ODBC, OLE DB or ADO.NET)
    - Non-trivial (requires a transactional context)
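The dual programming model above can be sketched roughly as follows. The table and column names (dbo.Documents, Content) are hypothetical, and the sketch assumes FileStream is already enabled and the database already has a FileStream filegroup.

```sql
-- Hypothetical table; a FILESTREAM column requires a unique ROWGUIDCOL column.
CREATE TABLE dbo.Documents
(
    DocumentId UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL UNIQUE DEFAULT NEWID(),
    FileName   NVARCHAR(260)  NOT NULL,
    Content    VARBINARY(MAX) FILESTREAM NULL
);

-- Transact-SQL side: the value is written as a whole.
INSERT INTO dbo.Documents (FileName, Content)
VALUES (N'hello.txt', CAST('Hello, FileStream' AS VARBINARY(MAX)));

-- Win32 streaming side: the client needs a logical path and a
-- transaction context, both obtained inside an open transaction.
BEGIN TRANSACTION;

SELECT Content.PathName()                   AS FilePath,
       GET_FILESTREAM_TRANSACTION_CONTEXT() AS TxContext
FROM dbo.Documents
WHERE FileName = N'hello.txt';

-- The client would now open the stream with these two values
-- (for example via OpenSqlFilestream or ADO.NET SqlFileStream),
-- read or write it, and close it before the commit.

COMMIT TRANSACTION;
```

The need to hold a transaction open around the streaming call is what makes the Win32 path non-trivial compared with plain file access.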
FileTable
- Introduced in SQL Server 2012
- Built on top of FileStream
- Win32 API access
- Implemented as a fixed-format table:
  - FileStream storage/container
  - File system properties (columns)
  - Hierarchy ID (synthesized hierarchical file system share)
FileTable
- Accessed through a file system share or as a table
  - SMB protocol for remote access
  - Open docs in MS Word, Excel, etc.
- Share allows non-transactional access
  - No memory-mapped files (affects Notepad/Paint)
- File name/properties preserved
- Supports directory structures
FileTable Format
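Since every FileTable has the same fixed schema, the format can be inspected with an ordinary query. This is a minimal sketch against a hypothetical FileTable named dbo.DocumentStore; the column names are the predefined FileTable columns.

```sql
-- dbo.DocumentStore is a hypothetical FileTable name.
SELECT stream_id,                          -- unique ROWGUIDCOL id of the item
       name,                               -- file or directory name
       file_type,                          -- extension, computed from name
       cached_file_size,                   -- size of the stream in bytes
       path_locator,                       -- hierarchyid position in the share
       parent_path_locator,                -- hierarchyid of the parent directory
       path_locator.GetLevel() AS depth,   -- nesting level below the table root
       is_directory,
       is_readonly,
       is_hidden,
       is_archive,
       creation_time,
       last_write_time,
       last_access_time
FROM dbo.DocumentStore;
```

The file contents themselves live in the file_stream column, which is left out here to keep the result set small.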
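FileTable Set-Up
- Enable FileStream on the instance
- DATABASE: add a FileStream filegroup, allow non-transactional access and set a directory name
- TABLE: create the table AS FILETABLE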
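The three set-up steps might look like the sketch below. The database name, file paths, directory names and table name are all assumptions for illustration, and FileStream must also be enabled for the Windows service in SQL Server Configuration Manager. GO is the batch separator used by SSMS and sqlcmd.

```sql
-- 1. Enable FileStream for T-SQL and Win32 streaming access.
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- 2. DATABASE: FileStream filegroup plus non-transactional access.
--    C:\Data must exist; the FileStream folder itself must not.
CREATE DATABASE FileTableDemo
ON PRIMARY
    (NAME = FileTableDemo_data, FILENAME = 'C:\Data\FileTableDemo.mdf'),
FILEGROUP FileStreamFG CONTAINS FILESTREAM
    (NAME = FileTableDemo_fs, FILENAME = 'C:\Data\FileTableDemo_fs')
LOG ON
    (NAME = FileTableDemo_log, FILENAME = 'C:\Data\FileTableDemo.ldf')
WITH FILESTREAM
    (NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'FileTableDemo');
GO

USE FileTableDemo;
GO

-- 3. TABLE: no column list, because the FileTable schema is fixed.
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH
(
    FILETABLE_DIRECTORY = N'DocumentStore',
    FILETABLE_COLLATE_FILENAME = database_default
);
GO
```

DIRECTORY_NAME and FILETABLE_DIRECTORY become the database and table levels of the share path.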
FileTable Access
- Share: \\<server>\<instance>\<database>\<table>
- T-SQL: Insert/Update/Delete
  - Can update a stream without affecting the timestamp
  - Cannot delete directories that have files
- Functions: GetFileNamespacePath(), FileTableRootPath(), GetPathLocator()
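A short sketch of the T-SQL side and the three path functions, assuming a hypothetical FileTable named dbo.DocumentStore and a made-up file name:

```sql
-- Insert a file at the root of the FileTable through T-SQL.
INSERT INTO dbo.DocumentStore (name, file_stream)
VALUES (N'readme.txt', CAST('Hello, FileTable' AS VARBINARY(MAX)));

-- FileTableRootPath(): UNC path of the table's root directory.
DECLARE @root NVARCHAR(4000) = FileTableRootPath(N'dbo.DocumentStore');

-- GetFileNamespacePath(): path of each item relative to the root,
-- so root + relative path gives the full UNC path.
SELECT name,
       @root + file_stream.GetFileNamespacePath() AS FullPath
FROM dbo.DocumentStore;

-- GetPathLocator(): map a full UNC path back to the row's hierarchyid.
DECLARE @path NVARCHAR(4000) = @root + N'\readme.txt';

SELECT name, is_directory
FROM dbo.DocumentStore
WHERE path_locator = GetPathLocator(@path);
```

Building paths from FileTableRootPath() rather than hard-coding the server and instance keeps the queries valid if the database is moved.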
FileTable Demo
Full-Text
- Enhanced in 2012
  - 7-10x faster than the prior version
  - Scales to more than 350 million documents
- NEW: Property Search
  - Filter on document properties (e.g. Author, Title)
  - The IFilter must support it
- Customizable NEAR
  - CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, FALSE)')
  - CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, TRUE)')
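The sketch below shows one way these features could be wired up on a FileTable. It assumes a hypothetical FileTable dbo.DocumentStore with a unique index named UQ_DocumentStore_StreamId on stream_id; the catalog name, property list name and search terms are also made up.

```sql
CREATE FULLTEXT CATALOG DocumentCatalog AS DEFAULT;
GO

-- file_type tells the engine which IFilter to use for each stream.
CREATE FULLTEXT INDEX ON dbo.DocumentStore
(
    name        LANGUAGE 1033,
    file_stream TYPE COLUMN file_type LANGUAGE 1033
)
KEY INDEX UQ_DocumentStore_StreamId      -- assumed unique index on stream_id
ON DocumentCatalog
WITH CHANGE_TRACKING AUTO;
GO

-- Customizable NEAR: both terms within 5 terms of each other.
-- FALSE = any order; TRUE = only in the order listed.
SELECT name FROM dbo.DocumentStore
WHERE CONTAINS(file_stream, 'NEAR((SQL, Saturday), 5, FALSE)');

SELECT name FROM dbo.DocumentStore
WHERE CONTAINS(file_stream, 'NEAR((SQL, Saturday), 5, TRUE)');

-- Property search: register the properties, then attach the list.
CREATE SEARCH PROPERTY LIST DocumentProperties;
GO
ALTER SEARCH PROPERTY LIST DocumentProperties
ADD 'Title' WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
                  PROPERTY_INT_ID = 2, PROPERTY_DESCRIPTION = 'System.Title');
ALTER SEARCH PROPERTY LIST DocumentProperties
ADD 'Author' WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
                   PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = 'System.Author');
GO
ALTER FULLTEXT INDEX ON dbo.DocumentStore
SET SEARCH PROPERTY LIST DocumentProperties;   -- triggers a repopulation
GO

SELECT name FROM dbo.DocumentStore
WHERE CONTAINS(PROPERTY(file_stream, 'Title'), 'semantic');
```

Property queries only return rows for document types whose installed IFilter actually emits the property.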
Full-Text Demo
Semantic Search
- Built on top of Full-Text
- What is a semantic search?
  - Full-Text finds words... Semantic Search finds meaning
- Extract & index statistically significant keywords
  - Tag clouds, etc.
- Identify related/similar docs
  - (Based on keywords)
- Explain how/why two docs are related
Semantic Set-Up
- Install Office Filter Pack & Filter Pack SP1
- Install, attach & register the Semantic DB
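Once the Semantic Language Statistics installer has extracted the database files, the attach and register steps could look like this sketch. The file paths are placeholders, and the database name semanticsdb is simply a common choice.

```sql
-- Attach the extracted files (paths are placeholders).
CREATE DATABASE semanticsdb
ON (FILENAME = 'C:\Data\semanticsdb.mdf'),
   (FILENAME = 'C:\Data\semanticsdb_log.ldf')
FOR ATTACH;
GO

-- Register it as the instance's semantic language statistics database.
EXEC sp_fulltext_semantic_register_language_statistics_db
     @dbname = N'semanticsdb';
GO

-- Confirm the registration; one row means it is in place.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;
```

Only one semantic language statistics database can be registered per instance.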
Verify Filters
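One way to verify the filters, sketched under the assumption that the Office Filter Pack has just been installed and that .docx, .xlsx, .pptx and .pdf are the types of interest:

```sql
-- Let Full-Text load filters registered with the operating system,
-- then restart the filter daemon hosts so they are picked up.
EXEC sp_fulltext_service 'load_os_resources', 1;
EXEC sp_fulltext_service 'restart_all_fdhosts';
GO

-- List the filters Full-Text can see for the chosen extensions.
SELECT document_type, path, manufacturer
FROM sys.fulltext_document_types
WHERE document_type IN ('.docx', '.xlsx', '.pptx', '.pdf')
ORDER BY document_type;
```

An extension missing from the result means documents of that type will be stored but not indexed.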
Semantic Results
- SemanticKeyPhraseTable: extracts key phrases for the entire corpus or a single document
- SemanticSimilarityTable: finds similar documents
- SemanticSimilarityDetailsTable: displays similarity details for two matched documents
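The three functions can be exercised roughly as below. This is a sketch that assumes a hypothetical FileTable dbo.DocumentStore whose full-text key is stream_id, an existing full-text index on file_stream, and two made-up file names.

```sql
-- Semantic indexing is opt-in per column of the full-text index.
ALTER FULLTEXT INDEX ON dbo.DocumentStore
ALTER COLUMN file_stream ADD STATISTICAL_SEMANTICS;
GO

DECLARE @source UNIQUEIDENTIFIER, @match UNIQUEIDENTIFIER;
SELECT @source = stream_id FROM dbo.DocumentStore WHERE name = N'first.docx';
SELECT @match  = stream_id FROM dbo.DocumentStore WHERE name = N'second.docx';

-- Key phrases across the whole corpus (tag-cloud style).
SELECT TOP (20) kp.keyphrase, COUNT(*) AS document_count
FROM SEMANTICKEYPHRASETABLE(dbo.DocumentStore, file_stream) AS kp
GROUP BY kp.keyphrase
ORDER BY document_count DESC;

-- Key phrases for a single document.
SELECT TOP (10) kp.keyphrase, kp.score
FROM SEMANTICKEYPHRASETABLE(dbo.DocumentStore, file_stream, @source) AS kp
ORDER BY kp.score DESC;

-- Documents most similar to the source document.
SELECT TOP (10) d.name, s.score
FROM SEMANTICSIMILARITYTABLE(dbo.DocumentStore, file_stream, @source) AS s
JOIN dbo.DocumentStore AS d ON d.stream_id = s.matched_document_key
ORDER BY s.score DESC;

-- Why two documents match: the key phrases they share.
SELECT TOP (10) sd.keyphrase, sd.score
FROM SEMANTICSIMILARITYDETAILSTABLE(dbo.DocumentStore,
         file_stream, @source, file_stream, @match) AS sd
ORDER BY sd.score DESC;
```

Results appear only after the semantic population has finished, which runs as a later phase of the full-text population.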
Semantic Search Demo
Thank you!
Don’t forget to fill out your evaluations!
@BluewaterSQL http://bluewatersql.wordpress.com/