
Overview of Recommind Decisiv Search

Date post: 15-Jan-2015
Page 1: Overview of Recommind Decisiv Search

Petr Vasilev
Consultant, Integration and Search department
Bouvet ASA

Overview of Recommind Decisiv Search

Page 2: Overview of Recommind Decisiv Search

Search engines in Norway

• Long tradition of working with enterprise and web search engines
• Most popular solutions:
  – Microsoft/Fast
  – Apache Solr
• But the world is bigger than that!

Page 3: Overview of Recommind Decisiv Search

Recommind

• Founded in 2000
• Headquartered in San Francisco; offices in London, Boston, Sydney and Bonn
• #157 on 2010 Deloitte Fast 500
  – Fastest growing eDiscovery company
  – Fastest growing information access company
  – 230+ employees

Page 4: Overview of Recommind Decisiv Search

Product Families

Page 5: Overview of Recommind Decisiv Search

CORE

• Fully fledged search engine
• Analytics:
  – Concept extraction (probabilistic latent semantic indexing, pLSI)
  – Conceptual search
  – Categorization engine (pSVM with highest empirical performance)
  – Workflow rules and smart tagging
  – Duplicate detection
  – Near-duplicate computation
  – Thread extraction
  – Email footer detection

Page 6: Overview of Recommind Decisiv Search

Architecture overview

Page 7: Overview of Recommind Decisiv Search

Enrichment and categorization

Page 8: Overview of Recommind Decisiv Search

API and Hooks

• CORE API
  – Crawler extensions
  – Connector SDK
  – Post-processor SDK
• Planned
  – Parser API
  – Custom mime type detection
• CORE extensions
  – JAAS security SDK
  – Call-back API
  – Storage SDK
• Custom reports

Page 9: Overview of Recommind Decisiv Search

Entity extraction

• Based on generated lists
• Custom names can be bound to entities
  – http://psi.hafslund.no/sesam/ifs/Customer is named “Kunde”
• Takes in simple XML for names and entity lists
• Many-to-one matching is possible
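
A minimal sketch of list-based entity extraction with many-to-one matching. The XML layout (`entity`, `item` and `term` elements, the `type`, `name` and `id` attributes), the function names and the company names are hypothetical and chosen for illustration; the slides do not show the actual Decisiv list format. Only the type URI and the display name “Kunde” come from the slide.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entity-list format; the real Decisiv schema is not shown in the slides.
# Company names are made up. Several terms map to one item (many-to-one matching).
ENTITY_XML = """
<entities>
  <entity type="http://psi.hafslund.no/sesam/ifs/Customer" name="Kunde">
    <item id="C-1">
      <term>Acme AS</term>
      <term>Acme Norge</term>
      <term>ACME</term>
    </item>
    <item id="C-2">
      <term>Nordlys Energi</term>
    </item>
  </entity>
</entities>
"""


def load_entity_lists(xml_text):
    """Return {lowercased term: (display name, type URI, item id)}."""
    lookup = {}
    root = ET.fromstring(xml_text)
    for entity in root.iter("entity"):
        for item in entity.iter("item"):
            for term in item.iter("term"):
                key = term.text.strip().lower()
                lookup[key] = (entity.get("name"), entity.get("type"), item.get("id"))
    return lookup


def extract_entities(text, lookup):
    """Return the set of entities whose listed terms occur in the text."""
    found = set()
    for term, target in lookup.items():
        # Whole-word, case-insensitive match of the listed term.
        if re.search(r"\b" + re.escape(term) + r"\b", text, re.IGNORECASE):
            found.add(target)
    return found
```

Because every variant spelling points at the same item, a document that mentions both “Acme Norge” and “ACME” still yields a single “Kunde” entity.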

Page 10: Overview of Recommind Decisiv Search

Rule-Based Classification

• Smart tagging
  – Tag incoming documents based on complex search expressions
• Workflow rules
  – Category triggered
  – Target review state, batch, coding pattern
• Categorization result rules
  – Action based on score range
• Policy
  – Publish, Remediate, Move, ...
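
One way to picture a categorization result rule is a table that maps a category and a score range to an action. The sketch below is an illustration only: the `ScoreRule` class, the category names, the thresholds and the “Review” action are assumptions, not Decisiv configuration; “Publish” and “Remediate” are taken from the policy list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreRule:
    """Hypothetical rule: fire `action` when the category score is in [low, high)."""
    category: str
    low: float   # inclusive lower bound
    high: float  # exclusive upper bound
    action: str


# Illustrative rules; categories and thresholds are made up.
RULES = [
    ScoreRule("contract", 0.80, 1.01, "Publish"),
    ScoreRule("contract", 0.50, 0.80, "Review"),
    ScoreRule("personal-data", 0.70, 1.01, "Remediate"),
]


def actions_for(scores, rules=RULES):
    """Return (category, action) pairs triggered by a document's scores, in rule order."""
    triggered = []
    for rule in rules:
        score = scores.get(rule.category)
        if score is not None and rule.low <= score < rule.high:
            triggered.append((rule.category, rule.action))
    return triggered
```

For example, `actions_for({"contract": 0.65, "personal-data": 0.9})` triggers the “Review” rule for the contract category and the “Remediate” rule for personal data, while a contract score of 0.2 triggers nothing.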

Page 11: Overview of Recommind Decisiv Search

PLSA overview

• “Unsupervised”
  – Document collection only (“inherent statistics”)
  – Used for concept search (“auto-extraction”)
  – No human required
• “Supervised”
  – By example, seed sets
  – For categorization, tagging, annotation, etc.
  – Human required

Page 12: Overview of Recommind Decisiv Search

Search as Statistical Inference

• Document in bag-of-words representation: US, Disney, economic, intellectual property, relations, Beijing, human rights, free, negotiations, imports

China US trade relations

China? How probable is it that terms like “China” or “trade” might occur?

Additional index terms can be added automatically via statistical inference!
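
In the PLSA aspect model, the probability of a term in a document is the mixture P(w|d) = Σ_z P(w|z) · P(z|d) over latent concepts z. The sketch below shows how that mixture can suggest index terms a document never contains. All probabilities are hand-picked for illustration, and the function names and the threshold are assumptions rather than anything from the product.

```python
# Hand-picked, hypothetical probabilities for one document and two concepts.
P_CONCEPT_GIVEN_DOC = {"TRADE": 0.6, "CHINA": 0.4}
P_TERM_GIVEN_CONCEPT = {
    "TRADE": {"economic": 0.4, "imports": 0.3, "trade": 0.3},
    "CHINA": {"china": 0.7, "beijing": 0.3},
}


def term_probability(term, p_z_d, p_w_z):
    """P(term | doc) = sum over concepts z of P(z | doc) * P(term | z)."""
    return sum(p_z * p_w_z[z].get(term, 0.0) for z, p_z in p_z_d.items())


def inferred_terms(doc_terms, p_z_d, p_w_z, threshold=0.15):
    """Terms absent from the document whose inferred probability reaches the threshold."""
    vocabulary = {w for dist in p_w_z.values() for w in dist}
    return sorted(
        w for w in vocabulary - set(doc_terms)
        if term_probability(w, p_z_d, p_w_z) >= threshold
    )
```

With a document that contains “economic”, “imports” and “beijing”, the call `inferred_terms({"economic", "imports", "beijing"}, P_CONCEPT_GIVEN_DOC, P_TERM_GIVEN_CONCEPT)` proposes “china” and “trade” as additional index terms, which mirrors the question on the slide.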

Page 13: Overview of Recommind Decisiv Search

Estimation via PLSA

Documents → Latent Concepts → Terms

• TRADE: economic, imports, trade
• CHINA: china, beijing

Concept expression probabilities are estimated based on all documents that deal with a concept.

“Unmixing” of superimposed concepts is achieved by a statistical learning algorithm.

Conclusion: no prior knowledge about concepts is required; context and term co-occurrences are exploited.
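
The “statistical learning algorithm” commonly used to fit PLSA is expectation-maximization (EM). The following is a small pure-Python sketch of textbook EM for the aspect model P(w|d) = Σ_z P(w|z) · P(z|d); it is not Recommind's implementation, and the function names, iteration count and random initialization are assumptions.

```python
import math
import random


def _normalized(row):
    """Scale a non-negative row to sum to 1 (uniform if the row is all zeros)."""
    total = sum(row)
    if total == 0:
        return [1.0 / len(row)] * len(row)
    return [x / total for x in row]


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA with EM. counts[d][w] is the count of term w in document d.

    Returns (p_z_d, p_w_z): p_z_d[d][z] = P(z | d), p_w_z[z][w] = P(w | z).
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(counts), len(counts[0])
    p_z_d = [_normalized([rng.random() + 0.1 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [_normalized([rng.random() + 0.1 for _ in range(n_terms)])
             for _ in range(n_topics)]
    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                total = sum(joint)
                if total == 0:
                    continue
                for z in range(n_topics):
                    expected = n * joint[z] / total
                    new_zd[d][z] += expected
                    new_wz[z][w] += expected
        # M-step: renormalize the expected counts into probabilities.
        p_z_d = [_normalized(row) for row in new_zd]
        p_w_z = [_normalized(row) for row in new_wz]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Sum over observed (d, w) pairs of count * log P(w | d)."""
    total = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p_w_d = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                total += n * math.log(p_w_d)
    return total
```

Sorting the terms of each row of `p_w_z` by probability gives ranked term lists per concept. No labels are supplied anywhere: the concepts emerge from term co-occurrence alone, which is the point the slide makes.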

Page 14: Overview of Recommind Decisiv Search

Automatically generated concept groups

Ship                    Securities              India
ship       109.41212    securities  94.96324    india       91.74842
coast       93.70902    firm        88.74591    singh       50.34063
guard       82.11109    drexel      78.33697    militants   49.21986
sea         77.45868    investment  75.51504    gandhi      48.86809
boat        75.97172    bonds       64.23486    sikh        47.12099
fishing     65.41328    sec         61.89292    indian      44.29306
vessel      64.25243    bond        61.39895    peru        43.00298
tanker      62.55056    junk        61.14784    hindu       42.79652
spill       60.21822    milken      58.72266    lima        41.87559
exxon       58.35260    firms       51.26381    kashmir     40.01138
boats       54.92072    investors   48.80564    tamilnadu   39.54702
waters      53.55938    lynch       44.91865    killed      39.47202
valdez      51.53405    insider     44.88536    india's     39.25983
alaska      48.63269    shearson    43.82692    punjab      39.22486
ships       46.95736    boesky      43.74837    delhi       38.70990
port        46.56804    lambert     40.77679    temple      38.38197
hazelwood   44.81608    merrill     40.14225    shining     37.62768
vessels     43.80310    brokerage   39.66526    menem       35.42235
ferry       42.79100    corporate   37.94985    hindus      34.88001
fishermen   41.65175    burnham     36.86570    violence    33.87917

(Sample aspect lists from AP data, 100-Aspect Model)

Page 15: Overview of Recommind Decisiv Search

Hafslund SESAM search

Demo

Page 21: Overview of Recommind Decisiv Search

What is nice from the technical side?

• Connectors, and a framework for developing them
• SSO and integration with Kerberos, AD, OpenSSO
• Custom and out-of-the-box (OOTB) authentication integration
• Query API and OOTB XSLT framework
• Extensible set of parsers, including OCR

Page 22: Overview of Recommind Decisiv Search

Nice!

• Extremely flexible index structure
  – We take in everything
  – We can decide later what is shown
• Publishing data to external systems
  – You can use it as a data mining tool
• PLSA: categorization and concept extraction
• Transparency for Java developers
  – Everything is exposed as RMI calls over port 1099
• Small company, quick response

Page 23: Overview of Recommind Decisiv Search

Sad parts

• Sessions in Query API
  – No sessions, no security
• There is no direct content push
• Relatively heavy taxonomies
• Licenses
  – Basic license does not include entity extraction and rule-based classification
• Notable hardware requirements

Page 24: Overview of Recommind Decisiv Search

Overall impression

• Nice and capable enterprise search solution

• Not dependent on a vendor
  – Easy to integrate in a heterogeneous environment
• Can be used for search and data mining
• PLSA and categorization opportunities are extremely promising

Page 25: Overview of Recommind Decisiv Search

Petr Vasilev

Questions? Answers!

