Date posted: 19-May-2015
Category: Technology
Upload: enterprise-20-conference
Is Enterprise Search Ripe for Open Source Disruption?
Larry Cannell
Senior Analyst
Burton Group
www.burtongroup.com
Brian Pinkerton
Chief Architect
Lucid Imagination
www.lucidimagination.com
Agenda
•Why Open Source and Search?
•Enterprise Opportunities to Use Open Source Search
•Market Analysis
•Lucid Imagination
Can You Tell the Difference?
Netflix
CNET
Best Buy
Wikipedia
Monster
Which Site Uses Open Source Search?
Netflix
Best Buy
CNET
Wikipedia
Monster
Which Site Uses Open Source Search?
Best Buy
Monster
Why Open Source and Search?
Lucene and Solr Get Funded
Enterprise Opportunities
Basic Website/Intranet Search
•No compelling reason to use open source
•Only consider if you have more headcount than budget
Vertical Search
•Best opportunities for open source search
Numerous Options
• Beagle
• DataparkSearch
• egothor
• Htdig
• Hounder
• Lemur
• MG4J
• Minion
• Mnogosearch
• Namazu
• OpenFTS
• regain
• Red Piranha
• Simplexo
• Sphinx
• Swish-e
• Swish++
• Terrier
• Wumpus
• Zettair
Honorable Mention
The Short List
Lucene Family Tree
[Figure: family tree rooted at Lucene (2000), with Lucene Ports, Nutch (2002), Hadoop (2005), and Solr (2005)]
[Figure: components of a search system: Content Set, Content Ingestion, Search Repository, Search Engine, User Interface, Administration. Successive builds of the slide overlay Lucene, then Solr, on the diagram.]
Solr’s Potential to Disrupt
The MySQL of search servers?
• Search server based on Lucene
•Easy initial setup
•Web services-like interface (XML over HTTP)
•Support for non-Java clients
•Caching, performance tuning, high-availability, load balancing
•Faceted browsing, similar documents
• Commoditizes vertical search
•Could have similar impact on application development as ODBC/JDBC
• Consider the 1000s of applications enabled by ODBC/JDBC
•Vertical search can now be applied to almost any application
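The XML-over-HTTP interface and faceted browsing listed above can be sketched from a non-Java client roughly as follows. This is a minimal sketch under stated assumptions: the host, port, path, and the `category` field are made up for illustration, and the sample response is hand-written in the general shape of a Solr XML response rather than captured from a live server. `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Assumed base URL; a real deployment's host, port, and core path will differ.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_query_url(text, facet_field, rows=10):
    """Build a select URL that asks for XML results plus facet counts."""
    params = [
        ("q", text),
        ("rows", rows),
        ("wt", "xml"),
        ("facet", "true"),
        ("facet.field", facet_field),
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


def parse_facets(xml_text, facet_field):
    """Extract {value: count} for one facet field from an XML response."""
    root = ET.fromstring(xml_text)
    path = ".//lst[@name='facet_fields']/lst[@name='%s']/int" % facet_field
    return {node.get("name"): int(node.text) for node in root.findall(path)}


# Hand-written sample in the shape of a Solr XML response (not real output).
SAMPLE_RESPONSE = """<response>
  <result name="response" numFound="3" start="0">
    <doc><str name="id">1</str></doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="category">
        <int name="laptops">2</int>
        <int name="tablets">1</int>
      </lst>
    </lst>
  </lst>
</response>"""

url = build_query_url("open source", "category")
facets = parse_facets(SAMPLE_RESPONSE, "category")
```

Because the request is a plain URL and the response is plain XML, the same exchange could be written in any language with an HTTP client and an XML parser, which is the point of the ODBC/JDBC comparison.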
References
•Burton Group’s Collaboration and Content Strategies
•Open Source Search: Bringing Enterprise Search Out into the Open
•Enterprise Information Search: Transforming Search into an Insight Engine (January 2010)
•A Complex Query: What’s the Right Enterprise Search Engine?
•Open Source Communication, Collaboration, and Content Management: Cutting-Edge Innovation, Low-Cost Imitation, or Both?
Open Source Search
Brian Pinkerton
Chief Architect
Lucid Imagination, Inc.
Why Open Source for Search?
Large scale: billions of documents; hundreds of cluster nodes
Uses modern architectures to achieve massive scalability
Some of the biggest search indexes are on open source software
High Performance
Fast response time
Flexible relevance
Use built-in relevance (on par with others) or augment
Stand-alone, integrated, or embedded
Mature, yet not stuck in time
Continued momentum on all facets of the products
Great support from the community
Example: Searching Social Media
Everyone collaborates with everyone on everything everywhere
You’ve heard the hype
Much is probably just that
But it’s changing Web habits
And it’s pushing the state of the art in search
Enterprise adoption is trailing the wide Web, but it’s coming
Will you be ready?
Search is Essential
Too much content to navigate without filtering
Sometimes, only analytics can do the job
Other times, users expect to search, not navigate
Used for surfacing more than just plain old search results
How is Social Media Transforming Search?
20th Century               | Web 1.0                       | Web 2.0
Business-generated content | Power-user content; HTML only | User-generated content
Searches the attributes    | Searches the content          | Both, plus the interaction
Normalized data model      | Flat data model               | Ad hoc normalization
Transactional models       | Batch processing              | Powered by now
Batch analytics            | Few analytics                 | User-driven analysis
Examples of Searching Social Media
Pioneer in blog searching: Technorati (Lucene → Solr)
Analyzing the Interaction: Scout Labs (Lucene)
Bottom-up relevance: digg (Solr)
People are the content: LinkedIn (Lucene)
People and places: Yelp (Lucene)
Patterns from the people: Xmarks (Lucene)
Searching the Social Universe: MySpace (Lucene.NET)
Technorati: Blog Search
Technorati is a blog-discovery engine
300,000 new posts per day
Surge of posts in the morning
Separate indexes for blog and post data
Noisy, user-generated content
Search used behind the scenes to build the user interface
New index keeps only a limited time window available
Scout Labs: Analyzing the Interaction
Scout Labs is a social-media monitoring tool
Mines the stream of interaction across many forms of social media: blogs, comments, tweets, forums, mailing lists
The interaction can be messy, so Scout Labs provides summaries
Analytics provide comparisons
Sentiment summarizes attitudes
Because of the analytics, more data must be kept online, which can get expensive
digg: Bottom-up Relevance
Digg shows user-submitted links in real time
Users vote up or down on submissions
Content is indexed in near-real time
Results are scored by a combination of factors (recency, number of diggs, etc.)
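A combination of factors like the one described above might be sketched as follows. The formula, the weights, and the 24-hour half-life are hypothetical and are not Digg's actual ranking function; the sketch only illustrates blending a text-match score with vote count and recency.

```python
import math


def blended_score(text_score, diggs, age_hours, half_life_hours=24.0):
    """Combine text relevance, popularity, and recency into one score.

    All weights here are illustrative assumptions.
    """
    # Exponential decay: the recency factor halves every half_life_hours.
    recency = 0.5 ** (age_hours / half_life_hours)
    # Log damping so that very popular items do not swamp everything else.
    popularity = math.log1p(diggs)
    return text_score * (1.0 + popularity) * recency


def rank(results, now_hours):
    """Sort result dicts by blended score, best first."""
    return sorted(
        results,
        key=lambda r: blended_score(
            r["text_score"], r["diggs"], now_hours - r["posted_hours"]
        ),
        reverse=True,
    )


# Made-up submissions with equal text relevance.
results = [
    {"id": "old-popular", "text_score": 1.0, "diggs": 500, "posted_hours": 0.0},
    {"id": "new-popular", "text_score": 1.0, "diggs": 500, "posted_hours": 90.0},
    {"id": "new-unvoted", "text_score": 1.0, "diggs": 0, "posted_hours": 95.0},
]
ranked = [r["id"] for r in rank(results, now_hours=96.0)]
```

With these assumed weights, a four-day-old submission with 500 diggs falls below a fresh submission with none, which shows how strongly the choice of half-life shapes the ordering.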
LinkedIn: People are the Content
LinkedIn is a business social network
50 million members
Faceted search
facets on location, industries, companies, relationship, etc.
not all are easy to implement
Sorting by relevance + relationship
requires significant query-time work
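The two ideas on this slide, facet counts and sorting by relevance plus relationship, can be illustrated with a small sketch. The member records, the connection degrees, and the boost values are invented for illustration and do not describe LinkedIn's implementation.

```python
from collections import Counter


def facet_counts(hits, field):
    """Count how many hits carry each value of a facet field."""
    return Counter(hit[field] for hit in hits)


def sort_by_relevance_and_relationship(hits, degree_of, boost=None):
    """Re-rank hits at query time using the searcher's own network.

    degree_of maps member id -> connection degree (1, 2, or 3) for the
    person running the search; the boost values are hypothetical.
    """
    boost = boost or {1: 2.0, 2: 1.5, 3: 1.1}

    def score(hit):
        # Members outside the searcher's network get no boost.
        return hit["relevance"] * boost.get(degree_of.get(hit["id"]), 1.0)

    return sorted(hits, key=score, reverse=True)


hits = [
    {"id": "a", "relevance": 0.9, "location": "Boston", "industry": "Software"},
    {"id": "b", "relevance": 0.6, "location": "Boston", "industry": "Finance"},
    {"id": "c", "relevance": 0.8, "location": "Austin", "industry": "Software"},
]
degree_of = {"b": 1, "c": 3}  # the searcher's (made-up) connections

by_location = facet_counts(hits, "location")
ordered = [h["id"] for h in sort_by_relevance_and_relationship(hits, degree_of)]
```

The relationship boost depends on who is searching, so it cannot be baked into the index ahead of time. That is one reading of why the slide says this sort requires significant query-time work.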
Yelp: People and Places
Yelp facilitates user reviews
Searches business metadata plus review content
Heavy geographic component
Results are structured by establishment, but searchable by review
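One way to read "structured by establishment, but searchable by review" is to match at the review level and then roll the hits up to the business. The sketch below does this with naive substring matching standing in for a real search index; the data and field names are invented.

```python
from collections import defaultdict


def search_reviews(reviews, term):
    """Return reviews whose text contains the term (naive stand-in for an index)."""
    term = term.lower()
    return [r for r in reviews if term in r["text"].lower()]


def group_by_business(matching_reviews):
    """Roll review-level hits up into one result per business."""
    grouped = defaultdict(list)
    for review in matching_reviews:
        grouped[review["business"]].append(review["text"])
    # Businesses with more matching reviews come first; ties break by name.
    return sorted(grouped.items(), key=lambda item: (-len(item[1]), item[0]))


reviews = [
    {"business": "Cafe Uno", "text": "Great espresso and pastries"},
    {"business": "Cafe Uno", "text": "The espresso was a bit bitter"},
    {"business": "Taco Hut", "text": "Best tacos in town"},
    {"business": "Bean Bar", "text": "Solid espresso, slow service"},
]

results = group_by_business(search_reviews(reviews, "espresso"))
```

Each entry in `results` is a business name paired with the review texts that matched, so the user sees establishments while the query ran against reviews.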
Xmarks: Patterns from the People
Xmarks provides bookmark sync and Web discovery
First provided bookmark sync; adopted by millions of users
Aggregates bookmark folder structure and metadata by URL
This descriptive content is mined to provide a searchable index
Needed new ranking algorithms to provide good relevance and filter out the noise
MySpace: Searching it all
MySpace does it all:
Many content types from all over the site
User-generated content + user interactions
Near real time
New content and users arriving 24x7
Both end-user and administrative functions
admin functions include log file searching
automated tasks help identify spam, other problems
Massive scale: billions of records, petabytes of source data
new content at the rate of 1 TB every week
Social Media is Pushing Search in New Directions
Searches the product of interaction among users, not just content
Aggregates data from multiple sources at search time
Operates in real time, as data is produced
Extends the traditional notions of relevance
Builds analytics on top of search
and... you can build all of this on open source products!