
Ubiquitous Solr - A Database's Not-So-Evil Twin: Presented by Ayon Sinha, WalmartLabs

Date post: 07-Jan-2017
Upload: lucidworks
Ubiquitous Solr - A Database’s not-so-evil Twin

Ayon Sinha

Data Foundation @WalmartLabs

October 13-16, 2016 • Austin, TX

2

Search Engine… Lucene… Solr

[Slide images: Text Search, Search Suggestions]

•  Internet and Intranet Search

•  Relevance

•  Search Suggestions

•  Faceting

•  Recommendations

•  Time series

•  Log search

•  Geo-spatial search

•  Analytics

•  Graph search

•  Document Store

[Slide images: Recommendations, Relevance, Facets]

3

Overview

•  How to scale any data infrastructure with Apache Solr

•  Build a high performance and highly available data platform for internal and external users alike

•  Walmart’s commitment to open source

4

About me

•  Team lead on the Data Foundation team at the largest retailer and largest private employer in the world

•  Prior to Walmart, worked at startups building recommendation and analytics systems

•  Before that, spent 6 years building search applications, recommendation systems and Hadoop-based analytics systems at eBay, the largest online auction company

•  Manuscript reviewer for Manning Publications for 4 years; helped shape the contents of “Hadoop in Practice” and “Big Data”

5

About Walmart

•  11,000+ Stores in 27 countries

•  11 eCommerce sites

•  250M customers weekly in stores and online

•  Millions of database transactions per day

•  Sales, Holidays and massive volume shifts

6

It starts out so simple

An idea implemented on the LAMP stack

7

Turns out to be a great idea!

Users seem to like the new product

8

Users REALLY like this…

Higher volume, more use cases. Quick-fix scaling alternatives add some headroom … and complexity

9

We need more Business Intelligence

Business is looking good, but the source-of-truth data store, not so much …

10

Scale up (in a hurry) with hardware

Least risk. Diminishing returns. What next?

11

Design to scale out

•  Offload queries to Search Engines

•  Offload recurring reads to Cache

•  Offload analytics to OLAP datastores

•  Shard the database

… and of course do something to hide the complexity. It is worth it.
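Two of the offloading ideas above, serving recurring reads from a cache and sharding the database, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `ShardedStore` class, the CRC32-modulo shard choice, and the plain dictionaries standing in for database shards and the cache are assumptions made for illustration, not the design presented in the talk.

```python
import zlib


class ShardedStore:
    """Toy sharded key-value store with a cache-aside read path (illustrative only)."""

    def __init__(self, num_shards):
        # Each dict stands in for one database shard (assumption).
        self.shards = [{} for _ in range(num_shards)]
        # A dict stands in for an external cache (assumption).
        self.cache = {}
        # Counts how many reads actually reached a shard.
        self.db_reads = 0

    def shard_for(self, key):
        # Stable hash, so the same key always maps to the same shard.
        return zlib.crc32(key.encode("utf-8")) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value
        # Drop any stale cached copy; the next read repopulates it.
        self.cache.pop(key, None)

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.shards[self.shard_for(key)].get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = ShardedStore(num_shards=4)
store.put("item-1", {"name": "kettle"})
store.get("item-1")  # first read goes to the shard and fills the cache
store.get("item-1")  # repeat read is served from the cache
print(store.db_reads)
```

Modulo hashing is only the simplest possible placement rule; it reshuffles most keys when the shard count changes, which is one reason real deployments tend to hide shard routing behind a layer like the one the slide alludes to.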

12

The Inspiration

Integration tools for Lucene-based search engines are abundant

13

The “not-so-evil” Twin to protect your Source of Truth DB

•  What if a copy of your source-of-truth data were available … just about anywhere you want it?

•  How could you use a search engine to protect and augment your database?
   –  Redirect queries

•  Helps scale by reducing demand for
   –  database indexing
   –  database connections
   –  scarce database resources like memory, storage

•  Not-so-evil Twin
   –  Adding multiple near-real-time search copies adds complexity … and it comes at a cost; but done right, the benefits far outweigh the costs
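The "redirect queries" idea can be sketched as a small router: writes and primary-key lookups go to the source-of-truth store, while text queries are answered by a search copy and never touch the database. This is a toy illustration under stated assumptions. `SearchTwin` and `Router` are hypothetical names, a dictionary stands in for the database, a tiny inverted index stands in for Solr, and the copy is updated synchronously, whereas the slides describe near real-time updates.

```python
class SearchTwin:
    """Toy inverted index standing in for a search-engine copy of the data."""

    def __init__(self):
        self.index = {}  # token -> set of document ids

    def add(self, doc_id, doc):
        # Note: this sketch does not remove tokens when a document is updated.
        for token in doc["title"].lower().split():
            self.index.setdefault(token, set()).add(doc_id)

    def search(self, text):
        # AND semantics: a document must contain every query token.
        tokens = text.lower().split()
        if not tokens:
            return []
        ids = set.intersection(*(self.index.get(t, set()) for t in tokens))
        return sorted(ids)


class Router:
    """Sends writes and key lookups to the database, text queries to the twin."""

    def __init__(self):
        self.database = {}  # dict stands in for the source-of-truth DB
        self.twin = SearchTwin()

    def write(self, doc_id, doc):
        self.database[doc_id] = doc  # source of truth is written first
        self.twin.add(doc_id, doc)   # a real system would sync asynchronously

    def get(self, doc_id):
        return self.database.get(doc_id)

    def search(self, text):
        return self.twin.search(text)  # never touches the database


router = Router()
router.write(1, {"title": "Red electric kettle"})
router.write(2, {"title": "Blue kettle"})
print(router.search("kettle"))      # [1, 2]
print(router.search("red kettle"))  # [1]
```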

14

Our Approach

•  Abstract the complexity of managing
   –  source-of-truth database
   –  cache coherence
   –  search queries
   –  message bus

•  Abstract Connection pool management

•  Provide a scalable way to query across shards with full control of Solr schema

•  Provide a way to analyze big data without affecting real-time systems, while isolating individual data domains
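One way to picture the abstraction described above is a single facade that owns the database, the cache, the search copy and the message bus, so callers only see `save`, `load` and `find`. The sketch below is an assumption-laden illustration, not the platform from the talk: `MessageBus` and `DataFacade` are hypothetical names, the bus is synchronous and in-process, and dictionaries stand in for every backing store.

```python
from collections import defaultdict


class MessageBus:
    """Minimal synchronous publish/subscribe bus (stand-in for a real one)."""

    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)


class DataFacade:
    """Single entry point hiding the database, cache, search copy and bus."""

    def __init__(self):
        self.database = {}     # stand-in for the source-of-truth database
        self.cache = {}        # stand-in for a read cache
        self.search_copy = {}  # stand-in for a search index
        self.bus = MessageBus()
        self.bus.subscribe("changed", self._invalidate_cache)
        self.bus.subscribe("changed", self._update_search_copy)

    def _invalidate_cache(self, event):
        # Cache coherence: drop the cached entry whenever the record changes.
        self.cache.pop(event["id"], None)

    def _update_search_copy(self, event):
        self.search_copy[event["id"]] = event["doc"]

    def save(self, doc_id, doc):
        self.database[doc_id] = doc
        self.bus.publish("changed", {"id": doc_id, "doc": doc})

    def load(self, doc_id):
        if doc_id in self.cache:
            return self.cache[doc_id]
        doc = self.database.get(doc_id)
        if doc is not None:
            self.cache[doc_id] = doc
        return doc

    def find(self, field, value):
        # Field queries are answered from the search copy, not the database.
        return sorted(i for i, d in self.search_copy.items() if d.get(field) == value)


facade = DataFacade()
facade.save(1, {"dept": "toys", "name": "robot"})
print(facade.load(1)["name"])
print(facade.find("dept", "toys"))
```

Routing every change through the bus keeps the cache and the search copy as independent subscribers, so either can be added, removed or rebuilt without touching the write path.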

15

From a situation like…

16

DB, Solr and Hadoop

17

Sharded DB with Solr

18

The Eco-system

Separation of concerns

19

The Result

Scatter-gather vs Powered by Apache Solr
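The slide's diagram is not part of this transcript, so the following is only a guess at the contrast it draws. Scatter-gather means sending the same query to every database shard and merging the partial results in the application; the alternative is to ask one search copy that already holds all the rows. In this Python sketch the function names, the row layout and the price filter are all hypothetical, and lists of dictionaries stand in for both the shards and the search copy.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def query_shard(shard, min_price):
    """Per-shard query: matching rows, sorted by price."""
    matches = (row for row in shard if row["price"] >= min_price)
    return sorted(matches, key=lambda row: row["price"])


def scatter_gather(shards, min_price, limit):
    """Query every shard in parallel, then merge the sorted partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: query_shard(s, min_price), shards))
    merged = heapq.merge(*partials, key=lambda row: row["price"])
    return list(islice(merged, limit))


def query_single_copy(all_rows, min_price, limit):
    """Same question asked of one copy that holds every row (no merge step)."""
    return query_shard(all_rows, min_price)[:limit]


shards = [
    [{"id": 1, "price": 5}, {"id": 2, "price": 20}],
    [{"id": 3, "price": 12}, {"id": 4, "price": 30}],
]
all_rows = [row for shard in shards for row in shard]

print([row["id"] for row in scatter_gather(shards, min_price=10, limit=2)])        # [3, 2]
print([row["id"] for row in query_single_copy(all_rows, min_price=10, limit=2)])   # [3, 2]
```

Both paths return the same rows here; the difference is that the scatter-gather path costs one query per shard plus a merge in the caller, which is the work a search copy is meant to take off the database.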

20

Lessons learned

A search engine like Apache Solr is…

•  not limited to search-based business applications.

•  a first class citizen in your persistence technology stack; it complements the SoT database.

•  easy to adopt and has all of us as a community for support.

21

The Future

•  Symbiotic existence of Solr/Lucene with RDBMS, NoSQL and Big Data systems

•  Walmart is committed to being part of the community building it

22

Questions? Reach us at:

•  You can reach me, Ayon Sinha, at:

   –  [email protected]
   –  https://www.linkedin.com/in/ayonsinha

•  Jason Sardina, our Lead Persistence Architect
   –  [email protected]

•  @WalmartLabs is always hiring the best

