
keyvi the key value index @ Cliqz

Date posted: 14-Apr-2017
Uploaded by: hendrik-muhs
Page 1: keyvi the key value index @ Cliqz

keyvi - the key value index

Or

How did we build a large-scale, low-latency search engine with keyvi?

Hendrik Muhs <[email protected]>

Page 2

BASED IN MUNICH

MAJORITY-OWNED BY HUBERT BURDA MEDIA

INTERNATIONAL TEAM OF 90 EXPERTS

WE COMBINE THE POWER OF DATA, SEARCH, AND BROWSERS TO REDESIGN THE INTERNET FOR THE USER

WE REDESIGN THE INTERNET

http://cliqz.com/

Page 3

A key value index based on finite state technology (FSTs), so essentially an immutable key value store.

Licence: Apache 2.0 (keyvi itself; 3rd-party code keeps its own licences)

Language: C++ (core), Python (binding)

Runs on: Linux, MacOSX (not tested on Windows)

Link: www.keyvi.org

Author: me ;-)

Page 4

Cliqz Search Backend

Elasticsearch used in the early days (2014)

→ Redis with our own cluster implementation (predating Redis Cluster): at peak over 100 Redis instances in 1 cluster, > 5 TB of data, all on AWS

→ keyvi: drop-in replacement for Redis, significantly reduced size (2 TB) and number of machines

Whether Redis or keyvi: average latency of 55 ms at the backend!

Page 5

Why replace Redis?

Size

extremely efficient storage of values

low-level access: msgpack and a Redis fork to compress even more (zlib)

implementing auto-completion is expensive and slow

Runtime

single-threaded → contention, queuing, timeouts

Persistence

memory-only, loading times of several minutes

Page 6

Why replace Redis?

→ Redis is great! We still use it a lot! But for one of our use cases (and only one), we can do better!

Page 7

started as an auto-completion engine

a caching layer for Redis

now providing the complete index (>2 TB)

distributed across multiple machines

multi-process, fast, reliable, stable

keyvi @ Cliqz

Page 8

shared memory model (mmap)

multi-core, reliable, no loading (no deserialization)

space efficient

compact key-space, FSA minimization
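FSA minimization is what keeps the key space compact: keys that share endings can share states, not just keys that share beginnings as in a trie. A minimal sketch of the standard incremental construction for keys added in sorted order (the algorithm of Daciuk et al.) follows; it is not keyvi's code, `MinimalFSA` and its methods are hypothetical names, and it stores keys only, whereas a real key value index also has to attach values.

```python
class State:
    """One state of an acyclic automaton over characters."""

    def __init__(self):
        self.final = False   # True if some key ends in this state
        self.edges = {}      # character -> State
        self.uid = None      # assigned once the state is registered as canonical


class MinimalFSA:
    """Builds a minimal acyclic FSA from keys added in sorted order."""

    def __init__(self):
        self.root = State()
        self.register = {}   # signature -> canonical State
        self.unchecked = []  # (parent, char, child) along the last key's path
        self.prev = None

    def _signature(self, state):
        # Two states are equivalent if they agree on finality and have the
        # same labelled edges to the same (already canonical) children.
        return (state.final,
                tuple(sorted((c, s.uid) for c, s in state.edges.items())))

    def _minimize(self, down_to):
        # Merge each unchecked state deeper than `down_to` into an equivalent
        # registered state, or register it as a new canonical state.
        while len(self.unchecked) > down_to:
            parent, char, child = self.unchecked.pop()
            sig = self._signature(child)
            if sig in self.register:
                parent.edges[char] = self.register[sig]
            else:
                child.uid = len(self.register)
                self.register[sig] = child

    def add(self, key):
        if self.prev is not None and key <= self.prev:
            raise ValueError("keys must be added in strictly increasing order")
        common = 0
        if self.prev is not None:
            limit = min(len(key), len(self.prev))
            while common < limit and key[common] == self.prev[common]:
                common += 1
        # The previous key's path below the shared prefix can no longer change.
        self._minimize(common)
        node = self.unchecked[-1][2] if self.unchecked else self.root
        for char in key[common:]:
            child = State()
            node.edges[char] = child
            self.unchecked.append((node, char, child))
            node = child
        node.final = True
        self.prev = key

    def finish(self):
        self._minimize(0)

    def __contains__(self, key):
        node = self.root
        for char in key:
            node = node.edges.get(char)
            if node is None:
                return False
        return node.final

    def state_count(self):
        seen, stack = set(), [self.root]
        while stack:
            state = stack.pop()
            if id(state) not in seen:
                seen.add(id(state))
                stack.extend(state.edges.values())
        return len(seen)


fsa = MinimalFSA()
for word in ["cat", "cats", "fat", "fats"]:
    fsa.add(word)
fsa.finish()
print(fsa.state_count())  # 5 (a plain trie of the same keys needs 9 nodes)
```

Requiring sorted input is what makes the construction incremental: once a new key diverges from the previous one, the previous key's tail is final and can be merged immediately.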

BUT:

keyvi is an immutable store and therefore an index

(as is the underlying data structure of Lucene)

vs. Redis
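The shared memory model and the "immutable, therefore index" trade-off can be illustrated with Python's standard `mmap` module: a build step writes sorted key/value records to a file once, and readers map the file read-only and binary-search it in place, so nothing is deserialized at startup. The file layout below (a record count, an offset table, length-prefixed records) and the names `build` and `MappedIndex` are hypothetical choices for this sketch; keyvi's own file format is an FST, not a sorted record array.

```python
import mmap
import os
import struct
import tempfile

COUNT = struct.Struct("<Q")    # header: number of records
OFFSET = struct.Struct("<Q")   # offset table: byte position of each record
LENGTH = struct.Struct("<I")   # length prefix for keys and values


def build(path, items):
    """Build step: write sorted key/value records to a file, once."""
    entries = sorted((k.encode(), v.encode()) for k, v in items.items())
    base = COUNT.size + OFFSET.size * len(entries)
    offsets, records = [], bytearray()
    for key, value in entries:
        offsets.append(base + len(records))
        records += LENGTH.pack(len(key)) + key + LENGTH.pack(len(value)) + value
    with open(path, "wb") as f:
        f.write(COUNT.pack(len(entries)))
        for offset in offsets:
            f.write(OFFSET.pack(offset))
        f.write(records)


class MappedIndex:
    """Query step: binary search directly in a read-only memory map."""

    def __init__(self, path):
        with open(path, "rb") as f:
            # Pages are loaded lazily by the OS; processes that map the
            # same file share the same page-cache pages.
            self._buf = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        (self._count,) = COUNT.unpack_from(self._buf, 0)

    def _key_at(self, i):
        (offset,) = OFFSET.unpack_from(self._buf, COUNT.size + OFFSET.size * i)
        (key_len,) = LENGTH.unpack_from(self._buf, offset)
        start = offset + LENGTH.size
        return self._buf[start:start + key_len], start + key_len

    def get(self, key):
        target = key.encode()
        lo, hi = 0, self._count
        while lo < hi:                  # lower-bound binary search
            mid = (lo + hi) // 2
            if self._key_at(mid)[0] < target:
                lo = mid + 1
            else:
                hi = mid
        if lo < self._count:
            found, value_pos = self._key_at(lo)
            if found == target:
                (value_len,) = LENGTH.unpack_from(self._buf, value_pos)
                start = value_pos + LENGTH.size
                return self._buf[start:start + value_len].decode()
        return None

    def close(self):
        self._buf.close()


path = os.path.join(tempfile.mkdtemp(), "demo.idx")
build(path, {"munich": "DE", "paris": "FR", "berlin": "DE"})
index = MappedIndex(path)
print(index.get("paris"))   # FR
print(index.get("rome"))    # None
```

Because the file is never modified after `build`, readers need no locking, which is the property a single-threaded, mutable in-memory store like Redis cannot offer in the same way.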

Page 9

The workflow has 2 steps:

compile/build the index using keyvicompiler or via the Python bindings

dump/query using the C++ or Python API

Note: There is no SegmentWriter/Merger/Reader (yet)!

Usage
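The two-step workflow (compile once, then only dump or query) can be mimicked in plain Python to show the shape of the contract. `IndexCompiler`, `FrozenIndex`, `add`, `compile`, `get` and `dump` are hypothetical names for this sketch, not the keyvi bindings' API, and a sorted tuple with binary search stands in for the FST. Values are stored as JSON so that None, integers, strings and JSON objects all round-trip.

```python
import bisect
import json


class FrozenIndex:
    """Result of the compile step: can be queried and dumped, never modified."""

    def __init__(self, keys, encoded_values):
        self._keys = tuple(keys)               # sorted keys
        self._values = tuple(encoded_values)   # JSON strings, same order

    def _find(self, key):
        i = bisect.bisect_left(self._keys, key)   # binary search
        return i if i < len(self._keys) and self._keys[i] == key else -1

    def __contains__(self, key):
        return self._find(key) >= 0

    def get(self, key, default=None):
        i = self._find(key)
        return json.loads(self._values[i]) if i >= 0 else default

    def dump(self):
        """Yield all (key, value) pairs in key order."""
        for key, raw in zip(self._keys, self._values):
            yield key, json.loads(raw)


class IndexCompiler:
    """Step 1: collect pairs in any order, then compile exactly once."""

    def __init__(self):
        self._pending = {}
        self._compiled = False

    def add(self, key, value):
        if self._compiled:
            raise RuntimeError("the index is immutable after compile()")
        # None, int, str and JSON-compatible dicts/lists are all stored as JSON.
        self._pending[key] = json.dumps(value)

    def compile(self):
        self._compiled = True
        keys = sorted(self._pending)
        return FrozenIndex(keys, [self._pending[k] for k in keys])


# Step 2: query the compiled index.
compiler = IndexCompiler()
compiler.add("munich", {"tags": ["city", "bavaria"]})
compiler.add("answer", 42)
compiler.add("cliqz", "search")
compiler.add("nothing", None)
index = compiler.compile()

print(index.get("answer"))                         # 42
print("nothing" in index, index.get("nothing"))    # True None
print([key for key, _ in index.dump()])
# ['answer', 'cliqz', 'munich', 'nothing']
```

Checking membership separately from `get` matters here because a stored `None` value and a missing key would otherwise look the same.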

Page 10

exact matching / simple entity recognition:

values can be None, integer, string or JSON

approximate matching:

close/near match, e.g. for geo applications

scoring-based: Levenshtein & co.

completion matching:

prefix, multi-word, fuzzy

more on Features
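Completion and scoring-based approximate matching can both be illustrated on a plain trie with textbook techniques: completion follows the prefix and collects everything below it, ranked by a weight; Levenshtein matching walks the trie depth-first while keeping one row of the edit-distance table per character and abandons a branch once every entry in the row exceeds the allowed distance. This is a generic Python sketch, not keyvi's implementation; `build_trie`, `complete`, `fuzzy_match` and the integer ranking weights are hypothetical, and near matching, multi-word and fuzzy completion are not shown.

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.weight = None   # ranking weight, set only where a key ends


def build_trie(weighted_keys):
    root = TrieNode()
    for key, weight in weighted_keys.items():
        node = root
        for ch in key:
            node = node.children.setdefault(ch, TrieNode())
        node.weight = weight
    return root


def complete(root, prefix, top_k=3):
    """Prefix completion: follow the prefix, then collect all keys below it."""
    node = root
    for ch in prefix:
        node = node.children.get(ch)
        if node is None:
            return []
    found, stack = [], [(node, prefix)]
    while stack:
        node, word = stack.pop()
        if node.weight is not None:
            found.append((word, node.weight))
        for ch, child in node.children.items():
            stack.append((child, word + ch))
    found.sort(key=lambda kw: (-kw[1], kw[0]))   # highest weight first
    return [word for word, _ in found[:top_k]]


def fuzzy_match(root, query, max_dist):
    """Approximate matching: keys within `max_dist` Levenshtein edits."""
    results = []
    first_row = list(range(len(query) + 1))   # distance from "" to query[:j]
    if root.weight is not None and first_row[-1] <= max_dist:
        results.append(("", first_row[-1]))

    def walk(node, word, prev_row):
        for ch, child in node.children.items():
            # One new DP row per character appended to the current path.
            row = [prev_row[0] + 1]
            for j in range(1, len(query) + 1):
                cost = 0 if query[j - 1] == ch else 1
                row.append(min(row[j - 1] + 1,           # insertion
                               prev_row[j] + 1,          # deletion
                               prev_row[j - 1] + cost))  # substitution or match
            if child.weight is not None and row[-1] <= max_dist:
                results.append((word + ch, row[-1]))
            # Prune: if every entry exceeds max_dist, no longer key can match.
            if min(row) <= max_dist:
                walk(child, word + ch, row)

    walk(root, "", first_row)
    return sorted(results, key=lambda wd: (wd[1], wd[0]))


trie = build_trie({"keyvi": 5, "key": 9, "keyword": 7, "kiwi": 3})
print(complete(trie, "key"))          # ['key', 'keyword', 'keyvi']
print(fuzzy_match(trie, "keyv", 1))   # [('key', 1), ('keyvi', 1)]
```

The pruning step is what keeps approximate matching affordable: branches such as "keyword" are abandoned after a few characters instead of being compared in full against the query.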

Page 11

it's fast! extremely fast!

it scales:

it's compact/small, enables indexing GBs of data

it brings FSTs up to the level of more established data structures such as hash tables and B-trees on the one side …

… and enables applications that are impossible or hard to build with them (completions, approximate matching, etc.)

the gist

Page 12

http://www.keyvi.org

Lots of content, from crash course to in-depth

check it out!

Page 13

Questions?

Comments!

Feedback.

Contact: [email protected]

check it out!

