Date posted: 01-Nov-2014
Smart database for next-generation applications
LOGICBLOX - SIMPLIFYING YOUR DATA STACK
MLConf NY, 2014.04.11
AREN’T THERE ENOUGH DATABASES?
©2014. LogicBlox. All Rights Reserved.
IN 2007 THE SMARTPHONE UNIFIED CONSUMER DEVICES
Is a similar revolution coming in databases?
OUR MISSION
▪ Be the iPhone of databases
  ▪ “Hybrid Transaction/Analytical Processing”, Gartner, Jan. 2014
▪ One database to replace many specialized databases
  ▪ Transactional (e.g. Oracle, VoltDB, NuoDB)
  ▪ Analytical (e.g. Teradata, Redshift, Hadoop)
  ▪ Graphs
  ▪ Documents
  ▪ ...
Footnote: for a certain class of applications
SHOW ME
FIRST THINGS FIRST
▪ Declarative query language
  ▪ Based on Datalog
▪ ACID transactions
  ▪ In fact… full serializability
▪ Built from scratch, not by stitching together different databases under the hood
CLIQUES IN LOGIQL
[Figures: 3-clique (triangle queries) and 4-clique]
3cliques(a, b, c) <-
edge(a, b),
edge(a, c),
edge(b, c).
4cliques(a, b, c, d) <-
edge(a, b),
edge(a, c),
edge(a, d),
edge(b, c),
edge(b, d),
edge(c, d).
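For readers without a LogiQL runtime, the two rules above can be mirrored in a few lines of Python. This is only an illustrative sketch: the function names (`successors`, `three_cliques`, `four_cliques`) are hypothetical, the graph is assumed to be a set of directed `(source, target)` pairs, and nothing here reflects how LogicBlox evaluates the rules internally.

```python
from collections import defaultdict


def successors(edges):
    """Index edge(a, b) facts as a -> {b, ...}."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    return succ


def three_cliques(edges):
    """All (a, b, c) with edge(a, b), edge(a, c), edge(b, c)."""
    succ = successors(edges)
    empty = set()
    return {(a, b, c)
            for a, b in edges
            for c in succ[a] & succ.get(b, empty)}


def four_cliques(edges):
    """All (a, b, c, d) satisfying the six edge atoms of the 4-clique rule."""
    succ = successors(edges)
    empty = set()
    return {(a, b, c, d)
            for a, b, c in three_cliques(edges)
            for d in succ[a] & succ.get(b, empty) & succ.get(c, empty)}


# Complete graph on 4 vertices, each undirected edge stored once as (low, high).
k4 = {(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)}
print(sorted(three_cliques(k4)))
print(sorted(four_cliques(k4)))
```

Storing each undirected edge once, oriented from the smaller to the larger vertex, reports every clique exactly once; storing both directions would report every permutation of its vertices.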
3 CLIQUE in LOGIQL vs. SQL
SELECT DISTINCT
v1.x AS x, v2.x AS y, v3.x AS w
FROM edge AS v1, edge AS v2, edge AS v3
WHERE
v1.y = v2.x
AND v2.y = v3.x
AND EXISTS(
SELECT 1 FROM edge AS vv1
WHERE vv1.x = v1.x AND vv1.y = v3.x);
SQL
3cliques(a, b, c) <-
edge(a, b),
edge(a, c),
edge(b, c).
LogiQL
3 CLIQUE in LOGIQL vs SPARQL
sparql PREFIX g: <http://logicblox.com/graph>
SELECT DISTINCT ?av ?bv ?cv FROM <$database>
WHERE {
?a g:edge ?b .
?a g:edge ?c .
?b g:edge ?c .
?a g:value ?av .
?b g:value ?bv .
?c g:value ?cv .
FILTER (xsd:int(?av) < xsd:int(?bv) &&
xsd:int(?bv) < xsd:int(?cv))
};
SPARQL
3cliques(a, b, c) <-
edge(a, b),
edge(a, c),
edge(b, c).
LogiQL
3-CLIQUE IN LOGIQL vs. GRAPHLAB
class triangle_count
    : public graphlab::ivertex_program<graph_type, set_union_gather> {
 public:
  bool do_not_scatter;

  // Gather on all edges
  edge_dir_type gather_edges(icontext_type& context,
                             const vertex_type& vertex) const {
    return graphlab::ALL_EDGES;
  }

  gather_type gather(icontext_type& context, const vertex_type& vertex,
                     edge_type& edge) const {
    set_union_gather gather;
    graphlab::vertex_id_type otherid =
        edge.target().id() == vertex.id() ? edge.source().id()
                                          : edge.target().id();
    size_t other_nbrs =
        (edge.target().id() == vertex.id())
            ? (edge.source().num_in_edges() + edge.source().num_out_edges())
            : (edge.target().num_in_edges() + edge.target().num_out_edges());
    size_t my_nbrs = vertex.num_in_edges() + vertex.num_out_edges();
    if (PER_VERTEX_COUNT || (other_nbrs > my_nbrs) ||
        (other_nbrs == my_nbrs && otherid > vertex.id())) {
      gather.v = otherid;
    }
    return gather;
  }

  void apply(icontext_type& context, vertex_type& vertex,
             const gather_type& neighborhood) {
    do_not_scatter = false;
    if (neighborhood.vid_vec.size() == 0) {
      vertex.data().vid_set.clear();
      if (neighborhood.v != (graphlab::vertex_id_type(-1)))
        vertex.data().vid_set.vid_vec.push_back(neighborhood.v);
    } else {
      vertex.data().vid_set.assign(neighborhood.vid_vec);
    }
    do_not_scatter = vertex.data().vid_set.size() == 0;
  }

  edge_dir_type scatter_edges(icontext_type& context,
                              const vertex_type& vertex) const {
    if (do_not_scatter)
      return graphlab::NO_EDGES;
    else
      return graphlab::OUT_EDGES;
  }

  void scatter(icontext_type& context, const vertex_type& vertex,
               edge_type& edge) const {
    const vertex_data_type& srclist = edge.source().data();
    const vertex_data_type& targetlist = edge.target().data();
    if (targetlist.vid_set.size() < srclist.vid_set.size())
      edge.data() += count_set_intersect(targetlist.vid_set, srclist.vid_set);
    else
      edge.data() += count_set_intersect(srclist.vid_set, targetlist.vid_set);
  }
};
GraphLab - C++
3cliques(a, b, c) <-
edge(a, b),
edge(a, c),
edge(b, c).
LogiQL
4 CLIQUE - SYNTHETIC DATA
4 CLIQUE - REAL DATA
SEMANTIC WEB - LUBM
DATAWAREHOUSE - TPC-H
A NON-TRIVIAL EXAMPLE: PAGERANK IN LOGIQL
d[] = 0.85f. // damping factor
tolerance[] = 0.01f. // stop when the change in pr is small enough
pr[p] = 1.0f / node_count[] <- node(p), !pr[p] = _. // initial pr
pr[p] = r <-
r = (1.0f - d[]) + (d[] * sum[p]),
abs[r - pr[p]] > tolerance[].
pr[p] = pr[p] <-
r = (1.0f - d[]) + (d[] * sum[p]),
!(abs[r - pr[p]] > tolerance[]).
pr[p] = pr[p] <- !sum[p] = _.
sum[n] = t <-
agg<< t = total(r) >>
edge(p, n),
r = pr[p] / out_count[p].
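To make the fixpoint behind these rules concrete, here is a small Python sketch of the same iteration. It is an imperative reading of the slide, not a description of how LogicBlox evaluates LogiQL: the function name `pagerank`, the `max_rounds` safety cap, and the dictionary-based graph representation are assumptions introduced for illustration. Like the slide, it uses the unnormalized form `(1 - d) + d * sum`.

```python
def pagerank(nodes, edges, d=0.85, tolerance=0.01, max_rounds=100):
    """Iterate pr[n] = (1 - d) + d * sum(pr[p] / out_count[p]) over edges p -> n.

    A node's value is replaced only when it would move by more than
    `tolerance`; nodes without incoming edges keep their initial value.
    `max_rounds` is a safety cap added for this sketch.
    """
    out_count = {n: 0 for n in nodes}
    preds = {n: [] for n in nodes}
    for p, n in edges:
        out_count[p] += 1
        preds[n].append(p)

    pr = {n: 1.0 / len(nodes) for n in nodes}  # initial pr
    for _ in range(max_rounds):
        new_pr = dict(pr)
        changed = False
        for n in nodes:
            if not preds[n]:
                continue  # no incoming edges: sum[n] is undefined, keep pr[n]
            r = (1.0 - d) + d * sum(pr[p] / out_count[p] for p in preds[n])
            if abs(r - pr[n]) > tolerance:
                new_pr[n] = r
                changed = True
        pr = new_pr
        if not changed:
            break  # every node is within tolerance: fixpoint reached
    return pr


# A directed 3-cycle: by symmetry all three nodes end with the same value.
ranks = pagerank(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
print(ranks)
```

The per-node tolerance test is what lets the computation stop: once no node moves by more than `tolerance`, another round would change nothing.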
HOW DOES IT WORK
ALGORITHMS FIRST
Computer Science @CompSciFact Sep 28
“Computer science is now about systems. It hasn’t been about algorithms since the 1960’s.” -- Alan Kay #hlf13
PHILOSOPHY: BRAINS BEFORE BRAWN
▪ Algorithmic scalability
  ▪ New worst-case optimal join algorithm
  ▪ Incremental maintenance proportional to trace edit distance
  ▪ Adaptive domain decomposition for parallelization
▪ Data structures
  ▪ Compression close to info-theoretic limit in some cases
  ▪ I/O minimization, cache consciousness
  ▪ Persistent data structures: full serializability, branch & merge, auditability, scalable distribution
▪ Unified declarative programming model
  ▪ Optimizations through aggressive analysis
▪ Brute force
  ▪ In-memory when data fits
  ▪ Distribution across thousands of cores, and GPUs
A SMART JOIN ALGORITHM - LFTJ
▪ “Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm”, T. Veldhuizen, ICDT 2014
  ▪ Best Newcomer Award
LFTJ INTUITION: CONSIDER MORE THAN PAIRS
▪ Widely adopted technique: pair-wise joins
▪ Suppose A, B, and C each have 1 million records distributed over 3 months
  ▪ Pair-wise join: best case scenario, 0.5 million records as intermediate results
  ▪ LFTJ: no records materialized
[Figure: relations A(x), B(x), C(x) laid out over a Jan / Feb / Mar timeline]
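The intuition can be tried out with the unary building block of Leapfrog Triejoin: intersecting several sorted relations at once by letting the iterator with the smallest key "leap" to the current largest key. The Python sketch below is a simplified illustration under stated assumptions: relations are plain sorted, duplicate-free lists, `bisect_left` stands in for the seek operation, and the name `leapfrog_intersect` and the sample data layout (A in the first two periods, B in the last two, C in the first and last) are made up for this example.

```python
from bisect import bisect_left


def leapfrog_intersect(*relations):
    """Intersect sorted, duplicate-free lists without pairwise intermediates."""
    if not relations or any(len(r) == 0 for r in relations):
        return []
    k = len(relations)
    pos = [0] * k
    # Visit iterators in order of their current key, smallest first.
    order = sorted(range(k), key=lambda i: relations[i][0])
    max_key = relations[order[-1]][0]
    out = []
    p = 0
    while True:
        i = order[p]
        rel = relations[i]
        if rel[pos[i]] == max_key:
            # Smallest key equals largest key: every iterator agrees.
            out.append(max_key)
            pos[i] += 1
        else:
            # Leap directly to the first key >= the current largest key.
            pos[i] = bisect_left(rel, max_key, pos[i])
        if pos[i] >= len(rel):
            return out  # one relation is exhausted: no more matches
        max_key = rel[pos[i]]
        p = (p + 1) % k


# Keys 0-29, 30-59, 60-89 play the role of three months.
A = list(range(0, 60))                          # first and second period
B = list(range(30, 90))                         # second and third period
C = list(range(0, 30)) + list(range(60, 90))    # first and third period
print(len(set(A) & set(B)))          # size of the pairwise intermediate A join B
print(leapfrog_intersect(A, B, C))   # three-way result
```

Each pair of relations overlaps, so a pairwise plan has to materialize an intermediate result before discovering that the three-way join is empty; the leapfrog loop reaches the same answer after a handful of seeks and stores nothing.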
SMARTER INCREMENTAL VIEW MAINTENANCE
▪ “Incremental Maintenance for Leapfrog Triejoin”, T. Veldhuizen, 2013
  ▪ http://arxiv.org/abs/1303.5313
▪ Replaced our implementation of Count and DRed algorithms [Gupta+ 93]
▪ Guarantees that the work done is proportional to the trace edit distance between the before and after states
  ▪ Critical for caching analytical views for performance while still incorporating real-time updates
INCREMENTALIZING 3 CLIQUE VIEW
LogicBlox - Algebraic
+3cliques(a, b, c) <-
+edge(a, b), edge(a, c), edge(b, c).
+3cliques(a, b, c) <-
edge(a, b), +edge(a, c), edge(b, c).
+3cliques(a, b, c) <-
edge(a, b), edge(a, c), +edge(b, c).
DRed - Syntactic
3cliques(a, b, c) <-
edge(a, b), edge(a, c), edge(b, c).
edge(a, b) edge(a, c) edge(b, c)
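The three insertion delta rules on this slide can be read as: a new 3-clique must use at least one newly inserted edge, in one of the three atom positions. The Python sketch below illustrates only that rule-rewriting idea for insertions. It is not the trace-based algorithm from the paper, it does not handle deletions, and the names `triangles` and `delta_triangles` are hypothetical. Non-delta atoms are evaluated against the updated edge set.

```python
from collections import defaultdict


def triangles(edges):
    """Full evaluation: all (a, b, c) with edge(a, b), edge(a, c), edge(b, c)."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    empty = set()
    return {(a, b, c) for a, b in edges for c in succ[a] & succ.get(b, empty)}


def delta_triangles(old_edges, inserted):
    """3-cliques derivable only because of `inserted`, one delta rule per atom."""
    new_edges = old_edges | inserted
    succ, pred = defaultdict(set), defaultdict(set)
    for a, b in new_edges:
        succ[a].add(b)
        pred[b].add(a)
    empty = set()
    delta = set()
    for u, v in inserted:
        # +edge(a, b), edge(a, c), edge(b, c) with a = u, b = v
        for c in succ[u] & succ.get(v, empty):
            delta.add((u, v, c))
        # edge(a, b), +edge(a, c), edge(b, c) with a = u, c = v
        for b in succ[u] & pred[v]:
            delta.add((u, b, v))
        # edge(a, b), edge(a, c), +edge(b, c) with b = u, c = v
        for a in pred.get(u, empty) & pred[v]:
            delta.add((a, u, v))
    return delta


old = {(1, 2), (1, 3)}
inserted = {(2, 3), (3, 4), (1, 4)}
view = triangles(old) | delta_triangles(old, inserted)
print(sorted(view))
print(view == triangles(old | inserted))
```

The work in `delta_triangles` is driven by the inserted edges and their neighborhoods rather than by the full edge relation, which is the point of maintaining the view incrementally instead of recomputing it.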
INCREMENTAL MAINTENANCE OF 4-CLIQUE
A PARTICULAR USE CASE OF LB FOR GRAPHS
SCREAMING FAST PROGRAM ANALYSIS
▪ Order of magnitude faster than prior-art
▪ Program analysis is graph analysis
  ▪ “Strictly Declarative Specification of Sophisticated Points-to Analyses” (OOPSLA ’09)
  ▪ “Exception Analysis and Points-to Analysis - Better Together” (ISSTA ’09)
  ▪ “Pick Your Context Well - Understanding Object-Sensitivity” (POPL ’11)
  ▪ “Efficient and Effective Handling of Exceptions in Java Points-to Analysis” (CC ’13)
  ▪ “Hybrid Context Sensitivity for Points-to Analysis” (PLDI ’13)
  ▪ “Set-based Pre-processing for Points-to Analysis” (OOPSLA ’13)
PROGRAM ANALYSIS IS ALL ABOUT GRAPH ANALYSIS
COMPARE TO PRIOR-ART : >10x
...AND THAT WAS ON PRIOR ART LOGICBLOX
RECAP
▪ LogicBlox: the iPhone of databases
  ▪ But perhaps the $10k camera of graph queries?
▪ Holy Grails
  ▪ Declarative query language: LogiQL
  ▪ ACID transactions
▪ Guiding Principle: Brains before Brawn
  ▪ Innovate on algorithms: LFTJ, incremental view maintenance, etc.
  ▪ Innovate on data structures
  ▪ Declarative language allows aggressive optimizations
  ▪ Brute force when necessary
THANK YOU