SPARQL and Linked Data Benchmarking

Page 1: SPARQL and Linked Data Benchmarking

Benchmarks

What benchmarks are commonly used and what they mean

Page 2: SPARQL and Linked Data Benchmarking

Overview

• SP2Bench

• LUBM

• BSBM

• UOBM

• SIB

• DBPedia SPARQL Benchmark

• LODIB

• FedBench

• THALIA Testbed

• Benchmark for Spatial Semantic Web Systems

• LODQA

• LinkBench

Page 3: SPARQL and Linked Data Benchmarking

SP2Bench

• Language-specific: a SPARQL performance benchmark.

• Components: data generator, query set

• Provides a scalable RDF data generator and a set of benchmark queries designed to test typical SPARQL operator constellations and RDF data access patterns (see the sketch below).

• Example comparison: http://arxiv.org/pdf/0806.4627v2.pdf
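A minimal sketch of the kind of measurement such a benchmark automates, using Python's rdflib; the file name and the query are illustrative assumptions that only imitate SP2Bench's DBLP-style data, not actual SP2Bench artifacts.

# Sketch: time one SPARQL query over locally generated benchmark data.
# "sp2bench_sample.n3" is an assumed file name; the OPTIONAL + !BOUND pattern
# is one of the typical operator constellations such benchmarks exercise.
import time
from rdflib import Graph

g = Graph()
g.parse("sp2bench_sample.n3", format="n3")

QUERY = """
PREFIX dc:   <http://purl.org/dc/elements/1.1/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?doc WHERE {
  ?doc dc:creator ?person .
  OPTIONAL { ?person foaf:name ?name }
  FILTER (!BOUND(?name))                 # closed-world negation via OPTIONAL/!BOUND
}
"""

start = time.perf_counter()
rows = list(g.query(QUERY))              # materialise results to measure full cost
elapsed = time.perf_counter() - start
print(f"{len(rows)} results in {elapsed:.3f}s")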

Page 4: SPARQL and Linked Data Benchmarking

LUBM

• The Lehigh University Benchmark

• “The Lehigh University Benchmark is developed to facilitate the evaluation of Semantic Web repositories in a standard and systematic way. The benchmark is intended to evaluate the performance of those repositories with respect to extensional queries over a large data set that commits to a single realistic ontology.”

• Components: ontology, data generator, test queries, tester (see the sketch below)

• http://swat.cse.lehigh.edu/projects/lubm/
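The “extensional queries over a large data set that commits to a single realistic ontology” generally need some reasoning (subclass/subproperty closure) to return complete answers. A minimal sketch of that idea with Python's rdflib and owlrl; the file names are assumptions, and the query only paraphrases the style of LUBM's test queries.

# Sketch: load the LUBM ontology plus one generated data file, materialise the
# OWL-RL closure, then count all students (instances of subclasses such as
# GraduateStudent are only found after inference). File names are assumed.
from rdflib import Graph
from owlrl import DeductiveClosure, OWLRL_Semantics

g = Graph()
g.parse("univ-bench.owl", format="xml")      # the LUBM ontology
g.parse("University0_0.owl", format="xml")   # output of the data generator

DeductiveClosure(OWLRL_Semantics).expand(g)  # forward-chain entailments in place

QUERY = """
PREFIX ub: <http://swat.cse.lehigh.edu/onto/univ-bench.owl#>
SELECT (COUNT(?s) AS ?students) WHERE { ?s a ub:Student . }
"""
for row in g.query(QUERY):
    print(row.students)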

Page 5: SPARQL and Linked Data Benchmarking

BSBM

• Berlin SPARQL Benchmark

• Compares the performance of RDF and Named Graph stores as well as RDF-mapped relational databases and other systems that expose SPARQL endpoints. Designed around an e-commerce use case; SPARQL and SQL versions are available (see the sketch below).

• http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
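A minimal sketch of the query-mix idea against any system that exposes a SPARQL endpoint, using Python's SPARQLWrapper; the endpoint URL and the two trivial queries are placeholders, not the parameterised e-commerce queries of the real BSBM test driver.

# Sketch: replay a small "query mix" against a SPARQL endpoint and report the
# mix runtime. Endpoint URL and queries are assumptions, not BSBM's own.
import time
from SPARQLWrapper import SPARQLWrapper, JSON

ENDPOINT = "http://localhost:8890/sparql"    # assumed local endpoint

QUERY_MIX = [
    "SELECT ?p (COUNT(*) AS ?n) WHERE { ?s ?p ?o } GROUP BY ?p LIMIT 10",
    "SELECT ?s WHERE { ?s a ?type } LIMIT 100",
]

client = SPARQLWrapper(ENDPOINT)
client.setReturnFormat(JSON)

start = time.perf_counter()
for q in QUERY_MIX:
    client.setQuery(q)
    client.query().convert()                 # execute and fetch the full result
elapsed = time.perf_counter() - start
print(f"query mix completed in {elapsed:.3f}s "
      f"({len(QUERY_MIX) / elapsed:.1f} queries/s)")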

Page 6: SPARQL and Linked Data Benchmarking

UOBM

• The University Ontology Benchmark

• Extends the LUBM benchmark in terms of inference and scalability testing.

• Components: ontology and test data set

• http://www.springerlink.com/content/l0wu543x26350462/

Page 7: SPARQL and Linked Data Benchmarking

SIB

• Social Network Intelligence Benchmark (SIB)

• A benchmark suite developed by people at CWI and OpenLink that uses a social-network schema to generate test cases where RDF/SPARQL can truly excel and that challenge query processing over a highly connected graph (see the sketch below).

• http://www.w3.org/wiki/Social_Network_Intelligence_BenchMark
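A minimal sketch of the kind of highly connected graph query such a social-network workload stresses, using a SPARQL 1.1 property path over a foaf:knows graph with Python's rdflib; the data file and the pattern are assumptions, not SIB queries.

# Sketch: a friends-of-friends traversal, the style of query that makes densely
# connected social data hard on query engines. "social_sample.ttl" is assumed.
from rdflib import Graph

g = Graph()
g.parse("social_sample.ttl", format="turtle")

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person (COUNT(DISTINCT ?reachable) AS ?reach) WHERE {
  ?person foaf:knows/foaf:knows ?reachable .   # two-hop property path
  FILTER (?person != ?reachable)
}
GROUP BY ?person
ORDER BY DESC(?reach)
LIMIT 10
"""
for row in g.query(QUERY):
    print(row.person, row.reach)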

Page 8: SPARQL and Linked Data Benchmarking

DBPedia SPARQL Benchmark

• Designed to benchmark against DBpedia data in order to provide a clear picture of real-world performance.

• “Performance Assessment with Real Queries on Real Data.”

• http://svn.aksw.org/papers/2011/VLDB_AKSWBenchmark/public.pdf

Page 9: SPARQL and Linked Data Benchmarking

LODIB

• The Linked Data Integration Benchmark

• Is a benchmark for comparing the expressivity as well as the runtime performance of Linked Data translation/integration systems.

• http://wifo5-03.informatik.uni-mannheim.de/bizer/lodib/

Page 10: SPARQL and Linked Data Benchmarking

FedBench

• Benchmark for measuring the performance of federated SPARQL query processing (see the sketch below).

• ISWC2011 whitepaper: https://www.uni-koblenz.de/~goerlitz/publications/ISWC2011-FedBench.pdf
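A minimal sketch of what federated SPARQL query processing looks like at the query level: a SPARQL 1.1 SERVICE clause joining a local endpoint with DBpedia, sent via Python's SPARQLWrapper. The local endpoint URL and the pattern are illustrative assumptions; FedBench ships its own cross-dataset query set.

# Sketch: the outer pattern is evaluated by the local endpoint, which ships the
# SERVICE sub-pattern to DBpedia and joins the results. URLs/pattern assumed.
from SPARQLWrapper import SPARQLWrapper, JSON

FEDERATED_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?author ?abstract WHERE {
  ?author a foaf:Person .
  SERVICE <https://dbpedia.org/sparql> {
    ?author <http://dbpedia.org/ontology/abstract> ?abstract .
  }
}
LIMIT 10
"""

client = SPARQLWrapper("http://localhost:8890/sparql")   # assumed local endpoint
client.setReturnFormat(JSON)
client.setQuery(FEDERATED_QUERY)
for binding in client.query().convert()["results"]["bindings"]:
    print(binding["author"]["value"])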

Page 11: SPARQL and Linked Data Benchmarking

THALIA Testbed

• Is designed to test the expressiveness of relational-to-RDF mapping languages.

• http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/THALIATestbed

Page 12: SPARQL and Linked Data Benchmarking

Benchmark for Spatial Semantic Web Systems

• Extends LUBM with sample spatial data.

• https://filebox.vt.edu/users/dkolas/public/ssbm/

Page 13: SPARQL and Linked Data Benchmarking

LODQA

• The Linked Open Data Quality Assessment

• Is a benchmark for comparing data quality assessment and data fusion systems.

• https://filebox.vt.edu/users/dkolas/public/ssbm/

Page 14: SPARQL and Linked Data Benchmarking

LinkBench

• A database benchmark that is designed for the Facebook Social Graph.

• Whitepaper: http://people.cs.uchicago.edu/~tga/pubs/sigmod-linkbench-2013.pdf

• https://www.facebook.com/notes/facebook-engineering/linkbench-a-database-benchmark-for-the-social-graph/10151391496443920

Page 15: SPARQL and Linked Data Benchmarking

Performance Results

• Results provided by store implementers themselves:

– Virtuoso BSBM benchmark results (native RDF store versus mapped relational database)
– Jena TDB BSBM benchmark results (native RDF store)
– OWLIM Benchmark results (LUBM, BSBM and Linked Data loading/inference)
– SemWeb .NET library BSBM benchmark results
– Virtuoso LUBM benchmark results
– AllegroGraph 2.0 Benchmark for LUBM-50-0
– Sesame NativeStore LUBM benchmark results
– RacerPro LUBM benchmark results
– SwiftOWLIM benchmark results for the LUBM and City benchmark (from slide 27 onwards)
– Oracle 11g benchmark results for the LUBM and Uniprot benchmark (from slide 20 onwards)
– Jena SDB/Query performance and SDB/Loading performance
– Bigdata BSBM V3 Reduced Query Mix benchmark results

Page 16: SPARQL and Linked Data Benchmarking

Performance Results

• Results provided by third parties:

– Cudré-Mauroux et al.: NoSQL Databases for RDF: An Empirical Evaluation (November 2013; uses the BSBM benchmark with workloads from 10 million to 1 billion triples to benchmark several NoSQL databases).

– Peter Boncz, Minh-Duc Pham: Berlin SPARQL Benchmark Results for Virtuoso, Jena TDB, BigData, and BigOWLIM (April 2013, 100 million to 150 billion triples, Explore and Business Intelligence Use Cases).

– Christian Bizer, Andreas Schultz: Berlin SPARQL Benchmark Results for Virtuoso, Jena TDB, 4store, BigData, and BigOWLIM (February 2011, 100 and 200 million triples, Explore and Update Use Cases).

– Christian Bizer, Andreas Schultz: Berlin SPARQL Benchmark Results for Virtuoso, Jena TDB and BigOWLIM (November 2009, 100 and 200 million triples).

– L. Sidirourgos et al.: Column-Store Support for RDF Data Management: not all swans are white. An experimental analysis along two dimensions – triple-store vs. vertically-partitioned and row-store vs. column-store – individually, before analyzing their combined effects. In VLDB 2008.

– Christian Bizer, Andreas Schultz: Berlin SPARQL Benchmark Results. Benchmark along an e-commerce use case comparing Virtuoso, Sesame, Jena TDB, D2R Server and MySQL with datasets ranging from 250,000 to 100,000,000 triples, relating the results to two RDBMS. 2008. (Note: As discussed in Orri Erling's blog, the SQL mix results did not accurately reflect the steady state of all players and should be taken with a grain of salt. Warm-up steps will change for future runs.)

Page 17: SPARQL and Linked Data Benchmarking

Performance Results

• Results provided by third parties (cont):

– Michael Schmidt et al.: SP2Bench: A SPARQL Performance Benchmark. Benchmark based on the DBLP data set comparing current versions of ARQ, Redland, Sesame, SDB, and Virtuoso. TR, 2008 (short version of the TR to appear in ICDE 2009).

– Michael Schmidt et al.: An Experimental Comparison of RDF Data Management Approaches in a SPARQL Benchmark Scenario. Benchmarking Relational Database schemes on top of SP2Bench Suite. In ISWC 2008.

– Atanas Kiryakov: Measurable Targets for Scalable Reasoning
– Baolin Liu and Bo Hu: An Evaluation of RDF Storage Systems for Large Data Applications
– Christian Becker: RDF Store Benchmarks with DBpedia comparing Virtuoso, SDB and Sesame. 2007
– Kurt Rohloff et al.: An Evaluation of Triple-Store Technologies for Large Data Stores. Comparing Sesame, Jena and AllegroGraph. 2007
– Christian Weiske: SPARQL Engines Benchmark Results
– Ryan Lee: Scalability Report on Triple Store Applications comparing Jena, Kowari, 3store, Sesame. 2004
– Martin Svihala, Ivan Jelinek: Benchmarking RDF Production Tools. Paper comparing the performance of relational database to RDF mapping tools (METAmorphoses, D2RQ, SquirrelRDF) with native RDF stores (Jena, Sesame)

– Michael Streatfield, Hugh Glaser: Benchmarking RDF Triplestores, 2005

Page 18: SPARQL and Linked Data Benchmarking

Observations

• LUBM and BSBM results are often shown on the major players’ own websites.

• SP2Bench results are harder to find on those websites and in other resources.

• Many of the results reported by vendors highlight performance on very high-end hardware rather than commodity computers.

• http://www.w3.org/wiki/RdfStoreBenchmarking

