HOBBIT presentation at Apache Big Data Europe 2016

Unified Benchmarking of Big Data Platforms: The HOBBIT Platform

Axel-Cyrille Ngonga Ngomo

Horizon 2020, GA No 688227

01/12/2016–30/11/2018

Apache Big Data, Sevilla, Spain

November 15, 2016

Ngonga Ngomo (InfAI) Benchmarking Big Data November 15th, 2016 1 / 44

Summary

Rationale

A community-driven unified benchmarking platform for the community

Focus on Big (Linked) Data
Provide benchmarks and baselines
Provide reference implementation of KPIs
Extensible and referenceable
Result analysis
Open-Source

http://project-hobbit.eu
@hobbit_project


A Lot of Data

[Figure: infographic on the four Vs of big data. Source: http://www.ibmbigdatahub.com/infographic/four-vs-big-data]

A Lot of Tools

[Figure. Source: https://cloudramblings.me/]

A Lot ... of Tools

[Figure: big data landscape. Source: http://mattturck.com/2016/02/01/big-data-landscape/]

Questions

Developers: How good is my tool?
Vendors: Who is my tool good for?
Users: Which tool(s) should I use for my application?


Many Questions

Where are the current bottlenecks?
Which steps of the data lifecycle are critical?
Which solutions are available?
Which key performance indicators are relevant?
How well do or should tools perform?
How do existing solutions perform w.r.t. relevant indicators?


A Lot of Views

[Figure. Source: https://steemit.com/philosophy/@l0k1/subjectivity-and-truth-how-blockchains-model-consensus-building]

Solution: Benchmark

Components
  Dataset(s), e.g., Twitter stream, sensor data
  Task(s), e.g., entity recognition, storage
  Performance indicators, e.g., precision, recall, queries per second

TPC-H (3,000 GB results): −5.6 × 10^6 QphH between 2014 and 2016 [5]
QALD: ≈ 5% increase in Micro F-measure
ACE2004: ≈ 6% increase in Micro F-measure

[5] http://www.tpc.org/tpch/results/tpch_perf_results.asp
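The slide reports results as Micro F-measure. As a minimal sketch of what that choice implies, the Python below contrasts micro and macro averaging over per-document counts. The Counts class, the function names, and the convention that an empty comparison scores 0.0 are assumptions made for illustration; they are not taken from the talk or from any particular benchmark implementation.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """True positives, false positives and false negatives for one document."""
    tp: int
    fp: int
    fn: int


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; assumed to be 0.0 when there is nothing to score."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def micro_f1(docs: list) -> float:
    """Sum the counts over all documents, then compute a single F1."""
    return f1(sum(d.tp for d in docs),
              sum(d.fp for d in docs),
              sum(d.fn for d in docs))


def macro_f1(docs: list) -> float:
    """Compute F1 per document, then average the per-document scores."""
    return sum(f1(d.tp, d.fp, d.fn) for d in docs) / len(docs) if docs else 0.0


# Hypothetical counts for two documents of very different size.
docs = [Counts(tp=8, fp=2, fn=0), Counts(tp=1, fp=0, fn=3)]
print(micro_f1(docs), macro_f1(docs))
```

The two averages differ on the same counts because micro averaging weights every annotation equally while macro averaging weights every document equally, which is one reason a shared reference implementation of a KPI matters.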

Challenges: Dataset and KPI Mismatch

[Table: datasets covered by each annotation system and benchmarking framework]

Columns: Year, ACE, Wiki, AQUAINT, MSNBC, IITB, Meij, AIDA/CoNLL, N3 collection, KORE 50, Wiki-Disamb30, Wiki-Annot30, Spotlight Corpus, SemEval-2013 task 12, SemEval-2007 task 7, SemEval-2007 task 17, Senseval-3, NIF-based corpus, Microposts2014, Software available?, Web service available?

Systems (year): Cucerzan (2007), Wikipedia Miner (2008), Illinois Wikifier (2011), Spotlight (2011), AIDA (2011), TagMe 2 (2012), Dexter (2013), KEA (2013), WAT (2013), AGDISTIS (2014), Babelfy (2014), NERD-ML (2014)

Frameworks (year): BAT-Framework (2013), NERD Framework (2014), GERBIL (2014)


Challenges: Unclear KPI Semantics

Example: Federated queries in distributed storage solutions
Which time do we measure?
  First or last result?
  With or without network delay?
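To make the ambiguity concrete, here is a small sketch that measures both the time to the first result and the time to the last result of a streamed query answer. The names time_results and slow_endpoint are hypothetical, and slow_endpoint only simulates a federated query with a fixed delay per result. Because the clock runs on the client side, both numbers include whatever network delay occurs.

```python
import time
from typing import Iterable, Iterator


def time_results(results: Iterable, clock=time.perf_counter) -> dict:
    """Consume a result stream and record when the first and last item arrive."""
    start = clock()
    first = None
    last = None
    count = 0
    for _ in results:
        elapsed = clock() - start
        if first is None:
            first = elapsed
        last = elapsed
        count += 1
    return {"count": count, "time_to_first": first, "time_to_last": last}


def slow_endpoint(n: int, delay: float) -> Iterator[int]:
    """Hypothetical stand-in for a query that streams n results."""
    for i in range(n):
        time.sleep(delay)  # simulated network and processing delay
        yield i


print(time_results(slow_endpoint(3, 0.01)))
```

A benchmark that reports only one of the two numbers, without saying which, cannot be compared with one that reports the other.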


Challenges: Unclear KPI Semantics

Example: Entity recognition and linking
When is an annotation correct?
  Weak or strong annotation?
  Semantically equivalent or exact URI?
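The two questions can be turned into explicit matching rules. The sketch below assumes that a strong match requires identical character offsets and a weak match only requires overlapping spans, and that semantic equivalence is checked through a lookup table of equivalent URIs. The Annotation type, the function names, the same_as mapping and the example.org URIs are all hypothetical.

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    start: int  # offset of the first character
    end: int    # offset one past the last character
    uri: str


def same_entity(a: str, b: str, same_as: Optional[dict] = None) -> bool:
    """Exact URI comparison, or comparison after mapping to a canonical URI."""
    same_as = same_as or {}
    return same_as.get(a, a) == same_as.get(b, b)


def strong_match(gold: Annotation, found: Annotation, same_as=None) -> bool:
    """Same entity and exactly the same span."""
    same_span = (gold.start, gold.end) == (found.start, found.end)
    return same_span and same_entity(gold.uri, found.uri, same_as)


def weak_match(gold: Annotation, found: Annotation, same_as=None) -> bool:
    """Same entity and spans that overlap by at least one character."""
    overlaps = found.start < gold.end and gold.start < found.end
    return overlaps and same_entity(gold.uri, found.uri, same_as)


gold = Annotation(0, 12, "http://example.org/kb-a/Barack_Obama")
found = Annotation(7, 12, "http://example.org/kb-b/E42")
equivalents = {"http://example.org/kb-b/E42": "http://example.org/kb-a/Barack_Obama"}

print(weak_match(gold, found))                 # exact URIs differ
print(weak_match(gold, found, equivalents))    # equivalent URI, overlapping span
print(strong_match(gold, found, equivalents))  # equivalent URI, different span
```

The same system output is therefore correct or incorrect depending on which of the four combinations a benchmark picks.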


Solution! Unified Benchmarking Framework

Rationale
  Provide all benchmark components in one package
  Include reference datasets and baselines
  Devise standardized tasks and reference KPI implementations

[Figure: GERBIL architecture. Labels: Benchmark Core, GERBIL Core Controller, Configuration (Model), Persistent Experiment Database (Model), Interface (View), Annotator Wrapper, Dataset Wrapper, Web service calls, Your Annotator, Your Dataset, Open Datasets, DataHub.io]


GERBIL: HOBBIT v0.1

Features
  Unified benchmarking platform for NER/NEL
  18 reference annotation systems
  32 reference datasets
  Reference implementations of KPIs

Advantages
  Benchmarking ≈ 30× faster
  Archiving of results
  Citeable URIs
  Additional analysis

Availability
  Open-source project
  Local deployment
  Online instance
  Feedback for developers and users

http://gerbil.aksw.org
http://github.org/aksw/gerbil

GERBIL: HOBBIT v0.1

Annotator                 Tasks
NIF-based Annotators       2519
Babelfy                     958
DBpedia Spotlight           922
TagMe 2                     811
WAT                         787
Kea                         763
Wikipedia Miner             714
NERD-ML                     639
Dexter                      587
AGDISTIS                    443
Entityclassifier.eu NER     410
FOX                         352
Cetus                         1

Overall: 24.3K experiments
50+ papers





HOBBIT: Rationale

Build upon 24.3K GERBIL experiments
Experiments focus on Big Linked Data
Designed to accommodate all Big Data
Cover all steps of the Big (Linked) Data lifecycle
Open benchmarks based on industrial data and use cases




HOBBIT: Aims

1. Gather real requirements
     Performance indicators
     Performance thresholds
2. Develop benchmarks based on real data
3. Provide universal benchmarking platform
     Standardized hardware
     Comparable results
4. Periodic benchmarking challenges
5. Periodic reporting
6. Found independent Hobbit association


HOBBIT: Overview

[Figure: project overview. Labels: Data Collection (industry data), Measure Collection, Benchmark Creation, Benchmark 1 ... Benchmark n (each with KPIs and Tasks), HOBBIT Platform, Solution 1 ... Solution k, Challenges, Reports, Participants/Community]


Survey: Questions

Questions
  In what areas are organizations active?
  What do people expect from benchmarks?
  How are benchmarks being used?

Profile                 Count
Solution providers         56
Technology users           67
Scientific community       65


Survey: Can your solution be benchmarked?


Survey: Do you benchmark your solution?

Own datasets and settings in many cases
Own implementations of measures
Results not comparable


Survey: Application Areas

http://big-data-europe.eu

HOBBIT Platform: Features

Uses established deployment technologies (Docker)
Decoupled components
Benchmark and systems can be written in different languages
Uses scalable message queues for communication
Open-source implementation
Supports distributed benchmarks and systems
Online instance on server cluster


HOBBIT Platform: Features

Features
  Unified benchmarking platform for Big Data
  20+ reference annotation systems
  40+ reference datasets
  Reference implementations of KPIs

Advantages
  Benchmarks derived from real industrial data and use cases
  Scalable size of benchmarks
  Archiving of results
  Citeable URIs
  Result analysis

Availability
  Open-source project
  Local deployment


HOBBIT Platform: Architecture

[Figure: platform architecture. Components: Platform Controller, Front End, Storage, Analysis, Logging, Benchmark Controller, Data Generators, Task Generators, Benchmarked System, Evaluation Module, Evaluation Storage. Legend: data flow, creates component]


HOBBIT Platform: Benchmark Initialization

[Figure: benchmark initialization. Components: Platform Controller, Storage, Benchmark Controller, Data Generators, Task Generators, Benchmarked System, Evaluation Storage]


HOBBIT Platform: Benchmark Execution

[Figure in three steps. Components: Platform Controller, Storage, Data Generators, Task Generators, Benchmarked System, Benchmark Controller, Evaluation Storage]

Messages shown in the figure:
  Step 1: ex:Entity rdf:type ex:Class ...
  Step 2: SELECT ?v WHERE { ?v a ex:Class } and the result table v = ex:Entity ...
  Step 3: the result table v = ex:Entity ... and the label X
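The execution slide shows a triple (ex:Entity rdf:type ex:Class), a SPARQL query selecting the instances of ex:Class, and a result for ?v. The sketch below replays that exchange in a single Python process. It assumes one plausible reading of the figure: the data generator sends data to both the system and the task generator, the task generator sends the task to the system and the expected answer to the evaluation storage, and the system sends its answer to the evaluation storage. The function run_benchmark, the queue names, the task id and the use of Python's queue module in place of the platform's message queues are all illustrative assumptions, not the platform's API.

```python
import queue


def run_benchmark(triples):
    """Push triples through a toy data generator, task generator and system."""
    # In-process queues stand in for the platform's message queues (assumption).
    data_to_system = queue.Queue()
    data_to_task_generator = queue.Queue()
    tasks_to_system = queue.Queue()
    eval_storage = {}  # task id -> {"expected": set, "received": set or None}

    # Data generator: sends every triple to the system and the task generator.
    for triple in triples:
        data_to_system.put(triple)
        data_to_task_generator.put(triple)

    # Task generator: derives one task and stores its expected answer.
    seen = []
    while not data_to_task_generator.empty():
        seen.append(data_to_task_generator.get())
    expected = {s for (s, p, o) in seen if p == "rdf:type" and o == "ex:Class"}
    eval_storage["task-1"] = {"expected": expected, "received": None}
    tasks_to_system.put(("task-1", "ex:Class"))

    # Benchmarked system: loads the data, answers the task, reports its result.
    store = []
    while not data_to_system.empty():
        store.append(data_to_system.get())
    while not tasks_to_system.empty():
        task_id, wanted = tasks_to_system.get()
        answer = {s for (s, p, o) in store if p == "rdf:type" and o == wanted}
        eval_storage[task_id]["received"] = answer

    return eval_storage


storage = run_benchmark([
    ("ex:Entity", "rdf:type", "ex:Class"),
    ("ex:Other", "rdf:type", "ex:Thing"),
])
print(storage)
```

Keeping the expected and received answers side by side under a task id is what lets evaluation run after the benchmark has finished, independently of the system under test.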


HOBBIT Platform: Benchmark Evaluation

[Figure: benchmark evaluation. Components: Platform Controller, Storage, Benchmark Controller, Evaluation Module, Evaluation Storage]

Values shown in the figure:
  precision=... recall=... F1-score=...
  benchmark parameters: ...
  result tables v = ex:Entity ...
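As a rough illustration of what an evaluation module might compute from stored answers, the sketch below derives precision, recall and F1-score per task from an expected and a received result set. The storage layout (a dict keyed by task id), the function names and the choice to score a missing response as an empty answer are assumptions for illustration only.

```python
def score(expected: set, received: set) -> dict:
    """Set-based precision, recall and F1 for one task."""
    true_positives = len(expected & received)
    precision = true_positives / len(received) if received else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def evaluate(eval_storage: dict) -> dict:
    """Score every task; a missing response is treated as an empty answer."""
    results = {}
    for task_id, entry in eval_storage.items():
        received = entry.get("received") or set()
        results[task_id] = score(entry["expected"], received)
    return results


# Hypothetical storage content: one answered task and one unanswered task.
example_storage = {
    "task-1": {"expected": {"ex:Entity"}, "received": {"ex:Entity"}},
    "task-2": {"expected": {"ex:A", "ex:B"}, "received": None},
}
print(evaluate(example_storage))
```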


HOBBIT Platform: Benchmarks

Streaming and static deterministic benchmarks
Realistic benchmarks
Controlled volume and velocity

Generation and Acquisition
  Conversion of XML into RDF
  Entity recognition and linking
  Relation extraction

Analysis and Processing
  Link Discovery
  Machine Learning
  Supervised and unsupervised

Storage and Curation
  Triple stores
  Versioning
  Incl. updates

Visualization and Services
  Question Answering
  Faceted Browsing
  Usage-based benchmarks


Datasets: TWIG

Goal: Simulate real Twitter Firehose
Relies on 476 million tweets as training data
Mimicking algorithm based on
  Distribution of character frequencies
  Distribution of transportation frequency
  Network topology
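To illustrate only the first ingredient, a distribution of character frequencies, the sketch below learns how often each character occurs in a training corpus and samples new text from that distribution. This is a deliberately simple stand-in and not the TWIG algorithm; the training strings and function names are hypothetical.

```python
import random
from collections import Counter


def char_distribution(texts):
    """Count how often each character occurs in the training texts."""
    return Counter(ch for text in texts for ch in text)


def sample_text(distribution, length, rng):
    """Draw characters independently, weighted by their observed frequency."""
    chars = list(distribution)
    weights = [distribution[c] for c in chars]
    return "".join(rng.choices(chars, weights=weights, k=length))


training = ["big data", "linked data"]  # hypothetical training corpus
dist = char_distribution(training)
print(sample_text(dist, 20, random.Random(42)))
```

Passing an explicitly seeded random.Random keeps the generated data reproducible, which matters for the deterministic benchmarks the platform aims for.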


Datasets: LinkedConnections

Goal: Simulate real transport network
Real transportation data from Belgium for training
Mimicking algorithm based on
  Observed correlation between population density and transportation
  Distribution of transportation frequency
  Network topology


Datasets: Printing Machinery

Goal: Simulate events from printing machinery
Mimicking algorithm using event correlations and distributions

[Figure: timeline of machine events. x-axis: Time (May 01 00:00 to May 02 00:00); y-axis: Events. Event types: Changing plate, Double sheet, Early sheet, Finish job, Misaligned sheet, Missing sheet, Operation partially completed, Performance, Printing interval, Production Good Sheet, Side guide warning, Start job, Washing blanket, Washing impression cylinder, Washing ink rollers, with washing ink fountain roller, with washing plates]
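One simple way to mimic event streams from observed correlations is a first-order Markov chain: count which event follows which in a real log, then sample successors with those frequencies. The sketch below does this for a few of the event types named on the slide. It is an illustrative assumption about how such a generator could work, not the generator used in the project, and the training log is made up.

```python
import random
from collections import Counter, defaultdict


def fit_transitions(events):
    """Count, for every event type, how often each event type follows it."""
    transitions = defaultdict(Counter)
    for previous, following in zip(events, events[1:]):
        transitions[previous][following] += 1
    return transitions


def generate(transitions, start, length, rng):
    """Sample a sequence of at most `length` events, beginning with `start`."""
    sequence = [start]
    current = start
    while len(sequence) < length:
        followers = transitions.get(current)
        if not followers:
            break  # no observed successor for this event type
        choices = list(followers)
        weights = [followers[c] for c in choices]
        current = rng.choices(choices, weights=weights, k=1)[0]
        sequence.append(current)
    return sequence


# Hypothetical training log built from event types listed on the slide.
log = ["Start job", "Printing interval", "Finish job",
       "Start job", "Printing interval", "Finish job"]
model = fit_transitions(log)
print(generate(model, "Start job", 4, random.Random(0)))
```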


Datasets: Weidmüller

Goal: Simulate events from injection molding machinery
Mimicking algorithm using event correlations and distributions


Datasets: Semantic Publishing

Goal: Simulate data from the BBC
Generator based on manually configurable set of correlations


HOBBIT Runs: Triple Stores

[Figure: panels "QmpH" and "Updates" for Virtuoso, Blazegraph and Fuseki. x-axis: SPARQL workers (1, 4, 16); y-axis: updates (1 to 1000)]
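Assuming QmpH here stands for query mixes per hour, as the abbreviation is used in other SPARQL benchmarks, the KPI is a simple rate. The sketch below computes it from a count of completed query mixes and the elapsed wall-clock time; the function name and the example numbers are hypothetical and are not results from the talk.

```python
def query_mixes_per_hour(completed_mixes: int, elapsed_seconds: float) -> float:
    """Scale the number of completed query mixes to a one-hour window."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return completed_mixes * 3600.0 / elapsed_seconds


# Hypothetical run: 50 query mixes completed in 30 minutes.
print(query_mixes_per_hour(50, 1800))
```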


HOBBIT Runs: Runtimes

10× more effort for reduction of error rate by 30%

HOBBIT Runs: A2KB

System             AIDA/CoNLL-Comp.  IITB   KORE 50  MSNBC  Microp. 2014-Train  N3-Reuters-128
AIDA               0.668             0.141  0.625    0.622  0.363               0.391
Babelfy            0.448             0.129  0.564    0.423  0.311               0.289
DBpedia Spotlight  0.545             0.262  0.341    0.457  0.448               0.320
FOX                0.512             0.100  0.268    0.127  0.309               0.518
FREME NER          0.358             0.074  0.160    0.208  0.254               0.263
WAT                0.673             0.137  0.543    0.631  0.403               0.480
xLisa              0.363             0.233  0.352    0.365  0.322               0.274


Summary

Rationale


Provide benchmarks and baselines
Provide reference implementation of KPIs
Extensible and referenceable
Open-Source

@hobbit_project



Join HOBBIT

α completed
Join the HOBBIT community
Provide KPIs
Provide datasets
Join the platform development
Follow us on Twitter

https://project-hobbit.eu

Thank You

Axel Ngonga
AKSW Research Group
Institute for Applied Informatics
[email protected]

Michael Röder
AKSW Research Group
Institute for Applied Informatics
[email protected]


Acknowledgment

This work was supported by grants from the EU H2020 Framework Programme provided for the project HOBBIT (GA no. 688227).