TOWARDS A BIG DATA COMMUNITY CHALLENGE

Tilmann Rabl, Florian Stegmaier, Michael Granitzer and Hans-Arno Jacobsen

3rd Workshop on Big Data Benchmarking, July 16-17

Xi'an, China

BIG DATA – WHY COMMUNITY CHALLENGES MATTER

• Big Data is a major buzzword in the scientific world
– Conferences, workshops, tutorials, panels
– Component benchmarks, end-to-end systems, etc.

• Variety leads to incomparability of results

• Research communities run challenges to
– … enable comparability of results
– … foster evolution of a research field

“Kites rise highest against the wind, not with it.” (W. Churchill)

WHAT SHOULD BE IN THE FOCUS?

DATA!

“[...] other communities, like information retrieval, natural language processing, or Web research, have a much richer and agile culture in creating, disseminating, and re-using interesting new data resources for scientific experimentation [...]” – G. Weikum, SIGMOD Blog

HOW SHOULD IT BE?

INTERESTING!

HOW ARE “THE OTHERS” DOING?

• Information retrieval community:

– TREC, TRECVid (task-based, measurable scientific impact)

– CLEF Initiative (task-based, benchmarking initiatives)

• Multimedia community:

– Multimedia Grand Challenge (tasks defined by “global players”, e.g., Yahoo! and Microsoft)

– Open Source Software Competition (fostering community activities)

• Semantic Web community:

– Linked Data Cup (data generation)

– Semantic Web in-Use (mashup creation)

SUCCESSFUL COMMUNITY CHALLENGES: TAKE-HOME MESSAGE

• Challenges are not a single event
• Ongoing process, running through different stages:

– Data generation
– Solving restricted, high-impact issues
– Fostering open source frameworks
– Assembling mashups

• Accepted by the community

BRAINSTORMING AREA: STRUCTURE OF THE CHALLENGE

• Challenge needs to be focused on specific tasks:
– Tasks assemble a “Big Data pipeline”
– Specified by academia and industry

• Hybrid approach to engage participants:
– Utilize benchmark activities
– Computing tasks on “Open Data”

TIME TO BREAK OUT!

• Discussions should focus on:
– Where to find large-scale, interesting “open” data sets?
– Which tasks could form a sophisticated Big Data pipeline ensuring a broad range of implementations?

BREAKOUT HOW-TO:

• Breakout and student groups as yesterday
• Prepare one slide for each question

