COMP9321 Web Application Engineering Semester 2, 2015 Dr. Amin Beheshti Service Oriented Computing Group, CSE, UNSW Australia Week 11 (Part II) http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411 http://www.cse.unsw.edu.au/~sbeheshti/ COMP9321, 15s2, Week 11 Tuesday, 13 October 2015

COMP9321 Web Application Engineering

Semester 2, 2015

Dr. Amin Beheshti
Service Oriented Computing Group, CSE, UNSW Australia

Week 11 (Part II)

http://webapps.cse.unsw.edu.au/webcms2/course/index.php?cid=2411
http://www.cse.unsw.edu.au/~sbeheshti/

COMP9321, 15s2, Week 11 Tuesday, 13 October 2015


Big Data: Challenges and Opportunities


http://www.intelli3.com/


We are Generating Vast Amounts of Data!!

• Healthcare: remote patient monitoring
• Manufacturing: product sensors
• Location-based services: real-time location data
• Retail: social media, …
• Digitalization of artefacts: books, music, videos, etc.


We are Generating Vast Amounts of Data!!

• Airbus A380: generates 10 TB every 30 minutes.
• Twitter: generates approximately 12 TB of data per day.
• Facebook: data grows by over 500 TB daily.
• New York Stock Exchange: 1 TB of data every day.


We are Generating Vast Amounts of Meta-data!!

Data: versioning, provenance, security, privacy.

We are tracing everything: who did what? When? Where? …

e.g. Twitter handles ~1.6 billion search queries per day.


Beheshti S.M.R. et al., "Enabling the Analysis of Cross-Cutting Aspects in Ad-hoc Processes", CAiSE Conference (2013)


We are Generating Vast Amounts of Meta-data!!

• Reading a book: e.g. a Kindle tracks what you are reading, when you are reading it, how often you read it, etc.
• Listening to music: e.g. an MP3 player tracks what you are listening to, when and how often, in what order, etc.
• Smartphones: e.g. an iPhone tracks our location, our speed, what apps we are using, who we are ringing, etc.


Projects: Smart TV


Big Data and Big Meta-Data

share, comment, review, crowdsource, etc.


So, What is Big Data?

Big data refers to our ability to collect and analyse the ever-expanding amounts of data and meta-data that we are generating every second!

Challenges: capture, storage, search, sharing, transfer, analysis, visualization, etc.

Big Data != Large Datasets

The big data problem can be seen as a massive number of small data islands from personal, shared and business data. Linking and analysing this data is of high interest.


What Makes it Big Data?

• Volume: the vast amounts of data generated every second.
• Velocity: the speed at which new data is generated and moves around.
• Variety: the increasingly different types of data.
• Veracity: the quality of the data, e.g. its messiness. Noisy and inconsistent data needs to be detected and corrected.
• Value: statistical, events, correlation, hypothetical.


Challenges: How to Store and Process?

Big data is high-volume, high-velocity and/or high-variety information assets that require new forms of storage and processing.

• On-hand database management tools?
• Traditional data processing applications?


Challenges: Big Data Storage

NoSQL databases:
• Employ less constrained consistency models.
• Simple retrieval and appending operations.
• Significant performance benefits.

Examples:
• Key–value store
• Document store
• Graph database
• …
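
The "simple retrieval and appending operations" of a key–value store can be sketched in a few lines. This is only a toy, in-memory illustration: the class name `ToyKeyValueStore`, its methods and the sample keys are hypothetical and are not taken from any real NoSQL product.

```python
from collections import defaultdict


class ToyKeyValueStore:
    """Hypothetical in-memory key-value store (illustration only)."""

    def __init__(self):
        # Each key maps to an append-only list of values; no schema is enforced.
        self._data = defaultdict(list)

    def append(self, key, value):
        """Append a value under a key without touching any other key."""
        self._data[key].append(value)

    def get(self, key):
        """Return a copy of all values stored under a key (empty list if none)."""
        return list(self._data.get(key, []))


store = ToyKeyValueStore()
store.append("user:42:clicks", {"page": "/home"})
store.append("user:42:clicks", {"page": "/search"})
latest_clicks = store.get("user:42:clicks")
```

Because every operation touches a single key, such a store is easy to partition across machines, which is one plausible reading of the performance benefit listed above.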


Challenges: Big Data Storage (Graphs are Everywhere)

• Collaborative filtering: users and movies (Netflix)
• Text analysis: docs and words (Wiki)
• Probabilistic analysis: social network

• … Beheshti, et al. "Large Scale Graph Processing Systems: Survey and An Experimental Evaluation", Cluster Computing Journal, 2015.
• … Beheshti, et al. "On Characterizing the Performance of Distributed Graph Computation Platforms", TPCTC Conference, 2014.
• …, Beheshti S.M.R. et al. "DREAM: Distributed RDF Engine with Adaptive Query Planner and Minimal Communication", VLDB (2015)
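
As a rough illustration of the user and movie graph behind collaborative filtering, the sketch below stores a bipartite graph as a dictionary and recommends unseen movies liked by users with overlapping taste. The data, the `recommend` function and the scoring rule are assumptions made for this example; they are not the method of Netflix or of the papers cited above.

```python
from collections import Counter

# Hypothetical data: each user node is linked to the movie nodes they liked.
likes = {
    "alice": {"Alien", "Blade Runner"},
    "bob": {"Alien", "Blade Runner", "Contact"},
    "carol": {"Dune"},
}


def recommend(user, likes):
    """Rank unseen movies by how strongly they are linked to similar users."""
    seen = likes[user]
    scores = Counter()
    for other, movies in likes.items():
        if other == user:
            continue
        overlap = len(seen & movies)  # movie nodes shared by the two users
        if overlap == 0:
            continue
        for movie in movies - seen:
            scores[movie] += overlap
    return [movie for movie, _ in scores.most_common()]


suggestions = recommend("alice", likes)
```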


Challenges: Big Data Processing

Apache Hadoop: an open-source framework that uses a simple programming model to enable distributed processing of large data sets on clusters of computers.

Who uses Hadoop?
Amazon, Facebook, Google, IBM, New York Times, Yahoo!, …

Apache Hadoop solution:
• Distributed File System (HDFS)
• MapReduce
• Pig
• HCatalog
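
The "simple programming model" is MapReduce. The sketch below imitates its three steps (map, shuffle, reduce) for a word count inside a single Python process. It does not use the Hadoop API; the function names and input strings are hypothetical, and on a real cluster the map and reduce calls would run in parallel on different machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values of one key into a single result."""
    return key, sum(values)


documents = ["big data is big", "data about data"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
word_counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

The programmer writes only the map and reduce functions; distribution, grouping and recovery from failures are left to the framework.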


Challenges: Big Data Processing

Apache Spark: a fast and expressive cluster computing engine, compatible with Apache Hadoop.

• Efficient: in-memory storage. Up to 10× faster on disk, 100× in memory.
• Usable: rich APIs in Java, Scala, Python. 2-5× less code.

Difference between Spark and MapReduce:
• Spark stores data in memory, whereas Hadoop stores data on disk.
• The Resilient Distributed Dataset (RDD), Spark's data storage model, uses a clever way of guaranteeing fault tolerance that minimizes network I/O.
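
The slide does not spell out how the RDD achieves this. In Spark's design, each RDD remembers the chain of transformations (its lineage) that produced it, so a lost in-memory partition can be recomputed instead of being restored from a replicated copy sent over the network. The toy class below imitates that idea in plain Python; `ToyRDD` and its methods are hypothetical and are not the Spark API.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: input data plus the recipe applied to it."""

    def __init__(self, source, transforms=()):
        self._source = source          # durable input, e.g. a file on disk
        self._transforms = transforms  # lineage: ordered functions applied so far
        self._cache = None             # in-memory copy of the computed result

    def map(self, fn):
        """Record a transformation; nothing is computed yet."""
        return ToyRDD(self._source, self._transforms + (fn,))

    def collect(self):
        """Compute from the lineage if no in-memory copy exists, then cache it."""
        if self._cache is None:
            data = list(self._source)
            for fn in self._transforms:
                data = [fn(x) for x in data]
            self._cache = data
        return self._cache

    def lose_partition(self):
        """Simulate a node failure that wipes the in-memory copy."""
        self._cache = None


squares = ToyRDD([1, 2, 3]).map(lambda x: x * x).map(lambda x: x + 1)
before = squares.collect()
squares.lose_partition()
after = squares.collect()  # rebuilt by replaying the lineage, not from a replica
```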


Challenges: Big Data Integration

Example Scenario: Business Processes (BPs)

Figure labels: People, Web Services, IT Systems, Workflows, ...; BPs Execution Log; Query and Explore; Various Perspectives and Goals.

Beheshti S.M.R. et al., "A query language for analyzing business processes execution", BPM Conference (2011)


Challenges: Big Data Integration

The Big Data world is messy, schema-less and complex.

Less than 10% of the Big Data world is genuinely relational.

e.g. Linked Data


Challenges: Big Data Integration

Big Data-as-a-Service:
• Effective processing of big data within acceptable processing time
• Easy access to the big data and the big data analysis results

API Engineering:
• ProgrammableWeb - APIs, Mashups and the Web as Platform: www.programmableweb.com/
• DataSift … open data sources

Reminder

Seminars: API Engineering and Micro-Services

When: Thursday, 15 October, 15:00-17:00
Where: UNSW, Mathews Theatre D

Two interesting talks:
• API Engineering (Scientia Prof. Boualem Benatallah)
• Micro-services (Mr. Graham Lea)


Challenges: Big data requires a broad set of skills

Data Scientist:
• Math and operations research expertise: develop analytic algorithms.
• Visualization expertise: interpret data sets, determine correlations and present in meaningful ways.
• Tool developers: mask complexity and analytics to lower skills boundaries.
• Industry vertical domain expertise: develop hypotheses, identify relevant business issues, ask the right questions.
• Data experts: data architecture, management, governance, policy.
• Decision making, executive and management: apply information to solve business issues.


Challenges: Big Data Analytics

Analytics can be defined in many ways, but what matters is the purpose of analytics.

Most definitions agree on the following: analytics is used to gain insights from data in order to make better decisions, using mathematical or scientific methods.

Data -> (Analyse) -> Insight -> (Decide) -> Action

Manage the Data -> Understand the Data -> Act on the Data

• Reporting is the most widely used analytic capability.
• Gather data from multiple sources and create standard summarizations of the data.
• Visualizations are created to bring the data to life and make it easy to interpret.
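
A minimal sketch of the reporting step, assuming two hypothetical sources of order records: it gathers the records and produces one standard summary per region. All names and figures are invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records from two separate sources.
web_orders = [{"region": "NSW", "amount": 120.0}, {"region": "VIC", "amount": 80.0}]
store_orders = [{"region": "NSW", "amount": 60.0}]


def summarise(*sources):
    """Gather records from all sources and summarise amounts per region."""
    by_region = defaultdict(list)
    for source in sources:
        for record in source:
            by_region[record["region"]].append(record["amount"])
    return {
        region: {"orders": len(amounts), "total": sum(amounts), "average": mean(amounts)}
        for region, amounts in by_region.items()
    }


report = summarise(web_orders, store_orders)
```

A table or chart built from `report` would then be the visualization step.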


Challenges: Big Data Analytics

Cognitive computing systems learn and interact naturally with people to extend what either humans or machines could do on their own.

Self-learning systems that use:
• Data mining
• NLP
• Machine learning
• Pattern recognition
• Crowdsourcing
• …

e.g. IBM Watson Q&A
http://www.research.ibm.com/cognitive-computing/


Challenges: Big Data Analytics

Example:
• Beheshti et al., "Scalable Graph-based OLAP Analytics over Process Execution Data", DAPD Journal (2015).
• Beheshti et al., "A Framework and a Language for On-Line Analytical Processing on Graphs", WISE Conference (2012).

OLAP is an approach to answering multi-dimensional analytical queries swiftly.

Problem:
• Extending existing OLAP techniques to the analysis of graphs is not straightforward.
• Key business insights remain hidden in the interactions among objects.

Solution:
• On-Line Analytical Processing on Graphs
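
To make the idea concrete, the sketch below performs a generic OLAP-style roll-up on a small graph: person-to-person interaction edges are aggregated into edges between departments. It is not the framework or language from the cited papers; the data, the `dept` dimension and the `roll_up` function are assumptions made for this example.

```python
from collections import Counter

# Hypothetical process-execution graph: each node carries a dimension ("dept"),
# and each edge records an interaction (who handed work over to whom).
dept = {"ann": "Sales", "raj": "Sales", "li": "Finance", "sam": "Finance"}
handovers = [("ann", "li"), ("raj", "li"), ("raj", "sam"), ("li", "sam"), ("ann", "raj")]


def roll_up(edges, dimension):
    """Aggregate node-level edges into counted edges between dimension values."""
    return Counter((dimension[src], dimension[dst]) for src, dst in edges)


dept_graph = roll_up(handovers, dept)
```

The aggregated graph answers a question such as "how often does Sales hand work to Finance?", which is the kind of insight that stays hidden when only individual objects, and not their interactions, are summarised.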


Challenges: Big Data Analytics

Big Data Analytics benefits from:
• NLP
• Machine Learning
  • Pattern recognition, Learning, Extraction, Classification, Enrichment, Linking, etc.

Examples:
• Healthcare
• Social Networks
  • e.g. Twitter
• Education
• Finance
• …


Beheshti et al., "Big data and cross-document coreference resolution: Current state and future opportunities"...


Big Data Leadership!!

• Industry has been in the lead: Google, Amazon, Yahoo!, etc.
• University researchers have been left behind, due to lack of access to large-scale cluster computing facilities.
• Government agencies are making heavy investments.
  • Investments in big-data computing will have extraordinary near-term and long-term benefits.
  • Cloud computing must be considered a strategic resource.


Big Data: Opportunities

• Varieties of Data
  • Text
  • Social Media
  • Networks
  • Multimedia
  • Machine Data
  • Sensors
• Analytics
  • Organizing Big Data
  • Navigating through data
  • Summarizing Big Data
  • Process Analytics
  • Support decision-making
• Integration
  • Integrating enterprise and public data
  • Linking data/context
  • Entity Extraction and Integration
  • Knowledge Graph
• Big Data Performance
  • In memory
  • New Benchmarks and Architecture
• User Experience
  • Automation and intelligent guidance
  • Visualizing with Analytics
  • Interacting with Analytics
  • Storytelling

Book:
Beheshti S.M.R., Boualem Benatallah, et al., "Process Analytics: Concepts and techniques for querying and analysing big process data", Springer, ISBN 978-3-319-25037-3 (2015)
http://www.springer.com/us/book/9783319250366


Conclusion

• Why is Big Data different from past very large datasets? Meta-data!!
• Having the ability to analyse Big Data is of limited value if users cannot understand the analysis.
• How can industry and academia collaborate towards solving Big Data challenges?
• What is big today may not be big tomorrow!


Thank you!

