Date post: 01-Apr-2018
VISHLESHAN: Performance Comparison and Programming of Process Mining Algorithms in Graph-Oriented and Relational Database Query Languages

Jeevan Joishi [[email protected]]
MTech Research Associate, Software Analytics Research Lab (SARL)
www.software-analytics.in

MTech Thesis Evaluation Committee Members

Thesis Adviser
Prof. Ashish Sureka
Adjunct Faculty at IIIT-Delhi and currently Visiting Researcher at Siemens Corporate Research and Technology
Faculty In-charge, Software Analytics Research Lab (SARL)

External Examiner
Dr. Radha Krishna Pisipati
Principal Research Scientist at Infosys Technologies Limited.

Internal Examiner
Prof. Sandip Aine
Faculty Member at IIIT-Delhi


Outline

1. Research Motivation and Aim
2. Related Work and Novel Research Contributions
3. Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS
4. Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented
5. Experimental Dataset
6. Performance Comparison
7. Conclusion
8. Limitations
9. References


Research Motivation and Aim


Why NoSQL?

• The global population accessing the internet has increased tremendously.

• Most applications are hosted on the cloud and need to support users 24 hours a day, 365 days a year.

Fig 1: Scale of internet usage. Figure taken from [17].

• Data is captured in huge volumes and consists of both structured and unstructured data.

• The amount of data is growing rapidly, and the variety of data is growing as well.

Fig 2: Growth of data. Figure taken from [17].

• What is wrong with relational databases?

• Nothing!

• Relational databases employ a “one size fits all” philosophy for storage.

• Relational databases are used when strong consistency is a must.

• Relational databases can create problems when it is time to scale.


• Explosion of social media sites such as Facebook and Twitter, with large data needs.

• They had to capture and process very large volumes of data in ways that were difficult to handle with a traditional RDBMS.

• Traditional databases are designed to scale up. We required a database that can scale out.

• When relational applications become successful, usage goes up. Joins are inherent in RDBMS and become very slow!

• Application developers find it difficult to get the dynamic scalability they need while maintaining the performance users demand.


We require a technology that scales out rather than scaling up!

• Scale up: add more processors or memory to an existing server.

• Scale out: add more servers.

Fig 3: Scale-up vs. Scale-out. Figure taken from [18].

NoSQL Database.

• Hence, NoSQL databases were introduced:

• Not Only SQL.

• Non-relational data stores.

• Do not require a fixed table schema.

• Do not strictly follow the ACID properties of databases; instead, they focus on CAP (Consistency, Availability, Partition Tolerance).

• Column stores, graph databases, document stores.


RDBMS vs. NoSQL

• Scale up vs. Scale out

• Normalization vs. De-normalization

• ACID vs. CAP

• Schema vs. Schema-less

• Structured Data vs. Unstructured Data.


Row Oriented vs. Graph Oriented

Record No  Name            Address             City       State
01         Jeevan Joishi   Uniworld Apartment  Bangalore  Karnataka
02         Kunal Gupta     15th Cross Road     Kanpur     Uttar Pradesh
03         Priyanka Verma  Sector-7            Jind       Haryana
04         Nidhi Agarwal   JJ colony           Bhiwani    Haryana

Table 1: An RDBMS table.

Fig 4: A graph model. Figure taken from [19].


• In a row-oriented database, the whole record must be read to access specific attributes.

• Joins in relational databases are compute-intensive tasks.

• However, graph databases can read individual values based on nodes, relationships or properties.

• Graph databases avoid joins by traversing relationship(s) using index-free adjacency.

Fig. 5: Relationships in relational databases. Figure taken from [20].


Fig. 6: Relationships in graph databases. Figure taken from [20].

• Non-native vs. native graph processing

Fig 7 and Fig 8: Native graph processing using index-free adjacency, and non-native graph processing using a global lookup index.
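The contrast between a global lookup index and index-free adjacency can be sketched in a few lines of Python. This is only an illustration under assumed, hypothetical names (`EDGES`, `Node`, `build_native` and the two `neighbours_*` helpers are not part of any graph database API): the first helper finds neighbours by consulting one shared edge structure, the second follows references stored on the node itself.

```python
# Toy relationships between actors; the data and all names are hypothetical.
EDGES = [("Nidhi", "Kunal"), ("Kunal", "Pooja"), ("Nidhi", "Astha")]


def neighbours_via_global_lookup(node, edge_table):
    """Non-native style: every hop consults one shared edge structure."""
    return [dst for src, dst in edge_table if src == node]


class Node:
    """Native style: a node record holds direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.out = []  # references to adjacent Node objects


def build_native(edge_table):
    """Build Node objects once; later traversals never touch edge_table."""
    nodes = {}
    for src, dst in edge_table:
        a = nodes.setdefault(src, Node(src))
        b = nodes.setdefault(dst, Node(dst))
        a.out.append(b)
    return nodes


def neighbours_via_adjacency(node):
    """Follow the stored references; cost depends only on this node's degree."""
    return [n.name for n in node.out]
```

With the global structure, the cost of each hop depends on the size of the shared edge table (or on an index over it), whereas following stored references depends only on the number of relationships of the node being visited.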


Process Mining

• Process Mining is analysing a process using event log data.

• One of the key aspects is to study the social structure of the organization using event logs.

Fig 9: Types of Process Mining Techniques


Process Mining focuses on the analysis of processes using the data present in event logs.

• Each event in an event log records details of an activity.

• Each event is associated with a case identifier (CaseID).

• Each event has a timestamp.

• Each event has an activity that is being performed.

• An event has an actor that handles the event.

• Additionally, each such event may include a unique identifier.
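The event attributes listed above could be modelled as a small record type. The sketch below is an assumption for illustration only: the field names (`case_id`, `activity`, `actor`, `timestamp`, `event_id`) and the helper `group_by_case` are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Event:
    """One row of an event log (field names are illustrative)."""
    case_id: int                    # case identifier (CaseID)
    activity: str                   # activity being performed
    actor: str                      # actor handling the event
    timestamp: datetime             # when the event occurred
    event_id: Optional[int] = None  # optional unique identifier


def group_by_case(events):
    """Return {case_id: [events in timestamp order]}."""
    cases = {}
    for event in sorted(events, key=lambda e: e.timestamp):
        cases.setdefault(event.case_id, []).append(event)
    return cases
```

Grouping by case and ordering by timestamp gives the per-case traces that case-dependent analyses work on.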


Fig. 10: An example Event Log.


3 types of process mining techniques:
1. Process Discovery
2. Process Conformance
3. Process Enhancement

3 types of process mining perspectives:
1. Control Flow Perspective
2. Organizational Perspective
3. Case Perspective


Similar – Task Algorithm

Similar – Task algorithm focuses on identifying actors performing similar activities in the organizational perspective.

• It focuses on the activities the actors perform, irrespective of cases.

• It is based on the notion that people doing similar things have a stronger relation than people doing different things.


Table 2: Sample Event Log

Case Identifier  Activity Identifier  Actor
1                A                    Nidhi
2                A                    Nidhi
2                C                    Kunal
1                B                    Priyanka
3                A                    Pooja
1                C                    Nidhi
3                D                    Kunal
3                B                    Priyanka
2                B                    Pooja
2                D                    Astha
1                D                    Astha

Table 3: Actor-Activity Matrix

          A  B  C  D
Nidhi     2  0  1  0
Kunal     0  0  1  1
Priyanka  0  2  0  0
Pooja     1  1  0  0
Astha     0  0  0  2
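A minimal Python sketch of how the actor-activity matrix can be derived from the sample event log. The tuple layout and the function name `actor_activity_matrix` are assumptions made for illustration; each row of the result counts how often an actor performed each activity, irrespective of case.

```python
from collections import Counter

# Sample event log from Table 2 as (case_id, activity, actor) tuples.
EVENT_LOG = [
    (1, "A", "Nidhi"), (2, "A", "Nidhi"), (2, "C", "Kunal"),
    (1, "B", "Priyanka"), (3, "A", "Pooja"), (1, "C", "Nidhi"),
    (3, "D", "Kunal"), (3, "B", "Priyanka"), (2, "B", "Pooja"),
    (2, "D", "Astha"), (1, "D", "Astha"),
]


def actor_activity_matrix(log):
    """Return (activities, {actor: [count per activity]}), ignoring case ids."""
    activities = sorted({activity for _, activity, _ in log})
    counts = Counter((actor, activity) for _, activity, actor in log)
    # Keep actors in order of first appearance in the log.
    actors = list(dict.fromkeys(actor for _, _, actor in log))
    matrix = {actor: [counts[(actor, a)] for a in activities] for actor in actors}
    return activities, matrix
```

For the sample log, Kunal performs C once (case 2) and D once (case 3), so his row is `[0, 0, 1, 1]`.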


Given two vectors of attributes, A and B, the cosine similarity is given by

cos(θ) = (A · B) / (‖A‖ ‖B‖)

Figure taken from [21].

Table 4: Cosine – Similarity Values

          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     ---    0.32   0.00      0.63   0.00
Kunal     0.32   ---    0.00      0.00   0.70
Priyanka  0.00   0.00   ---       0.70   0.00
Pooja     0.63   0.00   0.70      ---    0.00
Astha     0.00   0.70   0.00      0.00   ---
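The cosine similarity between two actors' activity vectors can be sketched as follows. The function name `cosine_similarity` and the dictionary `MATRIX` are illustrative assumptions; the vectors are the per-activity counts (A, B, C, D) derived from the sample event log.

```python
import math

# Per-actor activity counts in the order (A, B, C, D), from the sample event log.
MATRIX = {
    "Nidhi": [2, 0, 1, 0],
    "Kunal": [0, 0, 1, 1],
    "Priyanka": [0, 2, 0, 0],
    "Pooja": [1, 1, 0, 0],
    "Astha": [0, 0, 0, 2],
}


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their Euclidean norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

For example, Nidhi and Kunal share only activity C, giving 1 / (√5 · √2) ≈ 0.32. Pairs such as Priyanka and Pooja evaluate to 1/√2 ≈ 0.707, which appears as 0.70 in the table.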

[Figure: Similar – Task Algorithm at a glance]

Sub – Contract Algorithm

Sub – Contract algorithm focuses on how work moves among performers.

• The main idea is to count the number of times individual j performs an activity in between two activities performed by individual i.

• The relation between individuals is case dependent.


Table 5: Sample Event Log

Case Identifier  Activity Identifier  Actor
1                A                    Nidhi
2                A                    Nidhi
2                C                    Kunal
1                B                    Priyanka
3                A                    Pooja
1                C                    Nidhi
3                D                    Kunal
3                B                    Priyanka
2                B                    Pooja
2                D                    Astha
1                D                    Astha

Table 6: Organized Event Log

Case Identifier  Activity Identifier  Actor
1                A                    Nidhi
1                B                    Priyanka
1                C                    Nidhi
1                D                    Astha
2                A                    Nidhi
2                C                    Kunal
2                B                    Pooja
2                D                    Astha
3                A                    Pooja
3                D                    Kunal
3                B                    Priyanka


normal = 4.0

Table 7: Sub – Contraction Values before Normalization

          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     0.00   0.00   1.00      0.00   0.00
Kunal     0.00   0.00   0.00      0.00   0.00
Priyanka  0.00   0.00   0.00      0.00   0.00
Pooja     0.00   0.00   0.00      0.00   0.00
Astha     0.00   0.00   0.00      0.00   0.00

Table 8: Sub – Contraction Values after Normalization

          Nidhi  Kunal  Priyanka  Pooja  Astha
Nidhi     0.00   0.00   0.25      0.00   0.00
Kunal     0.00   0.00   0.00      0.00   0.00
Priyanka  0.00   0.00   0.00      0.00   0.00
Pooja     0.00   0.00   0.00      0.00   0.00
Astha     0.00   0.00   0.00      0.00   0.00
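The counting step behind Tables 7 and 8 can be sketched in Python as below. Two assumptions are made and should be read as such: the sketch only counts the direct pattern where actor j performs exactly one activity between two consecutive activities of actor i within the same case, and the normalization constant is passed in as a parameter using the value `normal = 4.0` shown on the slide rather than being derived. The names `ORGANIZED_LOG`, `subcontract_counts` and `normalize` are hypothetical.

```python
from collections import defaultdict

# Actors per case, in the order given by the organized event log (Table 6).
ORGANIZED_LOG = {
    1: ["Nidhi", "Priyanka", "Nidhi", "Astha"],
    2: ["Nidhi", "Kunal", "Pooja", "Astha"],
    3: ["Pooja", "Kunal", "Priyanka"],
}


def subcontract_counts(cases):
    """Count (i, j) pairs where j acts directly between two activities of i."""
    counts = defaultdict(int)
    for actors in cases.values():
        # Slide a window of three consecutive events over each case.
        for first, middle, last in zip(actors, actors[1:], actors[2:]):
            if first == last and middle != first:
                counts[(first, middle)] += 1
    return dict(counts)


def normalize(counts, normal):
    """Divide every count by the supplied normalization constant."""
    return {pair: value / normal for pair, value in counts.items()}
```

In the sample log only case 1 contains such a pattern (Nidhi, Priyanka, Nidhi), so the single non-zero entry is 1.00 before normalization and 0.25 after dividing by 4.0.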


[Figures: Sub – Contract Algorithm at a glance, parts I and II]

Research Motivation and Aim

• Query languages provide the most standard way to interact with the database.

• We try to implement process mining algorithms using database query languages to the extent possible, so that our application is tightly coupled to the database.

• Our work lies at the intersection of Process Mining and NoSQL databases.


Research Aim


• To investigate the intersection of Process Mining and graph database(s) for detecting social and hierarchical structures.

• To understand application needs that can be modelled in this new domain.

• To implement the Similar-Task and Sub-Contract algorithms in a row-oriented database, MySQL.

• To implement the Similar-Task and Sub-Contract algorithms in a graph-oriented database, Neo4j.

• To compare the performance of the Similar-Task and Sub-Contract algorithms in MySQL and Neo4j.

Related Work and Novel Research Contributions

Implementation of Mining Algorithms in Relational Databases

Ordonez et al. [5]

• Implement k-means clustering algorithm in SQL.

• Cluster large datasets in RDBMS.

• Define suitable tables, index them and write suitable queries for clustering purposes.

Ordonez et al. [6]

• Extend own work in [5].

• Efficient implementation of EM algorithm to perform clustering in very large datasets.


Berzal et al. [7]

• Implemented tree-based association rule mining to discover interesting patterns in relational databases.

Sattler et al. [8]

• Applied data mining techniques on a decision tree and classifier.

• Tight coupling of data mining and database systems.


Implementation of Mining Algorithms in Graph Databases

Wang et al. [9]

• Studied structural pattern mining for large disk-based graph databases.

• They presented a novel ADI index structure and efficient algorithms for mining frequent patterns.

Wang et al. [10]

• Presented techniques to obtain scalable mining in graph databases.


Huan et al. [11]

• Presented a novel technique to mine maximal frequent sub-graphs in graph databases.

Ozaki et al. [12]

• Came up with the hyper-clique pattern in graph databases.

• Used the hyper-clique pattern to detect highly correlated sub-graphs.


Performance Comparison of Mining Algorithms in Relational and Graph Databases.

Vicknair et al. [13]

• Performance comparison of relational and graph databases for data provenance systems.

McColl et al. [14]

• Evaluated the performance of a series of open-source graph databases.

• Used various graph algorithms for a graph setup consisting of 256 million nodes.


Ciglan et al. [15]

• Benchmarked graph databases over graph traversal algorithms.

Macko et al. [16]

• Presented PIG, a performance introspection framework for graph databases.

• PIG provided tools and mechanisms to understand the performance of graph databases.


Novel Research Contributions

While work has been done on implementing data mining algorithms in relational and graph databases, our contributions are:

• The first implementation of organizational mining algorithms (Similar-Task and Sub-Contract) in the row-oriented database MySQL using SQL.

• The first implementation of organizational mining algorithms (Similar-Task and Sub-Contract) in the graph-oriented database Neo4j using CYPHER.

• Performance benchmarking of organizational mining algorithms (Similar-Task and Sub-Contract) on MySQL and Neo4j.

Implementation of Similar-Task and Sub-Contract Algorithm in SQL, RDBMS

Similar-Task Algorithm

Steps

Implementation of the Similar-Task algorithm in SQL can be divided into four (4) broad tasks:

1. Declare and iterate a cursor to select distinct tasks.
2. Create a table to store the result.
3. Fetch the actors’ vectors and calculate Cosine – Similarity.
4. Write the results to the result table.

Define and iterate cursor

Declare a cursor to select distinct tasks from the table.

Open the cursor and loop through the rows it returns.
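The original slide showed the cursor code as an image, which is not preserved here. As a rough illustration of the same declare, open and loop pattern, the following Python sketch uses the standard `sqlite3` module in place of a MySQL stored procedure. The table name `event_log` and its columns are assumptions made for the example, not the names used in the thesis.

```python
import sqlite3

# Hypothetical schema: the slides do not show the real table or column names.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (case_id TEXT, actor TEXT, task TEXT)")
conn.executemany(
    "INSERT INTO event_log (case_id, actor, task) VALUES (?, ?, ?)",
    [
        ("C1", "TEAM1", "Open"),
        ("C1", "TEAM2", "Reassign"),
        ("C1", "TEAM1", "Close"),
        ("C2", "TEAM2", "Open"),
        ("C2", "TEAM2", "Close"),
    ],
)

# Counterpart of declaring and opening a cursor over the distinct tasks.
cursor = conn.execute("SELECT DISTINCT task FROM event_log ORDER BY task")

# Counterpart of the fetch loop: visit one distinct task per iteration.
distinct_tasks = []
for (task,) in cursor:
    distinct_tasks.append(task)

conn.close()
print(distinct_tasks)
```

In a MySQL stored procedure the same loop would be written with a declared cursor and repeated fetches; the sketch only mirrors the control flow.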

Declare table to store results

Dynamically create a table with the specified table name.

Prepare an SQL statement from the query string and execute it.

Fetch actors’ vectors and calculate cosine similarity I

Prepare the query that inserts into the result table.

Define variables to hold the intermediate values of the cosine-similarity calculation.

Fetch actors’ vectors and calculate cosine similarity II

Inside the cursor loop, collect the distinct tasks from the table for the required calculation.

Fetch actors’ vectors and calculate cosine similarity III

Append the parts of the cosine-similarity calculation to the SQL query.
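The slides build the cosine-similarity expression piece by piece inside an SQL query, but the query itself is not preserved. A minimal Python sketch of the underlying arithmetic is given here instead, assuming each actor is described by a vector counting how often that actor performed each task. The function names and the sample events are hypothetical.

```python
import math
from collections import Counter, defaultdict


def actor_task_vectors(events):
    """Build one task-count vector per actor from (actor, task) pairs."""
    vectors = defaultdict(Counter)
    for actor, task in events:
        vectors[actor][task] += 1
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse count vectors."""
    # Only tasks performed by both actors contribute to the dot product.
    dot = sum(u[task] * v[task] for task in u.keys() & v.keys())
    norm_u = math.sqrt(sum(count * count for count in u.values()))
    norm_v = math.sqrt(sum(count * count for count in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical sample events, not taken from the BPI 2014 log.
events = [
    ("TEAM1", "Open"),
    ("TEAM1", "Close"),
    ("TEAM2", "Open"),
    ("TEAM2", "Close"),
    ("TEAM3", "Reassign"),
]
vectors = actor_task_vectors(events)
sim_12 = cosine_similarity(vectors["TEAM1"], vectors["TEAM2"])
sim_13 = cosine_similarity(vectors["TEAM1"], vectors["TEAM3"])
print(sim_12, sim_13)
```

Actors with identical task profiles score close to 1, and actors with no task in common score 0.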

Update Final Results I

Declare a cursor to get all distinct teams.

Iterate through the cursor to collect the distinct teams.

Update Final Results II

Form a query that creates a table with the distinct teams as its columns.

Inside the cursor loop, append each distinct team as a column of the table.

Update Final Results III

Form a query that inserts values into the result table.

Inside the cursor loop, assign each similarity value to the column of the matching team.
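The three "Update Final Results" slides describe pivoting the pairwise similarities into a matrix-shaped table whose columns are the distinct teams. The sketch below illustrates that idea with Python's `sqlite3` module rather than MySQL dynamic SQL. The table name `final_sim`, the helper `quote_ident` and the similarity values are assumptions for the example.

```python
import sqlite3


def quote_ident(name):
    """Quote a string so it can be used safely as an SQL column name."""
    return '"' + name.replace('"', '""') + '"'


# Hypothetical pairwise similarities, as an earlier step might have produced.
pair_similarity = {
    ("TEAM1", "TEAM2"): 0.5,
    ("TEAM2", "TEAM1"): 0.5,
    ("TEAM1", "TEAM3"): 0.25,
    ("TEAM3", "TEAM1"): 0.25,
}
teams = sorted({team for pair in pair_similarity for team in pair})

conn = sqlite3.connect(":memory:")

# Build the CREATE TABLE statement with one column per distinct team.
columns = ", ".join(f"{quote_ident(team)} REAL DEFAULT 0" for team in teams)
conn.execute(f"CREATE TABLE final_sim (team TEXT PRIMARY KEY, {columns})")

# One row per team, then place each similarity in the matching team's column.
conn.executemany("INSERT INTO final_sim (team) VALUES (?)", [(t,) for t in teams])
for (row_team, col_team), similarity in pair_similarity.items():
    conn.execute(
        f"UPDATE final_sim SET {quote_ident(col_team)} = ? WHERE team = ?",
        (similarity, row_team),
    )

# Read the matrix back: team name mapped to its row of similarity values.
matrix = {
    row[0]: row[1:]
    for row in conn.execute("SELECT * FROM final_sim ORDER BY team")
}
conn.close()
print(matrix)
```

Because column names cannot be bound as query parameters, the statement text has to be assembled as a string, which is why the slides speak of forming and appending to a query.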

Sub-Contract Algorithm

Steps

The Sub-Contract algorithm implementation can be divided into four broad tasks:

1. Create a table to store results.

2. Find distinct case identifiers.

3. Update the normalization factor (normal) and find sub-contraction within each case.

4. Normalize the result.
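The four steps above can be sketched end to end in plain Python. This is an illustration under stated assumptions, not the thesis's SQL procedure: sub-contraction is taken to mean that actor A hands work to actor B and B hands it directly back to A within one case, and the normalization factor is assumed to be the sum of (number of events minus 2) over all cases with at least three events. The function name and sample cases are hypothetical.

```python
from collections import defaultdict


def sub_contract_matrix(cases):
    """Return normalized sub-contract values keyed by (contractor, sub-contractor).

    `cases` maps a case identifier to the ordered list of actors in that case.
    """
    counts = defaultdict(int)
    normal = 0  # assumed normalization factor, see the lead-in text
    for actors in cases.values():
        if len(actors) < 3:
            continue  # a case needs at least three events to show the pattern
        normal += len(actors) - 2
        # Slide a window of three consecutive events over the case.
        for first, middle, last in zip(actors, actors[1:], actors[2:]):
            if first == last and first != middle:
                counts[(first, middle)] += 1
    if normal == 0:
        return {}
    return {pair: count / normal for pair, count in counts.items()}


# Hypothetical cases, not taken from the BPI 2014 log.
cases = {
    "C1": ["A", "B", "A", "C"],
    "C2": ["A", "B", "A"],
    "C3": ["B", "C"],
}
result = sub_contract_matrix(cases)
print(result)
```

Here A sub-contracts to B twice across three eligible windows, so the only non-zero entry is the pair (A, B).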

Create table to store results I

Declare a cursor to select distinct actors.

Iterate through the cursor to collect the distinct actors.

Create table to store results II

Form a query to create a table.

Inside the cursor loop, append each distinct actor to the query as a column.

Find distinct case identifiers

Declare a cursor to select the distinct case identifiers that occur in at least three records (count >= 3).

Iterate through the cursor. For each distinct case identifier, call the procedure ExecuteCase.
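A minimal sketch of this step follows, again using Python's `sqlite3` module as a stand-in for the MySQL procedure. The schema is hypothetical, and `execute_case` is a placeholder that only records which cases would be handed to ExecuteCase; the real procedure body is not shown on the slide.

```python
import sqlite3

# Hypothetical schema and data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_log (case_id TEXT, actor TEXT)")
conn.executemany(
    "INSERT INTO event_log (case_id, actor) VALUES (?, ?)",
    [
        ("C1", "A"), ("C1", "B"), ("C1", "A"),
        ("C2", "A"), ("C2", "B"),
        ("C3", "B"), ("C3", "C"), ("C3", "B"), ("C3", "A"),
    ],
)

processed = []


def execute_case(case_id):
    """Placeholder for the ExecuteCase procedure: it only records the call."""
    processed.append(case_id)


# Select the case identifiers that have at least three records.
cursor = conn.execute(
    "SELECT case_id FROM event_log "
    "GROUP BY case_id HAVING COUNT(*) >= 3 ORDER BY case_id"
)
for (case_id,) in cursor:
    execute_case(case_id)

conn.close()
print(processed)
```

Case C2 has only two records, so it is filtered out before the per-case procedure is called.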

Update normal and find sub-contraction I

Update the normalization factor (normal).

Update normal and find sub-contraction II

Declare a cursor to find sub-contracting actors.

Update normal and find sub-contraction III

Iterate through the cursor to find the IDs of the actors.

Update normal and find sub-contraction IV

Declare a cursor to find sub-contracting actors.

Iterate through the cursor to find the IDs of the sub-contracting actors.

Update normal and find sub-contraction V

For each pair of sub-contracting actors, insert or update the sub-contract value between them.

Normalize the result

Declare a cursor to select the distinct actors that form the columns of the result table.

For each column, form an update query that normalizes its values by the normalization factor (normal).


Implementation of Similar-Task and Sub-Contract Algorithm in CYPHER, Graph Oriented

Similar-Task Algorithm

Steps

The implementation of the Similar-Task algorithm in CYPHER consists mainly of two broad functions:

1. Load the data so that actor and activity nodes are unique.

2. Calculate the cosine similarity between actors.

Load actor and activity nodes uniquely

Load the data directly from the data file, creating a unique node for each actor and each activity.
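The Cypher load query appeared as an image and is not preserved. To illustrate what loading with unique nodes means, the following Python sketch builds an in-memory graph with get-or-create semantics. In Cypher this behaviour is what the MERGE clause provides; whether the original query used MERGE is not shown on the slide. The function and variable names are hypothetical.

```python
def load_graph(rows):
    """Load (actor, activity) rows into a small in-memory graph.

    Each distinct actor and activity gets exactly one node id, and repeated
    rows increase the weight of the existing actor-to-activity edge.
    """
    actors = {}      # actor name -> node id
    activities = {}  # activity name -> node id
    performed = {}   # (actor id, activity id) -> number of occurrences
    for actor_name, activity_name in rows:
        # setdefault returns the existing id or registers a new one.
        actor_id = actors.setdefault(actor_name, len(actors))
        activity_id = activities.setdefault(activity_name, len(activities))
        edge = (actor_id, activity_id)
        performed[edge] = performed.get(edge, 0) + 1
    return actors, activities, performed


# Hypothetical rows, standing in for lines of the data file.
rows = [("TEAM1", "Open"), ("TEAM1", "Open"), ("TEAM2", "Open")]
actors, activities, performed = load_graph(rows)
print(actors, activities, performed)
```

Keeping nodes unique means the later similarity step can find shared activities by following edges from two actors to the same activity node.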

Calculate cosine similarity

Match the activities common to each pair of actors and calculate their similarity.

Sub-Contract Algorithm

Steps

The implementation of the Sub-Contract algorithm in CYPHER consists mainly of four broad functions:

1. Identify sub-contracting actors within each case.

2. Collect unique actor names and make a new node for each of them.

3. Set the sub-contraction strength between unique actor nodes.

4. Calculate the normal and normalize the sub-contraction value.

Identify sub-contracting actors

Identify sub-contracting actors and connect them via a [:RELATED_TO] relationship.

Collect unique names and create unique actor nodes

Collect unique actor names.

Make a new UNIQUEACTOR node for each distinct actor name found.

Set sub-contraction strength between unique actors

For all sub-contracting actors, determine the strength of sub-contraction between them.

Calculate normal and normalize the result

Calculate the normal.

Normalize the sub-contraction strength between actors.


Experimental Dataset

• We use the Business Process Intelligence 2014 (BPI 2014) dataset to conduct our experiments.

• The log contains events from an incident and problem management system of Rabobank Group ICT.

• It contains data about managing requests from Rabobank Group ICT.

• It contains a total of 466737 records.

Dataset Details

Fig. 11: Sample Event Log from MySQL.


Performance Comparison

Similar-Task Algorithm

Load Time

Table 9: Data Load Time

Dataset size    Load Time (msec)
                MySQL    Neo4j
65,000          2467     3413
1,01,000        2875     3362
2,19,500        5966     4354
3,00,000        5850     5877
4,66,737        7819     6875

Fig. 12: Load Time

Execution Time I

Table 10: Execution Time of Step-8 & Step-9

Dataset size    Execution Time (msec)
                Step-8              Step-9
                MySQL    Neo4j      MySQL    Neo4j
65,000          225      9616       2467     2403
1,01,000        372      11700      2875     2925
2,19,500        713      14655      5966     3664
3,00,000        903      29520      5850     7380
4,66,737        1403     48891      7819     12223
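To make the Step-8 gap in Table 10 easier to read, the short Python sketch below computes how many times longer Neo4j took than MySQL at each dataset size. The timings are copied from the table; the variable names are only illustrative.

```python
# Step-8 execution times in msec, copied from Table 10 as (MySQL, Neo4j).
step8_msec = {
    65000: (225, 9616),
    101000: (372, 11700),
    219500: (713, 14655),
    300000: (903, 29520),
    466737: (1403, 48891),
}

# Ratio of Neo4j time to MySQL time for each dataset size.
ratios = {size: neo4j / mysql for size, (mysql, neo4j) in step8_msec.items()}

for size, ratio in ratios.items():
    print(f"{size:>7} records: Neo4j takes {ratio:.1f}x the MySQL time")
```

On these figures Neo4j is slower than MySQL for Step-8 at every dataset size, by a factor of roughly 20 to 43.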

Execution Time II

Fig. 13: Execution Time of Step-8 & Step-9

Disk Usage in MySQL I

Table 11: Disk Space Usage in MySQL.

Tables      Dataset Size
            65000      101000     219500      300000      466737
Dataset     3686400    5783552    11026432    15220736    21544960
OTMatrix    65536      65536      65536       81920       81920
InitSim     1589248    1589248    1589248     3686400     3686400
FinalSim    229376     262144     278528      491520      1589248

Disk Usage in MySQL II

Fig 14: Disk Space Usage in MySQL.

Disk Usage in Neo4j I

Table 12: Disk Space Usage in Neo4j.

Graph Elements    Dataset Size
                  65000      101000    219500    300000     466737
Nodes             2820       2910      3075      3990       4215
Relationships     770040     414315    479663    8568809    983227
Properties        1033856    563873    651203    1155011    1323439

Disk Usage in Neo4j II

Fig. 14: Disk Space Usage in Neo4j.

Sub-Contract Algorithm

Load Time

Table 13: Load Time

Dataset size    Load Time (msec)
                MySQL    Neo4j
65,000          6575     9567
1,01,000        8390     10476
2,19,500        14279    14873
3,00,000        26437    25435
4,66,737        43712    38234

Fig. 15: Load Time

Execution Time in MySQL I

Table 14: Execution Time for 4 main steps in MySQL.

Dataset size    Execution Time (msec)
                Update Normal    Sub-Contract Detection    Update Result    Normalize Result
65,000          32               11712                     8296             16
1,01,000        32               11782                     8138             16
2,19,500        35               11713                     7940             17
3,00,000        70               11736                     8094             17
4,66,737        73               11747                     7754             20

Execution Time in MySQL II

Fig 16: Execution Time for 4 main steps in MySQL.

Execution Time in Neo4j I

Table 15: Execution Time for 4 main steps in Neo4j

Dataset size    Execution Time (msec)
                Update Normal    Sub-Contract Detection    Update Result    Normalize Result
65,000          118              1542                      2077             5
1,01,000        140              1707                      2773             5
2,19,500        202              2534                      2369             6
3,00,000        336              3442                      5261             9
4,66,737        560              4149                      5334             9

Execution Time in Neo4j II

Fig. 17: Execution Time for 4 main steps in Neo4j.

Disk Space Usage in MySQL I

Tables           Dataset Size
                 65000      101000     219500      300000      466737
Dataset          4734976    6832128    13123584    18366464    27836416
OrganisedData    4734976    6832128    13123584    18366464    27836416
ResultMatrix     1589248    1589248    1589248     1589248     1589248

Table 15: Disk Space Usage in MySQL

Disk Space Usage in MySQL II

Fig 17: Disk Space Usage in MySQL

Disk Space Usage in Neo4j I

Graph Elements    Dataset Size
                  65000        101000       219500       300000       466737
Nodes             982212       1523732      3360798      4598454      7190330
Relationships     153477921    183955761    285778449    375437997    490033038
Properties        384189475    461537287    719874720    942665404    1238579332

Table 16: Disk Space Usage for graph elements in Neo4j.

Disk Space Usage in Neo4j II

Fig. 18: Disk Space Usage for graph elements in Neo4j.


Conclusion

• Neo4j performs better when it comes to loading data.

• Read operations in MySQL are comparatively faster for a single-node setup.

• Neo4j gives much improved performance whenever relationships are of prime importance.

• Write performance varied greatly in both cases: for smaller datasets MySQL performs better, whereas for larger datasets Neo4j gives improved performance.

Limitations and Future Work

Limitations

• Only a single dataset, at different sizes, was used.

• Only single-node database setups were used.

• Only two organizational mining metrics were used.

Future Work

• Apply the algorithms to larger datasets.

• Create a multi-node Neo4j setup and implement the algorithms on it.

• Implement and study the impact of process enhancement and recommendation systems.

• Experiment with more relational and graph-oriented databases.


References

Wil van der Aalst. Process Mining: Overview and Opportunities. ACM, 2012.

P. Neubauer. Graph databases, NOSQL and Neo4j. www.infoq.com.

I. Robinson, J. Webber, E. Eifrem. Graph Databases. www.books.google.com.

Minseok Song, Wil M. P. van der Aalst. Towards comprehensive support for organizational mining. Elsevier, 2008.

Carlos Ordonez. Programming the K-means clustering algorithm in SQL.

C. Ordonez and P. Cereghini. SQLEM: fast clustering in SQL using the EM algorithm. International Conference on Management of Data.

Fernando Berzal, Juan Carlos Cubero, Nicolas Marin, Jose Maria Serrano. TBAR: An efficient method for association rule mining in relational databases. Elsevier, 2001.

K.-U. Sattler and O. Dunemann. SQL Database Primitives for Decision Tree Classifiers. Conference on Information and Knowledge Management, 2001.

W. Wang, C. Wang, Y. Zhu, B. Shi, J. Pei, X. Yan. GraphMiner: a structural pattern mining system for large disk-based graph databases and its applications. ACM, 2005.

C. Wang, W. Wang, Y. Zhu, B. Shi, J. Pei. Scalable mining of large disk-based graph databases. ACM, 2004.

J. Huan, W. Wang, J. Prins. SPIN: mining maximal frequent subgraphs from graph databases. ACM, 2004.

T. Ozaki, T. Ohkawa. Mining correlated subgraphs in graph databases. Advances in Knowledge Discovery and Data Mining, 2008.

C. Vicknair, M. Macias, Z. Zhao, X. Nan, Y. Chen. A comparison of a graph database and a relational database: a data provenance perspective. ACM, 2010.

R. C. McColl, D. Ediger, J. Poovey, D. Campbell. A performance evaluation of open-source graph databases. ACM, 2014.

M. Ciglan, A. Averbuch, L. Hluchy. Benchmarking graph traversal operations over graph databases. IEEE, 2012.

P. Macko, D. Margo, M. Seltzer. Performance introspection of graph databases. ACM, 2013.

Why NOSQL? Couchbase.

Scale-out vs. Scale-up. www.natishalom.typepad.com.

Introduction to Graph Databases and Neo4j. www.neo4j.com.

Cosine Similarity. www.Wikipedia.com.

From Relational to Neo4j. www.neo4j.com.

