Page 1: XESLite - Eindhoven University of Technology XESLite –In-Memory (XL-IM) Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009): VLDB 2009 Tutorial Column-Oriented Database Systems

XESLite

Handling Event Logs in ProM

Felix Mannhardt ([email protected])

@fmannhardt

Motivation – What do event logs look like?

[Figure: an event log viewed as a multi set vs. as a table]

Motivation – How are event logs used?

• Most process discovery techniques

• Most conformance checking techniques

• …

• Data-aware process discovery

• Data-aware conformance checking

• Most enhancement techniques

• …

Of course, the world is not black & white!

Motivation – Using ProM on a standard computer

~ 4-8 GB of working memory

XES – The event log standard

www.xes-standard.org

Source: 1849-2016 - IEEE Standard for eXtensible Event Stream (XES) for Achieving Interoperability in Event Logs and Event Streams, © IEEE, DOI 10.1109/IEEESTD.2016.7740858

OpenXES – An (outdated) reference implementation

OpenXES – Memory Layout

XEvent
  XID
    UUID
  HashMap
    Node[m]
      Entry
        Key
        XAttribute
          Value

OpenXES – Memory Layout

XEvent
  XID
    UUID: 32 bytes
  HashMap
    Node[m]
      Entry
        Key: k bytes
        XAttribute: 32 + v bytes
          Value: v bytes

OpenXES – Memory Layout

XEvent
  XID
    UUID: 32 bytes
  HashMap
    Node[m]: 16 + 4m + (64+k+v)m bytes
      Entry: 32 + k + 32 + v bytes
        Key: k bytes
        XAttribute: 32 + v bytes
          Value: v bytes

OpenXES – Memory Layout

XEvent
  XID: 16 + 32 bytes
    UUID: 32 bytes
  HashMap: 48 + 16 + (68+k+v)m bytes
    Node[m]: 16 + 4m + (64+k+v)m bytes
      Entry: 32 + k + 32 + v bytes
        Key: k bytes
        XAttribute: 32 + v bytes
          Value: v bytes

OpenXES – Memory Layout

XEvent: 24 + 48 + 64 + (68+k+v)m bytes
  XID: 16 + 32 bytes
    UUID: 32 bytes
  HashMap: 48 + 16 + (68+k+v)m bytes
    Node[m]: 16 + 4m + (64+k+v)m bytes
      Entry: 32 + k + 32 + v bytes
        Key: k bytes
        XAttribute: 32 + v bytes
          Value: v bytes

OpenXES – Memory Usage vs ‘Minimal’ Scenario

[Figure: memory usage in GB (log scale) against the number of events in millions (n), in two panels, OpenXES and Minimal; series for attribute sizes of 8 and 48 bytes and for m = 3, 25, and 50 attributes]

Minimal scenario: n x m table of attributes (m) and events (n), no compression, no overhead
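
To make the gap between the two scenarios concrete, the per-event formula from the memory layout slides can be put next to the minimal n x m table. The following is only a back-of-the-envelope sketch: the class and method names are hypothetical, the "attribute size" of the plot is assumed to correspond to v, and the key size k is an illustrative guess.

```java
// Back-of-the-envelope estimate; names are hypothetical, not part of OpenXES.
class MemoryEstimate {

    /** OpenXES layout from the slides: 24 + 48 + 64 + (68 + k + v) * m bytes per event. */
    static long openXesBytes(long n, int m, int k, int v) {
        return n * (24 + 48 + 64 + (68L + k + v) * m);
    }

    /** Minimal scenario: an n x m table of v-byte values, no compression, no overhead. */
    static long minimalBytes(long n, int m, int v) {
        return n * m * v;
    }

    public static void main(String[] args) {
        long n = 10_000_000L;      // 10 million events
        int m = 25, k = 8, v = 8;  // illustrative values; k is an assumption
        System.out.printf("OpenXES: %.1f GB%n", openXesBytes(n, m, k, v) / 1e9);
        System.out.printf("Minimal: %.1f GB%n", minimalBytes(n, m, v) / 1e9);
    }
}
```

With these example values a single event costs 136 + 84 * 25 = 2236 bytes in the OpenXES layout, against 200 bytes in the minimal table.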

XESLite – Several attempts to solve the issue

Definition of XESLite

(1) having too much fun in programming

(2) being fed up with OOM exceptions

(3) disbelieving that 17 MB zipped XES requires GBs of memory

24.02.2014 16:59 – fmannhardt.de

XESLite – Three methods & assumptions

Automaton (XL-AT)

In-Memory (XL-IM)

Database (XL-DB)

• no external software / hardware

• ~ 4-8 GB memory

• compatibility

XESLite – General ideas – Flyweight literals

64 bytes – java.lang.String – concept:name

64 bytes – java.lang.String – concept:name

64 bytes – java.lang.String – concept:name

64 bytes – java.lang.String – concept:name

64 bytes – java.lang.String – concept:name

64 bytes – java.lang.String – concept:name

…..

XESLite – General ideas – Flyweight literals

Google Guava (github.com/google/guava)

Interner<String> interner = Interners.newStrongInterner();

XAttribute createAttribute(String key, …) {

    // reuse the shared instance instead of keeping a duplicate String
    key = interner.intern(key);

}

Disclaimer:

• Considerable overhead when there are many unique literals!

• No garbage collection when deleting literals!
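
The effect of the flyweight idea can be tried out without any library. The sketch below is a hypothetical JDK-only stand-in for a strong interner (the class `LiteralPool` is not part of Guava or XESLite): equal literals are mapped to one shared instance, and, as the disclaimer says, the pool never releases an entry.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for a strong interner, using only the JDK.
class LiteralPool {
    private final ConcurrentMap<String, String> pool = new ConcurrentHashMap<>();

    /** Returns the canonical instance for the given literal. */
    String intern(String literal) {
        String existing = pool.putIfAbsent(literal, literal);
        return existing != null ? existing : literal;
    }

    /** Number of distinct literals held; entries are never removed. */
    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        LiteralPool pool = new LiteralPool();
        // new String(...) forces two distinct objects with equal content.
        String a = pool.intern(new String("concept:name"));
        String b = pool.intern(new String("concept:name"));
        System.out.println(a == b);      // reference comparison of the two results
        System.out.println(pool.size()); // number of pooled literals
    }
}
```

Guava also provides `Interners.newWeakInterner()`, which lets unused literals be collected; whether its extra bookkeeping pays off here is not something the slides address.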

XESLite – General ideas – Sequential IDs

XEvent: 24 + 48 + 64 + (68+k+v)m bytes
  XID: 16 + 32 bytes
    UUID: 32 bytes
  HashMap
    Node[m]
      Entry
        Key
        XAttribute
          Value

XESLite – General ideas – Sequential IDs

XEvent: 24 + 8 + 64 + (68+k+v)m bytes
  long (ID): 8 bytes
  HashMap: 48 + 16 + (68+k+v)m bytes

40 bytes saved per event

Every little bit helps! (German proverb: "Auch Kleinvieh macht Mist!")

Disclaimer:

• No distributed events!

• Don’t assume the XID returns a real UUID
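
A minimal sketch of the sequential-ID idea, assuming a single JVM process; the class name `SequentialIds` is hypothetical. Replacing the 16 + 32 byte XID object by an 8 byte primitive gives the 40 bytes per event quoted above. Because the counter is local to the process, two processes would hand out the same IDs, which is presumably what the first disclaimer refers to.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical ID factory: IDs are unique only within this JVM process.
class SequentialIds {
    private final AtomicLong next = new AtomicLong();

    /** Returns 0, 1, 2, ... as a primitive long (8 bytes, no wrapper object). */
    long nextId() {
        return next.getAndIncrement();
    }

    public static void main(String[] args) {
        SequentialIds ids = new SequentialIds();
        System.out.println(ids.nextId());
        System.out.println(ids.nextId());
    }
}
```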

XESLite – General Ideas – Compressed Traces

What is a trace?

Idea: Delta compression!

OK, quite an idealistic situation

LZ4 compression

(400 MB/s compression & several GB/s decompression)

Disclaimer:

• Random-access methods are slow!

• Use iterator / foreach instead of get(i)!
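
Why random access becomes slow on a delta-compressed trace can be seen in a small example. The sketch below is not the XESLite implementation: it assumes that only non-decreasing timestamps are delta-encoded, uses a simple variable-length byte encoding instead of LZ4, and the class name is hypothetical. Iteration decodes each delta once, whereas `get(i)` has to replay all preceding deltas.

```java
import java.io.ByteArrayOutputStream;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Hypothetical sketch: timestamps of one trace stored as variable-length deltas.
class DeltaEncodedTimestamps implements Iterable<Long> {
    private final byte[] data;
    private final int size;

    DeltaEncodedTimestamps(long[] sortedTimestamps) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0;
        for (long t : sortedTimestamps) {
            long delta = t - previous;       // small if events are close in time
            while ((delta & ~0x7FL) != 0) {  // 7 payload bits per byte, high bit = "more"
                out.write((int) ((delta & 0x7F) | 0x80));
                delta >>>= 7;
            }
            out.write((int) delta);
            previous = t;
        }
        data = out.toByteArray();
        size = sortedTimestamps.length;
    }

    int byteSize() {
        return data.length;
    }

    /** Random access replays every preceding delta: O(index). */
    long get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException();
        Iterator<Long> it = iterator();
        for (int i = 0; i < index; i++) it.next();
        return it.next();
    }

    /** Sequential access decodes each delta exactly once. */
    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private int position = 0;
            private int decoded = 0;
            private long current = 0;

            @Override
            public boolean hasNext() {
                return decoded < size;
            }

            @Override
            public Long next() {
                if (!hasNext()) throw new NoSuchElementException();
                long delta = 0;
                int shift = 0;
                byte b;
                do {
                    b = data[position++];
                    delta |= (long) (b & 0x7F) << shift;
                    shift += 7;
                } while ((b & 0x80) != 0);
                current += delta;
                decoded++;
                return current;
            }
        };
    }
}
```

A loop such as `for (long t : trace)` touches every byte once, while calling `get(i)` for every i decodes a quadratic number of deltas, which is the reason for the advice above.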

XESLite – Automaton (XL-AT)

[Figure: an event log viewed as a multi set vs. as a table]

XESLite – Automaton (XL-AT)

[Figure labels: finite set of sequences; multiplicity; encode; similar problem]

XESLite – Automaton (XL-AT)

[Figure labels: external information; finite set of words]

Research from the 1990s:

• minimal deterministic acyclic finite automaton (DAFA)

• minimal perfect hashing

XESLite – Automaton (XL-AT) – Example

(1) build minimal DAFA

Automata minimization is a well-researched problem

• Minimization of any DFA: O(n log(n)) with n states (Hopcroft 1974)

• Minimization for acyclic DFA can be done in linear time (Revuz 1992, Daciuk 2000)

XESLite – Automaton (XL-AT) – Example

(2) build minimal perfect hashing scheme

Assign unique consecutive numbers 1..n to the words accepted by the DAFA.

XESLite – Automaton (XL-AT) – Example

(2) build minimal perfect hashing scheme

• Use lexicographical ordering

• Assign number based on predecessors

• Encode this scheme efficiently in the DAFA

XESLite – Automaton (XL-AT) – Example

(2) build minimal perfect hashing scheme

• For each state, remember the number of words accepted from that state

• To compute the number for a word w, follow its path through the DAFA:

  • Add the numbers of all states that are reached by a transition t leaving the path, where the letter of t precedes the next letter of w.

  • Add the number of final states passed.
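
The numbering scheme can be sketched in a few lines of Java. As a simplifying assumption, the sketch uses a plain trie instead of a minimal DAFA (no states are merged), since the numbering works the same way on any deterministic acyclic automaton. The class `TrieRanker` and its methods are hypothetical names; words are strings, for example one character per activity.

```java
import java.util.TreeMap;

// Hypothetical sketch: lexicographic word numbering on an (unminimized) trie.
class TrieRanker {
    static final class Node {
        final TreeMap<Character, Node> next = new TreeMap<>(); // sorted transitions
        boolean isFinal;
        int words; // number of words accepted from this state
    }

    private final Node root = new Node();

    void add(String word) {
        Node s = root;
        for (char c : word.toCharArray()) {
            s = s.next.computeIfAbsent(c, x -> new Node());
        }
        s.isFinal = true;
    }

    /** Computes the per-state word counts; call once after all words are added. */
    void freeze() {
        count(root);
    }

    private static int count(Node s) {
        int n = s.isFinal ? 1 : 0;
        for (Node t : s.next.values()) n += count(t);
        s.words = n;
        return n;
    }

    /** Returns the number in 1..n of an accepted word, or -1 if it is not accepted. */
    int numberOf(String word) {
        Node s = root;
        int number = root.isFinal ? 1 : 0;
        for (char c : word.toCharArray()) {
            // States behind transitions whose letter precedes c hold smaller words.
            for (Node skipped : s.next.headMap(c).values()) number += skipped.words;
            s = s.next.get(c);
            if (s == null) return -1;
            if (s.isFinal) number++; // final state passed (or reached)
        }
        return s.isFinal ? number : -1;
    }
}
```

For the words ab, abc, ac and b, the numbers follow the lexicographic order: ab is 1, abc is 2, ac is 3 and b is 4. The number can then serve as an index into an array holding, for example, the multiplicity of each distinct trace.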

XESLite – Automaton (XL-AT) – Example

[Figure labels: lookup table; DAFA]

Lucchesi 1992: Applications of Finite Automata Representing Large Vocabularies

Daciuk 2005: Dynamic Perfect Hashing with Finite-State Automata

XESLite – In-Memory (XL-IM)

Tabular view instead of the object graph of OpenXES

XESLite – In-Memory (XL-IM)

Events consist only of identifiers

XL-IM:

XEvent: 12 + 8 + 4 bytes
  long (ID): 8 bytes
  Object (Storage): 4 bytes

OpenXES:

XEvent: 24 + 48 + 64 + (68+k+v)m bytes
  XID: 16 + 32 bytes
    UUID: 32 bytes
  HashMap: 48 + 16 + (68+k+v)m bytes
    Node[m]: 16 + 4m + (64+k+v)m bytes
      Entry: 32 + k + 32 + v bytes
        Key: k bytes
        XAttribute: 32 + v bytes
          Value: v bytes

with trace compression: ?? bytes

XESLite – In-Memory (XL-IM)

Stavros Harizopoulos, Daniel Abadi, Peter Boncz (2009): VLDB 2009 Tutorial Column-Oriented Database Systems

+ Compression / packing of similar values

+ Many other optimizations possible
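
One common way to pack similar values in a column is run-length encoding. The slide does not say which scheme XESLite uses, so the following is only an illustrative sketch with hypothetical names: consecutive equal values of a column are stored once together with their run length.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical run-length encoded int column; names are illustrative only.
class RunLengthColumn {
    private final List<int[]> runs = new ArrayList<>(); // each entry: {value, length}
    private int size;

    void append(int value) {
        int[] last = runs.isEmpty() ? null : runs.get(runs.size() - 1);
        if (last != null && last[0] == value) {
            last[1]++;                      // extend the current run
        } else {
            runs.add(new int[] {value, 1}); // start a new run
        }
        size++;
    }

    /** Walks the runs until the one containing the row is found. */
    int get(int row) {
        if (row < 0 || row >= size) throw new IndexOutOfBoundsException();
        int remaining = row;
        for (int[] run : runs) {
            if (remaining < run[1]) return run[0];
            remaining -= run[1];
        }
        throw new AssertionError("unreachable");
    }

    int size() {
        return size;
    }

    int runCount() {
        return runs.size();
    }
}
```

A column in which most neighbouring events share a value, such as a resource or a lifecycle transition, collapses to a handful of runs in this representation.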

XESLite – In-Memory (XL-IM)

• Custom column-store-like in-memory data structure in Java

• No communication overhead with external tools

• Assumptions

• Fixed-width values for fast access (lookup table for literals – flyweights for free)

• Consistent attribute types (i.e., columns types are enforced)

• Dynamic memory allocation in (compressed) blocks

[Figure: a block storing 2 integer values and a block storing 8 boolean values]

Disclaimer:

• No real deletion, entries are only marked as deleted!

• Meta-attributes supported but inefficient!

• Spawns a compressor thread!
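
The assumptions above can be illustrated with one string-valued column. This is a sketch under assumptions, not the XESLite data structure: the class name, the block size and the use of `int` row indices (standing in for sequential event IDs) are all hypothetical, and block compression is left out. Literals go through a lookup table, so every cell is a fixed-width `int` code and equal literals are shared automatically; blocks are allocated only when a row in their range is written.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of one string-valued column; not the XESLite classes.
class LiteralColumn {
    private static final int BLOCK_SIZE = 1024; // rows per block (assumed)

    private final Map<String, Integer> codeOf = new HashMap<>(); // literal -> code
    private final List<String> literalOf = new ArrayList<>();    // code -> literal
    private final List<int[]> blocks = new ArrayList<>();        // fixed-width codes

    /** Stores a value for the given row, allocating its block on demand. */
    void set(int row, String value) {
        Integer code = codeOf.get(value);
        if (code == null) {
            code = literalOf.size();
            codeOf.put(value, code);
            literalOf.add(value);
        }
        int blockIndex = row / BLOCK_SIZE;
        while (blocks.size() <= blockIndex) {
            blocks.add(null); // placeholder, allocated lazily below
        }
        int[] block = blocks.get(blockIndex);
        if (block == null) {
            block = new int[BLOCK_SIZE];
            Arrays.fill(block, -1); // -1 marks "no value"
            blocks.set(blockIndex, block);
        }
        block[row % BLOCK_SIZE] = code;
    }

    /** Returns the shared literal for the row, or null if none is stored. */
    String get(int row) {
        int blockIndex = row / BLOCK_SIZE;
        if (blockIndex >= blocks.size() || blocks.get(blockIndex) == null) return null;
        int code = blocks.get(blockIndex)[row % BLOCK_SIZE];
        return code < 0 ? null : literalOf.get(code);
    }

    int distinctLiterals() {
        return literalOf.size();
    }
}
```

Because every cell of a column has the same width and type, a lookup is plain index arithmetic; this is also why a column can only hold values of one attribute type.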

XESLite – (Embedded) Database (XL-DB)

Like XL-IM, a tabular view instead of the object graph of OpenXES

MapDB: stored as key/value pairs

• On-disk storage (memory-mapped file)

• Uses operating system paging

• Caching mechanism for common attributes:

  • concept:name,

  • time:timestamp,

  • lifecycle:transition

• Supports all OpenXES functionality!

Disclaimer:

• No real deletion, entries are only marked as deleted!

• Spawns multiple threads!

• Memory-mapped files in the temp folder might not be deleted!
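
The storage mechanism underneath can be illustrated with the JDK alone. The sketch below does not use MapDB; it is a hypothetical fixed-width store of `long` values backed by a memory-mapped temp file, so the operating system decides which pages stay in memory. The cleanup is best effort only, which mirrors the last disclaimer: on some platforms a file cannot be removed while it is still mapped.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: fixed-width values in a memory-mapped temp file (not MapDB).
class MappedLongStore {
    private final MappedByteBuffer buffer;

    MappedLongStore(int capacity) {
        buffer = map(capacity);
    }

    private static MappedByteBuffer map(int capacity) {
        try {
            Path file = Files.createTempFile("xl-db-sketch", ".bin");
            file.toFile().deleteOnExit(); // best effort; may fail while still mapped
            try (FileChannel channel = FileChannel.open(file,
                    StandardOpenOption.READ, StandardOpenOption.WRITE)) {
                // The mapping remains valid after the channel is closed.
                return channel.map(FileChannel.MapMode.READ_WRITE, 0,
                        (long) capacity * Long.BYTES);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void put(int index, long value) {
        buffer.putLong(index * Long.BYTES, value);
    }

    long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }
}
```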

Benchmark - Memory

[Figure annotations: Road Fines; No difference XL-DB vs XL-IM; BPI 2011 vs Hospital Billing]

Benchmark - Time

[Figure annotations: Garbage collection; No difference? BTree!; Random-access implementation detail]

Conclusion

• Discussion on requirements

  • Multi set vs. table

  • Storage requirements

• Three general ideas

  • Flyweights

  • Sequential IDs

  • Compressed traces

• Three XESLite implementations

  • Automaton (XL-AT)

  • In-Memory (XL-IM)

  • Database (XL-DB)

• Details in technical report:

  • BPM Center Report BPM-16-02
