Big Data ODS: Setting up a prototype
HighQSoft GmbH | www.highqsoft.de | 22.12.2014
Performance and Scalability: Topics
1. Why Big Data?
2. General Overview
3. HighQSoft Approach
4. Summary
What is the ODS 6.0 Proposal? Overview
[Diagram: ODS Client, ODS API Definition (CORBA Technology), ODS Server, Physical Storage: Meta-Data, Physical Storage: Mass-Data]
Objectives of Proposal:
- Re-work of the ODS API definition (streamlining in focus; may include enhancements, e.g. web services)
- Replacement of CORBA technology
What is Big Data Integration? Overview
[Diagram: ODS Client, ODS API 6.0 Definition (ODS 6.0 Technology), ODS Server, Big Data Enhancements, Big Data Zoo]
Objectives of Proposal: Integration of Big Data
- Enhancement of ASAM Base Model
- Enhancement of ODS Server Functionality
- Defining an Interface to Big Data
The proposal is independent of the current ODS 6.0 proposal. It also covers other areas.
Why Big Data? Overview
The data produced is growing to large-scale volumes. Some MDF4.1 measurement files already cannot be integrated into ODS efficiently.
- In terms of ODS, the files can contain a great number of external components (millions per file)
- The files are too large to move around (server to client)
There are limits within Oracle:
- 1*10^50 entries is a limit
- With 3*10^8 entries, a “select * from table where id = xy” takes 30 seconds (no indexing)
- The latency grows linearly
We want to handle 1000 vehicles with 100 measurements with 10^3 to 10^5 channels a day (2*10^10 / year).
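The volume figures above can be checked with a few lines of arithmetic. This is a rough sketch, not a sizing tool: the slide does not say how many days per year are counted, so 200 working days is an assumption (it reproduces the 2*10^10 figure at the lower bound of 10^3 channels). The latency extrapolation simply applies the "grows linearly" statement to the 30-second data point.

```python
# Worked arithmetic for the volume estimate on this slide.
VEHICLES = 1000
MEASUREMENTS_PER_VEHICLE_PER_DAY = 100
CHANNELS_MIN, CHANNELS_MAX = 10**3, 10**5
WORKING_DAYS_PER_YEAR = 200  # assumption: not stated on the slide


def channels_per_day(channels_per_measurement: int) -> int:
    """Channel records produced per day by the whole fleet."""
    return VEHICLES * MEASUREMENTS_PER_VEHICLE_PER_DAY * channels_per_measurement


def channels_per_year(channels_per_measurement: int) -> int:
    """Channel records per year, using the assumed number of working days."""
    return channels_per_day(channels_per_measurement) * WORKING_DAYS_PER_YEAR


def scan_seconds(entries: int) -> float:
    """Linear extrapolation of the unindexed lookup: 30 s at 3*10^8 entries."""
    return entries * 30 / (3 * 10**8)


if __name__ == "__main__":
    print("per day  (min):", channels_per_day(CHANNELS_MIN))
    print("per year (min):", channels_per_year(CHANNELS_MIN))
    print("per year (max):", channels_per_year(CHANNELS_MAX))
    print("unindexed lookup at one year (min), seconds:",
          scan_seconds(channels_per_year(CHANNELS_MIN)))
```

Under these assumptions, one year of data at the lower bound already puts an unindexed lookup at roughly half an hour, which is the motivation for moving the mass data out of the relational database.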
Prototype Development: The ODS Setup
[Diagram: Database and ODS Server; SPARK; YARN with four TASK / CPU units; storage technologies HDFS (USA), HBASE (USA), SOLR / … (USA)]
The general ODS setup remains largely the same:
- The ODS Server remains as an organizing entity
- A database (not Oracle) will still be required
  - Environment
  - Security
  - “Big Data Configuration” (Catalogs)
Prototype Development: Objective 1: Defining a BIG ODS interface
The integration of BIG ODS requires:
- Definition of an ODS Request Interface (may have an impact on the base model), e.g.
  - Information: Ask for value matrix
  - Location: USA
  - Technology / Physical Storage: HDFS / MDF4 …
- Driver implementation
Here, SPARK is used as a middleware / umbrella technology.
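The example request above (information, location, technology / physical storage) could be modelled as a small value object. The sketch below is purely illustrative: `BigOdsRequest`, `Information` and all field names are hypothetical and are not part of the ODS API or of any HighQSoft product.

```python
from dataclasses import dataclass
from enum import Enum


class Information(Enum):
    """What the ODS Server asks for (hypothetical enumeration)."""
    VALUE_MATRIX = "value matrix"


@dataclass(frozen=True)
class BigOdsRequest:
    """One request from the ODS Server to the big data side (hypothetical)."""
    information: Information  # what to retrieve
    location: str             # cluster location, e.g. "USA"
    technology: str           # big data technology, e.g. "HDFS"
    storage_format: str       # physical storage format, e.g. "MDF4"


def describe(request: BigOdsRequest) -> str:
    """Render the request in the same order as the bullet list above."""
    return (f"{request.information.value} @ {request.location} "
            f"via {request.technology} / {request.storage_format}")


request = BigOdsRequest(Information.VALUE_MATRIX, "USA", "HDFS", "MDF4")
print(describe(request))
```

Keeping location and technology as explicit fields mirrors the idea that the request, not the client, carries the knowledge of where and how the data is stored.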
Prototype Development: Using middleware technologies like SPARK
SPARK is a processing engine. It processes / distributes on a logical level, “independent” of the physical storage:
- Where is the information (cluster location)?
- Who has the information (technology)?
How does it work?
- The ODS Server sends a request (“order”)
- A “job”, a pre-defined execution box, processes an order
- “Apps” containing one or multiple jobs are executed
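The order / job / app relationship described above can be sketched in plain Python. This is a conceptual model only and does not call any SPARK API; `Order`, `App`, `locate_job` and `fetch_job` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass(frozen=True)
class Order:
    """A request sent by the ODS Server (hypothetical shape)."""
    what: str  # e.g. "value matrix"


# A job is a pre-defined execution box that processes one order.
Job = Callable[[Order], str]


@dataclass
class App:
    """An app bundles one or multiple jobs and executes them in sequence."""
    name: str
    jobs: List[Job] = field(default_factory=list)

    def execute(self, order: Order) -> List[str]:
        return [job(order) for job in self.jobs]


def locate_job(order: Order) -> str:
    return f"located {order.what}"


def fetch_job(order: Order) -> str:
    return f"fetched {order.what}"


app = App("demo", [locate_job, fetch_job])
results = app.execute(Order("value matrix"))
print(results)
```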
Prototype Development: Objective 2: Defining Jobs as part of the ODS interface(?)
Part of the interface definition may be the definition of what a job is. A job depends on
- the order (ODS; what to retrieve?)
- the big data technology / physical storage used (here: SPARK; how to retrieve?)
Top-level technologies (like SPARK) need to be defined / supported.
Prototype Development: Processing tasks
YARN is a resource manager. It processes / distributes on a physical level.
- CPU / Memory
- Work-load distribution (within cluster)
- Supports major physical storage technologies
How does it work?
- SPARK Apps are executed as tasks
- Tasks are outsourced and executed
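The idea of splitting an app into tasks that are handed out to available CPUs can be illustrated with a local worker pool. This is only an analogy: a thread pool from the Python standard library stands in for the cluster, whereas YARN allocates resources on cluster nodes. The functions `run_task` and `run_app` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def run_task(partition: List[int]) -> int:
    """One task: process a single partition of the data (here: sum it)."""
    return sum(partition)


def run_app(partitions: List[List[int]], workers: int = 4) -> List[int]:
    """Hand each partition to a worker; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_task, partitions))


if __name__ == "__main__":
    print(run_app([[1, 2], [3, 4], [5]]))
```

The four default workers mirror the four TASK / CPU boxes in the diagram; the caller never decides which worker handles which partition.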
Prototype Development: Objective 3: Is a physical storage definition required?
Big data technologies are used as implemented at the customer (HDFS, HBASE, MongoDB, SOLR, …).
- There is (probably) no definition of physical storage
- Each technology needs a new job definition based on the “order” (each job has one derivation per technology / physical storage)
How does it work?
- The information is retrieved from the physical storage (which is defined only in the job)
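"One derivation per technology" can be pictured as a registry keyed by order kind and technology, so that knowledge of the physical storage lives only inside the job. The sketch below is an assumption about how such a lookup might look; the registry, the decorator and the job functions are hypothetical, and the job bodies return placeholder strings instead of touching real storage.

```python
from typing import Callable, Dict, Tuple

# (order kind, technology) -> job derivation
JobKey = Tuple[str, str]
JOBS: Dict[JobKey, Callable[[str], str]] = {}


def job(order_kind: str, technology: str):
    """Register a job derivation for one order kind on one technology."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        JOBS[(order_kind, technology)] = func
        return func
    return register


@job("value_matrix", "HDFS")
def value_matrix_from_hdfs(channel: str) -> str:
    return f"HDFS:{channel}"  # placeholder for an HDFS-specific read


@job("value_matrix", "HBASE")
def value_matrix_from_hbase(channel: str) -> str:
    return f"HBASE:{channel}"  # placeholder for an HBASE-specific read


def process(order_kind: str, technology: str, channel: str) -> str:
    """Pick the derivation that matches the order and the technology."""
    try:
        handler = JOBS[(order_kind, technology)]
    except KeyError:
        raise LookupError(
            f"no job derivation for {order_kind!r} on {technology!r}"
        ) from None
    return handler(channel)


print(process("value_matrix", "HDFS", "v_veh"))
```

Supporting a further technology then means adding one more registered function per order kind, without changing the caller.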
Prototype Development: Conclusions
- The big data technology zoo can be managed with assumptions on a middleware technology
- There will not be “a” (as in one) solution
- Solutions will depend on use-cases and technologies used
Prototype Development: Conclusion: Performance and Scalability
- e.g. Avalon Distributor (horizontal and vertical)
- Indexer (Notification Server and ODS API Security!)
- This is a cluster. Get more clusters.
- Ramp up your cluster.
- Enhance disk space.
- Index / pre-process for on-demand performance (with the technologies wanted)
- ElasticSearch?
HighQSoft Approach: General Ideas
- Import data from various formats (MDF, DBC, …) and from distributed sites
- Provide interfaces to typical data analytics and visualization tools
- Support state-of-the-art security (today ODS has its own role models etc. plus LDAP)
- Migrate ODS 5.3 to a big data solution (this shall be transparent for the client)
- Understand your project (analyze measurement data): status quo
- Understand your organization (from measurement events to ODS “project” data): big data challenge
- Sizes: #channels ~ 10^5, #measurements/day ~ 10^5
- Aliases for channel names etc. (v_fz vfz -> v_veh velocity)
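The alias idea can be sketched as a lookup table that maps differently spelled channel names to one canonical name. The reading of the slide's example is an assumption: `v_fz` and `vfz` are taken as aliases of `v_veh` (velocity). The table and the function `canonical_name` are hypothetical, as is the unknown channel `n_mot` in the usage lines.

```python
# Alias table: spelling found in a measurement file -> canonical channel name.
# Assumed reading of the slide: v_fz and vfz both mean v_veh (velocity).
ALIASES = {
    "v_fz": "v_veh",
    "vfz": "v_veh",
}


def canonical_name(name: str) -> str:
    """Return the canonical channel name; unknown names pass through."""
    cleaned = name.strip()
    return ALIASES.get(cleaned.lower(), cleaned)


print(canonical_name("v_fz"))   # known alias
print(canonical_name("VFZ"))    # matching ignores case
print(canonical_name("n_mot"))  # unknown name is returned unchanged
```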
HighQSoft Approach: Solution Architecture
[Diagram: HighQSoft side with Avalon ODS Server, ATHOS, Driver, CORBA / ODS 6.0; Big Data Cluster / Partner side with Web Service, SPARK, Future DB; data labels MQ | SM | … and V | T | L | … ?]
- The ODS Server remains as ODS gateway / security
- The Driver is independent of partner / project and connects to big data
  - Use-Case 1: write
  - Use-Case 2: read
  - Use-Case 3: stream (read)
- The cluster is set up by the partner / customer according to the use-case
- HighQSoft requirements are to be implemented. The Web Service is independent of partner / project, but contains specific “jobs” (see use-case)
- Big Data contains Measured Values and Quantities and Units (utilized)
- The data format is to be defined (standardized?)
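The Driver's use-cases (write, read, stream) suggest a small, partner-independent interface. The following is a sketch under assumptions: `BigDataDriver` and `InMemoryDriver` are hypothetical names, the in-memory store only stands in for a big data cluster, and the default `stream` reads everything first, which a real streaming driver would avoid.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, List


class BigDataDriver(ABC):
    """Partner-independent driver interface (hypothetical)."""

    @abstractmethod
    def write(self, channel: str, values: List[float]) -> None:
        """Use-case: write measured values for a channel."""

    @abstractmethod
    def read(self, channel: str) -> List[float]:
        """Use-case: read all measured values of a channel."""

    def stream(self, channel: str, chunk_size: int = 2) -> Iterator[List[float]]:
        """Use-case: stream (read) values in chunks.

        Simplification: this default loads the whole channel via read().
        """
        values = self.read(channel)
        for start in range(0, len(values), chunk_size):
            yield values[start:start + chunk_size]


class InMemoryDriver(BigDataDriver):
    """Stand-in backend for experiments; not a real cluster connection."""

    def __init__(self) -> None:
        self._store: Dict[str, List[float]] = {}

    def write(self, channel: str, values: List[float]) -> None:
        self._store.setdefault(channel, []).extend(values)

    def read(self, channel: str) -> List[float]:
        return list(self._store.get(channel, []))


driver = InMemoryDriver()
driver.write("v_veh", [1.0, 2.0, 3.0])
print(driver.read("v_veh"))
print(list(driver.stream("v_veh", chunk_size=2)))
```

Because the ODS Server would talk only to the abstract interface, a partner-specific backend could be swapped in without changing the server side.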
HighQSoft Approach: Solution Architecture
Specific file formats: when data is itemized, files
- need to be parsed for import (big data import, notification to ODS): File XY Parser / Importer
- need to be (partly) generated for file-based third-party tools: File XY Generator
Thanks