Date post: | 19-Jan-2016 |
I8K|DQ-BigData: I8K Architecture Extension for Data Quality in Big Data
Bibiano Rivas, Jorge Merino, Manuel Serrano, Ismael Caballero, Mario Piattini
Instituto de Tecnologías y Sistemas de Información, Universidad de Castilla-La Mancha
2
Master Data
<data> <id>45838589</id> <name>Vladimir</name> <surname>Putin</surname> <email>[email protected]</email> <coolnesslvl>9001</coolnesslvl> </data>
Exchange
Asset Data
3
Master Data
<data> <id>45838589</id> <name>Vladimir</name> <surname>Putin</surname> <email>[email protected]</email> <coolnesslvl>9001</coolnesslvl> <DQlvl>75</DQlvl> </data>
Exchange
<data> <id>34953858</id> <name>Stefan</name> <surname>Löfven</surname> <email>[email protected]</email> <coolnesslvl>8000</coolnesslvl> <DQlvl>100</DQlvl> </data>
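The exchanged records on this slide differ from the earlier one only by the added `<DQlvl>` element. A minimal sketch of that annotation step, assuming the message is a flat `<data>` element and using Python's standard `xml.etree.ElementTree`. The function name `add_dq_level` is hypothetical and not part of I8K:

```python
import xml.etree.ElementTree as ET


def add_dq_level(message_xml: str, level: int) -> str:
    """Return the message with a <DQlvl> child appended (or updated)."""
    root = ET.fromstring(message_xml)
    node = root.find("DQlvl")
    if node is None:
        # No quality level yet: append a new element at the end.
        node = ET.SubElement(root, "DQlvl")
    node.text = str(level)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    message = "<data><id>45838589</id></data>"
    print(add_dq_level(message, 75))
```

Updating an existing element instead of appending a second one keeps the message unambiguous if it is assessed more than once.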
4
I8K
5
ISO/TS 8000
• ISO/TS 8000-100: Describes specific aspects of Master Data
• ISO/TS 8000-102: Describes the vocabulary
• ISO/TS 8000-110: Establishes the way to translate the Master Data Messages
• ISO/TS 8000-120: Information about the Master Data life-cycle
• ISO/TS 8000-130: Adds information about the Quality of Master Data in terms of Accuracy
• ISO/TS 8000-140: Adds information about the Quality of Master Data in terms of Completeness
6
I8K Service Architecture
[Figure: I8K service architecture. Labels: I8K Manager; certification, translating, assessment; I8K.Cer110, I8K.Cer130, I8K.Cer140; I8K.Ev130, I8K.Ev140; I8K.110, I8K.Mapper; Data Base, Data Dictionary, DB_Mapping.]
7
I8K – Protocol
[Figure: a Data Provider and a Data Requester (labelled A and B) exchanging Master Data Messages.]
8
Problem
I8K
9
Solution
Big Data
10
Proposal Architecture Extension
[Figure: extended I8K architecture. Components: I8K Manager, Enterprise Service Bus. Regular Data services: I8K.Cer130, I8K.Cer140, I8K.Ev130, I8K.Ev140. Big Data services: I8K.Cer130-BiDa, I8K.Cer140-BiDa, I8K.Ev130-BiDa, I8K.Ev140-BiDa.]
11
Proposal New Messages
Type | Description
I8K.CR-BiDa | An application needs to encrypt a Master Data Message and add information about the level of Data Quality for Big Data
I8K.CR130-BiDa | An application requests the encryption and the addition of information about the level of Accuracy for Big Data
I8K.CR140-BiDa | An application requests the encryption and the addition of information about the level of Completeness for Big Data
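One way to picture how the three message types above could be routed is a lookup from message type to the Big Data evaluators it needs. The sketch below is an assumption for illustration only: the slides do not show the I8K Manager's dispatch logic, the names `ROUTES` and `evaluators_for` are hypothetical, and mapping I8K.CR-BiDa to both evaluators is inferred from its description (a general Data Quality level rather than a single dimension).

```python
# Hypothetical routing table. The service names come from the
# architecture slide; the mapping itself is an assumption.
ROUTES = {
    "I8K.CR-BiDa": ("I8K.Ev130-BiDa", "I8K.Ev140-BiDa"),
    "I8K.CR130-BiDa": ("I8K.Ev130-BiDa",),
    "I8K.CR140-BiDa": ("I8K.Ev140-BiDa",),
}


def evaluators_for(message_type: str) -> tuple:
    """Return the evaluator services a request type would be sent to."""
    try:
        return ROUTES[message_type]
    except KeyError:
        raise ValueError(f"unknown message type: {message_type}") from None


if __name__ == "__main__":
    for kind in ROUTES:
        print(kind, "->", ", ".join(evaluators_for(kind)))
```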
12
Proposal Protocol
[Figure: extended protocol between a Data Provider and a Data Requester (labelled A and B). New step 6': I8K.CR-BiDa / I8K.CR130-BiDa / I8K.CR140-BiDa.]
13
Proposal Infrastructure
hd-Master (Name Node), I8K Manager: 24GB RAM
hd-slave1 (Data Node): 8GB RAM
hd-slave2 (Data Node): 8GB RAM
14
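Proposal Implementation
Big Data Evaluators 130 and 140

    import sys

    def isEmpty(value):
        # Helper not shown on the slide; assumed to treat blank fields as empty.
        return value.strip() == ""

    def mapper140(dq_rules):
        # dq_rules: column indexes (as strings) that must be filled in.
        for line in sys.stdin:
            data = line.strip().split(";")
            isIndq_rules = True  # stays True while every checked field is filled
            aux = ""
            for i in range(len(data)):
                if str(i) in dq_rules:
                    if isEmpty(data[i]):
                        isIndq_rules = False
                    else:
                        # The flattened slide does not show which `if` this
                        # `else` belongs to; the inner one is assumed here.
                        aux += data[i] + ";"
            print('{0};{1}'.format(isIndq_rules, aux))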
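The mapper above emits one `flag;fields` line per record. No reducer is shown on the slide; as a sketch under that assumption, the flags could be aggregated into a completeness percentage as follows. The name `completeness_level` and the percentage formula are illustrative, not the authors' implementation.

```python
def completeness_level(mapper_lines):
    """Percentage of records whose leading flag is the string 'True'."""
    total = 0
    complete = 0
    for line in mapper_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        flag = line.split(";", 1)[0]
        total += 1
        if flag == "True":
            complete += 1
    # An empty input is reported as 0.0 rather than raising an error.
    return 100.0 * complete / total if total else 0.0


if __name__ == "__main__":
    sample = ["True;a;b;", "False;a;", "True;c;d;", "True;e;f;"]
    print(completeness_level(sample))  # 75.0
```

In a streaming job the same function could be fed `sys.stdin` instead of a list, since it only iterates over lines.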
15
Proposal Advantages
• The ability to assess a huge volume of data
• Adaptation to the future
• Improved performance of the assessment
16
Conclusions
• Data must have appropriate levels of quality when it is exchanged
• Using Big Data technologies to tackle the efficiency issues of the I8K architecture has improved its performance
• Using standards as the foundation of our work will make it easier to cope with new challenges
17
Future Work
• Include the assessment of new Data Quality dimensions
• Include real-time assessment
• Conduct a set of case studies to measure the improvement