
Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh...

Date post: 28-Mar-2015
Transcript
Page 1: Data Format Description Language (DFDL) WG Martin Westhead EPCC, University of Edinburgh M.Westhead@epcc.ed.ac.uk Alan Chappell PNNL chappella@battelle.org.

Data Format Description Language (DFDL) WG

Martin Westhead, EPCC, University of Edinburgh, M.Westhead@epcc.ed.ac.uk

Alan Chappell, PNNL, chappella@battelle.org

Page 2: Agenda

• Introduction and welcome - Martin Westhead 10mins
• Binary Format Description Language (BFD) - Alan Chappell 10mins
• Binary XML (BinX) - Stephen Rutherford 10mins
• DFDL - Martin Westhead 15mins
  – Big picture
  – Structural Description Language
  – Charter

(20 mins Discussion)

• Examples repository - Alan Chappell 10mins
  – Bruce Barkstrom Examples at NASA

(15mins Discussion)

Page 3: Motivation

• There will never be a standard data format
  – E.g. XML – verbose, tree-based, explicit structure
  – Legacy formats
  – Application specific formats
  – One size will never fit all

• But could we provide a language for describing formats
  – Transparency of physical representation
  – Automatic format conversion
  – Unambiguous description of data

Page 4: There’s more…

Explicit structure enables:

• Standard transformation to/from XML representation
  – Could allow application to read/write XML
  – But provide underlying efficient binary representation

• Data stream/file becomes database
  – Point to parts of the structure
  – Extract parts of the structure
  – Modify parts of the structure
  – Integrate parts of different structures

Page 5: And more…

• Generic tools possible
  – Browsing
  – Conversion and transformation

• Annotation of data
  – E.g. identify bits that depict hurricane in an image

• Enables general semantic labels, many ontologies could be developed e.g.:
  – S.I. units, SQL types, Time
  – Community specific labels, “starClass = whiteDwarf”
  – Application specific labels, “nodeColour = green”

• Could lead to a standard transformation language

Page 6: Not fairy tales

• Based on implemented work
  – BinX http://www.epcc.ed.ac.uk/gridserve/WP5/Binx/
  – BFD part of the Scientific Annotation Middleware project (http://www.scidac.org/SAM/)

• Generalized and extended a little

• Formal semantics

• Foundation for extensibility

Page 7: Approach

• Separate out structure and semantics

• General structural language
  – Repetition
  – Pointers
  – References to data
  – New structures can be built (compositionality)

• Semantics
  – Hard to express so…we don’t
  – General labeling
  – Label semantics defined elsewhere (ontologies)
  – Labels can be added (extensibility)

Page 8: Structure – arbitrary labels

[Figure: a sequence of binary values (0 1 1 0 0 1 1 1 …) grouped bottom-up under arbitrary labels. Eight "thing" values form a "bunchThings"; bunchThings are grouped into "foo", foo into "fooPair", and fooPair into "fooSet".]

Page 9: Structure – example labels

[Figure: the same hierarchy with example labels. Eight "bit" values (0 1 1 0 0 1 1 1 …) form a "byte"; bytes are grouped into "float", floats into "complex", and complex values into "complexArray".]
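
The labelled hierarchy on this slide (bit, byte, float, complex, complexArray) can be mimicked in a few lines of Python. This is only a sketch under stated assumptions: the helper names (`group`, `to_bits`, `bits_to_byte`, `decode_complex_array`), the big-endian 32-bit IEEE float layout, and the sample values are illustrative choices, not taken from BinX, BFD, or the DFDL draft.

```python
import struct

def group(seq, size):
    """Split a flat sequence into consecutive chunks of `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def to_bits(data: bytes):
    """Flatten bytes into a list of 0/1 values, most significant bit first."""
    return [(b >> shift) & 1 for b in data for shift in range(7, -1, -1)]

def bits_to_byte(bits):
    """Fold eight 0/1 values back into one integer in the range 0..255."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value

def decode_complex_array(data: bytes):
    """Impose the hierarchy bit -> byte -> float -> complex -> complexArray."""
    bits = to_bits(data)
    # byte := 8 bits
    byte_values = [bits_to_byte(g) for g in group(bits, 8)]
    # float := 4 bytes (assumed: big-endian IEEE 754 single precision)
    floats = [struct.unpack(">f", bytes(g))[0] for g in group(byte_values, 4)]
    # complex := 2 floats; complexArray := all complex values in the stream
    return [complex(re, im) for re, im in group(floats, 2)]

# The bit pattern drawn on the slide, 0 1 1 0 0 1 1 1, is the byte 0x67.
print(to_bits(b"\x67"))

raw = struct.pack(">4f", 1.0, 2.0, -0.5, 0.25)
print(decode_complex_array(raw))
```

The point of the sketch is that each level only regroups the level below it, so the same flat sequence of bits can carry any set of labels.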

Page 10: Structural language

• Formal semantics
  – Structured binary sequence
  – Defines hierarchical structure over underlying sequence of binary values

• Language for describing hierarchical structure
  – Repetition
    • Explicit number repeats
    • Termination characters
  – Data reference
    • Conditionals
    • Data size
  – Pointers
    • Scope
  – As general as possible but
  – Must be concise and implementable

• Draft language definition on web page (www.epcc.ed.ac.uk/dfdl)
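
To make the repetition and data-reference constructs listed above concrete, here is a minimal Python sketch of three readers: one for an explicit repeat count, one for a termination character, and one where a size field in the data determines how much follows. The function names, the 1-byte length field, and the sample stream are assumptions for illustration; the draft language definition may express these quite differently.

```python
import struct

def read_counted(data: bytes, offset: int, count: int, size: int):
    """Repetition with an explicit count: read `count` items of `size` bytes."""
    end = offset + count * size
    if end > len(data):
        raise ValueError("not enough data for the requested repeats")
    items = [data[i:i + size] for i in range(offset, end, size)]
    return items, end

def read_terminated(data: bytes, offset: int, terminator: bytes):
    """Repetition ended by a terminator: read bytes up to `terminator`."""
    end = data.index(terminator, offset)  # raises ValueError if absent
    return data[offset:end], end + len(terminator)

def read_length_prefixed(data: bytes, offset: int):
    """Data reference: a 1-byte size field (assumed) gives the payload length."""
    (length,) = struct.unpack_from("B", data, offset)
    start = offset + 1
    if start + length > len(data):
        raise ValueError("size field points past the end of the data")
    return data[start:start + length], start + length

# Hypothetical stream: length-prefixed payload, NUL-terminated text, two 2-byte items.
stream = b"\x03abc" + b"hello\x00" + b"\x01\x02\x03\x04"
payload, pos = read_length_prefixed(stream, 0)
text, pos = read_terminated(stream, pos, b"\x00")
items, pos = read_counted(stream, pos, count=2, size=2)
print(payload, text, items, pos)
```

Each reader returns the value together with the next offset, so readers can be chained to describe a larger structure.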

Page 11: CSV file example

char:=byte

data:=[(char - [',']).*]

field:=[data; [',']]

finalField:=[data; ['\n']]

row:=[field.*] :: [finalField]

table:=[row.*]
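
As a rough illustration of what the description above implies for a reader, the following Python sketch walks a byte buffer rule by rule. It is not an implementation of the DFDL draft: the function names are hypothetical, and the sketch assumes that `data` also stops at the newline, which the grammar on the slide leaves implicit.

```python
def parse_data(buf: bytes, pos: int):
    """data: a run of bytes up to (not including) the next ',' or newline."""
    start = pos
    while pos < len(buf) and buf[pos] not in b",\n":
        pos += 1
    return buf[start:pos], pos

def parse_row(buf: bytes, pos: int):
    """row: zero or more fields (data + ',') then one finalField (data + newline)."""
    values = []
    while True:
        value, pos = parse_data(buf, pos)
        values.append(value.decode("latin-1"))  # char := byte, so map bytes 1:1
        if pos >= len(buf):
            raise ValueError("row is not terminated by a newline")
        terminator = buf[pos]
        pos += 1
        if terminator == ord("\n"):
            return values, pos

def parse_table(buf: bytes):
    """table: zero or more rows, read until the buffer is exhausted."""
    rows, pos = [], 0
    while pos < len(buf):
        row, pos = parse_row(buf, pos)
        rows.append(row)
    return rows

print(parse_table(b"a,b,c\n1,2,3\n"))
```

The nested list that comes back contains only the values; the commas and newlines that belong to the format have been consumed by the reader.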

Page 12: Semantic labels

• Many ontologies possible

• Initial scope probably:
  – Basic types (floating point, integer, character)
  – Simple structures (structs, arrays, tables)

• Obvious extensions:
  – SQL types
  – XML Schema types

• Key WG goal:
  – Define form and requirements of new ontologies

Page 13: What is an Ontology?

• XML Schema for new types

• Structural description of new types

• Definition of core API behaviour on new type

• API extensions

• Relationships to other types

Page 14: WG goals

• Formal language for DFDL data structure

• Standard representation of this language in XML

• Requirements for DFDL ontology

• Basic types ontology

• Basic structures ontology

Page 15: Currently under discussion

• Abstraction from the underlying binary
  – Compression, encoding, encryption
  – Physical vs. conceptual binary sequence

• Abstraction of description
  – complex:=[foo; foo]
  – Instantiate “foo:= float” or “foo:= double” at use time

• Filtering of results
  – Getting to the data model and leaving the format behind
  – CSV -> [[value; value; value]; [value; value; value]]
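
One way to picture the "abstraction of description" item is a reader for complex:=[foo; foo] that is only completed once foo is bound to a concrete type. The sketch below is an assumption-laden illustration: `make_complex_reader` is a hypothetical helper, and binding foo to Python `struct` format codes ("f" for float, "d" for double) with big-endian byte order is a choice made here, not something the slides specify.

```python
import struct

def make_complex_reader(foo_format: str):
    """Build a reader for complex:=[foo; foo] once foo is bound to a struct code."""
    pair = struct.Struct(">" + foo_format * 2)  # two consecutive foo values

    def read(data: bytes, offset: int = 0):
        re, im = pair.unpack_from(data, offset)
        return complex(re, im), offset + pair.size

    return read

# The same abstract description, instantiated two ways at use time.
read_complex_float = make_complex_reader("f")   # foo := float
read_complex_double = make_complex_reader("d")  # foo := double

value_f, end_f = read_complex_float(struct.pack(">2f", 1.5, -2.0))
value_d, end_d = read_complex_double(struct.pack(">2d", 1.5, -2.0))
print(value_f, end_f, value_d, end_d)
```

Both readers share one description; only the binding of foo, and therefore the number of bytes consumed, differs.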

Page 16: DFDL in the VO

• Generic tools

• Metadata possibilities
  – Ontologies can define relationships between types
  – E.g. polar to Cartesian
  – Standard classes over data objects
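
The polar-to-Cartesian example suggests that a relationship between two labelled types could be recorded as a conversion function that generic tools look up. The following Python sketch shows one possible shape for that idea; the registry, the `relates` decorator, and the type labels "polar" and "cartesian" are all hypothetical and not drawn from any DFDL ontology.

```python
import math

# Hypothetical registry mapping (source label, target label) to a converter.
CONVERSIONS = {}

def relates(source: str, target: str):
    """Register the decorated function as the relationship source -> target."""
    def register(fn):
        CONVERSIONS[(source, target)] = fn
        return fn
    return register

@relates("polar", "cartesian")
def polar_to_cartesian(value):
    """(r, theta in radians) -> (x, y) with x = r cos(theta), y = r sin(theta)."""
    r, theta = value
    return (r * math.cos(theta), r * math.sin(theta))

def convert(value, source: str, target: str):
    """Generic tool: look up the declared relationship and apply it."""
    return CONVERSIONS[(source, target)](value)

x, y = convert((2.0, math.pi / 2), "polar", "cartesian")
print(x, y)
```

A generic tool built this way needs no knowledge of either type beyond its label.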

Page 17: Getting involved

• Webpages: http://www.epcc.ed.ac.uk/dfdl

• Mailing list ([email protected])

• My address: [email protected]

