Introduction to the BinX Library

eDIKT project team

Ted Wen, tedwen@nesc.ac.uk

Robert Carroll, robertc@nesc.ac.uk

Agenda

- About the BinX project
- A brief introduction to the BinX language
- Introduction to the BinX library
- Advanced API to the BinX library
- Use cases and requirements
  - Dr Bob Mann
  - Dr Chris Maynard
- Discussion

About the BinX project

The problem

- XML is useful for representing metadata
- Scientific datasets can be too large to represent in XML
- Most scientific data are stored in binary files
- Binary data file formats are not all standardized
- Binary data files are platform-dependent

BinX – a solution

- Initially designed for the Grid environment
- Annotates the data schema of any binary file
- Marks up data elements in XML
- Describes three levels of features in a binary file:
  - Underlying physical representation (byte order)
  - Primitive data types (integer, float)
  - Structure of the dataset (array, table)
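Byte order is the lowest of these three levels. The following minimal sketch shows why it has to be described: the same four bytes give different integers depending on which byte is taken as most significant. The code is plain C++, not BinX library code, and `readBigEndian32` and `readLittleEndian32` are hypothetical names. Assembling the value with shifts makes the result independent of the host's own byte order.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode a 32-bit unsigned integer stored in
// big-endian order (most significant byte first). Building the value with
// shifts gives the same result on big- and little-endian hosts.
std::uint32_t readBigEndian32(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
           (std::uint32_t{b[2]} << 8) | std::uint32_t{b[3]};
}

// Hypothetical helper: the same four bytes read as little-endian
// (least significant byte first).
std::uint32_t readLittleEndian32(const std::array<std::uint8_t, 4>& b) {
    return (std::uint32_t{b[3]} << 24) | (std::uint32_t{b[2]} << 16) |
           (std::uint32_t{b[1]} << 8) | std::uint32_t{b[0]};
}
```

For the bytes 01 02 03 04, the big-endian reading is 0x01020304 and the little-endian reading is 0x04030201.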

The BinX project at eDIKT

- Implementing a software library for BinX
- Developing a series of tools based on the library
- C++ chosen for performance
- Portable code for different platforms
- Robust and easy to use

Development status

- Requirements gathering from July 2002
- Development started in October 2002
- Prototype finished in December 2002
- Alpha version completed in April 2003
- Beta version to be released in June 2003

The deliverables

- The BinX library
  - Compiled code for different platforms
  - Source code under an Open Source license
- Documentation
  - User's guide
  - Developer's guide
- Utilities and examples

The BinX Language

What is BinX?

- The Binary XML Description Language
- A language for annotating binary data files
- It describes data types, data structures, and attributes such as byte order
- A BinX document is an XML file containing metadata about a binary data file

A BinX document

    <dataset byteOrder="bigEndian">
      <definitions>
        <defineType typeName="myTyp">
          <arrayFixed>
            <character-8/>
            <dim indexTo="9"/>
          </arrayFixed>
        </defineType>
      </definitions>
      <file src="myfile.bin">
        <useType typeName="myTyp"/>
        <integer-32 varName="X"/>
      </file>
    </dataset>

- <dataset>: root element
- <definitions>: data class section
- <file>: data instance section
- <defineType>: abstract data type
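To show what a reader does with this document, the code below decodes myfile.bin by hand. It assumes the following:

- Dimension indices start at 0 and indexTo is inclusive, so myTyp holds 10 characters.
- The two instances follow each other without padding.
- X is a signed 32-bit integer stored big-endian, as byteOrder states.

`MyRecord` and `decodeMyRecord` are hypothetical names, not part of the BinX library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory form of the example dataset.
struct MyRecord {
    std::string myTyp;  // arrayFixed of character-8, dim indexTo="9"
    std::int32_t x;     // integer-32 varName="X"
};

// Decode the raw bytes of myfile.bin, assuming indices run from 0 to
// indexTo inclusive (10 characters) and no padding between instances.
MyRecord decodeMyRecord(const std::vector<std::uint8_t>& bytes) {
    const std::size_t charCount = 9 + 1;  // indexTo="9", starting at 0
    if (bytes.size() < charCount + 4) {
        throw std::runtime_error("file too short for the described layout");
    }
    MyRecord r;
    r.myTyp.assign(bytes.begin(),
                   bytes.begin() + static_cast<std::ptrdiff_t>(charCount));
    // byteOrder="bigEndian": most significant byte first.
    std::uint32_t u = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        u = (u << 8) | bytes[charCount + i];
    }
    r.x = static_cast<std::int32_t>(u);
    return r;
}
```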

Data elements

- Primitive data elements: byte, character, integer, real
- Complex data elements: arrays, struct, union
- User-defined data elements

Primitive data types

- Bit: <bit-1>
- Character: <character-8>, <unicodeCharacter-16>, <unicodeCharacter-32>
- Integer: <byte-8>, <short-16>, <unsignedShort-16>, <integer-32>, <unsignedInteger-32>, <longInteger-64>, <unsignedLongInteger-64>
- Real: <ieeeFloat-32>, <ieeeDouble-64>, <ieeeQuadruple-128>
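Each primitive type name above ends with its width in bits. A small helper can recover the storage size from that suffix, which is the information a reader needs to step through a file. The helper is hypothetical, not a BinX library function.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Every primitive type name listed above ends in "-<bits>", e.g.
// "integer-32" or "ieeeQuadruple-128". Hypothetical helper that returns
// the width in bits taken from that suffix.
int bitWidth(const std::string& typeName) {
    const std::size_t dash = typeName.rfind('-');
    if (dash == std::string::npos || dash + 1 == typeName.size()) {
        throw std::invalid_argument("no width suffix: " + typeName);
    }
    return std::stoi(typeName.substr(dash + 1));
}

// Width in whole bytes. <bit-1> is not byte-sized, so it is rejected.
int byteWidth(const std::string& typeName) {
    const int bits = bitWidth(typeName);
    if (bits % 8 != 0) {
        throw std::invalid_argument("not a whole number of bytes: " + typeName);
    }
    return bits / 8;
}
```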

Complex data types

- Arrays
  - A repeated collection of any data element
  - Multidimensional
  - Three types of arrays:
    - Fixed-length array
    - Variable-length array
    - Streamed array
- Struct
  - A sequence of data elements
- Union
  - One of a group of possible data elements, selected by the value of a discriminant

Arrays

Fixed-length array:

    <arrayFixed>
      <ieeeDouble-64/>
      <dim indexTo="3" name="X"/>
      <dim indexTo="4" name="Y"/>
      <dim indexTo="5" name="Z"/>
    </arrayFixed>

Variable-length array:

    <arrayVariable sizeRef="byte-8">
      <ieeeFloat-32/>
      <dim indexTo="7"/>
      <dimVariable/>
    </arrayVariable>

Streamed array:

    <arrayStreamed>
      <byte-8/>
      <dimStreamed/>
    </arrayStreamed>
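For a fixed-length array, the position of every element can be computed rather than searched for. The sketch rests on these assumptions:

- Indices start at 0 and indexTo is inclusive.
- The first <dim> listed is the outermost, so the last dimension varies fastest (row-major order). The slide does not state this ordering.

`elementOffset` and `arrayBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Byte offset of one element in a fixed-length array. Assumes every
// dimension starts at index 0, indexTo is inclusive, and the first <dim>
// is outermost (row-major: the last dimension varies fastest).
std::size_t elementOffset(const std::vector<std::size_t>& indexTo,
                          const std::vector<std::size_t>& index,
                          std::size_t elementBytes) {
    if (index.size() != indexTo.size()) {
        throw std::invalid_argument("wrong number of indices");
    }
    std::size_t linear = 0;
    for (std::size_t d = 0; d < indexTo.size(); ++d) {
        const std::size_t extent = indexTo[d] + 1;  // indices 0..indexTo
        if (index[d] >= extent) {
            throw std::out_of_range("index outside dimension");
        }
        linear = linear * extent + index[d];
    }
    return linear * elementBytes;
}

// Total size in bytes of the whole fixed-length array.
std::size_t arrayBytes(const std::vector<std::size_t>& indexTo,
                       std::size_t elementBytes) {
    std::size_t count = 1;
    for (std::size_t to : indexTo) count *= to + 1;
    return count * elementBytes;
}
```

Under these assumptions the X/Y/Z example holds 4 × 5 × 6 = 120 doubles, or 960 bytes, and element (1, 0, 0) starts at byte 240.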

Struct

    <struct>
      <short-16 varName="ID"/>
      <integer-32 varName="Count"/>
      <ieeeDouble-64 varName="Var"/>
    </struct>
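A struct maps naturally onto a C++ struct in memory, but the bytes in the file need not match the compiler's layout. This sketch decodes the struct above field by field. It assumes:

- The data are big-endian.
- The members sit back to back in the file with no padding (2 + 4 + 8 = 14 bytes).
- The host uses IEEE 754 doubles.

`Record`, `readBE`, and `decodeRecord` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical in-memory counterpart of the <struct> above.
struct Record {
    std::int16_t id;     // short-16 varName="ID"
    std::int32_t count;  // integer-32 varName="Count"
    double var;          // ieeeDouble-64 varName="Var"
};

// Read an unsigned big-endian integer of n bytes starting at pos.
static std::uint64_t readBE(const std::vector<std::uint8_t>& b,
                            std::size_t pos, std::size_t n) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) v = (v << 8) | b.at(pos + i);
    return v;
}

// Decode one struct from packed big-endian bytes (14 bytes). A native
// Record is usually larger because the compiler may insert padding, so
// the fields are decoded one by one instead of copied as a block.
Record decodeRecord(const std::vector<std::uint8_t>& b, std::size_t pos = 0) {
    Record r;
    r.id = static_cast<std::int16_t>(readBE(b, pos, 2));
    r.count = static_cast<std::int32_t>(readBE(b, pos + 2, 4));
    const std::uint64_t bits = readBE(b, pos + 6, 8);
    static_assert(sizeof(double) == 8, "expects a 64-bit double");
    std::memcpy(&r.var, &bits, sizeof r.var);  // assumes IEEE 754 host
    return r;
}
```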

Union

    <union>
      <discriminant>
        <byte-8/>
      </discriminant>
      <case discriminantValue="32">
        <ieeeFloat-32/>
      </case>
      <case discriminantValue="64">
        <ieeeDouble-64/>
      </case>
      <case discriminantValue="0">
        <void-0/>
      </case>
    </union>
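Decoding a union means reading the discriminant first and then choosing which case to read. This sketch models the three cases above with `std::variant`. It assumes big-endian data and an IEEE 754 host. `UnionValue` and `decodeUnion` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <variant>
#include <vector>

// The union holds nothing, a float, or a double, chosen by a one-byte
// discriminant (0, 32 or 64).
using UnionValue = std::variant<std::monostate, float, double>;

// Decode the union at the start of b (big-endian, IEEE 754 host assumed).
UnionValue decodeUnion(const std::vector<std::uint8_t>& b) {
    auto readBE = [&b](std::size_t pos, std::size_t n) {
        std::uint64_t v = 0;
        for (std::size_t i = 0; i < n; ++i) v = (v << 8) | b.at(pos + i);
        return v;
    };
    switch (b.at(0)) {
        case 0:
            return std::monostate{};  // <void-0/>: no payload
        case 32: {
            const auto bits = static_cast<std::uint32_t>(readBE(1, 4));
            float f;
            std::memcpy(&f, &bits, sizeof f);
            return f;
        }
        case 64: {
            const std::uint64_t bits = readBE(1, 8);
            double d;
            std::memcpy(&d, &bits, sizeof d);
            return d;
        }
        default:
            throw std::runtime_error("unknown discriminant value");
    }
}
```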

User-defined data type

    <defineType typeName="HeaderStruct">
      <struct>
        <character-8 varName="A"/>
        <character-8 varName="B"/>
        <integer-32 varName="Length"/>
      </struct>
    </defineType>

Data elements as instances

    <file src="myfile.bin">
      <short-16 varName="id"/>
      <arrayFixed varName="name">
        <character-8/>
        <dim indexTo="7"/>
      </arrayFixed>
      <struct varName="record">
        <short-16/>
        <ieeeFloat-32/>
      </struct>
    </file>
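Each instance in the <file> element occupies the bytes directly after the previous one, so its offset is the running total of the sizes before it. This minimal sketch assumes no padding and reads indexTo="7" as eight characters. `Field` and `layoutOffsets` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical flattened view of the <file> element: each top-level
// instance with its size in bytes, in file order.
struct Field {
    std::string varName;
    std::size_t bytes;
};

// Give each instance its starting offset, assuming the instances are
// stored one after another with no padding between them.
std::vector<std::pair<std::string, std::size_t>>
layoutOffsets(const std::vector<Field>& fields) {
    std::vector<std::pair<std::string, std::size_t>> out;
    std::size_t offset = 0;
    for (const Field& f : fields) {
        out.emplace_back(f.varName, offset);
        offset += f.bytes;
    }
    return out;
}
```

For the file above, id starts at byte 0 and name at byte 2. The record starts at byte 10 and holds two bytes for the short plus four for the float, giving 16 bytes in total.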

Referencing defined elements

    <definitions>
      <defineType typeName="A">
        <struct>
          <short-16/>
          <integer-32/>
        </struct>
      </defineType>
    </definitions>

    <file src="myfile.bin">
      <useType typeName="A" varName="FirstUse"/>
      <useType typeName="A" varName="SecondUse"/>
    </file>
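The <definitions> section works like a symbol table. Each useType is resolved by typeName, and every use occupies its own copy of the defined layout. The sketch below records each defined type as the byte sizes of its members and assumes no padding. `Definitions`, `UseType`, `typeSize`, and `resolve` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical model: typeName -> byte sizes of the type's members.
using Definitions = std::map<std::string, std::vector<std::size_t>>;

struct UseType {
    std::string typeName;
    std::string varName;
};

// Size of a defined type: the sum of its members (no padding assumed).
std::size_t typeSize(const Definitions& defs, const std::string& name) {
    const auto it = defs.find(name);
    if (it == defs.end()) throw std::out_of_range("undefined type: " + name);
    std::size_t total = 0;
    for (std::size_t bytes : it->second) total += bytes;
    return total;
}

// Resolve every useType by name; returns varName -> starting offset.
std::map<std::string, std::size_t> resolve(const Definitions& defs,
                                           const std::vector<UseType>& uses) {
    std::map<std::string, std::size_t> offsets;
    std::size_t offset = 0;
    for (const UseType& u : uses) {
        offsets[u.varName] = offset;
        offset += typeSize(defs, u.typeName);
    }
    return offsets;
}
```

Type A is a short-16 followed by an integer-32 (6 bytes), so FirstUse starts at byte 0 and SecondUse at byte 6.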

The BinX Library

Alpha version

Fundamental requirements

- Access to data elements in binary files via BinX
  - Parse the BinX document
  - Build in-memory data structures
  - Read data values from the binary file
- Automatic conversion
  - Byte ordering
  - Padding
- Producing BinX documents and binary data
  - Generate a BinX document for data structures
  - Save assigned data values into binary files

General use cases

- Data conversion (byte order)
- Data extraction (sub-dataset)
- Data combination (two arrays into one)
- Data presentation (browsing, pure XML)
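The first use case, byte-order conversion, comes down to reversing the bytes of every element. Here is a minimal sketch for a buffer of equally sized elements. `swapByteOrder` is a hypothetical name, not a BinX utility.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Reverse the bytes of every fixed-size element in the buffer, turning
// big-endian data into little-endian data or vice versa.
void swapByteOrder(std::vector<std::uint8_t>& data, std::size_t elementBytes) {
    if (elementBytes == 0 || data.size() % elementBytes != 0) {
        throw std::invalid_argument("buffer is not a whole number of elements");
    }
    for (std::size_t pos = 0; pos < data.size(); pos += elementBytes) {
        const auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
        std::reverse(first, first + static_cast<std::ptrdiff_t>(elementBytes));
    }
}
```

Take the big-endian shorts 0x0001 and 0x1234 with `elementBytes = 2`: the buffer 00 01 12 34 becomes 01 00 34 12.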

BinX components

The library provides core functionality that supports generic utilities and applications. The components form three layers:

- Applications: domain-specific
- Utilities: generic tools for data conversion, extraction, and packing/unpacking
- BinX library core: core BinX functionality to parse BinX documents and read binary data

The BinX library core

- Input: SchemaBinX (a BinX document, <dataset>…</dataset>) and a binary data file
- Output: DataBinX (XML holding the data values, e.g. <short-16>100</short-16>) or an in-memory dataset

[Figure: SchemaBinX document and binary data file → the BinX library → in-memory data structure (values loaded on demand) or DataBinX document]
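The figure notes that values are loaded on demand. One way to sketch that idea: a node of the in-memory structure stores only the offset of its value, reads the bytes the first time the value is requested, and then caches it. `LazyShort` and its reader callback are hypothetical stand-ins for the library's classes and file access. The value is assumed to be a big-endian short-16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical node that keeps only where its value lives and reads it
// the first time it is asked for.
class LazyShort {
public:
    // Callback standing in for file access: (offset, length) -> bytes.
    using Reader =
        std::function<std::vector<std::uint8_t>(std::size_t, std::size_t)>;

    LazyShort(std::size_t offset, Reader reader)
        : offset_(offset), reader_(std::move(reader)) {}

    // Big-endian short-16, read once and then served from the cache.
    std::int16_t value() {
        if (!cached_) {
            const std::vector<std::uint8_t> b = reader_(offset_, 2);
            cached_ = static_cast<std::int16_t>((b.at(0) << 8) | b.at(1));
        }
        return *cached_;
    }

private:
    std::size_t offset_;
    Reader reader_;
    std::optional<std::int16_t> cached_;
};
```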

The BinX utilities

- DataBinX generator
- DataBinX splitter
- SchemaBinX creator
- Binary file indexer

DataBinX generator

- Puts binary data inside XML
- For browsing, web service responses, and query result sets

[Figure: SchemaBinX document (<dataset>…</dataset>) and binary data file → the BinX library → DataBinX document (e.g. <short-16>100</short-16>)]
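The essence of the generator is writing each decoded value as an element named after its BinX type, as in <short-16>100</short-16>. This sketch shows only that principle, and the real utility's output may include more structure. `toDataBinX` and `short16ToDataBinX` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Write a value as an element named after its BinX type.
std::string toDataBinX(const std::string& typeName, std::int64_t value) {
    std::ostringstream out;
    out << '<' << typeName << '>' << value << "</" << typeName << '>';
    return out.str();
}

// Decode a big-endian short-16 from its two bytes and emit it as DataBinX.
std::string short16ToDataBinX(std::uint8_t hi, std::uint8_t lo) {
    const auto v = static_cast<std::int16_t>((hi << 8) | lo);
    return toDataBinX("short-16", v);
}
```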

DataBinX splitter

- The reverse of the DataBinX generator
- Generates binary files for testing and transportation
- Cross-platform (byte order)

[Figure: DataBinX document (e.g. <short-16>100</short-16>) → the BinX library → SchemaBinX document (<dataset>…</dataset>) and binary data file]
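In the other direction, the splitter reads the value back out of the XML and writes it in whichever byte order the target platform needs, which is what makes the round trip cross-platform. This sketch handles only the single <short-16> element form. `short16FromDataBinX` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Parse "<short-16>N</short-16>" and emit N as two bytes in the
// requested byte order.
std::vector<std::uint8_t> short16FromDataBinX(const std::string& xml,
                                              bool bigEndian) {
    const std::string open = "<short-16>", close = "</short-16>";
    if (xml.rfind(open, 0) != 0 || xml.size() < open.size() + close.size() ||
        xml.compare(xml.size() - close.size(), close.size(), close) != 0) {
        throw std::invalid_argument("not a <short-16> element");
    }
    const std::string text =
        xml.substr(open.size(), xml.size() - open.size() - close.size());
    const int v = std::stoi(text);
    if (v < std::numeric_limits<std::int16_t>::min() ||
        v > std::numeric_limits<std::int16_t>::max()) {
        throw std::out_of_range("value does not fit in short-16");
    }
    const auto u = static_cast<std::uint16_t>(static_cast<std::int16_t>(v));
    const auto hi = static_cast<std::uint8_t>(u >> 8);
    const auto lo = static_cast<std::uint8_t>(u & 0xFF);
    return bigEndian ? std::vector<std::uint8_t>{hi, lo}
                     : std::vector<std::uint8_t>{lo, hi};
}
```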

SchemaBinX creator

- GUI and web-based utilities
- Build a BinX document interactively
- Create a BinX document based on another

Binary file indexer

- Generates indices for binary data files
- Such indices can be used for fast data access

[Figure: SchemaBinX document and binary data file → the BinX library → index]
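An index pays off when records vary in size, because finding record k otherwise means parsing every record before it. The sketch assumes a hypothetical file made of records that each start with a one-byte element count, followed by that many fixed-size elements. This is similar in spirit to a variable-length array with sizeRef="byte-8". One scan records every start offset, after which any record can be reached with a single seek. `buildIndex` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Scan a file of count-prefixed records once and return the byte offset
// where each record starts.
std::vector<std::size_t> buildIndex(const std::vector<std::uint8_t>& file,
                                    std::size_t elementBytes) {
    std::vector<std::size_t> index;
    std::size_t pos = 0;
    while (pos < file.size()) {
        index.push_back(pos);
        const std::size_t count = file[pos];  // one-byte element count
        pos += 1 + count * elementBytes;
        if (pos > file.size()) throw std::runtime_error("truncated record");
    }
    return index;
}
```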

Applications for astronomy

FITS and VOTable conversion

[Figure: FITS file (header SIMPLE = T … END, followed by binary data) ↔ DataBinX utility on the BinX library core ↔ VOTable document (<?xml version=…<VOTABLE>…</VOTABLE>)]

FITS → DataBinX → VOTable

FITS to VOTable conversion

[Figure: FITS to VOTable pipeline built from a preprocessor, the DataBinX utility, and an XSLT transformer; intermediate products are SchemaBinX and DataBinX]

VOTable → DataBinX → FITS

VOTable to FITS conversion

[Figure: VOTable to FITS pipeline built from an XSLT transformer, a preprocessor, the DataBinX utility, and a postprocessor; intermediate products are DataBinX, SchemaBinX, binary data, and a FITS header]

FITS-VOTable experiment

- Sample FITS file
  - A data table of 82 rows × 20 fields
  - File size: 37 KB
- DataBinX generated by the DataBinX utility
  - Time spent: 268 ms
  - DataBinX document size: 1.2 MB
- VOTable transformed by MSXML
  - Time spent: about 1 second
  - VOTable document size: 51 KB
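The sizes above show the cost of carrying data values as XML text. Taking 1 MB as 1024 KB (the slide does not state its convention), the DataBinX document is about 33 times the size of the original FITS file, while the VOTable is less than 1.4 times:

```latex
\frac{1.2\,\text{MB}}{37\,\text{KB}} = \frac{1.2 \times 1024\,\text{KB}}{37\,\text{KB}} \approx 33.2,
\qquad
\frac{51\,\text{KB}}{37\,\text{KB}} \approx 1.38
```

With 1 MB = 1000 KB the first ratio is about 32.4.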

Possible future releases

- DataBinX parsing
- Utilities (GUI BinX editor)
- XPath-based data query
- DFDL support
- Preserving special tags
  - For comments and application-specific tags
- Text file support

Features or issues to consider

- Converting floating-point numbers
  - 80-bit, 96-bit, and 128-bit floating point
- Array manipulation (slice, section)
- SAX-based XML document parsing
  - Use cases for SAX in place of DOM parsing
  - Built into the library or as an add-on component?
- Database support
  - Annotating database tables?
  - Querying database tables through BinX?
- Java version of the library
  - Keeping exactly the same features as the C++ version?
- Supporting XQuery
  - Querying binary data files with XQuery on BinX

Support

- For problems of usage: http://www.edikt.org/binx (coming soon), [email protected]@edikt.org
- For requirements and suggestions: [email protected]@edikt.org, [email protected]@edikt.org

