
Scientific Programming 14 (2006) 173–184
IOS Press

Federated database services for wind tunnel experiment workflows

A. Paventhan^a, Kenji Takeda^a,∗, Simon J. Cox^a and Denis A. Nicole^b

^a Microsoft Institute for High Performance Computing, School of Engineering Sciences, University of Southampton, SO17 1BJ, UK
E-mail: {povs, ktakeda, sjc}@soton.ac.uk
^b School of Electronics and Computer Science, University of Southampton, SO17 1BJ, UK
E-mail: [email protected]

Abstract. Enabling the full life cycle of scientific and engineering workflows requires robust middleware and services that support effective data management, near-realtime data movement and custom data processing. Many existing solutions exploit the database as a passive metadata catalog. In this paper, we present an approach that makes use of federation of databases to host data-centric wind tunnel application workflows. The user is able to compose customized application workflows based on database services. We provide a reference implementation that leverages typical business tools and technologies: Microsoft SQL Server for database services and Windows Workflow Foundation for workflow services. The application data and user’s code are both hosted in federated databases. With the growing interest in XML Web Services in scientific Grids, and with databases beginning to support native XML types and XML Web Services, we can expect the role of databases in scientific computation to grow in importance.

Keywords: Application workflows, workflow activities, scientific data management, database federation, RDBMS

1. Introduction

Scientific and engineering experiments often involve people and facilities that are distributed within and across organizations. The large volumes of data acquired during many of these experiments are often transferred to different network locations for storage and processing. In the last few years, Grid computing has generated much interest among scientists and is increasingly being adopted in many scientific projects. The majority of scientific applications in the Grid rely on file systems for data management, with very limited use of Relational Database Management Systems (RDBMS). Where used, the RDBMS is often exploited as a query engine to retrieve metadata and/or results.

In any data-centric application the most important functionality is effective storage of and access to data. File systems may provide better raw read/write performance than database systems, but database systems bring additional benefits: transaction support to guarantee data integrity, query language capability, secured access to data, and other features that include procedural-language stored procedures and functions, native XML types and web services, transactional messaging, publish/subscribe replication, data mining extensions and so on. The development of such new capabilities is driven by the business market, and it has the potential to enable new approaches to scientific data management in the Grid environment. An RDBMS with these rich new capabilities may be viewed as a database operating system [15] into which one can plug subsystems and applications.

∗Corresponding author.

Database federation can help to manage heterogeneous data produced at different geographical locations, and provide the user with a single logical view. The individual database instances in the federation are autonomous, and the temporary unavailability of any one of them does not affect the interactions among the others. Although database federation as an approach to data integration [16] can support functions such as query optimization, the issue we address in this paper is the geographical separation of data sources, be it within a campus or across organizations.

ISSN 1058-9244/06/$17.00 © 2006 – IOS Press and the authors. All rights reserved

Different scientific applications in fields such as High Energy Physics [30], Earth Sciences [19] and Geosciences [21] have already utilized database-centric approaches in a Grid environment. A valuable review of database integration in the Grid context is given in [35]. The work described in this paper differs in that we provide an end-to-end experiment workflow solution exclusively using database capabilities. We present an architecture based on a federation of database instances managing both the data and the processing code. We also show how users are able to compose their customized workflows by leveraging database-centric activities. The data movement operations to transfer data from experimental sites are enabled by database replication. Users are able to register their custom processing codes for a particular application and maintain different versions of them. The registered code runs under the user’s security credentials, as we are able to leverage the database security features, such as certificate-based, domain-based or password-based authentication schemes.

The rest of the paper is organized as follows. Section 2 covers some of the recent developments in databases and how they can be exploited in scientific applications. In Section 3, we present the federated database architecture for the wind tunnel experimental workflow. Section 4 covers the implementation details of the database activities that enable workflow integration. In Section 5, we discuss the wind tunnel experiment workflow based on database activities. Section 6 discusses how some existing scientific projects exploit database technologies. Finally, conclusions and future work are presented in Section 7.

2. Recent database trends – Leveraging for scientific applications

The capabilities of database systems are increasing and their architectures are undergoing continuous change. Some of the features that provide new possibilities for scientific application development are discussed below.

2.1. Language runtime

Many popular database systems now host language runtimes supporting high-level language stored procedures, functions, triggers, and user-defined data types. For example, SQL Server 2005 hosts the Microsoft .NET Common Language Runtime (CLR) [11]; the Java Virtual Machine (JVM) and .NET CLR are supported in Oracle [36] and IBM DB2 [3]. This enables scientific applications to manage both the data and the processing code in databases. The implementation approach discussed in Section 4 leverages the SQL Server CLR integration feature, enabling users to register compute-intensive code written in any of the CLR languages (C++, Java, C#, and so on).

2.2. Native XML support

With XML becoming a data type, storing XML documents, validating them against a schema, and querying them with XQuery expressions are all part of the core XML functionality built into popular database systems [18,20,26]. This feature is useful for processing XML message exchanges between Grid services and for storing semi-structured scientific data in XML format.

2.3. XML Web Services

With the increasing interest in XML Web Services, database systems [2,8,9] are beginning to support hosting web services inside the database, eliminating the need for external hosting containers or web servers. This would enable Web Services Resource Framework (WSRF) [10] based, or similar, Grid services to be exposed directly from the databases.

2.4. Transactional messaging

Asynchronous and reliable messaging between database instances is possible in present-day database systems (SQL Service Broker [34] or Oracle Streams [7]). We have utilized Microsoft SQL Server 2005 Service Broker for service-level interactions, as discussed in Section 4. Service Broker objects include queues, dialogs, message types, contracts and services. These objects can be created using regular CREATE, ALTER and DROP Data Definition Language (DDL) commands. Messages can be transferred from the transmit queue of the local database instance to the receive queue of the remote database instance inside a transaction, which makes the message transfer reliable. This database feature can be exploited for developing reliable Grid services.
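The reliability argument can be illustrated with a minimal sketch. It is not Service Broker itself: it uses Python's standard sqlite3 module, a single in-memory database and a hypothetical one-table queue, and only shows why dequeuing and processing inside one transaction keeps a message safe when the handler fails.

```python
import sqlite3


def open_queue():
    # In-memory stand-in for a database-hosted queue (hypothetical schema).
    # isolation_level=None lets us issue BEGIN/COMMIT/ROLLBACK explicitly.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
    return conn


def send(conn, body):
    # Enqueue inside a transaction: the message exists only after COMMIT.
    conn.execute("BEGIN")
    conn.execute("INSERT INTO queue (body) VALUES (?)", (body,))
    conn.execute("COMMIT")


def receive_and_process(conn, handler):
    # Dequeue and process in ONE transaction. If the handler raises, the
    # dequeue is rolled back and the message stays queued for a retry.
    conn.execute("BEGIN")
    row = conn.execute("SELECT id, body FROM queue ORDER BY id LIMIT 1").fetchone()
    if row is None:
        conn.execute("COMMIT")
        return None
    try:
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        result = handler(row[1])
        conn.execute("COMMIT")
        return result
    except Exception:
        conn.execute("ROLLBACK")
        raise


def pending(conn):
    # Number of messages still waiting in the queue.
    return conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

A failed handler therefore leaves `pending(conn)` unchanged, which is the property a reliable service would build on.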


2.5. Replication

The publish/subscribe model in replication allows tables, stored procedures or any other database objects to be published. Different replication styles determine when and how the data reaches the subscriber. For example, transactional push-style replication moves data to the subscriber in near-realtime. Database replication can be utilized when scientific applications have to deal with distributed data and the availability of data is to be ensured in more than one location.

3. Architecture

Figure 1 shows a federated database architecture for a typical multi-site wind tunnel facility; the arrangement is similar for other multi-site applications. Wind tunnels are widely used to design, test and verify the aerodynamics of aircraft, cars, yachts and buildings, amongst others. The University of Southampton has three main wind tunnel facilities (11’ × 8’, 7’ × 5’ and 3’ × 2’) spread over the campus, housing heterogeneous, specialized experimental hardware and software for academic and industrial research. The high volume of data generated from multiple experiments is transferred from the data acquisition system to a suitable network location where users can carry out further processing and analysis.

There are three logical database instances participating in the federation. (1) Site databases (SiteDB): Considering the importance of timely data movement and near-realtime requirements, the SiteDB publishes the experiment data tables to the MasterDB using transactional push-style replication. This ensures immediate transfer of experimental data to the master as soon as the data is imported into SiteDB from the data acquisition system. (2) Master database (MasterDB): This maintains the user and application tables, and publishes them to the site and worker databases. It subscribes to experimental data from all the sites. The master node also runs workflow services for users to register, run and monitor their application workflows. (3) Worker databases (WorkerDB): This set of database instances is managed as a cluster of nodes. It carries out the processing work assigned by the master. It also manages different versions of custom user code for processing.

The database instances in the federation enable a complete end-to-end wind tunnel experimental workflow to be created and executed by hosting a set of database services (activities), with the master node providing additional workflow services. The master schedules the processing activities from multiple user workflows onto worker nodes for load balancing. Access to other Grid resources, such as compute clusters, enabled using Grid and/or Web Services, is also supported based on our earlier work [22].

Figure 2 shows the sequence of messages and data exchanges between the different database instances and the user’s wind tunnel grid client. The actions labeled with the letters A, B, C and D are independent of a particular workflow instance and can happen at any stage. Users can compose workflows based on database activities, compile them into a workflow assembly and submit it to MasterDB using workflow services for scheduling (step A). They can also monitor the status of their currently running workflows (step B). They can compile a customized assembly and register it through the assembly management services running in MasterDB (step C1). The master in turn makes the assembly available to WorkerDB for registration and subsequent load balancing of users’ jobs (step C2). Each assembly is registered with a unique name derived from the username, application type and user-specified version number. This unique name registration enables a user to maintain different versions of the algorithms that process the experimental data.

The actual workflow execution starts when the user initiates data acquisition during an experimental run (step 1). When the data acquisition is over, the service waiting for acquisition to complete (step 2) changes the state of the current experimental run from “Waiting for DAQ” to “DAQ over”. As the workflow is based on a state machine model, this state change transitions the workflow to the next stage, triggering an import data activity (step 3). Since the application data tables are published at the sites by means of transactional push publication and subscribed to at MasterDB, the newly imported data is transferred to MasterDB in near-realtime (step 4). Now, with the data available at MasterDB and the user’s application code registered with the master and workers, the data can be distributed for processing (step 5). The processing requests to workers comprise the username, application code and version, which uniquely identify the assembly for processing (step 6). On receiving the processing request, the worker invokes either the default processing or a customized method registered by the user (step 7). The worker node sends the computed results and the status of the processing to the master (step 8). The master receives, consolidates and records the results (step 9). The final step involves a call to the Matlab interface to generate a plot and save it into the results table (step 10).


Fig. 1. Federated database architecture for wind tunnel application.

[Figure 2 diagram: WTGClient (user), SiteDB (WaitForDAQ, import microphone data), MasterDB (workflow services, assembly management, distribute mic samples, merge CSM, call to Matlab interface for plotting) and WorkerDB (multiple instances; handle assembly messages, compute CSM, beamforming), connected by the actions A (register workflow), B (monitor workflow), C1/C2 (register assembly), D (push user and application info) and the steps 1 to 10 described in the text.]

Fig. 2. Data and message flow for microphone array application.

4. Implementation

The implementation details we discuss in this section are based on Microsoft SQL Server 2005, leveraging the SQL Service Broker [34] and .NET integration [11] features. These two features are typical of a modern RDBMS as discussed in Section 2, and we believe the generic approach is applicable to other database systems.

4.1. Motivating example – Microphone arrays

Fig. 3. Microphone phased array experiment [29]. (a) Landing gear test. (b) Typical noise contour plot in the x–y plane (f = 4 kHz, dim3 = 0.05 m, bogie angle = 0°, Opt: Diag Repl., Avrg 1/3-oct.).

The microphone array technique (Fig. 3) is used to measure the noise of aircraft components (slats, landing gears, flaps, etc.) to help aerospace engineers improve aircraft design and reduce the overall airframe noise [32]. Microphone arrays consist of multiple (around 100) microphones that must be sampled simultaneously. The phase shift between channels is then used to derive acoustic source information [29,32]. Microphone array processing happens in two stages: cross spectral matrix computation and beamforming. The cross spectral matrix (CSM) is an M × M matrix, where M is the number of microphones. The CSM computing steps involve calibration of the raw samples, Fast Fourier Transform (FFT) computation, block averaging of cross spectral components and background noise removal [29,33]. By dividing the microphone samples, the CSM steps can be run in parallel, as can be seen in Section 4.3. In the beamforming step, the beamforming expression is formed using the pre-computed CSM, and grid coordinates are generated for plotting. The generation of beamforming plots for multiple frequencies can be run in parallel. Even though we consider microphone arrays as an example in this paper, the approach and the discussions are valid for other wind tunnel experiments as well [23].
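The block-averaging part of the CSM computation can be sketched as follows. This is an illustrative Python outline rather than the code used in the experiments: calibration and background noise removal are omitted, a naive single-bin DFT stands in for a full FFT, and the function names are hypothetical.

```python
import cmath


def dft_bin(block, k):
    # Single DFT coefficient X[k] of one block of samples (naive O(N) sum;
    # a production code would compute all bins at once with an FFT).
    n = len(block)
    return sum(x * cmath.exp(-2j * cmath.pi * k * i / n) for i, x in enumerate(block))


def cross_spectral_matrix(samples, block_size, k):
    """Block-averaged M x M cross spectral matrix at frequency bin k.

    samples[m] holds the (already calibrated) time samples of microphone m.
    """
    m_count = len(samples)
    n_blocks = len(samples[0]) // block_size
    csm = [[0j] * m_count for _ in range(m_count)]
    for b in range(n_blocks):
        start = b * block_size
        # One spectral coefficient per microphone for this block.
        spectra = [dft_bin(ch[start:start + block_size], k) for ch in samples]
        # Accumulate the cross spectra X_r * conj(X_c) for every channel pair.
        for r in range(m_count):
            for c in range(m_count):
                csm[r][c] += spectra[r] * spectra[c].conjugate()
    # Block averaging: divide the accumulated sums by the number of blocks.
    return [[v / n_blocks for v in row] for row in csm]
```

Because each block contributes independently to the sum, the blocks can be divided among workers and the partial matrices averaged afterwards, which is what makes the parallel scheme of Section 4.3 possible.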

4.2. Wind tunnel database

Figure 4 shows the database schema used for the wind tunnel experimental data management. The user table holds the username, user’s role, user’s X.509 certificate subject and other user-specific details. When a new application is created, an entry with dedicated data, results and run table names is added to the applications table. Both the user table and the applications table are published by the master and subscribed to by the sites and worker nodes. The run and results tables are maintained in the master. The data tables store both the raw data and the configuration information from an experiment. For example, LDAData and MicArrayData hold data for the Laser Doppler Anemometry (LDA) experiment and the microphone array experiment respectively. The data tables are maintained at the wind tunnel sites and published to the master by transactional replication. The raw data import and update happen at the sites independently of the master, and at the same time the data is propagated from the sites to the master in near-realtime. Every experimental run in the run table has an associated dataset in a data table. The relationship between runs and data is many-to-one. This allows multiple runs with different processing parameters to be associated with a single dataset. Similarly, the results table holds multiple result records for each experimental run. The user assemblies table is maintained at the master and worker nodes. It has one entry for each assembly registered by the user, as will be discussed further in Section 4.3.
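The many-to-one relationship between runs and datasets, and the one-to-many relationship between runs and results, can be made concrete with a small sketch. It uses Python's sqlite3 module rather than SQL Server, and apart from the table names MicArrayData and MicResults every table and column name below is an assumption made for illustration, not the schema of Fig. 4.

```python
import sqlite3

# Hypothetical, heavily reduced schema: several runs may reference one dataset,
# and each run may own several result records.
SCHEMA = """
CREATE TABLE MicArrayData (data_id INTEGER PRIMARY KEY, raw_data BLOB, config TEXT);
CREATE TABLE MicArrayRun (
    run_id  INTEGER PRIMARY KEY,
    data_id INTEGER NOT NULL REFERENCES MicArrayData(data_id),
    params  TEXT,
    state   TEXT
);
CREATE TABLE MicResults (
    result_id INTEGER PRIMARY KEY,
    run_id    INTEGER NOT NULL REFERENCES MicArrayRun(run_id),
    kind      TEXT
);
"""


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO MicArrayData VALUES (1, x'00', 'array-config')")
    # Two runs with different processing parameters share dataset 1.
    conn.executemany("INSERT INTO MicArrayRun VALUES (?, 1, ?, 'DAQ over')",
                     [(1, "f=4kHz"), (2, "f=8kHz")])
    # Run 1 has two result records, run 2 has one.
    conn.executemany("INSERT INTO MicResults VALUES (?, ?, ?)",
                     [(1, 1, "CSM"), (2, 1, "plot"), (3, 2, "CSM")])
    return conn


def runs_for_dataset(conn, data_id):
    rows = conn.execute(
        "SELECT run_id FROM MicArrayRun WHERE data_id = ? ORDER BY run_id", (data_id,))
    return [r[0] for r in rows]


def results_for_run(conn, run_id):
    rows = conn.execute(
        "SELECT kind FROM MicResults WHERE run_id = ? ORDER BY result_id", (run_id,))
    return [r[0] for r in rows]
```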

4.2.1. Stored procedures

The service code for the database activities is written as CLR stored procedures in three different assemblies, namely the site assembly, master assembly and worker assembly, which are registered in SiteDB, MasterDB and WorkerDB respectively. The main stored procedure in SiteDB is ImportMicData. The stored procedures that run at MasterDB are Register/Remove/ReinstallAssembly, AddMicArrayRun, MasterCSMScheduler, CSMCompute and Beamforming. At the worker, the main stored procedures are WorkerCSMScheduler, CSMCompute and Beamforming.

4.2.2. Service broker objects

Table 1 shows how different Service Broker objects (message types, contracts, queues and services) are created using regular SQL DDL commands in the worker database. The message types can be binary or XML with a validation schema. A contract specifies the types of messages that can flow between a sender and a receiver. Queues are placeholders for messages and are central to transactional messaging; on arrival of a new message, the activation stored procedure (for example, WorkerServiceProc in the queue declaration) is invoked. A service is an endpoint that participates in a conversation. The master and workers communicate with the help of these broker objects. Table 2 shows how the master would send a CSMComputeRequest message to a worker.

Fig. 4. Database schema for wind tunnel application data management. (Legend: tables published by master; tables published by sites.)

4.3. Database activities

The following are the database activities that form the basis for the development of customized wind tunnel experiment workflows.

4.3.1. Assembly activities

Assemblies are libraries containing a user’s application-specific processing code, which can be registered with the master and worker databases. The user can choose the default processing functions available or write a custom one, and register it with the master and, in turn, with the workers. Once the assembly is registered, its public interfaces (public classes, static functions, data) are available for access from the service stored procedures. The assembly activities RegisterAssembly, ReinstallAssembly and RemoveAssembly internally translate into the SQL DDL statements CREATE ASSEMBLY, ALTER ASSEMBLY and DROP ASSEMBLY respectively. On receiving a register assembly message, the service procedure catalogs the assembly with a unique name derived from the username, application code and version. The user’s custom processing functions are identified, and a lookup table (HandlerTable column) with metadata for function invocation is stored in the UserAssemblies table.

Table 1
Service Broker objects creation using SQL DDL commands

Message types:
    CREATE MESSAGE TYPE RegisterAssemblyRequest VALIDATION = WELL_FORMED_XML;
    CREATE MESSAGE TYPE ReinstallAssemblyRequest VALIDATION = WELL_FORMED_XML;
    CREATE MESSAGE TYPE RemoveAssemblyRequest VALIDATION = WELL_FORMED_XML;
    CREATE MESSAGE TYPE AssemblyReply VALIDATION = WELL_FORMED_XML;
    CREATE MESSAGE TYPE CSMComputeRequest VALIDATION = NONE;
    CREATE MESSAGE TYPE CSMComputeReply VALIDATION = NONE;
    CREATE MESSAGE TYPE BeamformingRequest VALIDATION = NONE;
    CREATE MESSAGE TYPE BeamformingReply VALIDATION = NONE;

Contract:
    CREATE CONTRACT WorkerContract (
        RegisterAssemblyRequest SENT BY INITIATOR,
        ReinstallAssemblyRequest SENT BY INITIATOR,
        RemoveAssemblyRequest SENT BY INITIATOR,
        AssemblyReply SENT BY TARGET,
        CSMComputeRequest SENT BY INITIATOR,
        CSMComputeReply SENT BY TARGET,
        BeamformingRequest SENT BY INITIATOR,
        BeamformingReply SENT BY TARGET );

Service procedure:
    CREATE ASSEMBLY WorkerAssembly FROM 'path/to/assembly' WITH PERMISSION_SET = SAFE;
    CREATE PROC WorkerServiceProc
        AS EXTERNAL NAME WorkerAssembly.[WTG.DBLibrary.WorkerDBService].ServiceProc

Queue:
    CREATE QUEUE WorkerQueue
        WITH STATUS = ON, RETENTION = OFF,
        ACTIVATION ( STATUS = ON, PROCEDURE_NAME = WorkerServiceProc,
                     MAX_QUEUE_READERS = 4, EXECUTE AS SELF );

Service:
    CREATE SERVICE WorkerService ON QUEUE WorkerQueue (WorkerContract);

Table 2
An illustration of a transactional message exchange

Master database:
    BEGIN TRANSACTION
    SET @Message = <CSMComputeRequest>...
    BEGIN DIALOG @conversationHandle
        FROM SERVICE [MasterService]
        TO SERVICE 'WorkerService'
        ON CONTRACT [WorkerContract]
    SEND ON CONVERSATION @conversationHandle
        MESSAGE TYPE [CSMComputeRequest]
        (@Message)
    COMMIT

Worker database:
    BEGIN TRANSACTION
    WAITFOR (
        RECEIVE TOP(1)
            @mesg_type = message_type_name,
            @Message = message_body
        FROM [WorkerQueue]
        WHERE conversation_handle = @ConversationHandle );
    COMMIT
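The registration and lookup behaviour of the assembly activities can be sketched in a few lines. This is an in-memory Python analogue written for illustration only: the naming scheme, the class and its method names are assumptions, and the real activities operate on CLR assemblies through CREATE/ALTER/DROP ASSEMBLY.

```python
def assembly_name(username, app_code, version):
    # Hypothetical naming scheme: the text only states that the unique name
    # is derived from these three fields.
    return f"{username}.{app_code}.v{version}"


class AssemblyRegistry:
    """In-memory stand-in for the UserAssemblies table and its handler lookup."""

    def __init__(self, default_handlers):
        self._defaults = dict(default_handlers)  # step name -> default function
        self._assemblies = {}                    # unique name -> handler table

    def register(self, username, app_code, version, handlers):
        # Analogue of RegisterAssembly (CREATE ASSEMBLY): names must be unique.
        name = assembly_name(username, app_code, version)
        if name in self._assemblies:
            raise ValueError(f"{name} is already registered")
        self._assemblies[name] = dict(handlers)
        return name

    def reinstall(self, username, app_code, version, handlers):
        # Analogue of ReinstallAssembly (ALTER ASSEMBLY): replace in place.
        name = assembly_name(username, app_code, version)
        if name not in self._assemblies:
            raise KeyError(name)
        self._assemblies[name] = dict(handlers)

    def remove(self, username, app_code, version):
        # Analogue of RemoveAssembly (DROP ASSEMBLY).
        del self._assemblies[assembly_name(username, app_code, version)]

    def resolve(self, username, app_code, version, step):
        # Prefer the user's custom function; fall back to default processing.
        table = self._assemblies.get(assembly_name(username, app_code, version), {})
        return table.get(step, self._defaults[step])
```

Keying the registry on username, application code and version is what lets one user keep several versions of an algorithm side by side, while unregistered combinations fall back to the default processing.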

4.3.2. Process activities

The process activities execute application-specific code either from the default assembly or from a user-registered one. When a processing message is sent from the master, the corresponding lookup table is retrieved to invoke the user’s custom processing function. For example, the microphone array processing involves a cross spectral matrix (CSM) computation and a beamforming step. The CSM computation can be executed in parallel by splitting the raw microphone array samples equally among processing threads to improve the performance. The threads running on worker nodes each compute the cross spectral matrix on their portion of the data. The master receives the partial results from the workers, performs an averaging operation to form the cross spectral matrix and stores it in the MicResults table. The beamforming step is executed once for each frequency to generate the beamforming plot. In the case of multiple frequencies, different frequency values can be sent to worker nodes to generate the beamforming plots in parallel. The output of each individual beamforming step is three square matrices X, Y and Z with grid point values for plotting.
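The split and merge steps can be sketched as below, assuming each block has already been reduced to a vector of per-microphone spectral coefficients at one frequency. The helper names are hypothetical; the point of the sketch is that averaging equally sized partial results reproduces the average over all blocks.

```python
def split_blocks(blocks, parts):
    # Partition the blocks into `parts` equally sized chunks, mirroring the
    # "split equally among processing threads" step.
    if len(blocks) % parts:
        raise ValueError("number of blocks must be divisible by parts")
    size = len(blocks) // parts
    return [blocks[i * size:(i + 1) * size] for i in range(parts)]


def outer(x):
    # Cross spectra of one block: x * x^H.
    return [[a * b.conjugate() for b in x] for a in x]


def average(matrices):
    # Element-wise mean of equally weighted matrices.
    n = len(matrices)
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(m[r][c] for m in matrices) / n for c in range(cols)]
            for r in range(rows)]


def partial_csm(spectra):
    # Work done by one thread: block-average its own share of the data.
    return average([outer(x) for x in spectra])


def merge_csm(partials):
    # Work done by the master: average the partial matrices. This equals the
    # overall block average only because every chunk has the same size.
    return average(partials)
```

If the chunks were of unequal size, the merge would need a weighted average, which is one reason to split the samples equally.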

4.3.3. Plot activity

Matlab has been chosen in order to generate publication-quality scientific plots. The plot function is written in Matlab and takes various arguments for plotting. Matlab code is invoked from .NET using the Matlab .NET builder tool [5], which wraps the Matlab function into a .NET class. The plot activity uses the wrapper class to generate the plot and stores the image in the results table for download/visualization.

4.4. Microphone processing: Performance

Table 3 shows the cross spectral matrix processing timings for a C# command line application and for the same code hosted inside SQL Server as a Common Language Runtime (CLR) stored procedure. The two nodes utilized for this test are Dual Pentium III 1 GHz machines with 1 GB RAM running Windows Server 2003 and SQL Server 2005, connected over a 100 Mbps LAN. The SQL Server hosts the .NET 2.0 runtime. The raw data samples used for the test were acquired from 56 microphones and consist of 100 blocks with a block size of 2048 (total samples = 204800) [29]. The load time is the time taken to read the microphone samples into memory for processing. In the command line case, the samples are read from a delimited text file, and in the SQL CLR case, they are read from the RawData BLOB (Binary Large Object) column in the MicArrayData table. As can be seen from the table, parsing the samples from a text file takes more than double the time of deserializing the raw data BLOB into memory. The SQL CLR cross spectral matrix processing timing is comparable, and the overhead due to processing inside the database is marginal, as can be seen from Fig. 5. The split time is the time taken to partition the samples among the threads, and the merge time is the time taken to combine the cross spectral matrices received from the threads by an averaging operation. The split and merge times for 2 threads in the SQL CLR case on a single node are again comparable with the command line timings.

Computing the beamforming expression involves multiplying the cross spectral matrix with a weight vector for each grid point of the plot. This matrix-vector multiplication can be optimized with Intel’s specialized Streaming SIMD (Single Instruction Multiple Data) Extensions [17] instructions. The optimized command line beamforming timing is obtained using the NMath Core [6] C# library (which in turn uses Intel’s Math Kernel Library). This optimization library could not be registered into the database due to the strict SQL CLR versioning policies (we expect this to be resolved in the future). In order to provide a fair comparison, we also measured the beamforming timings without optimizations (flagged as such in Table 3); the SQL CLR timings are comparable to those of the equivalent command line version.
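As a rough illustration of the per-grid-point computation, the sketch below evaluates the quadratic form w^H C w that is commonly used in conventional beamforming. Treating the beamforming expression as exactly this form is an assumption (the text only says the CSM is multiplied with a weight vector per grid point), and the function names are hypothetical.

```python
def beamform_power(csm, weights):
    # Quadratic form w^H C w for one grid point. For a Hermitian CSM the
    # result is real, so only the real part is returned.
    total = 0j
    for r, w_r in enumerate(weights):
        for c, w_c in enumerate(weights):
            total += w_r.conjugate() * csm[r][c] * w_c
    return total.real


def beamform_map(csm, weights_per_point):
    # One output value per grid point; each point has its own weight vector,
    # so the points (and different frequencies) can be processed independently.
    return [beamform_power(csm, w) for w in weights_per_point]
```

The doubly nested loop is the matrix-vector work that vectorized libraries accelerate; here it is written out plainly to show the arithmetic rather than to be fast.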

The overhead due to queueing of the raw data is noticeable in the four-thread case. However, this particular test is intended more to illustrate the advantage of reliably partitioning and load balancing a service using database messaging than to show any speedup. With multiple users running different experiments that produce a high volume of data, providing a reliable service to the wind tunnel experimental environment is more important. This is discussed further in Section 6. Also, in cases where the experimental processing is long-running due to the data volume or the nature of the processing, the queueing overhead can be amortized. Further, the database messages can be routed through low-latency, high-bandwidth network technologies, such as InfiniBand and other high-speed interconnects, to improve the performance.

5. Workflow integration

The workflow integration is achieved using Microsoft Windows Workflow Foundation, which is part of the upcoming Microsoft .NET Framework 3.0 [4].

5.1. Windows Workflow Foundation

A workflow in Windows Workflow Foundation can be composed using a visual workflow designer, written declaratively in Extensible Application Markup Language (XAML), or coded completely in CLR languages. The workflow must be compiled with a workflow compiler before it can be run. Two workflow models are supported [12]: (1) the sequential workflow model, comprising activities that execute in a predictable sequential path, and (2) the state machine model, a flow driven by events triggering state transitions. In both these models the basic element of the workflow is called an activity. Some of the Windows Workflow Foundation’s activity types include control-flow (While, IfElse, Delay), exception (throw, exception-handler and BPEL compensations), data handling (Update, Select), transactions (and compensations for long-lived “transactions” that cannot be directly unwound) and communication (InvokeWebService, InvokeMethod). System.Workflow.ComponentModel.Activity is the base class for all the activities. This extensible development model enables the creation of domain-specific activities, which can then be used to compose workflows that are useful to and understandable by domain scientists. The workflow hosting layer of the Windows Workflow Foundation is responsible for communication, persistence, tracking, transactions, timing, dynamic updates and threading. When a long-running workflow instance is faced with resource constraints, it can be persisted with all its state in a database so that it can be restarted later. With this flexible approach to workflow hosting and an extensible framework for workflow activities, most of the functionality of typical state-of-the-art scientific workflow systems [37] can be hosted on top of Windows Workflow Foundation.

Table 3
Microphone array processing (CSM timings) on Dual P-III 1 GHz CPU, 1 GB RAM. All timings in seconds.

Configuration                    | Load data | Split   | Merge   | CSM step | CSM total | Beamforming step (single frequency)
---------------------------------|-----------|---------|---------|----------|-----------|------------------------------------
Sequential (Command line)        | 30.231    | –       | –       | 89.400   | 89.400    | 99.043 (288.149*)
2 threads (Command line)         | 28.587    | 0.996   | 1.499   | 51.287   | 53.782    | 99.043 (288.149*)
Sequential (SQL CLR)             | 12.253    | –       | –       | 89.635   | 89.635    | 292.553*
2 threads (SQL CLR)              | 12.529    | 1.321   | 1.604   | 54.423   | 57.348    | 292.553*
4 threads (SQL CLR) on two nodes | 11.850    | 13.971† | 84.496‡ | 30.899   | 129.366‡  | 292.553*

*Without using matrix-vector multiplication optimizations.
†Time to split, serialize & send.
‡Time due to queuing delay & merge.

Fig. 5. Microphone cross spectral matrix performance. [Plot: cross spectral matrix timings (204800 samples); x-axis: number of threads (1, 2, 4); y-axis: execution time in seconds; series: Dual P-III 1 GHz 1 GB RAM (Command line) and Dual P-III 1 GHz 1 GB RAM (SQL CLR).]

5.2. Wind tunnel experiment workflow

The database activities discussed in Section 4 are wrapped into an experiment-specific workflow activity library, which users draw on to compose a workflow and submit it to the master node for hosting. Figure 6 shows the composition of a custom microphone workflow based on the state machine workflow model.


Fig. 6. Microphone experiment workflow based on database activities.

The initial state of the microphone workflow is WaitForDAQ, and the final state is Plot on success or WorkflowError in case of any error during workflow execution. The experiment-specific activities are derived from the State activity. A State activity consists of one or more event-driven activities. For example, the ImportMicData state has a DataImported event transitioning to the MoveData state and an ImportError event transitioning to the WorkflowError state. On completion of an event, the SetState activity that is part of the event-driven activity sequence transitions the workflow to the next state. CSMCompute and Beamforming are workflow states representing processing.
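The state machine model described above can be sketched as a small transition table. The state names and the DataImported and ImportError events come from the text; the remaining event names (DAQOver, DataMoved, CSMComputed, BeamformingDone) and the Python class are hypothetical, and the real workflow is built from Windows Workflow Foundation State activities rather than code like this.

```python
# (current state, event) -> next state
TRANSITIONS = {
    ("WaitForDAQ", "DAQOver"): "ImportMicData",         # event name assumed
    ("ImportMicData", "DataImported"): "MoveData",
    ("ImportMicData", "ImportError"): "WorkflowError",
    ("MoveData", "DataMoved"): "CSMCompute",            # event name assumed
    ("CSMCompute", "CSMComputed"): "Beamforming",       # event name assumed
    ("Beamforming", "BeamformingDone"): "Plot",         # event name assumed
}
FINAL_STATES = {"Plot", "WorkflowError"}


class MicrophoneWorkflow:
    def __init__(self):
        self.state = "WaitForDAQ"
        # Recorded transitions, analogous to the state changes written to
        # the run tables of the master database.
        self.history = [self.state]

    def fire(self, event):
        # Apply one event; the equivalent of an event-driven activity
        # ending in SetState.
        if self.state in FINAL_STATES:
            raise RuntimeError(f"workflow already finished in {self.state}")
        next_state = TRANSITIONS.get((self.state, event))
        if next_state is None:
            raise ValueError(f"event {event!r} is not handled in state {self.state}")
        self.state = next_state
        self.history.append(next_state)
        return next_state
```

Keeping the transitions in a table makes the permitted paths explicit: a run either walks the full chain to Plot or leaves it through an error event.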

The user composes the workflow, compiles it into an assembly and submits it to the master node for hosting. The workflow is scheduled and run at the master node. The workflow activities connect to the master database to execute the corresponding database activities described in Section 4. The state transitions of the workflow are recorded in the run tables of the master database. Users can monitor the status of a submitted workflow instance to find out whether it is still running, has completed successfully, or has terminated with an error.

A similar, customized workflow approach for the Laser Doppler Anemometry (LDA) experiment, based on the sequential workflow model, is presented in [23].

6. Discussions

The nature and degree of use of Relational Database Management Systems (RDBMS) in scientific data management have varied. Uses include streaming data in near-realtime, hosting scientific services by means of static stored procedures, partitioning data to improve query performance, storing results, storing metadata and so on. In this section, we discuss related scientific projects, highlighting the degree to which they use database systems, and offer our arguments in favor of keeping databases central to the entire experimental workflow.

The NEESgrid framework [24], part of the Network for Earthquake Engineering Simulation project, supports instrument integration and exposes domain-specific Grid services for conducting and monitoring distributed earthquake engineering experiments. In terms of the experimental facilities, NEESgrid has some close similarities with wind tunnel experiments, but the emphasis is more on remote access to instruments in a multi-site environment. NEESgrid uses databases for metadata management only. The myLEAD [25] tool, part of the Linked Environment for Atmospheric Discovery (LEAD) project, provides specialized services for atmospheric scientists to search, store and catalog data objects generated during their investigations. The metadata catalog is managed in an RDBMS along with a set of database stored procedures to expose persistent Grid services. It uses OGSA-DAI as middleware for client interactions. OGSA-DAI [13] provides Grid service interfaces to different data sources (relational, XML, flat files). The advantages of database management systems in real-world scientific applications have been demonstrated in the Sloan Digital Sky Survey (SDSS) [31] project. With efficient indexing, join and parallel query operations, a twenty-fold speedup was achieved compared to a file-based implementation. MySQL’s streaming support has been utilized for hosting archival and real-time Geographical Information Systems (GIS) Grid services in [14]. The BioSimGrid project [1] manages large-scale biomolecular simulation data in flat files and the associated metadata in an RDBMS (Oracle). It allows simulation data to be deposited into a repository, which is then replicated to different sites for retrieval and analysis.

As can be seen from the above applications, the major factors that influence how database systems are utilized in a scientific environment include data characteristics, nature of acquisition, processing requirements and performance. Multi-user facilities such as wind tunnels, where different experiments, multiple locations, multiple runs, changing parameters, high-volume data and customized processing are the order of the day, require an approach that meets this set of demanding requirements.

The emphasis in the federated database approach [28] is on the ability of the local database instances to continue to support local operations autonomously while, as part of the federation, they provide a set of global operations. Our architecture takes this approach while integrating the geographically distributed sites. The site databases operate independently in the federation, sharing information through publish/subscribe replication with the master. Also, the communication between the master and the worker nodes is by means of reliable transactional messaging. The temporary unavailability of any site node, worker node or even the master will not affect the global operations. This is of particular importance in wind tunnel operations, which are deemed mission critical. A typical industrial scenario would be a Formula One racing team. Sites would include the factory, multiple wind tunnel sites, and testing and race tracks in different countries, where the network bandwidth and quality of service cannot always be guaranteed. Any of these sites could be offline for a number of reasons, and many are required to operate around the clock. Hence, local autonomy and reliability are important for such an application.
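
The idea that a site keeps working while the master is unreachable can be illustrated with a store-and-forward outbox. The sketch below is an in-memory Python analogy only: it is neither durable nor transactional, it does not use SQL Server replication or messaging, and the class and method names (`Outbox`, `FakeMaster`, `publish`, `flush`) are hypothetical.

```python
from collections import deque


class Outbox:
    """Site-local queue: a message stays queued until the master accepts it."""

    def __init__(self, send):
        # `send` is a callable(message) that raises ConnectionError when the
        # master cannot be reached (an assumption of this sketch).
        self._send = send
        self._pending = deque()

    @property
    def pending(self):
        return len(self._pending)

    def publish(self, message):
        """Queue locally first, so the local operation never blocks on the master."""
        self._pending.append(message)
        self.flush()

    def flush(self):
        """Deliver pending messages in order; stop at the first failure."""
        delivered = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except ConnectionError:
                break                    # master unreachable: keep the message
            self._pending.popleft()      # remove only after a successful send
            delivered += 1
        return delivered


class FakeMaster:
    """Test double for the master node, with a switchable network link."""

    def __init__(self):
        self.online = True
        self.received = []

    def receive(self, message):
        if not self.online:
            raise ConnectionError("master offline")
        self.received.append(message)
```

A message is removed from the queue only after the send succeeds, so an outage delays delivery but does not lose or reorder messages; a production system would additionally persist the queue and make the send and the removal part of one transaction.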

In general, scientific projects keep their raw data in flat files. In our approach, the wind tunnel raw data is imported as a Binary Large Object (BLOB) into the database at the experiment sites, and the BLOBs are replicated to the master node for processing. As can be seen from [27], the load performance of read-only BLOB objects is comparable to that of file systems, whereas write/update operations that fragment the BLOB increase the load time. In our microphone experiment example, the raw data is a read-only object: the BLOB is read before CSMCompute and is never updated. Similarly, the CSM matrix BLOB, once stored, is only read before the Beamforming step. In this usage scenario, deserializing the BLOB is more advantageous than loading the raw data from a text file stored in the file system, as can be verified from the load times in Table 3. Loading from a text file carries the additional overhead of parsing the floating-point samples.
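
The difference between deserializing a binary object and parsing text can be made concrete with a short Python sketch. The layout assumed here (a flat sequence of 64-bit floats in native byte order) is an illustration only and is not the BLOB format used in the reference implementation; the function names are hypothetical.

```python
from array import array


def to_blob(samples):
    """Serialize samples as raw 64-bit floats (native byte order)."""
    return array("d", samples).tobytes()


def from_blob(blob):
    """Deserialize: a bulk copy of bytes, with no per-sample parsing."""
    out = array("d")
    out.frombytes(blob)
    return out


def to_text(samples):
    """Write one decimal number per line, as a text file would hold them."""
    return "\n".join(repr(float(s)) for s in samples)


def from_text(text):
    """Parse text: every token must be converted from decimal to binary."""
    return array("d", (float(token) for token in text.split()))
```

Both paths return the same samples, but `from_blob` copies memory while `from_text` performs a decimal-to-binary conversion for every sample, which is the parsing overhead referred to above.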

In typical wind tunnel processing, an aerodynamicist would be interested in changing and customizing the processing algorithms. User customization of the processing algorithm, together with the ability to use the default processing steps, is an essential requirement. By taking advantage of the language runtime support inside databases, our approach makes it possible to manage user-customized algorithms. The Microsoft .NET development environment has extensive support for languages such as Java, C++ and C#, and compilers for other languages such as Python and FORTRAN are also available, so users can program in their language of choice. The user code is registered to run under the user’s security credentials to gain authorized access to the dataset and results.
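
One way to picture default processing steps with per-user overrides is a simple registry, sketched below in Python. This is only an analogy for registering user code with the database-hosted runtime: the names (`DEFAULT_STEPS`, `UserPipeline`) are hypothetical, the default step is a placeholder rather than a real cross-spectral matrix computation, and security credentials are not modelled.

```python
DEFAULT_STEPS = {}


def default_step(name):
    """Decorator that registers a function as the default for a workflow step."""
    def register(fn):
        DEFAULT_STEPS[name] = fn
        return fn
    return register


@default_step("CSMCompute")
def default_csm(samples):
    # Placeholder computation (mean square), NOT the real CSM algorithm.
    return sum(s * s for s in samples) / len(samples)


class UserPipeline:
    """Per-user view of the steps: overrides first, defaults otherwise."""

    def __init__(self, user):
        self.user = user
        self._overrides = {}

    def register(self, name, fn):
        """Replace a known step with user code; unknown step names are rejected."""
        if name not in DEFAULT_STEPS:
            raise KeyError(f"unknown processing step: {name}")
        self._overrides[name] = fn

    def run_step(self, name, data):
        fn = self._overrides.get(name, DEFAULT_STEPS[name])
        return fn(data)
```

An override registered by one user leaves the defaults seen by other users unchanged, which matches the requirement that customized and default processing coexist.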

7. Conclusions

In this paper, we presented an approach to supporting an end-to-end engineering workflow based on federation of databases. The database instances host database services which are invoked from a state machine model workflow. We have demonstrated this approach with a reference implementation leveraging the features of SQL Server and Windows Workflow Foundation. The architecture is generic and can be implemented using database systems other than SQL Server, such as Oracle or IBM DB2.

The advantages of this approach include a reduction in the overall turnaround time through an easy-to-use, extensible workflow framework, relieving the user of data management issues, and a robust and reliable system built on features typical of commercial database systems, such as replication and transactional messaging. Even though the approach and the implementation discussed in this paper refer to wind tunnel experiments, they can be easily extended to other scientific and engineering applications with similar characteristics.

Acknowledgments

The authors would like to thank Microsoft for their ongoing support. The microphone array work is funded by the UK EPSRC under grant number GR/S68446/01.

References

[1] BioSimGrid project. http://www.biosimgrid.org, [September 2006].

[2] DB2 Web Services: The Big Picture. http://www.ibm.com, [August 2002].

[3] A detailed look at DB2 Stinger .NET CLR Routines. http://www.ibm.com, [June 2004].

[4] Introducing the .NET Framework 3.0. http://msdn.microsoft.com/, [July 2006].

[5] MATLAB Builder for .NET User’s Guide Version 2. http://www.mathworks.com, [March 2006].

[6] NET Numerical Applications with NMath Core, white paper, http://www.centerspace.net, [July 2006].

[7] Sharing Information with Oracle Streams. www.oracle.com, [May 2005].

[8] Using Native XML Web Services in SQL Server 2005. http://msdn2.microsoft.com, [July 2006].

[9] Virtualize Your Oracle Database with Web Services. http://www.oracle.com, [November 2005].

[10] Web Services Resources Framework v1.2 OASIS Standard. http://www.oasis-open.org/, [April 2006].

[11] A. Acheson et al., Hosting the .NET Runtime in Microsoft SQL Server, in ACM SIGMOD Conference, 2004, 860–865.

[12] P. Andrew et al., Presenting Windows Workflow Foundation, Beta Edition, Sams, September 2005.

[13] M. Antonioletti et al., The design and implementation of Grid database services in OGSA-DAI, Concurrency and Computation – Practice and Experience 17 (2005), 357–376.

[14] G. Aydin, G.C. Fox et al., Streaming Data Services to Support Archival and Real-Time Geographical Information System Grids, in Sixth Annual NASA Earth Science Technology Conference, June 2006.

[15] J. Gray, The Revolution in Database Architecture, Tech. Report MSR-TR-2004-31, Microsoft Research, March 2004.

[16] L.M. Haas, E.T. Lin and M.A. Roth, Data integration through database federation, IBM Systems Journal 41 (2002), 578–596.

[17] Intel Corporation, IA-32 Intel Architecture Software Developer’s Manual, Programming with Streaming SIMD Extensions 3 (SSE3), June 2006.

[18] Z.H. Liu, M. Krishnaprasad and V. Arora, Native XQuery processing in Oracle XMLDB, in ACM SIGMOD Conference, 2005, 828–833.

[19] S. Narayanan, T. Kurc, U. Catalyurek and J. Saltz, Database Support for Data-driven Scientific Applications in the Grid, Parallel Processing Letters 13 (2002), 245–271.

[20] M. Nicola and B.V. der Linden, Native XML Support in DB2 Universal Database, in Proceedings of the 31st VLDB Conference, 2005, 1164–1174.

[21] S. Pallickara, B. Plale, S. Jensen and Y. Sun, Structure, Sharing and Preservation of Scientific Experiment Data, in IEEE Workshop on Challenges of Large Application in Distributed Environments, July 2005.

[22] A. Paventhan, K. Takeda, S.J. Cox and D.A. Nicole, MyCoG.NET: a Multi-language CoG toolkit, Concurrency and Computation – Practice and Experience, DOI: 10.1002/cpe.1133.

[23] A. Paventhan, K. Takeda, S.J. Cox and D.A. Nicole, Workflows for Wind Tunnel Grid Applications, in International Conference on Computational Science, 2006, 928–935.

[24] L. Pearlman et al., Distributed Hybrid Earthquake Engineering Experiments: Experiences with a Ground-Shaking Grid Application, in Proceedings of the 13th IEEE Symposium on High Performance Distributed Computing, June 2004.

[25] B. Plale et al., Active Management of Scientific Data, IEEE Internet Computing 9 (2005), 27–34.

[26] M. Rys, XML and Relational Database Management Systems: Inside Microsoft SQL Server 2005, in ACM SIGMOD Conference, 2005, 958–962.

[27] R. Sears, C. van Ingen and J. Gray, To BLOB or Not To BLOB: Large Object Storage in a Database or a Filesystem?, Tech. Report MSR-TR-2006-45, Microsoft Research, June 2006.

[28] A.P. Sheth and J.A. Larson, Federated database systems for managing distributed, heterogeneous, and autonomous databases, ACM Computing Surveys 22 (1990), 183–236.

[29] M.G. Smith, B. Fenech et al., Control of noise sources on aircraft landing gear bogies, in 12th AIAA/CEAS Aeroacoustics Conference, AIAA Paper 2006-2626, May 2006.

[30] H. Stockinger, Distributed Database Management and the Data Grid, in Proceedings of the Eighteenth IEEE Symposium on Mass Storage Systems and Technologies, April 2001.

[31] A.S. Szalay, J. Gray et al., The SDSS SkyServer – Public Access to the Sloan Digital Sky Server Data, in ACM SIGMOD Conference, 2002, 570–581.

[32] K. Takeda, X. Zhang and P.A. Nelson, Unsteady aerodynamics and aeroacoustics of a high-lift device configuration, in 40th AIAA Aerospace Sciences Meeting and Exhibit, AIAA Paper 2002-0570, 2002.

[33] R.J. Underbrink, Aeroacoustic Measurements, Springer, 2002.

[34] R. Walter, The Rational Guide To SQL Server 2005 Service Broker, Rational Press, 2005.

[35] P. Watson, Databases and the Grid, in: Grid Computing: Making the Global Infrastructure a Reality, F. Berman, G.C. Fox and T. Hey, eds, Wiley, 2003.

[36] M.A. Williams, Pro .NET Oracle Programming, Apress, 2004.

[37] J. Yu and R. Buyya, A taxonomy of scientific workflow systems for grid computing, ACM SIGMOD Record 34 (2005), 44–49.
