
Archiving/Retrieval of research medical images: a working solution

Rozenn Rougetet-Le Goff, Bernard Sécher, Jean Lépine, Pascal Merceron, Nathalie Ravenel and Vincent Frouin

Service Hospitalier Frédéric Joliot, CEA Orsay, France

ABSTRACT

We present an efficient, daily-used solution for archiving the whole research image data set produced in our Multi-Modality Medical Imaging Research Laboratory. It is divided into two parts: 1) the IMASERV project, whose purpose is to back up and distribute the data generated by the acquisition devices (3 PET scanners, 2 MRI scanners, 3 SPECT systems and 1 phosphor imager system), and 2) the SILO project, whose purpose is to archive and retrieve data produced by the processing software available on the 80 Unix workstations of the local area network. The two main technical difficulties were the differences in computer architecture and providing uniform access to the data in the image formats available on our site. A database system is used to store information describing the archived data. Software developments were necessary to push and pull data to and from the archiving system. Automatic data transfer procedures between the imaging devices and the archiving system were implemented with a high level of quality control. Data consulting and retrieval are available through suitable web interfaces.

Keywords: archiving system, medical image, indexation, database

1. INTRODUCTION

The automatic management of the image data flow produced in a multi-modality medical research facility is a difficult problem that requires a global solution. It involves tools from several research domains: the means by which the different scanners communicate with each other, the means for optimized archiving/retrieval of the data, and the ways such data are processed. In this work we address only the archiving problem and propose a routinely used working solution. Our purpose is 1) to back up and distribute the data generated by the acquisition devices and 2) to archive and retrieve results produced by data processing software.

2. THE PREREQUISITES

2.1 The Research context of the SHFJ

The Service Hospitalier Frédéric Joliot (SHFJ) is a comprehensive research facility dedicated to projects related to functional imaging, from the development and validation of data acquisition and processing methods to cognitive, biological and clinical applications. The SHFJ mainly houses three PET cameras, three SPECT cameras and two whole-body MRI scanners. All scanners are connected through an internal network to an extensive park of workstations and servers. The researchers are scientists (physics, mathematics and statistics, computer science, chemistry, pharmacy) as well as clinicians (neuropediatrics, neurology, psychiatry, cardiology and radiology). The research focuses mainly on the brain and the heart.

2.2 The network computer facilities

The SHFJ local area network hosts all data acquisition and data processing applications. 220 TCP/IP ports are distributed among 90 Unix workstations and about 130 office computers. Two main servers (SS1000, SS2000) manage the NFS and NIS services. Three high-capacity (60 GBytes) disk arrays have been added and filled to optimize disk space. The whole network's backed-up disk space is about 200 GBytes. The high-throughput 100 Mbit/s network enables fast remote data access. Eighty end-users are given private workspaces of one or two GBytes, according to an evaluation of their computing requirements. Access to all the software resources of the network is standardized from each end-user terminal through a common desktop environment.

Part of the SPIE Conference on PACS Design and Evaluation: Engineering and Clinical Issues, San Diego, California, February 1998. SPIE Vol. 3339 • 0277-786X/98/$10.00

Downloaded From: http://proceedings.spiedigitallibrary.org/ on 05/23/2014 Terms of Use: http://spiedl.org/terms

Several analysis software packages are available. They are dedicated to basic analysis (display, Region of Interest (ROI) editing, ROI statistics, time-activity curves) or to anatomical study and registration.

2.3 The researchers

Researchers at the SHFJ are members of research groups, whose purpose is to bring together people working on the same subjects and needing to share pools of data sets. All are active users, since they frequently need to access computing resources, but we usually distinguish two researcher profiles: clinicians use the data analysis software to establish a diagnosis, whereas methodologists create and optimize methods in order to develop the appropriate software. The former depend heavily on the data acquisition flow and need helpful tools to access the data. The methodologists, on the other hand, do not require high data volumes but generally need representative data sets on which to test their methods.

2.4 The research protocols

Clinicians at the SHFJ acquire image data for the benefit of various research programs. These programs are divided into several research protocols, each investigating a group of patients or volunteers. There are several types of protocol: study of a specific pathology (Parkinson's, Alzheimer's, Huntington's, epilepsy, aphasia...), neuroscience activations, pharmacological study of a new tracer, or quantification of a physiological index (cerebral blood flow, glucose consumption, myocardial perfusion...). Because of legal constraints, data produced by these protocols are stored like any other medical data. Moreover, the stored data can potentially be used again, especially in research protocols where it is usual to rerun some analyses.

3. THE ANALYSIS

3.1 A centralized archiving process

We planned to design a global archive solution instead of multiple local backup strategies, which are cost-ineffective, time-consuming and unsustainable. Only a federated choice can add value to the acquired data sets. First, patients are usually difficult to recruit and many administrative approvals are necessary to run a research protocol; it is therefore important not to lose such data sets and to control the archiving process. Second, as far as multi-modality studies are concerned, keeping complementary data sets close together is an obvious practical advantage in many cases, for instance when performing image registration. Third, a centralized archiving solution is easier to manage than several different local archiving tools; such a system can easily be upgraded to increase capacity and to follow software releases. Finally, a global solution is of course more appropriate for ensuring optimal data security.

3.2 A need for data indexation

To classify and properly store all the data sets on the centralized archiving system, adequate indexation was necessary. Efficient retrieval operations imply uniform access to the data. To retrieve the data generated by the acquisition devices, a simple data index can be created using the acquisition source device and the study number as defined at acquisition time by the acquisition software. To retrieve the data produced by analysis software, the two following main indexes are useful: the protocol name and the subject identity.
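The two indexing schemes above can be sketched as simple composite keys. This is a minimal illustration of the idea; the function and variable names are ours, not part of the original system.

```python
# Minimal sketch of the two indexing schemes described above.
# Names and structures are illustrative, not from the original system.

# IMASERV: acquisition data indexed by (source device, study number).
imaserv_index = {}

def register_acquisition(device, study_number, files):
    """Index a raw data set under its acquisition device and study number."""
    imaserv_index[(device, study_number)] = list(files)

# SILO: processed data indexed by (protocol name, subject identity).
silo_index = {}

def register_result(protocol, subject, files):
    """Index a processed data set under its protocol and subject identity."""
    silo_index.setdefault((protocol, subject), []).extend(files)

register_acquisition("PET1", 4217, ["raw.scn", "params.cfg"])
register_result("parkinson_fdg", "subj042", ["cbf_map.img"])
assert imaserv_index[("PET1", 4217)] == ["raw.scn", "params.cfg"]
assert silo_index[("parkinson_fdg", "subj042")] == ["cbf_map.img"]
```

The point of the two distinct keys is that acquisition data are naturally addressed by machine and study, while processed data are naturally addressed by protocol and subject.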

3.3 Some additional constraints

We decided to archive the whole data set of each study whatever the acquisition device, i.e. all the files generated by the acquisition process: for instance raw data, projections, reconstructed images and manufacturer-dependent parameter files. The manufacturer-dependent data format is not translated during the archiving process; data are archived in their own native format. This choice is intended to prevent any loss or transformation of the raw acquired data. Data may only be packed, i.e. files issued from the same study are sorted and grouped together. Since the data are archived in their native format, it is possible to push the data back to the acquisition disk space so that they can be reintegrated into the local manufacturer database. The tools we had to design must run within the network's multi-architecture context, on Unix machines, PCs and Macs alike. We chose to build the archiving tools around a Netscape browser to integrate them into our


intranet environment. Finally, the Java programming language is a great benefit, since it enables portability across computer platforms through the Java virtual machine. The end-user, clinician as well as methodologist, requires free and efficient access to the data he/she works with. The need is twofold: first, tools allowing the archiving process to be started at any moment; second, the data sets must be online so that fast access is provided. A long delay in the retrieval process, or dependence on an operator, is not acceptable.

3.4 Automatic archiving procedures requirements

Automatic data transfer procedures between the imaging devices and the archiving system needed to be developed. Systematic data transfers through the local area network, from the source location to the archiving location, were expected to be completely automatic. To take computer architecture differences into account, it was necessary to develop one backup procedure per imaging device, all based on a similar strategy. These procedures were designed to scan the acquisition disks, list newly acquired data to archive, package those data and transfer them through the network. Each imaging device procedure had to start automatically and be permanently installed. In order not to disturb the data acquisition and image reconstruction processes, we planned to run the archiving procedures only during the night after the acquisition date, the optimal moment when the network load is minimal. Data transfer speed was optimized as well.
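The per-device nightly procedure (scan, list new studies, package, transfer) can be sketched as follows. This is a simplified model under our own assumptions: paths are hypothetical, and `already_archived` stands in for whatever bookkeeping the real procedures used to recognize new data.

```python
# Sketch of the per-device nightly archiving procedure described above:
# scan the acquisition disk, list new studies, package each one,
# and move the bundle to a staging (networked) disk.
import os
import tarfile

def nightly_archive(acq_dir, staging_dir, already_archived):
    """Archive every study directory not yet seen, one bundle per study."""
    transferred = []
    for study in sorted(os.listdir(acq_dir)):
        if study in already_archived:
            continue  # only newly acquired studies are picked up
        bundle = os.path.join(staging_dir, study + ".tar.gz")
        with tarfile.open(bundle, "w:gz") as tar:
            tar.add(os.path.join(acq_dir, study), arcname=study)
        transferred.append(study)
        already_archived.add(study)
    return transferred
```

In the real system each device ran its own variant of this loop (adapted to its architecture and file layout), started automatically at night.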

3.5 Quality control requirements

Many quality control processes need to be performed during the operations. They are required 1) to ensure that data have been successfully transferred and stored on the archiving media, and 2) to verify that the data after the archiving process are an exact replica of the data before it started. This implies introducing automatic control steps into the archiving process itself. It is also necessary to have a synthetic overview of the whole archiving process in order to detect errors and correct them. When simultaneous archiving processes run together (one per imaging device), the control test results have to be summarized and easily readable to support daily quality control operations.

3.6 The size of the data

Imaging research protocols usually produce large data volumes. Because both anatomical and functional scanners are available in our laboratory, data sets tend to grow ever larger. The tomographic acquisition mode naturally delivers a great number of projections. New 3D acquisition schemes generate data with a higher spatial resolution, but imply a higher data volume too. Time sampling of dynamic phenomena (imaging of tracer uptake in the body) or of repetitive cognitive tasks also increases the volume of data. New storage media technologies, new acquisition methods or new analysis software can likewise induce a significant increase in experimental data volume. We had to design a hardware solution able to increase its storage capacity and to follow software evolution. We had to anticipate the arrival of new acquisition devices, as well as new acquisition or processing methods. However, it is impossible to foresee all future schemes, because they are linked to the evolution of technology (type and capacity of storage media, for instance, or integration of scanners in the same gantry) and to new methods in data analysis or data acquisition. In our experience, when estimating today the required storage capacity for a given period, the volume accumulated over the period is dominated by the data produced near the end of the period, so the sizing must be driven by the estimated daily production at the end of the period rather than by current figures. Such an analysis proved consistent in 1992, when we estimated the storage capacity of our present archiving system at 170 GBytes; this first project is described at length in this work. Our estimate of the storage capacity needed for the next 5-year period is 200 TBytes.
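The growth argument above can be made concrete with a rough calculation. The figures below are invented for illustration (not the SHFJ's actual numbers): if the daily data volume doubles every year, production during the final year accounts for roughly half of everything accumulated over five years.

```python
# Illustrative arithmetic for the capacity-planning remark above: when the
# daily data volume grows geometrically, the total accumulated over a period
# is dominated by production near the end of the period.
# All figures are invented for illustration.

def accumulated(daily_start_gb, yearly_growth, years, days_per_year=250):
    """Total volume archived over `years`, with the daily rate growing each year.

    Returns (accumulated GB, daily rate in GB at the end of the period).
    """
    total, daily = 0.0, daily_start_gb
    for _ in range(years):
        total += daily * days_per_year
        daily *= yearly_growth
    return total, daily

total, final_daily = accumulated(daily_start_gb=0.5, yearly_growth=2.0, years=5)
last_year = final_daily / 2.0 * 250  # production during the final year alone
print(f"accumulated: {total:.0f} GB, of which last year: {last_year:.0f} GB")
```

With these assumed numbers the five-year total is 3875 GB, of which 2000 GB (over half) is produced in the last year, which is why the end-of-period daily rate drives the estimate.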

4. THE HARDWARE AND SOFTWARE IMPLEMENTATIONS

4.1 Life-cycle of a research protocol

During the life-cycle of a research protocol, two main types of events may happen:
- the acquisition events produce data directly from the imaging devices;
- the analysis events process acquired data to produce derived data able to lead to significant results.
The transition steps are archiving/retrieval steps, which allow network data circulation between, first, the acquisition devices and the centralized storage space (the IMASERV space) and, second, the end-user workspace and the same storage space (the SILO space). The following figure (Fig. 1) describes how archiving/retrieval needs are attached to a protocol life-cycle scheme.


Fig 1: Insertion of archive/retrieve data steps in a typical data flow scheme

4.2 Materials

We chose commercially proven software and hardware equipment to carry out the archiving project, to facilitate the maintenance and future technical evolution of the system. To realize the specifications described in section 3 from a hardware point of view, we decided 1) to dedicate a network computer server to this task, targeted by all the imaging acquisition devices, and 2) to provide a high-capacity storage volume.

From a software point of view, we had to choose between two technologies: Hierarchical System Migration (HSM) or systematic backup (BKP). HSM (or staging) refers to the movement of data between dedicated magnetic space and secondary storage devices, such as optical disks. BKP refers to the complete replication of data on such secondary storage devices. We chose HSM because it is operating-system independent and faster than BKP technology.

The archiving server is a dedicated Unix machine configured to manage a 170 GBytes optical disk jukebox for the archiving task and an Exabyte jukebox for the network backup task. A 20 GBytes magnetic disk space serves as a cache for the optical space. This disk space is physically divided into 1 or 2 GBytes parts: some parts are assigned to the imaging devices for archiving acquisition data, and the others are assigned to the research groups for archiving processed data. Each magnetic part is configured for migration towards the optical disk jukebox. The staging is controlled by the Epoch Migration Software, which starts migration when predefined criteria are reached (low and high watermarks exclusively). This allows the Epoch system to appear as an all-magnetic storage server that never fills up. A disaster recovery procedure can be performed to secure the data in case of major problems: pools of specific optical disks are dedicated to backing up the current optical disks and, when filled, are put in a fireproof cupboard. The storage server was upgraded to increase its local magnetic space when new acquisition devices were connected to the network; a specific high-capacity disk array was added for this purpose. A commercial database system (Sybase) is used to store information describing the archived data issued from the data analysis step. The Epoch archiving system and the Sybase database are the two main pieces of equipment with which we developed archiving processes suited to our needs.
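The low/high watermark policy can be sketched as follows. This is a simplified model of the staging behaviour described above (migrating the oldest files first is our assumption), not the Epoch software's actual implementation.

```python
# Sketch of the low/high watermark staging policy described above: when the
# magnetic cache fills past the high watermark, files migrate to optical
# storage (oldest first, an assumption) until usage drops back to the low
# watermark. A simplified model, not the actual Epoch implementation.

def migrate(cache, capacity, low=0.6, high=0.9):
    """cache: list of (name, size) in arrival order. Returns migrated names."""
    used = sum(size for _, size in cache)
    migrated = []
    if used / capacity < high:
        return migrated  # below the high watermark: nothing to do
    while cache and used / capacity > low:
        name, size = cache.pop(0)  # migrate the oldest file first
        used -= size
        migrated.append(name)
    return migrated
```

Because migration only frees space down to the low watermark, the cache always retains the most recently written data, which is what makes the server appear as an all-magnetic store that never fills up.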


4.3 Data Indexation

4.3.1 The IMASERV project

A data index using the acquisition source name and the device-dependent study number is used to point to the data set to be retrieved. For each acquisition device, the list of archived study numbers is sorted and presented. For each study number, the different files of each data set are available.

4.3.2 The SILO Project

To retrieve the data issued from processing software, a set of parameters describing the origin of each data set can be pointed to. Two of them are considered the main indexes: the name of the research protocol and a list of identities of the subjects from whom the data were acquired. These parameters are sufficient to target a unique data set.

4.3.2.1 The data model through the relational database

In this work, only the database developed to index data issued from the data analysis is described. Different information levels are dispatched into dedicated database tables (Fig. 2). A first group of tables is designed to hold the basic entities, such as the end-user list, the research group list, the research protocol list, and the imaging acquisition device list and properties (magnetic resonance acquisition mode and nuclear medicine tracer identification). A second group of tables describes the content of a protocol and the logical way data are indexed. From the end-user point of view, the series of indexes that leads to the data is: Group/Protocol/Subject/Data. We also defined logical entities such as the archiving box and the archiving event; these entities mimic the usual tools for managing collections of data.
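The Group/Protocol/Subject/Data hierarchy can be sketched as a small relational schema. This is a minimal illustration using SQLite in place of the Sybase system actually deployed; all table and column names are ours, not the paper's.

```python
# A minimal relational sketch of the index hierarchy described above
# (Group / Protocol / Subject / Data), using SQLite instead of the Sybase
# system actually deployed. Table and column names are illustrative.
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE research_group (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE protocol (id INTEGER PRIMARY KEY, name TEXT,
                       group_id INTEGER REFERENCES research_group(id));
CREATE TABLE subject  (id INTEGER PRIMARY KEY, identity TEXT,
                       protocol_id INTEGER REFERENCES protocol(id));
CREATE TABLE exam     (id INTEGER PRIMARY KEY, study_number INTEGER,
                       device TEXT, subject_id INTEGER REFERENCES subject(id));
""")
db.execute("INSERT INTO research_group VALUES (1, 'neuro')")
db.execute("INSERT INTO protocol VALUES (1, 'parkinson_fdg', 1)")
db.execute("INSERT INTO subject VALUES (1, 'subj042', 1)")
db.execute("INSERT INTO exam VALUES (1, 4217, 'PET1', 1)")

# Follow the Group/Protocol/Subject/Data path down to an archived exam.
row = db.execute("""
    SELECT e.study_number, e.device FROM exam e
    JOIN subject s ON e.subject_id = s.id
    JOIN protocol p ON s.protocol_id = p.id
    WHERE p.name = 'parkinson_fdg' AND s.identity = 'subj042'
""").fetchone()
print(row)  # the study number and device for that protocol/subject
```

The join mirrors how an end-user narrows the search: protocol name and subject identity together lead to the exams, referenced by the study number delivered at acquisition time.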

4.3.2.2 The relations between the entities of the model

Relations between database tables (Fig. 2) may be separated into two groups. First, basic relations link the protocol with the exam and subject entities: each protocol is linked to a list of subject identities, and each subject is related to one or several exams referenced by the study number delivered by the acquisition device itself. Second, specific relations describe how the archiving box is addressed: each box is assumed to receive the data of a unique subject (potentially several exams). The end-user starts archiving events of those exams in the corresponding box. Software developments were also necessary to push and pull data towards and from the archiving system.


Fig 2: Data model


4.4 The development of Application Programming Interfaces

4.4.1 The IMASERV project

This project covers both the archiving and the distribution of the data generated by the acquisition devices.

4.4.1.1 Archive process

The scanner archive strategy is the same for each acquisition device; nevertheless, there is one archive procedure for each of the eight acquisition devices archived daily. It must take into account the manufacturer-dependent parameters: computer architecture differences, data format differences, data organization differences and produced data size differences. Data are automatically transferred during the night using conventional or customized file transfer protocols. For each device system, the archiving process (Fig. 3) is composed of three sequential steps.

Fig 3: Imaserv: The imaging device archiving strategy

1) an export procedure from the acquisition disk (disk A) towards a networked disk (disk B);
2) a data repacking (on disk B) and transfer procedure towards the magnetic storage space of the archiving system (disk C);
3) a call to the Hierarchical System Migration technology of the Epoch system to free the magnetic space towards the optical disk jukebox.


Fig 4: Imaserv: The imaging device archiving steps

Step 1 is sometimes provided by the imaging device manufacturer; in that case, it is only necessary to locally define the address of the networked computer targeted by the export process. Step 3 is performed automatically when the predefined watermarks are reached (section 4.2). When self-developed, step 1 consists of a chained list of specialized operations, described in Figure 4. The three main operations are 1) scanning the acquisition disk for new data, 2) comparing the free space on the networked disk with the size of the data to be archived, and 3) starting the transfer protocol. Specific operations are sometimes added to take into account the characteristics of each imaging device: for instance, part of the data may be diverted to run a remote image reconstruction algorithm or, in the case of a local manufacturer-dependent database, information useful to characterize the study may need to be extracted. At the end of step 1, several bundles of data have been collected on disk B, each bundle attached to a unique study.

The purpose of step 2 (Fig. 4), running on disk B, is then to prepare the data into their final archiving form. If necessary, each study data set is sorted by experiment number, packed and compressed to build consistent data sets, such as the raw data set and the reconstructed data set. Finally, these packaged files are renamed using exported database information, which presents consistently named data to the end-user. The prepared data are then automatically moved towards dedicated parts of disk C.
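The step-2 repacking can be sketched as a grouping-and-naming pass. File-name conventions here are invented for illustration: we assume experiment numbers prefix the file names and that the extension distinguishes raw from reconstructed data, which is not necessarily how the real devices named their files.

```python
# Sketch of the step-2 repacking described above: files from one study are
# sorted by experiment number, grouped into raw vs reconstructed sets, and
# given consistent archive names from exported database information.
# The file-name conventions are invented for illustration.

def repack_names(study_files, protocol, subject):
    """Group a study's files and derive consistent archive bundle names."""
    groups = {"raw": [], "recon": []}
    for fname in sorted(study_files):  # sort by experiment-number prefix
        kind = "recon" if fname.endswith(".img") else "raw"
        groups[kind].append(fname)
    return {kind: f"{protocol}_{subject}_{kind}.tar.gz"
            for kind, files in groups.items() if files}

names = repack_names(["02_scan.img", "01_scan.scn"], "parkinson_fdg", "subj042")
```

The consistent names are what the end-user later sees when browsing the archive, in place of the opaque device-generated names.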

Many quality control processes are performed during the operations, involving follow-up of the number of files and of the size and checksum of each file before and after each archiving step. When the quality tests succeed, the person in charge of the research protocol is notified so that the acquisition disk can be freed.
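The before/after verification can be sketched as a snapshot comparison. This is a minimal illustration of the idea (MD5 chosen here as a stand-in; the paper does not name a checksum algorithm).

```python
# Sketch of the quality-control checks described above: file count, size and
# checksum are recorded before an archiving step and verified after it.
# MD5 is an assumed choice; the original paper does not name an algorithm.
import hashlib
import os

def snapshot(directory):
    """Record (size, checksum) for every file; the dict length is the count."""
    entries = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        with open(path, "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        entries[name] = (os.path.getsize(path), digest)
    return entries

def verify(before, after):
    """True when the archived copy exactly replicates the source."""
    return before == after
```

Comparing the two snapshots checks all three criteria at once: any missing file, size change or content change makes the dictionaries differ.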

4.4.1.2 Retrieving process

The distribution of archived and reconstructed data is performed through specialized software widely available on the local network from the whole set of terminals. An interface was designed 1) to list all the archived data sets by imaging device and 2) to permit the retrieval of selected data. The directory location for retrieval and the image format for data conversion then have to be specified before starting the retrieval process. All the image formats available on our site are


proposed. The ANALYZE format (Mayo Clinic) is proposed by default for each acquisition device because it is readable by most of our analysis tools, but there are some software-dependent image formats too. In any case, when needed, the native manufacturer-dependent data formats may be chosen to retrieve data.

Fig 5: Imaserv: The retrieving process

The retrieve process is a series of operations described in Figure 5. Decompression and unpacking steps are automatically performed before chaining with one of the format conversion procedures available on the network.
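The chaining can be sketched as a small pipeline builder. The converter names are illustrative stubs of our own; only the default ANALYZE conversion is named in the text.

```python
# Sketch of the retrieval chain described above: decompression and unpacking
# run automatically, then one of the available format-conversion procedures
# is chained in. Converter names are illustrative stubs.

def retrieve_pipeline(target_format, converters):
    """Build the ordered list of operations applied to a retrieved data set."""
    steps = ["decompress", "unpack"]
    if target_format == "native":
        return steps  # manufacturer-dependent format: no conversion step
    if target_format not in converters:
        raise ValueError(f"no converter available for {target_format!r}")
    return steps + [converters[target_format]]

# ANALYZE is the default target, since most analysis tools can read it.
converters = {"analyze": "convert_to_analyze"}
```

Keeping decompression and unpacking unconditional, and conversion optional, mirrors the text: native-format retrieval is always possible, conversion only when a suitable procedure exists.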

Note that no individual information is stored in the IMASERV space: this service only delivers bitstream files, and users have to refer to the acquisition device name and acquisition number to find their data. The only condition for accessing those data is to have a registered account on the local area network. We deliberately chose to put the individual index in the SILO project; the added value of the processed data legitimates this choice.

4.4.2 The SILO project

The SILO project describes the set of tools that were developed to manage the limited space of the end-user workspace. The workspace is partly filled when retrieving data sets through the IMASERV API; data analysis software may then create derived data such as parametric images, large sets of result files, and different object types... Some derived data are intermediate files for which archiving is not required, but some need to be kept to run further analyses. The SILO project is intended to archive only data issued from a research protocol, meaning these tools are especially dedicated to the clinicians. Data archived in an acquisition protocol context are shared within a research group but remain private to other research groups. An indexation capability was developed using the data model described in section 4.3.2.1 in order to offer functionalities such as data archiving, data retrieval and data consulting. These three main tasks are accessed through suitable web interfaces.

4.4.2.1 Archiving process

As in the IMASERV project, four sequential steps are involved (Fig. 6). Step 1 is composed of a single operation: data exportation. Step 2 performs user authentication, web form filling and database access. Step 3 performs data compression on disk B, creates and names the archiving boxes, and delivers the appropriate access rights to transfer data towards disk C. Step 4 is performed via HSM.


Fig 6: Silo: The archiving strategy


From the end-user point of view, the archiving task is a two-step operation (Fig. 7). In step A, the researcher makes a request by filling in a first web form describing the data set to be archived. This information is automatically posted by e-mail to the database administrator (A1), who inserts the received information into the database (A2) and creates a new archiving box if necessary. When no inconsistency is found, the end-user receives a notification of success (A3). Then, in step B, the archiving process (Fig. 7) is started for each chosen directory (and its sub-directories) of the end-user workspace, after a second web form has been filled in. Only the main information needed to extract the archiving box location from the database is required (B1); the web form also contains a free-comment paragraph. An event report is automatically stored in the database with the archiving date (B2) and the comment, and a detailed list of the archived files is then posted to the end-user (B3), who can then decide to free part of the workspace.
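The step-B bookkeeping (B2 and B3) can be sketched as follows. This is an illustrative model of our own; the record fields stand in for whatever the real database event table stored.

```python
# Sketch of the step-B bookkeeping described above: each archiving run is
# recorded as an event with its date and the end-user's free comment, and a
# detailed file list is reported back. Field names are illustrative.
import datetime

def record_archive_event(events, box, files, comment):
    """Append an archive event (B2) and return the file-list report (B3)."""
    event = {"box": box,
             "date": datetime.date.today().isoformat(),
             "comment": comment,
             "files": sorted(files)}
    events.append(event)
    return event["files"]  # the detailed list posted back to the end-user

events = []
report = record_archive_event(events, "box_subj042", ["b.img", "a.img"], "rerun CBF")
```

Recording the date and free comment with every event is what later lets the retrieval interface present a meaningful list of archive events per subject.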


Fig 7: Silo: The archiving and retrieving time-events schemes

4.4.2.2 Retrieving process

The retrieving process is intended to import archived data sets. It is a three-step operation (Fig. 7). First, the end-user navigates through dedicated web pages where the main database indexes (research protocol name, subject identification) are listed (D1). Then a list of archive events, each described by its date and a free comment, is proposed for each subject (D2). Choosing an archive event gives direct access to the list of stored files. The retrieve request is then initialized by selecting a file, and the transfer protocol is immediately started towards the given workspace location (D3).

4.4.2.3 Consulting task

Database consulting tools are implemented as SQL requests (C1) within the data pool of a single research group. They were designed with the help of the end-users to automate common database queries. Some of them obtain an exhaustive list of the studies acquired for a selected protocol and/or a selected imaging device and/or a selected tracer (C2). Such queries allow the end-user to easily analyze the number of studies, their occurrence and their dispatching between devices inside the same protocol. Others extract from the database the exhaustive identifications of the volunteers listed for a given research group; raw and processed image data of such a list can be shared between different protocols of the same group, which avoids multiplying unnecessary volunteer acquisitions. Finally, a last type of query extracts the list of grouped analyses performed in a given protocol; in this case, pools of data corresponding to a multi-subject group can be easily identified.

Downloaded From: http://proceedings.spiedigitallibrary.org/ on 05/23/2014 Terms of Use: http://spiedl.org/terms
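As one concrete example of a consulting query (C1/C2), counting the studies per imaging device inside a selected protocol is a single GROUP BY request. The schema below is an assumption for illustration; SQLite stands in for the site's Sybase server.

```python
import sqlite3

# Demo data: three studies in one protocol (hypothetical names).
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE study(protocol TEXT, device TEXT, tracer TEXT);
    INSERT INTO study VALUES('MEMORY', 'PET1', 'H2O15');
    INSERT INTO study VALUES('MEMORY', 'PET1', 'FDG');
    INSERT INTO study VALUES('MEMORY', 'MRI1', NULL);
""")

def studies_per_device(protocol):
    """Number of studies per imaging device for one protocol (C2)."""
    return dict(db.execute(
        "SELECT device, COUNT(*) FROM study WHERE protocol=? GROUP BY device",
        (protocol,)))
```

The same pattern, with an extra `AND tracer=?` or `AND device=?` clause, covers the other combinations the text mentions.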

5. THE OPERATING TASKS

5.1 The need for a management tool of the optical library

As described in section 4.2, the magnetic space declared for HSM is divided into two large pools of disks, dedicated to the IMASERV API and the SILO API respectively. The Epoch software is intended to help with the management of the storage media. However, intensive use of this storage system to archive the image data flow of the SHFJ showed that it was inadequate to optimize the operational control: no basic software tool was available to obtain an automatic and reliable synthetic view of the optical library content, and it was impossible to anticipate the media circulation in order to limit the number of manual operations.

For this purpose, we developed an API to manage the logical pools of disks defined in the next section. This application relies upon an extension of the original concept developed by Epoch: it provides a synthetic view of the jukebox and automatically predicts which disks to move in and out.
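The core of such a prediction can be sketched as a planning function: given the media currently inside the jukebox, the media needed by pending requests, and the slot capacity, decide which media to insert and which to eject. Everything below (function name, data shapes, eviction order) is an assumption, not the paper's actual implementation.

```python
def plan_media_moves(inside, pending, capacity):
    """Predict the media input/output operations for the jukebox.

    inside:   list of media labels currently loaded in the library
    pending:  list of media labels needed by queued requests
    capacity: number of slots in the jukebox
    Returns (to_insert, to_eject)."""
    to_insert = [m for m in pending if m not in inside]
    free = capacity - len(inside)
    n_eject = max(0, len(to_insert) - free)
    # Only media that no pending request needs may be ejected,
    # taken here in listing order (a stand-in for a real policy).
    ejectable = [m for m in inside if m not in pending]
    return to_insert, ejectable[:n_eject]
```

For example, with media A and B loaded in a two-slot library and requests needing B and C, the plan is to insert C and eject A.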

5.2 The routinely-used operations

5.2.1 The different categories of media

Optical media are grouped by category according to their use. A first pool of media is assigned to the IMASERV and SILO applications to receive data migrated from the magnetic cache disk. A second one is assigned to administration tasks such as the dedicated Disaster Recovery Backup (DR) and the Catalog Backup (Archive). A special category, the retrospective one, has been created for both the IMASERV and SILO applications: a medium is defined as retrospective when the date of its stored files is earlier than an arbitrary application-dependent date (one per imaging device for the IMASERV application and one per research group for the SILO application). This retrospective category allows media turn-over while keeping direct access to recently acquired data; it also answers the researchers' need to perform retrospective analyses.

Finally, the following rules have been integrated in the management tool:
- each category of disk supports at least two states: a current one (currently used) and an available one (next current);
- the number of media has been arbitrarily fixed for each category.
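The retrospective rule above reduces to a simple date comparison against a per-device or per-group cutoff. The sketch below illustrates it; the cutoff table and all names are invented for the example.

```python
from datetime import date

# Hypothetical cutoff dates: one per imaging device for IMASERV,
# one per research group for SILO (values are illustrative only).
CUTOFFS = {
    ("IMASERV", "PET1"): date(1997, 1, 1),
    ("SILO", "groupA"): date(1996, 6, 1),
}

def is_retrospective(application, key, newest_file_date):
    """A medium is retrospective when its stored files all predate the
    application-dependent cutoff for the given device or research group."""
    cutoff = CUTOFFS.get((application, key))
    return cutoff is not None and newest_file_date < cutoff
```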

5.2.2 Implementation

Automatic scripts were developed to classify the detailed content of the library into the media categories described above. The purpose is 1) to verify compliance with the predefined numbering constraints and 2) to manage the retrospective requests. The results of the analysis are presented on dedicated web pages consulted daily by the administrator, together with the list of media input/output operations to perform. After the manual operations, a validation step verifies compliance with the criteria and notifies both the IMASERV and SILO applications of the successful insertion of retrospective media.
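The numbering-constraint check in step 1 can be sketched as a comparison of observed media counts against per-category quotas. The quota values and names below are illustrative assumptions.

```python
from collections import Counter

# Hypothetical per-category quotas (the paper fixes a number per category
# but does not give the values).
QUOTAS = {"imaserv": 8, "silo": 8, "dr": 2, "archive": 1}

def validate_library(media):
    """media: list of (label, category) pairs from the library inventory.
    Returns {category: (observed, expected)} for every category whose
    media count deviates from its quota, for the daily web report."""
    counts = Counter(cat for _, cat in media)
    return {cat: (counts.get(cat, 0), quota)
            for cat, quota in QUOTAS.items()
            if counts.get(cat, 0) != quota}
```

An empty result means the library content matches the predefined constraints.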

Both the SILO and IMASERV interfaces clearly indicate, through flags, whether a stored file is located on a retrospective medium. When the retrieve mode is selected and a retrospective file is requested, the application automatically posts the request to the administrator. The day after, the administrator processes all the requests and inserts the retrospective media into the library; the retrieve operation can then be successfully performed by the user through the interface.
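That branching behaviour, transfer immediately when the medium is online, otherwise queue the request for the administrator's next-day pass, can be sketched as follows. Function and variable names are assumptions for illustration.

```python
# Requests for files on retrospective media, processed by the administrator
# the next day once the media are inserted into the library.
admin_queue = []

def retrieve(path, on_retrospective_medium,
             transfer=lambda p: "transferred %s" % p):
    """Retrieve a file: start the transfer at once if the medium is online,
    otherwise post the request to the administrator."""
    if on_retrospective_medium:
        admin_queue.append(path)      # handled after the media insertion
        return "queued for administrator"
    return transfer(path)             # medium online: immediate transfer
```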


6. CONCLUSION

In this work, we performed an analysis to specify the storage requirements of the SHFJ. This analysis led us to choose appropriate hardware and software solutions.

The hardware solution is based on a dedicated CPU with a 60 GB magnetic disk cache, configured for migration towards a 170 GB optical-media jukebox driven by the Epoch HSM software. A Sybase database is used to store information describing the archived data.

The software solution is divided into two parts: 1) the IMASERV API, whose purpose is to back up and distribute the data generated by the imaging acquisition devices, and 2) the SILO API, whose purpose is to archive and retrieve data produced by data analysis.

In our future work, we propose first to develop a more portable interface using Java, and second to increase the storage capacity. The present source code could easily be extended.
