Ian McCrea
Rutherford Appleton Laboratory

Chilton, Oxfordshire, UNITED KINGDOM

25 May 2009

EISCAT-3D Data System: Specifications and Possible Solutions


Principles

EISCAT-3D is very different to EISCAT

Much more low-level data

Continuous operation, unattended remotes

Interferometry as well as standard IS

Lots of supporting instruments

Store data at the lowest practical level

Analysis can be done direct from samples

Any pre-processing reduces flexibility

A wide range of applications and techniques

Data volumes are very large

Can’t store lowest level data forever

Keep them until they are “optimally processed”

Keep a set of correlated data forever (as now)


Types of Data

Incoherent Scatter

Continuous, complex, amplitude-domain data

Two polarisation streams/beam

80 MHz sampling at 16 bits

Bandpassing, but limit set by modulation bandwidth

Interferometry

Continuous on limited number of baselines

Don’t record if nothing is happening, but need the ability to “backspace” and “run on”.

Save data until optimum brightness function is made and transferred to archive

Supporting Instruments

EISCAT-3D will attract many supporting instruments, using the same data system

Some data sets big (e.g. imagers) but not always interesting

Suitable for mixture of short-term buffer and permanent archive


Types of Archive

Ring Buffer

High volume (~100 TB), short duration (hours to days)

Data accumulate constantly – oldest data over-written

Records IS data and interferometry when events detected

Needs to record latent archive data in the event of a network outage

Interferometry System

Small storage area (~100 GB), holds only the past few minutes of data

Data accumulate constantly, and tested against threshold

If an event is detected, divert to the ring buffer (for backspacing, sketched below); otherwise delete

Permanent Archive

Large capacity (~1 PB) permanent archive

Mid and high-level data @ 200 TB/year

Tiered storage, connected to multi-user computing facility
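
As an illustration of the buffering logic above, here is a minimal Python sketch with invented block sizes, durations and a hypothetical detection threshold: a small interferometry buffer keeps the last few minutes of data, and when an event trips the threshold the whole lead-in is diverted into the ring buffer (the "backspace"); otherwise it simply ages out, while the ring buffer overwrites its oldest blocks.

```python
from collections import deque

BLOCK_SECONDS = 1.0          # assumed length of one data block
LEADIN_SECONDS = 300         # "past few minutes" kept for backspacing (assumed)
RING_SECONDS = 2 * 86400     # illustrative ring-buffer depth (~2 days)
EVENT_THRESHOLD = 5.0        # hypothetical detection statistic threshold

# Oldest entries are overwritten automatically once maxlen is reached.
ring_buffer = deque(maxlen=int(RING_SECONDS / BLOCK_SECONDS))
leadin_buffer = deque(maxlen=int(LEADIN_SECONDS / BLOCK_SECONDS))

def handle_block(block, detection_statistic):
    """Route one interferometry block: keep a short lead-in, and on an
    event divert the lead-in plus this block into the ring buffer."""
    leadin_buffer.append(block)
    if detection_statistic > EVENT_THRESHOLD:
        ring_buffer.extend(leadin_buffer)   # "backspace": preserve the lead-in
        leadin_buffer.clear()               # otherwise old blocks just age out
```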


Data System Overview


Some data rates…

• Lowest-level data

– 80 MHz, 16 bits, 2.56 Gb/s/element, 4×10^13 b/s (!)

– Impossible to store

– Combine by group (49 antennas) then into <10 beams

– Each beam 25 TB/day (still the same order as the LHC)

• Central site

– Only one signal beam (because of transmitter)

– Calibration beam(s) will be small data volume

– Approx 1 TB/hour (320 MB/s)

• Remote sites

– 5-10 beams, but intersection limited

– Challenge is of same order as central site

– Need identical ring buffers at all sites
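
The headline figures on this slide follow from simple arithmetic. The short check below uses decimal units and assumes two polarisation streams per element and roughly 16,000 elements per site (the element count is inferred to match the quoted aggregate, not stated here):

```python
SAMPLE_RATE = 80e6        # samples/s
BITS = 16
POLARISATIONS = 2
N_ELEMENTS = 16_000       # assumption, chosen to match the ~4e13 b/s aggregate

per_element_bps = SAMPLE_RATE * BITS * POLARISATIONS      # ~2.56e9 b/s per element
aggregate_bps = per_element_bps * N_ELEMENTS               # ~4.1e13 b/s in total
beam_bytes_s = per_element_bps / 8                         # one beamformed stream

print(f"per element : {per_element_bps/1e9:.2f} Gb/s")
print(f"aggregate   : {aggregate_bps:.1e} b/s")
print(f"per beam    : {beam_bytes_s/1e6:.0f} MB/s, "
      f"{beam_bytes_s*86400/1e12:.0f} TB/day, "      # ~28 TB/day (~25 TiB)
      f"{beam_bytes_s*3600/1e12:.2f} TB/hour")       # ~1.15 TB/hour
```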


Band-passing and Bandwidths

• 80 MHz sampling oversamples a 30 MHz band

– But not all of this contains data…

– N ion lines + 2N plasma lines

– Ion lines ~ 50 kHz, plasma lines ~ 100 kHz.

– Seems that we can bandpass…

• Bandpassing limited by modulation bandwidth

– Convolution of backscatter spectrum and pulse spectrum

– Shorter pulses/bauds have higher bandwidth

– Some codes at EISCAT have 500 kHz mod. band.

• Bandpassing depends on coding…

– Worthwhile to bandpass for standard codes

– But we need an algorithm to set the pass bands (see the sketch below)

– Design for worst case (no bandpass)

– Don’t forget interferometry…
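
As an illustration only, a pass-band-setting algorithm could broaden each expected ion and plasma line by the modulation bandwidth of the code in use and then merge overlapping intervals. The function below is a sketch with assumed line positions and widths, not a proposed interface:

```python
def pass_bands(ion_offsets_hz, plasma_offsets_hz, mod_bandwidth_hz,
               ion_width_hz=50e3, plasma_width_hz=100e3):
    """Sketch: derive pass bands from expected line positions.

    Each natural line width is broadened by the modulation bandwidth of
    the code, then overlapping bands are merged. All inputs are baseband
    offsets in Hz; the values used here are illustrative only."""
    bands = []
    for f in ion_offsets_hz:
        half = (ion_width_hz + mod_bandwidth_hz) / 2
        bands.append((f - half, f + half))
    for f in plasma_offsets_hz:
        half = (plasma_width_hz + mod_bandwidth_hz) / 2
        bands.append((f - half, f + half))
    if not bands:
        return []
    bands.sort()
    merged = [bands[0]]
    for lo, hi in bands[1:]:
        if lo <= merged[-1][1]:                     # overlap: extend previous band
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

# Example: one ion line at 0 Hz, plasma lines at ±5 MHz, 500 kHz modulation band
print(pass_bands([0.0], [-5e6, 5e6], mod_bandwidth_hz=500e3))
```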


Higher-Level Data Rates

Interferometry Data

19 modules tested (202 MB/s, 17 TB/day)

But maybe only 5% of samples above threshold

Five minutes of data = 60 GB (lead-in)

Permanent Archive

Continue to store lag profiles

Ability to store limited raw data

Data from supporting instruments

Current archive growth is a few TB/year

We want better time/range resolution

Allow archive to grow at > 1 TB/year
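
A quick check of the interferometry numbers above (decimal units; the 5% figure is the slide's own estimate):

```python
RATE_B_S = 202e6                                          # quoted interferometry rate
print(f"per day      : {RATE_B_S*86400/1e12:.1f} TB")     # ~17 TB/day
print(f"5-min lead-in: {RATE_B_S*300/1e9:.0f} GB")        # ~60 GB
print(f"at 5% duty   : {0.05*RATE_B_S*86400/1e12:.2f} TB/day kept")
```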


Supporting Instruments

Examined data rates from a variety of instruments

Other radars (coherent scatter, meteor)

Lidars and advanced sounders

Passive optical (high-resolution imagers)

Radio instruments (riometers, VLF, GPS)

Magnetometers

Advanced instruments at central site only

Remote sites unattended, therefore:

No instruments needing manual intervention

No huge data sets allowed

Data volumes can still be large

High-resolution cameras can produce 100s of GB/day

But not all of the data are interesting…

Design for 150 GB/day at central site, 30 GB/day at remotes.


Approach to Vendors

Ring Buffer and Central Archive are very different

Need completely different technical solutions

Interferometer is a subset of the ring buffer

Same problem, but smaller data sets

Based on data rates, produced specifications

Two questions for manufacturers:

Are the requirements achievable now?

What kinds of technologies can be used to achieve them?

Manufacturer approaches began at Storage Expo 2007

~20 companies contacted

Significant discussions with ~10


Specifications: Ring Buffer

Minimum of 56 TB short-term storage

IS and interferometry both produce ~1 TB/hour

~2 days of full-bandwidth ISR data

~2 weeks of bandpassed data

Several months of high-level data (weather latency)

4 input, 8 output channels @ 160 MB/s

Somebody needs to read these data!

38 input, 38 output channels @ 6 MB/s

Less demanding things, including monitoring

Identical systems at central site and remotes

Power draw < 300 kW

System management, monitoring tools, warranties
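
A sanity check of the 56 TB figure against the quoted ~1 TB/hour rate; the implied bandpassing reduction factor is an inference from the two retention times, not stated on the slide:

```python
CAPACITY_TB = 56
FULL_RATE_TB_H = 1.15                       # ~1 TB/hour of full-bandwidth IS data

full_days = CAPACITY_TB / FULL_RATE_TB_H / 24
print(f"full-bandwidth retention  : {full_days:.1f} days")   # ~2 days, as specified
# Fitting ~2 weeks of bandpassed data in the same space implies roughly
# a 7x volume reduction from bandpassing (14 days / ~2 days).
print(f"implied bandpass reduction: ~{14 / full_days:.0f}x")
```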


Specifications: Central Archive

1 PB of usable initial storage

Initially a five-year archive

400 TB on line

600 TB “near on-line” e.g. tape library

Extensible at 200 TB/year

Input 300 MB/s over all channels

> 20 TB/day – allows fast filling

Output 1 TB/s over all channels

100 output channels

Assume we have lots of simultaneous users

2 high-volume output channels

Power draw <300 kW

System management, monitoring tools, warranties


Solutions: Ring Buffer

Two solutions for multiple input channels

Each channel separate, lots of channels

Multiplex into a few high-rate channels

Second solution probably better

e.g. 20 channels, multiplexed into 8 links of 6 GB/s.

Multiple drives, multiple enclosures

10 drive enclosures

Each enclosure 50% filled (300 x 300 GB SAS drives)

Resulting capacity 90 TB (72 TB directly usable)

Expansion and degradation

Half-filled cabinets allow expansion

RAID6 allows graceful degradation

Power draw and temperature range within spec.
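
The quoted capacities are consistent with 300 GB drives arranged in RAID6 groups of ten (8 data + 2 parity); the group size below is an assumption made so that the numbers match:

```python
N_DRIVES = 300            # 10 enclosures, each half-filled
DRIVE_GB = 300
RAID_GROUP = 10           # assumed: 8 data + 2 parity drives per RAID6 group

raw_tb = N_DRIVES * DRIVE_GB / 1000
usable_tb = raw_tb * (RAID_GROUP - 2) / RAID_GROUP
print(f"raw capacity    : {raw_tb:.0f} TB")    # 90 TB, as quoted
print(f"usable capacity : {usable_tb:.0f} TB") # 72 TB, as quoted
```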


Solutions: Central Archive

Central site only

Can be a staffed system, easier maintenance

Mix of on-line and near-on-line

User access should look immediate, even for historic data

MAID – Massive Array of Idle Disks

Large disk arrays with “sleep mode”

Spin down if unused for given time (5-330 mins)

Example system: 1.2 PB archive

24 GB/s bandwidth over 20 channels, 20 TB/hour backup

RAID 6 graceful degradation, modular “hot swap” components

Single system supports 1200 disk drives


Network Issues

Big issue is data transfer from the remotes

1 beam is 320 MB/s, remotes will have multiple beams

Supporting instruments add ~30% overhead

Need to recover from interruptions quickly

Otherwise we may never catch up (see the calculation below)

Interruptions might last days/weeks

Fast links already practical

Protocols for 10 GB/s links exist already

We may need to provide some of the networking…

A back-up option is needed if the network fails

Something to tell us if the site is alive

…and how cold it is…

Options are mobile phone, satellite, microwave link…
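
To see why long interruptions matter, a small backlog calculation helps. The beam count, outage length and link rates below are assumptions for illustration; if the spare capacity is zero or negative the backlog never clears, which is the "never catch up" case above:

```python
def catchup_days(generation_mb_s, link_mb_s, outage_days):
    """Days needed to clear the backlog after an outage, or None if the
    link cannot even keep up with steady-state generation."""
    backlog_mb = generation_mb_s * outage_days * 86400
    spare_mb_s = link_mb_s - generation_mb_s
    return None if spare_mb_s <= 0 else backlog_mb / spare_mb_s / 86400

generation = 5 * 320                     # assumed: 5 beams at 320 MB/s
for link in (1_250, 10_000):             # ~10 Gb/s vs ~10 GB/s usable link rate
    print(f"{link} MB/s link ->", catchup_days(generation, link, outage_days=2))
```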


Some Other Ideas

• Project Blackbox

• Containerised data centre

• Transports on a truck

• 1.5 PB disk storage

• 7 TB memory

• 250 servers, 2000 cores, 8000 threads

• When full, drive back to HQ and put in another…

• Turns out to be a very high-bandwidth solution…

• …provided you can integrate rates over time
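
The "integrate rates over time" point is the classic never-underestimate-the-bandwidth-of-a-truck argument. With assumed fill and transport times, the effective rate is easy to estimate:

```python
CONTAINER_PB = 1.5        # disk capacity quoted for the container
TRANSPORT_DAYS = 2        # assumed driving time back to HQ
FILL_DAYS = 30            # assumed time to fill the container on site

transport_gb_s = CONTAINER_PB * 1e6 / (TRANSPORT_DAYS * 86400)
cycle_gb_s = CONTAINER_PB * 1e6 / ((FILL_DAYS + TRANSPORT_DAYS) * 86400)
print(f"transport leg only    : ~{transport_gb_s:.1f} GB/s")   # ~8.7 GB/s
print(f"whole fill+ship cycle : ~{cycle_gb_s:.2f} GB/s")       # ~0.5 GB/s
```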


Visualisation

What kinds of visualisation will we need?

This work was transferred from RAL to UiT

Carried out by Bjorn Gustavsson

Learn from other radars

AMISRs already have same problem

Jicamarca has lots of imaging software

Software is open-source and adaptable

Allow users to bring their own routines

Develop an open source library

Make full use of cuts, movies etc.

Don’t try to be too smart

The human brain can only interpret so much!


Risk and Return

Some questions to consider:

Should we provide a data system like this ourselves?

Why not let a commercial data centre do this?

Should we explore other funding sources for this?

The EU e-infrastructures programme

What’s the cost/benefit between the data system and (say) an extra site?

What about metadata and services – almost forgotten so far?


Summary and Conclusions

Based on an analysis of expected performance and data rates of the new radar, we proposed a data system with three distinct elements:

Cyclic buffers (short duration for low-level data)

Interferometry buffer (allows back-spacing to the start of an event)

Permanent Archive (ultimate home of all summary data and centre of user analysis)

Data system also handles supporting instruments

Storage, I/O and other specifications were put on all system elements and discussed with vendors:

Data volumes and rates are challenging, but can already be handled

Appropriate systems are available to provide the functionality we need

Implementation depends on many things (funding, politics, scientific priorities etc.)

These will be better defined during next phase of the project

