TDP calibration and processing group (CPG): activities and status
Athol Kemball (for CPG)
University of Illinois at Urbana-Champaign
US SKA May 2008
Current CPG membership
• Athol Kemball (Illinois) (Chair)
• Geoff Bower (UCB)
• Jim Cordes (Cornell; TDP PI)
• Joe Lazio (NRL)
• Colin Lonsdale (Haystack/MIT)
• Steve Myers (NRAO)
• Jeroen Stil (Calgary)
• Greg Taylor (UNM)
(Map of member institutions: Calgary, Cornell, NRL, UIUC, MIT, NRAO, UCB, UNM)
Calibration and processing challenges for the SKA
(LSST)
SKA-era telescopes & science require:
• Surveys over large cosmic volumes (Ω,z), fine synoptic time-sampling Δt, and/or high completeness
• High receptor count and data acquisition rates
• Software/hardware boundary far closer to receptors than at present
• Efficient, high-throughput survey operations modes
Processing implications
• High sensitivity, Ae/Tsys ~ 10^4 m^2 K^-1, wide-field imaging
• Demanding (t,ω,P) non-imaging analysis
• Large O(10^9) survey catalogs
High associated data rates (TBps), compute processing rates (PF), and PB/EB archives
(HI galaxy surveys, e.g. ALFALFA HI (Giovanelli et al. 2007); SKA requires a billion-galaxy survey.)
SKA design & development phase (2007-2011)
SKA science requirements
SKA technical specifications, e.g. sensitivity, survey speed, time, spectral, and angular resolution, etc.
SKA reference design, e.g. receptor, array configuration, receptor and receivers, signal transport and correlation, cal & processing, data management
Antenna design elements, e.g. receptor, receivers, antenna, etc.
Calibration and processing design elements, e.g. calibration and imaging algorithms, scalability, etc.
Design drivers
• Hardware design choices will define calibration and processing performance (e.g. dynamic range), cost, and feasibility.
• In turn, need to identify calibration and processing constraints on hardware designs.
Calibration and processing design elements
CPG goals
• Determine feasibility of calibration and processing required to meet SKA science goals.
• Determine quantitative cost equation contributions and design drivers as a function of key design parameters (e.g. antenna diameter, field-of-view, etc.)
• Measure algorithm cost and feasibility using prototype implementations.
• Demonstrate calibration and processing design elements using pathfinder data.
US technology development emphases
• Large-N small-diameter (LNSD) parabolic antenna design, wide-band single-pixel feeds, mid to high frequency range.
• Close liaison with current international and national efforts.
(Diagram labels: PrepSKA; CDIT; TDP project; CPG (calibration and processing group); PrepSKA WBS 2)
(ATA)
CPG immediate future steps (November 2007)
• 08/07: Initiate workplan development for CPG. Work ramps up.
• 09/07: PrepSKA collaboration discussions.
• 11/29/07: Face-to-face CPG meeting for in-depth workplan refinement.
• 12/15/07: Finalize CPG project execution plan as part of general TDP project execution plan.
• 01/08: URSI LNSD session.
• 04/08: SKA CALIM conference Perth.
• Q1-Q3/08: Hire CPG postdocs at UCB, MIT, & UIUC.
• …
Status annotations (in slide order): Completed; Completed; Completed; Completed 01/08; Completed; Completed; Completed; In process
CPG activity timeline Oct 07 to present
(Timeline chart; month axis: OCT NOV DEC JAN FEB MAR APR MAY JUN JUL OCT NOV. Entries: CPG formation; Project execution plan; Execution plan implementation; F2F planning meeting (Urbana); URSI 2008 LNSD session; CPG meeting; CPG F2F meeting (Perth); SKA CALIM 08; CPG F2F meeting (Washington DC))
CPG organization and coordination
• Targeted design and development research working group, not production software development for SKA.
• Prototype development will be software-package neutral, i.e. any package allowing the research task can be used.
• Close liaison with current international and national efforts.
• Regular schedule of face-to-face meetings and telecons
• All results aggregated upwards into CPG Project Book; template defined. Intermediate results in memo series.
• Internal mailing list, collaborative workspace, progress tracking wiki, and document repository.
(Internal collaborative workspace: progress tracking and communication)
CPG engagement and partnerships
• Community web-site launched to publicize intermediate results and activities
• Engagement with:
– National pathfinders (e.g. ATA, LWA, MWA, EVLA)
– National center (NRAO)
– Canadian efforts (Russ Taylor/Jeroen Stil [Calgary])
– International pathfinders:
  • MeerKAT
  • ASKAP
– PrepSKA
– Computer science & computer engineering groups
(http://rai.ncsa.uiuc.edu/SKA/RAI_Projects_SKA_CPG.html)
CPG project execution plan
CPG work breakdown structure
• WBS 2.0: General
• WBS 2.1: Signal transport
• WBS 2.2: Calibration algorithms
• WBS 2.3: Imaging, spectroscopy, & time-domain imaging
• WBS 2.4: Scalability, & high-performance computing
• WBS 2.5: RFI
• WBS 2.6: Surveys
• WBS 2.7: Data management
Cross-cutting goals
• LNSD feasibility:
– e.g. dynamic range error budget
• LNSD cost equation contributions (per calibration and processing technology)
OVERALL TDP GOALS
WBS 2 CAL & PROCESSING GOALS
1. Feasibility
2. Cost equation contribution
3. Design driver identification
4. Pathfinder demonstration
(Diagram: matrix of calibration and processing issues (#1, #2, #3, #4, …) against design parameters (e.g. diameter, mid-range upper cutoff frequency, amongst others), with links to PrepSKA/CDIT and SKA pathfinders)
Primary CPG deliverables
• CPG work breakdown structure made up of prioritized calibration and processing technologies that are central to SKA LNSD design.
• Key cross-cutting milestones are feasibility and cost assessments as envelope of design parameters (e.g. antenna diameter) and key science goals.
• Feasibility and cost model release planned annually; successively refined based on research results.
Calibration and processing design elements, e.g. calibration and imaging algorithms, scalability.
Feasibility relative to science goals
Cost equation contributions and design drivers
Prototyping and demonstration with pathfinder data
Feasibility: imaging dynamic range
Reference specifications (Schilizzi et al 2007)
• Targeted λ20cm continuum field: 10^7:1.
• Routine λ20cm continuum: 10^6:1.
• Driven by need to achieve thermal noise limit (nJy) over plausible field integrations.
• Spectral dynamic range: 10^5:1.
• Current typical state of practice near λ ~ 20 cm given below.
(de Bruyn and Brentjens, 2005)

High-sensitivity deep fields
Reference               Field     Telescope  Band     Sensitivity
Richards 2000           HDF       VLA        1.4 GHz  7.5 μJy
Norris et al 2005       HDF-S     ATCA       1.4 GHz  10 μJy
Middelberg et al 2008   ELAIS I   ATCA       1.4 GHz  < 30 μJy
Miller et al 2008       E-CDF-S   {E}VLA     1.4 GHz  6.4 μJy

Dynamic range
Reference                   Source    Telescope  Band     Dynamic range
Noordam et al 1982          3C84      WSRT       1.4 GHz  10,000:1
Geller et al 2000           1935-692  ATCA       1.4 GHz  77,000:1
de Bruyn & Brentjens 2005   Perseus   WSRT       92 cm    400,000:1
de Bruyn et al 2007         3C147     WSRT       1.4 GHz  1,000,000:1
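The link between the thermal-noise target and the dynamic range specification is simple arithmetic, sketched below. The function name and the example flux values are illustrative assumptions, not figures from the slide: a field containing a 100 mJy source imaged to a 10 nJy noise floor would need about 10^7:1.

```python
def required_dynamic_range(peak_flux_jy: float, noise_jy: float) -> float:
    """Ratio of the brightest source in the field to the target image noise."""
    if peak_flux_jy <= 0 or noise_jy <= 0:
        raise ValueError("fluxes must be positive")
    return peak_flux_jy / noise_jy


# Illustrative values only (assumed): 100 mJy source, 10 nJy thermal noise.
dr = required_dynamic_range(peak_flux_jy=0.1, noise_jy=10e-9)
print(f"required dynamic range ~ {dr:.1e}:1")
```

Read against the table above, this would be roughly ten times the best dynamic range listed (1,000,000:1).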
Feasibility: imaging dynamic range budget
Basic imaging equation for radio interferometry (e.g. Hamaker, Bregman, & Sault 1996):
(Equation with labeled terms: visibility on baseline m-n; visibility-plane calibration effect; image-plane calibration effect; source brightness (I,Q,U,V); direction on sky: ρ)
Key contributions
• Robust, high-fidelity image-plane (ρ) calibration:
– Non-isoplanatism.
– Antenna pointing errors.
– Polarized beam response in (t,ω), …
• Non-linearities, non-closing errors
• Deconvolution and sky model representation limits
• Dynamic range budget will be set by system design elements.
(Bhatnagar et al. 2004; antenna pointing self-cal: 12 µJy => 1 µJy rms)
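For orientation, one common schematic way to write an imaging equation with these labeled terms is sketched below. It is not a transcription of the slide's equation; the symbol names are assumptions chosen to match the labels above.

```latex
\vec{V}_{mn} \;=\; \mathsf{M}_{mn} \int \mathsf{M}^{s}_{mn}(\rho)\, \mathsf{S}\, \vec{I}(\rho)\,
  e^{-2\pi i\, \vec{b}_{mn}\cdot\rho}\, d\Omega
```

Here V_mn is the visibility on baseline m-n, M_mn the visibility-plane calibration effect, M^s_mn(ρ) the image-plane (direction-dependent) calibration effect, I(ρ) the source brightness in (I,Q,U,V), S an assumed conversion from Stokes parameters to the measured polarization products, and b_mn the baseline vector in wavelengths.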
(Cornwell et al. 2006: example of 1.4 GHz edge effect at 2% PB level)
Feasibility: dynamic range assessment
SKA dynamic range assessment: beyond the central pixel
• Current achieved dynamic ranges degrade significantly with radial projected distance from field center, for reasons understood qualitatively (e.g. direction-dependent gains, sidelobe confusion, etc.)
• An SKA design with routine uniform, ultra-high dynamic range requires a quantitative dynamic range budget.
• Strategies:
– Real data from similar pathfinders (e.g. ATA, EVLA) are key.
– Simulations are useful if relative dynamic range contributions or absolute fidelity are being assessed with simple models.
– New statistical methods:
  • Assume a convergent, regularized imaging estimator for the brightness distribution within the imaging equation; the sampling distribution of the imaging estimator is needed per pixel, but its PDF is unknown a priori.
  • Statistical resampling (Kemball & Martinsek 2005ff) and Bayesian methods (Sutton & Wandelt 2005) offer new approaches.
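The resampling idea can be illustrated with a plain nonparametric bootstrap. This is a minimal sketch under simplifying assumptions, not the model-based method of Kemball & Martinsek. It treats a single pixel as the mean of independent samples and estimates the variance of that estimator without assuming its PDF. All names and values are hypothetical.

```python
import random
import statistics


def bootstrap_variance(samples, estimator, n_resamples=500, seed=0):
    """Estimate the variance of `estimator` by resampling with replacement."""
    rng = random.Random(seed)
    n = len(samples)
    estimates = []
    for _ in range(n_resamples):
        # Draw n samples with replacement from the observed data.
        resample = [samples[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimator(resample))
    return statistics.variance(estimates)


# Stand-in data (assumed): 200 unit-variance noise samples for one pixel.
data_rng = random.Random(1)
pixel_samples = [data_rng.gauss(0.0, 1.0) for _ in range(200)]

var_hat = bootstrap_variance(pixel_samples, statistics.mean)
print(f"bootstrap variance of the pixel mean: {var_hat:.4f}")
print(f"analytic expectation (sigma^2 / n):   {1.0 / 200:.4f}")
```

Interferometric visibilities are correlated through the calibration and imaging steps, so a realistic application would need to resample in a way that respects that structure; the sketch only shows the per-pixel variance estimate.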
MODEL-BASED BOOTSTRAP RESAMPLING EXAMPLE
(Figure panels: Monte Carlo reference variance image; Np=1, Δt = 60 s; Np=1, Δt = 150 s; Np=1, Δt = 300 s; Np=2, Δt = 900 s)
WBS 2.3.1: Cost equation: wide-field image formation
Algorithm technologies
• 3-D transform (Perley 1999), facet-based tessellation / polyhedral imaging (Cornwell & Perley 1992), and w-projection (Cornwell et al. 2003).
(Cornwell et al. 2003; facet-based vs w-projection algorithms)
WBS 2.3.1: Imaging cost equation contributions
• LNSD data rates (Perley & Clark 2003):
$$\frac{V_{vis}}{\mathrm{TBps}} = \frac{20\, N_{ant}\,(N_{ant}-1)\, N_{chan}}{10^{12}\, \Delta t}$$
[second expression for V_vis/TBps, scaling with N, B, and D; coefficients not recoverable]
where D = dish diameter, B = max. baseline, Δν = bandwidth, and ν = frequency
• Wide-field imaging cost ~ O(D^-4 to D^-8) (Perley & Clark 2003; Cornwell 2004; Lonsdale et al 2004).
• Full-field continuum imaging cost (derived from Cornwell 2004):
[expression for C/PF in terms of N_ant, B, and D; coefficients not recoverable]
• Strong dependence on 1/D and B. Data rates of Tbps and computational costs in PF are readily obtained from underlying geometric terms.
• Spectral line imaging costs exceed continuum imaging costs (further multiplier N_chan)
• Possible mitigation through FOV tailoring (Lonsdale et al 2004), beam-forming, and antenna aggregation approaches (Wright et al.)
– 550 GBps/n_a^2 (Lonsdale et al 2004)
• Runaway petascale costs for SKA tightly coupled to design choices
(Bar charts: data volume in TB per hour (axis 0 to 2000) and peak PF (axis 0 to 10), each for D=12.5m, B=5km; D=12.5m, B=35km; D=6m, B=5km)
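A short sketch of the visibility data-rate relation follows. It assumes the form V_vis/TBps = 20 N_ant (N_ant − 1) N_chan / (10^12 Δt), which is a reading of the slide's equation attributed to Perley & Clark 2003. The function name and the example array parameters are illustrative assumptions, not values from the slide.

```python
def visibility_rate_tbps(n_ant: int, n_chan: int, dt_s: float) -> float:
    """Visibility data rate in TB/s.

    Assumed form: 20 * N_ant * (N_ant - 1) * N_chan / (1e12 * dt),
    with dt the correlator integration time in seconds.
    """
    if n_ant < 2 or n_chan < 1 or dt_s <= 0:
        raise ValueError("need n_ant >= 2, n_chan >= 1, dt_s > 0")
    return 20.0 * n_ant * (n_ant - 1) * n_chan / (1e12 * dt_s)


# Illustrative parameters (assumed): 1500 antennas, 1000 channels, 1 s dumps.
rate = visibility_rate_tbps(n_ant=1500, n_chan=1000, dt_s=1.0)
print(f"{rate:.5f} TB/s, i.e. {rate * 3600:.0f} TB per hour")
```

The rate grows quadratically with antenna count and inversely with integration time, which is why large-N designs with long baselines (short Δt, many channels) drive the data volumes shown in the charts.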
WBS 2.4: Scalability
Inconvenient truths
• Moore’s Law holds, but high-performance architectures are evolving rapidly:
– Breakpoint in clock speed evolution (2004)
– Lateral expansion to multi-core processors and processor augmentation with accelerators
• Theoretical performance ≠ actual performance
• Sustained petascale calibration and imaging performance for SKA requires:
– Demonstrated mapping of SKA calibration and imaging algorithms to modern HPC architectures, and proof of feasible scalability to petascale [O(10^5) processor cores].
– Remains a considerable design unknown in both feasibility and cost.
(Chart: number of processors (axis 0 to 100,000) at 10 TF, 100 TF, and 1 PF)
(Golap, Kemball et al. 2001, Coma cluster, VLA 74 MHz, parallelized facet-based wide-field imaging)
NSF support for open petascale computing
WBS 2.4: Scalability
                       Fastest current NCSA system   Generic petascale
                       (abe.ncsa.uiuc.edu*)          system
Peak performance       0.090 PF                      10-20 PF
Number of processors   9,600                         300,000-750,000
Amount of memory       0.0096 PB                     0.5-1.0 PB
Disk storage           0.10 PB                       25-50 PB
Archival storage       0.005 EB                      0.5-1 EB
(Dunning 2007)
*Abe: Dell 1955 blade cluster
– 2.33 GHz Intel Clovertown Quad-Core
  • 1,200 blades/9,600 cores
  • 89.5 TF; 9.6 TB RAM; 170 TB disk
– Power/Cooling
  • 500 kW / 140 tons
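The O(10^5) core count for petascale follows from the Abe figures above (89.5 TF peak over 9,600 cores). The sketch below is a back-of-envelope estimate; the `efficiency` parameter (sustained fraction of peak) and the function name are assumptions introduced for illustration.

```python
import math

ABE_PEAK_TF = 89.5   # from the Abe footnote
ABE_CORES = 9600     # from the Abe footnote


def cores_needed(target_tf: float, efficiency: float = 1.0,
                 per_core_tf: float = ABE_PEAK_TF / ABE_CORES) -> int:
    """Cores required to reach target_tf at a given fraction of peak."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return math.ceil(target_tf / (per_core_tf * efficiency))


# 1 PF = 1000 TF at Abe-class per-core peak performance.
print(cores_needed(1000.0))                  # at 100% of peak
print(cores_needed(1000.0, efficiency=0.1))  # at an assumed 10% sustained
```

At Abe-class per-core performance, one peak petaflop already needs on the order of 10^5 cores. Any shortfall between theoretical and sustained performance raises the count proportionally.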
WBS 2.4.5: Computing hardware cost models
• Computing hardware system costs vary over key primary axes:
– Time evolution (Moore’s Law)
– Level of commoditization
(Bar chart: cost in $1000 per TF (axis 0 to 300) for GPU 500GF, CPU 100-1000GF, and CPU 100-1000TF)
Commoditization effects in computing hardware cost models for general-purpose CPU and GPU accelerators at a fixed epoch (2007). Estimated from public data.
Moore’s Law for general-purpose Intel CPUs.
Trend-line for Top 500 leading-edge performance.
WBS 2.4.5: Computing hardware cost models
• Predicted leading-edge LINPACK Rmax performance from Top 500 trend-line (from data t_yr = [1993, 2007]):
$$\frac{R_{max}}{\mathrm{TF}} = 0.05555\, e^{0.6217\,(t_{yr} - 1993)}$$
• Cost per unit teraflop c_TF(t), for a commoditization factor η, Moore’s Law doubling time Δt, and construction lead time Δc:
$$c_{TF}(t) = \eta\, c_{TF}(t_0)\, e^{-\ln(2)\,(t - \Delta c - t_0)/\Delta t}$$
[with c_TF(t0) = $300k/TF, t0 = 2007, η = [0.3-1.0], Δt ~ 1.5 yr, Δc ~ 1-4 yr]
(Chart: predicted Rmax (PF), axis 0 to 100, for 2011 to 2016)
(Bar chart: approximate projected costs ($M), axis 0 to 400, for 1 PF (2012), 10 PF (2012), 7.5 PF (2016), and 90.1 PF (2016))
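The two models can be evaluated with a few lines of Python. This sketch assumes the trend-line R_max/TF = 0.05555 exp[0.6217 (t_yr − 1993)] and a cost model of the form c_TF(t) = η c_TF(t0) 2^(−(t − Δc − t0)/Δt). The sign convention and the placement of Δc are a reading of the slide, and the function names are hypothetical.

```python
import math


def predicted_rmax_tf(year: float) -> float:
    """Top 500 trend-line for leading-edge LINPACK Rmax, in TF."""
    return 0.05555 * math.exp(0.6217 * (year - 1993.0))


def cost_per_tf(year: float, eta: float = 1.0, lead_time_yr: float = 1.0,
                c0: float = 300_000.0, t0: float = 2007.0,
                doubling_yr: float = 1.5) -> float:
    """Cost in dollars per TF for hardware bought lead_time_yr before `year`.

    Assumed form: eta * c0 * 2 ** (-(year - lead_time_yr - t0) / doubling_yr).
    """
    return eta * c0 * 2.0 ** (-(year - lead_time_yr - t0) / doubling_yr)


def system_cost_musd(year: float, perf_pf: float, **kwargs) -> float:
    """Approximate hardware cost in $M for a system of perf_pf petaflops."""
    return perf_pf * 1000.0 * cost_per_tf(year, **kwargs) / 1e6


for yr in (2012, 2016):
    print(yr, f"predicted Rmax ~ {predicted_rmax_tf(yr) / 1000.0:.1f} PF")
print(f"1 PF in 2012, eta=0.3: ~${system_cost_musd(2012, 1.0, eta=0.3):.0f}M")
```

The trend-line evaluates to about 7.5 PF for 2012 and about 90.1 PF for 2016, matching two of the system sizes labeled in the cost chart.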
WBS 2.4.5: Related computing cost components
• Facility parameters:
– One PF sustained requires tens of MW; O(10^4) sq. ft.
– Green innovations essential, will likely be mandated in US by law:
  • Current US data centers: 61 billion kWh
  • Will double by 2011; peak 12 GW, $7.4b per year electricity cost
• Software development costs (Boehm et al. 1981):
$$\frac{\mathrm{COST}}{\mathrm{FTE\ month}} = 2.4\,\beta \left(\frac{\text{Lines of code}}{1000}\right)^{1.05}$$
where β ~ ratio of academic to commercial software construction costs (~ 0.3-0.5); can mitigate through re-use (see adjacent)
• LSST computing costs ~25% of project; order of magnitude smaller data rates than SKA (~ tens of TB per night).
NCSA Petascale Computing Facility (20,000 ft^2 machine room; chilled water with free cooling 6/12 months)
(Kemball et al., 2007, “A component-based framework for radio-astronomical imaging software systems”, Software: Practice & Experience, 38 (5), 493-507)
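The software cost relation can be sketched as below, assuming the form COST / (FTE month) = 2.4 β (lines of code / 1000)^1.05 with β as defined above. The function name and the example code size are illustrative assumptions.

```python
def software_effort_fte_months(lines_of_code: float, beta: float = 1.0) -> float:
    """Estimated development effort in FTE months.

    Assumed form: 2.4 * beta * (lines_of_code / 1000) ** 1.05.
    """
    if lines_of_code <= 0 or beta <= 0:
        raise ValueError("lines_of_code and beta must be positive")
    return 2.4 * beta * (lines_of_code / 1000.0) ** 1.05


# Illustrative size (assumed): a 100,000-line package over the quoted beta range.
for beta in (0.3, 0.5):
    months = software_effort_fte_months(100_000, beta=beta)
    print(f"beta={beta}: ~{months:.0f} FTE months")
```

Because the exponent is slightly above one, effort grows a little faster than code size, which is one reason re-use of existing components is listed as a mitigation.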
CPG upcoming activities
• CPG work plan continues per project execution plan.
• Q2-Q3/08: Hire CPG postdocs at MIT, & UIUC.
• 08/08: URSI GA 2008 (presentations and associated CPG meeting)
• 10/08: First release of cost-feasibility LNSD model
• …