Research Life Cycle
[Diagram: research life cycle centred on Data, with stages Plan, Acquire, Analyse, Comprehend, Access/Collaborate, Manage/Archive and Publish/Reuse]
Supporting services shown in the diagram:
• HPC, cloud, virtual labs
• Dataset transfer, databases, web-based file sharing, collaborative sites
• Automated ingest and management
• RDM support
• Technical advice, costing, grant assistance
• Visualisation facilities
• Institutional repository
Emerging Researcher Series
What is eResearch?
Thursday, 9 February 2017
Research Liaison Jason van Rooyen, PhD
Overview
1. Research data and generators
2. Why eResearch?
3. Research challenges
4. Service catalogue
5. Summary
http://datablog.is.ed.ac.uk/files/2013/12/bitsissue8_2.png
Emerging Researcher Series #1 9 Feb 2017
Research Data and Generators
Data origins
Data role-players
• Producers:
  • Student/researcher
  • Facilities
  • Downstream and re-analyses
• Consumers:
  • Lab/group
  • Collaborators
  • Discipline / Community
• Managers:
  • IT
  • Data managers / libraries
  • Administration
• Owners
[Diagram: overlapping circles for Producers (P), Managers (M) and Consumers (C), labelled with researchers, libraries, admin, collaborators, IT, data scientists, platforms/facilities, community]
Data’s value to the community
• Intrinsic value:
  • Evidence
  • IP/commodity
  • Productivity metrics
• Drivers for sharing data:
  • Validation
  • Re-use
  • Publicity
  • Community and journal standards
  • Funding agency mandates
  • Innovation regulations
[Diagram: top-down and bottom-up drivers acting on DATA that is Findable, Accessible, Interoperable, Re-usable]
Types of researchers
• Researchers differ in:
  • Resources
  • Skills and experiences
  • Risk appetites
“Rock Star” researchers
• Scarce
• Well-funded
• Prioritised
• Large skilled teams
• Early adopters / innovators
• Risk takers
• Networked
“Long Tail” researchers
• Under-resourced
• Sometimes isolated
• Cost ($ & minutes) / risk averse
• Abundant
http://vignette1.wikia.nocookie.net/walkerjourn515/images/6/60/Rogers_adoption_curve_deaderick_version.png
Fields / Labs / Groups / Units all differ in capacity
• Types of groups:
  • Academic groups / solitary PIs
  • Facilities
  • Programmes
• Staffing structures:
  • Students
  • Research staff
  • IT/data engineers
  • Programmers
• Infrastructure differences:
  • Desktops
  • Server rooms
[Image: Lee Lab (Jeannie T. Lee, M.D., Ph.D., Professor of Genetics and Pathology, Harvard Medical School): different needs]
Why eResearch?
Why eResearch?
Research Revolution
• Pace and scale increasing
• New tools and methodologies
• Internationalisation
Who is eResearch UCT?
UCT eResearch partners with research groups to accelerate and transform research, connecting them to the most appropriate services to support the research life cycle.
• New research strategy (2015-2025)
• Research life cycle:
  − Forecasting and grant writing
  − Data collection
  − Analysis and computation
  − Publication
  − Data management
  − Sharing & collaboration
  − Profile-raising
ICTS: Engineers, Research support, Project management
Research Office: Communications
Libraries: Digital/Data services, Digital scholarship
Research Challenges
Managing volumes - challenges at scale
• Movement
  • from processing to storage
• Findability and recoverability
  • context (metadata)
• Privacy & access
• Infrastructure
  • Local, central
  • Support, admin
  • Backup, security
• Education in best practice and tools
• Costs
  • Consumer/enterprise, lifecycle
Managing volumes @ UCT
[Chart: storage in TB (axis 0 to 400) over time, 2014/09/18 to 2015/10/23]
• Sources: instruments, processing, collaborations
• 400 TB allocated
• Average allocated vs. provisioned ratio 2:1
• Current rate 40 TB/m provisioned
• 90 TB fast parallel storage on HPC (fhgfs)
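The storage figures above can be turned into a rough capacity estimate. The sketch below is illustrative only: it assumes the 2:1 ratio means two units allocated for every unit physically provisioned, and that "TB/m" means terabytes per month. Both readings, and the function names, are assumptions rather than anything stated on the slide.

```python
def provisioned_from_allocated(allocated_tb, ratio=2.0):
    """Storage physically provisioned for a given allocation.

    Assumes `ratio` is allocated:provisioned (2.0 means 2:1).
    """
    return allocated_tb / ratio


def months_to_close_gap(allocated_tb, provisioned_tb, rate_tb_per_month):
    """Months until provisioned storage matches the current allocation,
    assuming the allocation stays fixed and the rate stays constant."""
    gap_tb = max(allocated_tb - provisioned_tb, 0.0)
    return gap_tb / rate_tb_per_month


allocated = 400.0  # TB allocated (from the slide)
provisioned = provisioned_from_allocated(allocated)  # assumed 2:1 reading
months = months_to_close_gap(allocated, provisioned, 40.0)  # 40 TB per month

print(f"Provisioned: {provisioned:.0f} TB, gap closes in {months:.1f} months")
```

Under these assumptions roughly half of the allocated storage is in use at any time, which is why allocating more than is provisioned can work in practice.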
[Chart labels: Uptake Rate; Storage Provisioned; arceibo (74 TB), CASA (74 TB), SATVI (70 TB); astronomy]
Managing volumes – data deluge
Field          Technique                        Data rate
Geomatics      Laser scanning                   ~ 4 TB / year
Neurosciences  MRI                              ~ 5 TB / year
Biosciences    Next-Gen Sequencing              > 10 TB / year
Biophysics     Direct electron detectors TEMs   > 200 TB / year
               Super-resolution microscopes     > 1 PB / year
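To give a feel for what these annual volumes mean for data movement, the sketch below estimates how long a year of data from each technique would take to move over a network link. The link speeds, the `efficiency` parameter and the function name are hypothetical choices for illustration. Volumes use the quoted bounds as representative values and decimal units (1 TB = 10^12 bytes).

```python
SECONDS_PER_DAY = 86_400


def transfer_days(volume_tb, link_gbps, efficiency=1.0):
    """Days needed to move `volume_tb` terabytes over a `link_gbps` link.

    `efficiency` is the fraction of nominal bandwidth actually achieved
    (1.0 is an idealised upper bound; real transfers are slower).
    """
    if link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_gbps must be > 0 and efficiency in (0, 1]")
    bits = volume_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * efficiency)
    return seconds / SECONDS_PER_DAY


# Annual volumes from the table, in TB (lower bounds where given as ">").
yearly_volumes_tb = {
    "Laser scanning": 4,
    "MRI": 5,
    "Next-Gen Sequencing": 10,
    "Direct electron detectors TEMs": 200,
    "Super-resolution microscopes": 1000,
}

for technique, volume in yearly_volumes_tb.items():
    days_1g = transfer_days(volume, 1.0)    # hypothetical 1 Gbit/s link
    days_10g = transfer_days(volume, 10.0)  # hypothetical 10 Gbit/s link
    print(f"{technique}: {days_1g:.1f} days at 1 Gbit/s, "
          f"{days_10g:.1f} days at 10 Gbit/s")
```

Even at the idealised full line rate, a year of output from the larger instruments occupies a 1 Gbit/s link for weeks to months, which is one reason the transfer path matters as much as the storage.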
Sharing data - challenges
• Managing access permissions
  • Who and how
  • Internal vs. external collaborators
  • Privacy and POPI Act
• Small vs. large
  • Tools
  • Firewalls
  • Bandwidth
• Costs
  • Bandwidth
  • Importance vs. other traffic
  • Licence fees
//researchdata
Sharing data @ UCT
• Data is shared with:
  • Project members
  • Collaborators
    • Internal & external
    • Local & international
  • Journals & repositories
• Using an array of tools
[Chart: terabytes transferred per month (axis 0 to 60), Sep-14 to Nov-16. Legend: ARC Globus Endpoint, heinedej#ARC-Ubuntu, heinedej#H3ABioNet, heinedej#eResearchUCT. Annotation: + 140 TB to Wits]
Efficient analyses - challenges
• Appropriate infrastructure
  • Local, central, cloud
• Staff
  • Time & skills
  • Hardware & systems
  • Training students
• Managing and storing processing results
• Standardizing workflows
• Resourcing / sustainability
  • Costs for start-up
  • Lifecycle & upgrades
  • Seed funding
• Suitability of central compute resources
  • Allocations, system, support
Efficient analyses @ UCT
http://hpc.uct.ac.za/
[Images: 2011 and 2015]
• HPC @ UCT:
  • Inception in 2009
  • 5-fold expansion to 1 450 cores (end 2013) -> exponential increase in usage
  • GPU servers
  • Community has consumed 12 million compute hours
• VMs
  • ± 40
• ARC
  • 15 compute nodes
  • 256 GB of RAM per node
  • 360 processing cores
  • Over 400 TB storage
  • NW - 500 TB object storage
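The ARC figures above allow a quick sizing check for a planned analysis. The sketch below is a back-of-envelope calculation only: it assumes the 360 cores are spread evenly across the 15 nodes and that all RAM on a node is available to jobs. The job size used in the example (4 cores, 64 GB) and the function name are hypothetical.

```python
NODES = 15             # compute nodes (from the slide)
TOTAL_CORES = 360      # processing cores (from the slide)
RAM_PER_NODE_GB = 256  # RAM per node (from the slide)

# Assumption: cores are distributed evenly across nodes.
CORES_PER_NODE = TOTAL_CORES // NODES
TOTAL_RAM_GB = NODES * RAM_PER_NODE_GB
RAM_PER_CORE_GB = RAM_PER_NODE_GB / CORES_PER_NODE


def jobs_per_node(cores_per_job, ram_gb_per_job):
    """Jobs that fit on one node, limited by whichever of cores or RAM
    runs out first."""
    by_cores = CORES_PER_NODE // cores_per_job
    by_ram = RAM_PER_NODE_GB // ram_gb_per_job
    return min(by_cores, by_ram)


print(f"{CORES_PER_NODE} cores/node, {TOTAL_RAM_GB} GB RAM in total, "
      f"{RAM_PER_CORE_GB:.1f} GB RAM per core")

# Hypothetical job needing 4 cores and 64 GB of RAM.
per_node = jobs_per_node(4, 64)
print(f"{per_node} such jobs per node, {per_node * NODES} across the cluster")
```

In this example memory, not core count, is the limiting resource, which is common for memory-hungry analysis workloads.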
Archiving and preserving - challenges
• Deciding what to keep
  • Triage: raw, processed, versions
• Metadata
  • Which metadata to keep
  • How to keep it associated with the data
• Replication vs. backup vs. archiving
  • Best systems
  • Infrastructure
• Sustainability
  • Ownership of data
  • Long-term costs
Publishing data – challenges
• Staying compliant:
  • Agencies / owners
  • Deriving the most value for investors
• Deciding what to share
  • Tension between competitiveness and openness (patents)
• Where to put the data and how to fund it?
• Sharing large data
• Tracking impact, attribution, and proving compliance
Supporting data-intensive research with ICT
• Increased connectivity of researchers
  • Security vs. convenience
  • Policy challenges (data ownership)
• Sustainability
  • Charge model or subsidy
  • Brokerage (connecting to competitors)
  • Seed funding
Image: permabit.com/data-affordability-gap/
Ideal Services for a Modern University
Data management and planning
• Planning assistance
  • Costing
  • DMP team and tool
  • Funder guidelines
  • Data policy
• Acquisition / ingest
  • Tools (iRODS/MyTardis)
  • Support
• Training
• Compliance monitoring
• eRA integration
• Institutional repositories
• Collaboration spaces
Data storage
• Convenient
  • User / IT
  • Easily accessible
  • Shareable
• Applicable
  • Fast HPC
  • Archival
  • Open
• Secure
  • Backups
  • Private
• Scalable
  • Tiered storage
• Affordable
  • Graduated costs
Data movement
• Intuitive
  • Non-sysadmins
• Scale appropriate
  • Convenient
  • One-sided?
• Optimized transfers
  • DMZs
  • Scheduled
• Performance monitoring
• Sustainable service
  • Impact on network
  • Costs
Wikipedia
Data analysis
• Suitable
  • Scale (cores & memory)
  • Flexible (efficiency)
  • Fit for purpose (service, HPC, big data)
• Supported
  • Admin
  • Porting
  • Teaching
• Permit learning
  • Free allocations
  • Suitable rights
• Integrated
  • Storage
  • Sharing services
• Governed
  • Flexible
  • Transparent
  • Accommodation for collaborations and groups
• Shareable
  • Common workflows
  • Group permissions
Enabling Open Data
• Assistance with research data management (RDM):
  • RDM policy
  • Funder guidelines
  • DMPonline
  • Guidelines for depositing data
  • Guidelines for sharing data
• Implementation of preservation infrastructure
  • Preservation of research data via UCT Libraries preservation infrastructure, Archivematica
  • Storage of research data via storage facilities at ICTS
  • Dissemination, access and reuse via UCT online repository
• Institutional repository
Training and education
• Data science
  • HPC
  • Data analytics courses
  • Data carpentry
  • Digital humanities
• Data management
  • Library carpentry
• Scientific software development
  • Software carpentry
• Sysadmins
  • Research IT
  • Storage, network, compute, cloud
Data visualisation
• Interrogation
  • Scale & resolution
  • Immersion & 3D
• Collaboration
• Outreach
[Images: VR, Visualisation wall, Digital Dome]
Summary
Why eResearch?
To accelerate outputs and competitiveness in support of UCT’s research agenda.
How do I get hold of eResearch?
[email protected] or www.eresearch.uct.ac.za
What do eResearch services cost?
Our cost model is available on the website at: http://www.eresearch.uct.ac.za/billing-model
Can staff and students both make use of eResearch services?
Absolutely. If you are a researcher, you can work with eResearch.
Do you work with individual researchers or only communities?
We prefer to work with communities of researchers, because in this way our efforts have the greatest impact for the least cost.
Do you work with Humanities and Social Sciences, or only with Sciences?
We are happy to assist any researcher.
Questions?
https://tinyurl.com/ztoug6s www.eresearch.uct.ac.za