Tieniu TAN
Deputy Secretary-General
Chinese Academy of Sciences (CAS)
29 Mar. 2010, Irvine, USA
The 4th China-US Roundtable on Scientific Data Cooperation
Advanced Cyber-infrastructure for Scientific Data Applications in CAS
Outline
Background
Advanced Cyber-Infrastructure in CAS
Typical Data-Intensive e-Science Applications in CAS
Conclusion
Scientific Data Deluge
Scientists face a data deluge
– Vast volumes of scientific data are captured by large scientific facilities, ubiquitous sensors, new instruments and computer models
Science and engineering research has become increasingly data-intensive
– New scientific opportunities are emerging from increasingly effective data organization, access and usage (NSF, 2007)
Data-Intensive Scientific Discovery: e-Science
The fourth paradigm: data-intensive scientific discovery (Microsoft, 2009)
– A transformed scientific method
e-Science is the synthesis of information technology and science, giving priority to the scientific data lifecycle and data exploration (Jim Gray)
– Data are captured by instruments or generated by simulators
– Data are processed by software
– Information/knowledge is stored in computers
– Scientists analyze databases/files using data management and statistics
China National Scientific Data Sharing Initiatives
The Ministry of Science and Technology (MOST) started implementing the Scientific Data Sharing Program (SDSP) in 2002
– Supporting almost 20 projects to promote scientific data sharing
The National Science & Technology Infrastructure (NSTI) was launched in 2005 by MOST and the Ministry of Finance (Http://www.escience.gov.cn)
– Supporting 38 projects promoting the sharing of science and technology resources, data and information, and Open Access
– Total funding ~2 billion RMB
High Speed Network
– CSTNET
– CSTNET-CNGI
– GLORIAD
1. Field observation stations
2. Large scientific facilities
3. Others
Advanced CI for Data Lifecycle in CAS
[Figure: data lifecycle stages (Generation & Collection, Transmission, Computing & Analysis, Storage & Curation, Application) linked by an information stream]
Data Centers: storage & preservation, curation, sharing and service
Supercomputing Grid: computing, analysis, mining, visualization
Data-intensive e-Science applications
Data Generation
Large scientific facilities produce huge volumes of data
– 20+ in operation
– 20+ under construction
Long-term field observation stations
– 100+ stations covering ecology, environment, space, etc.
Other research data, including experiments, modeling, computing, etc.
– 100 institutes, more than 50,000 researchers in CAS
Networked Field Observation
The network has been expanded to link field observations
– Real-time data collection
CERN (Chinese Ecosystem Research Network)
Disaster and environment observation
Astronomy and space observation
Meridian Space Weather Monitoring Program
More than 10 TB of data will be generated and transmitted to Beijing per year
Data analysis needs 20 Tflops
A data system and processing infrastructure is being built
Cosmic-Ray Observatory: ARGO/AS
Cosmic-ray observatory at Yangbajing (YBJ) in Tibet:
– ARGO: China-Italy
– AS: China-Japan
~200 TB of raw data per year
Data are transferred from YBJ-ARGO and processed at IHEP and INFN
Reconstructed data are accessible to collaborators
BEPCII / BESIII
BEPC: Beijing Electron-Positron Collider
– Upgrade: BEPCII/BESIII, operational in 2008
– 2.0 ~ 4.6 GeV/c
– Luminosity (3~10)×10^32 cm^-2 s^-1
– 36 institutions from China, the US, Germany, Russia and Japan
– 4000+ kSI2K for data processing and physics analysis
– 5+ PB in five years
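The data volumes quoted on the Meridian, ARGO/AS and BESIII slides can be translated into the average network rate needed to move them, which is one way to read them against the CSTNET link capacities. A minimal sketch, assuming decimal units (1 TB = 10^12 bytes), a 365-day year and a perfectly uniform transfer with no protocol overhead; treating BESIII's "5+ PB in five years" as 1 PB/year is also an assumption (a lower bound).

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s; assumes a 365-day year

TB = 1e12  # decimal terabyte (assumption)
PB = 1e15  # decimal petabyte (assumption)


def average_rate_mbps(bytes_per_year: float) -> float:
    """Average sustained rate in Mb/s (decimal megabits per second) needed
    to move bytes_per_year evenly over one year, ignoring protocol overhead."""
    return bytes_per_year * 8 / SECONDS_PER_YEAR / 1e6


# Annual volumes from the slides; BESIII's 5+ PB over five years is
# treated as 1 PB/year, i.e. a lower bound.
sources = {
    "Meridian (10 TB/yr)": 10 * TB,
    "ARGO/AS (~200 TB/yr)": 200 * TB,
    "BESIII (5 PB / 5 yr)": 5 * PB / 5,
}

if __name__ == "__main__":
    for name, volume in sources.items():
        print(f"{name}: {average_rate_mbps(volume):.1f} Mb/s")
```

Under these assumptions Meridian averages about 2.5 Mb/s, ARGO/AS about 51 Mb/s and BESIII about 254 Mb/s. Real provisioning needs headroom for bursts, retransmission and reprocessing, so these figures are floors rather than targets.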
Data Transmission: High Speed Network
China Science and Technology Network (CSTNet)
A non-profit academic and research network in China supporting advanced science applications and research on the next-generation Internet
Connects some 200 institutes and 1,000,000 end users
[Figure: CSTNET backbone map. Nodes include Beijing, Tianjin, Shijiazhuang, Shanxi, Shenyang, Changchun, Harbin, Dalian, Qingdao, Nanjing, Shanghai, Hefei, Ningbo, Fuzhou, Wuhan, Changsha, Guangzhou, Shenzhen, Guiyang, Kunming, Xishuangbanna, Chengdu, Xi'an, Lanzhou, Xining, Xinjiang, Lhasa, Yangbajing; also shown: Hong Kong (1 Gb/s), Taiwan. Link legend: 2.5 Gb/s, 155 Mb/s, < 155 Mb/s]
Interconnecting with Other Networks
[Figure: CSTNET interconnections with GLORIAD, Russia, the Netherlands, the USA, KISTI (Korea), NICT (Japan), AS (Hong Kong), Google (Hong Kong), HKIX (Hong Kong), CUHK (Hong Kong), HKOEP, China169 (China Unicom), ChinaNet (China Telecom), CERNET, BJ NAP and the Internet; link capacities from 155 Mb/s to 10 Gb/s, including a 10 Gb/s international link; domestic nodes include Beijing, Shanghai, Jilin, Liaoning, Guangzhou, Lanzhou and Xinjiang]
100+ institutes, 40+ field stations and big science facilities, computing facilities and storage facilities
CSTNET-CNGI
An IPv6 network for science based on CSTNET; construction will start this year
[Figure: CSTNET-CNGI nodes including Chengdu, Xi'an, Kunming, Wuhan, Hefei and Nanjing]
Data Storage and Curation
A general scientific data center
– Common data infrastructure construction and operation
– Data archiving and preservation
Several domain-specific scientific data centers
– Discipline data curation and sharing services
A CAS scientific data application project
– Multi-discipline data sharing and applications
A series of domain-based scientific data sharing systems and institute-level data sharing infrastructure
Data Resource Center
A general scientific data center
A new organization responsible for data preservation, curation and access services in CAS
[Figure: Data Resource Center services: mass data backup; online data service; mass data analysis and processing; long-term preservation of important data; technology service (network storage space, system environment); application service (mass data management system); collaborators; staff]
Massive Storage System in the Data Resource Center
Massive storage system
– Scientific data archive system (5 PB tape)
– Online data storage system (1 PB disk array)
Internet-based service (cloud service)
– Data backup
– Archiving and curation
– Online data access and analysis
Domain-Specific Scientific Data Centers
World Data Centers (World Data System) in CAS
– Natural Resource Environment Data Center
– Astronomy Data Center
– Space Data Center
– Geophysics Data Center
– Glacier and Frozen Earth Data Center
Scientific Databases (SDB)
A long-term mission funded by CAS since 1986
– Data from research, for research
Collecting multi-discipline research data and promoting data sharing
– More than 350 research databases and 400 datasets from 61 institutes
– Over 60 TB of data available for open access and download
http://www.csdb.cn
Scientific Databases (cont.)
8 resource databases
– Geo-Science
– Biodiversity
– Chemistry
– Astronomy
– Space Science
– Microbiology and virus
– Material science
– Environment
2 reference databases
– China Species
– Compound
4 application-oriented databases
– High Energy (ITER)
– Western Environment Research
– Ecology research
– Qinghai Lake Research
CAS Scientific Data Grid
[Figure: layered architecture: scientific data and databases; Scientific Data Grid middleware; Scientific Data Grid applications; Bioscience, Geosciences, Chemistry and other gateways]
Integrating distributed scientific data into a comprehensive service and application environment
Linking all data centers as a data net
Scientific Computing Grid
[Figure: local/remote users access through the network; resource abstracting, cooperation and resource interconnection; other network resources and environment (database, e-Science, ARP, website, science, TRP); CNGRID & environment; supercomputing grid; all supported by an application service and technical supporting system, uniform system operation, supporting & service, and uniform regulations]
– SCCAS: 120+ Tflops computing capacity
– 8+ branches: 50 Tflops common computing capacity
– Institute computing resources: 50 Tflops common computing capacity
– Lenovo 7000, peak: 143 TFLOPS
Scientific Computing Grid (cont.)
[Figure: users and administrators reach HPC systems, clusters, workstations and storage through Windows/Linux clients and a web portal, via grid middleware]
HEP Grid in China
Access to LHC data for scientific research: a grid computing system has been built in CAS
WLCG MoU signed with CERN in 2006 to build a Tier-2 center at IHEP for both the ATLAS and CMS experiments
[Figure: participating sites IHEP, PKU, SDU, USTC, NJU]
Tier-2 Site at IHEP
WLCG site based on EGEE/gLite
Associated with CC-IN2P3 in Lyon
Worker nodes with 1600 cores
400 TB disk space
Typical Data-Intensive e-Science Applications
Developing a series of pilot e-Science applications
– Most are data intensive
HEP Grid Applications: ATLAS MC Study
[Figure panels: pT > 20 GeV/c tracks; ttH(2l2b4j2) full simulation event display; ttH-2L selection; ttbar mimicking ttHWW]
HEP Grid Application: Protein Prediction
Explore the non-natural protein sequence space
Set up a massive protein structure prediction environment
Develop web tools for the biology community
Result of the EUChinaGrid project (an EU FP6 project)
[Figure: Rosetta structure prediction, early/late stage]
KWCWPFASHNDLKVQSQWYVEPPDTIPPYNKYGTNFIKHCQYIAHMQGDTHFFNRVRMHQLWKIIVDCAY
ChinaFLUX
Built in 2002 for climate change and environment research
[Figure: data system comprising the observation system, data transmission, and modeling and visualization]
ChinaFLUX e-Science Environment
Real data flow from sensors to field stations, then to institutes, and finally to data centers for processing and sharing
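The three-hop flow described above (sensors to field stations, then institutes, then data centers) can be sketched as a small pipeline. This is a hypothetical illustration, not ChinaFLUX's actual software: the `Reading` type, the function and class names, the finite-value quality check and the per-sensor averaging at the institute tier are all assumptions, chosen only to show where collection, processing, and archiving/sharing happen.

```python
import math
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Reading:
    station: str  # hypothetical field-station identifier
    sensor: str   # hypothetical sensor name
    value: float


def field_station(station: str, raw: Iterable[Tuple[str, float]]) -> List[Reading]:
    """Tier 1 (sensors -> field station): tag raw sensor values with the station ID."""
    return [Reading(station, sensor, value) for sensor, value in raw]


def institute(readings: Iterable[Reading]) -> Dict[Tuple[str, str], float]:
    """Tier 2 (field station -> institute): drop non-finite values (a stand-in
    quality check) and average the remaining values per (station, sensor)."""
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for r in readings:
        if math.isfinite(r.value):
            groups[(r.station, r.sensor)].append(r.value)
    return {key: mean(values) for key, values in groups.items()}


@dataclass
class DataCenter:
    """Tier 3 (institute -> data center): archive processed products and share them."""
    archive: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def ingest(self, products: Dict[Tuple[str, str], float]) -> None:
        self.archive.update(products)

    def query(self, station: str) -> Dict[str, float]:
        # Return every archived sensor product for one station.
        return {sensor: v for (st, sensor), v in self.archive.items() if st == station}


if __name__ == "__main__":
    raw = [("co2_flux", 1.0), ("co2_flux", 3.0), ("co2_flux", float("nan"))]
    center = DataCenter()
    center.ingest(institute(field_station("station-A", raw)))
    print(center.query("station-A"))  # {'co2_flux': 2.0}
```

Keeping each tier as a separate function mirrors the separation of collection, processing and sharing in the architecture, so each step could run on a different host without changing the others.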
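Cyberinfrastructure for Data Collection
[Figure: ChinaFLUX data collection infrastructure.
– Integrated research center (Beijing Ecological Network Synthesis Center): servers, visualization display, storage devices (archive, online storage); software tools for instrument status monitoring and anomaly alarms, data management and data processing; ChinaFlux data service portal
– Field stations connected over the next-generation Internet and Internet access, with VPN links: Changbaishan, Inner Mongolia, Yucheng, Qianyanzhou, Dinghushan, Ailaoshan, Damxung and Haibei stations; base
– At each station: flux tower, cameras, wireless transmission equipment, servers; short-range and long-range wireless links
– Sensor level: distributed and concentrated sensor monitoring points, wireless sensor network monitoring points, wireless network sensors, wireless network terminal receivers, concentrators/switches]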
Data-Intensive Application Environment
– Data synthesis and integration
– Data analysis and modeling
– Visualization
Conclusion
[Figure: Open Science Cloud
– IaaS: network service, computing service, storage service, …
– PaaS: data-intensive application environment, …
– SaaS: software and tools for data curation, analysis, mining and visualization, …
– DaaS: scientific data and databases service]
Building an Open Science Cloud serving not only CAS researchers but also the wider scientific community!
Thank you!