Date posted: 25-Dec-2015
Scientific Data Infrastructure in CAS
Dr. Jianhui Li ([email protected])
Scientific Data Center
Computer Network Information Center
Chinese Academy of Sciences
Scientific Data Infrastructure
• Middleware (scientific data grid middleware, Internet-based storage service middleware…)
• Scientific databases
• Massive storage system and data-intensive computing facilities
• High-speed network
• Application-enabled environments and typical applications
• Software and toolkits (scientific data collection, curation, and publishing; data analysis and visualization…)
DRC: Data Resource Center
• A new organization responsible for data preservation, curation, and access services in CAS
Figure: services and environment of the Data Resource Center. Labels: mass data backup; online data service; mass data analysis and processing; long-term preservation of important data; technology service; network storage space; system environment; application service; mass data management system; collaborators; staff
Infrastructure for DRC
• High-speed network
  – 2 Gbps link to CSTNET
  – 2 Gbps link to CSTNET-CNGI
  – GLORIAD
• Data-intensive computing facilities
  – Clusters with ~1000 CPU cores + Scientific Computing Grid (~200 Tflops)
• Massive storage system
  – 1 PB online disk + 5 PB tape
  – Construction of a storage network will start this year
    • 1 center + 1 archive center + 10 storage nodes around China
    • Over 20 PB
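To give a sense of the scale these figures imply, the sketch below estimates how long it would take to move a dataset over one network link. It uses the 2 Gbps link rate above and the ~60 TB open-access collection cited for the Scientific Databases. The assumptions are optimistic and are not taken from the slides: decimal units (1 TB = 10^12 bytes), full use of the link, and no protocol overhead. Real transfers would take longer.

```python
# Back-of-envelope transfer time over a single network link.
# Assumptions (not from the slides): decimal units (1 TB = 10**12 bytes),
# the full link rate is usable, and protocol overhead is ignored.

def transfer_hours(data_tb: float, link_gbps: float) -> float:
    """Ideal hours needed to move data_tb terabytes over a link_gbps link."""
    bits = data_tb * 10**12 * 8            # terabytes -> bits
    seconds = bits / (link_gbps * 10**9)   # bits / (bits per second)
    return seconds / 3600


if __name__ == "__main__":
    # ~60 TB open-access collection over one 2 Gbps link
    print(f"{transfer_hours(60, 2):.1f} hours")
```

Even under these ideal conditions the result is roughly 67 hours (about 2.8 days). This helps explain why both bandwidth and distributed storage nodes appear in the infrastructure plan.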
Scientific Databases (SDB)
• A long-term mission, started in 1986 and funded by CAS
  – Many institutes involved
  – Long-term, large-scale collaboration
  – Data from research, for research
• Collecting multidisciplinary research data and promoting data sharing
  – More than 350 research databases and 400 datasets from 61 institutes
  – Over 60 TB of data available for open access and download
http://www.csdb.cn
Scientific Databases (cont.)
• SDB contents
  – Physics & Chemistry, Geosciences, Biosciences, Atmospheric & Ocean Science, Energy Science, Material Science, Astronomy & Space Science
  – Share by discipline (chart): Geoscience 43%, Bioscience 18%, Chemistry 9%, ICT 6%, Physics 6%, Ocean 5%, Material 5%, Space 4%, Energy 3%, Astronomy 1%
Scientific Databases (cont.)
• Database integration
  – Resource database
  – Reference database
  – Application-oriented database
Figure: research databases integrated into resource, reference, and application-oriented databases
Scientific Databases (cont.)
• 8 resource databases
  – Geoscience
  – Biodiversity
  – Chemistry
  – Astronomy
  – Space science
  – Microbiology and virus
  – Material science
  – Environment
• 2 reference databases
  – China Species
  – Compound
• 4 application-oriented databases
  – High Energy (ITER)
  – Western Environment Research
  – Ecology research
  – Qinghai Lake research
CAS Scientific Data Grid
• Based on the Scientific Data Grid middleware (SDG)
  – SDG is built on top of the Scientific Databases. It lets users find and access large-scale, distributed, heterogeneous scientific data uniformly and conveniently, in a secure and appropriate way
• Building scientific data application grids according to domain requirements
  – Integrating distributed data, analysis tools, and storage and computing facilities behind a uniform data service interface
  – 4 pilot grids
    • Bioscience grid
    • Geoscience grid
    • Chemistry grid
    • Astronomy and space science grid
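To make the idea of a "uniform data service interface" concrete, here is a minimal sketch of a federated search. It assumes each data node exposes the same `query` method and that a gateway fans a request out to every registered node. `DataNode`, `InMemoryNode`, `Gateway`, and the sample records are all hypothetical; this is not the SDG middleware API. The sketch also omits authentication and access control, which the real middleware provides.

```python
from dataclasses import dataclass, field
from typing import Protocol


class DataNode(Protocol):
    """Hypothetical uniform interface every data node exposes to the grid."""
    name: str

    def query(self, keyword: str) -> list[dict]: ...


@dataclass
class InMemoryNode:
    """Toy node standing in for one heterogeneous scientific database."""
    name: str
    records: list[dict] = field(default_factory=list)

    def query(self, keyword: str) -> list[dict]:
        # Case-insensitive match on the record title.
        kw = keyword.lower()
        return [r for r in self.records if kw in r["title"].lower()]


@dataclass
class Gateway:
    """Hypothetical gateway that sends one query to all registered nodes."""
    nodes: list[DataNode] = field(default_factory=list)

    def register(self, node: DataNode) -> None:
        self.nodes.append(node)

    def search(self, keyword: str) -> list[dict]:
        results = []
        for node in self.nodes:
            for record in node.query(keyword):
                # Tag each hit with its source so users know where it lives.
                results.append({**record, "source": node.name})
        return results
```

For example, registering a "geo" node holding `{"title": "Soil moisture 2010"}` and a "bio" node holding `{"title": "Soil microbes"}` lets `Gateway.search("soil")` return both records, each tagged with its source node. The caller does not need to know how either node stores its data.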
Function Framework of SDG
• A scalable, integrated data-sharing environment
  – Providing services for grid users, grid managers, and resource providers
  – Operated through the operation center, science gateways, and data nodes
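Figure (translated from Chinese): roles and services in the SDG function framework
Roles: end users (services received); data resource providers and grid managers (responsibilities)
Grid operation service center: data navigation; data query and retrieval; user registration; single sign-on; discipline application portal; monitoring and statistics; policy, standard, and specification management; grid organization management; data, storage, service, user, and operations management; monitoring and statistical analysis; operation service center portal
Grid main nodes: data query and retrieval; discipline applications; single sign-on; monitoring and statistics; discipline standard and specification management; data, user, service, and operations management; monitoring and statistical analysis; thematic database portal; data quality assurance; data service maintenance
Grid nodes: data query and retrieval; discipline applications; single sign-on; application consulting services; hardware resource management; data service management; data growth and maintenance; data quality management; data-based grid applications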
Figure: SDG architecture. Roles: user, grid manager, resource provider. Components: operation center, science gateway, data node; science gateway and access portal; software tools; grid middleware; research databases; resource databases; reference databases; application-oriented databases; external data sources; access to the Scientific Data Grid
VisualDB: Powering your database
• A toolkit to manage, publish, and share scientific databases through a visual configuration interface, without writing code
• A database integration access broker
• A data quality assessment tool
• A database access and usage statistics tool
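The slides do not describe how VisualDB's data quality assessment works. As an illustration only, here is one simple and widely used quality metric: per-field completeness, meaning the fraction of records with a non-empty value in each field. The function name and the sample field names are hypothetical. A real assessment tool would likely check further dimensions as well.

```python
def completeness(records: list[dict], fields: list[str]) -> dict[str, float]:
    """Fraction of records with a non-empty value for each field.

    Hypothetical metric for illustration; None and "" count as missing,
    while values such as 0 count as present.
    """
    if not records:
        return {f: 0.0 for f in fields}
    scores = {}
    for f in fields:
        filled = sum(1 for r in records if r.get(f) not in (None, ""))
        scores[f] = filled / len(records)
    return scores
```

For instance, with four station records where one station name is blank and two temperature values are missing, this reports `station` as 0.75 and `temp` as 0.5 complete.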
Application-enabled environments and typical applications
• Domain-specific, data-intensive application environments
  – Support one specific research area
  – Integrate scientific data, storage, computing, analysis models, and tools
  – Easy-to-use, friendly interactive interface
  – Scalable, user-defined data processing workflows
• Typical pilot systems
  – Remote sensing data on-demand access and processing service environment
  – CFCI: China FLUX Cyber-Infrastructure
  – DarwinTree: molecular data analysis and application environment
  – Atmospheric science data integration analysis platform
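One common way to support "scalable, user-defined data processing workflows" is to model the workflow as a directed acyclic graph of steps and run the steps in dependency order. The sketch below does this with Python's standard `graphlib` module (Python 3.9+). `run_workflow` and the example steps are hypothetical, and the slides do not say that the pilot systems use this approach.

```python
from graphlib import TopologicalSorter
from typing import Callable


def run_workflow(steps: dict[str, Callable[[dict], object]],
                 deps: dict[str, set[str]]) -> dict[str, object]:
    """Run user-defined steps in dependency order.

    steps maps a step name to a function that receives the results of all
    steps run so far; deps maps a step name to the steps it depends on.
    """
    sorter = TopologicalSorter()
    for name in steps:
        # Register every step, even ones with no dependencies.
        sorter.add(name, *deps.get(name, ()))
    results: dict[str, object] = {}
    for name in sorter.static_order():   # raises CycleError on cycles
        results[name] = steps[name](results)
    return results
```

A three-step chain might be load → clean → mean, declared as `deps = {"clean": {"load"}, "mean": {"clean"}}`. Because ordering comes from the declared dependencies rather than from a fixed script, users can add or rearrange steps without rewriting the runner.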
Atmospheric science data integration analysis platform
• Status quo
Figure: current, iterative workflow of atmospheric scientists and researchers
  – Scientific data storage: Web Service, SRB, FTP, HTTP
  – Data access: NCL, Matlab, CDO
  – Data preprocessing: NCL, Matlab, CDO
  – Data computing: NCL, Matlab, CDO
  – Data analysis: NCL, Matlab, CDO
  – Result output: data visualization, result data
Atmospheric science data integration analysis platform
• Problems
  – Atmospheric data has reached the TB scale and is distributed across sites
  – The disk and memory capacity of personal computers limits research work
  – Many algorithms developed by scientific researchers cannot be shared easily
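The memory limit above arises when a whole dataset must be loaded at once. A standard workaround is chunked (streaming) processing: read the data piece by piece and keep only running totals. The sketch below computes a mean this way. `read_chunks` is a hypothetical stand-in for reading a large file in blocks, and nothing here is claimed to be the platform's actual implementation.

```python
from typing import Iterable, Iterator


def read_chunks(values: Iterable[float], chunk_size: int) -> Iterator[list[float]]:
    """Yield fixed-size chunks; stands in for reading a large file piece by piece."""
    chunk: list[float] = []
    for v in values:
        chunk.append(v)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # final, possibly shorter chunk
        yield chunk


def streaming_mean(chunks: Iterable[list[float]]) -> float:
    """Mean over all chunks while holding only one chunk in memory at a time."""
    total, count = 0.0, 0
    for chunk in chunks:
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count
```

For example, `streaming_mean(read_chunks(range(1, 101), 7))` returns 50.5 while holding at most seven values at a time. The same pattern applies to sums, counts, and minima/maxima. Statistics such as the median need other techniques.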
Scientific Data Analysis Online Platform
Figure: iterative researcher workflow on the online platform. Labels: web browser ((1) customize, (2) visualize); algorithm selection; data finding; distributed data; algorithm model; define workflow, combined with data and model; computing for the workflow; results
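The platform diagram pairs a chosen algorithm with found data, which addresses the earlier problem that researchers' algorithms are hard to share. A minimal way to picture this is a shared registry in which algorithms are published once under a name and then combined with any dataset on request. The registry, the decorator, and the `anomaly_max` algorithm below are hypothetical illustrations, not the platform's interface.

```python
from typing import Callable

Algorithm = Callable[[list[float]], float]

# Hypothetical shared registry: researchers publish named algorithms once,
# and other users pick them by name instead of copying scripts around.
ALGORITHMS: dict[str, Algorithm] = {}


def register(name: str) -> Callable[[Algorithm], Algorithm]:
    """Decorator that adds a function to the shared registry under `name`."""
    def wrap(func: Algorithm) -> Algorithm:
        if name in ALGORITHMS:
            raise ValueError(f"algorithm {name!r} already registered")
        ALGORITHMS[name] = func
        return func
    return wrap


@register("anomaly_max")
def anomaly_max(series: list[float]) -> float:
    """Largest absolute deviation from the series mean (illustrative algorithm)."""
    mean = sum(series) / len(series)
    return max(abs(x - mean) for x in series)


def run(algorithm: str, dataset: list[float]) -> float:
    """Combine a chosen algorithm with chosen data and return the result."""
    return ALGORITHMS[algorithm](dataset)
```

For example, `run("anomaly_max", [1.0, 2.0, 6.0])` returns 3.0. A server-side registry like this also keeps computation next to the distributed data, so the browser only needs to handle customization and visualization.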
Architecture