Community Accessible Datastore of High-Throughput Calculations: Experiences from the Materials Project
Dan Gunter, Shreyas Cholia, Anubhav Jain, Michael Kocher, Kristin Persson, Lavanya Ramakrishnan, Shyue Ping Ong, Gerbrand Ceder
BACKGROUND
November 12, 2012 Slide 1
Our energy future relies on the rapid development of novel functional materials.
But it takes almost twenty years to develop new materials. How can we do it faster?
Solar cells, advanced batteries, TCOs, and fuel cells will all play a role in our energy future.
Materials Genome Initiative
June 2011: Materials Genome Initiative, which aims to “fund computational tools, software, new methods for material characterization, and the development of open standards and databases that will make the process of discovery and development of advanced materials faster, less expensive, and more predictable”
Source: "Materials Genome Initiative for Global Competitiveness" http://www.whitehouse.gov/sites/default/files/microsites/ostp/materials_genome_initiative-final.pdf
It's the data, stupid!
Figure: diagram with the labels "Really hard work on some computations", "Fantastic paper in a journal", "data", "Black Hole", "Drink margaritas", "DB", "Brilliant analysis", and "Escape velocity?"
Very specialized skill-set
Physics
Deep dive on specific software
Computer Science
Really hard work on some computations
Example
Predicted and measured performance of Li9V3(P2O7)3(PO4)2 during cell cycling.
The Materials Project used quantum chemistry calculations to screen over 20,000 materials as potential cathodes for Li-ion batteries. From the results, three new materials were identified and tested, and they currently have patents pending.
COMPONENTS
Parallel computation
Parallel HPC resources
Datastore
Data dissemination
Collaborative tools
Web server
Analysis library
Science apps
Data V&V
Midrange compute resources
Workflow
HPC storage
Data
Data analytics
NoSQL Datastore
• Powerful but simple query language
• Ease of administration
• Good performance on read-heavy workloads where most of the data can fit into memory
• Poor performance at huge scale
• Bad for write-heavy workloads
FireWorks workflow engine
• Programmability: scripting, not GUIs and DSLs
• Administration overhead: no extra servers
• Flexibility: DB support, reconfiguring running workflows
• Re-runs, detours, duplicates, iteration
Why?!
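The four cases named on the slide (re-runs, detours, duplicates, iteration) can be illustrated with a toy scheduler. This is a minimal sketch and not the FireWorks API: `run_workflow`, `spec_key`, `toy_relax`, and the `detours` key are hypothetical names chosen for illustration, and the in-memory queue only stands in for whatever database-backed queue a real engine would use.

```python
import json
from collections import deque


def spec_key(spec):
    """Canonical string for a job spec, used to recognize duplicates."""
    return json.dumps(spec, sort_keys=True)


def run_workflow(initial_specs, task, max_attempts=2):
    """Run `task` on each spec, handling the four cases from the slide."""
    queue = deque(initial_specs)
    results = {}   # spec key -> result of the finished job
    attempts = {}  # spec key -> number of times the job was started
    while queue:
        spec = queue.popleft()
        key = spec_key(spec)
        if key in results:
            continue  # duplicate: an identical spec has already finished
        attempts[key] = attempts.get(key, 0) + 1
        try:
            outcome = task(spec)
        except RuntimeError:
            if attempts[key] < max_attempts:
                queue.append(spec)   # re-run: retry the failed job later
            else:
                results[key] = None  # give up after max_attempts
            continue
        results[key] = outcome.get("result")
        # detour / iteration: follow-up jobs run before the remaining queue
        queue.extendleft(reversed(outcome.get("detours", [])))
    return results


def toy_relax(spec):
    """Hypothetical job that 'converges' only once it is given 40 steps."""
    if spec["steps"] < 40:
        retry = dict(spec, steps=spec["steps"] * 2)
        return {"result": "not converged", "detours": [retry]}
    return {"result": "converged"}
```

Submitting the same spec twice, for example `run_workflow([{"formula": "Fe2O3", "steps": 10}] * 2, toy_relax)`, runs the job at 10, 20, and 40 steps and skips the second copy as a duplicate.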
Dissemination with REST
https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy
• Preamble: https://www.materialsproject.org/rest
• Version: v1
• Application: materials
• I.D.: Fe2O3
• Datatype: vasp
• Property: energy
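As a rough illustration of this URL layout, the sketch below splits the example URL into its six parts and reassembles it. It assumes the six labels on the slide name the URL's parts in order; `RestRequest`, `parse_rest_url`, and `build_rest_url` are hypothetical helper names, not part of any Materials Project client library, and no network request is made.

```python
from typing import NamedTuple
from urllib.parse import urlsplit


class RestRequest(NamedTuple):
    """The six URL parts, in the order the slide labels them."""
    preamble: str
    version: str
    application: str
    identifier: str
    datatype: str
    prop: str


def parse_rest_url(url: str) -> RestRequest:
    """Split a URL of the form <preamble>/<version>/<app>/<id>/<datatype>/<property>."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) != 6 or segments[0] != "rest":
        raise ValueError(f"unexpected URL layout: {url}")
    _, version, application, identifier, datatype, prop = segments
    preamble = f"{parts.scheme}://{parts.netloc}/rest"
    return RestRequest(preamble, version, application, identifier, datatype, prop)


def build_rest_url(request: RestRequest) -> str:
    """Join the six parts back into a single URL."""
    return "/".join(request)


example = "https://www.materialsproject.org/rest/v1/materials/Fe2O3/vasp/energy"
request = parse_rest_url(example)
```

Swapping a single field, for instance `request._replace(identifier="Fe3O4")`, and passing the result to `build_rest_url` yields a URL for a different material under the same assumed layout.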
Web UI
3-D model of unit cell
Disqus comment button
Detailed structure
X-ray diffraction pattern (interactive)
Band structure and density of states (interactive)
Calculation iterations
Comments
WE'RE DOING IT WRONG?
Running on HPC
• Batch queues and large numbers of jobs with unpredictable runtimes
• Talking to the database
Data analytics
• Scaling community contributions to code
• Scaling analytic functions
Data V&V
• Loading new data into a production resource
• Constant validation and verification
Data dissemination
• Security and privacy
• Query performance
FUTURE WORK
Opening up data access
Figure: diagram with steps (a) to (f) and the labels "Materials Project", "Source ideas", "Compute properties", "Stability and synthesis", "User sandboxes", "MP Workflow", "pymatgen", and "MP datastore"
Towards materials design
Questions?