Opportunities in Materials Informatics
Dane Morgan
University of Wisconsin, Madison
[email protected], W: 608-265-5879, C: 608-234-2906
Fisher Barton Technology Center
Watertown, WI
December 7, 2015
What is Materials Informatics?
Materials informatics is a field of study that applies the tools and principles of information extraction from data (informatics) to materials science and engineering to better understand the use, selection, development, and discovery of materials.
– Mining for materials information in large data sets
– Applying new information technologies to enable new materials science
What Are Materials Informatics Applications?
Related buzzwords: Data science, data analytics, data mining, knowledge discovery, machine learning, artificial intelligence, deep learning, big data …
• Interpolation/Extrapolation/Correlation of Data – determine controlling factors, fill in what is missing, optimize
• Design of Experiments – Perform experiments in optimal order to achieve your goal
• Clustering (Feature Extraction) – group like things together, either supervised or unsupervised
• Image Recognition – identify things in pictures and analyze them
• Optimization – find the optimal solution in complex spaces
• Text Mining – Extract data from published documents, web
Associated Infrastructure: Cloud computing, high-performance computing clusters, high-throughput/combinatorial experiment+computation, …
Materials Informatics Has a Strong History
Mendeleev 1871
Ashby map
Turning Point for Materials Informatics
Data Availability, Data Production, Informatics Tools
[Figure: predicted log k* (cm/s), axis -6 to -3, vs. energy above hull (meV/atom), axis 0 to 40, with labels LaBO3, YBO3, PrBO3, and (Sr,Ba)BO3]
Informatics Tools Explosion
Prediction API
Real-Time Translation with Deep Learning from Microsoft
https://www.youtube.com/watch?v=Nu-nlQqFCKg
Time: 6:30
Google Image Captioning
http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html?_r=0
Focus Area: Informatics for Knowledge Discovery in Large Data Sets
Use machine learning techniques to
• Organize your data by putting all relevant, cleaned input and output into one place
• Understand your data by finding the most important factors controlling output values
• Expand your data by interpolating and extrapolating
• Optimize your results by finding correlations between input and output data and using them to optimize the desired output
Example
I know impurities impact my device lifetime, so …
• Organize: Build a database of all the relevant factors (impurity concentrations, processing conditions, testing conditions, …) and output performance.
• Understand: Determine which impurities matter most and how large their effects are compared with other contributions.
• Expand: Interpolate/extrapolate to other impurity concentrations to assess performance under conditions we have not yet explored.
• Optimize: Determine the impurity concentrations that lead to optimal performance.
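The four steps in this example can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions: the data are synthetic (generated from an assumed linear rule), the feature names `impurity_A_ppm`, `impurity_B_ppm`, and `anneal_temp_C` are hypothetical, and ordinary least squares stands in for whatever model a real data set would need.

```python
import numpy as np

# Organize: one table of inputs and the output. All values are HYPOTHETICAL,
# generated from an assumed linear rule so the example is self-checking.
features = ["impurity_A_ppm", "impurity_B_ppm", "anneal_temp_C"]
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(0, 10, 40),     # impurity A concentration (ppm)
    rng.uniform(0, 10, 40),     # impurity B concentration (ppm)
    rng.uniform(400, 600, 40),  # a processing condition (deg C)
])
lifetime = 1000 - 40 * X[:, 0] - 5 * X[:, 1] + 0.5 * X[:, 2]  # hours (assumed rule)

# Fit lifetime ~ b0 + X @ b by least squares.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, lifetime, rcond=None)
b0, b = coef[0], coef[1:]

def predict(x):
    """Predicted lifetime for one condition (or for each row of conditions)."""
    return b0 + np.asarray(x) @ b

# Understand: put effects on a common scale (coefficient x spread of that input).
effect = np.abs(b * X.std(axis=0))
ranking = [features[k] for k in np.argsort(effect)[::-1]]

# Expand: predict a condition that was not in the data set.
untested = predict([5.0, 5.0, 500.0])

# Optimize: search a grid inside the explored ranges (extrapolating beyond them is riskier).
grid = np.array(np.meshgrid(np.linspace(0, 10, 11),
                            np.linspace(0, 10, 11),
                            np.linspace(400, 600, 11))).reshape(3, -1).T
best = grid[np.argmax(predict(grid))]
print("importance ranking:", ranking)
print("predicted lifetime at (5, 5, 500):", round(float(untested), 1))
print("best condition on grid:", best)
```

Scaling each coefficient by the spread of its input puts factors measured in different units on a common footing; with real, noisy data the ranking itself should be checked with cross-validation.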
Undergraduate “Materials Informatics Skunkworks”
Benjamin Anderson, Liam Witteman
Team support
Henry Wu, Aren Lorenson, Haotian Wu
Zachary Jensen
Jason Maldonis, Josh Perry, Tom Vandenberg, Robert Darlington
Example: Predicting Impurity Diffusion in FCC Alloys
UNPUBLISHED DATA – CONFIDENTIAL – DO NOT DISSEMINATE
Calculated activation energies with ab initio methods
[Figure: six panels, labeled Mg, Al, Cu, Ni, Pd, and Pt, plotting the calculated diffusion barrier (eV) for impurities grouped by periodic-table column, from K/Rb/Cs and Ca/Sr/Ba through Sc/Y/La, Ti/Zr/Hf, … to As/Sb/Bi; barrier axes span 1.0 to 3.0 eV in three panels and 2.0 to 4.0 eV in the other three]
Example: Predicting Impurity Diffusion in FCC Alloys
• 15 FCC hosts × 100 impurities = 1,500 systems, requiring ~15 million core-hours (~$500k and ~2 years to produce).
• We have computed values for ~10% of these systems.
• How can we quickly (and cheaply) get to ~100% coverage?
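The round numbers on this slide imply a per-system budget. A quick check, using only the slide's figures and assuming cost and time scale linearly with core-hours and that every system costs the same, shows what finishing the remaining ~90% by brute-force calculation would cost:

```python
# Round numbers from the slide; everything below is simple division.
hosts, impurities = 15, 100
systems = hosts * impurities              # 1,500 host-impurity pairs
total_core_hours = 15e6                   # ~15 million core-hours for all systems
total_cost_usd = 500e3                    # ~$500k for all systems
fraction_done = 0.10                      # ~10% already computed

core_hours_per_system = total_core_hours / systems      # ~10,000 core-hours each
cost_per_core_hour = total_cost_usd / total_core_hours  # ~$0.033 per core-hour
remaining = round(systems * (1 - fraction_done))        # systems still to compute
remaining_cost = remaining * core_hours_per_system * cost_per_core_hour

print(f"{core_hours_per_system:,.0f} core-hours per system, "
      f"${cost_per_core_hour:.3f} per core-hour, "
      f"~${remaining_cost:,.0f} to compute the remaining {remaining:,} systems")
```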
[Table: coverage of host M (Al, Ca, Ni, Cu, Sr, Rh, Pd, Ag, Yb, Ir, Pt, Au, Pb, Ac, Th; atomic numbers 13 to 90) versus impurity X (H, Z = 1, through Pu, Z = 94); some host–impurity pairs, e.g. for impurities Y, Mo, Tc, Ru, Rh, Pd, and La, are marked N/A]
Materials Informatics Approach – Regression and Prediction
• Assume activation energy = F(elemental properties)
• Elemental properties = melting temperature, bulk modulus, electronegativity, …
• F is determined using one of many possible methods: linear regression, neural network, decision tree, kernel ridge regression, …
• Fit F with calculated data, test it with cross-validation, then predict new data.
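The fit/test/predict loop above can be sketched as follows. Everything here is an assumption for illustration: the elements `E0` to `E7` and their property values are made up, the activation energies come from an assumed linear rule plus noise, pair descriptors are formed by simply concatenating host and impurity properties, and ordinary least squares stands in for F. It shows the shape of the workflow (calculate a subset, cross-validate, fill in the rest), not the model behind these slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# HYPOTHETICAL property table: 8 made-up elements, 3 properties each (standing in for
# melting temperature, bulk modulus, electronegativity, all rescaled to 0..1).
elements = [f"E{k}" for k in range(8)]
props = {e: rng.uniform(0.0, 1.0, 3) for e in elements}

def descriptors(host, impurity):
    """Features for an M-X pair: host properties followed by impurity properties."""
    return np.concatenate([props[host], props[impurity]])

# Every host-impurity pair; pretend only 40% have been "calculated".
pairs = [(m, x) for m in elements for x in elements if x != m]
X_all = np.array([descriptors(m, x) for m, x in pairs])
true_w = np.array([1.0, 0.5, -0.3, 0.8, 0.2, -0.6])  # assumed rule, illustration only
y_all = 1.5 + X_all @ true_w + rng.normal(0.0, 0.05, len(pairs))
known = rng.permutation(len(pairs))[: int(0.4 * len(pairs))]
unknown = np.setdiff1d(np.arange(len(pairs)), known)

def fit(X, y):
    """Ordinary least squares with an intercept; returns [b0, b1, ..., b6]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    return coef[0] + X @ coef[1:]

# Test: 5-fold cross-validation using only the calculated pairs.
X, y = X_all[known], y_all[known]
folds = np.array_split(rng.permutation(len(y)), 5)
residuals = []
for test_idx in folds:
    train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
    coef = fit(X[train_idx], y[train_idx])
    residuals.append(predict(coef, X[test_idx]) - y[test_idx])
cv_rms = float(np.sqrt(np.mean(np.concatenate(residuals) ** 2)))

# Predict: refit on all calculated pairs, then fill in the uncalculated ones.
filled = predict(fit(X, y), X_all[unknown])
print(f"5-fold CV RMS = {cv_rms:.3f} eV; predicted {len(filled)} uncalculated pairs")
```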
[Diagram: two copies of the host × impurity table, linked by “Train F(properties)”]
Y. Zeng and K. Bai, Journal of Alloys and Compounds 624, pp. 201–209 (2015).
Model Predictive Ability
• Leave-one-out cross-validation
• Predictive RMS error = 0.14 eV (vs. 0.24 eV for a linear fit): predicts the diffusivity of a new impurity to within a factor of 10 at 1000 K
• Time to predict a new system: < 1 s!
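The “within a factor of 10” figure follows from the Arrhenius form of diffusion, $D = D_0 \exp(-E_a/k_B T)$: an error $\Delta E$ in $E_a$ multiplies $D$ by $\exp(\Delta E/k_B T)$. At 1000 K, $k_B T \approx 0.086$ eV, so a one-RMS error of 0.14 eV corresponds to about a factor of 5, while the linear fit's 0.24 eV corresponds to about a factor of 16. A small check, assuming the prefactor $D_0$ is predicted exactly (errors in it would compound these factors):

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K

def rate_error_factor(energy_error_ev, temperature_k):
    """Multiplicative error in an Arrhenius quantity D = D0 * exp(-Ea / (kB * T))
    caused by an error of energy_error_ev in Ea, assuming D0 is exact."""
    return math.exp(energy_error_ev / (K_B_EV * temperature_k))

print(round(rate_error_factor(0.14, 1000.0), 1))  # model on this slide: prints 5.1
print(round(rate_error_factor(0.24, 1000.0), 1))  # linear fit: prints 16.2
```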
[Figure: Leave-One-Out Cross Validation. Predicted vs. DFT activation energy (eV, both axes 0 to 6) for hosts Al, Cu, Ni, Pd, Pt, Au, Ca, Ir, and Pb; trendline y = 0.9909x, R² = 0.9312]
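Leave-one-out cross-validation and the quantities on this parity plot can be computed as sketched below. The data are synthetic (not the slide's DFT values) and ordinary least squares stands in for the model. The zero-intercept slope mirrors the slide's y = 0.9909x trendline; R² here is 1 − SS_res/SS_tot of the predictions against the reference values, which may differ from how the plotting software computed the slide's R². For linear least squares, the hat-matrix identity (LOO residual = e_i / (1 − h_ii)) reproduces the same predictions without refitting.

```python
import numpy as np

def loo_predictions(X, y):
    """Leave-one-out predictions from ordinary least squares with an intercept:
    point i is predicted by a model fitted to all the other points."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        A = np.column_stack([np.ones(keep.sum()), X[keep]])
        w, *_ = np.linalg.lstsq(A, y[keep], rcond=None)
        preds[i] = w[0] + X[i] @ w[1:]
    return preds

def parity_metrics(reference, predicted):
    """RMS error, slope of a zero-intercept fit predicted = s * reference, and R^2."""
    rms = np.sqrt(np.mean((predicted - reference) ** 2))
    slope = np.sum(reference * predicted) / np.sum(reference ** 2)
    ss_tot = np.sum((reference - reference.mean()) ** 2)
    r2 = 1.0 - np.sum((predicted - reference) ** 2) / ss_tot
    return rms, slope, r2

# SYNTHETIC stand-in data (not the slide's values): 40 systems, 4 descriptors.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y_dft = 2.5 + X @ np.array([0.6, -0.4, 0.3, 0.1]) + rng.normal(0.0, 0.1, 40)
pred = loo_predictions(X, y_dft)
rms, slope, r2 = parity_metrics(y_dft, pred)

# Shortcut for linear least squares: same LOO predictions from one fit.
A_full = np.column_stack([np.ones(len(y_dft)), X])
H = A_full @ np.linalg.pinv(A_full)            # hat matrix
resid = y_dft - H @ y_dft                      # ordinary in-sample residuals
shortcut = y_dft - resid / (1.0 - np.diag(H))  # LOO predictions without refitting
print(f"LOO RMS = {rms:.3f} eV, slope = {slope:.4f}, R^2 = {r2:.4f}")
```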
Al-X Recrystallization Temperature (Tx)
• Data on Tx for 82 Al-X alloys with 11 alloying elements
• What controls Tx, and how can we optimize it?
[Figure: recrystallization temperature Tx (°C, axis 0 to 400) and mole fraction of each alloying element (axis 0 to 12) vs. alloy number; alloying elements Fe, Y, Ni, La, Ti, Co, Cu, Sn, Ga, B, and Ce]
Courtesy of Izabela Szlufarska, John Perepezko, Zach Jensen
Materials Informatics Approach – Regression and Prediction
• Assume Tx = F(elemental composition)
• Elemental composition = mole fraction of Fe, Cu, Y, …
• F is determined using one of many possible methods: linear regression, neural network, decision tree, kernel ridge regression, …
• Fit F with the available Tx data, test it with cross-validation, then predict new data.
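For the Tx data the inputs are compositions rather than elemental properties. A minimal sketch of turning an alloy formula into a fixed-order mole-fraction vector, assuming compositions are written like the hypothetical `Al88Fe5Y7` (atomic amounts after each symbol, Al as the balance) and that the column order simply follows the slide's legend:

```python
import re

# Alloying additions in the order of the slide's legend (column order is arbitrary).
ELEMENTS = ["Fe", "Y", "Ni", "La", "Ti", "Co", "Cu", "Sn", "Ga", "B", "Ce"]

def parse_formula(formula):
    """Parse a formula such as 'Al88Fe5Y7' into {'Al': 88.0, 'Fe': 5.0, 'Y': 7.0}."""
    parts = re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)", formula)
    if "".join(sym + amt for sym, amt in parts) != formula:
        raise ValueError(f"cannot parse formula {formula!r}")
    comp = {}
    for sym, amt in parts:
        comp[sym] = comp.get(sym, 0.0) + float(amt)
    return comp

def mole_fraction_features(formula):
    """Fixed-length vector of alloying-element mole fractions (absent elements get 0)."""
    comp = parse_formula(formula)
    unexpected = set(comp) - set(ELEMENTS) - {"Al"}
    if unexpected:
        raise ValueError(f"elements outside the feature set: {sorted(unexpected)}")
    total = sum(comp.values())
    return [comp.get(sym, 0.0) / total for sym in ELEMENTS]

features = mole_fraction_features("Al88Fe5Y7")  # hypothetical alloy
print(dict(zip(ELEMENTS, features)))
```

Giving every alloy the same column layout, with zeros for absent elements, is what lets a single regression model F be fit across all 82 alloys.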
[Diagram: two copies of the Tx vs. alloy composition plot, linked by “Train F(properties)”]
Linear Regression Prediction of Tx
1000 leave-out-20% cross-validation tests
Max RMS: 91°C
Min RMS: 13°C
Avg RMS: 28°C ± 10.3°C
Original data std. dev.: 48°C
[Figure: Worst Case and Best Case panels, each showing training and testing data]
Courtesy of Izabela Szlufarska, John Perepezko, Zach Jensen
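The repeated leave-out-20% test can be sketched as below. The data are synthetic (82 made-up alloys containing 1 to 3 of 11 additions, with an assumed linear Tx rule plus 15 °C noise), so the printed numbers will not match the slide. The useful comparison is the one the slide makes: a model that always predicts the mean Tx has an RMS error close to the data's standard deviation (48 °C on the slide), so an average test RMS well below that (28 °C) indicates the composition features carry real predictive information.

```python
import numpy as np

rng = np.random.default_rng(3)

# SYNTHETIC stand-in for the Tx data set (not the real measurements): 82 alloys,
# 11 alloying elements, each alloy containing 1 to 3 of them at mole fraction <= 0.12.
n_alloys, n_elements = 82, 11
X = np.zeros((n_alloys, n_elements))
for i in range(n_alloys):
    chosen = rng.choice(n_elements, size=rng.integers(1, 4), replace=False)
    X[i, chosen] = rng.uniform(0.0, 0.12, size=len(chosen))
w_assumed = rng.uniform(500.0, 2500.0, n_elements)  # deg C per unit mole fraction
tx = 150.0 + X @ w_assumed + rng.normal(0.0, 15.0, n_alloys)

def with_intercept(M):
    return np.column_stack([np.ones(len(M)), M])

def holdout_rms(X, y, test_fraction, rng):
    """Fit OLS on a random (1 - test_fraction) split; return RMS error on the held-out part."""
    idx = rng.permutation(len(y))
    n_test = int(round(test_fraction * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    coef, *_ = np.linalg.lstsq(with_intercept(X[train]), y[train], rcond=None)
    return np.sqrt(np.mean((with_intercept(X[test]) @ coef - y[test]) ** 2))

rms = np.array([holdout_rms(X, tx, 0.2, rng) for _ in range(1000)])
print(f"max {rms.max():.0f} C, min {rms.min():.0f} C, "
      f"avg {rms.mean():.0f} +/- {rms.std():.1f} C; data std dev {tx.std():.0f} C")
```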
The Undergraduate “Materials Informatics Skunkworks”
We are establishing a team of ~10-20 undergraduates working together to provide materials informatics research for companies. The team will:
• Help researchers in academia and industry develop and utilize this new field
• Provide undergraduates with training in the rapidly growing field of informatics, enhancing their employment opportunities and supporting key workforce development
• Be supported financially/academically through course credits, internships, senior design/capstone projects, and funded projects from industry
• Be supported intellectually through a group culture of teamwork and knowledge continuity (more senior members train more junior members), with limited faculty involvement for advanced issues
What the Informatics Skunkworks Might Provide You
WORKFORCE: A team of talented students who are ready to work quickly with your company to get the most out of your data
DATA ANALYTICS: Technical skills to help you organize, understand, and expand data sets and use data to optimize materials development
What You Might Provide the Informatics Skunkworks
FINANCIAL/COURSE CREDIT SUPPORT: Internships, co-ops, senior design/capstone projects, research projects, research funding, or course credits
SHARED DATA: Data sets of materials-related performance and property data that are large (> ~50), can be shared (ideally published), and are worth mining
Thank You for Your Attention
Backup
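Present Best Approach: Gaussian Kernel Ridge Regression
• We have systems M-X labeled with $i$, and descriptors labeled with $j$ for each M-X system. Assume $y_i$ are outputs and $x_{i,j}$ are input descriptors, with $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots)$ the descriptor vector of system $i$.
• Regression: find $\{a_j\}$ that minimize
$$\sum_i \Big( y_i - \sum_j a_j x_{i,j} \Big)^2$$
• Ridge regression: find $\{a_j\}$ that minimize
$$\sum_i \Big( y_i - \sum_j a_j x_{i,j} \Big)^2 + \lambda \sum_j a_j^2$$
• Kernel ridge regression: find $\{a_i\}$ that minimize
$$\sum_i \Big( y_i - \sum_{i'} a_{i'} K(\mathbf{x}_{i'}, \mathbf{x}_i) \Big)^2 + \lambda \sum_{i,i'} a_i a_{i'} K(\mathbf{x}_{i'}, \mathbf{x}_i)$$
Kernel is
$$K(\mathbf{x}_{i'}, \mathbf{x}_i) = \exp\!\left( -\frac{\lVert \mathbf{x}_{i'} - \mathbf{x}_i \rVert^2}{2\sigma^2} \right)$$
New values are given by
$$y^* = \sum_i a_i K(\mathbf{x}_i, \mathbf{x}^*)$$
Must fit $\sigma$ and $\lambda$.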
G. Montavon et al., New J. Phys. (2013).
A. Gretton, Introduction to RKHS, and some simple kernel algorithms, lecture notes (January 27, 2015).
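Setting the gradient of the kernel ridge objective with respect to $\{a_i\}$ to zero gives $K\big((K + \lambda I)\mathbf{a} - \mathbf{y}\big) = 0$, so $\mathbf{a} = (K + \lambda I)^{-1}\mathbf{y}$ when $K$ is invertible (as it is for distinct inputs with the Gaussian kernel). A minimal NumPy sketch of that closed form and of the prediction formula follows. The slide only says σ and λ must be fitted; the k-fold cross-validated grid search here is one common choice and an assumption, as are the function names and the synthetic 1-D data.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, j] = exp(-||A[i] - B[j]||^2 / (2 sigma^2)) for row-wise descriptor arrays."""
    sq_dist = (np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma ** 2))

def krr_fit(X, y, sigma, lam):
    """Minimizer of ||y - K a||^2 + lam * a^T K a, i.e. a = (K + lam I)^-1 y."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

def krr_predict(X_train, a, X_new, sigma):
    """y* = sum_i a_i K(x_i, x*) for each row x* of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ a

def choose_sigma_lambda(X, y, sigmas, lams, k=5, seed=0):
    """Grid search over (sigma, lam), scored by k-fold cross-validated RMS error."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    best = None
    for sigma in sigmas:
        for lam in lams:
            sq_err = 0.0
            for test in folds:
                train = np.setdiff1d(np.arange(len(y)), test)
                a = krr_fit(X[train], y[train], sigma, lam)
                sq_err += np.sum((krr_predict(X[train], a, X[test], sigma) - y[test]) ** 2)
            rms = np.sqrt(sq_err / len(y))
            if best is None or rms < best[0]:
                best = (rms, sigma, lam)
    return best

# Synthetic 1-D demo (illustration only): noisy samples of a smooth nonlinear function.
rng = np.random.default_rng(4)
X = rng.uniform(-3.0, 3.0, size=(60, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.05, 60)
cv_rms, sigma, lam = choose_sigma_lambda(X, y, [0.3, 1.0, 3.0], [1e-3, 1e-1, 1.0])
a = krr_fit(X, y, sigma, lam)
y_star = krr_predict(X, a, np.array([[0.5]]), sigma)
print(f"sigma={sigma}, lambda={lam}, CV RMS={cv_rms:.3f}, y*(0.5)={y_star[0]:.3f}")
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit inverse; for larger data sets a Cholesky factorization of $K + \lambda I$ would be the usual next step.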
Gaussian Kernel Ridge Regression
Introduction to RKHS, and some simple kernel Algorithms, Arthur Gretton, January 27, 2015