Mat informatics opportunties fisherbarton 2015 12-07 1.1

Post on 23-Jan-2018


Opportunities in Materials Informatics

Dane Morgan

University of Wisconsin, Madison

ddmorgan@wisc.edu, W: 608-265-5879, C: 608-234-2906

Fisher Barton Technology Center

Watertown, WI

December 7, 2015

What is Materials Informatics?

Materials informatics is a field of study that applies the tools and principles of information extraction from data (informatics) to materials science and engineering to better understand the use, selection, development, and discovery of materials.

– Mining for materials information in large data sets

– Applying new information technologies to enable new materials science


What Are Materials Informatics Applications?

Related buzzwords: Data science, data analytics, data mining, knowledge discovery, machine learning, artificial intelligence, deep learning, big data …

• Interpolation/Extrapolation/Correlation of Data – determine controlling factors, fill in what is missing, optimize

• Design of Experiments – Perform experiments in optimal order to achieve your goal

• Clustering (Feature Extraction) – group like things together, either supervised or unsupervised

• Image Recognition – identify things in pictures and analyze them

• Optimization – find the optimal solution in complex spaces

• Text Mining – Extract data from published documents, web


Associated Infrastructure: Cloud computing, high-performance computing clusters, high-throughput/combinatorial experiment+computation, …

Materials Informatics Has a Strong History

Mendeleev 1871

Ashby map

Turning Point for Materials Informatics

Data availability, Data production, Informatics tools

[Figure: predicted log k* (cm/s) plotted against E above hull (meV/atom) for LaBO3, YBO3, PrBO3, and (Sr,Ba)BO3.]

Informatics Tools Explosion


Prediction API

Real Time Translation with Deep Learning from Microsoft

https://www.youtube.com/watch?v=Nu-nlQqFCKg

Time: 6:30s


Google Image Captioning

http://www.nytimes.com/2014/11/18/science/researchers-announce-breakthrough-in-content-recognition-software.html?_r=0


Focus Area: Informatics for Knowledge Discovery in Large Data Sets

Use machine learning techniques to

• Organize your data by putting all relevant, cleaned input and output into one place

• Understand your data by finding the most important factors controlling output values

• Expand your data by interpolating and extrapolating

• Optimize your data by finding correlations between input and output data to optimize desired output


Example

I know impurities impact my device lifetime, so …

• Organize: Build a database of all the relevant factors (impurity concentrations, processing conditions, testing conditions, …) and output performance.

• Understand: Determine which impurities matter most and the size of impurity effects vs. other contributions.

• Expand: Interpolate/extrapolate to other impurity concentrations to assess performance under conditions we have not yet explored.

• Optimize: Determine impurity concentrations that lead to optimal performance.
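The organize / understand / expand / optimize steps for the impurity example could be sketched roughly as follows. This is only an illustration on synthetic data: the impurity names, the linear model, the noise level, and the search grid are all assumptions made for the sketch, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Organize: one table of inputs (impurity concentrations, arbitrary units)
# and one output column (device lifetime). Everything here is synthetic.
names = ["Fe", "Cu", "Na"]                      # hypothetical impurities
X = rng.uniform(0.0, 1.0, size=(40, 3))
true_w = np.array([-5.0, -0.5, -2.0])           # made-up sensitivities
y = 100.0 + X @ true_w + rng.normal(0.0, 0.1, size=40)

# Understand: fit a linear model and rank impurities by |coefficient|.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
ranking = [names[k] for k in np.argsort(-np.abs(coef[1:]))]

# Expand: predict lifetime at a composition that was never measured.
x_new = np.array([0.2, 0.9, 0.1])
y_new = coef[0] + x_new @ coef[1:]

# Optimize: scan a grid of candidate compositions for the best prediction.
grid = np.array(np.meshgrid(*[np.linspace(0, 1, 11)] * 3)).reshape(3, -1).T
best = grid[np.argmax(coef[0] + grid @ coef[1:])]

print("most important first:", ranking)
print("predicted lifetime at x_new:", y_new)
print("best composition on grid:", best)
```

A linear model is used only because it makes the ranking step easy to read; any regression method that supports prediction could take its place.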


Undergraduate “Materials Informatics Skunkworks”

Benjamin Anderson, Liam Witteman

Team support

Henry Wu, Aren Lorenson, Haotian Wu

Zachary Jensen

Jason Maldonis, Josh Perry, Tom Vandenberg, Robert Darlington

Example: Predicting Impurity Diffusion in FCC Alloys


UNPUBLISHED DATA – CONFIDENTIAL – DO NOT DISSEMINATE

Calculated activation energies with ab initio methods

[Figure: six panels labeled Mg, Al, Cu, Ni, Pd, and Pt, each plotting diffusion barrier (eV) for impurities grouped by periodic-table column: Sc/Y/La, Ti/Zr/Hf, V/Nb/Ta, Cr/Mo/W, Mn/Tc/Re, Fe/Ru/Os, Co/Rh/Ir, Ni/Pd/Pt, Cu/Ag/Au, Zn/Cd/Hg, and, in some panels, Ga/In/Tl, Ge/Sn/Pb, As/Sb/Bi, Ca/Sr/Ba, K/Rb/Cs.]

Example: Predicting Impurity Diffusion in FCC Alloys

• 15 FCC hosts x 100 impurities = 1500 systems, ~15 million core-hours (~$500k to produce, ~2 years).

• We have computed values for ~10%

• How can we quickly (and cheaply) get to ~100% coverage?
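As a rough back-of-the-envelope check of the numbers above, the per-system cost can be worked out directly. The sketch assumes that "15m" means 15 million core-hours and that the ~$500k figure covers all 1500 systems; both are readings of the slide, not stated facts.

```python
hosts, impurities = 15, 100
systems = hosts * impurities                      # 1500 host-impurity pairs

core_hours_total = 15e6                           # assumed: ~15 million core-hours
cost_total = 500e3                                # assumed: ~$500k for all systems

core_hours_per_system = core_hours_total / systems    # core-hours per system
cost_per_system = cost_total / systems                # dollars per system
cost_per_core_hour = cost_total / core_hours_total    # dollars per core-hour

done = 0.10 * systems                             # ~10% already computed
remaining_cost = (systems - done) * cost_per_system   # cost of the other ~90%

print(core_hours_per_system, cost_per_system, cost_per_core_hour, remaining_cost)
```

Under these assumptions each system costs on the order of ten thousand core-hours, which is the scale a sub-second model prediction is being compared against.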


[Table: host–impurity matrix. Columns are the 15 hosts M (Al, Ca, Ni, Cu, Sr, Rh, Pd, Ag, Yb, Ir, Pt, Au, Pb, Ac, Th; atomic numbers 13, 20, 28, 29, 38, 45, 46, 47, 70, 77, 78, 79, 82, 89, 90). Rows are the impurities X from H (1) through Pu (94). Cell contents were not recoverable, apart from N/A entries in the rows for Y, Mo, Tc, Ru, Rh, Pd, and La.]


Materials Informatics Approach – Regression and Prediction

• Assume Activation energy = F(elemental properties)

• Elemental properties = melting temperature, bulk modulus, electronegativity, …

• F is determined using one of many possible methods: linear regression, neural network, decision tree, kernel ridge regression, …

• Fit F with calculated data, test it with cross-validation, then predict new data.
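The fit / cross-validate / predict loop described above might look roughly like this. It is a minimal sketch using plain linear regression on synthetic descriptors; the descriptor values, coefficients, and helper names (`fit_linear`, `predict_linear`, `loocv_rms`) are invented for illustration and are not the model or data used in the talk.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y ~ b0 + X @ b; returns [b0, b1, ...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Evaluate the fitted linear model on rows of X."""
    return coef[0] + X @ coef[1:]

def loocv_rms(X, y):
    """Leave-one-out CV: refit without each point, then predict that point."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        coef = fit_linear(X[keep], y[keep])
        errs.append(predict_linear(coef, X[i:i + 1])[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))

# Synthetic stand-in data: 30 "systems", 3 made-up descriptors each.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
y = 2.0 + X @ np.array([0.5, -0.3, 0.1]) + rng.normal(0.0, 0.05, size=30)

rms = loocv_rms(X, y)                    # test F with cross-validation
coef = fit_linear(X, y)                  # final fit on all available data
y_new = predict_linear(coef, np.array([[0.2, -1.0, 0.4]]))   # predict new system

print("LOOCV RMS error:", rms)
print("prediction for new system:", y_new[0])
```

Swapping `fit_linear` and `predict_linear` for another regressor leaves the cross-validation loop unchanged, which is why the choice of F can be treated as a separate decision.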

[Figure: host–impurity matrix (hosts M = Al, Ca, Ni, Cu, Sr, Rh, Pd, Ag, Yb, Ir, Pt, Au, Pb, Ac, Th; impurities X = H through Pu), shown before and after the step "Train F(properties)".]

Y. Zeng and K. Bai, Journal of Alloys and Compounds 624, p. 201-209 (2015).

Model Predictive Ability

• Leave one out cross validation

• Predictive RMS error = 0.14 eV (vs. 0.24 eV for linear fit) – predicts diffusion of a new impurity to within <10x at 1000 K

• Time to predict new system < 1s!
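The "within <10x at 1000 K" statement can be checked with a short calculation. The sketch assumes an Arrhenius form, D proportional to exp(-E / (k_B T)), and that all of the error sits in the activation energy E (the prefactor is treated as exact); neither assumption is spelled out on the slide.

```python
import math

K_B = 8.617333e-5      # Boltzmann constant, eV/K
T = 1000.0             # temperature quoted on the slide, K

def diffusivity_error_factor(energy_error_ev, temperature_k=T):
    """Multiplicative error in D caused by an error in the activation energy."""
    return math.exp(energy_error_ev / (K_B * temperature_k))

factor = diffusivity_error_factor(0.14)          # cross-validated model error
factor_linear = diffusivity_error_factor(0.24)   # linear-fit error, for comparison

print(f"0.14 eV -> factor of {factor:.1f}")
print(f"0.24 eV -> factor of {factor_linear:.1f}")
```

Under these assumptions a 0.14 eV error corresponds to roughly a factor of 5 in diffusivity, while the 0.24 eV linear-fit error corresponds to roughly a factor of 16, so only the better model stays inside an order of magnitude.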

[Figure: leave-one-out cross-validation parity plot of predicted activation energy (eV) vs. DFT activation energy (eV), both axes 0 to 6, for hosts Al, Cu, Ni, Pd, Pt, Au, Ca, Ir, and Pb. Fit line: y = 0.9909x, R² = 0.9312.]


Al-X Recrystallization Temperature (Tx)

• Data on Tx for 82 Al-X alloys with 11 alloying elements

• What controls Tx and how can we optimize it?

[Figure: mole fraction of each alloying element (Fe, Y, Ni, La, Ti, Co, Cu, Sn, Ga, B, Ce) and recrystallization temperature Tx (°C) plotted against alloy number.]

Courtesy of Izabela Szlufarska, John Perepezko, Zach Jensen

Materials Informatics Approach – Regression and Prediction

• Assume Tx = F(elemental composition)

• Elemental composition = mole fraction of Fe, Cu, Y, …

• F is determined using one of many possible methods: linear regression, neural network, decision tree, kernel ridge regression, …

• Fit F with calculated data, test it with cross-validation, then predict new data.

Train F(properties)

[Figure: the Al-X data plot (mole fraction of alloying element and recrystallization temperature Tx in °C vs. alloy number), shown before and after the "Train F(properties)" step.]

Linear Regression Prediction of Tx

1000 leave-out-20% cross-validation tests

Max RMS: 91°C

Min RMS: 13°C

Avg RMS: 28°C +/- 10.3°C

Original Data Std Dev: 48°C

[Figure: worst-case and best-case cross-validation splits, each showing training and testing points.]

Courtesy of Izabela Szlufarska, John Perepezko, Zach Jensen
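A repeated leave-out-20% test of the kind summarized on this slide could be run roughly as below. The data are synthetic stand-ins with the same shape as the alloy set (82 alloys, 11 composition columns); the coefficients, noise level, and the helper name `holdout_rms` are assumptions for illustration, so the printed numbers will not match the slide.

```python
import numpy as np

def holdout_rms(X, y, rng, test_frac=0.2):
    """Fit linear regression on a random ~80% split; return RMS error on the rest."""
    n = len(y)
    perm = rng.permutation(n)
    n_test = max(1, int(round(test_frac * n)))
    test, train = perm[:n_test], perm[n_test:]
    A = np.column_stack([np.ones(len(train)), X[train]])
    coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
    pred = coef[0] + X[test] @ coef[1:]
    return float(np.sqrt(np.mean((pred - y[test]) ** 2)))

# Synthetic stand-in for the alloy data set.
rng = np.random.default_rng(2)
X = rng.uniform(0.0, 10.0, size=(82, 11))       # made-up compositions
w = rng.normal(0.0, 5.0, size=11)               # made-up sensitivities
y = 250.0 + X @ w + rng.normal(0.0, 20.0, size=82)

# 1000 independent random splits, one RMS error per split.
scores = np.array([holdout_rms(X, y, rng) for _ in range(1000)])
summary = {"max": scores.max(), "min": scores.min(),
           "avg": scores.mean(), "std": scores.std()}

print(summary)
print("original data std dev:", y.std())
```

Comparing the average RMS error with the standard deviation of the original data, as the slide does, shows how much better the model is than always predicting the mean.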

The Undergraduate “Materials Informatics Skunkworks”

We are establishing ~10-20 undergraduates working together to provide materials informatics research for companies

• Help researchers in academia and industry develop and utilize this new field

• Provide training in rapidly growing field of informatics to undergraduates to enhance employment opportunities and key workforce development

• Be supported financially/academically through credits, internships, senior design/capstone projects, funded projects from industry

• Be supported intellectually through group culture of teamwork and knowledge continuity (more senior train more junior members) with limited faculty involvement for advanced issues


What the Informatics Skunkworks Might Provide You

WORKFORCE: A team of talented students who are ready to work quickly with your company to get the most out of your data

DATA ANALYTICS: Technical skills to help you organize, understand and expand data sets and utilize data to optimize materials development

What You Might Provide the Informatics Skunkworks

FINANCIAL/COURSE CREDIT SUPPORT: Internships, Co-ops, Senior design/Capstone projects, Research projects, Research funding or course credits

SHARED DATA: Data sets of materials-related performance and property data that are large (> ~50), can be shared (ideally published), and are worth mining

Thank You for Your Attention


Backup


Present Best Approach: Gaussian Kernel Ridge Regression

• We have systems M-X labeled with i, and descriptors labeled with j for each M-X system. Assume y_i are outputs and x_{i,j} are input descriptors

• Regression: Find {a_j} that minimize

$$\sum_i \Big( y_i - \sum_j a_j x_{i,j} \Big)^2$$

• Ridge Regression: Find {a_j} that minimize

$$\sum_i \Big( y_i - \sum_j a_j x_{i,j} \Big)^2 + \lambda \sum_j a_j^2$$

• Kernel Ridge Regression: Find {a_i} that minimize

$$\sum_i \Big( y_i - \sum_{i'} a_{i'} K(x_{i'}, x_i) \Big)^2 + \lambda \sum_{i,i'} a_i a_{i'} K(x_{i'}, x_i)$$

• Kernel is

$$K(x_{i'}, x_i) = \exp\!\left( -\frac{\lVert x_{i'} - x_i \rVert^2}{2\sigma^2} \right)$$

• New values are given by

$$y^* = \sum_i a_i K(x_i, x^*)$$

• Must fit σ and λ

G. Montavon, et al., NJOP ‘13.

A. Gretton, Introduction to RKHS, and some simple kernel algorithms, 1/27/15 (lecture notes)
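A compact sketch of Gaussian kernel ridge regression follows, written with NumPy only. It uses the standard closed-form solution, in which the coefficients a satisfy (K + λI) a = y, with the width σ and the penalty λ chosen by leave-one-out cross-validation over a small grid. The synthetic 1-D data, the grid values, and the function names are assumptions for illustration, not the setup used in the talk.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """K[i, k] = exp(-||A[i] - B[k]||^2 / (2 sigma^2))."""
    sq = (np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def krr_fit(X, y, sigma, lam):
    """Solve (K + lam I) a = y, a minimizer of the kernel ridge objective."""
    K = gaussian_kernel(X, X, sigma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)

def krr_predict(X_train, a, X_new, sigma):
    """y* = sum_i a_i K(x_i, x*) for each row x* of X_new."""
    return gaussian_kernel(X_new, X_train, sigma) @ a

def loocv_rms(X, y, sigma, lam):
    """Leave-one-out cross-validation RMS error for given (sigma, lam)."""
    errs = []
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        a = krr_fit(X[keep], y[keep], sigma, lam)
        errs.append(krr_predict(X[keep], a, X[i:i + 1], sigma)[0] - y[i])
    return float(np.sqrt(np.mean(np.square(errs))))

# Synthetic 1-D data standing in for descriptors and outputs.
rng = np.random.default_rng(3)
X = rng.uniform(-3.0, 3.0, size=(40, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.05, size=40)

# Fit sigma and lam by a small grid search on the LOOCV error.
grid = [(s, l) for s in (0.3, 1.0, 3.0) for l in (1e-3, 1e-2, 1e-1)]
sigma, lam = min(grid, key=lambda p: loocv_rms(X, y, *p))

a = krr_fit(X, y, sigma, lam)
y_star = krr_predict(X, a, np.array([[0.5]]), sigma)
print("chosen sigma, lam:", sigma, lam)
print("prediction at x* = 0.5:", y_star[0])
```

The linear solve scales with the cube of the number of training systems, which is manageable for data sets of the size discussed here (tens to a few thousand points).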

Gaussian Kernel Ridge Regression

Introduction to RKHS, and some simple kernel Algorithms, Arthur Gretton, January 27, 2015