Page 1

Computing & Information Sciences, Kansas State University

18 May 2006, Second Annual KMC Workshop

LEAP-KMC Workshop 2006

Multiscale machine learning for LEAP-KMC energy estimation: experiments with genetic programming


William H. Hsu and Martin S. R. Paradesi

Thursday, 18 May 2006

Laboratory for Knowledge Discovery in Databases

Kansas State University

http://www.kddresearch.org/KSU/CIS/KMC-20060518-Learning.ppt

Page 2

Technical Objectives of Previous and New Work

Page 3

Outline

Background, Related Work and Rationale

Novel Contributions

Development Plan

Experimental Approach and Progress Report

Future Directions and Open Problems

Page 4

Specification of Pattern Recognition Problem

Given: spatial occupancy atomistic representation

Estimate: barrier energies for transitions (processes)

Simulation Approaches for Dynamics

Molecular (MD), temperature-accelerated (TAD)

Kinetic Monte Carlo (KMC)

Representation of Problem

Parameter estimation

Nonlinear system identification

Problem Statement

Source Data

Central atom, 1st shell, 2nd shell, 3rd shell

Rahman, Kara, et al. (2004)
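A minimal sketch of this spatial occupancy representation, assuming illustrative shell sizes (1 + 8 + 12 + 15 = 36 sites); the actual shell geometry in Rahman, Kara, et al. (2004) may differ:

```python
# Hypothetical encoding of the 36-site occupancy representation:
# a central atom plus its first three neighbor shells, each site
# 0/1 (vacant/occupied). Shell sizes are illustrative only.

SHELL_SIZES = [1, 8, 12, 15]  # central atom + 3 shells = 36 sites (assumed split)

def encode_config(shells):
    """Flatten per-shell occupancies into one 36-bit feature vector."""
    assert [len(s) for s in shells] == SHELL_SIZES
    return [bit for shell in shells for bit in shell]

# Example: occupied central atom, full 1st shell, empty 2nd, full 3rd.
config = [[1], [1] * 8, [0] * 12, [1] * 15]
vec = encode_config(config)
```

This vector is the input to the barrier-energy estimator; the energy itself is the regression target.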

Page 5

Prior Work: Sastry et al. (2003-Present)

Test bed

(100) config only

Cu_x Co_(1-x)

Accuracy: 97.2-99.6%

Solution Approach: Symbolic Regression (sr-KMC)

SR Method: Genetic Programming

Inline function (macro learning)

Objective: whole potential energy surface (PES)

Figure of merit: generalization accuracy

Sastry, Johnson, Goldberg, & Bellon (2004)

Page 6

Five Components of GP Specification (cf. Koza, 1992)

1. Terminal symbols: bound parameters

2. Operator set: algebraic (logical connectives or arithmetic)

3. Fitness criterion: loss function

4. Termination condition: short-term convergence analysis

5. Result designation: energy estimator for new barriers

Iterative Procedure

Initialization of population

Fitness evaluation

Selection, recombination, mutation

Replacement

Genetic Programming (GP) [1]: Basic Definition
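A minimal, self-contained sketch of the five components and the iterative procedure above, applied to a toy symbolic-regression problem (recovering y = x^2 + 1). The operator set, population size, rates, and truncation-style selection are illustrative choices, not the sr-KMC configuration:

```python
import math
import random

OPS = {'+': lambda a, b: a + b, '-': lambda a, b: a - b, '*': lambda a, b: a * b}
TERMINALS = ['x', 1.0, 2.0]                      # 1. terminal symbols

def random_tree(depth, rng):
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(TERMINALS)
    op = rng.choice(list(OPS))                   # 2. operator set (arithmetic)
    return (op, random_tree(depth - 1, rng), random_tree(depth - 1, rng))

def evaluate(tree, x):
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):                         # 3. fitness criterion: MSE loss
    try:
        err = sum((evaluate(tree, x) - y) ** 2 for x, y in data) / len(data)
    except OverflowError:
        return float('inf')
    return err if math.isfinite(err) else float('inf')

def crossover(a, b, rng):                        # subtree crossover (root-level sketch)
    if isinstance(a, tuple) and isinstance(b, tuple):
        op, left, right = a
        return (op, left, b[2]) if rng.random() < 0.5 else (op, b[1], right)
    return b

def gp(data, pop_size=60, gens=40, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(3, rng) for _ in range(pop_size)]  # initialization
    for _ in range(gens):                        # 4. termination: fixed budget here
        scored = sorted(pop, key=lambda t: fitness(t, data))
        pop = scored[:10]                        # elitism / replacement
        while len(pop) < pop_size:
            p1, p2 = rng.sample(scored[:30], 2)  # truncation-style selection sketch
            child = crossover(p1, p2, rng)
            if rng.random() < 0.2:               # mutation: fresh random tree
                child = random_tree(2, rng)
            pop.append(child)
    return min(pop, key=lambda t: fitness(t, data))  # 5. result designation

# Stand-in regression target for an energy estimator.
data = [(x / 4.0, (x / 4.0) ** 2 + 1.0) for x in range(-8, 9)]
best = gp(data)
```

The designated result is an expression tree that can be applied cheaply to new inputs, mirroring the role of the learned energy estimator.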

Page 7

Selection strategies

Fitness-proportionate (Roulette-Wheel)

Rank-proportionate

Tournament

Structural GP operations

Crossover: subtree-based

Mutation: structural

Mechanisms

Elitism: chromosomal

Crowding, niching: population

Fitness scaling, sharing: objective function

Genetic Programming (GP) [2]: Basic Definition

Sastry, Johnson, Goldberg, & Bellon (2004)
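The selection strategies named above can be sketched as follows, for a minimization setting (lower score = better, e.g., barrier-energy error). The inverted-fitness weighting in the roulette sketch is one common choice for minimization, not necessarily the one used in sr-KMC:

```python
import random

def roulette_select(pop, scores, rng):
    """Fitness-proportionate (roulette-wheel) selection.
    Scores are errors, so weight each individual by inverted fitness."""
    weights = [1.0 / (1.0 + s) for s in scores]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for ind, w in zip(pop, weights):
        acc += w
        if acc >= r:
            return ind
    return pop[-1]

def tournament_select(pop, scores, rng, k=3):
    """Tournament selection: best of k randomly drawn entrants."""
    entrants = rng.sample(range(len(pop)), k)
    return pop[min(entrants, key=lambda i: scores[i])]

rng = random.Random(1)
pop = ['a', 'b', 'c', 'd']
scores = [4.0, 0.5, 2.0, 9.0]          # 'b' is best, 'd' is worst
picks = [tournament_select(pop, scores, rng) for _ in range(200)]
```

Tournament selection exerts steady selection pressure without needing fitness scaling, which is why it is often preferred over the raw roulette wheel.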

Page 8

Rationale

Learns functional form of estimator

Flexible and expressive

Fast to apply once learned

Ability to generalize functional form: multiobjective, multi-scale

Parallelism in learning: task-level (functional), multi-deme

Challenges

Slow convergence

Combinatorially large search space

Local optima

Complexity of estimator: code growth (aka code bloat)

Genetic Programming (GP) [3]: Rationale and Challenges

Page 9

Speedup Learning [1]: The Basic Idea

Given

More precise method for calculating target function

Slower

May be exact

Methods: Saddle point search, MD, TAD, pure KMC

Evaluations (ground truth) for some pairs of states (10^2 to 10^3), aka transitions, aka barriers

Output: Faster (100-5000x) but Close Approximator

Objective: Use in Tandem with Other Methods
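An illustrative sketch of this setup: a slow, precise barrier calculator (stand-in for saddle-point search, MD, TAD, or pure KMC) is sampled for a few hundred transitions, and a cheap learned estimator answers the rest. The "learned estimator" here is a trivial nearest-neighbor lookup over the exact samples, an assumption for illustration, not the GP estimator of the talk:

```python
import random

def slow_exact_barrier(config):
    """Stand-in for an exact but expensive energy calculation (assumed form)."""
    return 0.1 * sum(config) + 0.05 * config[0]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

class SpeedupEstimator:
    def __init__(self, ground_truth_pairs):
        # (config, exact energy) pairs from the slow method
        self.samples = list(ground_truth_pairs)

    def estimate(self, config):
        # Fast approximate answer: energy of the nearest sampled transition.
        _, energy = min(self.samples, key=lambda s: hamming(s[0], config))
        return energy

rng = random.Random(0)
configs = [[rng.randint(0, 1) for _ in range(36)] for _ in range(300)]  # ~10^2-10^3 samples
truth = [(c, slow_exact_barrier(c)) for c in configs]
est = SpeedupEstimator(truth)
```

The tandem use is then: query the fast estimator by default, and fall back to the exact method only for transitions it cannot cover well.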

Page 10

Speedup Learning [2]: Technical Objectives & Evaluation

Block Diagram: hybrid, hierarchical scales

Error Function for M Transitions:
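The equation on this slide did not survive extraction. As a placeholder, one plausible form (an assumption, not recovered from the slide) is the mean relative error of the estimated barrier \(\hat{E}_i\) against the exact barrier \(E_i\) over the \(M\) sampled transitions:

```latex
\Delta E_{\mathrm{err}} \;=\; \frac{1}{M}\sum_{i=1}^{M}
  \left|\,\frac{\hat{E}_i - E_i}{E_i}\,\right|
```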

Sastry, Johnson, Goldberg, & Bellon (2004)

Page 11

Other Genetic and Evolutionary Computation: Permutation GA

Permutations

[0 1 2 3 4 5], [3 5 1 4 2 0], etc.

Searching space of n! permutations

Selection: evaluate ordering by application (e.g., test a variable ordering using validation set accuracy)

Mutation: swap maintains permutation property

Crossover: cycle, ordered

Other GAs: representation highly important

GAs: special case of GP
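A sketch of the permutation-preserving operators named above, swap mutation and cycle crossover (CX), using the slide's own example permutations. Both operators emit valid permutations by construction:

```python
import random

def swap_mutation(perm, rng):
    """Swap two positions; the result is still a permutation."""
    p = list(perm)
    i, j = rng.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def cycle_crossover(p1, p2):
    """Cycle crossover: copy the cycle through position 0 from p1,
    then fill the remaining positions from p2."""
    child = [None] * len(p1)
    pos = 0
    while child[pos] is None:
        child[pos] = p1[pos]
        pos = p1.index(p2[pos])   # follow the cycle defined by both parents
    return [p2[i] if v is None else v for i, v in enumerate(child)]

rng = random.Random(0)
parent1 = [0, 1, 2, 3, 4, 5]
parent2 = [3, 5, 1, 4, 2, 0]
child = cycle_crossover(parent1, parent2)
mutant = swap_mutation(parent1, rng)
```

This is the representational point of the slide: because both operators close over the space of permutations, the GA never has to repair invalid orderings.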

Page 12

Prior Results - Permutation GA for BN Structure Learning

[Figure: inferential RMSE (0 to 0.25) vs. number of samples (1 to 13461) for forward simulation, comparing the gold standard network, K2 output on the optimal ordering, and K2 output on the GA ordering. K2: 20K, FS: 1500.]

(Hsu, Guo, Perry & Stilson, 2002)

Page 13

Limitations of Prior Work

Single scale

Speedup bottlenecked by fixed sample complexity

Overspecialization

Emphasis on (100), Cu/Co or Fe/Cu

Episodic

Need to retrain on new data; not anytime, not anyspace

Not designed to scale up

Higher number of shells (9 vs. 3), atoms (209 vs. 36)

2-D for epitaxial and thin film growth

No feature extraction mechanism

Page 14

Outline

Background, Related Work and Rationale

Novel Contributions

Development Plan

Experimental Approach and Progress Report

Future Directions and Open Problems

Page 15

Multiscale Modeling

Abstraction mechanism

Captures constitutive relationships, e.g.,

phase change

equilibria

dynamical systems

Voter (1997, 2002), Grujicic (2003), etc.

Scalability

Inversely proportional to homogeneity

Goal: graceful degradation

Necessary for KMC

Page 16

Generalization to More Types of Lattice Structures

Typical phenomena in current (NSF ITR) research

Metal vapor deposition

Thin film growth

Longer-term objectives: non-crystalline models

Crack propagation

Molecular ligand modeling

Peptides, proteins, and mRNA in proteomics

Other macromolecules?

Phenomena: signal transduction, ion transport

Nanostructures: wires, tubes, C60

Page 17

Incrementality

Definitions

Anytime: returns partial estimate on demand at any time after initial computation time requirement is met

Anyspace: returns partial estimate within specified space beyond some minimum

Need: eliminate the requirement to retrain on new data

Incremental mechanisms being investigated

Reuse in GP: ADFs, pre-evolved individuals

Evolutionary approach: incrementally staged learning from easier subtasks (ISLES, Hsu et al. 2004)
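The anytime contract defined above can be sketched as a tiny class: after a small initial setup cost, the estimator can be interrupted at any point and still return its best estimate so far. The refinement rule here (halving the remaining error each step) is purely illustrative:

```python
class AnytimeEstimator:
    """Sketch of an anytime estimator: refine in small increments,
    answer on demand with the current partial estimate."""

    def __init__(self, target):
        self._target = target
        self._estimate = 0.0      # initial computation: crude starting guess
        self._steps = 0

    def refine_once(self):
        """One increment of work; call as many times as the budget allows."""
        self._steps += 1
        self._estimate += (self._target - self._estimate) / 2.0

    def current_estimate(self):
        """Partial estimate available on demand, at any time."""
        return self._estimate

est = AnytimeEstimator(target=1.0)
errors = []
for _ in range(10):
    est.refine_once()
    errors.append(abs(1.0 - est.current_estimate()))
```

An anyspace variant would bound the size of the internal state instead of the number of refinement steps.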

Page 18

Scaling Up

Finer Granularity

Intermediate trajectories and energies from drag code

FCC vs. RCP

Larger Problems

9 vs. 3 shell

209 atom occupancy model vs. 36

Hybrid Models for Very Large Instances

Incrementality: Interaction Protocol (Synchronization)

Page 19

3-D Modeling and Graphical Models

Continuing Work:

Speeding up Approximate Inference using Edge Deletion - J. Thornton (2005)

Bayesian Network tools in Java (BNJ) v4 - W. Hsu, J. M. Barber, J. Thornton (2006)

Dynamic Bayes Net for Predicting 3-D Energetics

Hsu, Kara, Karim & Rahman (in prep., 2006)

Page 20

Other Applications of GEC: Feature Selection/Extraction

Feature Selection

Genetic filters

Genetic wrappers, cf.:

Witten and Frank (2005) - built into WEKA

Cherkauer and Shavlik (1996)

Simultaneous Feature Extraction and Selection

Raymer, Punch, Goodman, Sanschagrin, Kuhn (1997)

Focus of Current Work

Relational features

Constructed features
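The filter-vs-wrapper distinction above can be sketched concretely: a feature subset is a bitmask, and a wrapper scores it by the held-out accuracy of an induced model. The inducer here is a OneR-style stub that predicts the majority label per value of the first kept feature; it is an assumption for illustration, not WEKA's J48 or OneR:

```python
import random
from collections import Counter, defaultdict

def wrapper_score(mask, train, test, induce):
    """Wrapper evaluation: induce a model on the selected features,
    score it on held-out data. Mask must keep at least one feature."""
    project = lambda row: tuple(v for v, keep in zip(row, mask) if keep)
    model = induce([(project(x), y) for x, y in train])
    hits = sum(model(project(x)) == y for x, y in test)
    return hits / len(test)

def one_rule_inducer(examples):
    """OneR-style stub: majority label for each value of the first kept feature."""
    by_val = defaultdict(list)
    for x, y in examples:
        by_val[x[0]].append(y)
    table = {k: Counter(v).most_common(1)[0][0] for k, v in by_val.items()}
    default = Counter(y for _, y in examples).most_common(1)[0][0]
    return lambda x: table.get(x[0], default)

# Toy data: the label equals feature 0; features 1-2 are noise.
rng = random.Random(0)
data = [((b, rng.randint(0, 1), rng.randint(0, 1)), b) for b in [0, 1] * 20]
informative = wrapper_score((1, 0, 0), data[:30], data[30:], one_rule_inducer)
noisy = wrapper_score((0, 1, 0), data[:30], data[30:], one_rule_inducer)
```

A genetic wrapper searches the space of such masks with a GA, using this score as fitness; a filter would instead rank features by a data statistic without inducing any model.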

Page 21

Interim Progress Report and Plan Overview for Years 2-3

Page 22

Previous Results (2005): Supervised Learning – Energy Estimation

Results for 36-bit occupancy vector, 10-fold cross-validation

Target attribute: external energy function (numeric)

Source data: Baza C500, Step16MDD

Page 23

Outline

Background, Related Work and Rationale

Novel Contributions

Development Plan

Experimental Approach and Progress Report

Future Directions and Open Problems

Page 24

Approximate Timeline

2004 – fall: state of the field (sr-KMC)

2005 – KMC data model, classical inducers, export

2006 – completion of general LEAP-KMC estimator

Spring: data export, full battery of GEC experiments

Fall: multi-scale, multiobjective, change of representation

2007: incrementality, esp. anytime; speedup eval

2008: meta-learning abstractions in LEAP-KMC

2009: 3-D, temporal models; other utility functions

Page 25

BNJ Graphical User Interface: Editor

© 2005 KSU Bayesian Network tools in Java (BNJ) Development Team

ALARM Network

Page 26

Outline

Background, Related Work and Rationale

Novel Contributions

Development Plan

Experimental Approach and Progress Report

Future Directions and Open Problems

Page 27

Experimental Design Approach

GP-Based Discovery

Intermediate features

Immutable macros (ADFs) for easier subtasks

Non-inline units for easier subtasks (GP-ISLES)

Characterizing

Reuse

Impact on code growth

Size and age statistics of trees

Usability: efficiency of evaluation, human readability

Visualization-Oriented

Page 28

Progress Report

Experiments and Software Packages

Waikato Environment for Knowledge Analysis (WEKA) 3.5

Evolutionary Computation in Java (ECJ) 14 – current estimator

Bayesian Network tools in Java 4 (http://bnj.sourceforge.net)

Papers

Accepted: GECCO-2006 Late-Breaking Paper

In preparation

Int’l Joint Conf. on Artificial Intelligence (IJCAI) 2007 Workshop

Journal of Graphics Tools, Journal of Online Math & its Applications

Grant Pipeline

Under review: ONR/DHS Inst. Disc. Sci. (with UIUC), NSF CCLI

In preparation: KSU Targeted Excellence HCII

Page 29

Preliminary Results and Next Steps

Priority: Estimator Integration (ECJ/WEKA & KMC)

Multi-Attribute Classification Task: Social Network

Genetic Wrappers for Above Task

Filter: CFS Subset Eval – 3 attributes

Wrapper: J48 (decision tree inducer) – 3 attributes

Wrapper: OneR – 3 attributes

Inducer    All    NoDist  BkDist  Dist   Interest
J48        98.2   94.8    95.8    97.6   88.5
OneR       95.8   92.0    95.8    95.8   88.5
Logistic   91.6   90.9    88.3    88.9   88.4

Data set from Hsu, King, Paradesi, Pydimarri, Weninger (GECCO-2006, accepted, to appear)

Page 30

Outline

Background, Related Work and Rationale

Novel Contributions

Development Plan

Experimental Approach and Progress Report

Future Directions and Open Problems

Page 31

Continuing Work

Multi-scale

Multi-objective version cf. Sastry et al. 2006

Generalize

Over crystal lattice structures and materials processes

Incrementality

Anytime (modulo data synchronization protocol)

Scaling up to 9-shell, 209-atom model

Higher number of shells (9 vs. 3), atoms (209 vs. 36)

3-D models

Concurrent feature extraction by GP?

Page 32

Educational Outreach: HCI Issues

Previous PHP GUI - Ali Al-Rawi; Java GUI - prototype by Andrew King

Desiderata: usability (Q&A), ergonomics, accessibility, view control

Elements: unified data model, visualization widgets, figures of merit, evaluation mechanism (cf. BNJ)

Page 33

References [1]

Sastry, K., Johnson, D.D., Thompson, A. L., Goldberg, D. E., Martinez, T. J., Leiding, J., Owens, J. (2006). Multiobjective Genetic Algorithms for Multiscaling Excited State Direct Dynamics in Photochemistry.

Sastry, K., Abbass, H. A., Goldberg, D. E., Johnson, D. D. (2005). Sub-structural Niching in Estimation of Distribution Algorithms.

Sastry, K., Johnson, D. D., Goldberg, D. E., Bellon, P. (2004). Genetic Programming for Multiscale Modeling.

Sastry, K., Johnson, D. D., Goldberg, D. E., Bellon, P. (2003). Genetic Programming for Multi-Timescale Modeling.

Page 34

References [2]

Karim, A., Al-Rawi, A., Kara, A., & Rahman, T. S. (2006). Diffusion of Small 2D-Cu Islands on Cu(111) Studied with a Kinetic Monte Carlo Method. Phys. Rev. B 73:165411.

Karim, A., Al-Rawi, A., Kara, A., & Rahman, T. S. (2006). The Crossover from Collective Motion to Periphery Diffusion for Adatom-Islands on Cu(111) and Ag(111). To be submitted to Phys. Rev. Lett.

Thornton, C. (2005). Self-Teaching Kinetic Monte Carlo (User Interface). Retrieved 15 May 2005, from Charlie Thornton - Research Web site: http://www.cis.ksu.edu/~clt3955/research.php

Trushin, O., Karim, A., Kara, A., & Rahman, T. S. (2005). Self-Learning Kinetic Monte Carlo Method: Application to Cu(111). Phys. Rev. B 72:115401.

Page 35

Acknowledgements

Abroad: Amar, Trushin

KSU Physics: Al-Rawi, Kara, Karim, Rahman

KSU CIS Knowledge Discovery in Databases (KDD):

King (info vis), Pydimarri (machine learning), Walters (info vis), Weninger (data model)

Parallel Computing: Jundt, Mairal, Wallentine

Alumni: Thornton, C., Ramakrishnan

Page 36

Questions and Discussion

