Report for Scientific Machine Learning Workshop

Faming Liang

Purdue University

April 6, 2018

Scientific Machine Learning Workshop

The workshop was hosted by the U.S. Department of Energy (DOE) and held in North Bethesda, MD, from January 30 to February 1, 2018. It aimed to identify challenges and opportunities for statistical and applied mathematical research to increase the rigor, robustness, and reliability of machine learning for DOE mission requirements.

https://www.orau.gov/ScientificML2018/workshop-report.htm

Scientific Machine Learning

Motivation: Development of Data Science

Data Science relies on two pillars:

- Data Collection: The integration of computer technology into science and daily life has enabled the collection of massive amounts of data, e.g., climate data, multiple omics data, electronic health records, website transaction logs, credit card records, etc.

- Data Analysis: Advances in high-performance computing, such as the use of GPUs, have enabled analysis of massive amounts of data.

Evolution of Data Analysis: Small ⇒ Big

- Small Data Era: driven by human expectations and hypotheses. Guided by their subject matter expertise, experience, and intuition, scientists develop hypotheses and tailor analysis approaches to verify or disprove them.

- Big Data Era: driven by data, leading to scientific discoveries

- Data Reduction: improper use of data reduction will increase the likelihood of missing opportunities for breakthroughs

- Data-Driven Techniques: Many areas of science are moving toward more data-driven techniques that ultimately aim to replace the need for prior hypotheses with massive data collections.

- Successful cases of machine learning-based big data analysis have been reported by industry, academia, and research communities, e.g., image and speech processing, AlphaGo, self-driving cars, etc.

10 Priority Research Directions

Interpretable Machine Learning

Machine learning is now being used as a black box, and people need to develop trust in it.

- Key Challenges
  - Understanding the learning/model fitting process
  - Understanding the model inference process
  - Understanding structural differences between models

- New Research Directions
  - Task-driven dimension reduction for meaningful interpretation
  - Characterizing the fitness surface, its minima, and its dependencies
  - Mapping features to the domain context
  - “Metrics” to express qualitative differences between data, between models, and between results

- Potential Scientific Impact
  - Human-machine partnerships to accelerate scientific discovery with machine learning
  - Provides insights for the development of better machine learning techniques
  - Increases adoption of machine learning in new domains

Effective Features for Scientific Machine Learning

- Key Challenges
  - Incorporating a priori knowledge such as physical principles, symmetries, constraints, and expert knowledge into features
  - Developing features that are relevant, representative, informative, interpretable, and generalizable
  - Evaluating effectiveness of features

- New Research Directions
  - Automatic learning of features that satisfy a given set of constraints
  - Fusion of multi-modal data sources to extract features
  - Learning features for processes described by large heterogeneous datasets
  - Methodology to identify phase transitions with respect to quality/volume of features

- Potential Scientific Impact
  - Principled feature extraction
  - Extraction of more information from DOE observational and experimental data
  - Scientific discovery and hypothesis generation

Leveraging domain knowledge and constraints in ML formulations

- Key Challenges
  - Use constraints to guide the learning process
  - Incorporate incomplete/uncertain knowledge
  - Quantify merits of incorporating knowledge

- New Research Directions
  - Devise efficient, scalable constrained formulations for machine learning
  - Develop scalable constrained, decomposition/parallel, inference, learning, modeling frameworks

- Potential Scientific Impact
  - Reduce data requirements (size and amount)
  - Increase scope for ML techniques for science applications with limited/incomplete/diverse data
  - Improve training efficiency
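One simple way to "use constraints to guide the learning process" is projected gradient descent: take an ordinary gradient step, then project the parameter back onto the feasible set. The sketch below is only an illustration and is not taken from the workshop report. It assumes a one-parameter model y ≈ w·x and a hypothetical physical constraint w ≥ 0; the function name `fit_nonnegative_slope`, the learning rate, and the toy data are all assumptions made for the example.

```python
def fit_nonnegative_slope(xs, ys, lr=0.01, steps=2000):
    """Fit y ~ w * x by projected gradient descent under the constraint w >= 0."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        # Gradient step followed by projection onto the feasible set [0, inf).
        w = max(0.0, w - lr * grad)
    return w


xs = [1.0, 2.0, 3.0, 4.0]
# Data consistent with the constraint: the fit recovers the slope.
w_pos = fit_nonnegative_slope(xs, [2.0 * x for x in xs])
# Data that violates the constraint: the fit stays on the boundary w = 0.
w_neg = fit_nonnegative_slope(xs, [-1.5 * x for x in xs])
```

When the data agree with the constraint, the constrained and unconstrained fits coincide; when they disagree, the estimate is held at the boundary of the feasible set rather than following the data.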

ML in High Dimension

- Key Challenges
  - Reliable parameter and hyper-parameter estimation in high dimension
  - Non-parametric identification of structure in high-dimensional data
  - Uncertainty quantification in machine learning in high dimension

- New Research Directions
  - Methods for dimension reduction for both data and model
  - Sparse/low-rank model representations
  - Efficient statistical learning in high dimension
  - Probabilistic methods for high-dimensional uncertainty quantification in ML

- Potential Scientific Impact
  - Enable ML for discovering structure in large-scale systems
  - Enable probabilistic ML methods for providing confidence bounds on ML predictions in complex physical systems
  - Scientific discovery from large-scale models and data

ML for enhancing Data Collection & Use on DOE Facilities

- Key Challenges
  - Integrating simulation and experiments using ML
  - Learning from and managing real-time, high-velocity, and/or streaming data
  - Steering of data collection using ML and related methods

- New Research Directions
  - Mathematically justified methods to guide data acquisition and assure data quality and adequacy
  - New/improved ML methods for multimodal data
  - Using very large data in ML analysis workflows
  - Mathematics for data access surmounting security and communication challenges

- Potential Scientific Impact
  - Promoting more efficient use of large-scale DOE computing and experimental facilities
  - Leveraging and guiding advances in computing, data, and networking resources for future science needs

ML for Inverse Problems and Inverse Problems for ML

- Key Challenges
  - Identifying effective latent parameters that will make ML schemes more interpretable and allow us to discover and compute quantities of interest
  - Inadequate computing resources for inverse problems

- New Research Directions
  - Fusion of models obtained from different methodologies, e.g., integration of neural networks, statistical, hierarchical, and multiscale physics models to accelerate inverse problems
  - Methodologies appropriate for using very large, complex, diverse, and/or streaming data in inversions
  - Learning of regularization to improve solutions of ill-posed problems

- Potential Scientific Impact
  - Solving inverse problems faster and more reliably using ML will benefit many areas of scientific discovery and engineering
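As a toy illustration of "learning of regularization", the sketch below picks a Tikhonov regularization parameter from example pairs of known solutions and noisy observations. It is a minimal sketch under strong assumptions, not a method from the workshop report: the forward operator is assumed diagonal with known entries `s`, the candidate grid is fixed by hand, and the names `tikhonov_solve` and `learn_lambda` as well as the training pairs are hypothetical.

```python
def tikhonov_solve(s, b, lam):
    """Tikhonov-regularized solution of diag(s) x = b, computed componentwise."""
    return [si * bi / (si * si + lam) for si, bi in zip(s, b)]


def total_error(s, training_pairs, lam):
    """Summed squared reconstruction error over (x_true, b_noisy) pairs."""
    err = 0.0
    for x_true, b_noisy in training_pairs:
        x_hat = tikhonov_solve(s, b_noisy, lam)
        err += sum((xh - xt) ** 2 for xh, xt in zip(x_hat, x_true))
    return err


def learn_lambda(s, training_pairs, candidates):
    """Pick the candidate lambda with the smallest training reconstruction error."""
    return min(candidates, key=lambda lam: total_error(s, training_pairs, lam))


# Ill-conditioned operator: the second component is almost unobservable.
s = [1.0, 1e-3]
# Each pair is (x_true, b_noisy), where b_noisy = s * x_true plus noise of size 0.01.
pairs = [
    ([1.0, 1.0], [1.0 + 0.01, 1e-3 - 0.01]),
    ([2.0, -1.0], [2.0 - 0.01, -1e-3 + 0.01]),
]
candidates = [0.0, 1e-6, 1e-4, 1e-2, 1.0]
best_lam = learn_lambda(s, pairs, candidates)
```

With no regularization (lambda = 0), the noise in the weakly observed component is amplified by a factor of 1/s, which is what makes the problem ill-posed; a moderate lambda damps that component at the cost of a small bias in the well-observed one.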

Reproducibility of ML

- Key Challenges
  - Understand and characterize practical conditions under which an ML process is reproducible, i.e., gives quantities of interest that depend continuously on perturbations of algorithms, model selection, parameterization, data, etc.

- New Research Directions
  - Develop a theory of well-posedness for machine learning with respect to the model, data, numerical algorithms, and computer architecture, which is valid for practical ML algorithms under realistic conditions
  - Develop new ML approaches that lead to reproducible results
  - Enhance the understanding of the classes of data for which ML can be shown to be reproducible

- Potential Scientific Impact
  - Reproducibility is a basic tenet of science, and as such it is vital for scientific ML to be reproducible
  - Lack of reproducibility in ML casts doubt on the validity and relevance of the whole concept
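The notion of continuous dependence on perturbations can be probed empirically. The sketch below is an illustration only, under assumptions not found in the report: the "ML process" is taken to be an ordinary least-squares line fit, the quantity of interest is its slope, and only the response data are perturbed. The names `ols_slope` and `sensitivity` are hypothetical.

```python
import random


def ols_slope(xs, ys):
    """Quantity of interest: the least-squares slope of ys against xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def sensitivity(xs, ys, eps, trials=200, seed=0):
    """Largest observed |change in slope| / eps over random perturbations of ys
    whose entries are bounded by eps in absolute value."""
    rng = random.Random(seed)
    base = ols_slope(xs, ys)
    worst = 0.0
    for _ in range(trials):
        perturbed = [y + rng.uniform(-eps, eps) for y in ys]
        worst = max(worst, abs(ols_slope(xs, perturbed) - base))
    return worst / eps


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
r_coarse = sensitivity(xs, ys, 1e-3)
r_fine = sensitivity(xs, ys, 1e-6)
```

For this fit, the ratio stays bounded as the perturbation shrinks (here by sum|x - mean(x)| / sum (x - mean(x))^2 = 0.6), which is the kind of well-posedness the slide asks for. A process whose ratio grew without bound as eps decreased would not be reproducible in this sense.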

Quantifying the discrepancy in Quantities of Interest derived using ML

- Key Challenges
  - Establish rigorous numerical estimates for discrepancy in quantities of interest derived using machine learning
  - Establish well-defined criteria on the domain of applicability under which the machine learning process leads to reliable predictions

- New Research Directions
  - Mathematical foundation of ML as it applies to DOE needs
  - Metrics for assessing discrepancies in predictions, model matching, and input data
  - Methods that provide realistic quantitative estimates for these metrics

- Potential Scientific Impact
  - Many DOE applications involve safety-critical decisions, and it is essential to have access to mathematically rigorous and reliable estimates of the quality of the information that machine learning provides.

ML-enabled Adaptive Scientific Computing

- Key Challenges
  - Training is expensive
  - High-fidelity models are expensive
  - All models have limited predictive value; how can we make them useful?

- New Research Directions
  - Using ML in the inner loop for tuning parameters, detecting behavior that requires correction, etc.
  - Using ML in the outer loop for intelligent search, preconditioning, etc.
  - Using ML for in situ analysis and automated detection of interesting features

- Potential Scientific Impact
  - Precision algorithms that are faster
  - Collaboration of algorithms reducing concerns about ML accuracy
  - ML surrogates could reduce synchrony in iterative algorithms

Addressing the complexity of model architectures & DOE applications

- Key Challenges
  - Complexity of ML models
  - Overfitting issues: dropout, early stopping
  - Model structure determination

- New Research Directions: develop methods that are able to
  - measure complexity of the model space
  - perform model selection
  - avoid overfitting beyond cross-validation
  - enforce physical constraints by construction or via reinforcement learning

- Potential Scientific Impact
  - Increased generalization ability
  - Increased model interpretability
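For readers unfamiliar with the early stopping mentioned under Key Challenges, the rule can be sketched in a few lines: training stops once the validation loss has failed to improve for a fixed number of epochs. This is a generic illustration, not something specified in the report; the function name `early_stopping_epoch`, the `patience` parameter, and the loss values are assumptions made for the example.

```python
def early_stopping_epoch(val_losses, patience=2):
    """Return (best_epoch, stop_epoch) for a sequence of validation losses.

    Training stops at the first epoch that is `patience` or more epochs past
    the best epoch without a strict improvement in validation loss.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch
    # Ran out of epochs before the stopping rule triggered.
    return best_epoch, len(val_losses) - 1


# Hypothetical validation losses: improvement stalls after epoch 2.
losses = [1.0, 0.8, 0.7, 0.72, 0.75, 0.6]
best_epoch, stop_epoch = early_stopping_epoch(losses, patience=2)
```

In this example the rule stops at epoch 4 and keeps the model from epoch 2, so it never sees the later improvement at epoch 5. That trade-off is one reason the slide lists early stopping as a challenge rather than a solved problem.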

Summary: Two Themes

- To understand machine learning: enhancing its interpretability and reproducibility, quantifying uncertainty of prediction, assessing complexity of model architecture, etc.

- To accelerate scientific discovery using machine learning: feature extraction, high-dimensional data analysis, big data analysis, stream data analysis, solutions of inverse problems, DOE applications, adaptive scientific computing, etc.
