Report for Scientific Machine Learning Workshop
Faming Liang
Purdue University
April 6, 2018
Scientific Machine Learning Workshop
The workshop was hosted by the U.S. Department of Energy (DOE) and held in North Bethesda, MD, from January 30 to February 1, 2018. It aimed to identify challenges and opportunities for statistical and applied mathematical research to increase the rigor, robustness, and reliability of machine learning for DOE mission requirements.
https://www.orau.gov/ScientificML2018/workshop-report.htm
Scientific Machine Learning
Motivation: Development of Data Science
Data Science relies on two pillars:
- Data Collection: The integration of computer technology into science and daily life has enabled the collection of massive amounts of data, e.g., climate data, multiple omics data, electronic health records, website transaction logs, and credit card records.
- Data Analysis: Advances in high-performance computing, such as the use of GPUs, have enabled the analysis of massive amounts of data.
Evolution of Data Analysis: Small ⇒ Big
- Small Data Era: driven by human expectations and hypotheses. Guided by their subject matter expertise, experience, and intuition, scientists develop hypotheses and tailor analysis approaches to verify or disprove them.
- Big Data Era: driven by data, leading to scientific discoveries
- Data Reduction: improper use of data reduction increases the likelihood of missing opportunities for breakthroughs
- Data-Driven Techniques: Many areas of science are moving toward more data-driven techniques that ultimately aim to replace the need for prior hypotheses with massive data collections.
- Successful cases of machine learning-based big data analysis have been reported by industry, academia, and research communities, e.g., image and speech processing, AlphaGo, and self-driving cars.
10 Priority Research Directions
Interpretable Machine Learning
Machine learning is now being used as a black box, and people need to develop trust in it.
- Key Challenges
  - Understanding the learning/model fitting process
  - Understanding the model inference process
  - Understanding structural differences between models
- New Research Directions
  - Task-driven dimension reduction for meaningful interpretation
  - Characterizing the fitness surface, its minima, and its dependencies
  - Mapping features to the domain context
  - “Metrics” to express qualitative differences between data, between models, and between results
- Potential Scientific Impact
  - Human-machine partnerships to accelerate scientific discovery with machine learning
  - Insights for the development of better machine learning techniques
  - Increased adoption of machine learning in new domains
Effective Features for Scientific Machine Learning
- Key Challenges
  - Incorporating a priori knowledge such as physical principles, symmetries, constraints, and expert knowledge into features
  - Developing features that are relevant, representative, informative, interpretable, and generalizable
  - Evaluating the effectiveness of features
- New Research Directions
  - Automatic learning of features that satisfy a given set of constraints
  - Fusion of multi-modal data sources to extract features
  - Learning features for processes described by large heterogeneous datasets
  - Methodology to identify phase transitions with respect to quality/volume of features
- Potential Scientific Impact
  - Principled feature extraction
  - Extraction of more information from DOE observational and experimental data
  - Scientific discovery and hypothesis generation
Leveraging domain knowledge and constraints in ML formulations
- Key Challenges
  - Use constraints to guide the learning process
  - Incorporate incomplete/uncertain knowledge
  - Quantify merits of incorporating knowledge
- New Research Directions
  - Devise efficient, scalable constrained formulations for machine learning
  - Develop scalable constrained, decomposition/parallel frameworks for inference, learning, and modeling
- Potential Scientific Impact
  - Reduce data requirements (size and amount)
  - Increase the scope of ML techniques for science applications with limited/incomplete/diverse data
  - Improve training efficiency
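One way to picture "use constraints to guide the learning process" is a least-squares fit whose weights must stay non-negative, as physical quantities such as concentrations often do. The sketch below is illustrative only and is not taken from the workshop report: the function name fit_nonnegative_least_squares, the step size, and the iteration count are assumptions, and projected gradient descent is just one simple choice of constrained solver.

```python
from typing import List, Sequence


def fit_nonnegative_least_squares(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    step: float = 0.1,
    iters: int = 2000,
) -> List[float]:
    """Minimize the mean squared error subject to w >= 0.

    Hypothetical helper: plain projected gradient descent, where the
    projection onto the feasible set {w : w >= 0} is a clamp at zero.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residuals r_i = x_i . w - y_i
        r = [sum(xi[j] * w[j] for j in range(d)) - yi for xi, yi in zip(X, y)]
        # Gradient of (1 / 2n) * sum_i r_i^2
        g = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        # Gradient step, then projection back onto w >= 0
        w = [max(0.0, w[j] - step * g[j]) for j in range(d)]
    return w
```

On data where the unconstrained fit would give a negative weight, the constrained fit pins that weight at zero and adjusts the others, which is the sense in which prior knowledge restricts the search space.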
ML in High Dimension
- Key Challenges
  - Reliable parameter and hyper-parameter estimation in high dimension
  - Non-parametric identification of structure in high-dimensional data
  - Uncertainty quantification in machine learning in high dimension
- New Research Directions
  - Methods for dimension reduction for both data and model
  - Sparse/low-rank model representations
  - Efficient statistical learning in high dimension
  - Probabilistic methods for high-dimensional uncertainty quantification in ML
- Potential Scientific Impact
  - Enable ML for discovering structure in large scale systems
  - Enable probabilistic ML methods for providing confidence bounds on ML predictions in complex physical systems
  - Scientific discovery from large scale models and data
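As a concrete instance of a sparse model representation, the sketch below fits an L1-penalized least-squares model (the lasso) with iterative soft-thresholding (ISTA). It is a minimal illustration under stated assumptions rather than a method from the report: the names soft_threshold and lasso_ista are hypothetical, and the caller is assumed to supply a step size no larger than the reciprocal of the largest eigenvalue of X^T X / n.

```python
from typing import List, Sequence


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_ista(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    lam: float,
    step: float,
    iters: int = 500,
) -> List[float]:
    """Minimize (1 / 2n) * ||X w - y||^2 + lam * ||w||_1 with ISTA."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residuals and gradient of the smooth least-squares term
        r = [sum(xi[j] * w[j] for j in range(d)) - yi for xi, yi in zip(X, y)]
        g = [sum(r[i] * X[i][j] for i in range(n)) / n for j in range(d)]
        # Gradient step followed by soft-thresholding (the L1 proximal step)
        w = [soft_threshold(w[j] - step * g[j], step * lam) for j in range(d)]
    return w
```

Coefficients whose signal is weaker than the penalty are set exactly to zero, so the fitted model uses only a subset of the available dimensions.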
ML for enhancing Data Collection & Use on DOE Facilities
- Key Challenges
  - Integrating simulation and experiments using ML
  - Learning from and managing real-time, high velocity and/or streaming data
  - Steering of data collection using ML and related methods
- New Research Directions
  - Mathematically justified methods to guide data acquisition and assure data quality and adequacy
  - New/improved ML methods for multimodal data
  - Using VERY large data in ML analysis workflows
  - Mathematics for data access surmounting security and communication challenges
- Potential Scientific Impact
  - Promoting increased, efficient use of large scale DOE computing and experimental facilities
  - Leveraging and guiding advances in computing, data, and networking resources for future science needs
ML for Inverse Problems and Inverse Problems for ML
- Key Challenges
  - Identifying effective latent parameters that will make ML schemes more interpretable and allow us to discover and compute quantities of interest
  - Inadequate computing resources for inverse problems
- New Research Directions
  - Fusion of models obtained from different methodologies, e.g., integration of neural networks, statistical, hierarchical, and multiscale physics models to accelerate inverse problems
  - Methodologies appropriate for using very large, complex, diverse and/or streaming data in inversions
  - Learning of regularization to improve solutions of ill-posed problems
- Potential Scientific Impact
  - Solving inverse problems faster and more reliably using ML will benefit many areas of scientific discovery and engineering
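The idea of learning a regularization for an ill-posed problem can be sketched in its simplest form: choose the Tikhonov weight that best reconstructs known training examples. Everything below is an illustrative assumption rather than content from the report: the forward operator is taken to be diagonal, A = diag(s), the names tikhonov_solve, reconstruction_error, and learn_regularization are hypothetical, and a grid search over candidate weights stands in for a more elaborate learning procedure.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[Sequence[float], Sequence[float]]  # (true x, observed y)


def tikhonov_solve(s: Sequence[float], y: Sequence[float], lam: float) -> List[float]:
    """Minimize ||A x - y||^2 + lam * ||x||^2 for A = diag(s), in closed form."""
    return [sj * yj / (sj * sj + lam) for sj, yj in zip(s, y)]


def reconstruction_error(s: Sequence[float], pairs: Sequence[Pair], lam: float) -> float:
    """Total squared error of the regularized reconstructions on training pairs."""
    total = 0.0
    for x_true, y_obs in pairs:
        x_hat = tikhonov_solve(s, y_obs, lam)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x_true))
    return total


def learn_regularization(
    s: Sequence[float], pairs: Sequence[Pair], candidates: Sequence[float]
) -> float:
    """Pick the candidate weight with the smallest training reconstruction error."""
    return min(candidates, key=lambda lam: reconstruction_error(s, pairs, lam))
```

With small entries in s, an unregularized inverse amplifies observation noise, and the selected weight trades a little bias for a much smaller amplification.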
Reproducibility of ML
- Key Challenges
  - Understand and characterize practical conditions under which an ML process is reproducible, i.e., gives quantities of interest that depend continuously on perturbations of algorithms, model selection, parameterization, data, etc.
- New Research Directions
  - Develop a theory of well-posedness for machine learning with respect to the model, data, numerical algorithms, and computer architecture, which is valid for practical ML algorithms under realistic conditions
  - Develop new ML approaches that lead to reproducible results
  - Enhance the understanding of the classes of data for which ML can be shown to be reproducible
- Potential Scientific Impact
  - Reproducibility is a basic tenet of science, and as such it is vital for scientific ML to be reproducible
  - Lack of reproducibility in ML casts doubt on the validity and relevance of the whole concept
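Continuous dependence of a quantity of interest on the data can be probed empirically by perturbing the data slightly and recording how far the quantity moves. The sketch below is a hedged illustration and not a procedure from the report: the quantity of interest is assumed to be the slope of a one-dimensional ridge regression through the origin, and the names ridge_slope and max_qoi_change, the uniform noise model, and the fixed seed are all assumptions.

```python
import random
from typing import Callable, List, Tuple

Data = List[Tuple[float, float]]  # (x, y) observations


def ridge_slope(data: Data, lam: float = 1.0) -> float:
    """Quantity of interest: slope of 1-D ridge regression through the origin."""
    sxy = sum(x * y for x, y in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + lam)


def max_qoi_change(
    qoi: Callable[[Data], float],
    data: Data,
    eps: float,
    trials: int = 100,
    seed: int = 0,
) -> float:
    """Largest observed change in the QoI when each y is perturbed by at most eps."""
    rng = random.Random(seed)  # fixed seed so the probe itself is repeatable
    base = qoi(data)
    worst = 0.0
    for _ in range(trials):
        perturbed = [(x, y + rng.uniform(-eps, eps)) for x, y in data]
        worst = max(worst, abs(qoi(perturbed) - base))
    return worst
```

If the observed change shrinks in proportion to eps, the quantity of interest behaves continuously under this kind of data perturbation; a change that does not shrink would flag a reproducibility problem.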
Quantifying the discrepancy in Quantities of Interest derived using ML
- Key Challenges
  - Establish rigorous numerical estimates for discrepancy in quantities of interest derived using machine learning
  - Establish well-defined criteria on the domain of applicability under which the machine learning process leads to reliable predictions
- New Research Directions
  - Mathematical foundation of ML as it applies to DOE needs
  - Metrics for assessing discrepancies in predictions, model matching, and input data
  - Methods that provide realistic quantitative estimates for these metrics
- Potential Scientific Impact
  - Many DOE applications involve safety-critical decisions, and it is essential to have access to mathematically rigorous and reliable estimates of the quality of the information that machine learning provides.
ML-enabled Adaptive Scientific Computing
- Key Challenges
  - Training is expensive
  - High-fidelity models are expensive
  - All models have limited predictive value; how can we make them useful?
- New Research Directions
  - Using ML in the inner loop for tuning parameters, detecting behavior that requires correction, etc.
  - Using ML in the outer loop for intelligent search, preconditioning, etc.
  - Using ML for in situ analysis and automated detection of interesting features
- Potential Scientific Impact
  - Precision algorithms that are faster
  - Collaboration of algorithms reducing concerns about ML accuracy
  - ML surrogates could reduce synchrony in iterative algorithms
Addressing the complexity of model architectures & DOE applications
- Key Challenges
  - Complexity of ML models
  - Overfitting issues: dropout, early stopping
  - Model structure determination
- New Research Directions: develop methods that are able to
  - measure complexity of the model space
  - perform model selection
  - avoid overfitting beyond cross-validation
  - enforce physical constraints by construction or via reinforcement learning
- Potential Scientific Impact
  - Increased generalization ability
  - Increased model interpretability
Summary: Two Themes
- To understand machine learning: enhancing its interpretability and reproducibility, quantifying uncertainty of prediction, assessing complexity of model architecture, etc.
- To accelerate scientific discovery using machine learning: feature extraction, high-dimensional data analysis, big data analysis, stream data analysis, solutions of inverse problems, DOE applications, adaptive scientific computing, etc.