NSF/DHS FODAVA-LEAD: Missions and Plans
Haesun Park
Computational Science and Engineering Division
Georgia Institute of Technology
FODAVA Kick-off Meeting, September 2008
Data and Visual Analytics (DAVA)
Analytical Reasoning
Data Representation and Transformation
Visual Representation and Interaction
Production, Presentation, Dissemination
Data and Visual Analytics (DAVA)
Analytical Reasoning
• Apply human judgment to reach conclusions
• Methods to maximally utilize human capacity to derive deep understanding and insight into complex situations in a minimum amount of time
Data Representation and Transformation
• Representing dynamic, incomplete, conflicting data to convey important content in a form and level of abstraction appropriate to the analytical task to enable understanding
• Transforming data among possible representations to support analysis and discovery
Visual Representation and Interaction
• Visual presentation of information in ways that instantly convey important content taking advantage of human vision
• Interaction techniques (e.g., search) between the analyst and data to facilitate the analytical reasoning process
Production, Presentation, Dissemination
• Seamless integration of data acquisition, analysis, decision making, and action
A Discipline in Data & Visual Analytics
FODAVA is concerned with defining the mathematical and computational foundations for the Data and Visual Analytics Discipline
Foundations
Analytical Reasoning
Data Representation and Transformation
Production, Presentation and Dissemination
Visual Representation and Interaction
I think, therefore I am.
“Solving a problem simply means representing it so that the solution is obvious.” Herbert Simon, 96
Applications
• FODAVA team will perform foundational research that can be applied to many different fields
– Common end objective is to apply knowledge in decision making process, at the time and place that a decision is needed.
– Common challenges across applications as well as application specific challenges
Social Networks
Biometric Recognition
Text Analysis
Bioinformatics
Epidemiology
Homeland Security
Medical Informatics: Theory and practice of knowledge integration, management and use in healthcare delivery, med, public health
Astrophysics
VISION: Establishing DAVA as a Distinct Discipline
• Develop FODAVA community, engage larger DAVA field
– Researchers
– Educators
– Practitioners
• Establish Body of Knowledge
– Foundations, subareas, applications
– Curriculum
– Education programs
Data Analytics Visualization
Mathematical and Computational Foundations
Data and Visual Analytics
Analytic Reasoning
Production, Presentation, Dissemination
Data and Visual Analytics Communities
National Visualization and Analytics Center (NVAC)/VAC Consortium
RVAC/DHS Science & Technology Center of Excellence
FODAVA
FODAVA lead
FODAVA partners (08, 09, …)
FODAVA will interact with several communities of researchers & practitioners
“This partnership with NSF is the most important event since the creation of NVAC in March 2004. It brings to the front stage efforts by folks within DHS, NVAC and NSF to jointly fund the development of basic research in visual analytics supporting DHS applied mission needs.”~Jim Thomas, NVAC Director
FODAVA-Lead Mission
• Research and Education: Serve as a central facility that will involve all FODAVA awardees in a common effort to develop the scientific foundations for data and visual analytics
• Effective Liaison between FODAVA Researchers and NVAC: Interface with DHS NVAC/RVAC and DHS S&T Center of Excellence in research and educational opportunities
• Community Building: Integrate diverse DAVA communities and reach out for broader participation
FODAVA-Lead Challenges
Research and Collaboration
• Creation of the Mathematical and Computational Sciences Foundations required to represent and transform all types of digital data in ways to enable efficient and effective Visualization and Analytic Reasoning
• Intrinsic Challenges: Data sets massive, heterogeneous, multi-dimensional, dirty, incomplete, time-varying; solutions must be produced with time and space constraints, ….
• Understanding Fundamental issues/needs in VA and Communicating results
– Isolated theoretical research is not enough
– Problem driven foundational research is needed
FODAVA-Lead Challenges (cont’d)
• Education and Research
– Defining Foundations of Data and Visual Analytics
– Undergraduate and Graduate Curriculum (core body of knowledge) for Data and Visual Analytics
• Community Building/Integration
– A community of researchers who claim DAVA as their own discipline and FODAVA an essential part
– Conferences, journals, books, professional society engagement,
– Industry, tech transfer, …
FODAVA-Lead PIs at GAtech
• Alex Gray (CSE): Machine Learning; Fast Algorithms for Massive DA
• Haesun Park (Director; CSE, Associate Chair): Numerical Computing; Data Analysis; Research, FODAVA Community Building
• Vladimir Koltchinskii (Mathematics): Machine Learning Theory; Computational Statistics
• John Stasko (Associate Director; IC, Associate Chair; SRVAC Co-Director): Information Vis.; Collaboration with NVAC and RVACs; Liaison with Vis. community
• Renato Monteiro (ISyE): Continuous Optimization; Statistical Computing
FODAVA-Lead Senior Personnel
• James Foley (Associate Dean, CoC): Graphics and Visualization, HCI; Visual Analytics Digital Library
• Richard Fujimoto (Associate Director; CSE, Chair): Modeling and Simulation; Education and Outreach
• Guy Lebanon (CSE): Machine Learning; Computational Statistics
• Arkadi Nemirovski (ISyE): Optimization; Non-parametric Stat.
• Alexander Shapiro (ISyE): Stochastic Programming; Optimization; Multivariate Stat. Analysis
• Santosh Vempala (CS): Theory of Computing; Director of ARC
• Hongyuan Zha (CSE): Numerical Computing; Data Analysis; Director of Graduate Studies
• Hao-Min Zhou (Mathematics): Wavelet and PDE; Image Processing
2008 FODAVA Partners
• Global Structure Discovery on Sampled Spaces
Leonidas Guibas and Gunnar Carlsson (Stanford University)
• Visualizing Audio for Anomaly Detection
Mark Hasegawa-Johnson, Thomas Huang, Hank Kaczmarski, Camille Goudeseune (University of Illinois Urbana-Champaign)
• Principles for Scalable Dynamic Visual Analytics
H. Jagadish and George Michailidis (University of Michigan)
• Efficient Data Reduction and Summarization
Ping Li (Cornell University)
• Uncertainty-Aware Data Transformations for Collaborative Reasoning
Kwan-Liu Ma (UC Davis)
• Mathematical Foundations of Multiscale Graph Representations and Interactive Learning
Mauro Maggioni, Rachael Brady, Eric Monson (Duke University)
• Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional Geometric Spaces
Leland Wilkinson and Robert Grossman (University of Illinois Chicago)
Adilson Motter (Northwestern University)
Expertise of FODAVA team
Machine Learning
Optimization
Information Visualization
Simulation
Human Computer Interaction
Computational Math & Statistics
Gaming
Database
Real-time Systems
Discrete/Graph Algorithms
Speech Recognition
High Performance Computing
Graphics and Vis.
Information Retrieval
Numeric & Geometric Computing
FODAVA Activities
• Body of Knowledge
– Curriculum development
– Repository for education materials
– Distinguished lecture series
– Outreach to underrepresented groups
• Community Development
– Communications: project description and results
– FODAVA web site
  • Repository of FODAVA data sets and results
– Conferences and meetings
  • Annual FODAVA Workshop
  • NVAC Consortium meetings
  • Activities at established meetings
  • Meetings to establish new research directions
Curriculum Development
• Goals
– Identify and catalog curriculum development efforts in Data and Visual Analytics
  • Individual courses, minors, degree programs
  • Undergraduate and graduate level
– Leverage existing efforts (e.g., RVAC)
– Share experiences, develop best practices
– Develop curriculum recommendations
• Curriculum workshop
– POCs: Cook (NVAC), Fujimoto (FODAVA), Stasko (RVAC and FODAVA)
– December 2008, Atlanta, Georgia
Visual Analytics Digital Library (http://vadl.cc.gatech.edu)
• Developed by Georgia Tech (Foley et al.) in Southeast Regional Visual Analytics Center
• Repository for curriculum and education materials
– Lecture notes
– Homeworks, projects
– Reference materials, videos, etc.
• Includes evolving taxonomy for Data and Visual Analytics
• FODAVA will build upon this resource to
– Provide a library and web portal of FODAVA educational materials
– Expand support to DAVA community to include FODAVA areas
– Document curriculum development efforts
Distinguished Lecture Series
• Goal: Provide forum for leaders in DAVA community to articulate vision and DAVA-related research and education activities and applications
• Plans (2009)
– Lecture series featuring leaders in the data and visual analytics community
– Develop in collaboration with FODAVA partners, NVAC, RVAC, DHS/S&T CoE
– Webcast
Photo: Joe Kielman, VAC Consortium meeting, 2008
Outreach to Underrepresented Groups
Example: GT CRUISE Program
• CRUISE: CSE Research Undergraduate Intern Summer Experience
• Encourage students to consider PhD studies
• Diverse student participation
– Multicultural, emphasizing minorities, women
– U.S. and international students
• Ten week summer research projects in areas such as data and visual analytics, high performance computing, modeling & simulation
• Interdisciplinary individual and group projects
– Year-long collaboration with North Carolina A&T University
• CRUISE-wide events
– Weekly seminars (technical, grad studies)
– Social events
– Symposium: conference-style presentations
FODAVA Website http://fodava.gatech.edu
• Functionality
– Dissemination of results to user communities
– DAVA community events and meeting information depot
– Repository of data sets for FODAVA community
• Forum for FODAVA Community
• Maintain close collaboration with NVAC
FODAVA Annual Workshop (from Fall 2009)
• Annual Theme
– Initially more mathematically/computationally oriented
– Increasing emphasis over time on visualization, human-computer interaction, cognitive science, …
• Organizers
– Co-organized in collaboration among FODAVA-Lead, FODAVA-Partners, NVAC, and DHS S&T Center of Excellence
• Time
– Co-locate with NVAC Fall Consortium meeting
• Location
– PNNL/NVAC, Richland, WA
FODAVA Annual Workshop 2009
• Theme: Machine Learning & Geometric Computing in Visual Analytics
• Organizers: Vladimir Koltchinskii (GATech) and Mauro Maggioni (Duke)
• Time: November, 2009
• Location: PNNL/NVAC, Richland, WA
VAC Consortium Meetings
• Provides broader exposure of work to DHS and NVAC communities
• Semi-annual
• Next Meeting: Nov 11-13, 2008, PNNL
– Nov. 11: University Technical Exchange Day
– FODAVA Panel session
– FODAVA Demo/Poster session
• Please participate!
Additional Workshops
• FODAVA workshops at major conferences and meetings
• IEEE VAST Conference
– Birds of a Feather session at VAST Oct., 2008
• Workshop on Temporal Analytics
• Other Potential venues
– International Conference on Machine Learning
– Neural Information Processing Systems (NIPS)
– SIAM CSE / SIAM Optimization / SIAM ALA Conferences
– ACM Knowledge Discovery and Data Mining (KDD)
– AAAS meeting
– Others?
Calendar of Events
• Sept 2008: FODAVA Kick-Off Meeting
• Oct 2008: VAST 2008 BoF Session
• Nov 2008: VAC Consortium meeting, FODAVA Panel and Poster/Demo Session
• Dec 2008: DAVA Curriculum Workshop
• May 2009: VAC Consortium Meeting
• Oct 2009: VAST Conference
• Nov 2009: VAC Consortium and FODAVA Annual Workshop
• Temporal Analytics Workshop under consideration
Project Materials
• Goal: Articulate contributions being made by the FODAVA community
• Benefits
– Potential collaborators
– Foster technology transition opportunities
– Broader exposure to potential sponsors
• Materials requested
– Project brochures and other collateral material
– Videos especially welcome
• Tell us what you’re doing!
• POC: Richard Fujimoto
Concluding Remarks
• DAVA represents a new, exciting discipline that brings together diverse communities
• Research is motivated and driven by real-world problems
• FODAVA will play a key role in developing and defining the foundations for DAVA
• Communication and collaboration with other elements of DAVA (e.g., NVAC, RVAC, DHS/S&T CoE) is essential
– We need to educate ourselves!
Thank you!
Extra slides
Student Interns
• Support deep research collaboration between FODAVA lead, FODAVA partners, and PNNL / NVAC
– Fundamental research driven by real-world applications
• Leverage existing intern programs at PNNL
– Summer interns
• Leverage GT distance learning capability for academic year interns
• Details to be determined
Undergraduate Education
• Georgia Tech Threads curriculum
– Undergraduate program defined as a set of 8 threads
– Thread is a body of coursework targeting a certain career path, e.g., modeling and simulation, human computer interaction, embedded systems, etc.
– Students take two threads to complete BS in CS degree
• Existing threads
– Modeling and Simulation: representing processes/systems
– Devices: embedded computing
– Theory: theoretical foundations of computing
– Information Networks: information communication
– Intelligence: human-level intelligence
– Media: systems for creative expression
– People: human-centric computing
– Platforms: computing systems, architecture, languages
Modeling & Simulation Thread
• Many students come to Georgia Tech with an inherent love for math and science
• Computation provides a framework to view, understand, analyze, and design systems
Computational modeling is about going from (image) to (image)
Examples shown: Fluid flow model, Cellular Automata, Queueing Model
Involves developing mathematical / conceptual abstractions of systems that can be represented by efficient software
A Data and Visual Analytics Thread?
Foundations
Computing: Theory, Software, Hardware, Algorithms
Math: Discrete Math, Continuous Math
Science: Physics, Biology, Chemistry
Computational Methods for Data Analysis and Visualization
Application Discipline (pick one): Aero, Civil, Elect., EAS, Biology, Chemistry, Math, Physics, Industrial Eng.
?
• Curriculum
  • Foundational mathematics, computing, science
  • Data analytics, information visualization
  • Application-oriented specialization
• Integrated approach with capstone design project
• Natural complement to modeling and simulation thread
Application Domains
• DHS: Intelligence analysis, Law Enforcement, Emergency response, Intrusion and fraud detection, ….
• BioMedical Informatics
• Bioinformatics/Systems Biology
• Astronomy
• Text Analysis: Documents, e-mails, …
• Cybersecurity
• Transportation
• …
Vladimir Koltchinskii, School of Mathematics
Sparse Recovery: For automatic determination of relevant features (Basis pursuit, Soft thresholding, LASSO …)
Comprehensive theory is only starting to be developed
Penalized Empirical Risk Minimization: Basis for many solutions in basic problems of learning theory, e.g. regression, classification, density estimation
Challenge: extend the theory of sparse recovery to broader framework of learning theory, e.g. infinite classes of functions
• Machine Learning
- Learning Theory
- Feature Selection
- Theory of Sparse Recovery
- Empirical Risk Minimization
• Computational Statistics
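As a rough illustration of the soft thresholding and LASSO ideas named on this slide, the following is a minimal sketch and not the speaker's method. It assumes a squared-error loss with an L1 penalty and solves it by iterative soft thresholding (ISTA). The function names `soft_threshold` and `lasso_ista`, the toy data, and the step size are all hypothetical choices made for this example.

```python
def soft_threshold(x, t):
    """Shrink x toward zero by t; any |x| <= t becomes exactly 0."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_ista(A, y, lam, step, iters=500):
    """Approximately minimize 0.5*||A w - y||^2 + lam*||w||_1.

    A is a list of rows, y a list of targets. `step` is assumed to be
    at most 1 / (largest eigenvalue of A^T A) so the iteration is stable.
    """
    n, d = len(A), len(A[0])
    w = [0.0] * d
    for _ in range(iters):
        # Residual r = A w - y
        r = [sum(A[i][j] * w[j] for j in range(d)) - y[i] for i in range(n)]
        # Gradient of the smooth (squared-error) part: A^T r
        g = [sum(A[i][j] * r[i] for i in range(n)) for j in range(d)]
        # Gradient step followed by soft thresholding (the L1 proximal step)
        w = [soft_threshold(w[j] - step * g[j], step * lam) for j in range(d)]
    return w


# Toy problem (assumed data): with A = identity the solution is simply
# the soft-thresholded target, so the small middle coefficient is zeroed.
A = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
y = [3.0, 0.2, -1.5]
w = lasso_ista(A, y, lam=0.5, step=1.0)
print(w)
```

The penalty drives weak coefficients to exactly zero, which is how such methods select relevant features automatically.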
Renato Monteiro, School of Industrial & Sys. Eng.
• Continuous Optimization
- Interior-point methods
- Semidefinite programming
- Cone programming
- Algorithms for large-scale optimization
• Computational Statistics and Graph Theory
Dimension Reduction and Semi-definite Programming
• Higher level of reduction with more difficult objective function
• Learning manifolds which preserve ordering of distances
• Off-the-shelf SDP software does not scale
• Design of efficient algorithms based on the first-order method, convex-concave saddle point problem
Alexander Gray, Computational Sci. & Eng.
Goal: make machine learning efficient
– For massive datasets, e.g. for astronomy, Large Hadron Collider, network traffic
– For fast visualization, e.g. our new manifold learning methods
• Developed fastest practical algorithms for many learning methods
• Coming in Dec 2008: MLPACK library
John Stasko, School of Interactive Computing and GVU Center
Visualization for Investigative Analysis - Putting the Pieces Together with Jigsaw
Information Visualization
Human Computer Interaction
Help investigative analysts discover plans, plots and threats embedded across large document collections
Multiple visualizations (views) of the documents, entities, & their connections
Views are highly interactive and coordinated
Analysts explore the documents and entities through the views
Building a collaborative version
Representing reliability and uncertainty
Entity aliasing and hierarchy support
Visualizing the investigative process
Haesun Park, Computational Sci. & Eng.
Effective Dimension Reduction with Prior Knowledge
• Dimension Reduction for Clustered Data: Linear Discriminant Analysis (LDA), Generalized LDA (LDA/GSVD), Orthogonal Centroid Method (OCM)
• Dimension Reduction for Nonnegative Data: Nonnegative Matrix Factorization (NMF)
• Applications: Text Classification, Face Recognition, Fingerprint Classification, Gene Clustering in Microarray Analysis …
• Numerical Computing
• Algorithms for Massive Data Analysis
- Dimension Reduction
- Clustering and Classification
• Bioinformatics
- Microarray analysis
- Protein structure prediction
Education and Outreach Goals
FODAVA lead will
• Encourage and coordinate development of FODAVA Curriculum
• Encourage and coordinate knowledge exchange toward creating a workforce pipeline
– Undergraduate education
– Graduate education
– Lifelong learning
• Facilitate research collaboration
• Facilitate outreach to underrepresented groups
Engaging FODAVA Community
• FODAVA program provides a platform to bring together community of researchers, educators and practitioners
• Activities might include
– Education workshops to share experiences, develop best practices
– Curriculum development
– Repository of information and teaching materials (e.g., SRVAC, VADL)