Posted: 09-May-2015 | Category: Education | Uploaded by: james-hendler
Data Exploration
Jim Hendler
Director, Rensselaer Institute for Data Exploration and Applications
THE RENSSELAER IDEA
Rensselaer Polytechnic Institute, USA
http://www.cs.rpi.edu/~hendler
IDEA
Data-driven research areas at RPI
• Data-driven Medical and Healthcare Applications
• Predictive Models for Business and Economics
• “Biome” studies for Built and Natural Environments
• Question Answering from texts and data
• Resiliency Models for Population-Scale Problems and cybersecurity domains
• Semantically-enabled Data Services for Science and Engineering Research
• Materials genome and nano-manufacturing informatics
• Platforms for testing Policy and Open Data issues
• …
The Rensselaer IDEA: empowering our researchers
Data discovery, integration, and interaction technologies
Application-specific data tools
The trunk: Shared Data Technologies
• High Performance Modeling and Simulation: Center for Computational Innovation
• Cognitive Computing: Watson at Rensselaer, IBM Partnership
• Perceptualization: Experimental Multimedia Performing Arts Center
• Data Science: Data Science Research Center
Roots: Data Exploration
Discover
Integrate
Validate
Explain
Geekopedia: Data exploration helps a data consumer focus an information search on the pertinent aspects of relevant data before true analysis can be achieved. In large data sets, data is not gathered or controlled in a focused manner. Even in smaller data sets, data that is not gathered with a rigid, specific technique can end up disorganized, with a myriad of subsets each…
DATA
Data Exploration Challenges
Discover
Integrate
Validate
Explain
These needs live outside traditional data/info architectures
Discovery needs semantics
How do you find the Data you need?
Middle Eastern Terrorists for $800?
Discovery – there’s a lot out there
Discovery needs more than keywords
World Bank: Africa
US Data.gov: Crop
Africover: Agriculture
Kenya: Agricultural
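The four catalog labels above suggest why keyword matching misses related datasets. Below is a minimal sketch, assuming a small hand-built mapping from labels to shared concepts; the concept names, the mapping, and the reading of each label as one catalog entry are illustrative assumptions, not an existing vocabulary or any portal's API.

```python
# Hypothetical label-to-concept mapping; a real system would use a shared
# vocabulary or ontology rather than a hand-written dictionary.
LABEL_TO_CONCEPT = {
    "crop": "agriculture",
    "agriculture": "agriculture",
    "agricultural": "agriculture",
    "africa": "africa",
}

# Catalog entries as (source, label) pairs, taken from the slide.
CATALOG = [
    ("World Bank", "Africa"),
    ("US Data.gov", "Crop"),
    ("Africover", "Agriculture"),
    ("Kenya", "Agricultural"),
]


def keyword_search(query: str) -> list[str]:
    """Return sources whose label equals the query string (case-insensitive)."""
    q = query.lower()
    return [source for source, label in CATALOG if label.lower() == q]


def concept_search(query: str) -> list[str]:
    """Return sources whose label maps to the same concept as the query."""
    concept = LABEL_TO_CONCEPT.get(query.lower())
    if concept is None:
        return []
    return [
        source
        for source, label in CATALOG
        if LABEL_TO_CONCEPT.get(label.lower()) == concept
    ]


print(keyword_search("agriculture"))  # ['Africover']
print(concept_search("agriculture"))  # ['US Data.gov', 'Africover', 'Kenya']
```

The keyword search finds one source; the concept search finds three, because "Crop", "Agriculture", and "Agricultural" are resolved to one meaning before matching.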
Integration needs Semantics
Person
  RIN: 660125137
  Address #: 1118
  Address St: Pinehurst
  Address zip: 12203
  Course topic: CSCI
  Course #: 4961
Campus Personnel
  RPI ID: 660125137
  Name: Hendler
Campus Classes
  CRN: 1118
  Name: Intro to Physics
YES: RIN 660125137 ↔ RPI ID 660125137
NO!!!!: Address # 1118 ↔ CRN 1118
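The records above show how matching on values alone links the right fields (RIN and RPI ID) and the wrong ones (a house number and a course registration number that both happen to be 1118). Below is a minimal sketch of the difference, assuming a hypothetical table `FIELD_MEANING` that annotates each field with the property it denotes; the property names are invented for illustration.

```python
# The three records from the slide, as plain dictionaries.
person = {"RIN": "660125137", "Address #": "1118", "Address St": "Pinehurst",
          "Address zip": "12203", "Course topic": "CSCI", "Course #": "4961"}
personnel = {"RPI ID": "660125137", "Name": "Hendler"}
classes = {"CRN": "1118", "Name": "Intro to Physics"}

# Hypothetical semantic annotations: which property each field denotes.
FIELD_MEANING = {
    "RIN": "person-id", "RPI ID": "person-id",
    "Address #": "street-number", "Address St": "street-name",
    "Address zip": "postal-code", "Course topic": "course-subject",
    "Course #": "course-number", "CRN": "course-registration-number",
}


def value_matches(a: dict, b: dict) -> list[tuple[str, str]]:
    """Naive integration: pair up any two fields whose values are equal."""
    return [(fa, fb) for fa, va in a.items() for fb, vb in b.items() if va == vb]


def semantic_matches(a: dict, b: dict) -> list[tuple[str, str]]:
    """Keep a value match only if both fields denote the same property."""
    return [
        (fa, fb)
        for fa, fb in value_matches(a, b)
        if FIELD_MEANING.get(fa) is not None
        and FIELD_MEANING.get(fa) == FIELD_MEANING.get(fb)
    ]


print(value_matches(person, classes))     # [('Address #', 'CRN')]
print(semantic_matches(person, classes))  # []
print(semantic_matches(person, personnel))  # [('RIN', 'RPI ID')]
```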
Semantic Web and Linked Data (UK)
County Council
Ordnance Survey
Royal Mail
IOGDC Open Data Tutorial
http://logd.tw.rpi.edu
Data Mashups
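In Linked Data, mashups such as the County Council / Ordnance Survey / Royal Mail example become simple when publishers reuse the same identifier for the same thing. Below is a minimal sketch in plain Python (no RDF library), assuming each publisher refers to a postcode by one shared URI; all URIs and predicate names (`ex:hasPostcode`, `ex:withinDistrict`, `ex:deliveryOffice`) are hypothetical. A real deployment would use RDF and a query language such as SPARQL.

```python
Triple = tuple[str, str, str]

# Hypothetical shared identifiers used by every publisher.
POSTCODE = "http://example.org/postcode/AB1-2CD"
SCHOOL = "http://example.org/council/school/42"

# Each publisher's data as a set of (subject, predicate, object) triples.
council: set[Triple] = {(SCHOOL, "ex:hasPostcode", POSTCODE)}
royal_mail: set[Triple] = {(POSTCODE, "ex:deliveryOffice", "http://example.org/office/7")}
ordnance_survey: set[Triple] = {(POSTCODE, "ex:withinDistrict", "http://example.org/district/north")}


def objects(graph: set[Triple], subject: str, predicate: str) -> list[str]:
    """All objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def districts_of(graph: set[Triple], thing: str) -> list[str]:
    """Follow hasPostcode, then withinDistrict: a join spanning two publishers."""
    result = []
    for postcode in objects(graph, thing, "ex:hasPostcode"):
        result.extend(objects(graph, postcode, "ex:withinDistrict"))
    return result


# Because identifiers are shared, merging the sources is just a set union.
merged = council | royal_mail | ordnance_survey
print(districts_of(merged, SCHOOL))   # ['http://example.org/district/north']
print(districts_of(council, SCHOOL))  # []
```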
Validation needs semantics
Easy for us
Hard for machines…
Head-to-head comparison shows that burglaries in Avon and Somerset (UK) far exceed those in Los Angeles, California
Data + everything else you know
Same or different?
Do the terms mean the same thing? Are they collected in the same way? Are they processed differently? …
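The three questions above can be made machine-checkable if each published statistic carries descriptive metadata. Below is a minimal sketch, assuming a hypothetical `SeriesMetadata` record with one field per question; the descriptor values in the usage are placeholders, not the actual definitions used by Avon and Somerset or Los Angeles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeriesMetadata:
    """Hypothetical descriptor attached to a published statistic."""
    term_definition: str    # what the term (e.g. "burglary") is defined to mean
    collection_method: str  # how incidents were recorded
    processing: str         # how raw counts were aggregated or normalized


def comparability_problems(a: SeriesMetadata, b: SeriesMetadata) -> list[str]:
    """List the reasons two statistics should not be compared head to head."""
    problems = []
    if a.term_definition != b.term_definition:
        problems.append("terms defined differently")
    if a.collection_method != b.collection_method:
        problems.append("collected differently")
    if a.processing != b.processing:
        problems.append("processed differently")
    return problems


# Placeholder descriptors for illustration only.
uk = SeriesMetadata("definition A", "police reports", "annual counts")
la = SeriesMetadata("definition B", "police reports", "annual counts")
print(comparability_problems(uk, la))  # ['terms defined differently']
```

An empty list does not prove two series are comparable; it only means the recorded metadata raise no objection.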
Trends in Smoking Prevalence, Tobacco Policy Coverage and Tobacco Prices (1991-2007)
Validation/Explanation need knowledge
Statistical correlation needs explanation
Explanation also needs Semantics
Inference Web: McGuinness – various DoD/IC projects
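Explanation in the Inference Web style rests on keeping a trace of how each conclusion was reached. Below is a minimal sketch of that idea, assuming a toy forward-chaining reasoner that records, for every derived fact, the rule and premises that produced it; it is not Inference Web or its formats, and the rule `transitive_part_of` and the facts are hypothetical.

```python
from typing import Callable, Iterator

Fact = tuple[str, str, str]
Justification = tuple[str, list[Fact]]  # (rule name or "asserted", premises)
Rule = Callable[[set[Fact]], Iterator[tuple[Fact, list[Fact]]]]


def transitive_part_of(facts: set[Fact]) -> Iterator[tuple[Fact, list[Fact]]]:
    """Hypothetical rule: if A partOf B and B partOf C, then A partOf C."""
    for a, p1, b in facts:
        for b2, p2, c in facts:
            if p1 == p2 == "partOf" and b == b2:
                yield (a, "partOf", c), [(a, p1, b), (b2, p2, c)]


def forward_chain(facts: set[Fact], rules: list[Rule]) -> dict[Fact, Justification]:
    """Apply rules until nothing new is derived, recording why each fact holds."""
    known: dict[Fact, Justification] = {f: ("asserted", []) for f in facts}
    changed = True
    while changed:
        changed = False
        for rule in rules:
            # Materialize the derivations before mutating `known`.
            for new_fact, premises in list(rule(set(known))):
                if new_fact not in known:
                    known[new_fact] = (rule.__name__, premises)
                    changed = True
    return known


def explain(fact: Fact, known: dict[Fact, Justification], depth: int = 0) -> list[str]:
    """Render the justification tree for one fact, premises indented under it."""
    rule_name, premises = known[fact]
    lines = ["  " * depth + f"{fact} [{rule_name}]"]
    for premise in premises:
        lines.extend(explain(premise, known, depth + 1))
    return lines


facts = {("valve-7", "partOf", "pump-2"), ("pump-2", "partOf", "plant-1")}
known = forward_chain(facts, [transitive_part_of])
print("\n".join(explain(("valve-7", "partOf", "plant-1"), known)))
```

The printed tree names the rule behind the derived fact and, beneath it, the two asserted premises it depends on.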
Closing the loop: where do the semantics come from?
Data
Prediction
Model
Design
How do we go from the predictive analytics of Big Data to models/explanations that allow new understanding?
1. Better tools for Analytics, Agents and HPC
Make the tools and algorithms being developed by RPI researchers more “reusable” and applicable across multiple tasks (including HPC data-analytic tools)
2. Next-Gen Visualization (at scale)
How can multi-modal, multi-user, large-scale sensory (visualization, sonification, haptics) interaction change the way we understand data?
3. Include “agents” in the modeling
Develop technologies that enable researchers to work with “human-based” data at larger scales and in new ways
• Population-scale computing models for agent-based simulations
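Agent-based simulation models a population as many individuals, each following its own rule and reacting to others. Below is a minimal serial sketch, assuming a hypothetical threshold-adoption model: an agent adopts a behavior once the share of its contacts who have adopted reaches a threshold. The network, the rule, and all parameter names are illustrative; population-scale versions run far more agents on parallel hardware.

```python
import random


def simulate_adoption(
    n_agents: int,
    n_contacts: int,
    threshold: float,
    initial_adopters: int,
    steps: int,
    seed: int = 0,
) -> list[int]:
    """Return the number of adopting agents after each synchronous step."""
    rng = random.Random(seed)
    # Hypothetical social network: each agent watches a fixed random set of others.
    contacts = [
        rng.sample([j for j in range(n_agents) if j != i], n_contacts)
        for i in range(n_agents)
    ]
    adopted = [False] * n_agents
    for i in rng.sample(range(n_agents), initial_adopters):
        adopted[i] = True

    history = [sum(adopted)]
    for _ in range(steps):
        previous = adopted[:]  # every agent reacts to the same snapshot
        for i in range(n_agents):
            if not previous[i]:
                share = sum(previous[j] for j in contacts[i]) / n_contacts
                # Discontinuous rule: nothing happens until the share crosses the threshold.
                if share >= threshold:
                    adopted[i] = True
        history.append(sum(adopted))
    return history


print(simulate_adoption(n_agents=1000, n_contacts=8, threshold=0.25,
                        initial_adopters=50, steps=10))
```

The step-function rule makes the model nonlinear: small changes in the initial adopters or the threshold can decide whether adoption stalls or spreads through the population.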
Approach
Platform: research in using supercomputers for discrete modeling
• Carothers’ ROSS model
KR model:
• Weaver’s restricted rules on graphs
Challenge problem:
• Classification algorithms at petaflop scale
• “Logical” (nonlinear, discontinuous) agents
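ROSS (Rensselaer’s Optimistic Simulation System) is a parallel discrete-event simulator that uses optimistic (Time Warp) synchronization with rollback. Below is a minimal sequential sketch of only the core event loop that such systems parallelize: events are kept in a priority queue ordered by timestamp and processed earliest first. It is not the ROSS API; the class and the `tick` event are hypothetical.

```python
import heapq
import itertools
from typing import Any, Callable


class Simulator:
    """Minimal sequential discrete-event loop: pop the earliest event, run it, repeat."""

    def __init__(self) -> None:
        self.now = 0.0
        self._queue: list[tuple[float, int, Callable[..., None], tuple[Any, ...]]] = []
        self._order = itertools.count()  # tie-breaker so equal times never compare handlers

    def schedule(self, delay: float, handler: Callable[..., None], *args: Any) -> None:
        """Queue handler(sim, *args) to run `delay` time units from now."""
        heapq.heappush(self._queue, (self.now + delay, next(self._order), handler, args))

    def run(self, until: float) -> None:
        """Process events in timestamp order up to and including time `until`."""
        while self._queue and self._queue[0][0] <= until:
            time, _, handler, args = heapq.heappop(self._queue)
            self.now = time
            handler(self, *args)


log: list[tuple[float, str]] = []


def tick(sim: Simulator, name: str) -> None:
    """Example event: record the current time, then reschedule itself."""
    log.append((sim.now, name))
    sim.schedule(1.5, tick, name)


sim = Simulator()
sim.schedule(0.0, tick, "a")
sim.run(until=5.0)
print(log)  # [(0.0, 'a'), (1.5, 'a'), (3.0, 'a'), (4.5, 'a')]
```

Optimistic parallel simulators split this single queue across processors, let each run ahead speculatively, and roll back when an event arrives out of timestamp order.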
4. Exploit Cognitive Computing
IDEA will be the hub of Rensselaer’s cognitive-computing research
• e.g., answering questions such as “Why” and “How”, integrated with large-scale simulations
Watson’s parallel model
Distributed (coarse-grained) parallelism
© “Making Watson Fast,” IBM J. Res. and Dev., 3/4, 2012
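Coarse-grained parallelism here means splitting a question-answering pipeline into large, independent units of work, such as scoring each candidate answer against the evidence, and farming them out to separate workers. Below is a minimal sketch of that decomposition, assuming candidates have already been generated and using a toy word-overlap scorer and hypothetical evidence passages; it is not DeepQA or IBM’s implementation. Threads keep the sketch self-contained; the slide’s point is that Watson distributed such work across a cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical evidence passages.
PASSAGES = [
    "Rensselaer IDEA is the Rensselaer Institute for Data Exploration and Applications",
]


def support(candidate: str, passage: str) -> float:
    """Toy scorer: share of candidate words that occur in the passage."""
    words = candidate.lower().split()
    passage_words = set(passage.lower().split())
    return sum(w in passage_words for w in words) / len(words)


def score_candidate(candidate: str) -> tuple[str, float]:
    """One coarse-grained unit of work: score a candidate against all passages."""
    return candidate, max(support(candidate, p) for p in PASSAGES)


def rank(candidates: list[str], workers: int = 4) -> list[tuple[str, float]]:
    """Score candidates in parallel, then merge and sort by score."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scored = list(pool.map(score_candidate, candidates))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank(["Rensselaer IDEA", "Watson", "Data Science Research Center"]))
# [('Rensselaer IDEA', 1.0), ('Data Science Research Center', 0.25), ('Watson', 0.0)]
```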
DeepQA-type approach works best on large clusters
(Physical) Simulation runs on supercomputers
Cognitive Computing at Scale
Approach: link these computational models
Surmise (unproven): Cognitive Computing on a fast (large) cluster can query computations run against data generated by simulations (physical or agent-based) on the supercomputer
5. Data services will provide synergy across disciplines
• Semantics is a key technology for common data services
Discovery, Integration, Validation, Curation, Citation, Archiving …
Conclusions
• The “warehouse” is only a small part of the data ecosystem
• Database technologies are only part of the story
• Discovery, Integration, … , validation, explanation are key to solving problems with data
• Closing the loop means “exploring” our data
• Humans are still a key player in this
• The Rensselaer IDEA will explore
• Data-driven applications and tools, but also…
• … multimodal visualization, multiscale and agent modeling, cognitive computing, and semantic data platforms
Rensselaer Institute for Data Exploration and Applications