2
The purpose of DGRC
To Make Digital Government Happen
• Advance information systems research
• Bring the benefits of cutting-edge IS research to government systems
• Help educate government and the community
• Learn needs from government partners to drive next stage system development
• Build pilot systems as part of new infrastructure
3
The problem and the solution
Problem: FedStats has thousands of databases in over seventy Government agencies
• Data is duplicated and near-duplicated
• Even Government officials and specialists cannot find it
Solution: Create a system to provide easy standardized access
• Need multi-database access engine
• Need powerful user interface
• Need terminology standardization mechanism
4
The Vision: Ask the Government...
• We’re thinking of moving to Denver... What are the schools like there?
• How have property values in the area changed over the past decade?
• How many people had breast cancer in the area over the past 30 years?
• Is there an orchestra? An art gallery? How far are the nightclubs?
[Figure labels: Census, Labor Stats]
5
Research challenges
• Scale to incorporate many databases… build data models automatically
• Process large and disparate data efficiently… develop fast processing techniques… create aggregation and substitution operators
• Integrate data models across sources and agencies… take a large ontology and link the models into it automatically… develop ways to automatically harvest glossary data for building ontologies
• Develop new ways to interact with data… use language processing tools for question-answering
• Display complex information from distributed sources… develop and evaluate new presentation techniques
6
The Energy Data Consortium
EDC members
• Information Sciences Institute, USC
• Columbia University
Government partners
• Energy Information Administration (EIA)
• Bureau of Labor Statistics (BLS)
• Census Bureau
Research challenge
• Make the contents of thousands of data sets, represented in many different ways (web pages, PDF, MS Access, text…), accessible in a standardized way
7
The Vision: Ask the Government...
• We’re thinking of moving to Cambridge… How much does gas cost there?
• Are alternative energy sources any cheaper to use?
• Which state has the highest oil production?
• How long has the nuclear plant been in service?
[Figure labels: Census, Labor Stats]
8
[Architecture diagram. Components: Heterogeneous Data Sources (Labor, EPA, EIA, Census); Data Integration; Definition Ontology; User Interface / Information Access. Label: query]
9
From Phase I to Phase II
Phase One
• Terminology/ontology
• Information integration and in-memory data analysis
• New interfaces for complex human-computer interaction
Phase Two
• Question-answering
• Usability testing and evaluation
• Privacy portal
10
[Architecture diagram. Components: Heterogeneous Data Sources (Labor, EPA, EIA, Census, Trade); Data Integration; Main Memory Query Processing; Definition Ontology; User Interface / Information Access; Question-Answer Access; User Evaluation; Task-based Evaluation. Label: query]
12
[Architecture diagram. Components: Heterogeneous Data & Meta-data Sources (Labor, EPA, EIA, Census, ???); Data Integration; Data Definitions (Ontology); User Interface / Information Access. Labels: interface, query, definitions. Caption: Metadata mediates]
13
Recent example
EIA problem:
• Data cleared for publication is grouped together across states
• Also need data gathered by state separately
• Need general ability to ungroup and reaggregate data
http://www.eia.doe.gov/emeu/states/main_ca.html
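The reaggregation need on this slide can be illustrated with a minimal sketch. It assumes state-level records are available as (state, measure, value) tuples. The record layout, the `reaggregate` helper, the "Pacific" grouping, and all values are hypothetical and are not taken from EIA data or from the DGRC system.

```python
from collections import defaultdict

# Hypothetical state-level records: (state, measure, value). Values are made up.
records = [
    ("CA", "consumption", 10.0),
    ("OR", "consumption", 4.0),
    ("WA", "consumption", 6.0),
    ("NY", "consumption", 8.0),
]

def reaggregate(records, grouping):
    """Sum values per (group, measure).

    `grouping` maps a state code to a group label. States missing from
    the mapping stay ungrouped, i.e. they are reported under their own code.
    """
    totals = defaultdict(float)
    for state, measure, value in records:
        group = grouping.get(state, state)
        totals[(group, measure)] += value
    return dict(totals)

# Group three states into one (assumed) region, leave the rest per state.
pacific = {"CA": "Pacific", "OR": "Pacific", "WA": "Pacific"}
by_region = reaggregate(records, pacific)

# An empty grouping returns the data state by state.
by_state = reaggregate(records, {})
```

Keeping the state-level records as the base and treating every grouping as a mapping applied at query time means the same data can be regrouped in any way without being stored twice.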
14
Main Memory
Achievements on large data manipulation – optimization for efficiency and speed
New input for visualization with dials that user can manipulate
Applications with electoral boundaries
15
GetGloss
The Identification of Glossaries in High Fan-out Websites
• Large sites with many links
• Glossaries hidden all over
• No coherent view within and across sites
• No way to determine who is defining what and how
16
Glossary Finding Function
• Function to compute a best-guess score
• Ranked list
• Higher is better
• Evaluation to determine how likely it is that a high score will be associated with a (large) glossary
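As an illustration of a best-guess score with a ranked list, here is a minimal sketch. The slides do not describe the actual scoring function, so the cue words, the weights, the "Term: definition" line heuristic, and the example pages are all assumptions.

```python
# Hypothetical cue words; the real GetGloss features are not given in the slides.
CUE_WORDS = ("glossary", "definitions", "terminology", "terms")

def count_entry_like_lines(text):
    """Count lines shaped like 'Term: definition' with a short term (1 to 5 words)."""
    count = 0
    for line in text.splitlines():
        term, sep, definition = line.partition(":")
        if sep and 0 < len(term.split()) <= 5 and definition.strip():
            count += 1
    return count

def glossary_score(url, title, text):
    """Best-guess score that a page is a glossary; higher is better."""
    score = 0.0
    for cue in CUE_WORDS:
        if cue in url.lower():
            score += 2.0  # assumed weight for a cue word in the URL
        if cue in title.lower():
            score += 2.0  # assumed weight for a cue word in the title
    # More entry-like lines suggest a larger glossary (assumed weight).
    score += 0.5 * count_entry_like_lines(text)
    return score

def rank_pages(pages):
    """Return (score, url) pairs, highest score first."""
    scored = [(glossary_score(p["url"], p["title"], p["text"]), p["url"]) for p in pages]
    return sorted(scored, reverse=True)

# Made-up pages on a reserved example domain.
pages = [
    {"url": "http://example.org/about.html", "title": "About us",
     "text": "Welcome to the agency home page."},
    {"url": "http://example.org/glossary.html", "title": "Glossary of Terms",
     "text": "Barrel: A unit of volume.\nBtu: British thermal unit."},
]
ranked = rank_pages(pages)
```

The evaluation step on the slide would then check, against hand-labeled pages, how often a high score actually corresponds to a (large) glossary.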
17
ParseGloss
• Once a glossary is found, how can individual definitions be analyzed?
• Once analyzed into components, how can they be loaded into the ontology?
GetGloss → ParseGloss → Ontology
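One possible reading of the two questions above is sketched below. It assumes glossary entries arrive as "Term: definition" lines, guesses a parent concept from the opening phrase of the definition, and stores the result in a plain dictionary standing in for the ontology. The heuristic, the `parse_entry` and `load_into_ontology` helpers, and the sample entries are hypothetical and do not describe the actual ParseGloss method.

```python
import re

def parse_entry(line):
    """Split 'Term: definition' into components; return None if the line is not an entry."""
    term, sep, definition = line.partition(":")
    term, definition = term.strip(), definition.strip()
    if not sep or not term or not definition:
        return None
    # Assumed heuristic: the parent concept is the phrase that opens the
    # definition, after a leading article and before the first cue word
    # or punctuation mark.
    body = re.sub(r"^(an?|the)\s+", "", definition, flags=re.IGNORECASE)
    head = re.split(r"\s+(?:that|which|of|used|for)\b|[,.;]", body, maxsplit=1)[0]
    return {"term": term, "parent": head.strip().lower(), "definition": definition}

def load_into_ontology(ontology, entry):
    """Attach the term under its guessed parent; `ontology` maps concept -> attributes."""
    # Create a placeholder node for the parent if it is not known yet.
    ontology.setdefault(entry["parent"], {"parent": None, "definition": None})
    ontology[entry["term"].lower()] = {
        "parent": entry["parent"],
        "definition": entry["definition"],
    }

# Made-up glossary lines.
ontology = {}
for line in [
    "Barrel: A unit of volume equal to 42 U.S. gallons.",
    "Anthracite: A hard coal that burns cleanly.",
]:
    entry = parse_entry(line)
    if entry is not None:
        load_into_ontology(ontology, entry)
```

A real pipeline would need a proper parser and a way to link the guessed parent to an existing ontology concept; the dictionary here only shows the shape of the hand-off from ParseGloss to the ontology.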
18
Evaluation
New Effort
• Peter Sommer, Director of Education, Center for New Media Teaching and Learning
• Focus on purposeful use of emerging technologies for researchers, students, teachers, analysts…
• Funded by NSF and BLS
19
Privacy Portal
• Increasing multiple access to databases creates a security problem
• Original DGRC proposal included component on privacy
• Newly funded NSF SGER proposal
• Columbia – Computer Science and School of Business (Stolfo and Johnson)
20
Privacy and Government Websites
• What are user fears?
• What are their preferences?
• What are their perceptions of privacy issues?
• What are the implications for design of systems and interfaces?
21
Social Science Research
• Explorations of “dial manipulation” application for health databases for dynamic querying
• Useful for interactive mapping for redistricting
• Use statistics on neighborhoods, e.g. CPS (long and wide)
• Census summary data is another source – tables compiled for various levels
• Joint with ISERP Social Science Research Center
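The "dial manipulation" idea can be read as dynamic querying: each dial sets a value range for one field, and the result set is recomputed whenever a dial moves. A minimal sketch under that assumption follows. The field names, the `dynamic_query` helper, and the records are made up and do not come from CPS, Census, or any health database.

```python
# Hypothetical neighborhood-level records; all field names and values are made up.
rows = [
    {"tract": "A", "median_age": 31, "cases_per_10k": 12.5},
    {"tract": "B", "median_age": 47, "cases_per_10k": 30.1},
    {"tract": "C", "median_age": 52, "cases_per_10k": 18.4},
]

def dynamic_query(rows, dials):
    """Keep rows whose value for every dialed field lies in that dial's inclusive (low, high) range."""
    return [
        row for row in rows
        if all(low <= row[field] <= high for field, (low, high) in dials.items())
    ]

# Turning one dial: restrict median age.
dials = {"median_age": (40, 60)}
older = dynamic_query(rows, dials)

# Turning a second dial narrows the same result further.
dials["cases_per_10k"] = (0, 20)
older_low_rate = dynamic_query(rows, dials)
```

For interactive use the filter has to rerun on every dial movement, which is where the main-memory processing work mentioned earlier would matter.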
22
Proposals
SGER proposal funded
• Topic: Urban transportation study: new methods for freight tracking in LA by comparing across databases
• Grant awarded to USC, shared by ISI and USC’s Dept of Policy and Planning
White paper to DoT
• Topic: Searching for patterns in freight traffic
• Submitted by USC campus people and Jose Luis Ambite
ITR proposal submitted
• Topic: Semi-automated topic hierarchy creation
• Partners: Eduard Hovy communicated with EPA group
• If funded, will use EPA’s CARAT ontology as starting point and evaluation standard
23
Digital Government is Here!
• An increasing quantity and variety of information is available in digital form
• Government agencies already collect much digital information
• Government is a holder and provider of often unique data and services
• Access to information/services by industry and citizen-users must be facilitated, while limiting cost and risk
24
Well – Not Quite...
• Expectations are very high due to the pervasiveness of Web/Internet information technology
• Government IT/IS is behind best practices
• Legacy, stovepipe systems designed for trusted staff
• Failed very large modernization efforts
• A disconnect exists between the research community and government IS