Date posted: 07-Dec-2014
Jana Diesner, UIUC, The iSchool
Summer Social Webshop 2012 @ University of Maryland
Words and Networks:
Considering the Content of Text Data
for Network Analysis
Jana Diesner
Assistant Professor
The iSchool, University of Illinois at Urbana-Champaign
Talk at Summer Social Webshop 2012
Words and Networks
• Problem statement/ motivation:
– “We cannot reduce communication to message transmission” (Corman et al. 2002)
– “Travelling through the network are fleets of social objects” (Danowski 1993)
• Goal with my research: Understand the interplay and co-evolution of a) knowledge/ information and b) structure/ functioning of socio-technical networks.
• Information Extraction (IE)
• Socio-Linguistics
• Probabilistic Graphical Models
• Theory and models
Fields: Natural Language Processing; Machine Learning; Social Science, Network Analysis
Computational Integration
Classic Approach: Semantic Networks
Collins and Loftus (1975). A spreading activation theory of semantic
memory. Psychological Review, 82, 407-428.
Overview: From Words to Networks
Text Data → Network Data → Applications
• Text data: unstructured, at any scale
• From text data to network data: need scalable, reliable, robust methods & tools
• Applications:
– Network analysis
– Answer substantive and graph-theoretic questions
– Visualizations
– Develop and test hypotheses and theories
– Populate databases
– Input to further computations, e.g. simulations, machine learning
Example for application context: Sudan
Problem: Develop, evaluate and apply a methodology and computational solution for extracting socio-technical network data from large-scale text corpora.
Paper: Diesner J, Tambayong L, Carley KM (accepted) Mapping socio-cultural networks of Sudan from open-source, large-scale text data. Journal of Computational and Mathematical Organization Theory.
Methods for Constructing Networks of Words
1. Mental Models (Spreading Activation) (Collins & Loftus 1975)
2. Case Grammar and Frame Semantics (Fillmore 1982, 1986)
3. Discourse Representation Theory (Kamp 1981)
4. Knowledge representation in AI, assertional semantic networks (Shapiro 1971, Woods 1975)
5. Centering Resonance Analysis (Corman et al. 2002)
6. Mind maps (Buzan 1974)
7. Concept maps (Novak & Gowin 1984)
8. Hypertext (Trigg & Weiser 1986)
9. Qualitative text coding (Grounded Theory) (Glaser & Strauss 1967)
10. Definitional semantic networks incl. text coding with ontologies (Fellbaum 1998)
11. Semantic Web (Berners-Lee et al. 2001, Van Atteveldt 2008)
12. Frames (Minsky 1974)
13. Semantic Grammars (Franzosi 1989, Roberts 1997)
14. Network Text Analysis in social science (Carley & Palmquist 1991)
15. Event Coding in pol. science (King & Lowe 2003, Schrodt et al. 2008)
16. Semantic networks in comm. science (Danowski 1993, Doerfel 1998)
17. Probabilistic graphical models (Howard 1989, Pearl 1988)
Axis labels: Automation, Abstraction, Generalization
Nodes for Networks: Named Entities and Beyond
• Who? (people, groups)
• What? (tasks, events)
• Where? (places)
• When? (time)
• Why? (beliefs, sentiments, mental models)
• How? (resources, knowledge)
Example nodes: Oil, UN, Security, Conflict, Food, Sudan
Recipe for using machine learning to build a prediction model for text data
• Get some labeled ground-truth data
• Build a classifier/model (h) that, for every sequence of words (x) and label per word (y), predicts one category per word (y = h(x)), incl. for new and unseen text data
• Exploit many clues from the text data (lexical, syntactic, statistical)
• Train and validate the model
• 87% to 89% accuracy (compare to intercoder reliability)
• Make model available in end-user product
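The recipe above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the model used in the talk: it swaps in a simple multiclass perceptron, and the feature set, the label names (PER, ORG, LOC, O) and the three training sentences are all hypothetical.

```python
from collections import defaultdict

def features(words, i):
    """Lexical and contextual clues for the word at position i (hypothetical feature set)."""
    w = words[i]
    return [
        "bias",
        "word=" + w.lower(),
        "capitalized=" + str(w[0].isupper()),
        "suffix=" + w[-3:].lower(),
        "prev=" + (words[i - 1].lower() if i > 0 else "<s>"),
        "next=" + (words[i + 1].lower() if i + 1 < len(words) else "</s>"),
    ]

def predict(weights, labels, feats):
    """Pick the label whose feature weights sum to the highest score."""
    return max(labels, key=lambda y: sum(weights.get((f, y), 0.0) for f in feats))

def train(sentences, max_epochs=300):
    """Perceptron training on (words, labels) pairs; stops once an epoch has no mistakes."""
    labels = sorted({y for _, gold in sentences for y in gold})
    weights = defaultdict(float)
    for _ in range(max_epochs):
        mistakes = 0
        for words, gold in sentences:
            for i, y in enumerate(gold):
                feats = features(words, i)
                guess = predict(weights, labels, feats)
                if guess != y:
                    mistakes += 1
                    for f in feats:
                        weights[(f, y)] += 1.0      # reward the correct label
                        weights[(f, guess)] -= 1.0  # penalize the wrong guess
        if mistakes == 0:
            break
    return weights, labels

def h(model, words):
    """y = h(x): one predicted category per word."""
    weights, labels = model
    return [predict(weights, labels, features(words, i)) for i in range(len(words))]

def accuracy(model, sentences):
    """Share of words whose predicted label matches the gold label."""
    pairs = [(p, g) for ws, gs in sentences for p, g in zip(h(model, ws), gs)]
    return sum(p == g for p, g in pairs) / len(pairs)

# Hypothetical labeled ground-truth data.
TRAIN = [
    ("Omar al-Bashir met Salva Kiir in Khartoum".split(),
     ["PER", "PER", "O", "PER", "PER", "O", "LOC"]),
    ("The United Nations sent aid to Darfur".split(),
     ["O", "ORG", "ORG", "O", "O", "O", "LOC"]),
    ("John Garang visited Juba".split(), ["PER", "PER", "O", "LOC"]),
]
model = train(TRAIN)
unseen = h(model, "Kiir met Garang".split())
```

In practice the accuracy would be measured on held-out labeled data rather than on the training sentences, which is what makes a comparison with intercoder reliability meaningful.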
Recipe for extracting network data from text data
• Use prediction model to extract entities from text data, consider them as nodes
– Applied to about 80,000 text documents
• Link the nodes according to
– Proximity
– Surface patterns
– Syntax
– Statistical information
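Of the four linking strategies listed, proximity is the simplest to illustrate. The sketch below assumes a fixed token window and undirected, weighted links; the function name `link_by_proximity`, the window size and the example sentence are hypothetical and not taken from the talk.

```python
from collections import Counter
from itertools import combinations

def link_by_proximity(tokens, entities, window=5):
    """Count pairs of distinct entities that occur fewer than `window` tokens apart."""
    positions = [(i, t) for i, t in enumerate(tokens) if t in entities]
    edges = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        if a != b and j - i < window:
            edges[tuple(sorted((a, b)))] += 1  # undirected link, weight = co-occurrence count
    return edges

# Hypothetical input: tokens of one document and the entities extracted from it.
tokens = "Bashir met Kiir in Juba while the UN monitored talks between Bashir and Kiir".split()
entities = {"Bashir", "Kiir", "Juba", "UN"}
edges = link_by_proximity(tokens, entities, window=5)
```

A smaller window yields fewer and more conservative links, while a larger one approaches linking all entities in a document, so the window size is a parameter worth reporting alongside results.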
Results
Degree Centrality 03 04 05 06 07 08 09 10
Omar al-Bashir 3 3 2 1 1 1 1 1
Ali Osman Taha 1 2 3 4 3 3 3 3
John Garang 2 1 1 3 3 4 6 8
Salva Kiir Mayardit 8 10 4 2 2 2 2 2
Hosni Mubarak 4 7 5 6 9 8 4 6
Sadiq al-Mahdi 6 5 10 9 5 7 8 4
Hassan al-Turabi 5 6 7 10 5 8 9 5
Abdul Wahid al Nur 10 9 9 8 7 4 5 7
Yoweri Museveni 7 8 7 6 11 10 7 8
Kofi Annan 9 4 6 5 8 11 11 11
Deng Alor 11 11 11 11 10 6 9 8
Betweenness Centr. 03 04 05 06 07 08 09 10
Omar al-Bashir 1 1 1 1 1 1 1 1
Salva Kiir Mayardit 6 10 2 5 2 2 2 2
Ali Osman Taha 4 3 3 7 6 7 5 4
John Garang 3 6 5 4 4 6 7 7
Sadiq al-Mahdi 2 8 10 2 7 5 6 3
Abdul Wahid al Nur 8 4 7 8 3 4 3 6
Kofi Annan 7 2 4 3 10 11 8 10
Yoweri Museveni 5 5 9 6 5 9 8 10
Deng Alor 8 10 10 9 9 3 8 5
Hosni Mubarak 8 9 8 11 8 8 4 8
Hassan al-Turabi 8 7 6 10 11 10 8 9
Eigenvector Centr. 03 04 05 06 07 08 09 10
Ali Osman Taha 1 2 3 3 3 3 3 4
Omar al-Bashir 3 3 5 2 2 2 2 3
Salva Kiir Mayardit 7 10 4 1 1 1 1 1
John Garang 2 1 1 4 4 4 7 9
Hosni Mubarak 4 5 6 5 11 5 4 7
Kofi Annan 8 4 7 6 6 11 11 1
Yoweri Museveni 9 8 8 7 9 6 5 8
Hassan al-Turabi 5 7 10 8 8 10 8 5
Sadiq al-Mahdi 6 6 9 9 7 8 10 6
Deng Alor 11 11 1 10 5 7 9 10
Abdul Wahid al Nur 10 9 11 11 10 9 6 11
Triads 03 04 05 06 07 08 09 10
Omar al-Bashir 1 1 1 1 1 1 1 1
Ali Osman Taha 2 3 3 4 4 3 2 2
John Garang 3 2 2 2 2 6 7 7
Salva Kiir Mayardit 7 10 4 3 3 2 3 3
Hosni Mubarak 7 4 5 6 6 8 4 5
Sadiq al-Mahdi 4 7 7 7 6 7 7 3
Abdul Wahid al Nur 10 9 9 7 4 5 5 7
Kofi Annan 7 5 5 5 11 11 7 7
Yoweri Museveni 6 6 8 9 9 10 6 5
Hassan al-Turabi 5 8 9 9 8 9 7 7
Deng Alor 10 10 9 9 10 4 7 7
• President North: Known performer
• President South: Now established
• Legacy of religious leaders
• Presence of neighboring presidents
Timeline markers: 2003, 2004, 2005, 2007, 2010
Timeline events: Darfur conflict; continuous civil war (since 1993); Comprehensive Peace Agreement; Garang 1st VP, followed by Kiir; autonomous South Sudan; SPLA withdraws from government; vote in South Sudan about separation
Measure labels: Activity (degree centrality), Control (betweenness centrality), Close to power (eigenvector centrality)
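Per-year rank tables like the ones on this slide could be produced along the following lines. This is a sketch under assumptions: it covers degree centrality only, uses competition ranking so that tied agents share a rank (ties also occur in the tables), and the node names A to D and the edge lists are hypothetical.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree: number of distinct neighbors divided by (n - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {v: len(nb) / (n - 1) for v, nb in neighbors.items()}

def rank(scores):
    """Competition ranking: 1 = most central, tied nodes share the best rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {v: ordered.index(s) + 1 for v, s in scores.items()}

# Hypothetical yearly edge lists; in the talk's setting these would come
# from the networks extracted from each year's articles.
edges_by_year = {
    2003: [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")],
    2004: [("B", "A"), ("B", "C"), ("B", "D"), ("C", "D")],
}
ranks_by_year = {year: rank(degree_centrality(e)) for year, e in edges_by_year.items()}
```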
Prominent Organizations
• Strong presence of armed forces
• Strong influence of external groups
• Not shown from top 10 Sudanese groups:
– Janjaweed, Nuer, Oil and gas corporation, prisons and jails
• Two ethnic groups/ tribes among top ten Sudanese groups
Degree Centrality 03 04 05 06 07 08 09 10
United Nations 4 2 1 1 1 1 1 5
Rebel Groups 1 1 2 3 4 3 2 3
Military 2 3 3 2 2 2 4 2
SPLA # 6 5 4 3 4 3 1
Security Council 5 5 4 5 5 5 5 6
Sudan government 3 4 6 6 8 8 9 7
Nat. Congress Party 6 9 9 8 6 7 10 4
African Union 8 7 8 7 7 9 7 10
Inter. Criminal Court # 11 7 11 9 6 6 9
Dinka 9 10 11 9 10 10 8 8
Churches 7 8 10 10 11 11 11 11
Betweenness Centr. 03 04 05 06 07 08 09 10
Military 1 1 3 3 1 1 2 1
United Nations 3 6 2 2 3 2 1 3
SPLA # 3 1 1 2 3 5 2
Rebel Groups 4 2 4 4 7 5 3 4
Sudan government 2 4 5 8 4 7 6 10
Nat. Congress Party 6 9 8 5 5 4 8 7
Churches 5 7 9 10 6 6 9 9
Dinka 8 5 6 6 8 11 11 6
African Union 7 8 7 11 10 10 10 5
Inter. Criminal Court # 11 10 9 9 8 4 11
Security Council 9 10 11 7 11 9 7 8
Eigenvector Centr. 03 04 05 06 07 08 09 10
United Nations 4 2 1 2 1 2 1 5
Military 2 3 3 1 2 1 5 2
Rebel Groups 1 1 4 3 4 3 6 3
Security Council 5 5 2 4 5 4 2 8
SPLA # 6 5 5 3 5 7 1
Sudan government 3 4 7 6 8 7 8 6
African Union 8 7 8 7 6 9 4 10
Inter. Criminal Court # 10 6 9 9 6 3 7
Nat. Congress Party 6 9 10 8 7 8 9 4
Churches 7 8 9 10 10 10 10 11
Dinka 9 11 11 11 11 11 11 9
Triads 03 04 05 06 07 08 09 10
Military 1 1 1 1 2 1 6 1
United Nations 4 3 2 2 1 4 1 2
Rebel Groups 2 2 4 4 4 2 4 5
SPLA # 5 3 3 3 3 2 4
Sudan government 3 4 5 7 5 7 4 6
Nat. Congress Party 5 9 10 8 6 6 9 3
African Union 8 6 6 6 7 10 7 9
Security Council 7 7 7 5 8 9 8 8
Inter. Criminal Court # 11 8 9 10 5 3 7
Churches 6 8 9 10 9 8 10 11
Dinka 9 10 11 11 11 11 11 10
What themes connect tribes?
Degree Centrality (Activity)
Rank | 2003 | 2004 | 2005 | 2006
1 | population | conflict | population | conflict
2 | conflict | kinship | conflict | population
3 | cultural | population | cultural | kinship
4 | peace_making | pol_boundary | kinship | cultural
5 | biomes_land_cover | biomes_land_cover | pol_boundary | pol_boundary
Rank | 2007 | 2008 | 2009 | 2010
1 | population | pol_boundary | pol_boundary | kinship
2 | conflict | population | conflict | peace_making
3 | kinship | measures_num. | peace_making | conflict
4 | cultural | conflict | cultural | pol_boundary
5 | peace_making | cultural | kinship | cultural
Betweenness Centrality (Bridging)
Rank | 2003 | 2004 | 2005 | 2006
1 | industry | economy | water_mgmt. | climate_change
2 | measures_num. | hunger | discourse | subsistence
3 | emotion | labor | disaster | disaster
4 | rumors | ideology_political | environment | ideology_religion
5 | disaster | preposition | aid | water_mgmt.
Rank | 2007 | 2008 | 2009 | 2010
1 | ideology_religion | finance | education | emotion
2 | welfare | preposition | literature | law
3 | security_forces | ideology_political | war | internal_conflict
4 | political | prejudice_discrim. | ideology_pol. | kinship
5 | water_mgmt. | economy | health | age
Year | Number of tribes | Tribes linked to conflict or war | Intertribal links for pairs linked to conflict or war
2003 | 32 | 38% | 32%
2004 | 44 | 45% | 66%
2005 | 33 | 39% | 40%
2006 | 46 | 50% | 83%
2007 | 47 | 62% | 78%
2008 | 50 | 60% | 65%
2009 | 28 | 68% | 95%
2010 | 27 | 56% | 100%
Figure panels: 2003, 2004, 2005, 2006, 2007, 2008
• High and increasing rate of tribes associated with conflict or war
• Many of the links between tribes are between tribes associated with conflict and war
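The two percentage columns can be illustrated with a small computation. The definitions used here are an assumption about how the columns were derived: the first is read as the share of tribe nodes with a direct link to a "conflict" or "war" node, the second as the share of tribe-to-tribe links whose endpoints are both such tribes. Tribe names T1 to T4 and the edge list are hypothetical.

```python
def conflict_shares(edges, tribes, conflict_nodes=frozenset({"conflict", "war"})):
    """Return (share of tribes linked to conflict or war,
    share of intertribal links whose two tribes are both linked to conflict or war)."""
    linked = set()
    for a, b in edges:
        if a in tribes and b in conflict_nodes:
            linked.add(a)
        if b in tribes and a in conflict_nodes:
            linked.add(b)
    intertribal = [(a, b) for a, b in edges if a in tribes and b in tribes]
    both = [(a, b) for a, b in intertribal if a in linked and b in linked]
    tribe_share = len(linked) / len(tribes) if tribes else 0.0
    link_share = len(both) / len(intertribal) if intertribal else 0.0
    return tribe_share, link_share

# Hypothetical one-year network with four tribes.
tribes = {"T1", "T2", "T3", "T4"}
edges = [
    ("T1", "conflict"), ("war", "T2"),           # tribe-to-theme links
    ("T1", "T2"), ("T2", "T3"), ("T3", "T4"),    # intertribal links
    ("T3", "oil"),
]
tribe_share, link_share = conflict_shares(edges, tribes)
```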
What resources are associated with war and
conflict?
• Conflict: Agriculture, Livestock (farmers vs. herders)
• War: Land Resource (concept of dar)
• Conflict and War: Oil, Civic, Transportation
From Words to Networks:
Dimensions of Accuracy
“Hmm, Information Extraction looks like a nice idea. How accurate are your results?”
“I fine-tuned our method and technology based on F-values and feedback from SMEs.”
“The F values tell me all I need to know.”
“But the F only shows the increase in accuracy over a baseline or benchmark. Maybe we need to ask a different question…”
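For reference, the F value in this exchange is conventionally the harmonic mean of precision and recall, computed against labeled ground truth. A minimal sketch follows, with hypothetical gold and predicted entity sets; it says nothing about how a change in F carries over to the resulting network, which is the question raised here.

```python
def precision_recall_f1(predicted, gold):
    """Precision, recall and F1 for a set of predicted items against a gold set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

# Hypothetical (entity, category) pairs.
gold = {("Bashir", "PER"), ("Kiir", "PER"), ("UN", "ORG"), ("Juba", "LOC")}
predicted = {("Bashir", "PER"), ("Kiir", "PER"), ("UN", "LOC"), ("Darfur", "LOC")}
precision, recall, f1 = precision_recall_f1(predicted, gold)
```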
Research Question
– Problem: Impact of relation extraction methods and subroutines on network data and analysis results is unknown
– Question: How do network data and analysis results differ depending on specific relation extraction methods?
– Who cares?
  – Increased comparability, generalizability, transparency of methods and tools
  – Increased control and power for developers and users
  – Supports drawing of reasonable and valid conclusions
• Paper: Diesner J, Carley KM (2012) Impact of Relation Extraction Methods from Text Data on Network Data and Analysis Results. ACM Web Science Conference, Words and Networks Workshop (WON 2012), Evanston, IL
Methods
Data
Attribute | Sudan Corpus | Funding Corpus | Enron Corpus
Genre | Newswire | Scientific Writing | Emails
Size | 80,000 articles | 56,000 proposals | 53,000 emails
Source | LexisNexis | Cordis | FERC/ SEC
Time span | 8 years | 22 years | 4 years
Text-based networks | Article bodies | Project description | Email bodies
Meta-data network | Index terms (knowledge) | Index terms (knowledge) and collaborators (social) | Email headers (social)
• All: large scale, over time, open source data from different domains
Results: Performance of node prediction
models in application domains
• Method: systematic evaluation of auto-generated thesauri on all 3 datasets
• No meaningful differences in accuracy across domains, time, writing styles
– Technology generalizes AND generalizes better than manually built thesauri
– Creation and refinement more efficient (time) and effective (finding nodes) than manually built thesauri
• Subtype “specific” more unique/different instances, but “generic” far more total instances
– Rethink focus of network analysis:
• More references to roles and collectives than to individuals
• Importance of extracting unnamed entities
• “Specific” instances lower accuracy than “generic” ones due to sparseness
Results: How do relation extraction methods
compare?
• Ground truth data (SME) hardly reproduced by networks extracted from text bodies, not at all by meta-data networks
• SME in TextM: 53% nodes, 20% links
• SME in TextA: 11% nodes, 5% links
• Agreement in structure and key entities mainly function of:
• Size of extracted graph
• External material/ sources used
• Post-processing/ cleaning
– Agreement can be coincidental if no proper word sense disambiguation performed
• Type of network
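Figures such as "53% nodes, 20% links" can be read as the share of the SME network's nodes and links that also appear in an extracted network. The sketch below makes that assumption explicit; it treats links as undirected and presumes node names have already been normalized, and the function name `coverage` and both edge lists are hypothetical.

```python
def coverage(reference_edges, extracted_edges):
    """Share of reference nodes and of reference links that also occur in the extracted network."""
    def nodes(edges):
        return {v for edge in edges for v in edge}

    def links(edges):
        return {frozenset(edge) for edge in edges}  # undirected: (A, B) equals (B, A)

    ref_nodes, ref_links = nodes(reference_edges), links(reference_edges)
    node_share = len(ref_nodes & nodes(extracted_edges)) / len(ref_nodes)
    link_share = len(ref_links & links(extracted_edges)) / len(ref_links)
    return node_share, link_share

# Hypothetical ground-truth (SME) network and automatically extracted network.
sme_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
extracted_edges = [("B", "A"), ("A", "C"), ("C", "X")]
node_share, link_share = coverage(sme_edges, extracted_edges)
```

Because link agreement requires both endpoints to match, link shares will generally be lower than node shares, which is consistent with the pattern in the figures above.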
Results: How do relation extraction methods compare?
3. Agreement between text-based networks, and with meta-data networks, depends on type of network
Social networks
• Text-based networks:
– Substantial overlap TextM and TextA, esp. key players (identity, rank)
– Localized view on geo-political entities and culture
• Meta-data network:
– Small overlap in key entities with text-based networks
– Key players: major international agents, hardly localized views
Knowledge networks
• Text-based networks:
– Minimal overlap between manual and automated
– Gist of information in terms of common sense, highly salient entities
• Meta-data network:
– Seem more informative (crafted mini-summaries)
– Fewer coreference resolution issues
– Minimal overlap with text-based
For more complete view, combine automated text-based with meta-data network
Cover common/highly salient terms and entities and domain-specific ones
Utilization
Diagram labels: Text data; Interaction data; Behavioral data; Database; Analysis tools; Data integration and management; Data management and analysis
• Enhance social network data with content nodes in a non-arbitrary fashion
• Combine social networks and semantic networks
• Cluster social networks and compare content per group
• Reveal alliances, factions, redundancies
Research Question
• Question: What thematic profiles are used by individuals or groups who assume theoretically grounded roles that make them prone to actuate or inhibit changes and innovation in socio-technical networks?
– Change agents
– Preservation agents
• Paper: Diesner J, Carley KM (2010) A methodology for integrating network theory and topic modeling and its application to innovation diffusion. IEEE International Conference on Social Computing (SocComp), Workshop on Finding Synergies Between Texts and Networks, Minneapolis, MN, August 2010.
Theory for relationship between
language and networks
• Socio-linguistic theory (Milroy & Milroy 1985):
– Structural position/role of agents in networks impacts their
motivation and ability to introduce or adopt changes in system.
– Network features are a more powerful explanation of language change than alternative extra-linguistic factors (status, class, socio-demographics).
• Structural roles:
– Innovators: marginal to adopting group, globally peripheral,
mobile, under-conforming to deviant, many weak ties.
– Early adopters: central & strongly tied members of adoption group.
– Late adopters: members of dense, multiplex, close-knit networks
benefit from organizational capabilities (support, resistance to
external pressures) and are constrained by them.
Data
• 55,000 proposals funded through “Framework Programmes for Research
and Technology” (FP), FP 1 to 7 (1984 to present), from CORDIS, ©
European Communities, http://cordis.europa.eu/
• Increase transparency over state-level decision making processes
• Metadata (explicit social network):
– Principal investigator (name, affiliation)
– Research partners (name, affiliation)
– Amount awarded for number of years
– Research category
• Text:
– Project description
Methodology: Network Analysis
• Operationalize roles
• No canonical set of metrics and values for roles, solutions:
– Literature review
– Empirical data: not fully automated, requires data-driven and case-wise decisions (incl. basic SNA expertise)
Methodology: Text Analysis
• Analysis of substance of language data via Topic Modeling:
– Reduces dimensionality of text data to gist of a body of information (Griffiths, Steyvers & Tenenbaum, 2007)
– Output: user-defined number of word clusters (topics)
– Topic: set of text terms, where each term has a probabilistic weight that indicates the strength of association of the term with the topic
– Tool: Mallet (McCallum)
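To make the notion of a topic as weighted terms concrete, here is a toy Python sketch of latent Dirichlet allocation fitted by collapsed Gibbs sampling. It is a stand-in for illustration and not Mallet: the hyperparameters `alpha` and `beta`, the iteration count and the four tiny documents are all assumptions.

```python
import random

def lda_gibbs(docs, num_topics, iterations=50, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA; returns one {term: weight} dict per topic."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]                        # count of topic k in doc d
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]   # count of word w in topic k
    topic_total = [0] * num_topics                                      # words assigned to topic k
    assignments = []
    for d, doc in enumerate(docs):                                      # random initial assignment
        zs = [rng.randrange(num_topics) for _ in doc]
        for w, k in zip(doc, zs):
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]                                   # remove current assignment
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]  # resample topic
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return [
        {w: (topic_word[k][w] + beta) / (topic_total[k] + V * beta) for w in vocab}
        for k in range(num_topics)
    ]

def top_terms(topic, n=3):
    """The n terms with the highest probabilistic weight in a topic."""
    return sorted(topic, key=topic.get, reverse=True)[:n]

# Hypothetical mini-corpus; the number of topics is user-defined.
docs = [
    "energy fuel solar hydrogen energy".split(),
    "fuel combustion energy biomass".split(),
    "gene disease cancer gene".split(),
    "cancer cell disease gene".split(),
]
topics = lda_gibbs(docs, num_topics=2)
```

Each returned topic is a probability distribution over the vocabulary, so its weights sum to 1, and `top_terms` gives the kind of ranked term list a human would then label by hand.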
Image from: Wikipedia, Latent Dirichlet allocation
Figure labels: Probabilistic Generative Process; Bayesian Inference
Methodology: Computational Integration of Texts and Networks
Diagram labels: Social Network Analysis; Change Agents; Preservation Agents; Topic Modeling; some latent structure, probabilistic graphical model; process
Results for FP 6 (2002-2006)
Change agents: topics in order of DP, each with its nine top-ranked terms (1st to 9th)
1. project management (DP 0.731): project, development, systems, system, based, high, develop, technologies, control
2. networking and learning (DP 0.276): research, european, europe, network, innovation, knowledge, training, projects, support
3. project management (DP 0.165): data, management, assessment, tools, project, information, fisheries, support, studies
4. regional development (DP 0.080): regional, policy, regions, policies, development, sustainable, region, national, sustainability
5. waste management (DP 0.070): water, waste, european, europe, land, market, eu, smes, aquaculture
6. engineering (DP 0.055): structures, aircraft, material, materials, performance, composite, damping, forming, monitoring
7. alternative energies (DP 0.053): energy, gas, hydrogen, combustion, biomass, solar, fuel, low, process
8. emission reduction (DP 0.050): water, monitoring, eu, chemical, pollutants, directive, system, pollution, groundwater
9. emission reduction (DP 0.046): engine, diesel, combustion, fuel, sensor, emission, integrated, power, emissions
10. public health (DP 0.044): food, europe, human, virus, studies, million, developing, health, forest
11. regional development (DP 0.038): services, ict, business, satellite, rural, information, robot, communication, systems
12. medical (DP 0.036): tnf, disease, gene, arthritis, human, mouse, genes, diseases, mice
Preservation agents: topics in order of DP, each with its nine top-ranked terms (1st to 9th)
1. project management (DP 0.921): project, european, development, develop, research, systems, based, integrated, knowledge
2. research in EU (DP 0.414): research, european, activities, countries, information, eu, projects, europe, action
3. industry (DP 0.160): production, products, industry, design, manufacturing, product, industrial, processes, materials
4. networking and learning (DP 0.102): research, network, european, excellence, integration, training, europe, knowledge, researchers
5. environmental issues (DP 0.080): water, management, risk, environmental, data, monitoring, information, assessment, practices
6. genetics (DP 0.077): genetic, gene, genes, disease, genomic, factors, molecular, genomics, studies
7. energy (DP 0.076): energy, environmental, eu, policy, assessment, agricultural, european, sustainable, impact
8. transportation (DP 0.071): services, transport, solutions, business, information, cities, end, service, data
9. cancer (DP 0.062): drug, clinical, cancer, cell, cells, hiv, tumour, therapeutic, molecular
10. security (DP 0.061): governance, security, social, science, eu, issues, public, ethical, europe
11. industry (DP 0.056): materials, properties, devices, temperature, techniques, high, industrial, based, structures
12. public health (DP 0.055): food, consumer, quality, products, production, animal, safety, health, project
Results FP6
Change agents
• 2nd: “networking”, “training” (inherent to innovators?)
• Terms/ topics addressed only by them: “innovation”, “waste”, “regional”
• Environment, sustainability, alternative energies, emission reduction: both, but more prevalent among change agents
Preservation agents
• Both: dominating topic project management, PAs load higher on it
• 2nd highest ranking topic for preservation agents: generic terms relating to research in the European Union
• Topics addressed only by hubs: industry in the context of manufacturing, nuclear energy, cancer research
Results: FP4 – FP6
• Trends over time:
– Change agents strongly associated with research related to the environment and climate; preservation agents addressed this topic with lower weight.
– Preservation agents: focus on transportation and related industries.
– Topics occasionally overlap in subject matter but then differ in prevalence.
Periods: Fourth FP (FP4) 1994–1998; Fifth FP (FP5) 1998–2002; Sixth FP (FP6) 2002–2006
FP4 change agent | FP4 preserv. agent | FP5 change agent | FP5 preserv. agent | FP6 change agent | FP6 preserv. agent
project mngmt. 0.767 | project mngmt. 0.708 | project mngmt. 0.660 | project mngmt. 0.765 | project mngmt. 0.731 | project mngmt. 0.921
industry 0.420 | industry 0.326 | industry 0.319 | project mngmt. 0.315 | networking & learning 0.276 | project mngmt. 0.414
networking 0.171 | environment 0.093 | project mngmt. 0.214 | transportation 0.234 | project mngmt. 0.165 | industry 0.160
climate 0.075 | transportation 0.090 | transportation 0.147 | project mngmt. 0.230 | regional development 0.080 | networking & learning 0.102
environment & tech 0.065 | environment 0.059 | computing 0.137 | material science 0.090 | waste mngmt. 0.070 | environment 0.080
material science 0.065 | aviation 0.055 | environment 0.092 | public health 0.087 | engineering 0.055 | genetics 0.077
satellite data 0.062 | aviation 0.048 | genetics 0.080 | genetics 0.074 | energy 0.053 | energy 0.076
environment & tech 0.057 | e-commerce 0.045 | public health 0.075 | energy 0.065 | pollution 0.050 | transportation 0.071
energy 0.054 | public health 0.040 | aviation 0.057 | genetics 0.064 | emission 0.046 | cancer 0.062
environment & tech 0.049 | environment 0.036 | material science 0.054 | services & tech 0.063 | public health 0.044 | security 0.061
environment & tech 0.049 | data mngmt. 0.030 | genetics 0.051 | aviation 0.062 | regional development 0.038 | industry 0.056
energy 0.043 | environment 0.030 | energy 0.050 | ? 0.060 | medical 0.036 | public health 0.055
aviation 0.039 | material science 0.028 | environment 0.050 | environment 0.057 | automobiles 0.035 | energy 0.043
environment & food 0.034 | environment 0.025 | public health 0.045 | environment 0.055 | transportation 0.029 | emissions 0.040
energy 0.027 | genetics 0.017 | climate 0.044 | emission 0.048 | environmental 0.027 | ecology & climate 0.039
pollution 0.026 | medical 0.009 | hightech 0.043 | public health 0.045 | medical 0.025 | nuclear energy 0.039
genetics 0.015 | environment 0.003 | climate 0.040 | climate 0.040 | energy 0.025 | aviation 0.031
- | - | services & tech 0.036 | hightech 0.033 | genetics 0.024 | public health 0.024
- | - | environment 0.035 | genetics 0.030 | - | -
- | - | science 0.031 | environment 0.026 | - | -
Limitations and What’s Next
• Limitations:
– Incomplete data, no rejected proposals.
– Validation of unsupervised learning results (Chang et al.
2009).
• Next steps:
– Very coarse level of aggregation: use more fine-grained
levels/ clusters (fields, socio-demographic attributes, …)
– Test robustness of role operationalization.
– Take award money and other meta data into account as
additional constraint.
– Investigate competition.
Technology-Mediated Social Participation
1) Clarify national priorities
• Apply methods to analyze large collections of text data in application contexts/ domains to reveal patterns and explain underlying mechanisms
2) Develop deep science questions
• Motivation, trust, empathy, responsibility, identity
3) Promote novel research methodologies
• Consider substance of text data for network analysis
• Combine two types of behavioral data (quantitative, qualitative) in scalable, robust, systematic fashion
4) Identify extreme technology challenges
• Human side of security (protect not only technical infrastructures, but also data and reputation)
• Scalability: make data sets analyzable that were traditionally assessed via manual or computer-supported methods
5) Influence national policy
6) Increase educational opportunities
Acknowledgements
• This work was supported by the National Science Foundation (NSF) IGERT 9972762, the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA) DAAD19-01-2-0009, the Air Force Office of Scientific Research (AFOSR) MURI FA9550-05-1-0388, the Office of Naval Research (ONR) MURI N00014-08-11186, and a Siebel Scholarship. Additional support was provided by CASOS, the Center for Computational Analysis of Social and Organizational Systems at Carnegie Mellon University. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NSF, ARI, ARL, AFOSR, ONR, or the United States Government.
Thank you!
• For questions, comments, feedback, follow-up:
Jana Diesner
Phone: (217) 244-3576
• (Copies of) Publications at http://people.lis.illinois.edu/~jdiesner/publications.html