Jana Diesner, "Words and Networks: Considering the Content of Text Data for Network Analysis"

Date post: 07-Dec-2014
Page 1: Jana Diesner, "Words and Networks: Considering the Content of Text Data for Network Analysis"

Jana Diesner, UIUC, The iSchool

Summer Social Webshop 2012 @ University of Maryland

Words and Networks: Considering the Content of Text Data for Network Analysis

Jana Diesner

Assistant Professor

The iSchool, University of Illinois at Urbana-Champaign

Talk at Summer Social Webshop 2012

Words and Networks

• Problem statement/ motivation:
“We cannot reduce communication to message transmission” (Corman et al. 2002)
“Travelling through the network are fleets of social objects” (Danowski 1993)

• Goal with my research: Understand the interplay and co-evolution of a) knowledge/ information and b) structure/ functioning of socio-technical networks.

• Information Extraction (IE)

• Socio-Linguistics

• Probabilistic Graphical Models

• Theory and models

[Diagram labels: Natural Language Processing; Machine Learning; Social Science, Network Analysis; Computational Integration]


Classic Approach: Semantic Networks

Collins and Loftus (1975). A spreading activation theory of semantic memory. Psychological Review, 82, 407-428.

Overview: From Words to Networks

Text Data Network Data Applications

• Need: scalable, reliable, robust methods & tools

• Unstructured

• At any scale

• Network Analysis

• Answer substantive and graph-theoretic questions

• Visualizations

• Develop and test hypotheses and theories

• Populate databases

• Input to further computations, e.g. simulations, machine learning


Example for application context: Sudan

Problem: Develop, evaluate and apply a methodology and computational solution for extracting socio-technical network data from large-scale text corpora.

Paper: Diesner J, Tambayong L, Carley KM (accepted) Mapping socio-cultural networks of Sudan from open-source, large-scale text data. Journal of Computational and Mathematical Organization Theory.

Methods for Constructing Networks of Words

1. Mental Models (Spreading Activation) (Collins & Loftus 1975)
2. Case Grammar and Frame Semantics (Fillmore 1982, 1986)
3. Discourse Representation Theory (Kamp 1981)
4. Knowledge representation in AI, assertional semantic networks (Shapiro 1971, Woods 1975)
5. Centering Resonance Analysis (Corman et al. 2002)
6. Mind maps (Buzan 1974)
7. Concept maps (Novak & Gowin 1984)
8. Hypertext (Trigg & Weiser 1986)
9. Qualitative text coding (Grounded Theory) (Glaser & Strauss 1967)
10. Definitional semantic networks incl. text coding with ontologies (Fellbaum 1998)
11. Semantic Web (Berners-Lee et al. 2001, Van Atteveldt 2008)
12. Frames (Minsky 1974)
13. Semantic Grammars (Franzosi 1989, Roberts 1997)
14. Network Text Analysis in social science (Carley & Palmquist 1991)
15. Event Coding in pol. science (King & Lowe 2003, Schrodt et al. 2008)
16. Semantic networks in comm. science (Danowski 1993, Doerfel 1998)
17. Probabilistic graphical models (Howard 1989, Pearl 1988)

[Diagram labels: Automation; Abstraction; Generalization]


Nodes for Networks: Named Entities and Beyond

• Who? (people, groups)
• Why? (beliefs, sentiments, mental models)
• What? (tasks, events)
• When? (time)
• How? (resources, knowledge)
• Where? (places)

Example labels from the diagram: Oil, UN, Security, Conflict, Food, Sudan

Recipe for using machine learning to build a prediction model for text data

• Get some labeled ground-truth data
• Build a classifier/model (h) that, for every sequence of words (x) and label per word (y), predicts one category per word (y = h(x)), incl. for new and unseen text data
• Exploit many clues from the text data (lexical, syntactic, statistical)
• Train and validate the model
• 87% to 89% accuracy (compare to intercoder reliability)
• Make model available in end-user product
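The recipe above can be illustrated with a deliberately small baseline. This is a sketch under stated assumptions, not the model used in the talk: it uses only word identity as a clue (the talk names lexical, syntactic and statistical clues), and the label set (PERSON, ORG, PLACE, RESOURCE, O) and the training sentences are hypothetical.

```python
from collections import Counter, defaultdict

def train(tagged_sentences):
    """Learn h: for each (lowercased) word, remember its most frequent label."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for word, label in sentence:
            counts[word.lower()][label] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def predict(model, words, default="O"):
    """Apply h to a word sequence x; unseen words get the default label."""
    return [model.get(w.lower(), default) for w in words]

def accuracy(model, tagged_sentences):
    """Share of words whose predicted label equals the ground-truth label."""
    correct = total = 0
    for sentence in tagged_sentences:
        words = [w for w, _ in sentence]
        gold = [y for _, y in sentence]
        for p, g in zip(predict(model, words), gold):
            correct += p == g
            total += 1
    return correct / total if total else 0.0

# Hypothetical labeled data, for illustration only.
train_data = [
    [("Omar", "PERSON"), ("al-Bashir", "PERSON"), ("met", "O"), ("the", "O"), ("UN", "ORG")],
    [("Oil", "RESOURCE"), ("in", "O"), ("Sudan", "PLACE")],
]
held_out = [[("The", "O"), ("UN", "ORG"), ("visited", "O"), ("Sudan", "PLACE")]]

model = train(train_data)
print(predict(model, ["UN", "in", "Sudan"]))
print(accuracy(model, held_out))
```

Validation on held-out sentences, as in the last two lines, is what makes an accuracy figure comparable to intercoder reliability.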


Recipe for extracting network data from text data

• Use prediction model to extract entities from text data, consider them as nodes
– Applied to about 80,000 text documents
• Link the nodes according to
– Proximity
– Surface patterns
– Syntax
– Statistical information
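Of the four linking criteria, proximity is the simplest to sketch. The function below is a minimal illustration, assuming whitespace tokens and a token window; the function name, the window size and the example sentence are hypothetical and not taken from the talk.

```python
from collections import Counter
from itertools import combinations

def link_by_proximity(tokens, entities, window=5):
    """Count undirected links between distinct entities fewer than `window` tokens apart."""
    positions = [(i, t) for i, t in enumerate(tokens) if t in entities]
    edges = Counter()
    for (i, a), (j, b) in combinations(positions, 2):
        # positions are in text order, so j > i always holds here
        if a != b and j - i < window:
            edges[tuple(sorted((a, b)))] += 1
    return edges

tokens = "Bashir met Kiir in Juba while the UN discussed oil".split()
entities = {"Bashir", "Kiir", "Juba", "UN"}  # nodes found by the prediction model
print(link_by_proximity(tokens, entities, window=4))
```

A larger window yields denser graphs, which matters later in the talk: the size of the extracted graph is one factor behind agreement between extraction methods.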

Results


Degree Centrality 03 04 05 06 07 08 09 10

Omar al-Bashir 3 3 2 1 1 1 1 1

Ali Osman Taha 1 2 3 4 3 3 3 3

John Garang 2 1 1 3 3 4 6 8

Salva Kiir Mayardit 8 10 4 2 2 2 2 2

Hosni Mubarak 4 7 5 6 9 8 4 6

Sadiq al-Mahdi 6 5 10 9 5 7 8 4

Hassan al-Turabi 5 6 7 10 5 8 9 5

Abdul Wahid al Nur 10 9 9 8 7 4 5 7

Yoweri Museveni 7 8 7 6 11 10 7 8

Kofi Annan 9 4 6 5 8 11 11 11

Deng Alor 11 11 11 11 10 6 9 8

Betweenness Centr. 03 04 05 06 07 08 09 10

Omar al-Bashir 1 1 1 1 1 1 1 1

Salva Kiir Mayardit 6 10 2 5 2 2 2 2

Ali Osman Taha 4 3 3 7 6 7 5 4

John Garang 3 6 5 4 4 6 7 7

Sadiq al-Mahdi 2 8 10 2 7 5 6 3

Abdul Wahid al Nur 8 4 7 8 3 4 3 6

Kofi Annan 7 2 4 3 10 11 8 10

Yoweri Museveni 5 5 9 6 5 9 8 10

Deng Alor 8 10 10 9 9 3 8 5

Hosni Mubarak 8 9 8 11 8 8 4 8

Hassan al-Turabi 8 7 6 10 11 10 8 9

Eigenvector Centr. 03 04 05 06 07 08 09 10

Ali Osman Taha 1 2 3 3 3 3 3 4

Omar al-Bashir 3 3 5 2 2 2 2 3

Salva Kiir Mayardit 7 10 4 1 1 1 1 1

John Garang 2 1 1 4 4 4 7 9

Hosni Mubarak 4 5 6 5 11 5 4 7

Kofi Annan 8 4 7 6 6 11 11 1

Yoweri Museveni 9 8 8 7 9 6 5 8

Hassan al-Turabi 5 7 10 8 8 10 8 5

Sadiq al-Mahdi 6 6 9 9 7 8 10 6

Deng Alor 11 11 1 10 5 7 9 10

Abdul Wahid al Nur 10 9 11 11 10 9 6 11

Triads 03 04 05 06 07 08 09 10

Omar al-Bashir 1 1 1 1 1 1 1 1

Ali Osman Taha 2 3 3 4 4 3 2 2

John Garang 3 2 2 2 2 6 7 7

Salva Kiir Mayardit 7 10 4 3 3 2 3 3

Hosni Mubarak 7 4 5 6 6 8 4 5

Sadiq al-Mahdi 4 7 7 7 6 7 7 3

Abdul Wahid al Nur 10 9 9 7 4 5 5 7

Kofi Annan 7 5 5 5 11 11 7 7

Yoweri Museveni 6 6 8 9 9 10 6 5

Hassan al-Turabi 5 8 9 9 8 9 7 7

Deng Alor 10 10 9 9 10 4 7 7

• President North: Known performer
• President South: Now established
• Legacy of religious leaders
• Presence of neighboring presidents

Timeline (2003, 2004, 2005, 2007, 2010): Darfur conflict; continuous civil war (since 1993); Comprehensive Peace Agreement; Garang 1st VP, followed by Kiir; autonomous South Sudan; SPLA withdraws from government; vote in South Sudan about separation

Labels: Activity; Control; Close to power
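Rank tables like the ones above can be produced from yearly edge lists. The sketch below covers degree centrality only and assumes undirected, unweighted links. Ties share a rank, which appears consistent with the repeated ranks in the tables, but the exact tie handling used in the talk is not stated. Node names A to D are placeholders.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree: number of distinct neighbors divided by (n - 1)."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {v: (len(nb) / (n - 1) if n > 1 else 0.0) for v, nb in neighbors.items()}

def rank(scores):
    """Rank 1 = highest score; tied nodes share the same rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {v: ordered.index(s) + 1 for v, s in scores.items()}

# Hypothetical edge lists per year, for illustration only.
edges_by_year = {
    "03": [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D")],
    "04": [("B", "A"), ("B", "C"), ("B", "D")],
}
for year, edges in edges_by_year.items():
    print(year, rank(degree_centrality(edges)))
```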


Prominent Organizations

• Strong presence of armed forces
• Strong influence of external groups
• Not shown from top 10 Sudanese groups:
– Janjaweed, Nuer, Oil and gas corporation, prisons and jails
• Two ethnic groups/ tribes among top ten Sudanese groups

Degree Centrality 03 04 05 06 07 08 09 10

United Nations 4 2 1 1 1 1 1 5

Rebel Groups 1 1 2 3 4 3 2 3

Military 2 3 3 2 2 2 4 2

SPLA # 6 5 4 3 4 3 1

Security Council 5 5 4 5 5 5 5 6

Sudan government 3 4 6 6 8 8 9 7

Nat. Congress Party 6 9 9 8 6 7 10 4

African Union 8 7 8 7 7 9 7 10

Inter. Criminal Court # 11 7 11 9 6 6 9

Dinka 9 10 11 9 10 10 8 8

Churches 7 8 10 10 11 11 11 11

Betweenness Centr. 03 04 05 06 07 08 09 10

Military 1 1 3 3 1 1 2 1

United Nations 3 6 2 2 3 2 1 3

SPLA # 3 1 1 2 3 5 2

Rebel Groups 4 2 4 4 7 5 3 4

Sudan government 2 4 5 8 4 7 6 10

Nat. Congress Party 6 9 8 5 5 4 8 7

Churches 5 7 9 10 6 6 9 9

Dinka 8 5 6 6 8 11 11 6

African Union 7 8 7 11 10 10 10 5

Inter. Criminal Court # 11 10 9 9 8 4 11

Security Council 9 10 11 7 11 9 7 8

Eigenvector Centr. 03 04 05 06 07 08 09 10

United Nations 4 2 1 2 1 2 1 5

Military 2 3 3 1 2 1 5 2

Rebel Groups 1 1 4 3 4 3 6 3

Security Council 5 5 2 4 5 4 2 8

SPLA # 6 5 5 3 5 7 1

Sudan government 3 4 7 6 8 7 8 6

African Union 8 7 8 7 6 9 4 10

Inter. Criminal Court # 10 6 9 9 6 3 7

Nat. Congress Party 6 9 10 8 7 8 9 4

Churches 7 8 9 10 10 10 10 11

Dinka 9 11 11 11 11 11 11 9

Triads 03 04 05 06 07 08 09 10

Military 1 1 1 1 2 1 6 1

United Nations 4 3 2 2 1 4 1 2

Rebel Groups 2 2 4 4 4 2 4 5

SPLA # 5 3 3 3 3 2 4

Sudan government 3 4 5 7 5 7 4 6

Nat. Congress Party 5 9 10 8 6 6 9 3

African Union 8 6 6 6 7 10 7 9

Security Council 7 7 7 5 8 9 8 8

Inter. Criminal Court # 11 8 9 10 5 3 7

Churches 6 8 9 10 9 8 10 11

Dinka 9 10 11 11 11 11 11 10

What themes connect tribes?

Degree Centrality (Activity)

2003               2004               2005          2006
population         conflict           population    conflict
conflict           kinship            conflict      population
cultural           population         cultural      kinship
peace_making       pol_boundary       kinship       cultural
biomes_land_cover  biomes_land_cover  pol_boundary  pol_boundary

2007          2008           2009          2010
population    pol_boundary   pol_boundary  kinship
conflict      population     conflict      peace_making
kinship       measures_num.  peace_making  conflict
cultural      conflict       cultural      pol_boundary
peace_making  cultural       kinship       cultural

Betweenness Centrality (Bridging)

2003           2004                2005         2006
industry       economy             water_mgmt.  climate_change
measures_num.  hunger              discourse    subsistence
emotion        labor               disaster     disaster
rumors         ideology_political  environment  ideology_religion
disaster       preposition         aid          water_mgmt.

2007               2008                2009           2010
ideology_religion  finance             education      emotion
welfare            preposition         literature     law
security_forces    ideology_political  war            internal_conflict
political          prejudice_discrim.  ideology_pol.  kinship
water_mgmt.        economy             health         age


Year  Number of tribes  Tribes linked to conflict or war  Intertribal links for pairs linked to conflict or war
2003  32                38%                               32%
2004  44                45%                               66%
2005  33                39%                               40%
2006  46                50%                               83%
2007  47                62%                               78%
2008  50                60%                               65%
2009  28                68%                               95%
2010  27                56%                               100%

[Figure: network panels labeled 2003, 2004, 2005, 2006, 2007, 2008]

• High and increasing rate of tribes associated with conflict or war
• Many of the links between tribes are between tribes associated with conflict and war

What resources are associated with war and conflict?

• Conflict: Agriculture, Livestock (farmers vs. herders)

• War: Land Resource (concept of dar)

• Conflict and War: Oil, Civic, Transportation


From Words to Networks: Dimensions of Accuracy

“Hmm, Information Extraction looks like a nice idea. How accurate are your results?”

“I fine-tuned our method and technology based on F-values and feedback from SMEs.”

“The F values tell me all I need to know.”

“But the F only shows the increase in accuracy over a baseline or benchmark. Maybe we need to ask a different question…”
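For readers unfamiliar with the F-values in this dialogue, here is a minimal sketch, assuming they refer to the usual F1 score (harmonic mean of precision and recall) computed against a reference set. The entity sets are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare a set of extracted items against a reference (gold) set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Hypothetical extraction result and reference set, for illustration only.
extracted = {"UN", "SPLA", "Dinka", "Oil"}
reference = {"UN", "SPLA", "Janjaweed"}
print(precision_recall_f1(extracted, reference))
```

A score like this summarizes node-level extraction quality. It does not by itself say how the resulting network structure or the key players change, which is the question the talk turns to next.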

Research Question

– Problem: Impact of Relation Extraction methods and subroutines on network data and analysis results unknown
– Question: How do network data and analysis results differ depending on specific relation extraction methods?
– Who cares?
  – Increased comparability, generalizability, transparency of methods and tools
  – Increased control and power for developers and users
  – Supports drawing of reasonable and valid conclusions

• Paper: Diesner J, Carley KM (2012) Impact of Relation Extraction Methods from Text Data on Network Data and Analysis Results. ACM Web Science Conference, Words and Networks Workshop (WON 2012), Evanston, IL


Methods

Data

                     Sudan Corpus             Funding Corpus                                      Enron Corpus
Genre                Newswire                 Scientific Writing                                  Emails
Size                 80,000 articles          56,000 proposals                                    53,000 emails
Source               LexisNexis               Cordis                                              FERC/ SEC
Time span            8 years                  22 years                                            4 years
Text-based networks  Article bodies           Project description                                 Email bodies
Meta-data network    Index terms (knowledge)  Index terms (knowledge) and collaborators (social)  Email headers (social)

• All: large scale, over time, open source data from different domains


Results: Performance of node prediction models in application domains

• Method: systematic evaluation of auto-generated thesauri on all 3 datasets
• No meaningful differences in accuracy across domains, time, writing styles
– Technology generalizes AND generalizes better than manually built thesauri
– Creation and refinement more efficient (time) and effective (finding nodes) than manually built thesauri
• Subtype “specific” has more unique/different instances, but “generic” has far more total instances
– Rethink focus of network analysis:
• More references to roles and collectives than to individuals
• Importance of extracting unnamed entities
• “Specific” instances have lower accuracy than “generic” ones due to sparseness

Results: How do relation extraction methods compare?

• Ground truth data (SME) hardly resembled by networks extracted from text bodies, not at all by meta-data networks
• SME in TextM: 53% nodes, 20% links
• SME in TextA: 11% nodes, 5% links
• Agreement in structure and key entities mainly function of:
• Size of extracted graph
• External material/ sources used
• Post-processing/ cleaning
– Agreement can be coincidental if no proper word sense disambiguation performed
• Type of network
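Percentages such as "53% nodes, 20% links" can be read as the share of ground-truth items that also appear in an extracted network. The sketch below computes that share, assuming undirected links and exact name matching; how matching was done in the study is not stated here, and the networks shown are hypothetical.

```python
def share_recovered(reference, extracted):
    """Fraction of reference items that also appear in the extracted set."""
    reference = set(reference)
    if not reference:
        return 0.0
    return len(reference & set(extracted)) / len(reference)

def undirected(edges):
    """Treat (a, b) and (b, a) as the same link."""
    return {frozenset(edge) for edge in edges}

# Hypothetical ground-truth (SME) and text-based networks, for illustration only.
sme_nodes = {"A", "B", "C", "D"}
sme_links = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]
text_nodes = {"A", "B", "E"}
text_links = [("B", "A"), ("A", "E")]

print(share_recovered(sme_nodes, text_nodes))
print(share_recovered(undirected(sme_links), undirected(text_links)))
```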


Results: How do relation extraction methods compare?

3. Agreement among text-based networks, and between text-based and meta-data networks, depends on type of network

Social networks
  Text-based networks:
  - Substantial overlap TextM and TextA, esp. key players (identity, rank)
  - Localized view on geo-political entities and culture
  Meta-data network:
  - Small overlap in key entities with text-based networks
  - Key players: major international agents, hardly localized views

Knowledge networks
  Text-based networks:
  - Minimal overlap between manual and automated
  - Gist of information in terms of common sense, highly salient entities
  Meta-data network:
  - Seem more informative (crafted mini-summaries)
  - Less coreference resolution issues
  - Minimal overlap with text-based

For more complete view, combine automated text-based with meta-data network

Cover common/highly salient terms and entities and domain-specific ones

[Diagram labels: Text data; Interaction data; Behavioral Data; Utilization; Database; Analysis tools; Data integration and management; Data management and analysis]

• Enhance social network data with content nodes in a non-arbitrary fashion
• Combine social networks and semantic networks
• Cluster social networks and compare content per group
• Reveal alliances, factions, redundancies


Research Question

• Question: What thematic profiles are used by individuals or groups who assume theoretically grounded roles that make them prone to actuate or inhibit changes and innovation in socio-technical networks?

[Diagram labels: Change agents; Preservation agents]

Paper: Diesner J, Carley KM (2010) A methodology for integrating network theory and topic modeling and its application to innovation diffusion. IEEE International Conference on Social Computing (SocComp), Workshop on Finding Synergies Between Texts and Networks, Minneapolis, MN, August 2010.

Theory for relationship between language and networks

• Socio-linguistic theory (Milroy & Milroy 1985):
– Structural position/role of agents in networks impacts their motivation and ability to introduce or adopt changes in system.
– Network features more powerful explanation of language change than alternative extra-linguistic factors (status, class, socio-demographics).
• Structural roles:
– Innovators: marginal to adopting group, globally peripheral, mobile, under-conforming to deviant, many weak ties.
– Early adopters: central & strongly tied members of adoption group.
– Late adopters: members of dense, multiplex, close-knit networks benefit from organizational capabilities (support, resistance to external pressures) and are constrained by them.


Data

• 55,000 proposals funded through “Framework Programmes for Research and Technology” (FP), FP 1 to 7 (1984 to present), from CORDIS, © European Communities, http://cordis.europa.eu/
• Increase transparency over state-level decision making processes

Metadata:
• Principal investigator (name, affiliation) [explicit social network]
• Research partners (name, affiliation) [explicit social network]
• Amount awarded for number of years
• Research category

Text:
• Project description

Methodology: Network Analysis

• Operationalize roles
• No canonical set of metrics and values for roles, solutions:
– Literature review
– Empirical data: not fully automated, requires data-driven and case-wise decisions (incl. basic SNA expertise)


Methodology: Text Analysis

• Analysis of substance of language data via Topic Modeling:
– Reduces dimensionality of text data to gist of a body of information (Griffiths, Steyvers & Tenenbaum, 2007)
– Output: user-defined number of word clusters (topics)
– Topic: text terms, where each term has a probabilistic weight that indicates the strength of association of the term with the topic.
– Tool: Mallet (McCallum)
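To make the notion of a topic concrete, the sketch below represents each topic as a mapping from terms to probabilistic weights and lists its highest-weighted terms. It does not run a topic model (the talk used Mallet for that); the topic labels and weights are hypothetical and chosen only for illustration.

```python
def top_terms(topic, k=3):
    """Return the k terms with the highest weight (ties broken alphabetically)."""
    ranked = sorted(topic.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

# Hypothetical weights for illustration only; not values from the talk.
topics = {
    "energy": {"energy": 0.30, "fuel": 0.25, "solar": 0.20, "gas": 0.15, "biomass": 0.10},
    "medical": {"gene": 0.40, "disease": 0.35, "mouse": 0.25},
}
for label, topic in topics.items():
    # Within one topic the term weights form a probability distribution.
    assert abs(sum(topic.values()) - 1.0) < 1e-9
    print(label, top_terms(topic))
```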

[Figure. Image from: Wikipedia, Latent Dirichlet allocation. Labels: Probabilistic Generative Process; Bayesian Inference]

Methodology: Computational Integration of Texts and Networks

[Diagram labels: Social Network Analysis; Change Agents; Preservation Agents; Topic Modeling; some latent structure, probabilistic graphical model; process]


Results for FP 6 (2002-2006)

Change agents (topic label, DP, 1st to 9th term):
1. project management (DP 0.731): project, development, systems, system, based, high, develop, technologies, control
2. networking and learning (DP 0.276): research, european, europe, network, innovation, knowledge, training, projects, support
3. project management (DP 0.165): data, management, assessment, tools, project, information, fisheries, support, studies
4. regional development (DP 0.080): regional, policy, regions, policies, development, sustainable, region, national, sustainability
5. waste management (DP 0.070): water, waste, european, europe, land, market, eu, smes, aquaculture
6. engineering (DP 0.055): structures, aircraft, material, materials, performance, composite, damping, forming, monitoring
7. alternative energies (DP 0.053): energy, gas, hydrogen, combustion, biomass, solar, fuel, low, process
8. emission reduction (DP 0.050): water, monitoring, eu, chemical, pollutants, directive, system, pollution, groundwater
9. emission reduction (DP 0.046): engine, diesel, combustion, fuel, sensor, emission, integrated, power, emissions
10. public health (DP 0.044): food, europe, human, virus, studies, million, developing, health, forest
11. regional development (DP 0.038): services, ict, business, satellite, rural, information, robot, communication, systems
12. medical (DP 0.036): tnf, disease, gene, arthritis, human, mouse, genes, diseases, mice

Preservation agents (topic label, DP, 1st to 9th term):
1. project management (DP 0.921): project, european, development, develop, research, systems, based, integrated, knowledge
2. research in EU (DP 0.414): research, european, activities, countries, information, eu, projects, europe, action
3. industry (DP 0.160): production, products, industry, design, manufacturing, product, industrial, processes, materials
4. networking and learning (DP 0.102): research, network, european, excellence, integration, training, europe, knowledge, researchers
5. environmental issues (DP 0.080): water, management, risk, environmental, data, monitoring, information, assessment, practices
6. genetics (DP 0.077): genetic, gene, genes, disease, genomic, factors, molecular, genomics, studies
7. energy (DP 0.076): energy, environmental, eu, policy, assessment, agricultural, european, sustainable, impact
8. transportation (DP 0.071): services, transport, solutions, business, information, cities, end, service, data
9. cancer (DP 0.062): drug, clinical, cancer, cell, cells, hiv, tumour, therapeutic, molecular
10. security (DP 0.061): governance, security, social, science, eu, issues, public, ethical, europe
11. industry (DP 0.056): materials, properties, devices, temperature, techniques, high, industrial, based, structures
12. public health (DP 0.055): food, consumer, quality, products, production, animal, safety, health, project

Results FP6

• Both: dominating topic project management, PA’s load higher on it

Change agents
• 2nd: “networking”, “training” (inherent to innovators?)
• Term/ topics addressed only by them: “innovation”, “waste”, “regional”
• Environment, sustainability, alternative energies, emission reduction: both, but more prevalent among change agents

Preservation agents
• 2nd highest ranking topic for preservation agents: generic terms relating to research in the European Union
• Topics addressed only by hubs: industry in the context of manufacturing, nuclear energy, cancer research


Results: FP4 – FP6

• Trends over time:
• Change agents strongly associated with research related to the environment and climate; preservation agents addressed this topic with lower weight.
• Preservation agents: focus on transportation and related industries.
• Topics occasionally overlap in subject matter but then differ in prevalence.

Topics and weights per programme and role, in rank order:

Fourth FP 1994–1998, change agents: project mngmt. 0.767; industry 0.420; networking 0.171; climate 0.075; environment & tech 0.065; material science 0.065; satellite data 0.062; environment & tech 0.057; energy 0.054; environment & tech 0.049; environment & tech 0.049; energy 0.043; aviation 0.039; environment & food 0.034; energy 0.027; pollution 0.026; genetics 0.015

Fourth FP 1994–1998, preservation agents: project mngmt. 0.708; industry 0.326; environment 0.093; transportation 0.090; environment 0.059; aviation 0.055; aviation 0.048; e-commerce 0.045; public health 0.040; environment 0.036; data mngmt. 0.030; environment 0.030; material science 0.028; environment 0.025; genetics 0.017; medical 0.009; environment 0.003

Fifth FP 1998–2002, change agents: project mngmt. 0.660; industry 0.319; project mngmt. 0.214; transportation 0.147; computing 0.137; environment 0.092; genetics 0.080; public health 0.075; aviation 0.057; material science 0.054; genetics 0.051; energy 0.050; environment 0.050; public health 0.045; climate 0.044; hightech 0.043; climate 0.040; services & tech 0.036; environment 0.035; science 0.031

Fifth FP 1998–2002, preservation agents: project mngmt. 0.765; project mngmt. 0.315; transportation 0.234; project mngmt. 0.230; material science 0.090; public health 0.087; genetics 0.074; energy 0.065; genetics 0.064; services & tech 0.063; aviation 0.062; ? 0.060; environment 0.057; environment 0.055; emission 0.048; public health 0.045; climate 0.040; hightech 0.033; genetics 0.030; environment 0.026

Sixth FP 2002–2006, change agents: project mngmt. 0.731; networking & learning 0.276; project mngmt. 0.165; regional development 0.080; waste mngmt. 0.070; engineering 0.055; energy 0.053; pollution 0.050; emission 0.046; public health 0.044; regional development 0.038; medical 0.036; automobiles 0.035; transportation 0.029; environmental 0.027; medical 0.025; energy 0.025; genetics 0.024

Sixth FP 2002–2006, preservation agents: project mngmt. 0.921; project mngmt. 0.414; industry 0.160; networking & learning 0.102; environment 0.080; genetics 0.077; energy 0.076; transportation 0.071; cancer 0.062; security 0.061; industry 0.056; public health 0.055; energy 0.043; emissions 0.040; ecology & climate 0.039; nuclear energy 0.039; aviation 0.031; public health 0.024

Limitations and What’s Next

• Limitations:
– Incomplete data, no rejected proposals.
– Validation of unsupervised learning results (Chang et al. 2009).
• Next steps:
– Very coarse level of aggregation: use more fine-grained levels/ clusters (fields, socio-demographic attributes, …)
– Test robustness of role operationalization.
– Take award money and other meta data into account as additional constraint.
– Investigate competition.


Technology-Mediated Social Participation

1) Clarify national priorities

• Apply methods to analyze large collections of text data in application contexts/ domains to reveal patterns and explain underlying mechanisms

2) Develop deep science questions

motivation, trust, empathy, responsibility, identity

3) Promote novel research methodologies

• Consider substance of text data for network analysis

• Combine two types of behavioral data (quantitative, qualitative) in scalable, robust, systematic fashion

4) Identify extreme technology challenges

• Human side of security (protect not only technical infrastructures, but also data and reputation)

• Scalability: make data sets analyzable that were traditionally assessed via manual or computer-supported methods

5) Influence national policy

6) Increase educational opportunities

Acknowledgements

• This work was supported by the National Science Foundation (NSF) IGERT 9972762, the Army Research Institute (ARI) W91WAW07C0063, the Army Research Laboratory (ARL/CTA) DAAD19-01-2-0009, the Air Force Office of Scientific Research (AFOSR) MURI FA9550-05-1-0388, the Office of Naval Research (ONR) MURI N00014-08-11186, and a Siebel Scholarship. Additional support was provided by CASOS, the Center for Computational Analysis of Social and Organizational Systems at Carnegie Mellon University. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the NSF, ARI, ARL, AFOSR, ONR, or the United States Government.


Thank you!

• For questions, comments, feedback, follow-up:

Jana Diesner

[email protected]

Phone: (217) 244-3576

• (Copies of) Publications at http://people.lis.illinois.edu/~jdiesner/publications.html

