Applying social network analysis to Parliamentary Proceedings
Automatic discovery of meaningful cliques
Author: Justin van Wees
Supervisors: Dr. Maarten Marx, Dr. Johan van Doornik
June 23, 2011
Why? Motivation and research question
Research question
Can we discover communities of politicians that debate on a specific policy area?
Motivation
• It is unknown which member is responsible for a certain policy area
• Discover what issues are discussed within a policy area
• Serve as an example application of social network analysis techniques
How? Background and methodology
<root>
  <docinfo>...</docinfo>
  <meta>...</meta>
  <proceedings>
    <topic>
      <scene type="speaker" speaker="Hamer" party="PvdA" function="Mevrouw" role="mp" title="Mevrouw Hamer (PvdA)" MPid="02221">
        <speech party="PvdA" speaker="Hamer" function="Mevrouw" role="mp" MPid="02221">
          <p>Dat is helemaal niet waar. U bewijst nu voor de derde keer dat u niet ...</p>
          <!-- English: "That is not true at all. You are now proving for the third time that you do not ..." -->
        </speech>
        <speech type="interruption" party="Verdonk" speaker="Verdonk" function="Mevrouw" role="mp" MPid="02995">
          <p>Mag ik even uitpraten? Dank u. Zo werkt dat, gewoon fatsoen. Dank u wel. [...]</p>
          <!-- English: "May I finish speaking? Thank you. That is how it works, simple decency. Thank you very much. [...]" -->
        </speech>
      </scene>
    </topic>
  </proceedings>
</root>
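The slides do not spell out how a debate becomes a graph. A minimal sketch, assuming that each interruption adds one to the weight of a directed edge from the interrupting member to the member holding the floor, can read the XML with Python's standard `xml.etree.ElementTree`. The helper `interruption_graph`, the trimmed `SAMPLE` document and the choice of `MPid` as node identifier are all hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def interruption_graph(xml_text):
    """Hypothetical helper: count interruptions per (interrupter, speaker) pair.

    Assumes each <scene> carries the MPid of the member holding the floor and
    each interruption is a <speech type="interruption"> nested in that scene.
    """
    root = ET.fromstring(xml_text)
    weights = Counter()
    for scene in root.iter("scene"):
        speaker = scene.get("MPid")
        for speech in scene.iter("speech"):
            if speech.get("type") != "interruption":
                continue
            interrupter = speech.get("MPid")
            if interrupter and interrupter != speaker:
                # directed, weighted edge: interrupter -> member being interrupted
                weights[(interrupter, speaker)] += 1
    return weights


SAMPLE = """<root><proceedings><topic>
  <scene type="speaker" speaker="Hamer" MPid="02221">
    <speech speaker="Hamer" MPid="02221"><p>Dat is helemaal niet waar.</p></speech>
    <speech type="interruption" speaker="Verdonk" MPid="02995"><p>Mag ik even uitpraten?</p></speech>
  </scene>
</topic></proceedings></root>"""

print(interruption_graph(SAMPLE))  # Counter({('02995', '02221'): 1})
```

Summing these counters over all debates in a period would give one weighted directed graph for that period.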
A simple graph
A directed graph
A weighted directed graph
A single debate represented in a graph
Debates during Cabinet Kok II
A community
A group of nodes that are relatively densely connected to each other but sparsely connected to other dense groups in the network
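A k-clique (k = 4)
K-clique communities (k = 4)
A k-clique is a complete subgraph on k nodes; a k-clique community is the union of all k-cliques that can be reached from one another through a chain of adjacent k-cliques, i.e. k-cliques that share k - 1 nodes.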
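Clique percolation can be sketched in a few lines of standard-library Python. This is an illustrative version, not the implementation used in the thesis: it enumerates every k-clique by backtracking, then merges k-cliques that share k - 1 nodes with a union-find structure. Ignoring edge direction and weight is an assumption about how the directed debate graph is treated; for real networks a library routine such as NetworkX's `k_clique_communities(G, k)` is the more practical choice.

```python
from itertools import combinations


def k_cliques(adj, k):
    """Enumerate every k-clique (complete subgraph on k nodes) exactly once."""
    found = []

    def extend(clique, candidates):
        if len(clique) == k:
            found.append(frozenset(clique))
            return
        for i, v in enumerate(candidates):
            # keep only later candidates that are also adjacent to v
            extend(clique + [v], [u for u in candidates[i + 1:] if u in adj[v]])

    extend([], sorted(adj))
    return found


def k_clique_communities(edges, k):
    """Clique percolation: merge k-cliques that share k - 1 nodes."""
    adj = {}
    for a, b in edges:  # assumption: direction and weight are ignored
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    cliques = k_cliques(adj, k)

    parent = list(range(len(cliques)))  # union-find over clique indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:  # adjacent k-cliques
            parent[find(i)] = find(j)

    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return [frozenset(c) for c in communities.values()]


# Two 4-cliques sharing three nodes, plus a third 4-clique touching them in one node.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), (2, 5), (3, 5), (4, 5),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
print(sorted(sorted(c) for c in k_clique_communities(edges, 4)))
# [[1, 2, 3, 4, 5], [5, 6, 7, 8]]
```

Node 5 ends up in both communities: k-clique communities may overlap, so one member can belong to several policy-area communities.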
Finding issues that a community is discussing
• Retrieve all ‘community text’
• Tokenize at word level
• Lemmatize
• Use parsimonious language models to find the most ‘descriptive’ terms
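The term-selection step can be sketched with the usual expectation-maximization estimate of a parsimonious language model: terms that the background model (all debate text) already explains well lose probability mass, leaving the terms that are specific to the community. The mixing weight `lam`, the iteration count and the pruning `threshold` are illustrative values, not ones reported in this work; the regex tokenizer is a stand-in and lemmatization is omitted.

```python
import re
from collections import Counter


def tokenize(text):
    # crude word-level tokenizer; lemmatization is deliberately left out
    return re.findall(r"\w+", text.lower())


def parsimonious_terms(doc_tokens, corpus_tokens, lam=0.1, iterations=50, threshold=1e-4):
    """EM estimate of a parsimonious model P(t|D) against the background P(t|C)."""
    tf = Counter(doc_tokens)
    cf = Counter(corpus_tokens)
    corpus_len = sum(cf.values())
    p_c = {t: cf[t] / corpus_len for t in tf}
    doc_len = sum(tf.values())
    p_d = {t: n / doc_len for t, n in tf.items()}  # maximum-likelihood start
    for _ in range(iterations):
        # E-step: expected count of t generated by the community model, not the background
        e = {t: tf[t] * lam * p / (lam * p + (1 - lam) * p_c[t]) for t, p in p_d.items()}
        # M-step: normalize, drop terms below the threshold, renormalize the rest
        total = sum(e.values())
        p_d = {t: v / total for t, v in e.items() if v / total >= threshold}
        total = sum(p_d.values())
        p_d = {t: p / total for t, p in p_d.items()}
    return sorted(p_d.items(), key=lambda kv: kv[1], reverse=True)


community = tokenize("de de de onderwijs onderwijs school")
corpus = community + ["de"] * 50 + ["het"] * 50  # background includes the community text
print(parsimonious_terms(community, corpus)[0][0])  # onderwijs
```

The frequent function word "de" is pruned even though it is the most frequent term in the community text, because the background model already accounts for it.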
What? Results and conclusion
General network statistics of Kok II

           No distinction between   With distinction between
           MP/MG roles              MP/MG roles
Nodes      211                      218
Edges      3594                     3615
Density    0.081                    0.076
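The density row can be checked by hand. The reported values match the definition for a directed graph, density = |E| / (|V| (|V| - 1)), i.e. the share of ordered pairs of members that are linked; the undirected formula would give roughly twice as much (about 0.16 and 0.15). The function name below is illustrative.

```python
def directed_density(nodes, edges):
    """Share of possible ordered node pairs that are connected: |E| / (|V| (|V| - 1))."""
    return edges / (nodes * (nodes - 1))


for label, n, m in [("No distinction", 211, 3594), ("With distinction", 218, 3615)]:
    print(f"{label}: {directed_density(n, m):.3f}")
# No distinction: 0.081
# With distinction: 0.076
```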
Finding k-clique communities
• By default, the groups found are not ‘cohesive’
• Filter out ‘noise’ by setting a threshold on edge weights
• At a threshold of 15 interruptions: 197 nodes, 741 edges, 31 k-clique communities
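A sketch of the thresholding step, assuming edge weights are interruption counts per ordered pair of members and that an edge survives when its weight is at least the threshold (whether the cut-off is inclusive is not stated). Members left without any edge drop out of the network. `apply_threshold` and the toy weights are hypothetical.

```python
def apply_threshold(weights, threshold):
    """Keep only pairs with at least `threshold` interruptions; return remaining nodes and edges."""
    kept = {pair: w for pair, w in weights.items() if w >= threshold}
    nodes = {member for pair in kept for member in pair}
    return nodes, kept


toy_weights = {("A", "B"): 20, ("B", "A"): 3, ("C", "A"): 15, ("C", "D"): 2}
nodes, edges = apply_threshold(toy_weights, 15)
print(sorted(nodes), len(edges))  # ['A', 'B', 'C'] 2
```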
Finding k-clique communities
• All k-clique communities could be traced back to a single policy area, except those covering more ‘general’ policy areas
• 92% of community members were directly related to the policy area covered by their community
• 85% of the top 20 ‘issue terms’ were relevant to the policy area
• K-clique community detection and parsimonious language models are successful methods for the automatic discovery of communities within debate networks
Discussion... and future research
• A method for choosing the edge-weight threshold
• The k-cliques were reviewed by a single person
• Four years of data were used; is a shorter time window possible?
• The study focused on Cabinet Kok II; what about other (earlier) cabinets?
• Completely different data?
Questions?
For detailed results, datasets and programs see: http://justinvanwees.nl/goto/bachelorscriptie