
Disrupting Democracy?
Voting Advice Applications and political participation

in the Digital Age

by

William Clifton van der Linden

A thesis submitted in conformity with the requirements
for the degree of Doctor of Philosophy

Graduate Department of Political Science
University of Toronto

© Copyright 2019 by William Clifton van der Linden


Abstract

Disrupting Democracy?
Voting Advice Applications and political participation

in the Digital Age

William Clifton van der Linden
Doctor of Philosophy

Graduate Department of Political Science
University of Toronto

2019

So-called Voting Advice Applications (VAAs) are a class of interactive, online applications which purport to measure how closely an individual user’s political views align with the policy platforms of candidates running in a given election. They have become a popular fixture in election campaigns all over the world, drawing millions of users in certain cases.

This dissertation comprises a series of papers which examine the implications of VAAs for the practice and study of political participation. It considers both the information that users take from VAAs and the information that VAAs take from users.

The first paper examines the validity of the measures that VAAs use to calculate the proximity between users and candidates. It demonstrates how certain assumptions in the design of most VAAs contribute to measurement error, and proposes both a corrective measure and a general framework for testing the validity of VAA calculations.

The second paper mounts an empirical challenge to the prevailing wisdom that non-probability samples cannot yield representative inferences about a population of interest. It offers evidence that vote intention data from certain VAAs, which are inherently non-probability samples, produce more accurate election forecasts than certain rigorous probability samples do.



The third paper uses a combination of social media and VAA data to develop an experimental method for inferring the ideological positions of Twitter users on the basis of the lexicon they employ on the platform. It serves as a foundation for a scalable, dynamic approach to ideological inference from social media data.

The dissertation concludes with reflections on measurement validity and on VAAs as a case study in the potential implications of digital technology for the practice of democratic governance.



For Bram, Miles, and Charlie



Acknowledgements

This dissertation does not do justice to the abundant generosity shown to me over the past decade and change. Were one to take stock of all the goodwill to which I have been privy throughout my time as a graduate student, by all accounts I ought to have produced a masterpiece. That my dissertation falls so far short of any such resemblance leaves many debts to be honoured. I can only hope that my future contributions—whatever form they may take—will help balance the ledger.

I would not have been able to complete this dissertation without the guidance and steadfast support of my supervisor, Christopher Cochrane. Chris was a mentor to me long before he stepped into the formal role of supervisor. He taught me some hard lessons early on in my PhD studies, doing so as skillfully and as thoughtfully as one possibly could. Chris is a true scholar and a dedicated teacher. He is also one of the most honest, decent, and principled people that I have ever had the privilege of knowing.

I am also indebted to Ronald Deibert and Ludovic Rheault for serving as my committee members. Ron was my initial draw to the University of Toronto, and his work both as an academic and as a practitioner has continued to inspire me throughout my time here. Ludovic is a bright star in the emergent subfield of computational social science. I suspect that he has never fully appreciated how much I value his estimation of my work.

I am grateful as well to Michael Donnelly for acting as my internal examiner, and to Anthony Sayers of the University of Calgary for providing a meaningful and substantive external appraisal of my dissertation (and for accommodating the timing of my final oral examination despite it requiring him to join us by videoconference in the wee hours of the morning from his temporary office at the Australian National University).

While these faculty members in particular saw me through the culmination of my doctoral studies, my gratitude also extends in equal measure to others in the Department of Political Science at the University of Toronto who have made an indelible mark on my scholarship and, indeed, my life. Emanuel Adler was my supervisor for the better part of the last decade, before I shelved a partially-written manuscript on International Relations in favour of the collection of papers that comprise my dissertation in its present state. I owe much of my doctoral training to Emanuel, whose perspectives have to this day anchored the epistemological foundations of my work. Matthew Hoffmann has long shared and continues to inspire my thinking on the intersection between positivist methodology and constructivist ontology. More than that, however, he has been a constant source of support and encouragement, contributing generously and selflessly to my personal and professional development at every opportunity. Randall Hansen took me under his wing early on in my studies, and showed confidence in me when mine was in short supply. Peter Loewen shielded me from vitriol on more than one occasion, at times making himself a target in the process. He also taught me a great deal about political behaviour (both theory and practice), resilience in the face of adversity, and experimentation (including, oftentimes, experimental humour). John Kirton gave me an incredible opportunity to learn about global governance and to practice my leadership skills as chair of the G8 Research Group at the Munk School of Global Affairs. My subsequent endeavours, including those which form the basis for this dissertation, were in no small part informed by my experience with this initiative, and I thank my former G8RG colleagues, Madeline Koch, and especially Jenilee Guebert for their collaboration and enduring friendship.

I owe the department staff a debt of gratitude as well. Carolynn Branton in particular has patiently guided me through the university bureaucracy, as did Joan Kallis before her. Like Carolynn and Joan, Mary-Alice Bailey, Sari Sherman, Elizabeth Jagdeo, and Louis Tentsos have shown me unwavering support and touching gestures of kindness.

Neither this dissertation nor the public interest initiatives that have come of it would have been realized were it not for the generous funding provided by the Social Sciences and Humanities Research Council of Canada and the Adel S. Sedra Distinguished Graduate Award. This funding has not only made possible the production of this dissertation, but also laid the groundwork necessary to create highly skilled employment opportunities for more than 50 people over the last decade, nearly all of whom have held advanced university degrees. I feel obliged to point this out only because the economic returns of funding for the social sciences are often not made explicit.

My peers in the PhD program invigorated both my intellectual pursuits and my social activities, both of which were greatly improved (albeit relative to an admittedly low baseline) as a result. On this account, I am particularly indebted to Yannick Dufresne, Gregory Eady, Jennifer Hove, Jamie Levin, Anna Shamaeva, Kimberly Carter, Patricia Greve, Rebecca Sanders, Dubi Kanengisser, Alanna Krolikowski, Joelle Dumouchel, Isabelle Coté, Lior Sheffer, David Houle, Wendy Hicks, Marie-Ève Reny, Debra Thompson, Wayne Chu, Vincent Pouliot, Ethel Tungohan, Kara Santokie, Joshua Gordon, Jordan Guthrie, James McKee, Christopher LaRoche, Charmaine Stanley, Aarie Glas, Stephen Hoffman, Gabriel Eidelman, Gustavo Carvalho, and Arjun Tremblay. I am certain that names which ought to be on this list have been missed. I will attribute these omissions to time lapsed and feeble memory, and hope for forgiveness.

Others outside the Department of Political Science have also contributed to this dissertation, both directly and indirectly. I was a non-resident junior fellow at Massey College during Jon Fraser’s reign. The College was the site where I advanced much of the research that now finds its way into this dissertation, but more so it was the locus of my intellectual and (again, limited) social life. I am privileged to have been part of such a unique community, and both the College and its fellowship hold a special place in my heart. I was also part of the inaugural cohort of the Creative Destruction Lab at the Rotman School of Management, which had a transformative effect on the scale and impact of my work. I am particularly grateful to Ajay Agrawal for his mentorship and enormous contributions to my professional development, as well as to Christian Catalini, Dawn Bloomfield, Jesse Rodgers, Nigel Stokes, and Dennis Bennie for their guidance and support. A special thank you goes out to the late Geoff Taber, whose example I will no doubt spend the rest of my life trying to live up to.

My debts extend well beyond the University of Toronto. My work has benefitted greatly from the support and contributions of scholars such as Richard Johnston, François Gélineau, André Blais, Elisabeth Gidengil, François Petry, Taylor Owen, Jack Vowles, Jennifer Lees-Marshment, Danny Osborne, Nicholas Reece, Aaron Martin, and Andrea Carson, among others. It has also been made possible in part thanks to the efforts of journalists, producers, and media executives, including but not limited to Jack Nagler, Marie-Paul Rouleau, Catherine Cano, Kristin Wozniak, Spencer Walsh, Matthew Liddy, Gillian Bradford, Sophie Lyon, Paul Smith, and Thorsten Berger.

Henk Overbeek hosted me as a visiting student in The Netherlands at the Vrije Universiteit Amsterdam, as did Rainer Bauböck at the European University Institute in Italy. At both institutions I had the pleasure of working alongside collegial faculty and graduate students who supported me in the advancement of my research, including Bastiaan van Apeldoorn, Gary Marks, Liesbet Hooghe, Wolfgang Wagner, Matthias Stepan, Anouk van Leeuwen, Matthew Wall, and Julie Birkholz. I would be remiss not to note the contributions that André Krouwel has made to my thinking on Voting Advice Applications. Although disagreements in our respective views on how such instruments should operate have since resulted in a degree of acrimony between us, his part in this dissertation should not go unacknowledged.

None of the work that has been undertaken herein would have been possible without the incredible team of people at Vox Pop Labs. This dissertation itself is in part a collaboration with my esteemed Vox Pop Labs colleagues Yannick Dufresne, Mickael Temporão, and Corentin Vande Kerckhove. I am most grateful to them as well as to colleagues past and present, including Gregory Eady, Jennifer Hove, James Aufricht, Joshua Apostolopoulos, Uyen Hoang, Gregory Kerr, Jonathan Salis, Kristin Cheverie, Bruno Opsenica, Charles Breton, Justin Savoie, Cara Poblador, MiHee Park, Hugo Mailhot, Kevin Johnson, Noam Gagliardi-Rabinovich, Emily Koller, Alex Shestapoloff, Leon Lukashevsky, Howard Cohen, and Andrew Peek.

The PhD process can be somewhat opaque to anyone who has not engaged themselves in such pursuits—and even, quite frankly, to those of us who have. If my family and friends ever lost hope that I would someday earn my doctorate—which would certainly not have been unreasonable given the duration of my studies—they never let on. On the contrary, their love and support seemed, at all times and notwithstanding the outcome, unconditional.



I am tremendously glad that four of my five grandparents are able to mark this moment with me. That they would look to my accomplishments as a source of pride has always struck me as misdirected given all that they have overcome and gone on to achieve in their lifetimes.

My siblings have kept me grounded with their wit and candour. Despite their thick skins, I have yet to discover the limits of their devotion to their quirky older brother. This sentiment appears to have found a natural extension in my sister- and brother-in-law, who have shown me the sort of unreserved encouragement typically reserved for kin. As much can be said for my wife’s family and friends, who have been among my most ardent supporters, my mother-in-law chief among them.

Few among my extended family have had the opportunity to pursue a university education, let alone a graduate degree. I know all too well that my attainment of a doctorate says at least as much about my privilege as it does my talent. I have been afforded said privilege in part on account of my socio-demographics, but also as a result of the sacrifices made by my parents.

My father worked tirelessly all my life to provide every opportunity for his children. His efforts came at great personal cost, including prolonged absences from the family he worked so hard to support. Only upon becoming a parent myself have I come to fully appreciate the extent of his sacrifice. My mother forfeited early career advancement in order to raise me and my siblings. She then set for all of us a remarkable example of dedication and ambition when she returned later in life to university to complete her undergraduate degree and go on to teachers’ college. Her commitment to higher education and her intellectual pursuits in political philosophy and theology undoubtedly influenced my own trajectory in life.

All this is to say that this dissertation, as with any accomplishment to which I can lay claim, is not truly my own. It is a product of the support extended to me by a great many people who have contributed—in ways both large and small—to my life and work. I will have inevitably failed to capture herein all of those who warrant acknowledgement, but there is one to whom my gratitude extends beyond all others.

My wife has been my partner throughout this journey, inspiring not only the journey itself but every turn along the way. This work, and indeed my life, would be but a shadow of what it is today were it not for her.



Contents

List of Tables
List of Figures

1 Introduction
   1.1 Introduction
   1.2 Disclosure
   1.3 The function of VAAs
       1.3.1 Purpose
       1.3.2 Effects
       1.3.3 Design
       1.3.4 Data
   1.4 Organization and Overview
       1.4.1 The curse of dimensionality in VAAs: reliability and validity in algorithm design
       1.4.2 On the external validity of non-probability samples: the case of Vote Compass
       1.4.3 Ideological scaling of social media users: a dynamic lexicon approach

2 The curse of dimensionality in Voting Advice Applications: Reliability and validity in algorithm design
   2.1 Introduction
   2.2 Inside the black box
       2.2.1 Manhattan distance
       2.2.2 Euclidean distance
   2.3 Computational epistemology in algorithm design
       2.3.1 Return to scale
       2.3.2 Inductive dimensional modeling: the Vote Compass method
   2.4 Validity indicators for VAA algorithms
       2.4.1 Data and Method
       2.4.2 Results
   2.5 Conclusion

3 On the External Validity of Non-Probability Samples: The Case of Vote Compass
   3.1 Introduction
   3.2 The case for non-probability sampling
   3.3 Data
       3.3.1 Canadian Election Study
       3.3.2 Vote Compass
   3.4 Method
       3.4.1 Vote share projections
       3.4.2 District-level projections
       3.4.3 Raw versus weighted
   3.5 Findings
       3.5.1 Vote Share
       3.5.2 District-level projections
   3.6 Discussion

4 Ideological Scaling of Social Media Users: a dynamic lexicon approach
   4.1 Introduction
   4.2 Deriving ideological scales from social media text
       4.2.1 Calibrating a dynamic lexicon of ideology
       4.2.2 Scaling ideological positions for social media users
   4.3 Data
       4.3.1 Vote Compass
       4.3.2 Twitter
   4.4 Results and Validation
       4.4.1 Mapping ideological landscapes
       4.4.2 Validating ideological estimates for election candidates
       4.4.3 Validating ideological estimates for social media users
   4.5 Validating ideological estimates using voting intention
   4.6 Conclusion and Discussion

5 Conclusion
   5.1 Introduction
   5.2 Contributions
       5.2.1 Valid measures
       5.2.2 Digital democracy
   5.3 Discussion
       5.3.1 From advice to engagement
       5.3.2 On the potential uses and abuses of VAA data

Bibliography



List of Tables

2.1 “Misclassification” in the 2011 Canadian federal election edition of Vote Compass

3.1 2015 CES Sample Attributes
3.2 Vote intention variable design
3.3 Sample Summary Statistics
3.4 Vote Share MAE
3.5 Vote Share RMSE
3.6 Average district-level RMSE
3.7 Riding projection accuracy rate

4.1 Assessment of the dynamic lexicon approach for citizens



List of Figures

2.1 Sample VAA results
2.2 Absolute difference between Pilot and VAA factor loadings
2.3 Comparative assessment of fixed scale versus factor weights
2.4 Scaling using inductive dimensional modeling versus a priori assumptions
2.5 Robustness test: survey item inclusion

4.1 Comparison of estimated positions for the reference method (network scaling approach) and the dynamic lexicon approach
4.2 Assessing linear combinations of textual and network ideologies
4.3 Venn diagram illustrating the complementarity between the ideology estimates in predicting voting intentions of citizens
4.4 Comparison at the party level of citizens’ Twitter and Survey prediction efficiencies



Chapter 1

Introduction∗

1.1 Introduction

So-called Voting Advice Applications, commonly referred to as VAAs, have become a popular fixture of election campaigns in democratic societies around the world. VAAs are a class of online instruments which purport to measure how an individual’s political views compare with those of the candidates for election. On the basis of a user’s responses to a survey designed to elicit policy preferences, VAAs generate a real-time approximation of their alignment with the candidates or parties vying for the vote in a given election campaign.

VAAs endeavour to codify politics by estimating candidates’ and users’ political views within the constraints of discrete variables, thus making them readily comparable across common scales. In doing so they presume to decode election campaigns by parsing campaign rhetoric and presenting users with an ostensibly clear and accessible representation of candidate positions. The utility most VAAs claim to offer is twofold. First, users are arguably better informed for having used a VAA as to how their views align with their electoral options (Dufresne and van der Linden, 2015; Schultze, 2014). Second, VAAs presumably promote greater accessibility, transparency, and accountability of election platforms by distilling complex policy options into a readily understandable format.
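The comparison logic described above can be sketched in a few lines. This is an illustrative toy, not the algorithm of any particular VAA: the items, parties, and positions are invented, and real applications typically weight items and use more elaborate distance metrics.

```python
# Toy VAA matching sketch (illustrative only): users and candidates answer
# the same Likert-scale policy items (1 = strongly disagree ... 5 = strongly
# agree), and agreement is reported as normalized city-block proximity.

def proximity(user, candidate, scale_range=4):
    """Percent agreement between two equal-length response vectors."""
    max_dist = scale_range * len(user)  # worst case: opposite poles on every item
    dist = sum(abs(u - c) for u, c in zip(user, candidate))
    return 100 * (1 - dist / max_dist)

# Invented responses to four policy items.
user = [1, 4, 5, 2]
parties = {"Party A": [2, 4, 4, 2], "Party B": [5, 1, 2, 4]}

# Rank parties by agreement with the user, best match first.
ranking = sorted(parties, key=lambda p: proximity(user, parties[p]), reverse=True)
```

In this contrived case the user would be shown 87.5 per cent agreement with Party A and 25 per cent with Party B, with Party A ranked as the closest match.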

VAAs have, in many instances, proven to be a wildly popular resource, often mirroring the style and mode of diffusion of online quizzes popularized by viral content websites. Their uptake has been remarkable, in some cases amassing millions of users during the course of an election campaign.¹

∗ Elements of this introduction also appear in van der Linden, Clifton, and Jack Vowles. “(De)coding elections: the implications of Voting Advice Applications.” Journal of Elections, Public Opinion and Parties 27, no. 1 (2017): 2-8. doi:10.1080/17457289.2016.1269773.

¹ Most VAAs do not validate unique users and thus the number of reported users is likely to be exaggerated. The extent to which this is the case is unknown and may perhaps vary across electoral contexts and contests as well as individual VAAs.



Despite their prevalence—particularly in Europe, but also more recently in North America and Oceania—VAAs have remained largely epiphenomenal to the study of electoral politics. The mass uptake of VAAs, however, warrants critical inquiry into their implications, both for the practice of politics and for the discipline of political science.

This dissertation comprises three articles, each of which endeavours to advance our understanding of how VAAs contribute to the broader and rapidly changing ethos within which contemporary politics operate. Taken together, the dissertation strives to understand VAAs as a microcosm of the practice of democracy in the Information Age. It engages both with the information that voters receive from VAAs and with the information that VAAs receive from voters. In the first article, my co-author and I argue that the dimensional reduction techniques employed by most VAAs can misrepresent the configuration of ideological space and thus alignment with the political parties. We offer and test an alternative method that seeks to reduce the error associated with such techniques. In the second article, I develop a case for the external validity of the particular kind of ’Big Data’ that VAAs can generate. I argue that, although VAAs are a non-probability sample, they can serve as a reliable source of public opinion and, as such, can play a role in informing public policy. In the third article, my co-authors and I extend the case for the representation of public opinion using VAA data by linking said data to a larger, more complex, and continuous set of Big Data—namely, Twitter.
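The measurement concern animating the first article can be illustrated with a deliberately extreme toy case (invented numbers, not drawn from the paper): when multi-item responses are collapsed onto a single averaged dimension, disagreements on individual items can cancel out entirely.

```python
# Contrived illustration of information loss under naive dimensional
# reduction: a user and a party who take opposite positions on every
# item appear identical once responses are averaged into one score.

user  = [1, 5, 1, 5]   # responses on a 1-5 scale to four policy items
party = [5, 1, 5, 1]   # the opposite pole on each item

# Item-by-item (city-block) distance: maximal disagreement.
item_distance = sum(abs(u - p) for u, p in zip(user, party))  # 16

# Distance after collapsing each side to its mean position: zero.
collapsed_distance = abs(sum(user) / len(user) - sum(party) / len(party))  # 0.0
```

Both vectors average out to the midpoint of the scale, so the collapsed comparison reports a perfect match despite maximal item-level disagreement; this is a cartoon of the kind of distortion the first article is concerned with.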

1.2 Disclosure

Given their potential implications for electoral politics, VAAs have occupied a growing number of political scientists—not only in their capacity as researchers, but also as practitioners. Many popular VAAs are developed by political scientists. This casts political scientists in a dual role as both analysts and architects of an emergent political phenomenon. There is precedent for this sort of symbiosis in the work of participant observers and academic activists, but such approaches to scholarship are themselves not without criticism.

It is incumbent upon me to acknowledge that I too operate at the porous boundary between the theory and practice of VAAs. The analysis within this dissertation centres on a particular VAA, Vote Compass, which I developed and have operated for nearly a decade. Vote Compass has been featured as part of the editorial programming of major media organizations across eight countries and two dozen election or referendum campaigns. To date it has been used by more than 15 million people worldwide.

I would argue that my functional knowledge of VAAs and my experience deploying them around the world are assets to my potential contributions to scholarship in this area. Moreover, there is a case to be made for why political scientists are best suited to be the stewards of VAAs. First, they generally have the requisite training to develop the methods that underpin these applications. Second, their theoretical and methodological choices are subject to a level of scrutiny often reserved by academics for other academics, and the punitive toll exacted upon academics for methodological oversight or error is higher than for non-academics (rightly so). To that end, there is something of an informal regulatory environment in which academics operate that is conducive to motivating methodological rigour.

Notwithstanding these arguments, a universal set of best practices around VAAs does not yet exist. The differences of opinion among the political scientists who are actively engaged in the design of VAAs are non-trivial. The works that comprise this dissertation aim to further the academic discourse on VAAs, recognizing the special responsibility of the political science community—particularly given its often-overt association with VAAs—to ensure continuous and critical reflection with respect to the impact and implications of VAAs.

1.3 The function of VAAs

The research to date on VAAs can be broadly divided into four streams. The first stream consists of critical inquiry into the normative function or purpose that VAAs purport to serve in a democratic society. The second stream explores shifts in voter behaviour as a function of VAA usage. The third stream pertains to the design of VAAs, and is largely centred on how their algorithms function. The fourth stream relates to the data that VAAs capture in the course of a user’s interaction with the application, and the function that such data can have in social scientific research.

1.3.1 Purpose

The popular perception of VAAs is that they enhance democratic participation (Fivaz and Nadig, 2010; Dufresne and van der Linden, 2015), though the logic that VAAs employ in pursuit of this objective is contested. Fossen and Anderson (2014, p. 245) argue that VAAs are premised upon the view that “strengthening democracy is a matter of ensuring that the support for parties (expressed in votes) more accurately reflects the existing preferences of voters”, which they posit “fits well with the normative conception of democracy expounded by social choice theorists, but that view of democracy is contested.”

Even within the context of social choice theory, however, the function of VAAs is subject to critique. The very nomenclature of this class of applications, which purport to provide “voting advice”, suggests that vote choice can be optimised exclusively on the basis of the policy positions associated with election platforms. This view is reductive and fails to account for, among other things, the likelihood that a party’s proposed policies will ever become law. Most VAAs operate on the premise that all parties have equal prospects of holding office. They preclude considerations about party size and leadership, previous government experience, possible configurations of governing coalitions, and other factors that have significant bearing on a party’s ability to make good on its campaign promises. In the absence of such considerations, the ’advice’ provided by VAAs is almost certainly incomplete—an individual’s vote choice is a more complex calculus than the mere aggregation of policy preferences (Himmelweit, Humphreys and Jaeger, 1993; Zuckerman, Valentino and Zuckerman, 1994; Bélanger and Meguid, 2008).

1.3.2 Effects

The earliest manifestation of the contemporary VAA was a written test called StemWijzer, which was developed as part of the Dutch curriculum (De Graaf, 2010). It was reconstituted in an online version in 1998. Other variants have since followed. According to the 2017 Dutch Parliamentary Election Study, three out of four respondents surveyed indicated that they had consulted at least one VAA prior to casting their vote (van der Meer, van der Kolk and Rekker, 2018, p. 37). Similar patterns have emerged across Europe. A study by YouGov in the run-up to the 2017 German federal election indicated that a third of the German population intended to use the popular Wahl-O-Mat VAA prior to election day (Schneider, 2017). In Canada, Australia, and New Zealand, the Vote Compass VAA was used by between 8 and 12 per cent of eligible voters during recent elections.

The mass uptake of VAAs raises questions about their potential effects on voting behaviour. Two areas of inquiry have emerged in the literature.

The first area draws on arguments by Lijphart (1997) around the importance of electoral participation to a well-functioning democracy. Proponents of the normative benefits of VAAs argue that they act to stimulate voter turnout (Alvarez et al., 2014; Gemenis and Rosema, 2014; Enyedi, 2016; Fivaz and Nadig, 2010; Ladner and Pianzola, 2010). This effect is particularly pronounced among younger voters (Dufresne and van der Linden, 2015; Hirzalla, Van Zoonen and de Ridder, 2010; Marschall and Schultze, 2015).

The second area shifts the frame of inquiry from whether citizens vote to how they vote. It examines the extent (if any) to which VAAs have a discernible effect on vote choice. Results in this area are much less definitive. Even where effects are reported, they are usually quite small (Kleinnijenhuis et al., 2017; Mahéo, 2016; Alvarez et al., 2014; Walgrave, Nuytemans and Pepermans, 2009; Schultze, 2014; Ladner and Pianzola, 2010; Wall et al., 2009).


Confounding the determination of a generic effect of VAAs on voter behaviour is a lack of consistency in the design of said applications and, consequently, the outputs they produce for users.

1.3.3 Design

Given the growing prevalence of VAAs in election campaigns around the world, and their ostensible effect on voting behaviour, increasing scholarly interest has been devoted to the design of VAAs and the methodologies employed to produce a representation of users’ alignment with parties or candidates.

The most sustained academic interest has been in the methods by which VAAs produce their so-called ‘voting advice’. There are nearly as many different algorithms for comparing users with candidates or parties as there are VAAs, and to date no systematic means by which to assess the reliability and validity of such algorithms.

Perhaps the most popular approach among VAAs to the estimation of alignment between users and candidates or parties is also the earliest on record—that of the StemWijzer VAA. The methodology behind StemWijzer is remarkably simplistic. Users indicate their agreement or disagreement with a set of statements that are generally intended to reflect points of discrimination between the party platforms. Each candidate or party either provides or is assigned responses to the same set of statements. The sum of the differences between the user and candidate or party positions across all of the statements is then calculated and averaged as an agreement score.
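To make the arithmetic concrete, here is a minimal sketch of this style of aggregation. The positions are invented, and the rescaling of the averaged difference into a percentage is one common convention rather than StemWijzer’s exact formula:

```python
# Sketch of a StemWijzer-style agreement score. Positions are coded on a
# 5-point Likert scale (1 = strongly disagree ... 5 = strongly agree), so
# the largest possible per-item gap is 4.

def agreement_score(user, party, scale_max=4):
    """Average the per-item differences and rescale to percentage agreement."""
    mean_diff = sum(abs(u - p) for u, p in zip(user, party)) / len(user)
    return 100 * (1 - mean_diff / scale_max)

user_positions = [2, 5, 3, 1]   # hypothetical user responses
party_positions = [2, 4, 1, 1]  # hypothetical party positions

print(agreement_score(user_positions, party_positions))  # 81.25
```

A user identical to a party on every statement scores 100; maximal disagreement on every statement scores 0.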

The most well-known criticism of the so-called StemWijzer method was furnished by Walgrave, Nuytemans and Pepermans (2009), who noted that the selection of statements has a discernible effect on the results generated by a given VAA. They cite evidence that suggests that users may receive dramatically different results from a VAA based on the statements it includes.

To control for the inconsistency of VAA outputs, dimensionality reduction techniques were employed in VAAs such as The Political Compass and Kieskompas. The most common comparison of VAA design is between those that use dimensionality reduction techniques and those that do not (Lefevere and Walgrave, 2014; Louwerse and Rosema, 2014; Mendez, 2012). Though dimensionality reduction may result in more reliable outputs, the validity of the outputs produced by said techniques in practice has been roundly criticized (Gemenis, 2013; Otjes and Louwerse, 2014; Germann et al., 2015).

Other considerations that inform scholarly inquiry on the design of VAAs include the scaling of candidate or party positions (Gemenis and Rosema, 2014; Agathokleous, Tsapatsoulis and Katakis, 2013; Trechsel and Mair, 2011).


1.3.4 Data

One of the emergent areas of inquiry in relation to VAAs is the utility of the data that they produce (Wheatley, 2012; Mendez, Gemenis and Djouvas, 2014). The sizeable uptake of VAAs produces remarkable datasets in terms of sample size. However, the prospective contributions of these data to the advancement of social scientific research have yet to be robustly demonstrated.

In the absence of any universal standards for data collection among VAAs, the prospective utility of the data varies significantly. However, a common characteristic of all VAA datasets is that respondents are self-selected and thus a non-random sample of the population of interest.

For certain applications of VAA data, the randomness of the sample (or lack thereof) is immaterial to the analysis being undertaken (Wheatley, 2012); however, non-randomness generally prevents researchers from making inferences about public opinion.

Much of the scholarly work to date has centred on VAA data cleaning, specifically the identification of multiple entries in the dataset by the same user (Mendez, Gemenis and Djouvas, 2014; Andreadis, 2012). Broader questions around the external validity of VAA data remain largely unexplored.

1.4 Organization and Overview

This dissertation aims to build on the scholarship on VAAs to date and also advance new research frontiers. In this section, I summarize the contributions made by each of the papers that comprise this dissertation.

1.4.1 The curse of dimensionality in VAAs: reliability and validity in algorithm design

This paper centres on the methods used by VAAs to produce so-called ‘voting advice’. Given the absence of a formal framework for the evaluation of such methods, my co-author and I undertake an analysis of the reliability and validity of the most popular approaches to representing proximity between users and candidates or parties.

Most VAAs rely on some form of spatial modeling in order to estimate the alignment between users and candidates or parties. However, practical considerations relating to the public accessibility of the results proffered by VAAs necessarily limit the complexity of the models that can be employed. We nevertheless raise serious theoretical and empirical concerns about the present state of the art of VAA design, in particular the application of both Manhattan and Euclidean distance measures in most VAAs.


We endeavour to establish basic criteria for assessing the reliability and validity of the estimates produced by VAAs. Irrespective of the distance measure being used, applying the conventional algorithms used by most VAAs produces results which are often inconsistent or lacking in face validity.

We identify crude dimensional reduction techniques as a possible contributor to the questionable results produced by most VAAs. We then develop a more sophisticated dimensional reduction technique of the sort that more recent studies of VAAs have advocated (Germann et al., 2015; Gemenis, 2013; Otjes and Louwerse, 2014). VAA data from Canada, the United States, Australia and New Zealand are then used to run a series of tests devised to evaluate the validity and reliability of VAA algorithms. Results generated using our dimensional reduction technique consistently outperform those of conventional algorithms for estimating user alignment with candidates or parties.

This paper makes an important contribution to the literature on the design of VAAs. It surfaces non-trivial methodological concerns with the current implementation of most VAAs and proposes a demonstrably robust corrective. Perhaps even more importantly, however, it serves as a call to action to the architects of VAAs. Given the increasing prevalence of VAAs in election campaigns around the world and the rising number of voters who consult them prior to deciding how to vote, further scholarship is warranted with respect to both the theory and practice of VAA design. This will ideally prompt further refinement of the framework for the evaluation of VAA design proposed in this paper.

1.4.2 On the external validity of non-probability samples: the case of Vote Compass

This paper asks whether and how the respondent data collected by VAAs might contribute to public opinion research. The most obvious critique of VAA data in this regard, however, is that they are non-probability samples. Users self-select into the pool of VAA respondents and, as such, are a non-random selection of the population of interest. For many public opinion researchers, this violates a central tenet of statistical inference.

Not all research involving VAAs requires a representative sample—much of the research on VAA data to date has been specifically around the differences in the demographic makeup of VAA users versus the general population (Montigny, François and Pétry, 2013; Johnston, 2017; Alvarez et al., 2014). Compared to other non-probability samples, however, VAA data have several remarkable properties that compel inquiry into whether adequate statistical controls can be applied so as to produce externally valid estimates.

I make the case for the external validity of (certain) VAA data by comparing its modelled vote intention with that of a robust probability sample. Specifically, I compare the data from the 2015 Canadian federal election edition of Vote Compass with that of the 2015 Canadian Election Study. I compare the estimates produced by both samples using several measures of error (MAE, RMSE, and TPR). I find that estimates produced using Vote Compass data either match or exceed the accuracy of the CES, particularly so at more fine-grained geographies such as ridings.
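By way of illustration, the first two of these error measures can be computed over party-level vote share estimates as follows; the vote shares here are invented for illustration only:

```python
import math

# Mean absolute error (MAE) and root mean squared error (RMSE) between
# modelled and actual party vote shares, in percentage points. The
# numbers below are hypothetical, not results from the paper.

def mae(estimates, actuals):
    return sum(abs(e - a) for e, a in zip(estimates, actuals)) / len(estimates)

def rmse(estimates, actuals):
    return math.sqrt(sum((e - a) ** 2 for e, a in zip(estimates, actuals)) / len(estimates))

estimated = [38.1, 31.0, 20.5, 4.9]  # modelled vote shares per party
actual = [39.5, 31.9, 19.7, 4.7]     # official results

print(round(mae(estimated, actual), 3), round(rmse(estimated, actual), 3))
```

RMSE penalizes large misses more heavily than MAE, which is why the two are typically reported together.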

The objective of this paper, and its primary contribution, is not to denigrate probability samples—the CES, for example, offers researchers enormous benefits that Vote Compass cannot readily reproduce. Rather, it seeks to make space for the acknowledgement and study of practices that are already well-established in the public opinion research industry.

Sample adjustment techniques similar, if not identical, to those applied to the samples collected by public opinion research firms can be applied to Vote Compass data. I demonstrate in this paper that, when such adjustments are made, the estimates are either on par with or superior to those of a conventional probability sample.

The diversification of information and communications technology has made the practice of public opinion research increasingly complex. This has implications for democratic societies, as a loss of credibility in public opinion research weakens the accountability of government to citizens. VAAs can play an important role in explaining and understanding the shifting dynamics of public opinion and can do so in a valid and reliable manner, as this paper demonstrates.

1.4.3 Ideological scaling of social media users: a dynamic lexicon approach

This paper endeavours to extend the utility of VAA data by using it to train machine learning models that estimate the ideological attributes of social media users.

The rhetoric that political elites employ structures civic discourse. The emergence of social media platforms as a medium of politics has enabled ordinary citizens to express their ideological inclinations by adopting the lexicon of political elites and, at times, shaping it. The corpora that social media avails to researchers represent a rich new source of data in the study of political ideology. However, existing ideological scaling methods are best suited for the verbose texts for which they were developed—policy platforms, manifestos, etc. They are far less effective when applied to the short, informal style of textual content that is characteristic of social media platforms such as Twitter.

The method my co-authors and I develop in this paper allows for the estimation of individual-level ideological attributes of both political elites and ordinary citizens using the textual content they generate on social media platforms. To date, methods for ideological inference from social media data have relied primarily on network analysis (Barberá, 2015). We examine the content that social media users generate rather than the connections between them.


Our technique involves the identification of political lexicon within the broader social media discourse. We begin by analyzing the lexicon of political elites in order to create dynamic dictionaries using the Wordfish algorithm (Slapin and Proksch, 2008). We then estimate the ideological positioning of individual social media users by comparing the textual content they produce to the aforementioned dynamic dictionaries. These estimates are then validated using respondent data collected from the Vote Compass VAA.
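As a stylized sketch of the scoring step (not the authors’ implementation), suppose a Wordfish-style scaling of elite texts has yielded a dictionary of word weights; a user’s short text can then be scored as the mean weight of the dictionary words it contains. All words and weights here are invented placeholders:

```python
# Score a short text against a dictionary of word weights. In the approach
# described above, the weights would be estimated from elite texts with a
# scaling model such as Wordfish; these are invented placeholders.

word_weights = {
    "taxpayer": 0.9,
    "deregulation": 1.2,
    "solidarity": -1.1,
    "redistribution": -1.4,
}

def score_text(text, weights):
    """Mean weight of recognized tokens; None if no dictionary words appear."""
    tokens = text.lower().split()
    matched = [weights[t] for t in tokens if t in weights]
    return sum(matched) / len(matched) if matched else None

print(score_text("protect the taxpayer through deregulation", word_weights))
```

A real pipeline would of course require tokenization beyond whitespace splitting, handling of hashtags and mentions, and weights re-estimated per election cycle (hence “dynamic” dictionaries).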

We compare the estimates from this method to those derived using alternative approaches and find that it outperforms them both on the estimation of ideology and on other attributes such as vote choice.

The potential applications for this method in electoral behavior studies and public opinion research are numerous. It offers the potential for real-time passive polling using social media, as well as the ability to undertake real-time analysis of content flows on social media. Such methods can, for example, contribute to the scholarship on fake news (Vosoughi, Roy and Aral, 2018) by helping to differentiate between true and false news stories based on the attributes of the nodes through which said content is shared.


Chapter 2

The curse of dimensionality in Voting Advice Applications: Reliability and validity in algorithm design∗

2.1 Introduction

Trying to determine how your views align with the candidates running in a given election campaign? There’s an app for that.

So-called Voting Advice Applications (VAAs) or, as we prefer to call them, Voter Engagement Applications (VEAs),1 are online tools that survey users’ political views and, on the basis of the responses provided, return an estimate of their individual proximity to each of the candidates for election.

The widespread use of VAAs during elections at all levels of government has raised questions about their implications, from their effect on vote choice (Enyedi, 2016; Alvarez et al., 2014; Gemenis and Rosema, 2014; Fivaz and Nadig, 2010; Ladner and Pianzola, 2010) to the accuracy of their output or “advice” (Lefevere and Walgrave, 2014; Louwerse and Rosema, 2014). Notwithstanding a burgeoning research program on VAAs (Garzia and Marschall, 2014; Rosema, Anderson and Walgrave, 2014), questions remain as to how one effectively validates VAA outputs.

∗Published as van der Linden, Clifton, and Yannick Dufresne. “The curse of dimensionality in Voting Advice Applications: reliability and validity in algorithm design.” Journal of Elections, Public Opinion and Parties 27, no. 1 (2017): 9-30. doi:10.1080/17457289.2016.1268144.

1Our grievance with the term “Voting Advice Application” is not a semantic quibble nor a self-interested attempt to popularize our own neologism. The purpose of these instruments is improperly communicated to the public when the concept of “advice” is invoked (for reasons we engage with tangentially in this article). We nonetheless defer—at least for the time being—to path dependence with respect to the nomenclature in this area.

VAAs have multiplied in recent years and an exact count of the number currently in operation is difficult to obtain given their geographic span, linguistic diversity, and relative ephemerality. They differ greatly in design and degree of sophistication, but most share a distinctive element that has become a defining feature of a VAA: an aggregation algorithm.

VAA algorithms generate a proximal representation between users and parties, variously referred to in the literature as “voting advice”, “voting recommendation” or “candidate match” (Garzia and Marschall, 2012; Gemenis and Rosema, 2014). Users are typically served a questionnaire which can vary considerably in design, but generally features structured response options on either a binary or Likert scale (Rosema and Louwerse, 2015). Candidates are coded on the same questionnaire and response scale. By taking the difference between user and candidate positions on public policy or ideology, aggregation algorithms produce a summary result that arguably reduces the cognitive load on the user when evaluating similarity or difference between personally held views and candidates’ policy platforms.

Numerous aggregation algorithms have been deployed across a variety of VAAs, each embedded with varying epistemological assumptions (Louwerse and Rosema, 2014; Mendez, 2012). The myriad formulae for evaluating the alignment between users and candidates reflect the absence of a general theory of VAAs (Fossen and Anderson, 2014). While our ambitions with this paper stop well short of positing such a theory, we invite its beginnings by addressing the validation of VAA algorithms.

With that in mind, this paper proceeds in three parts. First, it reviews the algorithmic design parameters common to most VAAs. Second, it posits a conceptual framework for evaluating the reliability and validity of VAA algorithms. Third, as a demonstration, it applies this framework in the evaluation of an algorithmic enhancement developed for Vote Compass, a VAA presently operating in the United States, Canada, Australia, and New Zealand.

2.2 Inside the black box

VAAs employ a number of different aggregation algorithms to estimate user alignment with one or more election candidates. In most cases these algorithms rely on some form of spatial modeling so as to offer users a graphical representation of said alignment, wherein the proximity of user to candidate is an indicator of alignment or agreement between the two. Notwithstanding various formulaic nuances (Louwerse and Rosema, 2014), the majority of VAAs estimate user and candidate proximity using either Manhattan or Euclidean distance measures.2 The pervasive challenge for both measures is a variation on the curse of dimensionality, which refers to the methodological challenges of working in high-dimensional space (Bellman, 1957).

Standard practice when estimating a spatial model is to use eigenvalues to determine which components or factors explain most of the variability in a given dataset. VAAs, by virtue of being public-facing and thus necessarily accessible beyond the population of quantitative social scientists, must normally constrain their proximal renderings to one or two dimensions in order to be readily interpretable by laypersons.

VAAs endeavour to address the dimensionality problem in one of several ways. Some take each survey item (referred to variously in the literature as statements, propositions or questions) as its own dimension and estimate the alignment between user and candidate using Manhattan distance. This is communicated to the user as an agreement score expressed as a percentage and commonly accompanied by a bar graph visualization. Other VAAs constrain the number of dimensions to two and estimate proximity using Euclidean distance.3 The results are visualized in two-dimensional Euclidean space. Some VAAs have endeavoured to circumvent the dimensionality problem by rendering multidimensional visualizations of user and candidate alignment using radar graphs, also known as star or spider graphs. Radar graphs, however, are neither readily intelligible (Toker et al., 2012; Few, 2005) nor do they meaningfully capture the relationship between the dimensions being represented.

Despite their prevalence in VAA algorithms, both Manhattan and Euclidean distance measures as they have been applied to date have considerable deficiencies in the ways in which they render proximity between users and candidates.

2.2.1 Manhattan distance

Manhattan distance, as it is applied in VAA algorithms, takes the mean absolute distance between candidate and user across the range of items in the VAA questionnaire. Dimensional assumptions are relaxed using this measure in that each item is effectively assigned its own dimension. Generally, the distance D between an individual user i with positions x and a candidate c with positions y across n items (dimensions) can be expressed as follows:

$$D_i(c) = \sum_{j=1}^{n} |x_{ij} - y_{cj}| \qquad (2.1)$$

2Though other measures have been proffered in the academic literature (Katakis et al., 2014; Agathokleous, Tsapatsoulis and Katakis, 2013; Louwerse and Rosema, 2014), they remain at the margins in the practice of VAA design.

3Vote Compass visualized proximity in three-dimensional space for the 2015 Canadian federal election in order to reflect three-dimensional ideological voting in that particular context (Medeiros, Gauvin and Chhim, 2015).


[Figure 2.1: Sample VAA results. (a) Vote Compass 2016 US edition; (b) Vote Compass 2015 Canadian edition.]

A weighting variable can be (and often is) incorporated into this formula such that users can apply weights to each item so as to reflect the salience the user attributes to issues relative to one another.

For example, in the case of response options consisting of a 5-point Likert scale ranging from ‘strongly disagree’ to ‘strongly agree’, let us say that, for a given question, a user has a position of ‘somewhat disagree’, Candidate A has a position of ‘somewhat agree’, and Candidate B has a position of ‘strongly agree’. In this case the user position would be coded as 2, with Candidate A and B positions coded as 4 and 5, respectively. Given these hypothetical user and candidate positions, we calculate the user to be 2 units away from Candidate A (i.e. |2 - 4|) and 3 units away from Candidate B (i.e. |2 - 5|). The sum of these distances as calculated for each issue thus represents how far a user is from a given candidate overall.
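The calculation above, including the optional per-item weighting, can be sketched as follows:

```python
# Manhattan (city-block) distance between a user and a candidate (eq. 2.1),
# with optional per-item salience weights. Positions are coded on a 5-point
# Likert scale (1 = strongly disagree ... 5 = strongly agree).

def manhattan_distance(user, candidate, weights=None):
    """Sum of (optionally weighted) absolute per-item differences."""
    if weights is None:
        weights = [1] * len(user)
    return sum(w * abs(u - c) for u, c, w in zip(user, candidate, weights))

user = [2]         # 'somewhat disagree' on the single item above
candidate_a = [4]  # coded 4
candidate_b = [5]  # coded 5

print(manhattan_distance(user, candidate_a))  # 2
print(manhattan_distance(user, candidate_b))  # 3
```

With a weight of 2 on an item the user deems highly salient, the same two-unit gap would contribute four units to the total distance.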

Manhattan distance offers what is ostensibly the most intuitive measure of alignment between user and candidate in a VAA; however, it is vulnerable to distortion given that it relaxes any assumption of dimensionality—each survey item is associated with an independent dimension. This introduces two fundamental problems. First, it is highly probable that many of the survey items in fact load on the same dimension. Representing each item as its own dimension thus distorts how proximity between user and candidate is rendered. A second and not entirely unrelated problem is that, when using Manhattan distance, the composition of the questionnaire heavily influences user and candidate alignment (Walgrave, Nuytemans and Pepermans, 2009). As Lefevere and Walgrave (2014, p. 252) argue, “statement selection—the set of statements presented to parties and voters—has consequences for the advice users get. Statement selection makes a difference.”

The effect of statement selection introduces substantial volatility into VAA outputs that are derived using Manhattan distance, which has implications both in terms of reliability and validity. Given the inconsistency in the outputs produced by aggregation algorithms, Manhattan distance measures offer relatively unreliable estimates. But are they valid? Many VAAs, either implicitly or explicitly, associate Manhattan distance measures with generalizable alignment between user and candidates as opposed to alignment constrained to the items in the questionnaire. As an example, the American VAA iSideWith expresses the results of its algorithm as follows: “I side with [candidate] on most 2016 Presidential Election issues”. However, its survey items contain notable absences in key areas of public policy. It is problematic to infer that a user’s agreement with a candidate on a select subset of policy issues implies an overall consensus with the platform of said candidate.

The most formidable challenge to the validity of the Manhattan distance measure, however, is precisely that VAA outputs vary based on which survey items are included. VAAs are ostensibly measuring latent variables (i.e. unobservable, abstract theoretical constructs) in an effort to tap into the ideological landscape of a particular electorate. It is highly problematic to assert that existing underlying structures change on the basis of arbitrary decisions made by VAA designers.

In order to address these challenges to the reliability and validity of Manhattan distance measures, some VAAs have opted instead to use Euclidean distance to represent the alignment between user and candidates.

2.2.2 Euclidean distance

VAAs that employ Euclidean distance to represent proximity between a user and candidate assign each survey item to a given dimension $k$ and then average the $n_k$ survey items $x$ on their associated dimensions:

$$P_i(d_k) = \frac{1}{n_k} \sum_{j \in \{j_1, \ldots, j_{n_k}\}} x_{ij} \qquad (2.2)$$

where $\{j_1, \ldots, j_{n_k}\} \subseteq \{1, \ldots, n\}$.


These coordinates are calculated for each candidate in addition to the user. The distance from the user $i$ to each candidate $c$ is calculated by taking the Euclidean distance between user and candidate in two-dimensional space:

$$D_i(c) = \sqrt{\sum_{k=1}^{2} \left(P_i(d_k) - P_c(d_k)\right)^2} \qquad (2.3)$$

By constraining the number of dimensions, Euclidean distance ostensibly offers a more robust representation of the proximity between user and candidate in that it generates consistent and generalizable results. Unlike approaches that use Manhattan distance, wherein VAAs can produce drastically different results as a function of statement selection, VAAs that use Euclidean distance are more likely to produce stable results irrespective of the questionnaire design. Associating each item in the questionnaire with a limited number of dimensions establishes more consistent constructs than when each item is operationalized as an independent dimension. So long as the dimensions between VAAs are measuring the same construct, variance in survey items is likely to have less impact on the user outcome. The inference is thus that Euclidean distance is a more reliable measure of the proximity between user and candidate.
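A minimal sketch of this two-dimensional approach (equations 2.2 and 2.3), with hypothetical item-to-dimension assignments and positions:

```python
import math

# Average item positions within each (a priori assigned) dimension, then
# take the Euclidean distance between user and candidate in that space.
# Dimension assignments and positions are hypothetical.

def dimension_scores(positions, dims):
    """Mean position on each dimension (eq. 2.2)."""
    scores = {}
    for d in set(dims):
        items = [p for p, dim in zip(positions, dims) if dim == d]
        scores[d] = sum(items) / len(items)
    return scores

def euclidean_distance(user, candidate, dims):
    """Distance between user and candidate in dimensional space (eq. 2.3)."""
    pu = dimension_scores(user, dims)
    pc = dimension_scores(candidate, dims)
    return math.sqrt(sum((pu[d] - pc[d]) ** 2 for d in pu))

# Four items: two assigned to an 'economic' dimension, two to a 'social' one.
dims = ["economic", "economic", "social", "social"]
user = [2, 4, 1, 3]       # economic mean 3.0, social mean 2.0
candidate = [4, 4, 5, 3]  # economic mean 4.0, social mean 4.0

print(euclidean_distance(user, candidate, dims))  # sqrt(5) ≈ 2.236
```

Note that the result turns on the a priori item-to-dimension assignments, which is precisely the deficiency discussed below.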

In practice, however, the application of Euclidean distance in VAAs is quite crude, so much so that it can ultimately negate the benefits associated with dimensional reduction. Among the early adopters of this approach was the Dutch VAA Kieskompas, which positioned itself as an alternative to the dominant VAA in The Netherlands, StemWijzer, largely on the basis of its use of Euclidean over Manhattan distance (Krouwel, Vitiello and Wall, 2012). Within the VAA literature the representation of users and parties in two-dimensional space is commonly referred to as the Kieskompas method and has been reproduced in a variety of VAAs throughout Europe (Louwerse and Rosema, 2014).

The fundamental deficiency in the Kieskompas method pertains to its approach to dimensional reduction. Accurately representing the range of themes included in a VAA questionnaire in lower-dimensional space requires the use of a statistical dimension-reduction technique, such as factor or principal component analysis run on data from a pilot study (Germann et al., 2015). However, in the absence of such methods or empirics, Kieskompas and adherents to its approach assign survey items on the basis of “a priori considerations, both in terms of which dimension an issue belongs to, and which side of the dimension a specific issue position belongs to” (EU Profiler, 2009). Determining dimensionality and directionality arbitrarily is suboptimal as it is likely to produce unstable dimensions and consequently generate biased representations of alignment or proximity between a user and candidate (Gemenis, 2013; Otjes and Louwerse, 2014). Germann et al. (2015, p. 215) argue that the “fundamental reason for the existing deficiencies is the current practice of basing [VAA] spatial maps on pure a priori reasoning”. Consequently, as Cochrane (2011, p. 4) argues, “the random error in the measure of the dimension becomes systematic error in the measure of the distance between a user and a party on that dimension.”4 This introduces the likelihood of bias in the rendering of user-candidate alignment. This finding calls into question not only the reliability of the Kieskompas method, but also its validity. In the absence of dimensions that have been empirically validated, the representation of ideological space and thus the proximity between user and candidates is suspect.

Although the Kieskompas method is the dominant application of Euclidean distance in VAA design, there are alternatives which arguably offer a more robust rendering of proximity in Euclidean space (Otjes and Louwerse, 2014; Germann et al., 2015). In order to evaluate the efficacy of such alternatives, however, it is first necessary to establish a framework for the validation of VAA algorithms.

2.3 Computational epistemology in algorithm design

Efforts to validate the design of a VAA algorithm presuppose that we have a clear understanding of the objective of VAAs, despite a relative sparsity of critical inquiry into the normative function of VAAs.

Fossen and Anderson (2014, p. 245) posit VAAs as operating under the premise “that strengthening democracy is a matter of ensuring that the support for parties (expressed in votes) more accurately reflects the existing preferences of voters.” This reflects a tacit consensus within the literature that the primary objective of a VAA is to narrow “a ‘competence gap’ between how well-informed voters actually are, and how well-informed they would need to be for the electoral process to function properly” (Anderson and Fossen, 2014, p. 218). The ubiquitous reasoning that informs VAA design rests on the premise that enhancing electoral literacy necessarily advances the practice of democracy. The corollary to this logic, however, is “that elections are in essence about aggregating the policy preferences of voters” (Fossen and Anderson, 2014, p. 245), which is at odds with one of the most consistent findings in both the Columbia and Michigan schools, i.e. that citizens do not vote exclusively on the basis of public policy (Lazarsfeld, Berelson and Gaudet, 1948; Campbell et al., 1960; Converse, 1964; Delli Carpini and Keeter, 1996; Caplan, 2006).

To demonstrate the scope limitations of VAAs, we draw on data from the 2011 Canadian Election Study (CES), which included the following items in its post-election study questionnaire:

PES11_82: Have you used the Vote Compass website?

PES11_83: According to the Vote Compass, which party were you closest to?

4 The argument made by Cochrane (2011) is applied to the inaugural version of Vote Compass, launched during the 2011 Canadian federal election, but the critique speaks directly to the Kieskompas method that was used in the Vote Compass algorithm at the time. None of the subsequent iterations of Vote Compass have used this method.

Though the sample size of the subset of respondents who used Vote Compass is quite small (n = 391), the results are instructive. For the sake of argument, we measure “misclassification” as instances in which the declared vote choice of a CES respondent (PES11_6) differs from their self-reported outcome in Vote Compass, i.e. the party to which they perceived themselves to be closest in the application’s two-dimensional visualization of ideological space. By this metric, voters for the New Democratic Party (NDP) were most often misclassified, with only 29 per cent of NDP voters self-reporting an outcome in Vote Compass that matched their self-reported vote choice, as compared with 74 per cent of Liberal voters and 61 per cent of Conservative voters.
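The misclassification measure described above amounts to a simple computation. The records below are invented for illustration; the actual analysis uses the CES variables PES11_6 and PES11_83.

```python
# Hypothetical respondent records: declared vote choice versus the party
# the respondent reported as closest in Vote Compass.
respondents = [
    {"vote_choice": "NDP", "vaa_closest": "Liberal"},
    {"vote_choice": "NDP", "vaa_closest": "NDP"},
    {"vote_choice": "Liberal", "vaa_closest": "Liberal"},
    {"vote_choice": "Conservative", "vaa_closest": "Conservative"},
]

def misclassification_rate(rows, party):
    # Share of a party's voters whose VAA outcome did not match their vote.
    voters = [r for r in rows if r["vote_choice"] == party]
    missed = [r for r in voters if r["vaa_closest"] != party]
    return len(missed) / len(voters)

print(misclassification_rate(respondents, "NDP"))  # 0.5
```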

That NDP voters were substantially more likely to be misclassified in the 2011 Canadian federal election edition of Vote Compass is not exclusively a function of random error. Results from an OLS regression model run against the same dataset show that the main determinant of misclassification in Vote Compass was support for the NDP leader, the late Jack Layton (see Table 2.1). This is consistent with the narrative of the 2011 Canadian federal election campaign, wherein the surge in support for the NDP was predominantly attributed to the personal charisma and popularity of ‘Le bon Jack’ (Fournier et al., 2013).

It is unsurprising to students of electoral politics that affinity for a given party leader can have a more substantial effect on vote choice than do policy preferences. But it does highlight a crucial limitation of VAAs in that their preoccupation with issue positions occludes the much broader array of considerations that influence vote choice. A critical constraint of all VAAs—at least insofar as they are currently imagined—is the omission of said considerations. When estimating the proximity of users to parties, VAA aggregation algorithms do not factor into their calculations perceptions of party leaders, party identification, issue salience, or incumbency. Moreover, VAAs echo the commitments parties make during an election campaign, not necessarily whether voters believe that a given party—by virtue of its size or other features—is likely to actually realize the policy change advocated in its platform (Bélanger and Meguid, 2008). These factors complicate our misclassification measure, but do not entirely undermine its potential utility as an indicator of the validity of aggregation algorithms. Notwithstanding other considerations, we generally expect a certain degree of consonance between users’ politics and their vote choice, particularly when aggregating across users. To that end, the misclassification measure is more useful when applied to the aggregate position of users who share a common vote intention. But inferences about the reliability and validity of aggregation algorithms are necessarily constrained by the parameters of said algorithms.


Table 2.1: “Misclassification” in the 2011 Canadian federal election edition of Vote Compass

                              Being misclassified in Vote Compass
                               (1)         (2)         (3)
Woman                         0.076       0.069       0.043
                             (0.217)     (0.217)     (0.222)
Age                          −0.061      −0.058      −0.057
                             (0.042)     (0.042)     (0.043)
Age2                          0.001       0.001       0.0005
                             (0.0004)    (0.0004)    (0.0004)
Below high school            −0.146      −0.092      −0.320
                             (0.887)     (0.893)     (0.918)
University degree             0.470∗∗     0.485∗∗     0.387∗
                             (0.225)     (0.226)     (0.233)
Recent immigrant             −1.702      −1.724      −1.619
                             (1.101)     (1.104)     (1.113)
Urban                        −0.176      −0.178      −0.240
                             (0.218)     (0.218)     (0.225)
Low income                   −0.431      −0.446      −0.408
                             (0.433)     (0.438)     (0.446)
High income                  −0.132      −0.125      −0.095
                             (0.233)     (0.233)     (0.239)
Political awareness                      −0.388      −0.201
                                         (0.561)     (0.576)
Political interest                       −0.264      −0.769
                                         (0.659)     (0.699)
Liberal partisan                                     −0.086
                                                     (0.283)
Conservative partisan                                 0.348
                                                     (0.383)
NDP partisan                                          0.164
                                                     (0.357)
Liberal leader                                        0.526
                                                     (0.562)
Conservative leader                                  −0.334
                                                     (0.436)
NDP leader                                            1.330∗∗
                                                     (0.600)
(intercept)                   1.037       1.342       0.623
                             (1.013)     (1.114)     (1.201)
N                               391         391         391
Log Likelihood             −254.584    −254.192    −248.373
AIC                         529.168     532.383     532.745

Source: Canadian Election Study, 2011. Standard errors in parentheses.
∗p < .1; ∗∗p < .05; ∗∗∗p < .01


2.3.1 Return to scale

Recent inquiry into how to overcome the deficiencies inherent to the Kieskompas method has raised the issue of scale validation. Germann et al. (2015, p. 218) argue that “an ideal solution would involve the pre-administration of the VAA questionnaire to a representative test sample before the VAA launch. On the basis of this survey, we could conduct extensive psychometric testing, and thereby define the spatial map.” Indeed, fielding a pilot study and then using factor analysis to determine how well each item in the questionnaire loads on a given dimension would contribute substantially toward improving the reliability of the Kieskompas method. It would ensure—by appealing to empirics rather than a priori reasoning—that survey items selected for inclusion in a VAA are consistent with the latent ideological constructs that structure the representation of user-candidate alignment in lower-dimensional space. Essentially, the pilot study serves as training data for a VAA.

In making the case for ‘dynamic scale validation’, Germann et al. (2015, p. 218) argue that “we must assume that early bird data provides a reliable indication of patterns found over the full course of a VAA.” The same assumption holds for pilot study results. It is necessary to assume that the factor loadings derived from the pilot study approximate those from the VAA itself. Data from pilot studies fielded for Vote Compass support the assumption of relative consistency between the factor loadings from the pilot study and those from the VAA, notwithstanding the substantial variation in sample size. As a demonstration, Figure 2.2 graphs the absolute difference in the factor loadings between the 2015 Canadian federal election pilot study and its complement, the 2015 Canadian federal election edition of Vote Compass. The resulting box plot suggests that the variation is unremarkable, with a maximum difference of 0.18.
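The pilot-versus-full-sample check can be sketched in miniature. The snippet below simulates response data with a single known latent dimension and compares first-axis loadings estimated from a small “pilot” subset against the full sample. It uses a principal-axis approximation in place of the maximum likelihood factor analysis applied in Vote Compass, and every number in it is synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for VAA response data: six propositions driven by a
# single latent dimension (the loadings below are invented).
n = 2000
true_loadings = np.array([0.8, 0.7, 0.6, -0.7, -0.6, 0.5])
latent = rng.standard_normal(n)
X = np.outer(latent, true_loadings) + 0.5 * rng.standard_normal((n, 6))

def first_factor_loadings(data):
    # Loadings on the first principal axis of the correlation matrix,
    # a simple proxy for ML factor loadings; abs() sidesteps the sign
    # indeterminacy of factor solutions.
    R = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(R)  # eigenvalues in ascending order
    return np.abs(eigvecs[:, -1] * np.sqrt(eigvals[-1]))

pilot = first_factor_loadings(X[:300])   # "pilot" subset
full = first_factor_loadings(X)          # "full VAA" sample
max_diff = float(np.max(np.abs(pilot - full)))  # small if pilot mirrors full
```

If the pilot is a reliable guide to the full sample, the maximum absolute difference in loadings stays small, which is the pattern reported for Vote Compass in Figure 2.2.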

Training algorithms using pilot data strikes us as a basic and fundamental prerequisite in VAA design. To rely on ex ante estimates is not only methodologically problematic, but practically irresponsible given the potential influence of VAAs on vote choice (Enyedi, 2016; Gemenis and Rosema, 2014; Fivaz and Nadig, 2010). By selecting survey items on the basis of factor analysis run on pilot data, VAAs draw on empirics to ensure that each item loads on the dimension with which it is associated (e.g. social, economic, etc.) and scales in the proper direction (e.g. left, right, etc.), thus controlling in part for the statistical error identified in the Kieskompas method (Cochrane, 2011; Otjes and Louwerse, 2014) and ensuring that VAAs are, in fact, capturing the dimensions they purport to represent.

The dimensional reduction technique in which survey items are selected on the basis of factor loadings derived from pilot data serves as a partial corrective to the reliability problem common to VAA algorithms that seek to represent a large number of attitudinal and policy-related questions in lower-dimensional space. It is partial in that there may be substantial variation in the factor loadings of each item, or items may load on more than one dimension being represented in a given VAA.

Figure 2.2: Absolute difference between Pilot and VAA factor loadings

[Box plot: absolute difference in the factor loadings of each question between the pilot and Vote Compass samples, shown for Dim 1, Dim 2, and Dim 3; y-axis range 0.00–0.15.]

Source: Vote Compass 2015 Canadian federal election pilot (n=1,444); Vote Compass 2015 Canadian federal election edition (n=1,829,478)

It also introduces a constraint, however, that may call into question the validity of this approach. To select survey items purely on the basis of their associated factor loadings may substantially narrow the range of items that are candidates for inclusion in a VAA. This is especially problematic considering the salience of wedge politics in modern election campaigns, characterized by candidates priming specific issues that do not parse the electorate along conventional ideological cleavages (Hillygus and Shields, 2008). By definition, wedge issues—including asylum seeker issues in Australia (Goot and Sowerbutts, 2004) or separatism in Canada and Spain (Medeiros, Gauvin and Chhim, 2015)—do not fit well in a two-dimensional “‘one-size fits all’ model” (Otjes and Louwerse, 2014, p. 267). Given the primacy of wedge issues in the discourse of many election campaigns, their absence from the survey items would pose both a real and perceived challenge to a VAA’s validity. VAA algorithms must therefore integrate a means to include items that do not load well on the given dimensions while minimizing the noise they introduce into user and candidate plot estimates.

2.3.2 Inductive dimensional modeling: the Vote Compass method

In an effort to address the validity concerns that emerge when correcting for the reliability problem inherent to the Kieskompas method, we posit an inductive dimensional modeling technique that forms the basis of the Vote Compass algorithm. This approach, which we term the Vote Compass method, weights each survey item by its contribution to a given dimension (or dimensions, as the case may be), as determined by pilot data.

As with our proposed corrective to the Kieskompas method, the Vote Compass method determines dimensions by way of factor analysis run on pilot data collected prior to the launch of a given version of the application. In this section, we lay out the model assumptions and the main steps followed to derive the abstract political dimensions. Denote p as the number of propositions. Let X be a vector of a user’s responses to the p propositions, where X ∈ R^p. Assume that we can find Z, a vector of k latent variables (i.e. factors) which influence the users’ responses, such that Z ∈ R^k. Then the relationship between X and Z can be expressed as follows:

    X = µ + ΛZ + ε,    (2.4)

where Z ∼ N(0, I), ε ∼ N(0, Ψ), and Λ ∈ R^(p×k) is the matrix of factor loadings. Moreover, we also assume that cov(ε_i, Z_j) = 0 for i = 1, . . . , p and j = 1, . . . , k. From equation 2.4, it follows that X|Z is distributed as N(µ + ΛZ, Ψ). Using the properties of the multivariate Normal distribution, the joint distribution of (X, Z) follows N(µ_xz, Σ), where

    µ_xz = ( µ )   and   Σ = ( ΛΛᵀ + Ψ   Λ )
           ( 0 )             (   Λᵀ      I ).    (2.5)

In order to fit this model, we used the factanal function in R. The loadings were estimated using the maximum likelihood method, to which we then applied the varimax rotation. We constructed each theoretical dimension on the basis of how well every proposition loaded on it.

To facilitate the inclusion of salient issues in the electoral discourse without adding undue noise to a given dimension, survey items that are highly associated with the latent dimension receive greater weight in determining both user and candidate positions on that dimension than those that are only weakly associated with it. The item inclusion threshold for each dimension is set using the guideline for practical significance defined by Hair et al. (1998), whereby a factor loading of ±0.3 indicates that an item is of minimal significance, ±0.4 that it is more important, and ±0.5 that it is significant. Once the pool of propositions for a specific dimension is determined, we inductively project that subset of propositions onto a single dimension. The position of the i-th user on this dimension is obtained by the regression method, that is

    z_i = ΛᵀS⁻¹(x_i − x̄)/s_x,    (2.6)

where x̄, s_x, and S are the vector of means, the vector of standard deviations, and the sample correlation matrix of users’ responses for the subset of propositions; and Λ is a vector of estimated loadings whose length is equal to the number of propositions in the subset.
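Equation 2.6 can be implemented directly. The sketch below scores a synthetic batch of Likert responses on one dimension using invented loadings; it follows the regression-method formula above but is not the production Vote Compass implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic 5-point Likert responses to a subset of four propositions.
responses = rng.integers(1, 6, size=(500, 4)).astype(float)

x_bar = responses.mean(axis=0)            # vector of item means
s_x = responses.std(axis=0, ddof=1)       # vector of item standard deviations
S = np.corrcoef(responses, rowvar=False)  # sample correlation matrix
lam = np.array([0.7, 0.6, 0.5, 0.4])      # hypothetical estimated loadings

def dimension_score(x_i):
    # z_i = Lambda^T S^{-1} (x_i - x_bar) / s_x, with element-wise
    # standardization of the response vector.
    standardized = (x_i - x_bar) / s_x
    return float(lam @ np.linalg.solve(S, standardized))

scores = np.array([dimension_score(r) for r in responses])
# Scores are centered by construction: the mean response maps to zero.
```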

Having posited a new algorithm for estimating user alignment with candidates for election, we now turn to developing a set of indicators to test the reliability and validity of said algorithm.

Building a VAA is not akin to estimating a spatial model; the purpose is not the same. The latter aims at capturing an abstract space in which scholars can test and develop theories. For instance, one might ask whether people vote, on average, for the party that is ideologically closest to them or, alternatively, for the one that would push policies in the ideological direction they prefer (see Merrill and Grofman, 1999), or how many ideological dimensions exist in a specific context (Medeiros, Gauvin and Chhim, 2015). When it comes to VAAs, however, the goals are much more diffuse. The focus of VAAs is not ideology, but policy issues. The aim therefore is not to choose questions according to theoretically defined ideological dimensions, but rather on the basis of their policy relevance in a given election context. While most social research is interested in uncovering general laws of phenomena, a public tool like a VAA must attend to the exceptions. The mere fact that some individuals might use the information provided by VAAs to inform their vote choice should force VAA builders to take extreme care with the results shown to users and with the theoretical assumptions underlying their development procedures and algorithms. From the selection of the questions used in the tool to the positioning of parties and the extraction of dimensions, VAA developers must keep in mind that no single theory applies to all individual users.

2.4 Validity indicators for VAA algorithms

What makes for a valid VAA algorithm? There is no explicit consensus as to the framework by which one evaluates the accuracy of a VAA algorithm’s outputs, but Wall et al. (2009, p. 211) argue that “an unbiased [VAA] should offer advice that corresponds to some degree with users’ stated voting intentions.” Comparing a VAA’s outputs to a user’s stated vote intention, in which a ‘correct’ result is one where a user’s result matches their vote intention, is a useful diagnostic but not necessarily an indicator of reliability. First, candidates do not always take policy positions that reflect the preferences of their base (Sniderman and Highton, 2011). Second, if the objective were to predict vote intention, electoral studies would suggest that VAAs would be better served by soliciting users’ socio-demographic attributes, attitudes towards valence issues, or perceptions of the state of the economy rather than positions on public policy issues. As we have previously argued, vote intention is not a function of public policy preferences alone. Third, such a measure cannot be readily compared across electoral contexts, as it is highly contingent on the number of candidates included in a VAA and the relative ideological distance between said candidates. For instance, the percentage of correct predictions will inevitably be higher in a two-party context such as the United States or a three-party context such as Australia than it would be in New Zealand, where ten or more parties can meaningfully contest an election. Similarly, predictions are more difficult in contexts where candidates compete for similar ideological positions (and are thus densely clustered) than in highly polarized contexts in which each candidate is situated in relative isolation from the others.

That said, the intuition that animates Wall et al. (2009) is instructive: “if supporters of a centrist party are advised to vote for a radical party, this indicates that there may be something wrong with the way in which advice is generated.” The implicit argument is that the constitution of the ideological landscape, structured by latent dimensions, is an indicator of both the reliability and validity of a VAA. This is best captured not by the number of instances in which a user’s vote intention concurs with their alignment to a candidate as rendered in a VAA, but rather by the relative distance of the point estimates within the spatial model. Two indicators emerge from this perspective. The first, which we term candidate-to-partisan proximity, is the sum of the aggregate distance between a candidate and the subset of users who indicated their vote intention for that candidate. This measure is less sensitive than that proposed by Wall et al. (2009) to the number of parties in a given election campaign or to the extent to which they are ideologically proximate. The second indicator, which we term candidate-to-candidate proximity, looks beyond the point estimate of the individual user and focuses instead on the positioning of the candidates. This measure controls for the ambiguity in users’ declared vote intentions by examining the relative distance between candidates in lower-dimensional space.

For both indicators, consistency in distance between point estimates indicates reliability, whereas the relative proximity of point estimates to one another speaks to validity. For example, imagine a user with a declared vote intention for a center-left party. If the distance between that user and the center-left party is consistent notwithstanding the selection of survey items, it indicates reliability in the application of dimensionality reduction techniques. If the center-left party is itself always three times as close to the radical left as it is to the radical right in lower-dimensional space, it indicates a valid constitution of the ideological landscape. If we assume that the underlying ideological structure is constant at a given moment in time, then we must accept the corollary that an accurate VAA algorithm is one wherein the relative distance between the candidates does not change substantially irrespective of which survey items are included in a VAA. A valid VAA algorithm should furthermore generate a rendering of the ideological landscape that political observers would recognize, in that it accurately reflects the structure of partisan competition in a given electoral context.

2.4.1 Data and Method

To test the proposed validity indicators, we use respondent data from Vote Compass, a VAA which has been run in elections in Canada, the United States, Australia, and New Zealand. For the purposes of this analysis, we draw on Vote Compass respondent datasets from the following cases: the 2011 Canadian federal election (n=1,986,457), the 2012 U.S. presidential election (n=31,880), the 2013 Australian federal election (n=1,427,800), and the 2014 New Zealand general election (n=332,753).

We test the robustness of the Vote Compass method by addressing one of the most salient and pervasive critiques of VAA algorithms: the volatility associated with the inclusion and exclusion of survey items. We argue throughout this paper that said volatility reflects a fundamental deficiency in terms of how dimensionality is modeled in most VAAs. To demonstrate how inductive dimensional modeling can serve to redress this deficiency, we use simulations to test the effect of survey item inclusion on our validity indicators.

The robustness of the Vote Compass algorithm to the inclusion and exclusion of survey items is assessed by calculating both the candidate-to-partisan proximity and the candidate-to-candidate proximity in each of the four cases under study for all possible combinations of a set number of survey items. For instance, when one of the 30 survey items is removed, there are 30 combinations of 29 questions that the VAA algorithm can draw on to model ideological dimensions. Due to the exponential growth in the number of possible combinations as more questions are removed, we randomly select a subset of these possible combinations to reduce the computational load. The curves in Figure 2.5 represent the results of non-parametric local regressions (lowess) used to summarize the results of these simulations. The 95% confidence intervals are calculated using a t-based approximation.
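The combinatorial logic of this simulation design can be sketched as follows. The item count, removal depth, and cap on random draws below are illustrative, not the exact parameters used in the simulations reported here.

```python
import random

ITEMS = list(range(30))  # stand-ins for 30 survey items

def sampled_item_subsets(n_removed, max_draws=1000, seed=42):
    # Exhaustive when a single item is removed; otherwise draw a random
    # sample of combinations to keep the computation tractable.
    if n_removed == 1:
        return [tuple(i for i in ITEMS if i != j) for j in ITEMS]
    rng = random.Random(seed)
    draws = set()
    while len(draws) < max_draws:
        draws.add(tuple(sorted(rng.sample(ITEMS, len(ITEMS) - n_removed))))
    return sorted(draws)

print(len(sampled_item_subsets(1)))  # 30 subsets of 29 items
print(len(sampled_item_subsets(5)))  # capped at 1000 random subsets of 25
```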

We calculate candidate-to-partisan proximity by taking the sum of the average distance between parties and their voters for random samples from the reference population. For each sample, the measure is formulated as follows:

    m_proximity = Σ_{j=1}^{C} (1/n_{c_j}) Σ_{i=1}^{n_{c_j}} Σ_{k=1}^{D} (d_{ik} − d_{c_j,k})² ,

where C is the number of candidates, n_{c_j} is the total number of supporters for candidate c_j, d_{ik} and d_{c_j,k} are the positions of user i and candidate c_j on dimension k, and D is the number of theoretical dimensions.


The candidate-to-candidate proximity is defined as the absolute difference in candidate distances between the sample s and the reference population. It can be expressed as:

    m_structural^(s) = | m_candidate^(s) − m_candidate^(ref) | ,

where m_candidate^(s) and m_candidate^(ref) are the candidate distance measures for the sample s and the reference population respectively; and

    m_candidate^(·) = Σ_{i,j} Σ_{k=1}^{D} (d_{ik} − d_{jk})²   for i, j ∈ {1, . . . , P},

where P is the number of candidates.
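Both indicators can be computed directly from user and candidate positions in the reduced space. The sketch below uses invented two-dimensional positions and vote intentions; the party coordinates, sample size, and perturbation are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical positions in a two-dimensional space (D = 2, C = P = 3).
parties = np.array([[-1.0, -0.5], [0.0, 0.2], [1.0, 0.6]])
users = rng.normal(0.0, 1.0, size=(300, 2))
vote = rng.integers(0, 3, size=300)  # declared vote intention per user

def candidate_to_partisan(users, vote, parties):
    # Sum over candidates of the mean squared distance to their voters.
    total = 0.0
    for j, p in enumerate(parties):
        voters = users[vote == j]
        total += float(np.mean(np.sum((voters - p) ** 2, axis=1)))
    return total

def candidate_distance(parties):
    # m_candidate: sum of squared pairwise distances between candidates.
    total = 0.0
    for i in range(len(parties)):
        for j in range(i + 1, len(parties)):
            total += float(np.sum((parties[i] - parties[j]) ** 2))
    return total

# Candidate-to-candidate proximity of a perturbed "sample" estimate
# relative to the reference positions.
sample_parties = parties + rng.normal(0.0, 0.05, parties.shape)
m_partisan = candidate_to_partisan(users, vote, parties)
m_structural = abs(candidate_distance(sample_parties) - candidate_distance(parties))
```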

2.4.2 Results

To interrogate the structure of the ideological landscape, we plot the positions of the user population for each of these four editions of Vote Compass using a fixed scale approach—akin to the Kieskompas method—and the factor weight approach used in the Vote Compass method. User plots are color-coded according to the declared vote intention of the user. The mean position of voters for each party, as well as the parties themselves, are also plotted. These visualizations allow us to compare how candidate-to-partisan and candidate-to-candidate proximity vary on the basis of how dimensionality is modeled.

The visualization of Vote Compass structured according to the two distinct approaches yields only small differences in average candidate-to-partisan proximity, ranging from 0.01 to 0.04. In each of the four cases, however, the candidate-to-partisan proximity is smaller (meaning partisans are closer to the candidate for whom they have declared their vote intention) using the Vote Compass method.

More illustrative of the implications of inductive dimensional modeling, however, is the visualization of ideological space itself and, particularly, the placement of the political parties within that space. A qualitative review of these plots implies increased validity for the Vote Compass approach. We see, for example, that inductive dimensional modeling applied to the 2011 Canadian federal election campaign offers a much more intuitive representation of the Canadian political landscape, with the New Democratic Party, the Green Party, and the Bloc Québécois each occupying distinct positions on the social and economic dimensions (see Figure 2.3a). Similarly, when the technique is applied to the 2012 United States presidential election, we see the emergence of a single left-right dimension (see Figure 2.3b). In the context of the 2013 Australian federal election the differences are less pronounced, but both the Labor Party and the Liberal/National Coalition are drawn toward the origin (see Figure 2.3c). In New Zealand, the application of inductive dimensional modeling positions United Future as a more centrist party than the governing National Party, draws ACT in from the margins, and positions the Labour and Māori parties closer together, as well as the Greens and Mana (see Figure 2.3d). These plots reflect intuitive understandings of these parties’ ideological positions.

Figure 2.3: Comparative assessment of fixed scale versus factor weights

[Four panels of user and party plots: (a) 2011 Canadian federal election; (b) 2012 U.S. presidential election; (c) 2013 Australian federal election; (d) 2014 New Zealand general election.]

Overall we do not observe substantial variance on either of our proposed validity indicators when we introduce inductive dimensional modeling; however, the implications are very likely understated, given that dimensional association and directionality in the fixed scale diagrams were derived using factor analysis rather than the approach used in the Kieskompas method, as we cannot infer the a priori reasoning other VAA designers would apply when devising scales. We illustrate this point in Figure 2.4, where we plot the range of potential positions for each party by simulating all possible scaling decisions. These illustrations demonstrate the degree of measurement error associated with a reliance on a priori reasoning and contrast this approach with inductive dimensional modeling.

Figure 2.4: Scaling using inductive dimensional modeling versus a priori assumptions

[Three panels, each contrasting party positions under inductive dimensional modeling with the range of positions under a priori scaling assumptions: (a) Vote Compass 2015 Canadian edition; (b) Vote Compass 2013 Australian edition; (c) Vote Compass 2014 New Zealand edition.]

Where we do see evidence of increased reliability and validity associated with the Vote Compass method is in robustness to survey item inclusion (see Figure 2.5). When inductive dimensional modeling is applied, we observe little variation in candidate-to-partisan proximity. Higher levels of variance present themselves in terms of candidate-to-candidate proximity, though further research is necessary to determine tolerance thresholds for each of these indicators.


Figure 2.5: Robustness test: survey item inclusion

[Two panels plotting proximity (y-axis, 0.00–1.00) against the number of survey items excluded (x-axis, 1–10) for Canada (2011), the United States (2012), Australia (2013), and New Zealand (2014): (a) candidate-to-partisan proximity; (b) candidate-to-candidate proximity.]

Source: Vote Compass Canada 2011, United States 2012, Australia 2013, New Zealand 2014.

The key finding that emerges from this test, however, is support for the Vote Compass method as a corrective to a persistent problem in VAA design wherein statement selection introduces volatility into estimates of the alignment between a user and candidate. Using inductive dimensional modeling, we demonstrate that different combinations of survey items can be included in a VAA without introducing substantial variance into candidate-to-partisan proximity. On this basis we posit dimensionality as the site of both reliability and validity in VAA algorithms.

2.5 Conclusion

In this paper we surfaced a number of critical limitations among dominant approaches to VAA algorithm design. In particular, we explored the effect of survey item selection on VAA outputs. Although originally posited as a means to redress the sensitivity of the Manhattan distance measure to the selection of survey items in a VAA, the Kieskompas method's application of the Euclidean distance measure introduced the potential for excessive error in its point estimates. In the case of the Kieskompas method, the bias introduced into the aggregation algorithm by survey item selection is reduced by way of dimensional reduction but simultaneously exacerbated by the error stemming from the association of survey items with predetermined dimensions on the basis of a priori considerations.


We then posited an alternative method as a corrective to the deficiencies in the Kieskompas method. We called this approach the Vote Compass method in reference to the VAA for which it was developed. The Vote Compass method proffers two fundamental enhancements to the application of the Euclidean distance measure in aggregation algorithms. First, it posits an inductive approach to dimensional modeling that utilizes training data in order to calibrate the dimensions rendered by the VAA. Said training data is most readily obtained by way of a pilot study in advance of the launch of a VAA which contains the candidate survey items for inclusion. Factor analysis run against the pilot study findings will indicate how well each prospective survey item loads on the specified dimensions and also the direction in which the item scales. This technique controls for the ambiguity in purely theoretical deduction of the dimensional associations of each survey item in a VAA. Second, as a further refinement, the Vote Compass method uses the results of the factor analysis to weight each survey item's contribution to the dimensions represented in the VAA. This ensures that the dimensions in a VAA are themselves a representation of the ideological landscape as it is rendered in the public imagination and captured in the findings of the pilot study.
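The inductive step described here can be approximated with an exploratory factor analysis on pilot-study responses: the sign of each item's loading gives the direction in which it scales, its magnitude weights the item's contribution, and weakly loading items are flagged for exclusion. The synthetic data, the two-dimension structure, and the 0.3 loading cutoff below are illustrative assumptions rather than the production Vote Compass procedure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Synthetic pilot study: 500 respondents answering 8 items driven by two
# latent dimensions (e.g. economic and social).
n = 500
latent = rng.normal(size=(n, 2))
true_loadings = np.array([
    [0.9, 0.0], [0.8, 0.1], [-0.7, 0.0], [0.85, 0.0],    # dimension 1 items
    [0.0, 0.9], [0.1, 0.8], [0.0, -0.75], [0.05, 0.85],  # dimension 2 items
])
responses = latent @ true_loadings.T + rng.normal(scale=0.4, size=(n, 8))

fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(responses)
loadings = fa.components_.T  # shape: (items, dimensions)

# Items with |loading| < 0.3 on every dimension would be candidates for
# exclusion; the rest contribute with loading-based weights, the sign
# setting the direction in which each item scales.
weights = np.where(np.abs(loadings) >= 0.3, loadings, 0.0)
scores = responses @ weights  # weighted dimension scores per respondent
print(np.round(weights, 2))
```

The same loadings, estimated once on the pilot data, can then be applied to candidate positions and user responses alike, so that both are projected into the identical calibrated space.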

In order to evaluate the contribution of the Vote Compass method to the improvement of VAA aggregation algorithms, we devised a set of validity indicators and applied them against both actual and simulated VAA results using both the Kieskompas and Vote Compass methods. The Vote Compass method demonstrates modest improvements over the alternative on each of the validity indicators. Moreover, the Vote Compass method proves to be robust to survey item inclusion and exclusion, thus addressing a critical shortcoming in most VAA aggregation algorithms. Whereas most VAAs demonstrate substantial volatility given variation in survey items, the Vote Compass method produces relatively stable results under various combinations of survey items. Further study is required to determine whether the findings from this paper are generalizable to VAA datasets beyond Vote Compass. A study of alternative VAAs will further illuminate the ostensible inconsistency between dimensional reduction techniques that rely purely on a priori considerations and those that are empirically informed.

In putting forward candidate-to-candidate and candidate-to-party distance measures as validity indicators for VAA aggregation algorithms, we posit the humble beginnings of a framework for evaluating VAA design. Naturally these proposed measures warrant further theoretical and empirical scrutiny before we can advocate for their adoption as standards or best practices. But at a minimum we hope this research prompts further scholarly inquiry into the validity of VAA outputs. Given the pace at which VAAs are emerging, their uptake by publics around the world, and their demonstrated influence on electoral politics, continued scrutiny of VAAs is warranted and the practice of VAA design must be held to the highest possible standard.


Chapter 3

On the External Validity of Non-Probability Samples: The Case of Vote Compass

3.1 Introduction

How effective are non-probability samples at measuring public opinion? Conventional wisdom holds that only probability samples can be generalized to a population of interest such as to allow statistical inferences about said population. However, emergent modalities of communication are increasingly diverse and esoteric, compounding the potential for coverage error and non-response bias in probability samples (De Heer and De Leeuw, 2002; Keeter et al., 2006; Kohut et al., 2012; Holbrook et al., 2007; Steeh et al., 2001; Council et al., 2013).

The same advances in information and communication technologies that have significantly complicated the collection of probability samples may, however, bolster the potential for deriving representative inferences about a population of interest using non-probability samples. Specifically, emergent technologies have enabled the collection of non-probability samples of much greater size at faster rates and lower cost than conventional techniques for probability sampling.

Nevertheless, non-probability samples are widely considered to be inferior to probability samples in that respondents self-select, resulting in an inherently non-random sample. Though techniques such as raking (Battaglia et al., 2009), matching (Vavreck and Rivers, 2008), post-stratification weighting (Dever, Rafferty and Valliant, 2008; Gelman et al., 2007), or propensity score weighting (Lee, 2006; Lee and Valliant, 2009; Schonlau et al., 2009) are commonly applied to attempt to adjust for bias in non-probability samples purporting to make externally valid inferences, many public opinion researchers contend that statistical inference is impossible without probability sampling (Baker et al., 2013).

This paper argues that, under specific circumstances, certain types of non-probability sample may be capable of yielding reliable inferences about a population of interest. To demonstrate this argument, it analyzes the inferences derived from the most extraordinary probability and non-probability samples collected during the 2015 Canadian federal election campaign—the Canadian Election Study (CES) and Vote Compass, respectively. It uses the election outcome as a benchmark and models the observations collected from each sample to assess how accurately they are able to forecast the distribution of the vote.

3.2 The case for non-probability sampling

Though non-probability samples are often dismissed as ‘unscientific’, recent scholarly inquiry into the suitability of non-probability sampling for statistical inference has challenged this perspective (Baker et al., 2013; Brick, 2011). Resistance to non-probability sampling is largely grounded in notions of sample randomness as the fundamental criterion for external validity, but this is arguably both a theoretically and practically tenuous position.

Statistical theory does not posit random sampling as a requisite condition for statistical inference, but rather the most generally accepted method. Smith (1983, p. 402) posits that post-stratification techniques applied to non-random samples can yield externally valid inferences so long as neither the known prior values nor the selection variable contains information beyond that in the post-stratifying variables. Thus, if the factors that determine the presence or absence of a member of a given population in a non-probability sample are uncorrelated with the variables of interest in a study, or if they can be fully controlled for by making adjustments to the sample, then externally valid inference is theoretically possible.
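The mechanics behind Smith's argument are those of post-stratification itself: each respondent is weighted by the population share of their stratum divided by that stratum's sample share. A minimal sketch, with invented strata and shares:

```python
from collections import Counter

# Hypothetical population shares by age group (e.g. from a census).
population_share = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

# A self-selected sample that over-represents younger respondents.
sample = ["18-34"] * 50 + ["35-54"] * 30 + ["55+"] * 20

counts = Counter(sample)
n = len(sample)

# Post-stratification weight: population share / sample share, per stratum.
weights = {g: population_share[g] / (counts[g] / n) for g in population_share}
print(weights)  # 18-34 ≈ 0.60, 35-54 ≈ 1.17, 55+ ≈ 1.75
```

Under Smith's condition, weighted estimates from such a sample are reliable only if selection carries no information beyond the post-stratifying variable; if, say, politically interested young people differ systematically from other young people on the variable of interest, the weight cannot repair that.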

At the same time, the credibility of probability sampling as a technique which can reliably generate statistical inferences about a population of interest is increasingly subject to critique. As coverage error and non-response rates increase for probability samples, so too do questions as to whether such samples still meet the criteria for probability samples. Edgington (1966) argues that probability samples “rarely meet the assumption of random sample that conventional statistical hypothesis-testing procedures are generally believed to require.” Few if any sampling techniques available to public opinion researchers are capable of producing sampling frames which meet the criteria for random sample, specifically that every member of the population of interest has a non-zero probability of inclusion in their samples and that these probabilities are known. Moreover, non-response would have to be zero unless one were to assume that non-response was uncorrelated with answers to the survey question of interest—a highly problematic assumption in most cases of conventional sampling techniques.

However, the empirical evidence amassed by scholars has largely supported, with few exceptions (Braunsberger, Wybenga and Gates, 2007), arguments about the reliability of probability samples over non-probability samples (Yeager et al., 2011; Berrens et al., 2003; Brick, 2011; Chang and Krosnick, 2009; Malhotra and Krosnick, 2007). For example, Yeager et al. (2011) compare the estimates derived from a series of RDD telephone surveys and Internet surveys against benchmarks largely from “official government records or high-quality federal surveys with high response rates” (p. 712). They find that probability samples consistently yielded more accurate results. Setting aside that “administrative records are often incomplete and out of date, and typically the data are not missing at random” (Brick, 2011) and that the benchmark federal surveys used in the study were ostensibly conducted using sampling modes similar or equivalent to the RDD telephone surveys, it is noteworthy that the authors rely on comparable sample sizes between the probability and non-probability samples. This belies a nearly ubiquitous assumption among public opinion researchers that non-probability samples should be evaluated using sampling frameworks which are equivalent to those of probability samples.

Scholarly comparisons of probability and non-probability samples generally assert a false equivalence between the two. They assume that equivalent sampling frameworks can be used to empirically validate the accuracy of both probability and non-probability samples (in terms of their ability to accurately estimate a given variable of interest). While this makes for an ostensibly logical methodological control within a research design (e.g. matching sample sizes), it unreflexively transposes certain attributes and characteristics of probability samples onto non-probability samples.

Making the case for the plausibility of non-probability samples as having external validity should not by extension imply that the differences between probability and non-probability samples are immaterial or inconsequential. On the contrary, there are good reasons to believe that the selection effects in most non-probability samples are more pronounced than they are in probability samples. In probability and non-probability samples of equivalent size, the former is by definition more likely to reflect the actual distribution of the population of interest. In the context of a probability sample, assuming the sample to be approaching randomness, there is a threshold at which additional sample should produce negligible gains in statistical power. If an equivalent threshold exists in a non-probability sample, it is almost certainly much higher than in a probability sample.

According to the central limit theorem, the distributions within two separate random samples should be roughly equivalent and should reflect the distributions within a population of interest. The same cannot be said of non-random sample. The distributions within two separate non-random samples of equivalent size will almost invariably be different from one another and also deviate from a given population of interest. However, depending on the sampling method, additional observations may contribute significantly to the sample composition, thus enhancing the potential for weighting techniques to arrive at more representative inferences about a given population.

Many non-probability sampling frameworks consistently reproduce systematic selection bias and thus would not realize any gains in external validity as a result of additional sample. Other frameworks, however, may result in the diversification of the sample composition as the size increases. No matter what the sample size, that respondents self-select into non-probability samples is constant, but it is conceivable that the pattern of self-selection—ergo, the non-response bias itself—is variable across time. In said cases, increasing the size of a non-probability sample may have the effect of reducing non-response bias.

The prospect that increased sample size may, in certain contexts, further diversify the composition of a non-probability sample also has the potential benefit of reducing coverage error. Van de Kerckhove et al. (2009) find little evidence of non-response bias even in probability samples with low response rates; they do, however, find indications of coverage error even in cases where the coverage rate is above 80 per cent. Provided the sampling framework is sufficiently broad and comprehensive, it is possible that non-response bias is equally inconsequential in certain non-probability samples. Moreover, certain non-probability sampling frameworks may vary in terms of their coverage of a population of interest as the sample size increases. In such instances, additional sample may contribute to a reduction in coverage error.

Sample sizes of approximately 1,000 respondents have long been the convention for probability samples used to make statistical inferences about a population of interest. This reflects a tacit consensus among most public opinion researchers that a peg of three per cent is a reasonable margin of error. But calculating a margin of error as the inverse of the square root of the sample size requires an assumption that the sample is randomly selected. Since non-probability samples do not meet this criterion, calculating a margin of error is, at best, more complex. And yet, in comparative analyses between probability and non-probability samples, there are few if any allowances for said complexity. One such allowance involves recognizing that the order in which non-random sample accrues is less uniform than that of random selection. By extension, a somewhat arbitrary cap of the first n observations collected inhibits certain non-probability sampling frameworks from amassing the requisite sample size to effectively control for obvious and non-obvious selection effects.
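The convention referenced here is easy to verify: at 95 per cent confidence with maximal variance (p = 0.5), the margin of error is 1.96 · √(p(1 − p)/n) ≈ 0.98/√n, roughly the inverse of the square root of the sample size; this is why n ≈ 1,000 yields the familiar three-point margin. A quick check, valid only under the random-sampling assumption noted above:

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion under simple random sampling."""
    return z * math.sqrt(p * (1 - p) / n)

print(round(margin_of_error(1000), 3))  # 0.031, the conventional "three per cent"
```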

The upshot here is that the epistemic properties of probability samples should not be unreflexively applied to their non-probability counterparts. Baker et al. (2013, p. 99-100) observe that:


Unlike probability sampling, there is no single framework that adequately encompasses all of non-probability sampling... non-probability sampling is a collection of methods rather than a single method, and it is difficult if not impossible to ascribe properties that apply to all non-probability sampling methodologies.

Accordingly, comparative analyses of probability and non-probability samples require attention to the variant properties between them.

At most, for certain forms of non-probability sample, statistical power may be a function of sample size, which need not be constrained according to existing conventions for probability sample. But at the very least, further introspection is warranted in terms of whether the current framework for comparing probability and non-probability samples is appropriate. It is possible that the different properties of probability and non-probability samples warrant different theoretical and methodological considerations in their individual and comparative evaluations. Treating the two as equivalent, either in terms of the statistical techniques applied in weighting the sample or the sample size and composition required in order to claim external validity, overlooks their fundamental differences.

To that end, the following analysis relaxes the constraint of sample size equivalence between probability and non-probability samples. It speculates that, as the size of the non-probability sample increases, so too does the external validity of the estimates derived from the modelled data. It then compares those estimates with the results of a robust probability sample.

3.3 Data

In demonstrating the capacity of certain non-probability samples to facilitate reliable inferences about a population of interest, the case of the 2015 Canadian federal election is instructive. Two of the most formidable samples of their kind collected during and in relation to that particular election—the Canadian Election Study (CES) and Vote Compass—permit us to substantively interrogate several of the theoretical claims posited herein regarding comparisons between probability and non-probability samples.

The external validity of these two samples is evaluated by comparing the accuracy of each sample in terms of forecasting the outcome of the 2015 Canadian federal election. This is an admittedly imperfect test. Neither instrument was designed to act as a poll or specifically to forecast electoral outcomes. These data are nevertheless worthwhile sites of inquiry when it comes to comparing probability and non-probability samples for a number of reasons. First, each is arguably the highest calibre sample of its kind generated during the election campaign—certainly in terms of sample size, but also in terms of the quality of the sampling framework. This allows us to compare exemplar probability and non-probability samples. Second, they each offer a high degree of transparency into the composition of the sample and the mechanics of the sampling framework. Notwithstanding their sampling mode, commercial polls are inconsistent at best in terms of making available raw sample, rarely if ever report weights or response rates, and have been subject to criticisms of herding (Sturgis et al., 2016; Whiteley, 2016). The absence of said information obscures details about the sample that contribute to its evaluation and the verification of reported results. Third, both samples—the CES via a rolling cross-section design and Vote Compass by virtue of continuous operation throughout the course of the 2015 Canadian federal election campaign—are not single point-in-time samples. As such, they are uniquely capable of monitoring campaign effects (Blais et al., 2000; Blais and Boyer, 1996; Johnston et al., 1996; Johnston, 1992; Johnston and Brady, 2002).

3.3.1 Canadian Election Study

A robust probability sample, the CES has functioned as Canada’s response to the American National Election Studies (ANES) since 1965. Its inaugural principal investigators included John Meisel, Philip Converse, Maurice Pinard, Peter Regenstreif, and Mildred Schwartz (Kanji, Bilodeau and Scotto, 2012). As of 1997, the CES has also received support from Elections Canada, Canada’s federal election commission.

The 2015 CES includes three modes of sample collection: a rolling cross-sectional sample selected using a modified RDD procedure (the Campaign-Period Survey or CPS); a return-to-sample phone-based re-interview with respondents to the CPS after the election (the Post-Election Survey or PES); and a paper-based survey that respondents to the PES could opt to take part in (the Mail-Back Survey or MBS) (Fournier et al., 2015; Northrup, 2016). The 2015 CES also included an online component, which involved recruiting panel respondents from a sample provider and directing them to an online questionnaire, so as to adhere to the principles of probability sampling.1 The field dates, sample sizes, and response rates for each sampling mode are reported in Table 3.1 below.

Table 3.1: 2015 CES Sample Attributes

Mode                          Field Dates               N      Response rate
Campaign-Period Survey (CPS)  2015-09-08 to 2015-10-18  4,202  37%
Post-Election Survey (PES)    2015-10-20 to 2015-12-23  2,988  71%
Mail-Back Survey (MBS)        N/A                       1,289  61%
Online                        N/A                       7,412  N/A

Source: Northrup (2016), Author’s calculations.

1 See https://ces-eec.arts.ubc.ca/english-section/surveys/ for details.


The CES makes available for analysis an archetypal probability sample—one with an uncommonly large sample size and methodological transparency. Its use of multiple modes also allows comparison of live interviewer and online probability samples.

3.3.2 Vote Compass

A unique source of non-probability sample, Vote Compass is an online, survey-based instrument that purports to estimate a user’s alignment with the candidates running in a given election campaign.2 Users respond to a questionnaire relating to their political views and are then presented with a series of visualizations that represent the distance between the user and each candidate. Vote Compass falls within a class of online instruments commonly referred to as Voting Advice Applications (VAAs) (Alvarez et al., 2014; Fossen and Anderson, 2014), although its stewards—which include the author of this paper—argue for its exclusion from this definition based on a particular set of attributes (van der Linden and Dufresne, 2017; Dufresne and van der Linden, 2015).

The Vote Compass questionnaire is election-specific and concentrated on the particular issues that delineate between the positions of the candidates for office in a given campaign. However, it consistently includes survey items that capture a range of sociodemographic attributes, which serve as weighting variables, as well as vote intention.

Vote Compass is typically operated in partnership with major news media organizations during election campaigns. Said media partners promote the initiative across their networks, which draws considerable audience. Vote Compass itself also contains numerous built-in features that encourage sharing across social media platforms, thus furthering its reach.

Though Vote Compass sample is self-selected by virtue of its distribution model (i.e. open online access), the size, attributes, and composition of the sample set it apart in a variety of ways from most of its non-probability counterparts. In many jurisdictions, the public opinion datasets collected by Vote Compass during election campaigns are several orders of magnitude larger than any other such sample on record. Respondents provide a rich battery of sociodemographic identifiers and have unique incentives to do so honestly—the result they receive depends on their responses.

Moreover, respondents ostensibly use Vote Compass because they are seeking an accurate reflection of how their views situate them in a given political landscape. In order to achieve this end, they must provide accurate representations of themselves. Vote Compass solicits a substantial amount of sociodemographic and behavioural information, including numerous variables that correspond with those included in population-level datasets such as the national census and General Social Survey. These allow for a rigorous weighting schema to be developed and applied to the Vote Compass sample. Given the sample size, weights can be developed not only for marginal distributions but, in many cases, at the level of interaction of two or more sociodemographic variables. Thus, instead of separately weighting the marginal distribution of gender and age in the population of interest, Vote Compass data can be weighted by cross-sections of age and gender.

2 See http://www.votecompass.com for details.

Finally, the composition of the Vote Compass sample may uniquely control for selection effects not readily addressed by demographic weights. Given the context in which Vote Compass operates, it would be reasonable to assume that participation skews towards politically interested individuals. Indeed, in an analysis of Vote Compass data, Johnston (2017) finds that early respondents demonstrate particularly high levels of political interest. By the end of the campaign, however, average political interest among respondents declines by as much as 20 per cent. Johnston (2017, p. 103) argues that motivational dynamics are at play:

By implication, early participants—less shy about their choice or more likely to have made one—are more interested in politics. They have less need of the tool’s immediate benefit, the ‘compass’ itself. Late participants, evidently, are more likely to be making up an informational deficit. What they do not yet have, or do not want to reveal, is a party preference.

That the composition of the sample changes over the course of the campaign may result in the reduction of coverage bias over time and with the accrual of additional sample. Indeed, as Johnston (2017, p. 99) notes:

In any case, the correspondence of the VC to published commercial polls is striking. The aggregate of self-recruited VC participants brought themselves to roughly the same place as survey respondents were brought by the aggregate of commercial polls. Even the dynamics are similar, with certain telling exceptions.

Taken together, the size, attributes, and composition of the Vote Compass sample warrant further analysis as to how said sample compares, in terms of its external validity, with probability samples such as the CES.

3.4 Method

The result of the 2015 Canadian federal election serves as a benchmark of public opinion at a particular point in time. The degree to which the CES and Vote Compass can accurately estimate the distribution of opinion at said point in time is an indicator of their capacity to render representative inferences about a given population of interest—in this case, Canadian eligible voters.


Table 3.2: Vote intention variable design

Vote intention
  CES:          “Which party do you think you will vote for?”
  Vote Compass: “If the Canadian federal election were to take place today, which party would you vote for?”

Vote leaning (if vote intention is don’t know / undecided)
  CES:          “Is there a party you are leaning towards?”
  Vote Compass: “Which party are you leaning toward?”

Although, for reasons previously articulated, using forecasts as tests of the external validity of the CES and Vote Compass samples is but one of many possible modes of interrogation, it is nevertheless a worthwhile test in that it uses actual outcomes as a benchmark rather than relying on comparisons with previous studies. Tests that compare sample estimates with previous sample estimates always run the risk of reproducing a systemic bias inherent to a given sampling framework or class of frameworks.

The conventional evaluation of the accuracy of commercial polls during an election campaign involves measuring the difference between the actual and predicted vote share for each poll. This is usually done by taking the mean absolute error (MAE) of the difference between predicted vote share and observed vote share, which is given as:

MAE = \frac{1}{n} \sum_{j=1}^{n} |\hat{y}_j - y_j|

Where n is the number of candidates whose vote share is forecasted, y_j is the actual observation and \hat{y}_j is the predicted value.
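A direct translation of the MAE formula, with hypothetical vote shares (in percentage points) standing in for a real forecast:

```python
def mean_absolute_error(predicted, actual):
    """MAE between forecast and observed vote shares."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical four-party race; these are not the 2015 figures.
forecast = [33.0, 31.0, 24.0, 6.0]
observed = [38.0, 32.0, 20.0, 5.0]
print(mean_absolute_error(forecast, observed))  # 2.75
```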

This is a rather crude test, however, particularly given that the richness of both the CES and Vote Compass permits a more thorough evaluation of their accuracy and, by extension, their external validity. To that end, a more substantive framework for analysis follows.

3.4.1 Vote share projections

Given that the sample size of most commercial polls does not permit prediction at the level of individual electoral districts—referred to as ‘ridings’ in the Canadian context—the most common forecasting practice is to report overall vote share.

Vote share estimates are derived from both the CES and Vote Compass using the stated vote intention of the respondent and, where available, the vote leaning variable. Vote intention is captured using the survey items detailed in Table 3.2.


The construction of the vote intention variables is sufficiently similar between the CES and Vote Compass to compare results.

To evaluate the forecasting accuracy of the vote share projections from each sample, the root-mean-square error (RMSE) is used, which is the square root of the average of squared differences between prediction and actual observation, given as:

RMSE = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (\hat{y}_j - y_j)^2}

Where n is the number of candidates whose vote share is forecasted, y_j is the actual observation and \hat{y}_j is the predicted value.

RMSE is arguably a more robust measure of forecast accuracy than MAE: because the errors are squared before they are averaged, RMSE increases with the variance of the frequency distribution of error magnitudes. Unlike MAE, RMSE effectively gives greater weight to larger errors.
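The weighting property can be seen with two hypothetical forecasts that share the same MAE but distribute their errors differently: RMSE penalizes the forecast that concentrates its error in a single large miss. (Illustrative numbers only.)

```python
import math

def mae(pred, actual):
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def rmse(pred, actual):
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

actual = [40.0, 30.0, 20.0, 10.0]
even_errors = [42.0, 32.0, 18.0, 8.0]    # four misses of 2 points each
one_big_miss = [48.0, 30.0, 20.0, 10.0]  # a single 8-point miss

print(mae(even_errors, actual), mae(one_big_miss, actual))    # 2.0 2.0
print(rmse(even_errors, actual), rmse(one_big_miss, actual))  # 2.0 4.0
```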

Given the sample size of most commercial polls, the ability to accurately forecast electoral outcomes in terms of vote share is constrained to national and regional geographies, beyond which the sample becomes unreasonably distorted.

3.4.2 District-level projections

Due to sample size constraints, results at the electoral district level are typically comprised of modelled estimates derived from vote share projections, rather than outright district-level projections. However, both the online wave of the CES and Vote Compass associated respondents with their electoral districts.3 This permits a more fine-grained analysis of forecast accuracy and arguably a more robust indicator of the representativeness of the sample. In the case of the 2015 Canadian federal election, it effectively means predicting electoral outcomes in 338 district-level races as opposed to one federal-level race.

Canada’s parliamentary system is rooted in the Westminster tradition, wherein Parliament consists of the Crown and an upper and lower legislative Chamber. Members of the lower Chamber, or House of Commons, are individually elected to represent single electoral districts. As of the 2015 federal election, there were 338 electoral districts. Elections are determined using a single-member constituency, first-past-the-post or simple-plurality electoral system, wherein the candidate receiving the most votes in a district is elected to represent the constituents of that district. Thus, the determinant of electoral victory in the context of Canadian federal elections is not the share of the popular vote received by any given party, but rather how many seats that party’s candidates are elected to represent.

3The CES does not make electoral district identifiers available for its phone survey respondents.


To that end, the accuracy rate used to evaluate the CES and Vote Compass in terms of district-level projections is given as:

    Accuracy Rate = Correct Predictions / Total Seats
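Operationally, this amounts to checking, in each district, whether the party with the highest predicted vote share matches the party that actually won the seat. The sketch below illustrates the calculation; the district labels, parties, and vote shares are invented.

```python
def accuracy_rate(predicted_shares, actual_winners):
    """Share of districts where the predicted plurality winner matches the actual winner.

    predicted_shares: dict mapping district -> {party: predicted vote share}
    actual_winners:   dict mapping district -> party that won the seat
    """
    correct = sum(
        1 for district, shares in predicted_shares.items()
        if max(shares, key=shares.get) == actual_winners[district]
    )
    return correct / len(predicted_shares)

# Invented three-district example (not actual 2015 results)
predicted = {
    "D1": {"LPC": 0.42, "CPC": 0.35, "NDP": 0.23},
    "D2": {"LPC": 0.31, "CPC": 0.40, "NDP": 0.29},
    "D3": {"LPC": 0.33, "CPC": 0.30, "NDP": 0.37},
}
actual = {"D1": "LPC", "D2": "CPC", "D3": "LPC"}
print(accuracy_rate(predicted, actual))  # two of three districts correct
```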

3.4.3 Raw versus weighted

The size of both samples permits a robust weighting schema to ostensibly control for selection effects in the sample.

In the case of the CES, the weighting variable provided with the dataset is the one used for purposes of analysis. Said weighting variable combines household size, province, gender and age in a single weight (Northrup, 2016).

Vote Compass data are weighted using raking. The data are weighted by electoral district, gender, age, language, religion, religiosity, union membership and voting history.
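Raking, also known as iterative proportional fitting, repeatedly rescales respondent weights so that the weighted marginals of the sample match known population targets on each dimension in turn. The following is a simplified two-dimension sketch; the sample, variable names, and targets are invented, and the actual Vote Compass schema spans the eight dimensions listed above.

```python
def rake(rows, targets, max_iter=100, tol=1e-10):
    """Iterative proportional fitting over categorical dimensions.

    rows:    list of dicts, one per respondent, e.g. {"gender": "f", "age": "18-34"}
    targets: dict of dimension -> {category: target weighted total};
             targets for each dimension must sum to the same grand total
    """
    weights = [1.0] * len(rows)
    for _ in range(max_iter):
        max_adj = 0.0
        for dim, target in targets.items():
            # Current weighted total per category of this dimension
            totals = {}
            for w, row in zip(weights, rows):
                totals[row[dim]] = totals.get(row[dim], 0.0) + w
            # Rescale each respondent so category totals hit the targets
            for i, row in enumerate(rows):
                factor = target[row[dim]] / totals[row[dim]]
                weights[i] *= factor
                max_adj = max(max_adj, abs(factor - 1.0))
        if max_adj < tol:  # converged: all marginals match
            break
    return weights

# Invented sample that over-represents men relative to the population targets
sample = [
    {"gender": "m", "age": "18-34"},
    {"gender": "m", "age": "35+"},
    {"gender": "f", "age": "18-34"},
    {"gender": "f", "age": "35+"},
    {"gender": "m", "age": "18-34"},
]
population = {
    "gender": {"m": 0.5, "f": 0.5},
    "age": {"18-34": 0.4, "35+": 0.6},
}
w = rake(sample, population)
```

After raking, the weighted share of men equals 0.5 and the weighted share of 18-34-year-olds equals 0.4, matching the population targets even though neither holds in the raw sample.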

Both raw and weighted values are reported for CES and Vote Compass data.

3.5 Findings

The following analysis compares the raw and weighted projections of the CES and Vote Compass to the actual outcome of the 2015 Canadian federal election. The summary statistics of the phone and web modes of the CES, as well as Vote Compass, are reported in Table 3.3 below. Riding-level identifiers were not made available in the phone-based CES and thus riding-level statistics cannot be reported.

Table 3.3: Sample Summary Statistics

Source        Mode    N        Avg. by district  Min. by district
CES           Phone   4,202    N/A               N/A
CES           Online  7,412    19                1
Vote Compass  Online  871,823  2,579             298

3.5.1 Vote Share

Tables 3.4 and 3.5 compare the predicted vote share estimated using each of the samples under analysis with the actual vote share distribution in the 2015 Canadian federal election. The MAE for each of the three samples under analysis is reported in Table 3.4 below.

Of note, the weights on the phone-based CES actually appear to increase the MAE (albeit marginally). Moreover, while the raw Vote Compass data has the highest MAE, when


Table 3.4: Vote Share MAE

Mode          MAE (Raw)  MAE (Weighted)
CES (Phone)   0.03       0.04
CES (Online)  0.06       0.05
Vote Compass  0.08       0.04

weighted it has an equivalent MAE to the phone-based CES and a lower MAE than the online CES.

The RMSE by party is calculated using the vote intention measure in each of the samples. Both raw and weighted RMSE are specified.

Table 3.5: Vote Share RMSE

Mode          RMSE (Raw)  RMSE (Weighted)
CES (Phone)   0.04        0.04
CES (Online)  0.07        0.06
Vote Compass  0.10        0.05

In terms of overall RMSE, weighted vote intention outperforms its unweighted counterparts. The phone-based CES outperforms the online CES and Vote Compass. It is noteworthy, however, that weights do not affect the RMSE for the phone-based CES, and only marginally affect the RMSE for the online CES. As with MAE, there is a much greater difference between the raw and weighted versions of RMSE for Vote Compass. This suggests a larger selection bias in the raw Vote Compass sample than in either of the CES probability samples. However, once weights are applied, the RMSE for Vote Compass is actually lower than that of the online CES—and only 0.01 higher than the phone-based CES—which indicates that the Vote Compass weights are relatively effective in controlling for the sample bias.

3.5.2 District-level projections

Although the phone-based version of the CES does not contain district-level indicators, such information is available in the online version of the CES. As online respondents to the CES are selected via an RDD phone-based recruitment process, it serves as a probability sample for purposes of comparison with Vote Compass. It stands to reason that, as a probability sample, the selection of respondents to the CES should be randomly distributed across electoral districts. However, it should be taken into account that, although random, the distribution


will not be uniform across all 338 districts because the population of Canadian ridings varies substantially.4

The RMSE reported in Table 3.6 below takes the average of the RMSE for vote share across all 338 electoral districts contested during the 2015 Canadian federal election.

Table 3.6: Average district-level RMSE

Mode          RMSE (Raw)  RMSE (Weighted)
CES (Online)  0.121       0.121
Vote Compass  0.091       0.089

The district-level RMSEs are notably higher than those for overall vote share, both for the CES and Vote Compass. Unlike with overall vote share, even the raw Vote Compass estimates have less error than the weighted CES. The raw and weighted CES estimates are consistent at the electoral district level, which again suggests that the CES weights have little effect. Granted, the weights included in the CES are calibrated for the marginal distributions in the overall population, but if that were the source of error in the estimates we would expect to see more variation between the raw and weighted RMSE for the CES in Table 3.6.

Of course, the sample sizes are substantially smaller at the district level (see Table 3.3), which is likely to account for much of the increased error in both samples. This may also explain why the performance of both the raw and weighted Vote Compass projections exceeds that of the CES—the Vote Compass sample is several orders of magnitude larger than the CES, both overall and especially when broken down by electoral district.

The average district-level RMSE for the Vote Compass sample, as compared to the CES, suggests that it is more representative of the population at fine-grained geographies, despite being a non-probability sample.

The external validity of Vote Compass data at the district level can be further evaluated using the accuracy rate between the CES online survey and Vote Compass, the results of which are reported in Table 3.7.

The weighted vote intentions derived from Vote Compass produce the highest accuracy rate, correctly identifying 71.3 percent of the outcomes across Canada’s 338 electoral districts. Weights continue to have little effect on the CES estimates. Both the raw and weighted Vote Compass data yield a more accurate forecast than either the raw or weighted CES data.

4According to census estimates, the average number of residents per electoral district is 99,034. The electoral district with the smallest population is Labrador with 26,728 residents. The district with the largest population is Brantford—Brant with 132,443 residents.


Table 3.7: Riding projection accuracy rate

Mode          Districts correctly predicted (Raw)  Districts correctly predicted (Weighted)
CES (Online)  0.574 (194/338)                      0.571 (193/338)
Vote Compass  0.618 (209/338)                      0.713 (241/338)

3.6 Discussion

This paper has offered theoretical justification and empirical validation in favour of the argument that certain non-probability samples can produce reliable and valid inferences about a population of interest. This is demonstrated by comparing the CES and Vote Compass in terms of their respective ability to accurately forecast the outcome of the 2015 Canadian federal election.

By most measures, Vote Compass is able to either match or exceed the accuracy of the CES, particularly so at more fine-grained geographies such as electoral districts. One must appreciate that Vote Compass has an enormous advantage over the CES in terms of its sample size, but this is precisely the point. In a more controlled comparative analysis of probability and non-probability samples, Vote Compass data would likely perform worse relative to the CES. The comparative framework for such an analysis would require similar sample sizes and like weights. But such parameters impose a false equivalence between probability and non-probability samples and limit the potential of unconventional sources of public opinion data. Absent such constraints, non-probability samples such as Vote Compass demonstrate external validity on par with probability samples such as the CES.

If Vote Compass were, for example, to restrict its sample size or weighting schema to match that of the CES, its results would likely be less accurate as the selection effects in the sample would be substantially greater. The raw estimates from Vote Compass indicate that the unweighted data is subject to a more substantial selection bias than the CES, which one would expect given the sampling framework. But with the full sample and a robust weighting schema tailored to that sample, the Vote Compass data has consistently produced estimates of electoral outcomes that have been as or more accurate than those of the CES.

Of course, the findings presented herein require further replication to confirm the robustness of the theoretical claims attached to them. But the intended contribution is not simply to argue for the external validity of certain non-probability samples; it is rather to make the case for a reconsideration of the framework that is used to evaluate such samples.

Not all non-probability samples are equal. They have myriad different properties and thus cannot be expected to behave like a probability sample. In fact, the absence of a universal


framework for non-probability sampling means that we cannot expect all non-probability samples to be alike.

The variability of non-probability samples inhibits us from extrapolating the case of Vote Compass to non-probability samples more broadly. Not all non-probability samples have sufficient breadth and depth to make representative inference possible. Moreover, since margins of error cannot be readily calculated for non-probability samples in the same way they can be for probability samples, there is no readily available, reliable measure to indicate the capacity of a given non-probability sample for representative inference.

In the absence of such a measure, the size, attributes, and composition of a non-probability sample must be carefully evaluated before endeavouring to derive generalizable inferences. A useful heuristic is to estimate known values within the population of interest that are not present in the sample, and then to observe whether said values can be accurately predicted. In the case of Vote Compass, predicting the distribution of vote share in the 2015 Canadian federal election serves this end, but other possibilities may include predicting certain census values.
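This heuristic can be operationalized as a simple benchmarking exercise: compute the weighted sample estimate of a quantity whose population value is already known and measure the deviation. A minimal sketch follows; the sample values, weights, and benchmark figure are invented.

```python
def weighted_estimate(values, weights):
    # Weighted mean of a sample quantity (e.g. the share holding some attribute)
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def benchmark_error(values, weights, known_population_value):
    # Absolute deviation of the weighted sample estimate from a known
    # population benchmark, such as a census figure
    return abs(weighted_estimate(values, weights) - known_population_value)

# Invented example: a binary indicator whose population share (e.g. from the
# census) is known to be 0.66
sample_values = [1, 0, 1, 1, 0, 1]
sample_weights = [0.8, 1.4, 0.9, 1.1, 1.3, 0.5]
err = benchmark_error(sample_values, sample_weights, 0.66)
```

A small deviation across several such benchmarks lends confidence that the weighted sample can support representative inference; a large one signals residual selection bias.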

Given the lack of standardized tests to establish the reliability and validity of certain non-probability samples over others, considerations should be applied with great care. While non-probability samples can provide opportunities to learn more about a population or subpopulation of interest, not all non-probability samples can reliably produce these sorts of inferences. Even other VAAs do not necessarily capture the size, attributes and composition of sample needed to make representative inferences about a given population. Care must be taken not to falsely ascribe the properties of certain non-probability samples to others.

At the same time as technological advancements continue to make probability sampling more difficult to achieve in practice, they also make available new opportunities to construct non-probability samples that are capable of producing representative inferences. In order to leverage this opportunity, a new framework is needed for the evaluation of non-probability samples as a means to establish the veracity and the robustness of their estimates.


Chapter 4

Ideological Scaling of Social Media Users: a dynamic lexicon approach∗

4.1 Introduction

Theories of political disagreement often envisage ideology as a single, latent dimension that structures politics on a continuum between left and right (Downs, 1957; Hare et al., 2015). A primary source in the measurement of ideology has been textual data, with a focus predominantly on official texts such as party manifestos, policy platforms, and proposed legislation (Gabel and Huber, 2000; Budge et al., 2001; Laver, Benoit and Garry, 2003; Klingemann et al., 2006; Slapin and Proksch, 2008; Lauderdale and Herzog, 2016). While these texts are well-suited for analysis given that they conform to certain norms pertaining to style, quality, and content, they tend to be top-down edicts that reflect the ideological underpinnings of the most dominant political actors within a party. They offer little insight into the range and distribution of ideology among the general population.

By comparison with official party texts, social media data, such as Tweets, tend to be less focused, much shorter, and more informal. Although their structure makes social media data less appealing than official party documents for purposes of textual analysis, their abundance and breadth—both in terms of population and subject matter—arguably permits a more substantive rendering of the ideological landscape as well as a better sense as to how ordinary citizens are situated therein. To date, attempts at applying ideological

∗Published as Temporão, Mickael, Corentin Vande Kerckhove, Clifton van der Linden, Yannick Dufresne, and Julien M. Hendrickx. “Ideological Scaling of Social Media Users: A Dynamic Lexicon Approach.” Political Analysis 27, no. 4 (2018): 457-473. doi:10.1017/pan.2018.30.


text-scaling methods to social media data have been limited to users who can be readily associated with a substantial corpus exogenous to a given social media platform (Quinn et al., 2010; Grimmer, 2010). This has, in most cases, constrained the analysis to political parties, election candidates, and elected officials whose ideological positioning is clearly articulated in official texts.

In this article we develop a method that allows us to estimate the individual-level ideological attributes of both political elites and ordinary citizens using the textual content they generate on social media platforms. We first parse the lexicon of political elites in order to create dynamic dictionaries using the Wordfish algorithm (Slapin and Proksch, 2008). These dictionaries are used to create scales that reflect the ideological dimensions that structure political discourse. We then estimate the position of individual social media users on said scales by using the previously-computed dynamic dictionaries to analyze the textual content a user generates.

Ideological estimates are validated using survey data sourced from an uncommonly large and richly detailed sample of ideologically profiled Twitter users gathered using a Voting Advice Application (VAA) called Vote Compass.1 The application is run during election campaigns and surveys users about their views on a range of policy issues. It generates a rich ideological and socio-demographic profile of its users and, in certain instances, captures their social media identifiers. We use the Vote Compass survey data to estimate a given social media user’s ideological position on a multidimensional scale and compare this result with the ideological positions derived for that same user using unsupervised analysis of the content they generate on social media. The high correlation between the two measures suggests convergent validity. As an additional validation step but also to illustrate the usefulness of our approach, we attempt to predict out-of-sample, individual-level voting intentions from social media data and contrast the estimates produced with those of models based on survey data from Vote Compass. Interestingly, the predictive power of the ideological estimates can outperform those of surveys when combining textual data with rich network information (Barberá, 2015). We thus believe that data generated by social media can be considered in some respects to be even richer than that collected via conventional surveys.

This method represents a unique approach to the measurement of ideology in that it extends the utility of textual analysis to the domain of social media. Current methods for ideological inference from social media data have relied primarily on network analysis (Barberá, 2015). We examine the content that social media users generate rather than the connections between them. In order to do so, we develop a technique to identify and extract a political lexicon from the broader social media discourse. The conventional application of textual analysis to the estimation of ideology considers the entire corpus. This is relatively

1See www.votecompass.com for details.


unproblematic when the scaling texts are concentrated on the factors of interest, as is the case with party manifestos or policy platforms (Grimmer and Stewart, 2013). However, social media discourse is much denser and more manifold in terms of subject matter. Using dynamic dictionary files trained on the lexicon of political elites, we parse social media content in such a manner as to distinguish text which is predictive of ideology from the broader discourse.

Our contribution to the measurement of ideology is threefold. First, our model is endogenous—we do not rely on a corpus external to a given social media platform in order to infer the ideological position of platform users. As a result, we are able to extend ideological estimation beyond political elites to ordinary citizens, who are normally not associated with the corpora necessary to apply conventional ideological text-scaling methods. Second, our method is eminently scalable. Although we use Twitter data for the case studies presented in this article, the dynamic lexicon approach to ideological inference is platform agnostic—it could be readily applied to most other text-based social media platforms. Moreover, as we demonstrate, its performance is consistent in multiple languages. Third, a dynamic lexicon approach extends the purchase of ideological scaling of political actors in party systems outside the United States, where methods such as DW-NOMINATE (Poole and Rosenthal, 1985, 2001) are often limited in terms of their ability to differentiate the ideological variance among legislators due to the effect of party discipline on voting records. We demonstrate the convergent validity of our approach across multiple different political contexts—both national and subnational—including Canada, New Zealand, and Quebec.

The potential applications for this method in electoral behavior studies and public opinion research are vast. Perhaps most notably, it offers a cost-effective alternative to panel studies as a means of estimating both the ideological position of a given social media user and the ideological distribution within a population of interest.

4.2 Deriving ideological scales from social media text

Existing text-based ideological scaling methods require large corpora of formal and politically-oriented texts in order to scale a valid ideological dimension. These methods either follow a supervised approach, such as the Wordscore method (Laver, Benoit and Garry, 2003), or an unsupervised approach, such as the Wordfish method (Slapin and Proksch, 2008). The former approach uses the guidance of experts to choose and position reference texts that define the ideological space in order to position actors in that space (Lowe, 2008). The latter approach estimates the quantities of interest, without requiring any human judgment, by relying on the assumption of ideological dominance in the texts under scrutiny (Monroe and Maeda, 2004). Such constraints limit the texts that can be scaled to those that are


specifically designed to articulate ideological positions, such as political manifestos and policy platforms. This typically narrows the focus of any analysis to the political elites who produce these types of texts.

The user-generated text on social media platforms, by contrast, is characterized by its brevity, informality, and broad range of subject matter. These dynamics tend to be a function of platform design and user culture, as even political elites adapt their communication style to the parameters of a particular social media platform. Indeed, politicians who are able to demonstrate a fluency in the lexicon of social media are often rewarded with outsized recognition for their efforts. For example, Hillary Clinton’s most liked and re-Tweeted Tweet of the 2016 US Presidential election campaign referenced the popular phrase "Delete your account" in response to a Tweet from Donald Trump.2 Despite notable outliers, political elites generally exhibit common patterns in terms of how they use social media. They tend to be most active during election campaigns (Larsson, 2016) and generally use social media for purposes of broadcasting rather than dialogue (Grant, Moon and Busby Grant, 2010; Small, 2010). In this sense, they are typically using social media as an extension of their campaign messaging (Straus et al., 2013; Vergeer, 2015; Evans, Cordova and Sipole, 2014).

As such, the discourse of political elites—even on social media—is distinct and thus distinguishable from that of the general public. If we assume that social media content generated by political elites has a more ideological focus relative to the general user base of a given platform, it may be leveraged as a means to detect ideological signal in the noise of social media chatter. Attempts to measure ideology based on the overall content generated on social media platforms have generally produced estimates that are highly biased by content unrelated to ideology (Grimmer and Stewart, 2013; Lauderdale and Herzog, 2016). However, preliminary evidence suggests that political elites, as a subset of social media users, may be more reliably scaled based on the content they generate (see Ceron, 2017).

We use the lexicon of political elites to dynamically parse social media content so as to extract recurrent, comparable, and reliable ideological positions for a broader population of social media users—both political elites and ordinary citizens. This dynamic lexicon approach proceeds from the assumption that, on average, political elites tend to discuss a narrower range of subjects than do ordinary citizens (at least in public), and that these subjects tend to be more politically oriented. The texts generated by political elites on social media—as a subset of the general population—can in this sense define ideological space as has been demonstrated with earlier text-scaling approaches (Monroe, Colaresi and Quinn, 2008; Grimmer and Stewart, 2013; Ceron, 2017).

2Clinton, Hillary. (@HillaryClinton) 9 June 2016, 2:27 PM EST. Tweet: https://twitter.com/HillaryClinton/status/740973710593654784


In order to apply the dynamic approach, the texts generated by political elites on their social media profiles are first scraped in order to construct a lexicon of political terms that structure the dimensions of ideological discourse. We restrict the collection of social media content to defined time periods, specifically election campaigns. This is due in part to evidence that political elites are most active on social media during election campaigns (Barberá, 2015; Larsson, 2016), but also because election campaigns force ideological differences into sharper focus. To maintain a standardized operationalization of ‘political elites’ for comparative purposes, we limit the selection of social media users to politicians, which we define as incumbents or challengers who are candidates for election in a given campaign. To control for the effect of word use variation introduced by different languages, only texts written in the dominant language of the election discourse are included. In order to ensure that this method is scalable across contexts and not constrained by the subjective or arbitrary judgment of individual researchers, we rely on an unsupervised approach—specifically, the Wordfish algorithm—to parse the given texts and identify terms that are indicative of certain ideological positions. This method varies from the more established network approach, whereby ideological estimates are derived based on network connections such as Twitter followers (Barberá, 2015). However, it will arguably by design incorporate a network effect dynamic, as sharing on social media platforms is more likely to occur between users who share a connection. The practice of sharing on social media platforms thus amplifies the occurrence of certain terms in the dynamic lexicon approach. Finally, the lexicon extracted from the discourse of politicians is scaled and then compared with the texts generated by other social media users so as to estimate their respective ideological positions.

4.2.1 Calibrating a dynamic lexicon of ideology

Ideology is generally conceived of in multi-dimensional terms, as represented by a d-dimensional vector in R^d. Extracting these dimensions is referred to as ideological scaling. Intuitively, two members who share similar political views should be positioned near one another in ideological space, which is generally defined in terms of Euclidean distance.

Text-scaling approaches assume that the frequency of expression of specific terms made by a given actor can be used to estimate said actor’s position in an ideological space. Initially, Monroe and Maeda (2004) proposed a scaling model based on item response theory that was later extended by Slapin and Proksch (2008) to measure political positions from written documents. Their model is premised on the assumption that the use of words can be modeled by a Poisson process. The process first builds a matrix Z (also called a term-document matrix or TDM) where each element z_jw corresponds to the number of times the word w is present in party j’s manifesto. Then, the expected value of z_jw is expressed as a function of the ideologies:


    Pr(z_jw = k) = f(k; λ_jw)    (4.1)

    with λ_jw = exp(s_j + p_w + θ_w x_j)

where f(k; λ_jw) represents the Poisson probability mass function. The model introduces two factors, p_w and s_j, to deal with the more frequent words and more active political candidates. The θ_w factor takes into account the impact of words on the author’s position, and x_j is the estimate of the author’s position. Estimators of the unknown parameters are derived using maximum likelihood estimation. An iterative approach (using expectation-maximization) is implemented in the Wordfish algorithm (Slapin and Proksch, 2008).
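To make the model in equation (4.1) concrete, the sketch below evaluates the rate λ_jw and the corresponding Poisson probabilities for a single word-author pair; all parameter values are invented for illustration.

```python
import math

def poisson_rate(s_j, p_w, theta_w, x_j):
    # lambda_jw = exp(s_j + p_w + theta_w * x_j), per equation (4.1)
    return math.exp(s_j + p_w + theta_w * x_j)

def poisson_pmf(k, lam):
    # f(k; lambda): probability of observing the word exactly k times
    return lam ** k * math.exp(-lam) / math.factorial(k)

# Invented parameters: a moderately common word (p_w) used by a fairly active
# author (s_j), where the word is indicative of a right-of-centre position
lam = poisson_rate(s_j=0.3, p_w=0.5, theta_w=1.2, x_j=0.4)
probs = [poisson_pmf(k, lam) for k in range(20)]
```

Note how the rate, and hence the expected word count, rises when the sign of θ_w matches the sign of the author’s position x_j; this is the signal the maximum likelihood estimation exploits.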

This dynamic lexicon creation process assumes that the textual content is politically dominant. This is often the case for political candidates’ tweets, especially during an election campaign. We first record all the sequences of N adjacent words (N-grams) in the political candidates’ tweets, for particular values of N = 1, 2 or 3. We take into account co-occurrences with other words to more readily distinguish between semantic groupings (Brown et al., 1992; Jurafsky and Martin, 2009) and to better interpret sarcasm (Lukin and Walker, 2013). Stop words are discarded and stemming is performed to improve the extraction accuracy. We handle non-political terms by introducing a threshold parameter β. We discard N-grams that are not shared by at least β percent of the set of political candidates. We then build a term-document matrix (Z^elit) by matching the N-grams’ occurrences to the corresponding politicians. Finally, we use the Poisson model described in equation (4.1) to generate the dynamic lexicon of political terms; specifically, we estimate θ^elit_w and p^elit_w.
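The preprocessing steps above can be sketched as follows. The tokenizer and stop-word list are deliberately minimal, the example tweets and β value are invented, and only unigrams (N = 1) are handled for simplicity; a production pipeline would also stem tokens and extract bigrams and trigrams.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "we", "will"}

def unigrams(text):
    # Minimal tokenizer: lowercase, keep alphabetic tokens, drop stop words
    tokens = [t for t in text.lower().split() if t.isalpha()]
    return [t for t in tokens if t not in STOP_WORDS]

def build_tdm(docs_by_author, beta=0.5):
    """Term-document matrix restricted to terms shared by >= beta of authors.

    docs_by_author: dict mapping author -> concatenated text of their tweets
    Returns (vocabulary, {author: {term: count}}).
    """
    counts = {a: Counter(unigrams(text)) for a, text in docs_by_author.items()}
    n_authors = len(counts)
    # Keep only terms used by at least a beta share of the authors
    vocab = sorted(
        t for t in {t for c in counts.values() for t in c}
        if sum(t in c for c in counts.values()) / n_authors >= beta
    )
    tdm = {a: {t: c.get(t, 0) for t in vocab} for a, c in counts.items()}
    return vocab, tdm

# Invented candidate tweets
docs = {
    "cand_a": "we will cut taxes and balance the budget",
    "cand_b": "invest in health care and balance the budget",
    "cand_c": "cut taxes to grow jobs",
}
vocab, tdm = build_tdm(docs, beta=2 / 3)
```

With β set to two-thirds, terms used by only a single candidate ("invest", "jobs") are dropped, while terms shared across the field ("cut", "taxes", "balance", "budget") survive into the lexicon.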

4.2.2 Scaling ideological positions for social media users

Text-scaling methods fail to retrieve a valid ideological dimension when the subject matter of the texts in question is broader than politics (Grimmer and Stewart, 2013). It is reasonable to assume that politicians’ user-generated content will be concentrated around political topics and themes. This assumption does not hold, however, for the average social media user. Ergo, an analysis of the content generated by average users—the political centrality of which is variable—does not readily lend itself to the identification of valid ideological positions.

In order to detect the ideological signal in the textual data of average social media users, a dynamic lexicon approach analyzes user-generated content through the lens of the content generated by politicians on the same social media platform. This approach is inspired by the concept of transfer learning, that is, estimating certain parameters using one dataset and then using those parameter values to make inferences on another dataset (Do and


Ng, 2005). When generating ideological estimates for politicians, the Poisson model (4.1) associates a weight estimator (θ^elit) and a popularity estimator (p^elit) to each term present in the derived dynamic lexicon. One can then estimate a given user’s ideology by taking into account the precomputed values (θ^elit, p^elit) in the Poisson model fit to the matrix Z^elit

when handling citizens’ social media content. This entails an adapted term-document matrix (TDM) Z^cit ∈ R^{n×q}. The TDM is built by matching the q terms existing in the lexicon of political terms to a given social media user. The matrix entries z_iw count the number of instances in which each term (identified by w = 1...q) appears in the aggregated content of the user. The estimates are obtained by solving the convex optimization problem (4.2):

The log likelihood function of a Poisson distribution is given by

l(Λ) =n∑i=1

q∑w=1

(ziw ln(λiw)− λiw − ln(ziw!)

)

where Λ represents a matrix, here the matrix of event rates λiw = exp(si+ pelitw + xiθelitw ).

We optimize the log likelihood over the variables si and xi conditional on θelitw and pelitw . Thatis, the parameters θelit and pelit are now precomputed in the political candidates’ dynamiclexicon estimation process. This leads to the maximization problem

\[
\max_{s_i,\, x_i} \;\; \sum_{i=1}^{n} \sum_{w=1}^{q} \Big( z_{iw}\,(s_i + x_i \theta^{elit}_w) + z_{iw}\, p^{elit}_w - \exp(p^{elit}_w)\, \exp(s_i + x_i \theta^{elit}_w) - \ln(z_{iw}!) \Big)
\]

where both terms z_iw p^elit_w and ln(z_iw!) do not depend on the variables x_i and s_i. Suppressing these terms does not alter the optimal solution and leads to the convex optimization problem (4.2).

\[
\max_{s_i,\, x_i} \;\; \sum_{i=1}^{n} \sum_{w=1}^{q} \Big( z_{iw}\,(s_i + x_i \theta^{elit}_w) - \alpha_w\, \exp(s_i + x_i \theta^{elit}_w) \Big) \tag{4.2}
\]

where x_i denotes the ideology of citizen i and s_i depicts their publication activity. The constant α_w = exp(p^elit_w) takes into account the N-gram popularity vectors estimated from the political candidates' lexicon. The optimization problem is derived by maximizing the log likelihood over two sets of variables (x_i and s_i) instead of the four initial vectors.
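The per-user estimation implied by (4.2) can be sketched numerically. This is an illustrative implementation, assuming θ^elit and p^elit have already been estimated from the elite corpus; all names are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def estimate_user_position(z_i, theta_elit, p_elit):
    """Maximize the concentrated log likelihood (4.2) over (s_i, x_i)
    for a single user with lexicon counts z_i."""
    alpha = np.exp(p_elit)  # precomputed term popularities alpha_w

    def neg_loglik(params):
        s_i, x_i = params
        eta = s_i + x_i * theta_elit
        # Negative of (4.2): the problem is concave, so this is convex
        return -np.sum(z_i * eta - alpha * np.exp(eta))

    res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
    s_i, x_i = res.x
    return x_i, s_i  # ideology and publication-activity estimates

# Toy example: three lexicon terms; the user mostly uses the term
# weighted toward the right (theta = +1)
theta = np.array([-1.0, 0.0, 1.0])
p = np.array([0.0, 0.0, 0.0])
counts = np.array([0, 1, 5])
x_hat, s_hat = estimate_user_position(counts, theta, p)
```

Because θ^elit and p^elit are held fixed, each user's pair (s_i, x_i) can be solved independently, which is what makes the approach scale to large user populations.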


4.3 Data

To demonstrate the dynamic lexicon approach, we apply the method to the Twitter platform.³ We note, however, that the approach is designed to be applied across text-based social media platforms.

In order to test the dynamic lexicon approach we rely on two sources of data. The first, of course, is Twitter data itself. The second is from a VAA called Vote Compass. We use data from three recent elections as case studies: the 2014 New Zealand general election, the 2014 Quebec provincial election, and the 2015 Canadian federal election (see Section 4.4.1 for details on case study selection).

As language explains most of the variation in the terms used on social media, we restrict the analysis to English-speaking users only for New Zealand 2014 and Canada 2015, and to French-speaking users for Quebec 2014.

4.3.1 Vote Compass

In order to validate the ideological estimates derived from the dynamic lexicon approach, we rely on the survey data generated by a large-scale VAA with active instances in each of the election campaigns we draw on as case studies in this article.

Vote Compass is an online application that surveys users' views on a variety of public policy issues germane to a given election campaign, and then offers users an estimation of their position in the political landscape and, by extension, their alignment with each of the political parties contesting said campaign. The application is wildly popular, drawing millions of users worldwide, and is generally run in partnership with major media organizations in the jurisdiction where an election is being held.

In addition to attitudes toward policy issues, Vote Compass also collects sociodemographic attributes and ideological self-placement on a scale of 0 to 10, representing left to right. Users also have the option to self-select into a research panel which associates their respective Twitter accounts with their responses to the survey. A total of 62,430 Twitter users were recruited through Vote Compass (n = 11,344 for Quebec 2014, n = 8,452 for New Zealand 2014, and n = 42,634 for Canada 2015).

4.3.2 Twitter

The user-generated content produced during the campaign period by accounts associated with verifiable candidates for election is collected using the Twitter REST API. These data form the corpus of terms that serves as the lexicon constituting the ideological discourse of a given campaign.

³ Replication materials are available online on the Harvard Dataverse (Temporão et al., 2018), at http://dx.doi.org/10.7910/DVN/0ZCBTB


A content filter with threshold β excludes candidates whose tweets do not include at least 5% of these political terms by word count. A second, network filter eliminates candidates who have fewer than 25 followers whom we can validate externally using Vote Compass data (see Section 4.3.1). A third filter excludes candidates from political parties that have fewer than three politicians in the sample once the first two filters have been applied. Excluding candidates from such minor parties prevents unreliable conclusions resulting from small sample sizes for classification.
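The three filters can be sketched as a small pipeline; the record fields below are illustrative stand-ins for the actual data structures:

```python
def filter_candidates(candidates, beta=0.05, min_followers=25, min_party_size=3):
    """Apply the three candidate filters in sequence. Each record is a dict
    with keys: 'party', 'political_share' (fraction of tweeted words that
    are lexicon terms), and 'validated_followers'."""
    # Filter 1 (content): at least beta of tweeted words are political terms
    kept = [c for c in candidates if c["political_share"] >= beta]
    # Filter 2 (network): at least 25 externally validated followers
    kept = [c for c in kept if c["validated_followers"] >= min_followers]
    # Filter 3 (party size): drop parties left with fewer than 3 candidates
    counts = {}
    for c in kept:
        counts[c["party"]] = counts.get(c["party"], 0) + 1
    return [c for c in kept if counts[c["party"]] >= min_party_size]

# Toy usage: party B ends up below the party-size threshold
sample = [
    {"party": "A", "political_share": 0.06, "validated_followers": 40},
    {"party": "A", "political_share": 0.08, "validated_followers": 30},
    {"party": "A", "political_share": 0.07, "validated_followers": 26},
    {"party": "B", "political_share": 0.09, "validated_followers": 100},
    {"party": "B", "political_share": 0.01, "validated_followers": 80},
]
active = filter_candidates(sample)
```

Note that the party-size filter is applied last, on the already-filtered sample, mirroring the order described above.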

Across the three election campaigns that serve as case studies for our analysis, we identify 297 candidates with public Twitter accounts in the 2014 Quebec election, 131 candidates in the 2014 New Zealand general election, and 759 candidates in the 2015 Canadian federal election. The number of active candidates in our sample once filtering has been applied is as follows: m = 106 for Quebec 2014, m = 56 for New Zealand 2014, and m = 120 for Canada 2015. Citizens consist of potentially eligible voters who are active on Twitter during the electoral campaigns under scrutiny. As with candidates, we only consider citizens with at least 25 political bigrams in the dynamic lexicon within each context, and who follow a minimum of three active candidates. The citizens' sample sizes after applying these filtering conditions are n = 796 (Quebec 2014), n = 123 (New Zealand 2014), and n = 1,717 (Canada 2015).

4.4 Results and Validation

When relying on unsupervised ideological scaling methods, validation is essential to ensure that the estimates correspond to ideological positions. As any single test is generally not sufficient to establish the validity of a measurement, we take several different approaches to validity testing.

We begin with a focus on the face validity (Dinas and Gemenis, 2010) of the estimates produced by the dynamic lexicon approach, drawing on an overview of the ideological landscapes in each of our three case studies. Our objective is to develop an intuitive rendering of the ideological positioning of the major parties in each case, so that we have a frame of reference by which to compare the estimates generated by the dynamic lexicon approach. We then test the convergent validity of the calibration process for the dynamic lexicon of political terms. Here we focus on the subset of political candidates, whose user-generated content defines the dynamic lexicon, by comparing the positions derived from this approach to an ideological scaling approach based on network information (Barberá, 2015; Bond and Messing, 2015). Finally, we extend this approach to individual citizens in our sample and compare their estimated positions to a reference position derived from survey data collected from Vote Compass.


4.4.1 Mapping ideological landscapes

The three case studies selected to demonstrate the dynamic lexicon approach are the 2015 Canadian federal election, the 2014 New Zealand general election, and the 2014 Quebec provincial election. The intuitive ideological placement of the parties contesting these elections offers the opportunity to compare the estimates produced by the dynamic lexicon approach, serving as both an external validity test and a robustness check of the method.

Canada is a federal state, and elections take place in the context of a first-past-the-post electoral system (Kam, 2009; Godbout and Høyland, 2011). Four federal parties contest the nationwide vote, with the sovereigntist Bloc Québécois running candidates in the province of Quebec. The New Democratic Party (NDP) and the Green Party of Canada (GPC) typically vie for the position of most left-leaning in federal politics, depending on the policy context, with the Liberal Party of Canada (LPC) adopting a generally center-left position and the Conservative Party of Canada (CPC) situated on the right of the ideological spectrum. Only the Liberal and Conservative parties have ever formed government, with the Conservatives having won, at minimum, a plurality of seats in the Canadian Parliament from 2006 to 2015, when the Liberals swept the federal election and formed a majority government.

New Zealand is a unitary state, and its elections have been run under a mixed-member proportional framework since 1996. More than a dozen parties contest general elections, seven of which had at least one seat in Parliament following the 2014 general election. Of these, the filtering process (see Section 4.3.2) results in the inclusion of candidates from four parties. Among them, the Green Party of Aotearoa New Zealand (GRN) is generally considered the most left-leaning. While the Mana Party has adopted certain radical positions, its coalition with the Internet Party during the 2014 campaign was perceived to have a slightly moderating effect on its ideological positioning, placing Internet Mana (MNA) slightly to the right of the Greens. The New Zealand Labour Party (LAB) is generally considered a center-left party and the New Zealand National Party (NP) a center-right one.

The case of the 2014 Quebec provincial election offers two unique tests of the dynamic lexicon approach. First, it extends the approach to the subnational level by way of its application in the context of a provincial election campaign. Second, it tests the scalability of an unsupervised model to contexts where English is not the dominant language. Quebec has a majority francophone population, and the sovereigntist sentiment in the province gives rise to a unique ideological landscape within Canada, wherein nationalism and identity constitute an independent dimension orthogonal to the socio-economic left-right spectrum. Toward the nationalist end of this spectrum, three parties advocate for Quebec independence: Option Nationale (ON), the Parti Québécois (PQ), and Québec Solidaire (QS). The least nationalist party, and that most opposed to Quebec independence, is the Quebec Liberal Party (QLP), the party that won office in 2014, putting an end to a short-lived PQ minority government. The Coalition Avenir Québec (CAQ) positions itself in between these two poles by advocating more autonomy for the French-speaking province while remaining a member of the Canadian federation. But the socio-economic left-right ideological dimension also structures Quebec party competition, following roughly the same order as the identity dimension, with the most nationalist parties generally leaning more to the left relative to federalist parties. Notable exceptions include, for example, attitudes toward religious accommodation, where the nationalist PQ takes more right-leaning positions than other parties.

4.4.2 Validating ideological estimates for election candidates

In order to assert that the dynamic lexicon approach properly classifies social media users' ideologies, we must first convincingly position election candidates within the ideological landscape. As the candidates' lexicon is used to generate the ideological scales upon which other social media users are positioned, the ideological estimates of candidates serve as the reference point for those of all other users.

To validate the ideological estimates ascribed to candidates by the dynamic lexicon approach, we adapt the roll-call scaling method, an established standard in the estimation of elected officials' ideological positions (Poole and Rosenthal, 2000; Clinton, Jackman and Rivers, 2004; Carroll et al., 2009). In doing so, however, we face two immediate constraints. Roll-call data are only available for elected officials, not political challengers. While this is less problematic in a two-party system, given that both parties are likely to have substantial representation in government, it becomes more problematic in a multi-party, Westminster system (Spirling and McLean, 2007; Grimmer and Stewart, 2013). Not only does the presence of smaller parties increase the error associated with ideological estimates, but a political culture that emphasizes party discipline makes it difficult to distinguish party ideology from candidate ideology. While it is reasonable to assume that most candidates' ideologies should be generally more aligned with their own party than with any other party, some degree of variation can reasonably be expected but would be difficult to detect using a DW-NOMINATE approach.

To address this limitation, one can estimate ideological positions based on network information, such as the existing connections between citizens and political candidates on social media (Barberá, 2015; Bond and Messing, 2015; Rivero, 2015).

The scaling process relies on the assumption that these virtual connections are homophilic in nature, that is, that citizens tend to follow political candidates whose ideologies lie close to their own. We refer to this approach hereafter as the network scaling approach (Barberá, 2015). We validate the results of the dynamic lexicon approach by comparing how it positions election candidates vis-à-vis the estimates produced by network scaling.
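As a rough illustration of this homophily logic, a one-dimensional ordering can be recovered from a binary follower matrix with a low-rank decomposition. This is a crude stand-in for the full Bayesian ideal-point model of Barberá (2015), with illustrative names throughout:

```python
import numpy as np

def network_positions(follow_matrix):
    """Given a binary citizens x candidates follower matrix, recover a
    one-dimensional latent ordering from the leading singular directions
    of the double-centered matrix (a crude proxy for ideal-point models)."""
    F = follow_matrix - follow_matrix.mean(axis=0, keepdims=True)
    F = F - F.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    citizens = U[:, 0] * s[0]   # latent positions for citizens
    candidates = Vt[0]          # latent positions for candidates
    return citizens, candidates

# Toy example: two blocs of citizens, each following only its own
# bloc's candidates; homophily places each bloc on its own side
F = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)
cit, cand = network_positions(F)
```

Under perfect bloc structure the leading singular direction separates the two blocs exactly; the sign of the recovered dimension is arbitrary, as in any unsupervised scaling method.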


The results demonstrate that, compared to the ideological estimates generated by a network scaling approach, the dynamic lexicon approach provides highly convergent results (see Figure 4.1). It is apparent that the point estimates showing each political candidate's position, measured using two fundamentally different scaling methods, are strongly correlated. The correlations are particularly strong in the cases of the 2015 Canadian and the 2014 New Zealand elections. The linear relationship between the estimates indicates that they capture a similar ideological dimension, which implicitly confirms the validity of both methods. Further examination suggests that these methods can identify clusters of candidates belonging to the same political party, even though within-party correlations are weak. This supports the premise that candidates are generally more aligned in ideological terms with candidates from their own party than with those from other parties. The positioning of candidates from each party accords with conventional wisdom as to how said parties are generally situated within their respective political landscapes. It also aligns with estimates derived by aggregating public articulations of policy positions. This suggests that the dynamic lexicon approach is able to scale ideological positions at the individual level for political candidates and, by extension, render a valid ideological landscape from social media data.

Figure 4.1: Comparison of estimated positions for the reference method (network scaling approach) and the dynamic lexicon approach

[Figure: three scatterplots of candidates' dynamic lexicon estimates against their reference (network) ideology. Panels: Canada 2015 (CPC, GPC, LPC, NDP; ρ = −0.92), New Zealand 2014 (GRN, LAB, MNA, NP; ρ = −0.93), and Quebec 2014 (CAQ, ON, PQ, QLP, QS; ρ = −0.67).]

The x-axis shows the standardized position of political candidates on the unidimensional latent space derived from network data (Barberá, 2015). The y-axis shows the standardized position of political candidates on the unidimensional latent space derived from the dynamic lexicon approach. Pearson correlations (ρ) are all statistically significant (p-value < 0.05).

Figure 4.1 compares the estimates generated by the dynamic lexicon approach with those of the network scaling approach. In the context of the 2015 Canadian election, the clustering of candidates generated by both methods depicts a similar ideological landscape.


The candidates cluster by party association and the party positioning exhibits face validity given the Canadian context.

The 2014 New Zealand context represented in Figure 4.1 shows once again that both the network scaling and dynamic lexicon approaches produce a consistent ideological landscape. The positioning of the party candidates also demonstrates face validity.

We find a slightly divergent result in the 2014 Quebec context. Unlike in Canada or New Zealand, the dynamic lexicon approach produces a different ideological landscape than the network scaling approach. For instance, the network scaling estimates, when taken alone, cannot differentiate the QLP from the CAQ. Upon examination of the dynamic lexicon estimates, however, we are able to clearly separate the CAQ from the QLP, although it then becomes difficult to distinguish the CAQ from ON. These results suggest that the two methods, when taken together, can complement each other in capturing the ideological landscape. That said, the two methods identify relative clusters that are consistent with the parties' positions as one would intuitively expect. The results may also indicate that more than one dimension is needed to capture nuances in particular ideological landscapes.

4.4.3 Validating ideological estimates for social media users

Having demonstrated that the dynamic lexicon approach can derive valid ideological dimensions from the content that political elites generate on social media, and that it can position those political elites in ideological space in ways that correlate highly with more established methods, we now examine its ability to classify the individual ideologies of the broader population of social media users.

To do so, we compare the dynamic lexicon approach to a basic Wordfish approach, in which we ignore the precomputed values and instead simply apply the algorithm. Two different baseline strategies are proposed, depending on whether or not we consider the use of a lexicon of political terms to create the term-document matrix for citizens. In the first strategy (Wordfish-all), the users' term-document matrix is generated keeping all the N-grams (i.e., following the same building process as for the elites' term-document matrix Z^elit). The second strategy (Wordfish-political) considers the adapted term-document matrix Z^cit built from the dynamic lexicon of political terms. The reference ideological position for each user is derived from the survey data collected by Vote Compass.⁴ We use an expectation-maximization algorithm to solve the maximum likelihood problem that generates these estimates (Barber, 2012). We consider a social media extraction to be effective if two social media users with similar ideologies according to the Vote Compass survey data are also situated in close ideological proximity to one another using the dynamic lexicon approach.

⁴ See Section 4.3 for details.


The quality of the estimates is then defined as the Pearson correlation coefficient (ρ) between the vector of textual estimates and the vector of reference ideologies.

Table 4.1 shows the performance of the dynamic lexicon approach compared with the two baseline strategies. It clearly outperforms both baselines, especially for Quebec 2014. However, we observe weak to moderate correlations for this approach, which suggest that the dimension extracted by this method differs for the most part from that of the reference ideology. When we look solely at the politically interested citizens⁵ in our sample, the correlations remain similar, even though one would expect the correlations to be higher for these more politically interested individuals, as they are likely to demonstrate greater ideological consistency (Converse, 1964). This can be partly explained by the selection bias inherent to the user base of VAAs, wherein the average user is more politically interested vis-à-vis the overall population. We furthermore tested the network scaling approach that was used to validate candidate positions (Barberá, 2015) on the broader population of social media users. The results of the network scaling approach are strongly correlated with the reference ideology (ρ > 0.6). This result suggests that, by looking at one's connections on social media (in this case, Twitter followers), we are able to derive meaningful information about one's ideological position.

Table 4.1: Assessment of the dynamic lexicon approach for citizens

Method                  2015 Canada    2014 New Zealand    2014 Quebec
Dynamic lexicon         0.43 (0.44)    0.40 (0.38)         0.21 (0.22)
- Wordfish-all          0.29 (0.27)    0.34 (0.29)         0.09 (0.06)
- Wordfish-political    0.38 (0.39)    0.39 (0.42)         0.09 (0.11)
Network                 0.63 (0.64)    0.76 (0.77)         0.75 (0.75)

Results are expressed for a bigram dictionary. The values indicate the Pearson correlations (p-value < 0.05) between the ideologies extracted from Twitter data (text and network) and the reference ideologies based on policy positions. The values in parentheses indicate the Pearson correlations for the subset of politically interested citizens.

⁵ The subset of users reflected in parentheses is based on filtering for respondents who answered "very interested" to a political interest question asked in the Vote Compass survey. The question is as follows: "Generally speaking, how interested are you in politics?" Users are offered the choice of one of the following four options: 1) not interested at all; 2) not very interested; 3) somewhat interested; 4) very interested.

As the information captured by these methods seems to differ, we consider combining network and textual ideologies into a single estimate as a means of enhancing the measurement. A classical method of merging estimators is to consider the set of estimators x_λ generated by convex combinations (Struppeck, 2014). This family of ideological estimators is described by:

\[
x_\lambda = \lambda \, x_{net} + (1 - \lambda) \, x_{txt}, \qquad \lambda \in [0,1] \tag{4.3}
\]

We compare the efficiency of combining ideologies by analyzing the correlation of the new estimates x_λ with the reference ideologies for multiple values of λ (Figure 4.2). The optimal quality for Canada 2015, New Zealand 2014, and Quebec 2014 is reached at λ = 0.86, λ = 0.87, and λ = 1, respectively. None of these combinations improves on the network performance by more than 1%. This suggests that combining estimates does not lead to a significant improvement.
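The grid search over λ in (4.3) is straightforward to sketch; the toy vectors below stand in for the network, textual, and reference ideologies:

```python
import numpy as np

def best_lambda(x_net, x_txt, x_ref, grid=None):
    """Scan convex combinations x_lambda = lam*x_net + (1-lam)*x_txt and
    return the lam maximizing the Pearson correlation with x_ref."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)

    def corr(a, b):
        return np.corrcoef(a, b)[0, 1]

    scores = [corr(lam * x_net + (1 - lam) * x_txt, x_ref) for lam in grid]
    i = int(np.argmax(scores))
    return grid[i], scores[i]

# Toy example: the network estimate tracks the reference closely,
# the textual estimate is a noisier version of it
rng = np.random.default_rng(0)
x_ref = rng.normal(size=200)
x_net = x_ref + 0.3 * rng.normal(size=200)
x_txt = x_ref + 1.5 * rng.normal(size=200)
lam_star, rho_star = best_lambda(x_net, x_txt, x_ref)
```

With the noise levels above, the optimal weight lands heavily on the network estimate, mirroring the pattern reported in the text (λ near 1 in all three cases).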

Figure 4.2: Assessing linear combinations of textual and network ideologies

[Figure: three panels of correlation curves over λ: (a) Canada 2015, (b) New Zealand 2014, (c) Quebec 2014.]

The x-axis displays the parameter λ (4.3). The textual ideology corresponds to the case λ = 0. The network ideology corresponds to the case λ = 1. The y-axis displays Pearson correlations (p-value < 0.05) between the linear combination and the reference ideologies. The dashed line depicts the correlation for λ = 1.

The moderate correlations observed in the dynamic lexicon approach could be attributable to a variety of factors. Ordinary citizens may exhibit less ideological consistency than do political elites, which could partly explain the lower correlations between the estimates (Converse, 1964). However, the results presented in Table 4.1 are not sufficient to independently validate this hypothesis. Another explanation could be that the periods of time studied are too short to capture a concept as substantive as ideology; increasing the time frame could plausibly result in improved estimates. The presence of non-political bigrams may also generate noisy estimates, which would explain the lower correlations observed for the textual estimates. Finally, the derived positions could contain other relevant information about individuals, but not capture the same ideological dimension as that captured by the reference ideology.


Nevertheless, these two types of information could be complementary and could result in improved performance when trying to predict phenomena related to political ideologies, such as voting behavior.

4.5 Validating ideological estimates using voting intention

The notion that social media data may contain other useful information about its users provides an additional opportunity for validation. Scholarly interest in ideological positioning is in part motivated by the possibility of predicting voting behavior (Downs, 1957; Jessee, 2009). Yet predicting voting behavior remains a challenging empirical task. Contrary to convergent validity, which explores the correlations between multiple indicators of the same concept, predictive validity compares different phenomena that are linked together by an explanatory relation (Adcock, 2001). The predictive validity of the ideological positions derived from the dynamic lexicon approach, that is, the ability to predict individual-level voting intentions from the ideological estimates the method produces, can serve as an additional means of validating the approach.

To classify citizens by their voting intentions, we require two supplementary filtering criteria. First, the analysis is restricted to respondents who reported a voting intention for one of the parties that pass the filtering criteria. Second, training sets with highly unbalanced class sizes generate important biases (Huang and Du, 2005), so the classification analysis is performed only on parties with at least 20 voters. This leads us to a total of 8 major parties: Canada's Conservative (CPC), Liberal (LPC), and New Democratic (NDP) parties; New Zealand's National (NP) and Labour (LAB) parties; as well as Quebec's Parti Québécois (PQ), Liberal Party (QLP), and Coalition Avenir Québec (CAQ). The resulting sample sizes are n = 796 for Quebec 2014, n = 123 for New Zealand 2014, and n = 1,717 for Canada 2015. We perform a machine learning classification task to investigate the ability of the ideological estimates to correctly classify party affiliation (for elites) and predict vote intentions (for citizens). The machine learning approach is preferred to a descriptive statistical study since we focus on predicting an outcome rather than providing an explanatory model.

New data sources combined with novel scaling methods can allow the estimation of ideological positions that can predict votes without the cost of designing and administering complex and costly surveys. Furthermore, each different source of information available has the potential to complement the others and improve the accuracy of a classification task (Jurafsky and Martin, 2009). In order to investigate the complementarity of these methods, we evaluate the power of each of the two methods taken individually, but also in combination, to predict individual-level voting intentions.


Figure 4.3: Venn diagram illustrating the complementarity between the ideology estimates in predicting voting intentions of citizens

The metric values inside each set represent the prediction's efficiency measured by the area under curve (AUC).

The Venn diagrams (Figure 4.3) illustrate the average prediction efficiencies based on the dynamic lexicon approach (text), the network scaling approach (network), and Vote Compass survey-based ideological positions (survey). This allows us to illustrate the quality of the predictions for each approach taken individually and for any combination thereof. The metrics displayed are an evaluation of the quality of the predictions based on the AUC of precision and recall (PR) curves for each party within each context. These metrics evaluate the proportion of time the trained algorithm correctly guesses the voting intention of an individual. The advantage of this metric is that it is less affected by sample balance than simple prediction accuracy. A cross-validation process handles over-fitting, so that the results of the combinations of these estimates are non-trivial and actually result from new information captured.
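The per-party evaluation can be sketched with scikit-learn; the classifier choice and synthetic data below are illustrative, not the article's exact setup:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import average_precision_score

def per_party_pr_auc(X, y, parties, cv=5):
    """One-vs-rest PR AUC per party, using cross-validated probabilities
    so that the scores are not inflated by over-fitting."""
    clf = LogisticRegression(max_iter=1000)
    # Out-of-fold predicted probabilities for every observation
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")
    classes = sorted(set(y))  # column order of predict_proba
    return {p: average_precision_score((np.array(y) == p).astype(int),
                                       proba[:, classes.index(p)])
            for p in parties}

# Toy example: two well-separated "parties" in a 2-D ideology space
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.5, (100, 2)), rng.normal(1, 0.5, (100, 2))])
y = [0] * 100 + [1] * 100
scores = per_party_pr_auc(X, y, parties=[0, 1])
```

The PR AUC (average precision) is computed one-vs-rest for each party, which is what makes the metric robust to the unbalanced party sizes discussed above.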

When we examine the AUC for each source in Figure 4.3, we can see that predictions based on ideological estimates derived from the network scaling approach can outperform predictions based on ideological estimates from Vote Compass survey data. This is the case for Canada and for New Zealand. In the case of Quebec, the ideological estimates from the Vote Compass survey data outperform the alternative approaches. Any combination of these estimates improves the quality of the prediction, with the exception of New Zealand, where it is clear that the estimates from the dynamic lexicon approach do not add any information to network scaling or survey-based estimates. This result is consistent with what is shown for elites in Figure 4.1. These findings indicate a cumulative improvement in the ability to predict voter intentions by combining different types of social media data,


Figure 4.4: Comparison at the party level of citizens' Twitter and Survey prediction efficiencies

[Figure: average prediction efficiency (AUC of the precision-recall curve) per party, comparing Survey and Twitter estimates for C-NDP, C-CPC, C-LPC, N-NP, N-LAB, Q-CAQ, Q-PQ, and Q-QLP.]

The values displayed correspond to the area under curve of the precision and recall curves related to each party.

specifically ideological estimates derived from a combination of network and textual data. These combinations can outperform, or at least approach, established survey-based estimates without the burden of having to design or administer a survey.

To further examine the effect of combining different types of social media data, we illustrate the performance of our classifier when it combines the network scaling approach with the dynamic lexicon approach to predict individual voting intentions. We compare this combination to a classification based solely on survey data. Figure 4.4 shows the comparison of the two models by illustrating their performance at the party level. The values displayed correspond to the AUC of the PR curves related to each party. The solid lines highlight the parties for which Twitter estimates perform better than the conventional survey estimates. The dashed lines indicate where Twitter information leads to higher prediction errors. We can see in Figure 4.4 that Twitter-based predictions of voting intentions generally outperform survey predictions. Even though the evidence does not allow us to generalize this pattern, it is worth noting that the dashed lines correspond to the Parti Québécois (PQ) and the Conservative Party of Canada (CPC), two incumbent parties that failed to be re-elected in the cases under examination. Besides having that in common, these two parties were also the least active on social media relative to other parties. That is, we have much less data for those parties where the predictions are less accurate than those derived from survey data.


There is also a noticeably higher efficiency rate for the social media model in the case of smaller parties such as the CAQ and NDP. This could be explained in part by the fact that these smaller parties have a more pervasive online presence in terms of the quantity of Tweets published by their respective candidates. Parties with smaller campaign budgets tend to rely more extensively on social media outreach than do parties with more substantial war chests (Haynes and Pitts, 2009). They tend to publish larger quantities of information on social media in order to reach a broader audience at a lower cost. The additional data available for smaller parties can increase the efficiency of the classifiers in identifying individuals who intend to vote for those parties. Indeed, the more information we have to train the classifiers for these parties, the fewer classification errors we should observe. This is not the case with the Vote Compass survey data, as every user answers questions related to specific policy issues relevant to the political campaign. On social media, however, the unsupervised approaches must identify signal among large quantities of relevant and irrelevant information for each party, which requires larger amounts of data in order to develop efficient estimates.

4.6 Conclusion and Discussion

In this article we introduce a dynamic lexicon approach to ideological scaling of social media users based on the textual content they generate on a given platform. This approach extends the capacity for ideological estimation from political elites to ordinary citizens by foregoing the requirement of a verbose and formal corpus of ideological texts exogenous to the platform. Instead, a given social media user's ideology can be estimated within a given ideological discourse, as defined by the lexicon of political elites, using only data endogenous to the platform. The findings from a series of validation tests indicate that a dynamic lexicon approach can extract ideological dimensions which demonstrate convergent validity, in that they are correlated with other measures of ideology such as network scaling and survey-based approaches. It also exhibits predictive validity in terms of predicting individual-level voting intentions. Although we find that network scaling performs better than the dynamic lexicon approach in predicting individuals' voting intentions, the combination of the two methods into a single model generally outperforms predictions of individual-level voting intentions extrapolated from survey-based measures of ideological self-placement.
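The intuition behind the dynamic lexicon approach can be illustrated with a minimal sketch. The lexicon, weights, and posts below are invented for illustration; the article estimates its lexicon from elite-generated text rather than assigning weights by hand, and this toy scorer omits the scaling model entirely.

```python
# Hypothetical elite lexicon: word -> ideological weight on a
# left (-1) to right (+1) scale, in practice estimated from the
# text of political elites rather than specified manually.
elite_lexicon = {
    "pipeline": 0.8, "deficit": 0.6, "taxes": 0.4,
    "climate": -0.7, "childcare": -0.5, "pharmacare": -0.9,
}

def scale_user(posts, lexicon):
    """Estimate a user's position as the mean lexicon weight of the
    elite-lexicon words in their posts; words outside the elite
    lexicon contribute no signal."""
    weights = [lexicon[word]
               for post in posts
               for word in post.lower().split()
               if word in lexicon]
    return sum(weights) / len(weights) if weights else None

user_posts = ["Cut taxes and the deficit", "Build the pipeline now"]
print(scale_user(user_posts, elite_lexicon))  # positive: right-leaning on this toy scale
```

A user with no lexicon words yields `None`, mirroring the filtering criteria discussed below: users who generate too little relevant content cannot be scaled.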

The implications of these findings, should they withstand further empirical scrutiny, are significant for researchers in the fields of electoral behavior and public opinion research, who could effectively measure ideological dynamics with less data, at lower cost, and with greater accuracy than conventional survey-based methods allow. Moreover, in the context of its utility for predictive purposes, the dynamic lexicon approach not only serves to boost the accuracy


of network scaling. Network estimates may demonstrate more predictive power, but they are mostly static in the short term, computationally intensive to derive, and therefore not scalable to large pools of social media users. By contrast, text-based estimates have the potential to allow for real-time analysis of ideological volatility.
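One way to picture the complementarity of the two estimates is as an anchor-plus-update scheme: the static network estimate anchors a user's position while the cheaper text estimate is refreshed each period. The sketch below is a simplification under that assumption; `alpha` and all numbers are invented, and the article combines the two methods in a trained classifier rather than a fixed weighted average.

```python
def combined_estimate(network_est, text_est_by_period, alpha=0.7):
    """Blend a static network-based ideology estimate with a
    time-varying text-based estimate. alpha is the weight placed on
    the network anchor (a tuning choice, not a value from the
    article); the text term lets the estimate move period to period."""
    return {period: alpha * network_est + (1 - alpha) * text_est
            for period, text_est in text_est_by_period.items()}

# Invented example: one user, one network estimate, three weekly
# text-based estimates showing movement over a campaign.
weekly = combined_estimate(0.4, {"w1": 0.1, "w2": 0.5, "w3": 0.9})
print(weekly)
```

The anchored estimate is cheap to recompute as new text arrives, which is what makes real-time tracking feasible where re-deriving the network estimate would not be.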

In interrogating the robustness of the dynamic lexicon approach, certain limitations of the method surfaced. Though successful at scaling individual-level ideologies and predicting vote intentions, the approach comes at the cost of strict filtering criteria. These criteria directly affect the number of users for whom an ideological estimate can be modeled. For example, the constraints in terms of content generation and time frames limited the analysis undertaken in this article to Twitter users who posted messages during the election campaigns included as case studies, i.e. the 2015 Canadian federal election, the 2014 New Zealand general election, and the 2014 Quebec provincial election.

These limitations are far from insurmountable, and they open up opportunities for future research. A longer-term collection of Twitter data would provide an indication as to whether elections are particularly effective moments for detecting signal within the elite lexicon used to define the ideological landscape. Moreover, the platform scalability of the dynamic lexicon approach, while theoretically plausible, requires empirical testing. Does the approach produce comparable ideological estimates when applied, for example, to user-generated content derived from Facebook? Further testing of the linguistic scalability of the dynamic lexicon approach is also necessary, particularly given the variance in the convergent validity of the Quebec case—the only non-English case—vis-à-vis Canada and New Zealand. The more likely explanation of this variance, however, is the structure of the Quebec ideological landscape, which raises the question of whether multidimensional scaling of ideology would produce more accurate estimates in contexts such as Quebec. Finally, future research that applies classifiers trained on previous, known contexts to new elections may provide additional validation of the capacity of this method to predict individual voting intentions.

Avenues for future research notwithstanding, this research stands on its own as a novel contribution to experimental methods for deriving valid inferences from social media data. Political scientists often require recurrent, comparable, and valid estimates of the ideological positions of political actors in order to develop and test theories. Acquiring such estimates through surveys is often prohibitive in terms of cost (Slapin and Proksch, 2008; Bond and Messing, 2015). A dynamic lexicon approach permits the study of a wide range of actors, including some for which reliable measures of ideology may not otherwise be available. Its scalability and multilingual capacity also suggest that the dynamic lexicon approach could be used to study political contexts where it is impractical to conduct conventional surveys, but where there is ample uptake of social media platforms. This approach could be used to


offer a more refined analysis across a broad array of salient topics in the study of politics, such as representation, polarization, and populism. Moreover, as our estimates of vote intention demonstrate, the signal contained within social media data enables inferences beyond a particular variable of interest. We demonstrate that ideological estimates generated from social media data can, when properly modeled, better predict individual-level voting intentions than ideological estimates derived from discrete scales. This suggests that the rich information contained within social media data may provide additive signal.

The results from this initial interrogation of the dynamic lexicon approach are sufficiently promising to warrant further investigation into the potential of this method.


Chapter 5

Conclusion

5.1 Introduction

By converting information into binary code, digitization has made possible the preservation, replication, and transmission of unprecedented amounts of data. The phenomenon of digitization has revolutionized the information and communications technology (ICT) landscape. The result has been a dramatic reshaping of how societies operate and, consequently, how democracy works.

Voting Advice Applications (VAAs) make for an interesting case study of a specific intervention of digital technology into the practice of politics, and of its influence in particular on the practice of democracy. The foregoing articles, which collectively comprise this dissertation, make methodological and substantive contributions to the literature on VAAs by analyzing their technical and social functions. They further scholarly inquiry both into the algorithmic underpinnings of VAAs and into the potential of these instruments to advance the theory and method of public opinion research.

As each article in this dissertation makes an independent contribution to the literature that it addresses, this concluding chapter offers some thoughts as to how they collectively contribute to the study of VAAs. It then goes on to map possibilities for future research that build from these findings.

5.2 Contributions

The motivation for this dissertation has been to explore the implications of VAAs for electoral politics as a case study in the broader transformation of democratic practices in response to the changing ICT landscape. Though the articles that ultimately comprise this dissertation differ substantively in how they engage with the topic of VAAs, they each contribute to an


understanding of the implications of digital technology for both measurement validity and democratic governance.

5.2.1 Valid measures

A central theme shared by each of the articles presented in this dissertation is that of measurement validity.

Digital technology is distinct from technological innovations such as the printing press, the telegraph, or radio in terms of its facility for massive information retention, allowing for the collection, monitoring, and analysis of unprecedented amounts of data. The computational resources introduced by advances in digital technology have also introduced machine learning techniques into the methodological repertoire of political scientists, making possible the exploration of research questions that cannot be readily addressed using conventional methods.

Emergent methodologies that have been fostered by digital technologies are premised on assumptions that warrant critical reflection. So too, however, must conventional methods be reconsidered given the changing technological landscape. The articles in this dissertation each contribute to this discussion in various ways.

The first article, The curse of dimensionality in VAAs: reliability and validity in algorithm design, examines the operationalization of proximity between parties and candidates in VAAs. Although the measurement of relative ideological positions using spatial modelling is a longstanding technique in political science, the design of VAAs introduces certain constraints which are application-specific. The article questions both the reliability and validity of proximal measures in most VAAs, highlighting the lack of a general theory of VAAs and the absence of a validation mechanism for the results that they produce. It proposes a basic framework for assessing the reliability and validity of the outputs that VAAs generate for users. Two measures are offered as validity indicators – candidate-to-partisan proximity and candidate-to-candidate proximity. The article then uses respondent data from Vote Compass as well as a series of simulations to illustrate how these measures provide an indication of the validity of a VAA's outputs.

The second article, On the external validity of non-probability samples: the case of Vote Compass, explores the implications of the diversification of the ICT landscape for the measurement of public opinion – in particular, the mounting challenges to the practice of probability sampling, such as coverage error and non-response bias. It challenges the doctrinal assumption among public opinion researchers that only probability samples can reliably produce measures of public opinion which are externally valid. It takes as a case study the sample from the 2015 federal election edition of Vote Compass – a uniquely large and diverse non-probability sample – and compares it with the 2015 Canadian Election


Survey (CES) sample – a classic and robust probability sample. It then uses each sample to generate a prediction of the outcome of the 2015 Canadian federal election, applying post-stratification weights to each. Across various performance measures, the weighted Vote Compass sample consistently outperforms that of the CES. These findings lend credence to studies such as Wang et al. (2015), which are able to extrapolate reliable election forecasts from unconventional sources of non-representative data.
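The post-stratification step mentioned here can be illustrated with a toy example. The sketch below weights a sample on a single demographic variable; the actual analysis rakes over several variables jointly, and the sample and census shares shown are invented.

```python
def poststratify(sample_groups, population_shares):
    """Weight each respondent so the sample's group composition
    matches known population shares: w_g = pop_share_g / sample_share_g.
    Over-represented groups are weighted down, under-represented up."""
    n = len(sample_groups)
    sample_share = {g: sample_groups.count(g) / n
                    for g in set(sample_groups)}
    return [population_shares[g] / sample_share[g]
            for g in sample_groups]

# Toy sample over-represents young respondents relative to the
# (invented) census age shares.
sample = ["18-34", "18-34", "18-34", "35-54", "55+"]
census = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}
weights = poststratify(sample, census)
print(weights)
```

Weighted vote-intention tallies computed with these weights, rather than raw counts, are what each sample's election prediction would be based on.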

The third article, Ideological scaling of social media users: a dynamic lexicon approach, extends the search for valid, if unconventional, measures of public attitudes to larger and more complex non-probability samples, such as those captured by social media platforms like Twitter. The article uses Vote Compass data to train a model that parses the lexicon of social media users in order to predict their ideology and vote intention. The Vote Compass data provide a unique source of both training and validation sets for the model, and the model itself makes the case for the robustness of the out-of-sample inferences derived from Vote Compass.

Each of these articles in its own way leverages the Vote Compass instrument to experiment with new and unorthodox approaches to measurement – with each offering empirical tests as to the validity of these proposed approaches.

5.2.2 Digital democracy

A second theme common to each of the articles presented in this dissertation concerns the implications of emergent digital technologies for how democracies operate.

From referenda to revolutions, digital technology has had a substantive role in reshaping the practice of democracy in the 21st century. It has been credited with facilitating the organization of subversion and resistance movements in authoritarian regimes, exposing government corruption and deceit, and providing powerful new platforms for public expression and democratic participation. But it has also been charged with undermining electoral integrity, compromising personal privacy, increasing surveillance, and facilitating extremism. Both the promise and the peril of digital technology for liberal democratic orders are the subject of much scholarly interest and attention (Der Derian, 2003; Feenberg, 1999; Lessig, 1999; Deibert et al., 2010; Owen, 2015).

As a case study, VAAs offer perspective into some of the ways in which digital technology is shaping the practices of democratic engagement. Each of the articles in this dissertation examines different ways in which VAAs are reconstituting the dynamics of democracy in the Digital Age.

While the first article, The curse of dimensionality in VAAs: reliability and validity in algorithm design, is primarily occupied with questions of the measurement of ideological agreement used by VAAs, it also engages more broadly with their role and function in


a democratic society. In particular, it endeavours to posit a theoretical framework as to the utility of VAAs for democratic participation. It identifies both the potential and the constraints of VAAs as sources of civic engagement and political knowledge, and draws directly from data generated by the Vote Compass VAA to examine how citizens use the application and the utility they derive from it.

The second article, On the external validity of non-probability samples: the case of Vote Compass, engages with the idea of representation in a democratic society and problematizes the dominant mode of public opinion measurement given the implications of ICT diversification for sampling. It argues that digital technologies have disrupted the practice of public opinion research and that, without credible measures of public opinion, elected representatives – whether as delegates or trustees – are ostensibly less effective in accurately representing the collective interests of their constituents.

The main thrust of the article, in terms of its contribution to the discourse on digital democracy, is to make the case that certain unconventional datasets generated by emergent digital technologies – such as VAAs – may represent new, albeit unorthodox, sites of public opinion research, which could inform elected representatives and engender further accountability and responsiveness in government.

The third article, Ideological scaling of social media users: a dynamic lexicon approach, is preoccupied with the implications of digital technology for political representation in a democratic society. It extends the thesis developed in the second article, i.e. that certain non-probability samples can be modelled in such a way as to generate reliable and valid inferences about a population of interest, from the case of Vote Compass to that of social media platforms such as Twitter. Vote Compass is a non-continuous source of public opinion – the application only runs during election campaigns and is otherwise dormant – and the survey data it collects are somewhat constrained to issues germane to the election campaign in which it is running. Twitter, on the other hand, represents a continuous, real-time flow of public opinion data which, given appropriate statistical controls and training data from sources like Vote Compass, may yield reliable inferences about opinions within a population of interest. Although not explicitly stated, this article lays the foundation for passive polling, wherein public opinion could be reliably inferred without having to solicit the active participation of a sample of a population of interest. The applications of this technique extend beyond opinion polling and may contribute to the identification of disinformation and extremism on social media platforms.


5.3 Discussion

This dissertation has endeavoured to build on the VAA scholarship to date and also to advance new research frontiers. It has, in the process, surfaced critical discussions about the meaning that citizens derive from VAAs. Moreover, in positing the potential applications of the particular kinds of data that VAAs can generate, it prompts critical reflection around the ethical use of such data.

5.3.1 From advice to engagement

A path dependence within the literature on the subject has resulted in the prominence of the term "Voting Advice Application" as the presumptive label for applications such as Vote Compass. As Fossen and Anderson (2014, p. 245) observe, this nomenclature reflects the assumption that "strengthening democracy is a matter of ensuring that the support for parties (expressed in votes) more accurately reflects the existing preferences of voters", which they posit "fits well with the normative conception of democracy expounded by social choice theorists, but that view of democracy is contested." Even within the context of social choice theory, however, the function of VAAs is subject to critique.

The very nomenclature of this class of applications, which purport to provide "voting advice", suggests that vote choice can be optimised exclusively on the basis of the policy positions associated with election platforms. This view is reductive and precludes considerations about party size and leadership, previous government experience, possible configurations of governing coalitions, and other factors that have significant bearing on the likelihood that a party will be able to make good on its campaign promises. In the absence of such considerations, the "advice" provided by many VAAs is, at best, incomplete and, at worst, incorrect.

Evidence presented in the first article suggests that VAA users are not engaging with the application explicitly to inform their vote choice. Nor should the objective of a VAA be to provide "voting advice", but rather to increase political knowledge within the general population regarding the public policy platforms of election candidates, all with a view to promoting a more robust participatory democracy of the 'social choice' variety that Fossen and Anderson (2014) identify in their taxonomy. VAAs accomplish this end by organizing and presenting information in a manner that makes it engaging to specialized and unspecialized audiences alike. It is this focus on engagement that distinguishes VAAs from other online electoral information initiatives. Engagement is often the broad, if implicit, measure of the success of VAAs.

The commitment to engagement informs and simultaneously constrains most of the design parameters of a VAA. Substantial distillation of the nuances of public policy and


ideology is required in order to ensure that the application appeals to a general audience, but this must be undertaken with great care so as not to compromise the educational value on offer. The tension between the dual – albeit not mutually exclusive – commitments to engagement and education has drawn criticism in which VAAs are characterized as "toys" (Ladner, Felder and Fivaz, 2010). In the absence of a robust algorithm, the educational value of VAAs is indeed questionable. However, if properly calibrated, VAAs can arguably enhance users' knowledge and prompt critical reflection, thus contributing in a meaningful way to civic engagement.

Both in public and academic discourses, a reorientation of the framing of this class of applications is called for in order to make more explicit their function, purpose, and limitations. Such a move would almost certainly have to start with a break from the now well-entrenched path dependency relating to the nomenclature associated with these applications, as the term "Voting Advice Application" misrepresents both the empirical and prescriptive functions of VAAs and can affect what users take away from their experience with these applications. Semantics matter, and framing outputs as "advice" fosters the perception that these instruments are instructive as to how users should vote, which, at least as VAAs are currently engineered, is neither the normative intention nor the manifest outcome of VAA usage.

5.3.2 On the potential uses and abuses of VAA data

One of the most substantial contributions that this dissertation makes to the literature on VAAs is in relation to the applications of VAA data. To date, the data generated by VAAs have largely been used as diagnostics of the instruments themselves (Wheatley, 2012; Mendez and Wheatley, 2014), much in the way that the first article in this dissertation makes use of Vote Compass data. But the subsequent two articles work in tandem to extend the utility of VAA data to broader research domains. Specifically, the second article makes the case for VAAs as valid polling instruments, and the third article demonstrates how these data can permit the out-of-sample extrapolation of attributes such as ideology.

The theories and methods developed in this dissertation are intended to showcase the potential of Big Data to contribute to the strengthening of core democratic tenets. However, it would be naïve to overlook the potential for exploitation of these methods for anti-democratic ends, particularly in the current climate, where data and statistical techniques not entirely dissimilar from those described and developed in this dissertation have allegedly been employed to undermine democratic institutions. Microtargeting, wedge politics, and other such modes of voter manipulation are by no means byproducts of the Digital Age – they certainly predate the dawn of the Internet, social media, and Artificial Intelligence.


They are, however, dramatically more scalable as a result of these digital technologies, and ostensibly more effective.

The motivation for this dissertation has been to explore the ways in which digital technologies and Big Data might contribute to more robust democracies, and to that end its focus has been on how to model VAA and social media data in such a way as to give voice to citizens. Doing so credibly and reliably involves collecting personal attributes about VAA users; it can also involve using those data to estimate similar attributes for out-of-sample social media platform users.

These techniques are, of course, reminiscent of those that Cambridge Analytica allegedly used to tailor frames with a view to manipulating vote intentions during the recent U.S. presidential election and the Brexit referendum, among others. There is, however, a key difference that warrants acknowledgement. For Cambridge Analytica, estimating the attributes of social media users was intended to refine microtargeting efforts that sought to shift electoral outcomes in a preferred direction. The work advanced herein instead extrapolates user attributes for use as weights to control for selection bias in non-probability samples and thus produce externally valid estimates of public opinion. In this way its motivations are antithetical to those of Cambridge Analytica, in that it seeks to reflect aggregate public opinion agnostic of the opinion that is ultimately expressed. Rather than trying to influence citizens to reflect politicians' views, it seeks to influence politicians to reflect citizens' views.

Regardless of the normative intentions that motivate this dissertation, the potential for exploitation of the methods developed must be addressed. Invariably the question arises as to whether this type of research ought to be curtailed so as to prevent any further refinement of techniques already implicated in antidemocratic projects. But data mining is not new, nor is Big Data necessarily ethically suspect. There are ways of guarding against the reasonable ethical concerns that researchers do and should hold when it comes to such matters. Moreover, transparency with respect to how these methods operate is critical to identifying and addressing instances of exploitation. The techniques developed within this dissertation, for example, lay the foundation for advances in areas such as electoral fraud detection (Norris, 2013) and false news identification (Vosoughi, Roy and Aral, 2018). By exposing the ways in which digital technology can be used to undermine democratic practices, research can also contribute to the development of powerful countermeasures that protect and preserve democratic ideals in the Digital Age.


Bibliography

Adcock, Robert. 2001. "Measurement validity: A shared standard for qualitative and quantitative research." American Political Science Review 95(03):529–546.

Agathokleous, Marilena, Nicolas Tsapatsoulis and Ioannis Katakis. 2013. On the quantification of missing value impact on Voting Advice Applications. In International Conference on Engineering Applications of Neural Networks. Springer pp. 496–505.

Alvarez, R Michael, Ines Levin, Peter Mair and Alexander Trechsel. 2014. "Party preferences in the digital age: The impact of voting advice applications." Party Politics 20(2):227–236.

Anderson, Joel and Thomas Fossen. 2014. Voting Advice Applications and Political Theory: Citizenship, Participation and Representation. In Matching Voters with Parties and Candidates: Voting Advice Applications in a Comparative Perspective, ed. Diego Garzia and Stefan Marschall. Colchester: ECPR Press pp. 217–226.

Andreadis, Ioannis. 2012. To clean or not to clean? Improving the quality of VAA data. In IPSA World Congress. pp. 8–12.

Baker, Reg, J Michael Brick, Nancy A Bates, Mike Battaglia, Mick P Couper, Jill A Dever, Krista J Gile and Roger Tourangeau. 2013. "Summary report of the AAPOR task force on non-probability sampling." Journal of Survey Statistics and Methodology 1(2):90–143.

Barber, David. 2012. Factor Analysis. In Bayesian Reasoning and Machine Learning, ed. David Barber. Cambridge University Press pp. 462–478.

Barberá, Pablo. 2015. "Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data." Political Analysis 23(1):76–91.

Battaglia, Michael P, David Izrael, David C Hoaglin and Martin R Frankel. 2009. "Practical considerations in raking survey data." Survey Practice 2(5):1–10.

Bélanger, Éric and Bonnie M Meguid. 2008. "Issue salience, issue ownership, and issue-based vote choice." Electoral Studies 27(3):477–491.


Bellman, Richard Ernest. 1957. Dynamic Programming. Princeton University Press.

Berrens, Robert P, Alok K Bohara, Hank Jenkins-Smith, Carol Silva and David L Weimer. 2003. "The advent of Internet surveys for political research: A comparison of telephone and Internet samples." Political Analysis 11(1):1–22.

Blais, André and M Martin Boyer. 1996. "Assessing the impact of televised debates: The case of the 1988 Canadian election." British Journal of Political Science 26(2):143–164.

Blais, André, Neil Nevitte, Elisabeth Gidengil and Richard Nadeau. 2000. "Do people have feelings toward leaders about whom they say they know nothing?" The Public Opinion Quarterly 64(4):452–463.

Bond, Robert and Solomon Messing. 2015. "Quantifying Social Media's Political Space: Estimating Ideology from Publicly Revealed Preferences on Facebook." American Political Science Review 109(01):62–78.

Braunsberger, Karin, Hans Wybenga and Roger Gates. 2007. "A comparison of reliability between telephone and web-based surveys." Journal of Business Research 60(7):758–764.

Brick, J Michael. 2011. "The future of survey sampling." Public Opinion Quarterly 75(5):872–888.

Brown, Peter F, Peter V Desouza, Robert L Mercer, Vincent J Della Pietra and Jenifer C Lai. 1992. "Class-based n-gram models of natural language." Computational Linguistics 18(4):467–479.

Budge, Ian, Hans-Dieter Klingemann, Andrea Volkens, Judith L. Bara and Eric Tanenbaum. 2001. Mapping Policy Preferences: Estimates for Parties, Electors, and Governments, 1945–1998. Vol. 1. Oxford University Press on Demand.

Campbell, Angus, Philip E. Converse, Warren Miller and Donald Stokes. 1960. The American Voter. New York: Wiley.

Caplan, Bryan. 2006. The Myth of the Rational Voter: Why Democracies Choose Bad Policies. Princeton: Princeton University Press.

Carroll, Royce, Jeffrey B Lewis, James Lo, Keith T Poole and Howard Rosenthal. 2009. "Measuring bias and uncertainty in DW-NOMINATE ideal point estimates via the parametric bootstrap." Political Analysis 17(3):261–275.

Ceron, Andrea. 2017. "Intra-party politics in 140 characters." Party Politics 23(1):7–17.


Chang, Linchiat and Jon A Krosnick. 2009. “National surveys via RDD telephone interviewing versus the Internet: Comparing sample representativeness and response quality.” Public Opinion Quarterly 73(4):641–678.

Clinton, Joshua, Simon Jackman and Douglas Rivers. 2004. “The statistical analysis of roll call data.” American Political Science Review 98(2):355–370.

Cochrane, Christopher. 2011. “The Origins and Direction of Measurement Error in the ‘Vote Compass’ on the CBC.” https://www.researchgate.net/publication/228510827_The_Origins_and_Direction_of_Measurement_Error_in_the_Vote_Compass_on_the_CBC.

Converse, Philip E. 1964. The nature of belief systems in mass publics. In Ideology and Discontent, ed. David Apter. New York: The Free Press of Glencoe.

Council, National Research. 2013. Nonresponse in social science surveys: A research agenda. National Academies Press.

De Graaf, Jochum. 2010. “The irresistible rise of Stemwijzer.” Voting Advice Applications in Europe: The state of the art pp. 35–46.

De Heer, W and E De Leeuw. 2002. “Trends in household survey nonresponse: A longitudinal and international comparison.” Survey nonresponse p. 41.

Deibert, Ronald, John Palfrey, Rafal Rohozinski and Jonathan Zittrain. 2010. Access Controlled: Policies and Practices of Internet Filtering and Surveillance. Cambridge: MIT Press.

Delli Carpini, Michael X. and Scott Keeter. 1996. What Americans Know About Politics and Why It Matters. New Haven: Yale University Press.

Der Derian, James. 2003. “The question of information technology in international relations.” Millennium 32(3):441–456.

Dever, Jill A, Ann Rafferty and Richard Valliant. 2008. Internet surveys: Can statistical adjustments eliminate coverage bias? In Survey Research Methods. Vol. 2 pp. 47–60.

Dinas, Elias and Kostas Gemenis. 2010. “Measuring parties’ ideological positions with manifesto data: A critical evaluation of the competing methods.” Party Politics 16(4):427–450.

Do, Chuong and Andrew Y Ng. 2005. Transfer learning for text classification. In Neural Information Processing Systems (NIPS). pp. 299–306.


Downs, Anthony. 1957. An Economic Theory of Democracy. New York: Harper.

Dufresne, Yannick and Clifton van der Linden. 2015. Digital technology and civic engagement: the case of Vote Compass. In Canadian Election Analysis, 2015: Communication, Strategy, and Democracy, ed. Thierry Giasson and Alexander J Marland. Vancouver: UBC Press pp. 114–116.

Edgington, Eugene S. 1966. “Statistical inference and nonrandom samples.” Psychological Bulletin 66(6):485.

Enyedi, Zsolt. 2016. “The influence of voting advice applications on preferences, loyalties and turnout: An experimental study.” Political Studies 64(4):1000–1015.

EU Profiler. 2009. “General Description and Method Explanation.”
URL: http://www.euprofiler.eu/help

Evans, Heather K, Victoria Cordova and Savannah Sipole. 2014. “Twitter style: An analysis of how House candidates used Twitter in their 2012 campaigns.” PS: Political Science & Politics 47(2):454–462.

Feenberg, Andrew. 1999. Technology, Philosophy, Politics. In Questioning technology, ed. Andrew Feenberg. New York City, NY: Routledge pp. 1–17.

Few, Stephen. 2005. “Keep Radar Graphs Below the Radar - Far Below.” Information Management Magazine.
URL: http://www.information-management.com/issues/20050501/1026069-1.html

Fivaz, Jan and Giorgio Nadig. 2010. “Impact of voting advice applications (VAAs) on voter turnout and their potential use for civic education.” Policy & Internet 2(4):167–200.

Fossen, Thomas and Joel Anderson. 2014. “What’s the point of voting advice applications? Competing perspectives on democracy and citizenship.” Electoral Studies 36:244–251.

Fournier, Patrick, Fred Cutler, Stuart Soroka and Dietlind Stolle. 2015. “The 2015 Canadian Election Study.” [dataset].

Fournier, Patrick, Fred Cutler, Stuart Soroka, Dietlind Stolle and Éric Bélanger. 2013. “Riding the Orange Wave: Leadership, Values, Issues, and the 2011 Canadian Election.” Canadian Journal of Political Science/Revue canadienne de science politique 46:863–897.

Gabel, Matthew J and John D Huber. 2000. “Putting parties in their place: Inferring party left-right ideological positions from party manifestos data.” American Journal of Political Science 44(1):94–103.


Garzia, Diego and Stefan Marschall. 2012. “Voting Advice Applications under review: the state of research.” International Journal of Electronic Governance 5(3/4):203–222.

Garzia, Diego and Stefan Marschall, eds. 2014. Matching Voters with Parties and Candidates: Voting Advice Applications in a Comparative Perspective. Colchester: ECPR Press.

Gelman, Andrew et al. 2007. “Struggles with survey weighting and regression modeling.” Statistical Science 22(2):153–164.

Gemenis, Kostas. 2013. “Estimating parties’ policy positions through voting advice applications: Some methodological considerations.” Acta Politica 48(3):268–295.

Gemenis, Kostas and Martin Rosema. 2014. “Voting advice applications and electoral turnout.” Electoral Studies 36:281–289.

Germann, Micha, Fernando Mendez, Jonathan Wheatley and Uwe Serdült. 2015. “Spatial maps in voting advice applications: The case for dynamic scale validation.” Acta Politica 50(2):214–238.

Godbout, Jean-François and Bjørn Høyland. 2011. “Legislative voting in the Canadian Parliament.” Canadian Journal of Political Science 44(2):367–388.

Goot, Murray and Tim Sowerbutts. 2004. Dog whistles and death penalties: the ideological structuring of Australian attitudes to asylum seekers. In Australasian Political Studies Association Conference. Adelaide: University of Adelaide.

Grant, Will J, Brenda Moon and Janie Busby Grant. 2010. “Digital dialogue? Australian politicians’ use of the social network tool Twitter.” Australian Journal of Political Science 45(4):579–604.

Grimmer, Justin. 2010. “A Bayesian hierarchical topic model for political texts: Measuring expressed agendas in Senate press releases.” Political Analysis pp. 1–35.

Grimmer, Justin and Brandon M Stewart. 2013. “Text as data: The promise and pitfalls of automatic content analysis methods for political texts.” Political Analysis.

Hair, Joseph F., William C. Black, Barry J. Babin and Rolph E. Anderson. 1998. Multivariate Data Analysis, Fifth Edition. Upper Saddle River: Prentice-Hall.

Hare, Christopher, David A Armstrong, Ryan Bakker, Royce Carroll and Keith T Poole. 2015. “Using Bayesian Aldrich-McKelvey Scaling to Study Citizens’ Ideological Preferences and Perceptions.” American Journal of Political Science 59(3):759–774.


Haynes, Audrey A and Brian Pitts. 2009. “Making an impression: New media in the 2008 presidential nomination campaigns.” PS: Political Science & Politics 42(1):53–58.

Hillygus, D. Sunshine and Todd G. Shields. 2008. The Persuadable Voter: Wedge Issues in Political Campaigns. Princeton: Princeton University Press.

Himmelweit, Hilde T, Patrick Humphreys and Marianne Jaeger. 1993. How voters decide: a model of vote choice based on a special longitudinal study extending over fifteen years and the British election surveys of 1970–1983. Open University Press.

Hirzalla, Fadi, Liesbet Van Zoonen and Jan de Ridder. 2010. “Internet use and political participation: Reflections on the mobilization/normalization controversy.” The Information Society 27(1):1–15.

Holbrook, Allyson, Jon A Krosnick, Alison Pfent et al. 2007. “The causes and consequences of response rates in surveys by the news media and government contractor survey research firms.” Advances in telephone survey methodology pp. 499–528.

Huang, Yi-Min and Shu-Xin Du. 2005. Weighted support vector machine for classification with uneven training class sizes. In 2005 International Conference on Machine Learning and Cybernetics. Vol. 7 IEEE pp. 4365–4369.

Jessee, Stephen A. 2009. “Spatial voting in the 2004 presidential election.” American Political Science Review 103(1):59–81.

Johnston, Richard. 1992. Letting the people decide: Dynamics of a Canadian election. Stanford University Press.

Johnston, Richard. 2017. “Vote Compass in British Columbia: insights from and about published polls.” Journal of Elections, Public Opinion and Parties 27(1):97–109.

Johnston, Richard, André Blais, Elisabeth Gidengil and Neil Nevitte. 1996. Challenge of Direct Democracy: The 1992 Canadian Referendum. McGill-Queen’s Press-MQUP.

Johnston, Richard and Henry E Brady. 2002. “The rolling cross-section design.” Electoral Studies 21(2):283–295.

Jurafsky, Daniel and James H Martin. 2009. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Upper Saddle River, NJ: Prentice Hall.

Kam, Christopher J. 2009. Party discipline and parliamentary politics. Cambridge University Press.


Kanji, Mebs, Antoine Bilodeau and Thomas J Scotto. 2012. The Canadian Election Studies: Assessing Four Decades of Influence. UBC Press.

Katakis, Ioannis, Nicolas Tsapatsoulis, Fernando Mendez, Vasiliki Triga and Constantinos Djouvas. 2014. “Social Voting Advice Applications: Definitions, Challenges, Datasets and Evaluation.” IEEE Transactions on Cybernetics 44:1039–1052.

Keeter, Scott, Courtney Kennedy, Michael Dimock, Jonathan Best and Peyton Craighill. 2006. “Gauging the impact of growing nonresponse on estimates from a national RDD telephone survey.” Public Opinion Quarterly 70(5):759–779.

Kleinnijenhuis, Jan, Jasper van de Pol, Anita MJ van Hoof and André PM Krouwel. 2017. “Genuine effects of vote advice applications on party choice: Filtering out factors that affect both the advice obtained and the vote.” Party Politics p. 1354068817713121.

Klingemann, Hans-Dieter, Andrea Volkens, Judith L. Bara, Ian Budge and Michael D. McDonald. 2006. Mapping policy preferences II: estimates for parties, electors, and governments in Eastern Europe, European Union, and OECD 1990–2003. Vol. 2. Oxford University Press on Demand.

Kohut, Andrew, Scott Keeter, Carroll Doherty, Michael Dimock and Leah Christian. 2012. “Assessing the representativeness of public opinion surveys.” Washington, DC: Pew Research Center.

Krouwel, André, Thomas Vitiello and Matthew Wall. 2012. “The practicalities of issuing vote advice: a new methodology for profiling and matching.” International Journal of Electronic Governance 5(3-4):223–243.

Ladner, Andreas, Gabriela Felder and Jan Fivaz. 2010. “More than toys? A first assessment of voting advice applications in Switzerland.” Voting advice applications in Europe. The state of the art pp. 91–123.

Ladner, Andreas and Joëlle Pianzola. 2010. Do voting advice applications have an effect on electoral participation and voter turnout? Evidence from the 2007 Swiss Federal Elections. In International Conference on Electronic Participation. Springer pp. 211–224.

Larsson, Anders Olof. 2016. “Online, all the time? A quantitative assessment of the permanent campaign on Facebook.” New Media & Society 18(2):274–292.

Lauderdale, Benjamin E and Alexander Herzog. 2016. “Measuring political positions from legislative speech.” Political Analysis 24(3):374–394.


Laver, Michael, Kenneth Benoit and John Garry. 2003. “Extracting policy positions from political texts using words as data.” American Political Science Review 97(2):311–331.

Lazarsfeld, Paul F., Bernard R. Berelson and Helen Gaudet. 1948. The People’s Choice: How the Voter Makes Up His Mind in a Presidential Campaign. New York: Columbia University Press.

Lee, Sunghee. 2006. “Propensity score adjustment as a weighting scheme for volunteer panel web surveys.” Journal of Official Statistics 22(2):329.

Lee, Sunghee and Richard Valliant. 2009. “Estimation for volunteer panel web surveys using propensity score adjustment and calibration adjustment.” Sociological Methods & Research 37(3):319–343.

Lefevere, Jonas and Stefaan Walgrave. 2014. “A perfect match? The impact of statement selection on Voting Advice Applications’ ability to match voters and parties.” Electoral Studies 36:252–262.

Lessig, Lawrence. 1999. Code and Other Laws of Cyberspace. New York City, NY: Pegasus Books.

Lijphart, Arend. 1997. “Unequal participation: Democracy’s unresolved dilemma. Presidential address, American Political Science Association, 1996.” American Political Science Review 91(1):1–14.

Louwerse, Tom and Martin Rosema. 2014. “The design effects of voting advice applications: Comparing methods of calculating matches.” Acta Politica 49(3):286–312.

Lowe, Will. 2008. “Understanding wordscores.” Political Analysis 16(4):356–371.

Lukin, Stephanie and Marilyn Walker. 2013. Really? Well. Apparently bootstrapping improves the performance of sarcasm and nastiness classifiers for online dialogue. In Proceedings of the Workshop on Language Analysis in Social Media. Citeseer pp. 30–40.

Mahéo, Valérie-Anne. 2016. “The Impact of Voting Advice Applications on Electoral Preferences: A Field Experiment in the 2014 Quebec Election.” Policy & Internet 8(4):391–411.

Malhotra, Neil and Jon A Krosnick. 2007. “The effect of survey mode and sampling on inferences about political attitudes and behavior: Comparing the 2000 and 2004 ANES to Internet surveys with nonprobability samples.” Political Analysis 15(3):286–323.


Marschall, Stefan and Martin Schultze. 2015. “German E-Campaigning and the Emergence of a ‘Digital Voter’? An Analysis of the Users of the Wahl-O-Mat.” German Politics 24(4):525–541.

Medeiros, Mike, Jean-Philippe Gauvin and Chris Chhim. 2015. “Refining vote choice in an ethno-regionalist context: Three-dimensional ideological voting in Catalonia and Quebec.” Electoral Studies 40:14–22.

Mendez, Fernando. 2012. “Matching voters with political parties and candidates: An empirical test of four algorithms.” International Journal of Electronic Governance 5(3-4):264–278.

Mendez, Fernando and Jonathan Wheatley. 2014. “Using VAA-generated data for mapping partisan supporters in the ideological space.” Matching Voters with Parties and Candidates: Voting Advice Applications in a Comparative Perspective pp. 161–173.

Mendez, Fernando, Kostas Gemenis and Constantinos Djouvas. 2014. Methodological challenges in the analysis of voting advice application generated data. In Semantic and Social Media Adaptation and Personalization (SMAP), 2014 9th International Workshop on. IEEE pp. 142–148.

Merrill, Samuel and Bernard Grofman. 1999. A Unified Theory of Voting: Directional and Proximity Spatial Models. Cambridge: Cambridge University Press.

Monroe, Burt L and Ko Maeda. 2004. Talk’s cheap: Text-based estimation of rhetorical ideal-points. In Society for Political Methodology. pp. 29–31.

Monroe, Burt L, Michael P Colaresi and Kevin M Quinn. 2008. “Fightin’ words: Lexical feature selection and evaluation for identifying the content of political conflict.” Political Analysis 16(4):372–403.

Montigny, Éric, François Gélineau and François Pétry. 2013. La Boussole électorale québécoise. In Les Québécois aux urnes: les partis, les médias et les citoyens en campagne, ed. Éric Bélanger, Frédérick Bastien and François Gélineau. Montreal: Les Presses de l’Université de Montréal pp. 285–297.

Norris, Pippa. 2013. “The new research agenda studying electoral integrity.” Electoral Studies 32(4):563–575.

Northrup, David. 2016. The 2015 Canadian Election Study Technical Documentation. Technical report, Institute for Social Research, York University.


Otjes, Simon and Tom Louwerse. 2014. “Spatial models in voting advice applications.” Electoral Studies 36:263–271.

Owen, Taylor. 2015. Disruptive power: The crisis of the state in the digital age. Oxford Studies in Digital Politics.

Poole, Keith T and Howard Rosenthal. 1985. “A spatial model for legislative roll call analysis.” American Journal of Political Science pp. 357–384.

Poole, Keith T and Howard Rosenthal. 2000. Congress: A political-economic history of roll call voting. Oxford: Oxford University Press.

Poole, Keith T and Howard Rosenthal. 2001. “D-NOMINATE after 10 years: A comparative update to Congress: A political-economic history of roll-call voting.” Legislative Studies Quarterly pp. 5–29.

Quinn, Kevin M, Burt L Monroe, Michael Colaresi, Michael H Crespin and Dragomir R Radev. 2010. “How to analyze political attention with minimal assumptions and costs.” American Journal of Political Science 54(1):209–228.

Rivero, Gonzalo. 2015. “Preaching to the Choir. The Offline Determinants of Following Members of the US Congress on Twitter.”

Rosema, Martin, Joel Anderson and Stefaan Walgrave. 2014. “The design, purpose, and effects of voting advice applications.” Electoral Studies 36:240–243.

Rosema, Martin and Tom Louwerse. 2015. Answer Scales in Voting Advice Applications. European Consortium for Political Research General Conference, Montréal.

Schneider, Philipp. 2017. “Ein Drittel der Deutschen plant den Wahl-O-Mat zu nutzen” [One third of Germans plan to use the Wahl-O-Mat]. https://yougov.de/news/2017/08/30/ein-drittel-der-deutschen-plant-den-wahl-o-mat-zu-/.

Schonlau, Matthias, Arthur Van Soest, Arie Kapteyn and Mick Couper. 2009. “Selection bias in web surveys and the use of propensity scores.” Sociological Methods & Research 37(3):291–318.

Schultze, Martin. 2014. “Effects of voting advice applications (VAAs) on political knowledge about party positions.” Policy & Internet 6(1):46–68.

Slapin, Jonathan B and Sven-Oliver Proksch. 2008. “A scaling model for estimating time-series party positions from texts.” American Journal of Political Science 52(3):705–722.

Small, Tamara. 2010. “Canadian politics in 140 characters: Party politics in the Twitterverse.” Canadian Parliamentary Review 33(3):39–45.


Smith, TMF. 1983. “On the validity of inferences from non-random samples.” Journal of the Royal Statistical Society. Series A (General) pp. 394–403.

Sniderman, Paul M. and Benjamin Highton. 2011. Facing the Challenge of Democracy: Explorations in the Analysis of Public Opinion and Political Participation. Princeton: Princeton University Press.

Spirling, Arthur and Iain McLean. 2007. “UK OC OK? Interpreting optimal classification scores for the UK House of Commons.” Political Analysis pp. 85–96.

Steeh, Charlotte, Nicole Kirgis, Brian Cannon and Jeff DeWitt. 2001. “Are they really as bad as they seem? Nonresponse rates at the end of the twentieth century.” Journal of Official Statistics 17(2):227.

Straus, Jacob R, Matthew Eric Glassman, Colleen J Shogan and Susan Navarro Smelcer. 2013. “Communicating in 140 characters or less: Congressional adoption of Twitter in the 111th Congress.” PS: Political Science & Politics 46(1):60–66.

Struppeck, Thomas. 2014. “Combining Estimates.” Casualty Actuarial Society E-Forum 2:1–14.

Sturgis, Patrick, Nick Baker, Mario Callegaro, Stephen Fisher, Jane Green, Will Jennings, Jouni Kuha, Ben Lauderdale and Patten Smith. 2016. “Report of the Inquiry into the 2015 British general election opinion polls.”

Temporão, Mickael, Corentin Vande Kerckhove, Clifton van der Linden, Yannick Dufresne and Julien M. Hendrickx. 2018. “Replication Data for: Ideological Scaling of Social Media Users. A Dynamic Lexicon Approach.”
URL: https://doi.org/10.7910/DVN/0ZCBTB

Toker, Dereck, Cristina Conati, Giuseppe Carenini and Mona Haraty. 2012. User Modeling, Adaptation, and Personalization: 20th International Conference, UMAP 2012, Montreal, Canada, July 16-20, 2012. Proceedings. Berlin, Heidelberg: Springer Berlin Heidelberg chapter Towards Adaptive Information Visualization: On the Influence of User Characteristics, pp. 274–285.

Trechsel, Alexander H and Peter Mair. 2011. “When parties (also) position themselves: An introduction to the EU Profiler.” Journal of Information Technology & Politics 8(1):1–20.

Van de Kerckhove, Wendy, Jill M Montaquila, Priscilla R Carver and J Michael Brick. 2009. “An Evaluation of Bias in the 2007 National Household Education Surveys Program: Results from a Special Data Collection Effort. NCES 2009-029.” National Center for Education Statistics.


van der Linden, Clifton and Yannick Dufresne. 2017. “The curse of dimensionality in Voting Advice Applications: reliability and validity in algorithm design.” Journal of Elections, Public Opinion and Parties 27(1):9–30.

van der Meer, Tom, Henk van der Kolk and Roderik Rekker. 2018. Aanhoudend wisselvallig: Nationaal Kiezersonderzoek 2017 [Persistently changeable: National Election Study 2017]. Technical report, Stichting KiezersOnderzoek Nederland (SKON).

Vavreck, Lynn and Douglas Rivers. 2008. “The 2006 cooperative congressional election study.” Journal of Elections, Public Opinion and Parties 18(4):355–366.

Vergeer, Maurice. 2015. “Twitter and political campaigning.” Sociology Compass 9(9):745–760.

Vosoughi, Soroush, Deb Roy and Sinan Aral. 2018. “The spread of true and false news online.” Science 359(6380):1146–1151.

Walgrave, Stefaan, Michiel Nuytemans and Koen Pepermans. 2009. “Voting aid applications and the effect of statement selection.” West European Politics 32(6):1161–1180.

Wall, Matthew, Maria Laura Sudulich, Rory Costello and Enrique Leon. 2009. “Picking your party online – An investigation of Ireland’s first online voting advice application.” Information Polity 14(3):203–218.

Wang, Wei, David Rothschild, Sharad Goel and Andrew Gelman. 2015. “Forecasting elections with non-representative polls.” International Journal of Forecasting 31(3):980–991.

Wheatley, Jonathan. 2012. “Using VAAs to explore the dimensionality of the policy space: experiments from Brazil, Peru, Scotland and Cyprus.” International Journal of Electronic Governance 5(3-4):318–348.

Whiteley, Paul. 2016. “Why Did the Polls Get It Wrong in the 2015 General Election? Evaluating the Inquiry into Pre-Election Polls.” The Political Quarterly 87(3):437–442.

Yeager, David S, Jon A Krosnick, LinChiat Chang, Harold S Javitz, Matthew S Levendusky, Alberto Simpser and Rui Wang. 2011. “Comparing the accuracy of RDD telephone surveys and internet surveys conducted with probability and non-probability samples.” Public Opinion Quarterly 75(4):709–747.

Zuckerman, Alan S, Nicholas A Valentino and Ezra W Zuckerman. 1994. “A structural theory of vote choice: Social and political networks and electoral flows in Britain and the United States.” The Journal of Politics 56(4):1008–1033.
