SKEWING THE DATA - Amsterdam Data Science…

Posted on 11-Jun-2020




SKEWING THE DATA: POLITICAL POLLING AND CONTROVERSY

Andrew S. Tanenbaum

The Votemaster (www.electoral-vote.com)

THE ELECTORAL COLLEGE

•  The president is not elected by popular vote
•  The president is elected by the electoral college
•  Each state has a certain number of electors
•  California has 55, Wyoming has 3
•  In total there are 538 electoral votes
•  In 2004, I started a site that kept track of the state polls
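The arithmetic behind these bullets fits in a few lines of Python. This is a minimal sketch, assuming every state gives all of its electors to the statewide winner (Maine and Nebraska actually split theirs by district, which is ignored here). The helpers `tally` and `has_majority` are hypothetical names and the state winners are made up; only the elector counts (55 and 3) and the 538 total come from the slide. Winning requires a strict majority, 270 of 538.

```python
# Sketch of an Electoral College tally (hypothetical helpers).
# Assumption: winner-take-all in every state; Maine's and Nebraska's
# district-level splits are ignored.

TOTAL_ELECTORAL_VOTES = 538
VOTES_TO_WIN = TOTAL_ELECTORAL_VOTES // 2 + 1  # strict majority: 270


def tally(state_winners: dict[str, str], electors: dict[str, int]) -> dict[str, int]:
    """Sum electoral votes per candidate from a state -> winner mapping."""
    totals: dict[str, int] = {}
    for state, winner in state_winners.items():
        totals[winner] = totals.get(winner, 0) + electors[state]
    return totals


def has_majority(electoral_votes: int) -> bool:
    """True if a candidate has at least 270 of the 538 electoral votes."""
    return electoral_votes >= VOTES_TO_WIN


# Elector counts from the slide; the winners are made up.
electors = {"California": 55, "Wyoming": 3}
winners = {"California": "Candidate A", "Wyoming": "Candidate B"}

print(tally(winners, electors))  # {'Candidate A': 55, 'Candidate B': 3}
print(VOTES_TO_WIN)              # 270
```

For example, `has_majority(206)` is `False`: 206 electoral votes fall well short of 270.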


SITE ON 6 NOV. 2012


HOW DO POLLSTERS DECIDE WHO TO CALL?

•  Random digit dialing (RDD)
   –  Phone numbers are of the format (212) 695-xxxx
   –  A computer picks a random xxxx
   –  Problems: you call businesses, modems, fax machines, …

•  Pollsters buy lists of voters (e.g., registered voters)
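Random digit dialing can be sketched as follows: keep the area code and exchange fixed and draw the last four digits at random. This is a hypothetical illustration, not any pollster's actual sampling frame, and `random_digit_dial` is an invented name. Because the suffixes are drawn blindly, nothing stops the generator from producing numbers that belong to businesses, modems, or fax machines, which is exactly the problem listed above.

```python
import random
from typing import Optional


def random_digit_dial(area_code: str, exchange: str, n: int,
                      seed: Optional[int] = None) -> list[str]:
    """Draw n distinct numbers of the form (area) exchange-xxxx.

    Hypothetical helper: only the last four digits are random. The
    generator knows nothing about who owns a number, so businesses,
    modems, and fax machines are dialed as readily as households.
    """
    rng = random.Random(seed)
    # sample() draws without replacement, so no number is called twice
    suffixes = rng.sample(range(10_000), n)
    return [f"({area_code}) {exchange}-{s:04d}" for s in suffixes]


# Usage: five numbers in the slide's (212) 695 block
for number in random_digit_dial("212", "695", 5, seed=42):
    print(number)
```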


WHAT DO THEY ASK?

•  Candidate favorability
•  Horse race questions
•  Demographic questions
   –  Age
   –  Party identification
   –  Gender
   –  Race
   –  Education
   –  Income


BACKGROUND ON POLLING

•  A poll using live interviewers costs about $15,000
•  There is strong bias in who will talk to an interviewer
•  Response rates are below 10%
•  So what do polling firms do?
•  They use small samples and apply statistical correction
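How much a small sample costs in accuracy can be quantified. Under the textbook assumption of a simple random sample (which real polls, with response rates below 10%, only approximate), the 95% margin of error for a proportion p with n respondents is roughly 1.96 × sqrt(p(1 − p)/n). The sketch below evaluates it at p = 0.5, the worst case; `margin_of_error` is a hypothetical helper name.

```python
import math


def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion p, sample size n.

    Assumes a simple random sample; weighting and nonresponse make the
    real-world error larger than this.
    """
    return z * math.sqrt(p * (1 - p) / n)


# p = 0.5 gives the largest (worst-case) margin
for n in (400, 1000, 4000):
    print(n, round(100 * margin_of_error(0.5, n), 1), "points")
```

This prints 4.9, 3.1, and 1.5 percentage points for 400, 1,000, and 4,000 respondents: quadrupling the sample only halves the error. Statistical correction usually widens the effective margin further, which this sketch does not model.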


THE POLLS ARE IN. NOW WHAT?


[Chart: one candidate's poll results by month, January to October; vertical axis 40% to 60%]

THE POLLS ARE IN. NOW WHAT?


If the candidate is above 50%, he wins

If the candidate is below 50%, he loses

[Chart: the same poll results by month, January to October; vertical axis 40% to 60%]

THE POLLS ARE IN. NOW WHAT?


[Chart: the same poll results by month, January to October; vertical axis 40% to 60%]

If we count only the most recent poll, the candidate loses

THE POLLS ARE IN. NOW WHAT?


[Chart: the same poll results by month, January to October; vertical axis 40% to 60%]

If we average the last 2 polls, the candidate wins

* = average
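The two rules on these slides can be compared directly. The poll numbers below are invented to mimic the situation in the charts (they are not read from them); `latest_poll`, `average_last`, and `wins` are hypothetical helper names, and `wins` encodes the slide's rule that a candidate above 50% wins.

```python
def latest_poll(polls: list[float]) -> float:
    """Use only the most recent poll (polls are in chronological order)."""
    return polls[-1]


def average_last(polls: list[float], k: int) -> float:
    """Average the k most recent polls."""
    recent = polls[-k:]
    return sum(recent) / len(recent)


def wins(share: float, threshold: float = 50.0) -> bool:
    """The slide's decision rule: above 50% wins."""
    return share > threshold


# Invented poll shares (percent), oldest first.
polls = [48.0, 51.0, 53.0, 49.0]
print(wins(latest_poll(polls)))      # False: 49% on its own says "loses"
print(wins(average_last(polls, 2)))  # True: (53 + 49) / 2 = 51%
```

Same data, different rule, opposite call: the choice of averaging rule is itself a modeling decision.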

THE POLLS ARE IN. NOW WHAT?


[Chart: the same poll results by month, January to October; vertical axis 40% to 60%]

Should this poll count? It was taken in June

* = average

HOW TO AVERAGE POLLS

•  How you handle this is controversial
•  If the window is too long, you miss rapid changes
•  If the window is too short, the average rests on few polls and is less accurate
•  Should partisan pollsters be included?
•  Should pollsters be weighted by previous accuracy?
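One way to make these choices explicit is a time-window average in which the window length, the set of excluded (e.g., partisan) pollsters, and per-pollster weights (e.g., based on past accuracy) are all parameters. This is a sketch under stated assumptions: the `Poll` record, `window_average`, the pollster names, dates, and numbers are all hypothetical, and no particular aggregator's method is implied.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import AbstractSet, Mapping, Optional


@dataclass(frozen=True)
class Poll:
    pollster: str
    end_date: date
    share: float  # candidate's share, in percent


def window_average(polls: list[Poll], as_of: date, window_days: int,
                   exclude: AbstractSet[str] = frozenset(),
                   weights: Optional[Mapping[str, float]] = None) -> Optional[float]:
    """Weighted mean of polls that ended within window_days before as_of.

    Polls by pollsters in `exclude` (e.g., partisan firms) are skipped;
    `weights` optionally maps pollster -> weight (default 1.0).
    Returns None if no poll falls in the window.
    """
    cutoff = as_of - timedelta(days=window_days)
    total = weight_sum = 0.0
    for poll in polls:
        if not (cutoff <= poll.end_date <= as_of) or poll.pollster in exclude:
            continue
        w = 1.0 if weights is None else weights.get(poll.pollster, 1.0)
        total += w * poll.share
        weight_sum += w
    return total / weight_sum if weight_sum > 0 else None


# Hypothetical polls; the June poll plays the role of the one the slide asks about.
polls = [
    Poll("A", date(2012, 6, 15), 53.0),
    Poll("B", date(2012, 10, 20), 49.0),
    Poll("C", date(2012, 10, 25), 52.0),
]
as_of = date(2012, 10, 31)
print(window_average(polls, as_of, 30))   # June poll left out
print(window_average(polls, as_of, 180))  # June poll counted
```

With a 30-day window the June poll drops out (average 50.5); with a 180-day window it counts (about 51.3). Excluding or up-weighting a pollster moves the number again, which is why each of these choices is contested.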


POLLING CONTROVERSIES

•  Is the sample really random?
   –  RDD: Homes with 0 landlines? 3 landlines?
   –  Lists: Which ones? Are they current?
   –  Who to poll? All adults? Registered voters? Likely voters?
   –  Nonresponse bias (e.g., young mothers vs. old widows)
   –  In the US, it is illegal for a computer to autodial a cell phone

•  Methodology
   –  Live interviewer vs. robo vs. Internet polls
   –  Question order and wording
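The landline question has a standard textbook answer for the households that can be reached: a home with k landlines is about k times as likely to be dialed, so its respondent can get a base weight of 1/k. A home with 0 landlines cannot be fixed by weighting, because it never enters the sample. The sketch assumes one interview per household; `selection_weight` and `weighted_share` are hypothetical names and the respondents are invented.

```python
def selection_weight(num_landlines: int) -> float:
    """Base weight for a household reached by landline RDD (hypothetical helper).

    A home with k landlines is about k times as likely to be dialed,
    so it is down-weighted by 1/k. A home with 0 landlines is never
    dialed at all; no weight can correct for that coverage gap.
    """
    if num_landlines < 1:
        raise ValueError("a household with no landline cannot be reached by landline RDD")
    return 1.0 / num_landlines


def weighted_share(respondents: list[tuple[bool, int]]) -> float:
    """Share supporting the candidate; each tuple is (supports, num_landlines)."""
    weights = [selection_weight(lines) for _, lines in respondents]
    support = sum(w for w, (supports, _) in zip(weights, respondents) if supports)
    return support / sum(weights)


# Invented respondents: the supporter with 3 landlines counts as 1/3 of a person.
sample = [(True, 1), (False, 1), (True, 3)]
print(weighted_share(sample))  # 4/7, about 0.571, versus 2/3 unweighted
```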


POLLING CONTROVERSIES

•  Polling non-English-speaking voters
   –  Most firms just ignore them

•  Dealing with lying respondents
   –  Bernie-or-bust voters
   –  Bradley effect
   –  Trump does better in robo polls than in live-interviewer polls

•  Some pollsters just made up the data! No real calls


DATA CORRECTION

•  Pollsters know the demographics of the population
•  If the sample demographics differ, they correct for it
•  Suppose it is known that 52% of voters are women
•  But in some poll, only 47% of respondents were women
•  Then each woman in the sample gets a weight of 52/47 (about 1.11)
•  They also correct for political party identification
•  Suppose they know 38% of voters are Democrats
•  But in this sample only 32% were
•  Each Democrat then gets a weight of 38/32 (about 1.19)
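The correction above is a ratio: weight = population share ÷ sample share. A minimal sketch using the slide's 52%/47% gender example follows; the support figures by group (55% of women, 45% of men) are invented for illustration, and `cell_weights` and `weighted_support` are hypothetical names. The same formula gives the 38/32 party weight. Weighting on several variables at once (say, gender and party together) is usually done iteratively (raking), which this one-variable sketch does not attempt.

```python
def cell_weights(population_share: dict[str, float],
                 sample_share: dict[str, float]) -> dict[str, float]:
    """Weight for each group = population share / sample share."""
    return {g: population_share[g] / sample_share[g] for g in population_share}


def weighted_support(sample_share: dict[str, float], support: dict[str, float],
                     weights: dict[str, float]) -> float:
    """Estimated support after each group is scaled by its weight."""
    total = sum(sample_share[g] * weights[g] for g in sample_share)
    return sum(sample_share[g] * weights[g] * support[g] for g in sample_share) / total


population = {"women": 0.52, "men": 0.48}  # from the slide
sample = {"women": 0.47, "men": 0.53}      # from the slide
support = {"women": 0.55, "men": 0.45}     # invented for illustration

w = cell_weights(population, sample)
print(w)  # women about 1.106 (52/47), men about 0.906 (48/53)
print(weighted_support(sample, support, {"women": 1.0, "men": 1.0}))  # about 0.497, unweighted
print(weighted_support(sample, support, w))                           # about 0.502, weighted
```

Here the unweighted sample gives 49.7% and the weighted estimate 50.2%, enough to flip an "above 50% wins" call.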


UNSKEWED POLLS

•  In 2012, a man named Dean Chambers thought Mitt Romney would beat Barack Obama

•  The polls showed otherwise
•  He assumed the polls were wrong
•  But how?
•  He blamed it on the data correction
•  He re-corrected the raw data based on a much more Republican electorate
•  His Website predicted a Romney win
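The mechanics of "re-correcting" can be illustrated by weighting the same raw responses to two different assumed party mixes. All numbers here are invented, including both target electorates (the 38% Democratic share only echoes the earlier weighting example); they are not Chambers' figures or any pollster's, and `topline` is a hypothetical helper. After weighting to a target mix, each party's share of the weighted sample equals its target share, so the topline is simply the target-weighted average of support within each party.

```python
def topline(target_mix: dict[str, float], support_by_party: dict[str, float]) -> float:
    """Candidate support after weighting the sample to target_mix.

    Once the sample is weighted so that each party's share equals its
    target share, overall support is the target-weighted average of
    support within each party.
    """
    total = sum(target_mix.values())
    return sum(target_mix[p] * support_by_party[p] for p in target_mix) / total


# Invented raw result: share of each party's respondents backing the candidate.
support_by_party = {"Dem": 0.92, "Rep": 0.06, "Ind": 0.45}

pollster_mix = {"Dem": 0.38, "Rep": 0.32, "Ind": 0.30}     # hypothetical
more_republican = {"Dem": 0.34, "Rep": 0.38, "Ind": 0.28}  # hypothetical "unskewed" mix

print(round(100 * topline(pollster_mix, support_by_party), 1))     # 50.4
print(round(100 * topline(more_republican, support_by_party), 1))  # 46.2
```

Nothing about the interviews changes; only the assumption about who will turn out does, and that assumption alone moves the candidate from 50.4% to 46.2%.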


UNSKEWEDPOLLS.COM


Romney got 206 electoral votes and lost

RIGHT-WING MEDIA BELIEVED IT TOTALLY

•  Right-wing media & Websites took this as the truth
•  Millions of people believed Chambers
•  Fox News saw him as a rising star
•  Mitt Romney expected to win based on this
•  On election night, Karl Rove refused to accept the results


WERE THE 2016 POLLS “SKEWED”?

•  RealClearPolitics had Clinton +3.3% ± 2%
•  The final national result was Clinton +2.1%
•  The national polls were spot on
•  The state polls showed it too close to call in 12 states
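The "spot on" judgment is a simple interval check: the final margin of +2.1 lies inside 3.3 ± 2 (from 1.3 to 5.3). Likewise, a state can be called "too close to call" when the polled lead is no larger than the margin of error. Both functions below are hypothetical sketches of those two checks; the national numbers come from the slide, while the state leads and the 3-point margin are invented.

```python
def within_margin(actual: float, estimate: float, margin: float) -> bool:
    """True if the actual result lies inside estimate ± margin."""
    return abs(actual - estimate) <= margin


def call_race(lead: float, margin: float) -> str:
    """Hypothetical rule: a lead no bigger than the margin is too close to call."""
    return "too close to call" if abs(lead) <= margin else "leader favored"


# National numbers from the slide (points): Clinton +3.3 ± 2, actual +2.1
print(within_margin(2.1, 3.3, 2.0))  # True

# Invented state leads (points) with an assumed 3-point margin
for state, lead in {"State X": 1.5, "State Y": -2.0, "State Z": 6.0}.items():
    print(state, call_race(lead, 3.0))
```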


CONCLUSION

•  Somebody can take the data and modify it
•  People & media might prefer the “modified” data
•  Millions of people believed the “modified” data


THANK YOU!

ELECTORAL-VOTE.COM
