
Date post: 28-Dec-2015
Thoughts on engaging critically with data

Bettina Berendt
Department of Computer Science, KU Leuven, Belgium
http://people.cs.kuleuven.be/~bettina.berendt/

Vienna Summer School on Digital Humanities, July 7th, 2015, Vienna, Austria

Goals and non-goals

• Goals
  ▫ Discuss some relevant issues with (esp. Big) data, using text data as illustration wherever possible: issues described in the literature and/or identified in our research
• Non-goals (selection)
  ▫ Discuss all of the issues
  ▫ Solve the problems
  ▫ Go into the depth these topics deserve
  ▫ (I hope to have made you want to carry on!)
  ▫ Offer binding legal advice

What can (text) data tell us? And what not?

What are we allowed to do with data?

Who is doing research, and what else do they do?

Thoughts about research paradigms

What can (text) data tell us? And what not?

“Data speak for themselves.”

• “With enough data, the numbers speak for themselves.” Anderson, C. (2008)
• “Quantitative data [...] are independent of interpretation; [...] they often demand an interpretation that transcends the quantitative realm.” Moretti, F. (2007), p. 30

Data?

•datum = given

•“data refer to those elements that are taken [abstracted from phenomena]: extracted through observations, computations, experiments, and record keeping”, “selected from nature by the scientist in accordance with his [sic] purpose” (Kitchin, 2014)

Capta!

Impact of measurement methods

Name ≥2 aspects of the data we should question!

Moretti, F. (2007)


Name ≥3 aspects of the data and interpretation that we should question!

Moretti, F. (2007)

Who or what “speaks”? Who or what “decides”?

CRISP-DM revisited: What’s misleading?

Summary: Can data speak for themselves?

• “all researchers are interpreters of data” (boyd & Crawford, 2012)
• This starts with the design decisions that determine what will be measured
• ... goes on with decisions about what is cleaned (What is a stopword? What is an outlier? ...)
• ... continues with the choice of models (e.g. the inductive bias of data-mining methods)
• ... and of course carries over into the interpretation of the results
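The cleaning step can be made concrete with a minimal sketch of how the choice of stopword list alone changes which words come out as "most frequent". The sentence and both stopword lists below are made up for illustration; neither is a standard list, and real pipelines involve many more such decisions.

```python
from collections import Counter

# Toy text and stopword lists: all hypothetical, chosen only for illustration.
TEXT = "the data do not speak the researcher speaks for the data not the data"

def top_words(text, stopwords, n=2):
    """Return the n most frequent tokens after removing stopwords."""
    tokens = [t for t in text.lower().split() if t not in stopwords]
    return [word for word, _ in Counter(tokens).most_common(n)]

NO_LIST = set()
SHORT_LIST = {"the", "for", "do"}
LONG_LIST = {"the", "for", "do", "not"}  # this list treats negation as noise

for label, stopwords in [("none", NO_LIST), ("short", SHORT_LIST), ("long", LONG_LIST)]:
    print(label, top_words(TEXT, stopwords))
```

With the longer list the negation disappears from the result entirely, although nothing in the text has changed. The decision of what counts as noise was made by the researcher, not by the data.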


Empiricism and apophenia


Empiricism and apophenia: correlation, causation, and instrumentality

“Correlation replaces causation”: Business logic and prediction vs. explanation
... In addition, this slide IS a book recommendation! Read Dave Eggers, The Circle.

... but can we explain German history like this? (Thanks to Christiane Fellbaum for leading me up to this example)

Business logic and prediction vs. “Law is a framework aimed to regulate life in society”

• After 9/11, data companies began expanding into the counterterrorism business.
• They lobbied to convince politicians of data mining’s potential value for counterterrorism purposes.
  ▫ E.g., Robert O’Harrow, Jr. (2005). No Place to Hide, pp. 56–63, cited in Solove (2010). Nothing to Hide.
• This has given rise to texts such as (from Mayer-Schönberger & Cukier (2013). Big Data): “The promise of big data is that we do what we’ve been doing all along – profiling – but make it better, less discriminatory, and more individualized. That sounds acceptable if the aim is simply to prevent unwanted actions. But it becomes very dangerous if we use big-data predictions to decide whether somebody is culpable and ought to be punished for behaviour that has not yet happened.”
• What’s wrong with this passage? Pretty much everything, I think. See (Berendt, 2015)

Bigger data are not always better data

• All data (big or not) are representations and samples
  ▫ E.g. “Twitter users” ≠ (all) people
• Social scientists know this and have methods to correct for it (or: to assess the impact)
• The Big Data assumption that “more data compensates for these factors” is wrong!
• Commercial interests behind data provision, privacy concerns, ... we don’t even know what the sampling method and the biases are!
• Combining datasets makes the problem worse
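The point that more data does not compensate for a biased sample can be illustrated with a small simulation. This is a sketch under invented assumptions: the two rates below are hypothetical numbers, and the "platform" stands in for any data source whose users differ systematically from the population of interest.

```python
import random

def estimate_support(sample_size, rate_among_sampled, rng):
    """Share of 'yes' answers in a sample drawn only from platform users."""
    hits = sum(rng.random() < rate_among_sampled for _ in range(sample_size))
    return hits / sample_size

POPULATION_RATE = 0.50  # assumed true support in the whole population
PLATFORM_RATE = 0.70    # assumed support among people who use the platform

rng = random.Random(0)  # fixed seed so the run is repeatable
estimates = {
    n: estimate_support(n, PLATFORM_RATE, rng) for n in (100, 10_000, 100_000)
}
for n, est in estimates.items():
    error = est - POPULATION_RATE
    print(f"n={n:>7}: estimate={est:.3f}  error vs. population={error:+.3f}")
```

As the sample grows, the estimate settles ever more tightly around the platform rate, not the population rate. The random noise shrinks; the bias stays exactly where it was.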


Statistics doesn’t change just because of computers (or big datasets)

• Let’s say you predict “politician X will win” because you found, with sentiment analysis, that 60% rate her positively and 40% negatively.
• What if the sentiment-analysis method has ~70% accuracy?
• What if it builds on a prior
  ▫ language detection
  ▫ named-entity recognition
  ▫ ...
  with x < 100% accuracy?
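The questions above can be worked through with a little arithmetic. The sketch below assumes a two-class classifier described by a true-positive rate `tpr` and a false-positive rate `fpr`, and it assumes that pipeline stages fail independently and that an error at any stage spoils the final label. These are simplifying assumptions made for illustration, and the stage accuracies are hypothetical.

```python
def implied_true_share(observed_share, tpr, fpr):
    """Solve observed = p * tpr + (1 - p) * fpr for the true positive share p."""
    return (observed_share - fpr) / (tpr - fpr)

def accuracy(true_share, tpr, fpr):
    """Overall accuracy of the classifier for a given true positive share."""
    return true_share * tpr + (1 - true_share) * (1 - fpr)

def pipeline_accuracy(stage_accuracies):
    """Chance that every stage is right, assuming independent failures."""
    result = 1.0
    for stage in stage_accuracies:
        result *= stage
    return result

# Case A: symmetric errors (70% correct on both classes).
share_a = implied_true_share(0.60, tpr=0.70, fpr=0.30)
# Case B: asymmetric errors that also add up to 70% overall accuracy.
share_b = implied_true_share(0.60, tpr=0.80, fpr=0.40)

print(share_a, accuracy(share_a, 0.70, 0.30))
print(share_b, accuracy(share_b, 0.80, 0.40))
# Hypothetical accuracies: language detection, entity recognition, sentiment.
print(pipeline_accuracy([0.95, 0.90, 0.70]))
```

Both cases describe a method with 70% accuracy and an observed 60% positive share, yet the implied true support is about 75% in case A and about 50% (a dead heat) in case B. Chaining the three hypothetical stages leaves roughly a 60% chance that a given label is right end to end.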


Parking lot science

Parking lot science: examples?!

• Restrictions on search in Twitter, ...
  Research focus on current and recent events?!
• “Trending topics” algorithm in Twitter based on burstiness
  Suppression of persistent topics?!
• What about Facebook’s EdgeRank?
  ▫ e.g. Zeynep Tufekci’s observations after the Ferguson shooting
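Why a burstiness-based ranking may push persistent topics out of view can be sketched with a toy score. The ratio below is a deliberately simple stand-in and an assumption of this example; it is not Twitter's actual algorithm, and the topic names and counts are invented.

```python
def burst_score(history, current):
    """Ratio of the current count to the historical mean (with +1 smoothing)."""
    baseline = sum(history) / len(history)
    return (current + 1) / (baseline + 1)

# Hypothetical hourly mention counts for two topics.
topics = {
    "persistent_topic": {"history": [5000, 5100, 4900, 5000], "current": 5050},
    "sudden_topic": {"history": [10, 12, 8, 10], "current": 400},
}

ranking = sorted(topics, key=lambda t: burst_score(**topics[t]), reverse=True)
for name in ranking:
    print(name, topics[name]["current"], round(burst_score(**topics[name]), 2))
```

The persistent topic has more than ten times as many mentions in the current hour, but because it is always discussed its score stays close to 1, so the sudden topic ranks first. A researcher who samples only what is "trending" inherits this preference for novelty over volume.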

What are we allowed to do with data?

Is it legal to use social-media data?

• No simple answer, different opinions
• Main issues: copyright, privacy
• Privacy
  ▫ Purpose limitation as a central principle
  ▫ Reasonable expectations of privacy
• But: Terms and conditions / consent + contract
  ▫ E.g. Twitter: “a Tweet […] is a message of 140 characters or less that is public by default”
• Conflict between fundamental rights

Is it ethical to use [specific] data?

• Just because it’s accessible doesn’t mean it’s ethical.
• Shifting ethics?
• (1) without consent: “Fuck the EU” (from an intercepted telephone conversation between two US diplomats)
  ▫ WHAT caused the public rage? (Pauen & Welzer, 2015)
• “We are facing the threat of totalitarianism without uniform.” (Pauen & Welzer, 2015)

New forms of data procurement with explicit consent

• Citizen science
• Crowdsourcing (e.g. Mechanical Turk)
• “Donated data” (quantified self, tachographs ...)

• Democratic empowerment?
• Self-surveillance?
• Exploitation and neo-colonialism?

Charting a city’s language: a re-purposing without user consent
http://www.theguardian.com/news/datablog/interactive/2013/feb/21/twitter-languages-new-york-mapped

Let’s take a vote

Assuming the position of ...
• a modern person (e.g., the author)

... do you think that personal rights may be violated by the re-purposing?

What about those who can’t give consent any more? The case of dead people (a key area of research for DH!)

• Warning: I am not a historian ;-)
• Today’s view:
  ▫ Dead people are, primarily, dead.
  ▫ Limited scope and temporal decay of post-mortem personality rights (“the need for protection disappears in line with memory of the deceased increasingly fading away”, Bundesgerichtshof 1989)
• Medieval view:
  ▫ Dead people are, primarily, people.
  ▫ Memoria as a key social practice.
  ▫ Obligation of the clergy: pray for others.

Charting Europe – partially based on “tweets and retweets, 8th Century AD+”

Tweets and retweets, 8th Century AD+

“a rich source for prosopography and linguistic history of the early middle ages”

then | now
Prayer brotherhoods (societas fraternitatis) | social network between monasteries
Brotherhood books (libri confraternitatum) | articulated social network
Death roll (rotulae mortuorum) | thread
  (circulated through Europe for years, reaching a length of up to 30 meters), consisting of:
  Vita of the deceased monk (etc.) (encyclica) | first tweet
  Additions (tituli) | commenting retweets

Let’s take a vote (contd.)

Assuming the position of ...
• a modern person
• a medieval person

... do you think that personal rights may be violated by the re-purposing?
... do you think that God’s will may be violated by the re-purposing?

Who is doing research, and what else do they do?

Access and new digital divides?

• Who gets to do research?
  ▫ Social-media companies?
  ▫ Rich top-tier universities?
  ▫ Computer scientists?
  ▫ ... What about further demographics?
• “Digital Humanities work is only getting funded these days if it involves big infrastructure projects.”
  ▫ (from a conversation with a critical data scientist who has big infrastructure projects)

What if the researcher is also the service provider?

• “The Facebook experiment” (Kramer et al., 2014) manipulated the contents of nearly 700,000 users’ News Feeds to induce changes in their emotions.
• Basic question: If you hear happy stories from your friends, does this
  ▫ make you happy? (“emotional contagion”)
  ▫ make you miserable? (“social comparison”)
• This experiment was widely criticized on ethical grounds regarding informed consent.
• But it also has severe methodological flaws.

What if the service provider is also the news medium?

• Recall Twitter’s Trending topics, Facebook’s EdgeRank

“But keep in mind, Ferguson is also a net neutrality issue. It’s also an algorithmic filtering issue. How the internet is run, governed and filtered is a human rights issue.” (Zeynep Tufekci, 2014)

Thoughts about research paradigms

Alternative visions

• data-driven science (Kitchin, esp. 2014a)
  ▫ combines induction, deduction and abduction
  ▫ values theory & is aware of it
• ... and what about the digital humanities and computational social sciences? (Kitchin, esp. 2014b)
  ▫ Post-positivism in the humanities and the social sciences
  ▫ Critical theory: research is not neutral; positionality of the researcher; reflexivity of the process
• An STS approach:
  ▫ society / “the human”  technology
  ▫ (De Wolf, Vanderhoven, Berendt, Pierson, & Schellens, submitted)

What are our responsibilities? What can we do?
A personal view on this:

Berendt, Büchler, & Rockwell (2015). Is it research or is it spying? Thinking-through ethics in Big Data AI and other knowledge sciences.


Thank you!

I’ll be more than happy to hear your ...s?

General discussion (just my non-edited notes)

• Annotation tools
• Where to save the research data
• Can we trust the tools? Who gets access?
• Customized solutions: talk to a CSer
• Positionality is there
• What can be automated

References

• Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired 16.07. Available at http://edge.org/3rd_culture/anderson08/anderson08_index.html
• Berendt, B. (2015). Big Capta, Bad Science? On two recent books on “Big Data” and its revolutionary potential. http://people.cs.kuleuven.be/~bettina.berendt/Reviews/BigData.pdf
• Berendt, B., Büchler, M., & Rockwell, G. (2015). Is it research or is it spying? Thinking-through ethics in Big Data AI and other knowledge sciences. Künstliche Intelligenz, 29(2), 223-232.
• boyd, d. & Crawford, K. (2012). Critical questions for Big Data. Information, Communication & Society, 15:5, 662-679, DOI: 10.1080/1369118X.2012.678878.
• De Wolf, R., Vanderhoven, E., Berendt, B., Pierson, J. & Schellens, T. (submitted). Self-reflection in privacy research on social network sites.
• Kitchin, R. (2014a). The Data Revolution. Big Data, Open Data, Data Infrastructures & Their Consequences. London: Sage.
• Kitchin, R. (2014b). Big Data, new epistemologies and paradigm shifts. Big Data & Society, April-June 2014, 1-12.
• Kramer, A., Guillory, J., & Hancock, J. (2014). Experimental evidence of massive-scale emotional contagion through social networks. Proceedings of the National Academy of Sciences 111, 8788-8790. http://www.pnas.org/content/111/24/8788.full.pdf+html
• Moretti, F. (2005). Graphs, Maps, Trees. Abstract Models for Literary History. London: Verso (cited from the paperback published in 2007, p. 30).
• Pauen, M. & Welzer, H. (2015). Autonomie: Eine Verteidigung [Autonomy: A Defence]. Frankfurt am Main: S. Fischer Verlag.
• Tufekci, Z. (2014). What Happens to #Ferguson Affects Ferguson: Net Neutrality, Algorithmic Filtering and Ferguson. https://medium.com/message/ferguson-is-also-a-net-neutrality-issue-6d2f3db51eb0

More sources

• Please find the URLs of pictures and screenshots in the PowerPoint “comment” box

•Thanks to the Internet for them!

