Date post: 29-Dec-2015
Ethics considerations for Corpus Linguistics studies using internet resources Ansgar Koene, Svenja Adolphs, Elvira Perez, Christopher J. Carter, Ramona Statache, Claire O’Malley, Tom Rodden, and Derek McAuley HORIZON Digital Economy Research, University of Nottingham
Transcript
Page 1

Ethics considerations for Corpus Linguistics studies using internet resources

Ansgar Koene, Svenja Adolphs, Elvira Perez, Christopher J. Carter, Ramona Statache, Claire O’Malley, Tom Rodden, and Derek McAuley

HORIZON Digital Economy Research, University of Nottingham

Page 2

Rising popularity of public and semi-public online communication channels:

Written form: Blogs (late 1990s); Wikipedia (2001); Facebook (2004); Reddit (2005); Twitter (2006)

Spoken form: Podcasts (2004); YouTube (2005)

Low effort and cost of data collection

Unobtrusive ‘behind the scenes’ data collection using application programme interfaces (APIs) or web scraping techniques (e.g. Twitter; Blogs)

Growing appeal of the Web as source of corpus data

Page 3

Section 2.9 ‘Internet research’:

In the case of an open-access site, where contributions are publicly archived, and informants might reasonably be expected to regard their contributions as public, individual consent may not be required. In other cases it normally would be required.

OK, easy: unless a site blocks access (e.g. password required), no consent is needed from the observed public. … Or is it?

BAAL Recommendations on Good Practice in Applied Linguistics (2006)

Page 4

Online data collection is often undetectable to the public (i.e. covert) unless they are explicitly informed about it.

Section 2.5 ‘covert research’:

Observation in public places is a particularly problematic issue.

If observations or recordings are made of the public at large, it is not possible to gain informed consent from everyone. However, post-hoc consent should be negotiated if the researcher is challenged by a member of the public.

Unless explicitly informed about the data collection, the public has no chance to challenge and demand post-hoc consent.

BAAL Recommendations on Good Practice in Applied Linguistics (2006)

Page 5

Section 2.5 ‘covert research’ (concluding part):

A useful criterion by which to judge the acceptability of research is to anticipate or elicit, post hoc, the reaction of informants when they are told about the precise objectives of the study. If anger or other strong reactions are likely or expressed, then such data collection is inappropriate.

Researchers should, at the end of the data collection period, post a message about the research, offering some form of ‘opt-out’ procedure for any participant who wishes to do so.
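One way to honour such an opt-out request in practice is to drop every contribution by an opted-out participant before the corpus is analysed. The following is only a minimal sketch under stated assumptions: the `Post` record, its `author_id` field and the `remove_opted_out` helper are hypothetical names introduced for illustration and are not part of the BAAL recommendations or of any particular corpus tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Post:
    """Hypothetical corpus record: one contribution by one participant."""
    author_id: str  # assumed stable identifier for the participant
    text: str


def remove_opted_out(posts: Iterable[Post], opted_out: Set[str]) -> List[Post]:
    """Return only the posts whose author has not asked to opt out."""
    return [post for post in posts if post.author_id not in opted_out]


# Illustrative data only; not taken from any real study.
corpus = [Post("u1", "hello"), Post("u2", "hi there"), Post("u1", "bye")]
cleaned = remove_opted_out(corpus, opted_out={"u1"})
```

Filtering by author rather than by individual post reflects the wording ‘any participant who wishes to do so’; a real study would also need to remove the data from any derived files already produced.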

BAAL Recommendations on Good Practice in Applied Linguistics (2006)

Page 6

When is online information ‘public’, so that it can be “freely quoted and analyzed […] without consent”? [Bruckman, 2002]

• It is officially, publicly archived

• No password is required for access

• No site policy prohibits it

• The topic is not highly sensitive.
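The four criteria above can be read as a simple checklist. The sketch below encodes them as such, purely to make the logic explicit; the `OnlineSource` class, its field names and both functions are hypothetical, and each field still has to be judged by a researcher. A passing result is not a substitute for the ethical judgement discussed in these slides.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OnlineSource:
    """Hypothetical description of a source, filled in by the researcher."""
    publicly_archived: bool       # officially, publicly archived
    password_required: bool       # a password is required for access
    policy_prohibits_use: bool    # a site policy prohibits quoting/analysis
    highly_sensitive_topic: bool  # the topic is highly sensitive


def failed_criteria(source: OnlineSource) -> List[str]:
    """List which of the four criteria the source does not meet."""
    failures = []
    if not source.publicly_archived:
        failures.append("not officially, publicly archived")
    if source.password_required:
        failures.append("password required for access")
    if source.policy_prohibits_use:
        failures.append("site policy prohibits it")
    if source.highly_sensitive_topic:
        failures.append("topic is highly sensitive")
    return failures


def meets_all_criteria(source: OnlineSource) -> bool:
    """True only if all four criteria are satisfied."""
    return not failed_criteria(source)
```

Returning the list of failed criteria, rather than a bare yes/no, keeps the reasons visible for an ethics application.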

With Google-caching, retweeting, ‘Like’ buttons etc. what is the true meaning of “officially, publicly archived”?

Software default settings produce publicly accessible archives without users making a conscious decision (e.g. Blogs)

Public – Private distinction

Page 7

“… where contributions are publicly archived, and informants might reasonably be expected to regard their contributions as public, individual consent may not be required.” [section 2.9, BAAL Recommendations on Good Practice in Applied Linguistics]

• It is not always easy to determine which online spaces people perceive as ‘private’ or ‘public’.

• Participants may consider their publicly accessible internet activity to be private despite agreeing to the site User License Agreements.

• Communication may have been private when it was first conducted, even if it is now publicly available.

Public – Private expectation

Page 8

Public/Private: People may operate in public spaces but maintain strong perceptions or expectations of privacy.

The substance of their communication may be public, but the context in which it appears implies restrictions on how that information is -- or ought to be -- used.

Social, academic, or regulatory delineations of public and private as a clearly recognizable binary no longer holds in everyday practice.

AoIR Ethics Working Committee (version 2.0)

Page 9

Communication on the Internet has characteristics that are different from communication in other channels (Boyd, 2008):

Persistence: postings on the Internet are automatically registered and stored;

Replicability: content in digital form can be duplicated without cost;

Invisible audiences: we do not know who sees our postings.

Searchability: content in the networked public sphere is very easily accessible by conducting a search.

People therefore do not have an intuitive sense about the level of privacy that they should expect from internet communication

Factors affecting user behaviour and expectations of privacy

Page 10

Consent was given when the “Terms and Conditions” of the site were click-signed.

T&Cs are rarely read

T&Cs are too vague and incomprehensible to gain true informed consent (Luger, 2013)

Having other people read your conversation is different from having your past conversations made into a corpus for analysis.

In a climate where ethically questionable social media analytics for commercial and security gain are increasing, academia has a responsibility to enter into the discussion of what constitutes good, ethical conduct.

Responsibility when dealing with online communication

Page 11

• Purpose: obtain first-hand data concerning conditions under which participants would be willing to consent to having their data used for research purposes

• Targeted at a wide cross-section of the population

• How do conditions for consent change as a function of:

  • Participant demographics

  • The type of social network platform

  • The type of organization doing the study

  • The type of question being studied

http://casma.wp.horizon.ac.uk/casma-projects/ccasmd

Questionnaire study regarding conditions for consent

Page 12

‘Respect for the autonomy and dignity of persons’, e.g. privacy, is not the only factor determining ethics of data collection or analysis.

• Scientific value

• Social responsibility

• Maximizing benefits and minimizing harm

Research conducted to prevent socially unacceptable behaviour, e.g. bullying, may require using data from perpetrators without their consent.

The greater good

Page 13

• There is no binary divide between private and public.

• Expectations of privacy differ from official site policy.

• The ‘public’ nature of a platform does not provide a carte blanche for accessing the data hosted on it.

• Maximize transparency of research as much as possible.

• If opt-in is not possible, opt-out should be offered.

• If contacting all subjects is not possible, at least contact some to get a sense of the subjective response to the study and methods.

Conclusions

Page 14

Thank you for your attention


Project blog: http://casma.wp.horizon.ac.uk

Consent survey: http://casma.wp.horizon.ac.uk/casma-projects/ccasmd

Twitter: @CaSMaResearch

