Date post: 25-May-2018
CDS 151 — Spring 2013 — Data Ethics in an Information Society

Lecture 2: Introduction to Data Ethics

Outline

Reading Assignment & Class Assignments

Introduction to Data & Data Ethics

Statistics: Use, Abuse, and Misuse


Reading Assignment

Weekly reading assignments are posted online at http://mymason.gmu.edu/

How to Lie with Statistics (D. Huff)

Last week’s assignment: Introduction and Chapter 1

This week: Chapters 2 and 3

Visual & Statistical Thinking: Displays of Evidence for Decision Making (E. R. Tufte) [no assignment this week]

On Being a Scientist: Responsible Conduct in Research (National Academy of Sciences) [free] [no assignment this week]

Class Assignments

1. Assignments in Blackboard – don’t forget!

2. On-going assignment (due date April 17) – submit a copy of your Training Completion Report. Choose one of these to complete:

a) Complete the RCR (Responsible Conduct of Research) training. After passing the exams, submit a copy of the Completion Report for up to 25% of course grade: http://oria.gmu.edu/ethical-conduct-of-research/responsible-conduct-of-research-education/responsible-conduct-of-research-training-plan/

or

b) Complete the HSR (Human Subjects Research) training. After passing the exams, submit a copy of the Completion Report for up to 25% of course grade: http://oria.gmu.edu/research-with-humans-or-animals/institutional-review-board/human-subjects-training/


We are now facing a huge problem!

The Data Tsunami

The Data Flood is Everywhere!

Huge quantities of data are being generated in all business, government, and research domains: banking, retail, marketing, telecommunications, health, homeland security, computer networks, social networks, business transactions, scientific data (genomics, astronomy, physics, etc.), Web, text, and e-commerce.

How much data are there?

There are a lot! Data volume roughly doubles every year.

So, how do we measure them?

Note: “Data” is a plural noun (many items); “datum” is the singular (one item).

Measuring Data Quantities

Unit       Decimal size   Binary size   Rough equivalent
Byte       8 bits         1             one character (A, B, C...); one bit = 0/1 or Y/N or T/F
Kilobyte   1000 bytes     2^10          half a page of text
Megabyte   10^6 bytes     2^20          small digital photo, or small book, or 3.5-inch diskette
Gigabyte   10^9 bytes     2^30          DVD with broadcast quality movie, or 2 CDs
Terabyte   10^12 bytes    2^40          50,000 trees made into paper and printed into text
Petabyte   10^15 bytes    2^50          all U.S. academic research libraries
Exabyte    10^18 bytes    2^60          all words ever spoken by human beings throughout all of history

… followed by Zettabytes, Yottabytes, Brontobytes … http://www.whatsabyte.com/
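The table pairs each decimal size with a nearby power of two, and the two are not equal. A minimal Python sketch of how far apart they are is given here; the names `UNITS` and `unit_sizes` are made up for this illustration and do not come from the lecture.

```python
# Compare each decimal (power-of-ten) unit with the nearby power of two.
# UNITS and unit_sizes are hypothetical names used only for this sketch.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]

def unit_sizes():
    """Return (name, decimal_bytes, binary_bytes, percent_gap) for each unit."""
    rows = []
    for i, name in enumerate(UNITS, start=1):
        decimal = 10 ** (3 * i)   # 10^3, 10^6, ..., 10^18
        binary = 2 ** (10 * i)    # 2^10, 2^20, ..., 2^60
        gap = 100 * (binary - decimal) / decimal
        rows.append((name, decimal, binary, gap))
    return rows

rows = unit_sizes()
for name, decimal, binary, gap in rows:
    print(f"{name:9s} decimal={decimal:,d}  binary={binary:,d}  gap={gap:.1f}%")
```

The gap grows with each step, from 2.4% at the kilobyte to about 15% at the exabyte, so the power-of-two column is best read as an approximation of the decimal one.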


So … how much data are there?

UC Berkeley 2003 estimate: 5 exabytes* created in 2002

http://www.sims.berkeley.edu/research/projects/how-much-info-2003/

Updated … 2008 estimate by IDC.com: 1800 exabytes* (1.8 zettabytes) estimated for 2011

http://www.emc.com/collateral/analyst-reports/diverse-exploding-digital-universe.pdf

* 1 exabyte = 1000 petabytes = 1 million terabytes = 1 billion gigabytes!
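As a rough check on the claim that data volume doubles every year, the two estimates above can be turned into an implied annual growth factor. This is only a sketch: the two figures come from different studies with different methods, so treating them as points on one curve is an assumption, and the function names are hypothetical.

```python
import math

def annual_growth_factor(start_amount, end_amount, years):
    """Constant yearly multiplier that takes start_amount to end_amount."""
    return (end_amount / start_amount) ** (1 / years)

def doubling_time_years(factor):
    """Years needed to double at a constant yearly growth factor."""
    return math.log(2) / math.log(factor)

# 5 exabytes in 2002 and 1800 exabytes in 2011, as quoted above.
factor = annual_growth_factor(5, 1800, 2011 - 2002)
doubling = doubling_time_years(factor)
print(f"implied growth factor per year: {factor:.2f}")
print(f"implied doubling time: {doubling:.2f} years")
```

Under that assumption the growth factor comes out a little below 2 per year, which is consistent with "doubles every year" as a rule of thumb.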

How much is that?

2 zettabytes = about 4 trillion CDs of data

4 trillion CDs are hard to imagine …

So, try to visualize 1/7,000,000th of that amount …


Here is a sea of CDs …

The CD Sea in Kilmington, England

(600,000 CDs)
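The CD arithmetic can be reproduced with a few lines of Python. The slides do not say which CD capacity was used, so the 500, 650, and 700 MB values below are assumptions chosen for illustration, and `cds_needed` is a hypothetical helper.

```python
ZETTABYTE = 10 ** 21  # bytes
MEGABYTE = 10 ** 6    # bytes

def cds_needed(total_bytes, cd_capacity_bytes):
    """Number of CDs required to hold total_bytes."""
    return total_bytes / cd_capacity_bytes

total = 2 * ZETTABYTE
for capacity_mb in (500, 650, 700):  # assumed capacities, not from the slides
    count = cds_needed(total, capacity_mb * MEGABYTE)
    print(f"{capacity_mb} MB per CD -> {count / 1e12:.1f} trillion CDs")

# One 7,000,000th of the slide's 4 trillion CDs.
slide_fraction = 4e12 / 7_000_000
print(f"1/7,000,000 of 4 trillion CDs is about {slide_fraction:,.0f} CDs")
```

The figure of about 4 trillion CDs corresponds to roughly 500 MB per disc; with a 650 or 700 MB disc the count is closer to 3 trillion. Either way, one 7,000,000th of the total is on the order of the 600,000 CDs in the photograph.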

Data Ethics in an Information Society

With so much data and information out there, it is imperative for each one of us to …

Protect the rights of the owners of the data and information (infodata)

Protect your infodata from thieves

Protect the integrity of your infodata from corruption (intentional or accidental)

Deter criminals who would steal your infodata

Use infodata correctly (do not abuse or misuse your infodata)

… act in an ethical manner at all times …


Data Ethics in upcoming lectures

We will define Ethics, and how it appears in human society:

Principles, Policies, Regulations, and Laws

The Belmont Report … FERPA, HIPAA, ….

We will investigate some simple life examples where things can go wrong with data:

Data privacy – who owns my data anyway?

Information security – protecting your data from computer criminals (and others)

Misunderstanding statistics

Telling lies with statistics: “There are 3 types of lies – lies, damned lies, and statistics!”


Quote attributed to H.G. Wells (1903) …

“Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”

Well, that day is here now!


Famous & Infamous Quotes

“There are three kinds of lies: lies, damned lies, and statistics.” (attributed to Benjamin Disraeli)

“It is now beyond any doubt that cigarettes are the biggest cause of statistics.”

“If your experiment needs statistics, you ought to have done a better experiment.” (attributed to Ernest Rutherford)

“The Lottery is a tax on people who are bad at math.”

“Statistics in the hands of an engineer are like a lamppost to a drunk – they're used more for support than for illumination.”


Other Quotes – Abusing Statistics

“42.7% of all statistics are made up on the spot.” (Steven Wright, comedian)

“Say you were standing with one foot in the oven and one foot in an ice bucket. Then according to the percentage people, you should be perfectly comfortable.”

“Then there is the man who drowned crossing a stream that had an average depth of six inches.”

“A man may have 21 meals on Sunday and no meals for the rest of the week, making a perfect average of three meals per day, but that is not a good way to live.”
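The last two quotes make the same point: an average says nothing about how the values are spread. A small Python sketch follows, using the meal counts from the quote and a set of stream depths invented for this example (only their six-inch average comes from the quote).

```python
from statistics import mean, median

# Meals per day for one week, Sunday first, as in the quote above.
meals_per_day = [21, 0, 0, 0, 0, 0, 0]

# Hypothetical depths in inches at ten points across a stream.
# These numbers are invented; they were chosen to average six inches.
stream_depths = [1, 1, 1, 1, 2, 2, 2, 2, 2, 46]

meal_mean = mean(meals_per_day)
meal_median = median(meals_per_day)
depth_mean = mean(stream_depths)
depth_max = max(stream_depths)

print(f"meals: mean = {meal_mean}, median = {meal_median}")
print(f"stream: mean depth = {depth_mean} in, deepest point = {depth_max} in")
```

Reporting the median, the maximum, or the full range next to the mean is one simple way to avoid misleading a reader with a "perfect average".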


“Say what?...”

“Global warming, earthquakes, hurricanes, and other natural disasters are a direct effect of the shrinking numbers of pirates since the 1800s.”

Correlation does not imply causation!

“The leading cause of divorce must be marriage, since we find that 100% of divorced couples were married first.”

“Our education system must be really bad since half of the students in this country scored below average on their SAT tests.”
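The pirate example can be mimicked with made-up numbers. In the sketch below every value is invented for illustration (none of it is real pirate or temperature data); the point is only that two series which both change steadily over time will correlate strongly with each other, with time acting as the lurking variable.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented series, one value every 20 years from 1820 to 2000.
years = list(range(1820, 2001, 20))
pirates = [45000, 40000, 36000, 30000, 25000, 21000, 15000, 9000, 5000, 2000]
temperature = [14.0, 14.1, 14.1, 14.3, 14.4, 14.4, 14.6, 14.8, 14.8, 15.0]

r_pirates_temp = pearson_r(pirates, temperature)
r_year_pirates = pearson_r(years, pirates)
r_year_temp = pearson_r(years, temperature)

print(f"pirates vs temperature: r = {r_pirates_temp:.2f}")
print(f"year vs pirates:        r = {r_year_pirates:.2f}")
print(f"year vs temperature:    r = {r_year_temp:.2f}")
```

A strong negative correlation between the two series appears even though neither was constructed to depend on the other; both simply follow the calendar.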


Small Group Exercise: Answer this…

Suppose that a survey finds that 10% of people believe that product X is bad for you.

After a national advertising campaign to inform society of the dangers of product X, another survey is taken.

The national media report the survey result:

“Following the national advertising campaign, the number of people who now believe that product X is bad for you has increased by 90%.”

Your question: What percentage of people now believe that product X is bad for you?

What answer did you get?
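Different groups often reach different answers because "increased by 90%" can be read two ways. The Python sketch below works through both readings; the function names are hypothetical, and which reading the media report intended is exactly what the exercise asks you to question.

```python
def relative_increase(base_percent, increase_percent):
    """New share when the base grows BY a relative percentage of itself."""
    return base_percent * (1 + increase_percent / 100)

def percentage_point_increase(base_percent, points):
    """New share when the base grows by a number of percentage points."""
    return base_percent + points

# Reading 1: the number of believers grew by 90% of its earlier size.
as_relative = relative_increase(10, 90)

# Reading 2: the share of believers rose by 90 percentage points.
as_points = percentage_point_increase(10, 90)

print(f"relative reading:         {as_relative:.0f}% now believe it")
print(f"percentage-point reading: {as_points:.0f}% now believe it")
```

As worded, the report describes a relative increase in the number of people, so the first reading (10% rising to 19%) matches the statement, assuming the surveyed population is unchanged. A listener who hears "90%" and stops there can come away with a very different impression.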


Statistical Concepts – Ethical Concerns

1. Biased sample

2. Insufficient sample

3. Correlation does not imply causation

4. Confounding factors (or Lurking Variables)

5. Subjective inference vs. Objective inference from data

Reference: http://www.lsat-center.com/lsatc4s3b.htm


We will briefly examine these concerns for now, but we will give detailed examples in a future lecture.

Statistical Concepts – Ethical Concerns

1. Biased sample – was the sample chosen fairly, so that all possible outcomes are really possible?

2. Insufficient sample – was the sample large enough to justify a statistically significant conclusion?

3. Correlation does not imply causation – implying that one thing caused the other can be very misleading.

4. Confounding factors (or Lurking Variables) – these are extraneous (usually unknown, ignored, or invisible) factors that affect the outcome of a survey.

5. Subjective inference vs. Objective inference from data – is the conclusion presented by the authors biased, or is it clearly supported by the data?
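The first two concerns can be illustrated with a small simulation. Everything in this Python sketch is an assumption made for illustration: the population, its two groups, and their "yes" rates are invented, and the fixed seed only makes the run repeatable.

```python
import random
from statistics import mean, pstdev

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: group A answers "yes" 50% of the time, group B 10%.
group_a = [1] * 25_000 + [0] * 25_000
group_b = [1] * 5_000 + [0] * 45_000
population = group_a + group_b  # true "yes" rate is 30%

def estimate(people, n):
    """Share of 'yes' answers in a simple random sample of size n."""
    return mean(rng.sample(people, n))

def spread(people, n, repeats=200):
    """Standard deviation of the estimate across repeated samples."""
    return pstdev([estimate(people, n) for _ in range(repeats)])

fair_estimate = estimate(population, 1000)   # sampled from everyone
biased_estimate = estimate(group_a, 1000)    # only group A was surveyed
spread_small = spread(population, 10)        # insufficient sample
spread_large = spread(population, 1000)

print(f"fair sample of 1000:   {fair_estimate:.2f} (true rate 0.30)")
print(f"biased sample of 1000: {biased_estimate:.2f}")
print(f"spread with n=10:      {spread_small:.3f}")
print(f"spread with n=1000:    {spread_large:.3f}")
```

A large sample does not rescue a biased one: surveying only group A gives a stable but wrong answer. A fair but tiny sample has the opposite problem, since its estimate swings widely from one survey to the next.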


Final Comments:

Complete your assignments on MyMason.gmu.edu

Complete your Reading Assignment
