2003.08.28 - SLIDE 1 - IS 202 - FALL 2003

Lecture 02: Information

Prof. Ray Larson & Prof. Marc Davis

UC Berkeley SIMS

Tuesday and Thursday 10:30 am - 12:00 pm

Fall 2003
http://www.sims.berkeley.edu/academics/courses/is202/f03/

IS 202: Information Organization and Retrieval

SLIDE 2

Lecture Outline

• What Is Information?

• History of Information Search and Organization

• Discussion Questions

• Action Items for Next Time

SLIDE 4

What is Information?

• There is no “correct” definition

• Can involve philosophy, psychology, signal processing, physics

• Cookie Monster’s definition:
  – “news or facts about something”

SLIDE 5

What is Information?

• Oxford English Dictionary
  – Information
    • Informing, telling; thing told, knowledge, items of knowledge, news
  – Knowledge
    • Knowing, familiarity gained by experience; person’s range of information; a theoretical or practical understanding of; the sum of what is known

SLIDE 6

Assignment 1 - Discussion

• What is information, according to your background or area of expertise?

SLIDE 7

What Is Information?

• Relating data to a context (“situational interpretation”)

• Anything that is important to anyone (“significance”)

• World → data → information → knowledge

• Requires community of interpretation

• All information is dependent on context

• Capable of being recorded and stored and transmitted (also in physical form – e.g., fossils)

• Information must be recorded

• Information is a record of something that can be reused

• Information is a commodity

SLIDE 8

What Is Information?

• Negentropy

• Potential energy to become knowledge

• Potential for it to be built upon

• Does information have to be related to “true” data?

• Can information be downgraded to data if it is forgotten?

SLIDE 9

Types of Information

• Differentiation by form

• Differentiation by content

• Differentiation by quality

• Differentiation by associated information

SLIDE 10

Information Properties

• Information can be communicated electronically
  – Broadcasting
  – Networking

• Information can be easily duplicated and shared
  – Problems of ownership
  – Problems of control

Adapted from ‘Silicon Dreams’ by Robert W. Lucky

SLIDE 11

Intuitive Notion (Losee 97)

• Information must
  – Be something, although the exact nature (substance, energy, or abstract concept) is not clear
  – Be “new”: repetition of previously received messages is not informative
  – Be “true”: false or counterfactual information is “mis-information”
  – Be “about” something

• This human-centered approach emphasizes meaning and use of message

SLIDE 12

Information from the Human Perspective

• Levels in cognitive processing
  – Perception
  – Observation/attention
  – Reasoning, assimilating, forming inferences

• Knowledge
  – “Justified true belief”

• Belief
  – An idea held based on some support; an internally accepted statement, result of inductive processes combining observed facts with a reasoning process

SLIDE 13

Information from the Human Perspective

• Does information require a human mind?
  – Communication and information transfer among ants
  – A tree falls in the forest … is there information there?
  – Existence of quarks

SLIDE 14

Meaning vs. Form

• Form of information as the information itself

• Meaning of a signal vs. the signal itself
  – What aspects of a document are information?

• Representation (Norman 93)
  – Why do we write things down?
    • Socrates thought writing would obliterate serious thought
    • Sounds and gestures fade away
  – Artifacts help us to reason
  – Anything not present in the representation can be ignored
  – Things left out of the representation are often what we don’t know how to represent

SLIDE 15

Information

• Consider Borges’ infinite Library of Babel…
  – It has all possible data combinations of letters
  – Does it therefore contain all possible information?
  – What about all possible knowledge?
  – What about wisdom?

• Is the Internet a prototype Library of Babel?
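
To get a feel for the scale of “all possible combinations of letters,” the short Python sketch below counts the volumes such a library would hold. The book dimensions (410 pages, 40 lines per page, 80 characters per line, 25 symbols) are taken from Borges' story rather than from this slide, and the function name is made up for illustration.

```python
import math

# Book format as described in Borges' story (an assumption here;
# the slide itself gives no numbers).
PAGES_PER_BOOK = 410
LINES_PER_PAGE = 40
CHARS_PER_LINE = 80
ALPHABET_SIZE = 25


def count_digits_of_library_size(alphabet_size: int, chars_per_book: int) -> int:
    """Return the number of decimal digits in alphabet_size ** chars_per_book.

    Logarithms are used so the huge number itself never has to be printed.
    """
    return math.floor(chars_per_book * math.log10(alphabet_size)) + 1


chars_per_book = PAGES_PER_BOOK * LINES_PER_PAGE * CHARS_PER_LINE
digits = count_digits_of_library_size(ALPHABET_SIZE, chars_per_book)

print(f"Characters per book: {chars_per_book:,}")
print(f"Distinct books: a number with {digits:,} decimal digits")
```

Under these assumptions the count is finite but far too large to search exhaustively, which is one way to frame the question of whether holding every combination is the same as holding information.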

SLIDE 16

Information Theory

• Claude Shannon, 1940s, studying communication

• Ways to measure information
  – Communication: producing the same message at its destination as that seen at its source
  – Problem: a “noisy channel” can distort the message

• Between transmitter and receiver, the message must be encoded

• Semantic aspects are irrelevant

[Figure: Message Source → Transmitter → Channel (with Noise) → Receiver → Destination]
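
One common way to “measure information” in Shannon's sense is entropy: the average number of bits needed per symbol, given how often each symbol occurs. The Python sketch below is a minimal illustration, assuming symbol probabilities are estimated from the message itself; the function name is hypothetical and nothing here comes from the course materials.

```python
import math
from collections import Counter


def shannon_entropy(message: str) -> float:
    """Average information per symbol, in bits, using symbol
    frequencies observed in the message as probability estimates."""
    if not message:
        return 0.0
    counts = Counter(message)
    total = len(message)
    # H = sum over symbols of p * log2(1 / p)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaaaaaa"))  # one symbol only: 0.0 bits per symbol
print(shannon_entropy("abababab"))  # two equally likely symbols: 1.0 bit
print(shannon_entropy("abcdefgh"))  # eight equally likely symbols: 3.0 bits
```

The measure depends only on symbol statistics, not on what the message means, which matches the slide's point that semantic aspects are irrelevant to the theory.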

SLIDE 17

Information Theory

• Better called “Technical Communication Theory”

• Communication may be over time and space

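[Figure: Source → Encoding → Channel (with Noise) → Decoding → Destination; a Message enters at the source and leaves at the destination]

[Figure: Source → Encoding (Writing/Indexing) → Storage → Decoding (Retrieval/Reading) → Destination; a Message enters at the source and leaves at the destination]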
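
The encode, channel, decode pipeline in the diagrams can be sketched in a few lines of Python. This is an illustrative toy, not anything from the lecture: it assumes a channel that flips each bit independently with some probability and uses a simple three-fold repetition code with majority voting; all function names are hypothetical.

```python
import random


def encode(bits, repeat=3):
    """Repetition code: send every bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]


def noisy_channel(bits, flip_prob, rng):
    """Flip each bit independently with probability `flip_prob`."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]


def decode(bits, repeat=3):
    """Majority vote over each group of `repeat` received bits."""
    return [
        1 if sum(bits[i:i + repeat]) * 2 > repeat else 0
        for i in range(0, len(bits), repeat)
    ]


rng = random.Random(0)  # fixed seed so a run is repeatable
message = [1, 0, 1, 1, 0, 0, 1, 0]
received = noisy_channel(encode(message), flip_prob=0.1, rng=rng)
print("recovered intact:", decode(received) == message)
```

Redundancy added by the encoder lets the decoder correct a single flipped bit per group; two or more flips in one group still corrupt the message.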

SLIDE 18

Human Communication Theory?

[Figure: Source → Encoding → Channel (with Noise) → Decoding → Destination; a Message enters at the source and leaves at the destination]

SLIDE 19

Communication Theory

• Encompasses a vast array of disciplines
  – Mass communications, literary and media theory, rhetoric, sociology, psychology, linguistics, law, cognitive science, information science, engineering, etc.

• Questions
  – What and how we communicate
  – Why we communicate
  – What happens when communication “works” and when it doesn’t
  – How to improve communication

SLIDE 20

Why Study Communication Theory?

• Our understanding of what, how, and why we communicate informs our
  – Theory of information and practice of information production
  – Analysis, design, and evaluation of information systems and applications
  – How we work together in teams
  – How we read texts and talk with one another in this course
  – Law and public policy

SLIDE 21

Etymology of “Communication”

• Communication - c.1384, from O.Fr. communicacion, from L. communicationem (nom. communicatio), from communicare "to impart, share," lit. "to make common," from communis (see common).

• Common - 13c., from O.Fr. comun, from L. communis "shared by all or many," from L. com- "together" + munia "public duties," those related to munia "office." Alternate etymology is that Fr. got it from P.Gmc. *gamainiz (cf. O.E. gemæne), from PIE *kom-moini "shared by all," from base *moi-, *mei- "change, exchange."

• Remuneration - c.1400, from L. remunerationem, from remunerari "to reward," from re- "back" + munerari "to give," from munus (gen. muneris) "gift, office, duty." Remunerative is from 1677.

SLIDE 22

What and How Do We Communicate?

• What “gifts” do we give each other?

• What do we do with these gifts?

• How does this gift exchange bring us together (or not)?

SLIDE 23

The Conduit Metaphor

• Language functions like a conduit, transferring thoughts bodily from one person to another

• In writing and speaking, people insert their thoughts or feelings in the words

• Words accomplish the transfer by containing the thoughts or feelings and conveying them to others

• In listening or reading, people extract the thoughts and feelings once again from the words

SLIDE 24

Conduit Metaphor: Minor Frameworks

• Thoughts and feelings are ejected by speaking or writing into an external “idea space”

• Thoughts and feelings are reified in this external space, so they exist independent of any need for living beings to think or feel them

• These reified thoughts and feelings may, or may not, find their way back into the heads of living humans

SLIDE 25

Toolmakers’ Paradigm

SLIDE 26

Semantic Pathology

• Semantic Pathology
  – “Whenever two or more incompatible senses capable of figuring meaningfully in the same context develop around the same name”

• Example
  – “This text is confusing.”
    • Text(1) = The layout/font of the text is confusing.
    • Text(2) = The argument of the text is confusing.
    • Question: Where is Text(2)?

SLIDE 27

Lecture Outline

• What Is Information?

• History of Information Search and Organization

• Discussion Questions

• Action Items for Next Time

SLIDE 28

Origins: Physical Representations

• Very early history of content representation
  – Sumerian tokens and “envelopes”
  – Alexandria - pinakes
  – Indices

SLIDE 29

Origins: Mental Representations

• Rhetorical mnemonic theory and practice (“memoria”)

• Memory palaces
  – An organization and retrieval technology for concepts that combines physical and virtual places (“loci”)

• Examples
  – Simonides of Ceos
  – Cicero’s “testes”

SLIDE 30

Origins: Bibliographic Representations

• Biblical indexes and concordances
  – Hugo de St. Caro, 1247 A.D.: 500 monks, KWOC
  – Book indexes (Nuremberg Chronicle)

• Library catalogs

• Journal indexes

• “Information explosion” following WWII
  – Bush and Memex
  – Cranfield studies of indexing languages and information retrieval
  – Development of bibliographic databases
    • Index Medicus – production and Medlars searching
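
A KWOC (keyword out of context) index lists each significant word as a heading, followed by the full entry it came from. The Python sketch below shows the idea on a few sample titles; the stop-word list, the sample titles, and the function name are all assumptions chosen for illustration, not part of the lecture.

```python
from collections import defaultdict

# Words too common to be useful as index headings (an assumed, minimal list).
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwoc_index(titles):
    """Map each keyword to the titles containing it, sorted by keyword."""
    index = defaultdict(list)
    for title in titles:
        # A set ensures a title is listed once per keyword even if the word repeats.
        words = {w.strip(".,:;!?\"'()").lower() for w in title.split()}
        for word in words - STOP_WORDS:
            if word:
                index[word].append(title)
    return dict(sorted(index.items()))


sample_titles = [
    "Information Organization and Retrieval",
    "A Mathematical Theory of Communication",
    "The Organization of Information",
]

for keyword, entries in kwoc_index(sample_titles).items():
    for entry in entries:
        print(f"{keyword.upper():<15}{entry}")
```

What took a large team of monks to compile by hand becomes a few lines of code, though choosing which words count as keywords remains a judgment call.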

SLIDE 31

How Much Information Today?

• See report by Hal Varian and Peter Lyman http://www.sims.berkeley.edu/research/projects/how-much-info/

• Total annual information production including print, film, magnetic media, etc.
  – Upper Bound 2,120,539 Terabytes (10^12 bytes)
  – Lower Bound 635,480 Terabytes
  – I.e., between 1 and 2 Exabytes per year (10^18 bytes)

• How do we organize THIS?
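
As a quick check on the units, the Python sketch below converts the two bounds from terabytes to exabytes using the decimal definitions given on the slide (10^12 and 10^18 bytes). The variable and function names are made up for illustration.

```python
TERABYTE = 10**12  # bytes, as defined on the slide
EXABYTE = 10**18   # bytes, as defined on the slide

LOWER_BOUND_TB = 635_480
UPPER_BOUND_TB = 2_120_539


def terabytes_to_exabytes(terabytes: int) -> float:
    """Convert a size in terabytes to exabytes (decimal units)."""
    return terabytes * TERABYTE / EXABYTE


print(f"Lower bound: {terabytes_to_exabytes(LOWER_BOUND_TB):.2f} EB")
print(f"Upper bound: {terabytes_to_exabytes(UPPER_BOUND_TB):.2f} EB")
```

By this arithmetic the bounds come to roughly 0.6 and 2.1 exabytes, which the slide summarizes as between 1 and 2 exabytes per year.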

SLIDE 32

Lecture Outline

• What Is Information?

• History of Information Search and Organization

• Discussion Questions

• Action Items for Next Time

SLIDE 33

Discussion Questions (Borges)

• Yuri Takhteyev on Borges
  – How does Borges' view of information compare to Shannon's (information as reducing uncertainty)?
  – Why does Borges arrange the books randomly? What difference would it make in the story? (This question is also raised by Dennett in the “Library of Mendel,” so we may want to leave it till that discussion)
  – What leads the Librarians to postulate the existence of the Man of the Book? Does that logic make sense?

SLIDE 34

Discussion Questions (Borges)

• Yuri Takhteyev on Borges
  – What is the significance of the sentence: “I cannot combine some characters - htcmrlchtdj - which the divine library has not foreseen and which in one of its secret tongues do not contain a terrible meaning?”
  – What is the significance of the Librarian's conclusion that the “Library is unlimited and cyclical?”

SLIDE 35

Discussion Questions (Dennett)

• Joshua Solomin on Dennett
  – It is mentioned that books over 500 pages in length can be represented in the Library by having them span multiple Library volumes; and that by doing this, some Library volumes will be reused. But Dennett (from Quine) reduces this case to the case where the entire Library can be represented by a 1 and a 0, simply reused in different combinations. I would argue that this reductive case is no longer useful, because you then have to store the formulae for reproducing each book from your 1 and 0, which would be just as bad as storing the volumes themselves. So, does this strategy of reducing the content of a volume and re-using volumes help with the volume of information at all? If so, at what point between the 500-page volume and the 1-character volume will the strategy break down? Or would it be argued that it doesn't break down, but rather the strategy is still useful when condensed to a 1 or 0?

SLIDE 36

Discussion Questions (Dennett)

• Joshua Solomin on Dennett
  – Dennett mentions “even finding one readable volume in this huge storehouse is unlikely in the extreme.” If no parse-able information can be gleaned from a given volume (or piece of data), is it still useful? Can it be said that some piece of data is absolutely useless, or is it more that we simply haven't yet developed an encoding system that corresponds to it (that would allow us to decode meaning from it)? Or perhaps some third option? What could be a possible strategy for declaring some volumes “useless,” in order to reduce the scope of the Library to something easier to deal with?

SLIDE 37

Discussion Questions (Dennett)

• Joshua Solomin on Dennett
  – It is observed that while Borges did not order his Library, attempting to do so would have its own problems associated with it. Dennett's solution is a kind of alphabetizing, organized in multiple dimensions. Is there some better way to perform this sorting? Assuming that we didn't want to have 1,000,000 dimensions to our file cabinet (the number of characters per volume), could we perform some kind of intelligent grouping of volumes? What kind of metadata could be developed from this sorted Library to facilitate searching -- e.g., a section devoted to books about whales, with subsections on books involving sea captains as well as books involving wooden boys who become human? Would this save us anything over Dennett's alphabetizing?

SLIDE 38

Discussion Questions (Reddy)

• Katherine Ahern and Brooke Maury on Reddy
  – Is there any model of communication other than the conduit metaphor and the toolmaker's paradigm? Do these two visions leave any aspects of communication out?
  – If information is not actually stored in the 'signal', then is the only value in this transmitted matter how one interprets it?

SLIDE 39

Discussion Questions (Reddy)

• Katherine Ahern and Brooke Maury on Reddy
  – What is the value of information (ideas, data, facts, etc.) without someone to receive, decode and interpret that information?
  – Reddy seems to put the responsibility on the user or consumer of information in terms of correct interpretation. However, are there tools that can be 'packaged' with the information, that can assist in this unpacking?
  – How does one develop a common context from which we can establish the rules or semantics of information exchange?

SLIDE 40

Discussion Questions (Reddy)

• Katherine Ahern and Brooke Maury on Reddy
  – Reddy suggests that the increase in signals (i.e., libraries, recordings, and mass communication) has resulted in less culture, because the skill of reconstructing or “extracting” ideas is neglected. What are the implications for information organization and retrieval? Is it our job to somehow facilitate this reconstruction? Does Reddy's analysis even allow the possibility of facilitating extraction of ideas? If so, how does one encode information in such a way as to minimize the confusion and lack of clarity around its meaning during transmission and upon reception?

SLIDE 41

Discussion Questions (Reddy)

• Katherine Ahern and Brooke Maury on Reddy
  – Is Reddy's analogy of the evil magician representing language appropriate? Are subscribers to the conduit metaphor doomed to think others hostile or insane? Perhaps the 'evil magician' is our own laziness or failure to do the work of communication.

SLIDE 42

Lecture Outline

• What Is Information?

• History of Information Search and Organization

• Discussion Questions

• Action Items for Next Time

SLIDE 43

Homework (!)

• Read Introduction and Chapters 1 – 2 of George Lakoff’s Women, Fire, and Dangerous Things

• Create your SIMS home page

SLIDE 44

Next Time

• Human Categorization

SLIDE 45

Sign Up for Office Hours

• Prof. Marc Davis
  – Thursdays 2:00 pm – 4:00 pm
  – 314 South Hall

