Mdst3705 2013-02-12-finding-data

Date post: 21-Nov-2014
Upload: rafael-alvarado
Finding and Making Data Prof. Alvarado MDST 3705 12 February 2013
Transcript
Page 1: Mdst3705 2013-02-12-finding-data

Finding and Making Data

Prof. Alvarado

MDST 3705

12 February 2013

Page 2: Mdst3705 2013-02-12-finding-data

Business
• Quizzes by Friday
• Safari Resources
– When off grounds, use VPN or access from the Library web page
– It should allow you to log on to the resource

Page 3: Mdst3705 2013-02-12-finding-data

Big Data
• What is Big Data?
– Data produced by governments, corporations, scientific instruments, transactions …
– Captured by databases
• Databases are at the foundation of almost all digital products we use
– Social Media, from Facebook to WordPress
– Learning Management Systems (e.g. Collab)
– Video Games and Simulations
– Maps and Timelines

Page 4: Mdst3705 2013-02-12-finding-data

The Digital Humanities has entered the era of Big Data

Numerous collections of primary and secondary sources have been digitized over the last two decades

To do scholarship, you need to both produce and consume data

Page 22: Mdst3705 2013-02-12-finding-data

Databases
• We can also use relational databases to ingest data sets from the wild
• Once they are in the database, we may modify them to conform to our own data model
• And we may combine them to produce new data
• The database becomes a recombinant space for creating data mashups

Page 23: Mdst3705 2013-02-12-finding-data

The database is also a machine for making inferences …

Page 24: Mdst3705 2013-02-12-finding-data

This query is an example of how two tables can be "joined" into a third table. It also shows how you can manipulate the data on the fly to produce new results.
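
The query shown on the original slide is not reproduced in this transcript. As a rough sketch of the kind of query the caption describes, the following uses Python's built-in sqlite3 module rather than MySQL. The table social_network_users, every column name, and all of the figures are invented placeholders, not the course data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
    -- Two imported tables (hypothetical columns and made-up values)
    CREATE TABLE country_debt (country TEXT, debt REAL, population REAL);
    CREATE TABLE social_network_users (country TEXT, users REAL);
    INSERT INTO country_debt VALUES ('A', 1000.0, 50.0), ('B', 300.0, 10.0);
    INSERT INTO social_network_users VALUES ('A', 25.0), ('B', 2.5);
""")

# Join the two tables on the shared country column and compute
# new values on the fly; the result set is in effect a third table.
rows = conn.execute("""
    SELECT d.country,
           d.debt / d.population  AS debt_per_capita,
           u.users / d.population AS share_online
    FROM country_debt AS d
    JOIN social_network_users AS u ON u.country = d.country
    ORDER BY d.country
""").fetchall()

for row in rows:
    print(row)
```

Neither debt_per_capita nor share_online is stored anywhere; both exist only in the query result, which is what "manipulating the data on the fly" amounts to here.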

Page 25: Mdst3705 2013-02-12-finding-data

Quick Note
• MySQL uses two kinds of quotes
– Double and single to wrap strings
– “Backticks” ( ` ) are used sometimes to wrap table and field names
– E.g. SELECT `Country` FROM `country_debt`
• Backticks are used to allow spaces in field and table names
– But this is a bad practice; I do not encourage spaces
– Therefore backticks are optional
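
A small sketch of the quoting rules above. It runs on Python's built-in sqlite3 module, which accepts backtick-quoted names for MySQL compatibility, so it only approximates MySQL's behavior. The table with a space in its name, the Debt columns, and the sample row are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Names containing spaces only work when wrapped in backticks.
conn.execute("CREATE TABLE `country debt` (`Country` TEXT, `Debt Amount` REAL)")
conn.execute("INSERT INTO `country debt` VALUES ('Freedonia', 12.5)")

# Single quotes wrap the string value; backticks wrap the names.
with_spaces = conn.execute(
    "SELECT `Country`, `Debt Amount` FROM `country debt` "
    "WHERE `Country` = 'Freedonia'"
).fetchall()

# With no spaces in the names, the backticks become optional.
conn.execute("CREATE TABLE country_debt (Country TEXT, Debt REAL)")
conn.execute("INSERT INTO country_debt VALUES ('Freedonia', 12.5)")
quoted = conn.execute("SELECT `Country` FROM `country_debt`").fetchall()
unquoted = conn.execute("SELECT Country FROM country_debt").fetchall()

print(with_spaces, quoted, unquoted)
```

The example sticks to single quotes for strings because that is the form standard SQL defines; MySQL in its default mode also accepts double quotes for strings, as the slide notes.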

Page 26: Mdst3705 2013-02-12-finding-data

Just as we saw with Aristotle’s logic, relational databases allow us to develop ontologies from which we can draw inferences

Page 27: Mdst3705 2013-02-12-finding-data

We can see that each of the tables we imported actually stands for an assertion

(The conclusion in this case is simply a correlation)

Page 28: Mdst3705 2013-02-12-finding-data

I felt like the strategy for database design explained in the reading on SQL ran quite contrary to my understanding of the “hacker” mentality, and I think it speaks to the lack of flexibility in the SQL database system. . . . Database designers [are] encouraged to map everything out before even thinking about beginning construction on the actual database.

This is true: the book does project a planning ethos at odds with the spirit of hacking and iterative building. This is as it should be, since experienced programmers and database designers do value planning. But building databases can be organic and creative too, especially when the domain being modeled is not well understood, which is often the case with the digital humanities.

Page 29: Mdst3705 2013-02-12-finding-data

Remember that in the digital humanities, we are reverse engineering culture from media

Instead of planning a data model, we need to extract and evolve one

But we can use the tools of database design to help us

Page 30: Mdst3705 2013-02-12-finding-data

EXAMPLES OF DATABASES

Page 31: Mdst3705 2013-02-12-finding-data

Database Design, or Making Data

Page 32: Mdst3705 2013-02-12-finding-data

Making data is more than adding data to a database

You first have to create the database

All good databases are based on models, which we view as knowledge representations

Page 33: Mdst3705 2013-02-12-finding-data

Learning MySQL
• Provides the right level of information
– But follows traditional planning model
– Our approach is a bit different
– Introduces useful vocabulary
• Key idea in Chapter 3 is use of Entity Relationship Diagrams
– E-R diagrams
– I use a simplified version

Page 34: Mdst3705 2013-02-12-finding-data

Database Design
• Process 1 (Planned)
– Gather requirements
– Create an ER model (data model)
– Translate into tables (database schema)
• Process 2 (Evolved)
– Gather data
– Find implicit relations
– Create new tables
– Create ER model
– Translate into tables
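
One way to picture the first steps of Process 2 is sketched below, assuming Python's built-in sqlite3 module in place of MySQL. The flat raw_debt table, its columns, and its rows are hypothetical stand-ins for a data set found "in the wild".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Gather data: load the file as found, one flat table
    CREATE TABLE raw_debt (country_name TEXT, year INTEGER, amount REAL);
    INSERT INTO raw_debt VALUES
        ('Freedonia', 2010, 10.0),
        ('Freedonia', 2011, 12.0),
        ('Sylvania',  2011, 7.5);

    -- Find implicit relations: country names repeat across rows,
    -- which suggests that COUNTRY is an entity in its own right.
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    INSERT INTO country (name)
        SELECT DISTINCT country_name FROM raw_debt ORDER BY country_name;

    -- Create new tables: debt rows now point at a country by key.
    CREATE TABLE debt_of_country (
        country_id INTEGER REFERENCES country(id),
        year INTEGER,
        amount REAL
    );
    INSERT INTO debt_of_country
        SELECT c.id, r.year, r.amount
        FROM raw_debt AS r
        JOIN country AS c ON c.name = r.country_name;
""")

countries = conn.execute("SELECT name FROM country ORDER BY name").fetchall()
debt_rows = conn.execute("SELECT COUNT(*) FROM debt_of_country").fetchone()[0]
print(countries, debt_rows)
```

The ER model comes afterwards in this process: it is drawn from the tables that emerged, not the other way round.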

Page 35: Mdst3705 2013-02-12-finding-data

The simplest case of two entities with a relationship. We don't specify the nature of the relationship at this point. For example, A might stand for PERSON and B might stand for BOOK, as in PERSON READS BOOK.

Page 36: Mdst3705 2013-02-12-finding-data

This includes the cardinality of the relationship. A relates to 1 or more (or 0 or more) of B. For example, PERSON READS MANY BOOKS.

Page 37: Mdst3705 2013-02-12-finding-data

This shows a Many-to-Many relationship (M:M, or M:N). MANY PERSONS READ MANY BOOKS. That is, a given PERSON may read more than one BOOK, and a given BOOK may be read by more than one PERSON.

Page 38: Mdst3705 2013-02-12-finding-data

This implies the creation of a third entity, C, to capture the BOOK / PERSON relationship. We can think of this as a kind of EVENT -- our database will capture all instances, say, of PEOPLE reading BOOKS.
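
A minimal sketch of that third entity, assuming Python's built-in sqlite3 module instead of MySQL. The table name reading, the column names, and the sample people and books are illustrative choices, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book   (id INTEGER PRIMARY KEY, title TEXT);

    -- Entity C: each row records one PERSON READS BOOK event
    CREATE TABLE reading (
        person_id INTEGER REFERENCES person(id),
        book_id   INTEGER REFERENCES book(id)
    );

    INSERT INTO person VALUES (1, 'Ann'), (2, 'Ben');
    INSERT INTO book   VALUES (1, 'Poetics'), (2, 'Politics');
    INSERT INTO reading VALUES (1, 1), (1, 2), (2, 1);
""")

# One PERSON may read many BOOKs ...
books_read_by_ann = [title for (title,) in conn.execute("""
    SELECT b.title
    FROM reading AS r
    JOIN book AS b ON b.id = r.book_id
    JOIN person AS p ON p.id = r.person_id
    WHERE p.name = 'Ann'
    ORDER BY b.title
""")]

# ... and one BOOK may be read by many PERSONs.
readers_of_poetics = [name for (name,) in conn.execute("""
    SELECT p.name
    FROM reading AS r
    JOIN person AS p ON p.id = r.person_id
    JOIN book AS b ON b.id = r.book_id
    WHERE b.title = 'Poetics'
    ORDER BY p.name
""")]

print(books_read_by_ann, readers_of_poetics)
```

The reading table holds nothing but two foreign keys here; attributes of the event itself, such as a date, could be added to it later.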

Page 39: Mdst3705 2013-02-12-finding-data

Now, in the case of our two tables, we have the following implied model. (The single arrow heads imply a Subject/Object relation.)

Page 40: Mdst3705 2013-02-12-finding-data

After thinking about this model some, we can see that COUNTRY actually has a 1:M relationship to DEBT, since the latter varies by year. (We can imagine a DEBT table with an AMOUNT field and a YEAR field.) We also know that each SOCIALNETWORK can be related to more than one COUNTRY.

Page 41: Mdst3705 2013-02-12-finding-data

In the end, our model will look something like this. So we will need to create tables to match these entities, e.g. COUNTRY, DEBT_OF_COUNTRY, SOCIALNETWORK, SOCIALNETWORK_OF_COUNTRY
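
The four tables named above might be set up roughly as follows. This is a sketch using Python's built-in sqlite3 module rather than MySQL; the column names and sample rows are assumptions, apart from the YEAR and AMOUNT fields suggested earlier for DEBT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country (id INTEGER PRIMARY KEY, name TEXT);

    -- 1:M: one COUNTRY has many DEBT rows, one per year
    CREATE TABLE debt_of_country (
        country_id INTEGER REFERENCES country(id),
        year INTEGER,
        amount REAL
    );

    CREATE TABLE socialnetwork (id INTEGER PRIMARY KEY, name TEXT);

    -- M:M: a third table of foreign keys links networks to countries
    CREATE TABLE socialnetwork_of_country (
        socialnetwork_id INTEGER REFERENCES socialnetwork(id),
        country_id INTEGER REFERENCES country(id)
    );

    -- Made-up sample rows
    INSERT INTO country VALUES (1, 'Freedonia'), (2, 'Sylvania');
    INSERT INTO debt_of_country VALUES (1, 2010, 10.0), (1, 2011, 12.0), (2, 2011, 7.5);
    INSERT INTO socialnetwork VALUES (1, 'Network X'), (2, 'Network Y');
    INSERT INTO socialnetwork_of_country VALUES (1, 1), (1, 2), (2, 1);
""")

# For each country: how many yearly debt figures and how many networks?
summary = conn.execute("""
    SELECT c.name,
           (SELECT COUNT(*) FROM debt_of_country AS d
             WHERE d.country_id = c.id) AS debt_years,
           (SELECT COUNT(*) FROM socialnetwork_of_country AS s
             WHERE s.country_id = c.id) AS networks
    FROM country AS c
    ORDER BY c.name
""").fetchall()

print(summary)
```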

Page 42: Mdst3705 2013-02-12-finding-data

E-R Rules
• Entities and Attributes
– Entities are definitions of things that have some “integrity”
– Attributes are like properties of things
– The difference can be logical or practical
• Relations and Cardinality
– Relations exist between Entities
– They are like assertions, e.g. PERSON READS BOOK
– Relations have “cardinality” which gives clues about the data model
• Uniqueness and keys
– Entities are uniquely defined by certain attributes

Page 43: Mdst3705 2013-02-12-finding-data

Mapping ER Diagrams to Tables
Cardinality matters:
1:1 Same table, with exceptions
1:M Two tables, table A has key
M:1 Two tables, table B has foreign key
M:M Third table of foreign keys

