Date post: 04-Jan-2016
Upload: ezra-perkins

Introduction to Computing Using Python

Data Storage and Processing

• How many of you have taken IT 240?
• Databases and Structured Query Language
• Python Database Programming


Data storage

[Figure: five linked web pages and the number of times each keyword occurs on each page]

one.html: Beijing × 3, Paris × 5, Chicago × 5
two.html: Bogota × 3, Beijing × 2, Paris × 1
three.html: Chicago × 3, Beijing × 6
four.html: Chicago × 3, Paris × 2, Nairobi × 1
five.html: Nairobi × 7, Bogota × 2

We wish to store data about Web pages in a way that Python programs can access the data conveniently


To do this, we will use a database


Databases

A database consists of one or more tables

Each table has a name and consists of rows (records) and columns (attributes)

Each attribute has a name and contains data of a specific type

Hyperlinks

Url          Link
one.html     two.html
one.html     three.html
two.html     four.html
three.html   four.html
four.html    five.html
five.html    one.html
five.html    two.html
five.html    four.html

Keywords

Url          Word      Freq
one.html     Beijing   3
one.html     Paris     5
one.html     Chicago   5
two.html     Bogota    3
two.html     Beijing   2
two.html     Paris     1
three.html   Chicago   3
three.html   Beijing   6
four.html    Chicago   3
four.html    Paris     2
four.html    Nairobi   5
five.html    Nairobi   7
five.html    Bogota    2


Database files

Database files are not text files – you can’t read from or write to them directly

Instead, a program communicates with the database through commands written in a database language called Structured Query Language (SQL)


SQL SELECT FROM statement

SQL statement SELECT is used to make queries into a database. The result of a query is called a result table.

SELECT Link FROM Hyperlinks

Result table:

Link
two.html
three.html
four.html
four.html
five.html
one.html
two.html
four.html


SQL SELECT FROM statement

SQL statement SELECT is used to make queries into a database.

SELECT Url, Word FROM Keywords

Result table:

Url          Word
one.html     Beijing
one.html     Paris
one.html     Chicago
two.html     Bogota
two.html     Beijing
two.html     Paris
three.html   Chicago
three.html   Beijing
four.html    Chicago
four.html    Paris
four.html    Nairobi
five.html    Nairobi
five.html    Bogota


SQL SELECT FROM statement

SELECT statements can use *, a wild card that stands for all columns

SELECT * FROM Hyperlinks

Result table:

Url          Link
one.html     two.html
one.html     three.html
two.html     four.html
three.html   four.html
four.html    five.html
five.html    one.html
five.html    two.html
five.html    four.html


SQL DISTINCT keyword

SQL keyword DISTINCT removes duplicate records from the result table

SELECT DISTINCT Link FROM Hyperlinks

Result table:

Link
two.html
three.html
four.html
five.html
one.html
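The effect of DISTINCT can be checked from Python. The following is a minimal sketch, not part of the original slides. It assumes a throwaway in-memory SQLite database (':memory:') instead of a database file, loaded with the Hyperlinks records shown earlier. The names HYPERLINKS, all_links and distinct_links are illustrative.

```python
import sqlite3

# Records of the Hyperlinks table from the slides: (Url, Link)
HYPERLINKS = [
    ('one.html', 'two.html'), ('one.html', 'three.html'),
    ('two.html', 'four.html'), ('three.html', 'four.html'),
    ('four.html', 'five.html'), ('five.html', 'one.html'),
    ('five.html', 'two.html'), ('five.html', 'four.html'),
]

con = sqlite3.connect(':memory:')   # assumption: in-memory database, no file
cur = con.cursor()
cur.execute("CREATE TABLE Hyperlinks (Url text, Link text)")
cur.executemany("INSERT INTO Hyperlinks VALUES (?, ?)", HYPERLINKS)

# Same query with and without DISTINCT
all_links = cur.execute("SELECT Link FROM Hyperlinks").fetchall()
distinct_links = cur.execute("SELECT DISTINCT Link FROM Hyperlinks").fetchall()

print(len(all_links))        # 8 records, duplicates included
print(len(distinct_links))   # 5 records, one per distinct link
con.close()
```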


SQL WHERE clause

SQL clause WHERE is used to select only those records that satisfy a condition

“In which pages does word X appear?”

SELECT Url FROM Keywords
WHERE Word = 'Paris'

Result table:

Url
one.html
two.html
four.html

SQL WHERE clause

SELECT Column(s) FROM Table
WHERE Column operator value

SELECT Column(s) FROM Table
WHERE Column BETWEEN value1 AND value2

Operator   Explanation
=          Equal
<>         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Within an inclusive range
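To see a few of these operators at work, here is a small sketch that runs three WHERE queries against the Keywords records from the slides. It is an illustration only: it assumes an in-memory SQLite database, and the variable names (KEYWORDS, frequent, low, not_paris) are made up for the example.

```python
import sqlite3

# Records of the Keywords table from the slides: (Url, Word, Freq)
KEYWORDS = [
    ('one.html', 'Beijing', 3), ('one.html', 'Paris', 5),
    ('one.html', 'Chicago', 5), ('two.html', 'Bogota', 3),
    ('two.html', 'Beijing', 2), ('two.html', 'Paris', 1),
    ('three.html', 'Chicago', 3), ('three.html', 'Beijing', 6),
    ('four.html', 'Chicago', 3), ('four.html', 'Paris', 2),
    ('four.html', 'Nairobi', 5), ('five.html', 'Nairobi', 7),
    ('five.html', 'Bogota', 2),
]

con = sqlite3.connect(':memory:')   # assumption: in-memory database, no file
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", KEYWORDS)

# >  : words that occur more than 5 times on a page
frequent = cur.execute("SELECT Url, Word FROM Keywords WHERE Freq > 5").fetchall()

# BETWEEN : words that occur once or twice on a page (range is inclusive)
low = cur.execute("SELECT * FROM Keywords WHERE Freq BETWEEN 1 AND 2").fetchall()

# <> : every record whose word is not Paris
not_paris = cur.execute("SELECT * FROM Keywords WHERE Word <> 'Paris'").fetchall()

print(sorted(frequent))   # [('five.html', 'Nairobi'), ('three.html', 'Beijing')]
print(len(low))           # 4
print(len(not_paris))     # 10
con.close()
```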

Exercise

Write an SQL query that returns:

1. The URL of every page that has a link to web page four.html

SELECT DISTINCT Url FROM Hyperlinks
WHERE Link = 'four.html'

Exercise

Write an SQL query that returns:

2. The URL of every page that has an incoming link from page four.html

SELECT DISTINCT Link FROM Hyperlinks
WHERE Url = 'four.html'

Exercise

Write an SQL query that returns:

3. The URL and word for every word that appears exactly three times in the web page associated with the URL

SELECT Url, Word FROM Keywords
WHERE Freq = 3

Exercise

Write an SQL query that returns:

4. The URL, word, and frequency for every word that appears between 3 and 5 times, inclusive, in the web page associated with the URL

SELECT * FROM Keywords
WHERE Freq BETWEEN 3 AND 5

SQL built-in functions

SQL includes built-in math functions such as COUNT() and SUM()

“How many pages contain the word Paris?”

SELECT COUNT(*) FROM Keywords
WHERE Word = 'Paris'

Result: 3

There are 3 web pages that mention Paris

SQL built-in functions

SQL includes built-in math functions such as COUNT(), SUM() and AVG()

SELECT SUM(Freq) FROM Keywords
WHERE Word = 'Paris'

Result: 8

There are a total of 8 occurrences of ‘Paris’ on these web pages
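The three functions can also be combined in a single query. The sketch below is an illustration, not taken from the slides. It assumes an in-memory SQLite database holding the Keywords records shown earlier, and reads the single result record with Cursor method fetchone(), which the slides do not cover.

```python
import sqlite3

# Records of the Keywords table from the slides: (Url, Word, Freq)
KEYWORDS = [
    ('one.html', 'Beijing', 3), ('one.html', 'Paris', 5),
    ('one.html', 'Chicago', 5), ('two.html', 'Bogota', 3),
    ('two.html', 'Beijing', 2), ('two.html', 'Paris', 1),
    ('three.html', 'Chicago', 3), ('three.html', 'Beijing', 6),
    ('four.html', 'Chicago', 3), ('four.html', 'Paris', 2),
    ('four.html', 'Nairobi', 5), ('five.html', 'Nairobi', 7),
    ('five.html', 'Bogota', 2),
]

con = sqlite3.connect(':memory:')   # assumption: in-memory database, no file
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", KEYWORDS)

# One result record with three columns: count, sum and average
cur.execute("SELECT COUNT(*), SUM(Freq), AVG(Freq) FROM Keywords WHERE Word = 'Paris'")
count, total, average = cur.fetchone()

print(count)               # 3 pages mention Paris
print(total)               # 8 occurrences in total
print(round(average, 2))   # 2.67 occurrences per page on average
con.close()
```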


Another example database

weather.db contains two tables:

• weatherdata (city text, country text, season int, temperature float)
• seasons (name text, number int)

seasons

name     number
winter   1
spring   2
summer   3
fall     4

weatherdata

city     season   temperature
Mumbai   1        24.8
Mumbai   2        28.4
Mumbai   3        27.9
Mumbai   4        27.6
London   1        4.2
London   2        8.3
London   3        15.7
London   4        10.4
Cairo    1        13.6
Cairo    2        20.7
Cairo    3        27.7
Cairo    4        22.2

SQL queries involving multiple tables

“What is the average summer temperature in Mumbai?”

If we do not know the number coding of the seasons, this question requires a lookup in both tables:

• Use seasons to find the number that matches the season name
• Use weatherdata to find the temperature
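The slides do not show the query itself, so the following is only one possible sketch of the two lookups. It lists both tables in FROM and links them in the WHERE clause with AND, which is a common way to write such a query but an assumption here. It also assumes an in-memory SQLite database and leaves out the country column, because no country values appear in the table above.

```python
import sqlite3

con = sqlite3.connect(':memory:')   # assumption: in-memory database, no file
cur = con.cursor()

# Assumption: the country column of weatherdata is omitted in this sketch
cur.execute("CREATE TABLE seasons (name text, number int)")
cur.execute("CREATE TABLE weatherdata (city text, season int, temperature float)")

cur.executemany("INSERT INTO seasons VALUES (?, ?)", [
    ('winter', 1), ('spring', 2), ('summer', 3), ('fall', 4),
])
cur.executemany("INSERT INTO weatherdata VALUES (?, ?, ?)", [
    ('Mumbai', 1, 24.8), ('Mumbai', 2, 28.4), ('Mumbai', 3, 27.9), ('Mumbai', 4, 27.6),
    ('London', 1, 4.2), ('London', 2, 8.3), ('London', 3, 15.7), ('London', 4, 10.4),
    ('Cairo', 1, 13.6), ('Cairo', 2, 20.7), ('Cairo', 3, 27.7), ('Cairo', 4, 22.2),
])

# seasons supplies the number for 'summer'; weatherdata supplies the temperature
cur.execute("""
    SELECT AVG(weatherdata.temperature)
    FROM weatherdata, seasons
    WHERE weatherdata.season = seasons.number
      AND seasons.name = 'summer'
      AND weatherdata.city = 'Mumbai'
""")
average = cur.fetchone()[0]

print(average)   # 27.9
con.close()
```

Only one record matches (Mumbai, season 3), so the average equals that record's temperature.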

Standard Library module sqlite3

The Python Standard Library includes module sqlite3, which allows Python programs to access databases

>>> import sqlite3
>>> con = sqlite3.connect('web.db')

sqlite3 function connect() takes as input the name of a database file and returns an object of type Connection, a type defined in module sqlite3

• The Connection object con is associated with database file web.db
• If database file web.db does not exist in the current working directory, a new database file web.db is created
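The file-creation behaviour can be observed with a short sketch. It is an illustration under stated assumptions: the database is placed in a temporary folder made with tempfile (instead of the current working directory) so that nothing is left behind, and the names folder, path, existed_before and exists_after are made up for the example.

```python
import os
import sqlite3
import tempfile

# Assumption: a scratch folder stands in for the current working directory
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'web.db')
    existed_before = os.path.exists(path)   # no database file yet

    con = sqlite3.connect(path)             # creates web.db because it is missing
    exists_after = os.path.exists(path)
    con.close()

print(existed_before)   # False
print(exists_after)     # True
```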

Standard Library module sqlite3

>>> import sqlite3
>>> con = sqlite3.connect('web.db')
>>> cur = con.cursor()

Connection method cursor() returns an object of type Cursor, another type defined in module sqlite3

• Cursor objects are responsible for executing SQL statements

Standard Library module sqlite3

Module sqlite3 of the Python Standard Library provides an API for accessing database files

• It is an interface to a library of functions that access the database files directly

The Cursor class supports method execute(), which takes an SQL statement as a string and executes it

>>> import sqlite3
>>> con = sqlite3.connect('web.db')
>>> cur = con.cursor()
>>> cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
<sqlite3.Cursor object at 0x100575730>
>>> cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
<sqlite3.Cursor object at 0x100575730>

The values in the INSERT statement above are hardcoded
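One step this part of the slides does not show is saving the changes. With the module's default settings, a record inserted through execute() is written to the database file only once Connection method commit() is called. The sketch below illustrates this. It assumes a temporary folder for web.db so that the example cleans up after itself, and the name saved is illustrative.

```python
import os
import sqlite3
import tempfile

# Assumption: a scratch folder stands in for the current working directory
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'web.db')

    con = sqlite3.connect(path)
    cur = con.cursor()
    cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
    cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
    con.commit()    # save the changes to the database file
    con.close()

    # A new connection to the same file sees the committed record
    con = sqlite3.connect(path)
    saved = con.cursor().execute("SELECT * FROM Keywords").fetchall()
    con.close()

print(saved)   # [('one.html', 'Beijing', 3)]
```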

Parameter substitution

In general, the values used in an SQL statement will not be hardcoded in the program but will come from Python variables

>>> cur.execute("INSERT INTO Keywords VALUES ('one.html', 'Beijing', 3)")
<sqlite3.Cursor object at 0x100575730>
>>> url, word, freq = 'one.html', 'Paris', 5
>>>
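The transcript stops before the statement that uses these variables. A sketch of how such a statement can look with sqlite3 follows: each ? in the SQL string is a placeholder, and the values are passed to execute() as a tuple in the same order. This is the module's question-mark placeholder style and is not necessarily the exact statement from the original slide. The in-memory database is an assumption made to keep the example self-contained.

```python
import sqlite3

con = sqlite3.connect(':memory:')   # assumption: in-memory database, no file
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")

url, word, freq = 'one.html', 'Paris', 5

# Each ? is filled in, in order, with the corresponding value of the tuple
cur.execute("INSERT INTO Keywords VALUES (?, ?, ?)", (url, word, freq))

# Placeholders work in queries too; note the one-element tuple (word,)
cur.execute("SELECT * FROM Keywords WHERE Word = ?", (word,))
rows = cur.fetchall()

print(rows)   # [('one.html', 'Paris', 5)]
con.close()
```

Passing the values separately, instead of building the SQL string with string formatting, lets sqlite3 handle quoting of the values.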

Querying a database

>>> import sqlite3
>>> con = sqlite3.connect('links.db')
>>> cur = con.cursor()
>>> cur.execute('SELECT * FROM Keywords')
<sqlite3.Cursor object at 0x102686960>
>>> cur.fetchall()
[('one.html', 'Beijing', 3), ('one.html', 'Paris', 5), ('one.html', 'Chicago', 5), ('two.html', 'Bogota', 5), ('two.html', 'Beijing', 2), ('two.html', 'Paris', 1), ('three.html', 'Chicago', 3), ('three.html', 'Beijing', 6), ('four.html', 'Chicago', 3), ('four.html', 'Paris', 2), ('four.html', 'Nairobi', 5), ('five.html', 'Nairobi', 7), ('five.html', 'Bogota', 2)]
>>>

The result of a query is stored in the Cursor object

To obtain the result as a list of tuple objects, Cursor method fetchall() is used

Querying a database

An alternative is to iterate over the Cursor object

>>> cur.execute('SELECT * FROM Keywords')
<sqlite3.Cursor object at 0x102686960>
>>> for record in cur:
        print(record)

('one.html', 'Beijing', 3)
('one.html', 'Paris', 5)
('one.html', 'Chicago', 5)
('two.html', 'Bogota', 5)
('two.html', 'Beijing', 2)
('two.html', 'Paris', 1)
('three.html', 'Chicago', 3)
('three.html', 'Beijing', 6)
('four.html', 'Chicago', 3)
('four.html', 'Paris', 2)
('four.html', 'Nairobi', 5)
('five.html', 'Nairobi', 7)
('five.html', 'Bogota', 2)
>>>
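Because each record is a tuple, the loop can unpack it directly into variables. The sketch below is an illustration only: it assumes an in-memory database holding just the three one.html records, and the names total and leftover are made up. It also shows that iterating consumes the result, so a later fetchall() on the same cursor returns an empty list.

```python
import sqlite3

con = sqlite3.connect(':memory:')   # assumption: in-memory database, no file
cur = con.cursor()
cur.execute("CREATE TABLE Keywords (Url text, Word text, Freq int)")
cur.executemany("INSERT INTO Keywords VALUES (?, ?, ?)", [
    ('one.html', 'Beijing', 3), ('one.html', 'Paris', 5), ('one.html', 'Chicago', 5),
])

cur.execute("SELECT * FROM Keywords")

# Unpack each record (a tuple) while iterating over the cursor
total = 0
for url, word, freq in cur:
    total += freq

# The loop has already consumed every record of the result
leftover = cur.fetchall()

print(total)      # 13
print(leftover)   # []
con.close()
```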


Exercises

In week10exercisesstart.py

