Date post: 24-Mar-2020

YouTunes: A Music Library Data Set

Online (read-only) spreadsheet

The Worksheets

• Employees: One line per employee in your company
• Songs: One line per song in your store
• Customers: One line per customer using your service
• Invoice: One line per invoice

A Quick Introduction to Analysis with Spreadsheets

What we’ll cover

• Getting started: viewing data
• Naming cells
• Writing formulae
• Built-in functions
• Getting help
• Formatting your spreadsheet

Getting Started: Viewing Data

• We’ll be using the same data here as you’ll be using in Assignment 1.

• The data is available here.

Absolute and Relative Addresses

• If there are no $ signs, then addresses are relative and the cells referenced will change when you copy and paste a cell.

• The $ sign indicates that you want absolute addressing; that is, the reference retains the exact row and/or column when you copy and paste.

• This will become crucial when we start applying formulae to entire rows/columns.
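To make the copy-and-paste rule concrete, here is a minimal sketch. The helper shift_ref is hypothetical (it is not part of any spreadsheet product). It takes a single-cell reference such as A1 or $A$1 and computes how that reference would read after the formula containing it is pasted some rows down and some columns to the right. It assumes upper-case column letters and does not guard against shifting off the sheet.

```python
import re

# Matches one cell reference: optional $, column letters, optional $, row digits.
_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")


def col_to_num(col: str) -> int:
    """Convert column letters to a 1-based number (A -> 1, Z -> 26, AA -> 27)."""
    n = 0
    for ch in col:
        n = n * 26 + (ord(ch) - ord("A") + 1)
    return n


def num_to_col(n: int) -> str:
    """Convert a 1-based column number back to letters."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def shift_ref(ref: str, d_rows: int, d_cols: int) -> str:
    """Return ref as it would read after pasting d_rows down and d_cols right."""
    match = _REF.fullmatch(ref)
    if match is None:
        raise ValueError(f"not a single-cell reference: {ref!r}")
    col_abs, col, row_abs, row = match.groups()
    if not col_abs:  # no $ before the column: the column is relative, so it moves
        col = num_to_col(col_to_num(col) + d_cols)
    if not row_abs:  # no $ before the row: the row is relative, so it moves
        row = str(int(row) + d_rows)
    return f"{col_abs}{col}{row_abs}{row}"


# Paste one row down and one column to the right.
print(shift_ref("A1", 1, 1))    # relative: both parts move
print(shift_ref("$A$1", 1, 1))  # absolute: nothing moves
print(shift_ref("$A1", 1, 1))   # mixed: only the row moves
```

The mixed forms ($A1 and A$1) are what make it possible to fill a formula down a column while still pointing at one fixed header row or one fixed lookup column.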

Walkthrough: Sample Spreadsheet Analyses

YouTunes Analysis

1. You’d like to get a better sense of your orders. What is your average invoice size?

2. You are considering creating album pricing. Doing this requires that you get a sense of how many tracks are on each album. On the album sheet, create a list of album titles with the number of tracks on each album.

3. It’s time to establish an employee recognition program. What is the name of the employee who has been with you the longest?

4. How long has the employee in #3 worked for you?

From Spreadsheets to Databases
An Introduction to Relational Databases and SQL

Relational Databases

• A Bit of History
• The Relational Model
• YouTunes Data is Relational Data

In the Beginning

• Where beginning equals 1960’s
• Computers
  • Centralized systems
  • Spiffy new data channels let CPU and IO overlap.
  • Persistent storage is on drums.
  • Buffering and interrupt handling done in the OS.
  • Making these systems fast is becoming a research focus.
• Data
  • What did data look like?

Organizing Data: ISAM

• Indexed sequential access method
  • Pioneered by IBM for its mainframes
  • Fixed length records
  • Each record lives at a specific location
  • Rapid record access even on a sequential medium (tape)
• All indexes are ‘secondary’
  • Allow key lookup, but …
  • Do not correspond to physical organization
  • Key is to build small, efficient index structures
• Fundamental access method in COBOL

Organizing Data: The Network Model

• Early data management systems represented data using something called a network model.
• Data are represented by collections of records (today we would call those key/data pairs).
• Relationships among records are expressed via links between records (today we can think of those links as pointers).
• Applications interacted with data by navigating through it:
  • Find a record
  • Follow links
  • Find other records
  • Repeat

The Network Model: Inside Records

• Records composed of attributes.
• Attributes are single-valued.
• Links connect exactly two records.
• Represent N-way relationships via link records

[Figure: Track, Artist, and Album records, each with ID and name attributes, and a Songs record.]
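The navigation style described for the network model can be sketched in a few lines. This is only an illustration under stated assumptions: the class names Record and SongLink, the helper functions, and the sample data are all made up, and the link record simply holds one pointer to each of the three records it relates.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """A record with single-valued attributes plus pointers to link records."""
    id: int
    name: str
    links: list = field(default_factory=list, repr=False, compare=False)


@dataclass
class SongLink:
    """A link record expressing a 3-way relationship (track, artist, album)."""
    track: Record
    artist: Record
    album: Record


def connect(track: Record, artist: Record, album: Record) -> SongLink:
    """Create the link record and make it reachable from all three records."""
    link = SongLink(track, artist, album)
    for record in (track, artist, album):
        record.links.append(link)
    return link


def tracks_by_artist(artist: Record) -> list:
    """Navigate: start at the artist, follow each link, read the track record."""
    return [link.track.name for link in artist.links]


# Sample (made-up) data: one artist, one album, two tracks.
artist = Record(1, "Artist 1")
album = Record(1, "Album X")
connect(Record(1, "Track A"), artist, album)
connect(Record(2, "Track B"), artist, album)

print(tracks_by_artist(artist))
```

Note that tracks_by_artist only works because the application knows exactly how the records are wired together, which is the coupling the relational model set out to remove.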

The Relational Model: The Competition

• The Network Model had some problems
  • Applications had to know the structure of the data
  • Changing the representation required a massive rewrite
  • Fundamentally: the physical arrangement was tightly coupled to the application and the application logic.
• 1970: Ted Codd proposes the relational model
  • Decouple physical representation from logical representation
  • Store “records” as “tables”
  • Replace links with implicit joins among tables
• The big question: could it perform?

The Relational Model

• Basic concepts:
  • A database consists of a collection of tables.
• Example:

SKU       Description                                  Price
12345678  Perky-Pet Mason Jar Wild Bird Feeder         $17.15
90123456  No-No Greed Seed Ball Wild Bird Feeder       $7.80
78901234  Perky-Pet Squirrel-Be-Gone Wild Bird Feeder  $19.98
56789012  Wilderness Lantern Wild Bird Feeder          $18.99

• Column = field = attribute
• Columns have a type = domain
• Row = tuple = record
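As a rough illustration of the vocabulary above, the example table can be loaded into an in-memory SQLite database from Python. The table name products and the column types are assumptions made for this sketch; the rows are the ones shown in the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each column (attribute) declares a type (domain). The table name is assumed.
conn.execute("CREATE TABLE products (SKU TEXT, Description TEXT, Price REAL)")

# Each Python tuple becomes one row (tuple, record) of the table.
rows = [
    ("12345678", "Perky-Pet Mason Jar Wild Bird Feeder", 17.15),
    ("90123456", "No-No Greed Seed Ball Wild Bird Feeder", 7.80),
    ("78901234", "Perky-Pet Squirrel-Be-Gone Wild Bird Feeder", 19.98),
    ("56789012", "Wilderness Lantern Wild Bird Feeder", 18.99),
]
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)

cursor = conn.execute("SELECT * FROM products ORDER BY SKU")
column_names = [description[0] for description in cursor.description]
first_row = cursor.fetchone()

print(column_names)
print(first_row)
```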

Spreadsheet Data is (mostly) Relational

Relational Databases: Schemas and SQL

Topics for Today

• Structured (English) Query Language (SQL)
  • What is SQL
  • Data Definition Language
  • Data Manipulation Language
• Learning objectives
  • Create and delete tables
  • Use SELECT to retrieve data

SQL Commands

• DDL: data definition language
  • CREATE TABLE, DROP TABLE
  • CREATE INDEX, DROP INDEX
  • CREATE VIEW, DROP VIEW
• DML: data manipulation language
  • SELECT: retrieve tuples
  • UPDATE: modify tuples
  • DELETE: remove tuples
  • INSERT: add tuples
• Note: In relational algebra, relations are sets, so they contain no duplicates. SQL treats relations (tables) as bags (multisets), thereby allowing duplicates!
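The set-versus-bag distinction in the note is easy to observe. The following sketch, which assumes a cut-down two-column songs table, inserts the same row twice: a plain SELECT returns both copies, while SELECT DISTINCT collapses them to the set-style result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (Name TEXT, Artist TEXT)")

# Insert the identical row twice; SQL tables are bags, so both copies are kept.
conn.execute("INSERT INTO songs VALUES ('Gangnam Style', 'PSY')")
conn.execute("INSERT INTO songs VALUES ('Gangnam Style', 'PSY')")

all_rows = conn.execute("SELECT Name, Artist FROM songs").fetchall()
distinct_rows = conn.execute("SELECT DISTINCT Name, Artist FROM songs").fetchall()

print(len(all_rows), len(distinct_rows))
```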

Create Table

• What it does: creates a relation with the specified schema
• Basic syntax:

    CREATE TABLE relation_name(attr_name1 attr_type1, … attr_nameN attr_typeN);

• The system we are using (SQLite) has a somewhat unusual type system. It has only five types (the name before the colon on each line below). The other types are the more traditional SQL types, which all “work” in SQLite, but all the entries in a line are treated the same.
  • INTEGER: INT, INTEGER, TINYINT, SMALLINT, MEDIUMINT, BIGINT, UNSIGNED BIG INT, INT2, INT8
  • TEXT: CHARACTER(20), VARCHAR(255), VARYING CHARACTER(255), NCHAR(55), NATIVE CHARACTER(70), NVARCHAR(100), TEXT, CLOB
  • BLOB: BLOB, or no datatype specified
  • REAL: REAL, DOUBLE, DOUBLE PRECISION, FLOAT
  • NUMERIC: DECIMAL(10,5), BOOLEAN, DATE, DATETIME
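One way to see that the traditional type names collapse into SQLite's five types is to ask SQLite what it actually stored. In this sketch the table demo and its columns are hypothetical; every value is bound as a Python string, and SQLite's built-in typeof() function reports the type each value ended up with.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Four traditional type names, one from each of four of the lines above.
conn.execute("CREATE TABLE demo (a VARCHAR(255), b BIGINT, c DOUBLE, d DATETIME)")

# Every value is supplied as text; the column's declared type decides the rest.
conn.execute(
    "INSERT INTO demo VALUES (?, ?, ?, ?)",
    ("hello", "42", "1.5", "2020-03-24"),
)

stored_types = conn.execute(
    "SELECT typeof(a), typeof(b), typeof(c), typeof(d) FROM demo"
).fetchone()

print(stored_types)
```

The DATETIME column is worth a look: it falls under NUMERIC, and a date string that does not look like a number is kept as text, which is why dates in SQLite are usually handled with its date and time functions.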

Examples

    CREATE TABLE songs (Name text, Composer text, Album text, Artist text);

    CREATE TABLE employee(Manager text, LastName text, FirstName text, Title text, BirthDate DATETIME, HireDate DATETIME, Address text, City text, State text, Country text, PostalCode text, Phone text, Fax text, Email text);
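A short sketch of the create-and-delete cycle, run against an in-memory database so nothing is left behind. It uses the songs statement from the example above; listing tables through sqlite_master is SQLite-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(connection):
    """List user tables by reading SQLite's built-in catalog, sqlite_master."""
    query = "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    return [row[0] for row in connection.execute(query)]


conn.execute("CREATE TABLE songs (Name text, Composer text, Album text, Artist text)")
tables_before = table_names(conn)

# DROP TABLE removes the table definition together with all of its rows.
conn.execute("DROP TABLE songs")
tables_after = table_names(conn)

print(tables_before, tables_after)
```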

Select

• What it does: performs a query (read-only)
• Implements most of the relational-algebra operations (and then some)
• Basic syntax:

    SELECT attributes FROM table [WHERE selection predicate];

• Extended syntax:

    SELECT a1, a2, …
    FROM R1, R2, …
    WHERE selection-predicate
    GROUP BY attribute(s)
    ORDER BY attribute(s)
    LIMIT ;
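The clauses of the extended syntax can be seen working together in one query. This is a sketch with made-up rows in the songs table: it keeps one artist's songs (WHERE), counts tracks per album (GROUP BY), sorts the albums by that count (ORDER BY), and keeps only the top one (LIMIT).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE songs (Name text, Composer text, Album text, Artist text)")

# Made-up sample rows: Artist 1 has two albums, Artist 2 has one.
conn.executemany(
    "INSERT INTO songs VALUES (?, ?, ?, ?)",
    [
        ("Track A", None, "Album X", "Artist 1"),
        ("Track B", None, "Album X", "Artist 1"),
        ("Track C", None, "Album Y", "Artist 1"),
        ("Track D", None, "Album Z", "Artist 2"),
    ],
)

top_album = conn.execute(
    """
    SELECT Album, COUNT(*) AS Tracks
    FROM songs
    WHERE Artist = 'Artist 1'
    GROUP BY Album
    ORDER BY Tracks DESC
    LIMIT 1
    """
).fetchone()

print(top_album)
```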

Handy Options in Select

• Use shorthands for tables

    SELECT fields FROM Songs AS S, Invoices AS I …

• Use built-in functions

    SELECT MAX(InvoiceDate) FROM invoice;
    SELECT DateTime('now');

  See documentation for a list
• Store results into tables

    CREATE TABLE new AS SELECT …

• Assign names to results

    CREATE TABLE roster AS
    SELECT FirstName || LastName AS FullName FROM Employee;
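The options above combine naturally. The sketch below uses a cut-down employee table with made-up rows; it applies a table shorthand, a built-in function, and CREATE TABLE … AS with a named result column. One deliberate difference from the slide: a space is concatenated between the two names so the full name is readable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (FirstName text, LastName text, HireDate DATETIME)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Ada", "Example", "2010-05-01"), ("Steve", "Sample", "2012-09-15")],
)

# Table shorthand (E) plus a built-in function (MAX).
latest_hire = conn.execute("SELECT MAX(E.HireDate) FROM employee AS E").fetchone()[0]

# Store a query result in a new table, naming the computed column FullName.
conn.execute(
    "CREATE TABLE roster AS "
    "SELECT FirstName || ' ' || LastName AS FullName FROM employee"
)
roster = [row[0] for row in conn.execute("SELECT FullName FROM roster ORDER BY FullName")]

print(latest_hire, roster)
```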

Schema Normalization and Joins

• Notice that we have a lot of redundancy in our data.
• Examples:
  • Employees: a Manager who manages a lot of people has his/her name replicated for each employee
  • Songs: Each album name appears once per song on that album
  • Invoices: Every time a customer purchases a song, we include its price.
• So what?

Why is Duplication Bad?

• Updating the data becomes difficult and/or expensive
  • If we discover a typo in an album name, we have to update every song on that album.
  • What if our manager gets married? We’d have to update every employee who worked for that manager.
• Having to consistently update multiple data items is typically a costly operation.

What’s a Schema Designer to do?

• The relational model and its query language, SQL, are intended to be used in a manner that avoids this duplication.

• In general, whenever we might be inclined to duplicate data, we create a separate table and associate data between the two tables.

• SQL lets you combine data from multiple tables using something called a join.

Joins

• A join lets you “connect” two tables based on their values.
• Once you can do that, it’s relatively easy to get rid of some of the duplication we have in our data.
• Let’s tackle the big table first: invoices.
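A minimal sketch of a join, assuming the invoice data has been split into two tables: invoice with one row per invoice, and invoiceItem with one row per purchased item. The column names and rows here are assumptions for illustration. The query connects rows whose InvoiceId values match, so the invoice date is stored once per invoice yet still appears next to every item in the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoice (InvoiceId INTEGER, InvoiceDate TEXT)")
conn.execute("CREATE TABLE invoiceItem (InvoiceId INTEGER, Item TEXT, Price REAL)")

conn.executemany(
    "INSERT INTO invoice VALUES (?, ?)",
    [(1, "2020-03-01"), (2, "2020-03-02")],
)
conn.executemany(
    "INSERT INTO invoiceItem VALUES (?, ?, ?)",
    [(1, "Track A", 0.99), (1, "Track B", 0.99), (2, "Track C", 1.29)],
)

# Join: pair each item with the invoice that has the same InvoiceId.
joined = conn.execute(
    "SELECT I.InvoiceId, I.InvoiceDate, T.Item, T.Price "
    "FROM invoice AS I, invoiceItem AS T "
    "WHERE I.InvoiceId = T.InvoiceId "
    "ORDER BY I.InvoiceId, T.Item"
).fetchall()

for row in joined:
    print(row)
```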

More Normalization

• The price information is still duplicated in every invoice item in which it appears.

• The song title (which is kind of long in some cases) also appears multiple times.

• Couldn’t we place those in the song relation?
• Yes! And you’ll do that in class.

SQL: Modifying Data
INSERT, DELETE, and UPDATE

INSERT: Adding Tuples (Rows)

• What it does: Adds a row to a table
• Basic syntax:

    INSERT INTO relation VALUES (v1, v2, v3 …)
    INSERT INTO relation(a1, a2) VALUES (v1, v2)

• Examples:

    INSERT INTO Songs VALUES ('Gangnam Style', 'Psy and Yoo Gun-hyung', 'Psy 6 (Six Rules), Part 1', 'PSY');
    INSERT INTO Songs (Name, Artist) VALUES ('Gangnam Style', 'PSY');

• If an attribute’s value is not given, use the default value.
• If there is no default, set the value to NULL.
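The default-versus-NULL rule can be checked directly. In this sketch the songs table is given a DEFAULT on the Album column purely for illustration (the original schema declares no defaults), so one omitted column picks up its default while the other becomes NULL, which Python reports as None.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumption for this sketch: Album has a default value, Composer does not.
conn.execute(
    "CREATE TABLE songs ("
    "Name text, Composer text, Album text DEFAULT 'Unknown Album', Artist text)"
)

# Only Name and Artist are supplied; Composer and Album are left out.
conn.execute("INSERT INTO songs (Name, Artist) VALUES ('Gangnam Style', 'PSY')")

row = conn.execute("SELECT Name, Composer, Album, Artist FROM songs").fetchone()
print(row)
```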

INSERT: Nested query version

• What it does: SELECTs data and inserts the result into a relation.

• Basic syntax:

    INSERT INTO relation SELECT …;

• Examples:

    INSERT INTO invoiceItem SELECT InvoiceId, Item1, Price1 FROM invoices;

• This is the query we used to create the initial invoiceItem table; for each of the remaining items, we did:

    INSERT INTO invoiceItem SELECT InvoiceId, Item2, Price2 FROM invoices WHERE Item2 != '';
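The two statements above can be run end to end on a small example. The sketch assumes a wide invoices table with two item/price column pairs and an invoiceItem table with columns InvoiceId, Item, and Price; the column layout of invoiceItem and the rows are assumptions. Invoice 2 has only one item, so its empty second slot is filtered out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE invoices ("
    "InvoiceId INTEGER, Item1 TEXT, Price1 REAL, Item2 TEXT, Price2 REAL)"
)
conn.execute("CREATE TABLE invoiceItem (InvoiceId INTEGER, Item TEXT, Price REAL)")

conn.executemany(
    "INSERT INTO invoices VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Track A", 0.99, "Track B", 0.99),
        (2, "Track C", 1.29, "", None),  # no second item on this invoice
    ],
)

# First item of every invoice, then second items where one exists.
conn.execute("INSERT INTO invoiceItem SELECT InvoiceId, Item1, Price1 FROM invoices")
conn.execute(
    "INSERT INTO invoiceItem "
    "SELECT InvoiceId, Item2, Price2 FROM invoices WHERE Item2 != ''"
)

items = conn.execute(
    "SELECT InvoiceId, Item, Price FROM invoiceItem ORDER BY InvoiceId, Item"
).fetchall()
print(items)
```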

DELETE: Removing Tuples

• What it does: Removes rows from a table
• Basic syntax:

    DELETE FROM relation WHERE predicate;

• Example:

    DELETE FROM invoices WHERE InvoiceId = 2;
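A quick sketch of the example statement in action, using a cut-down invoices table with made-up rows. The cursor's rowcount attribute reports how many rows the DELETE removed, which is a handy check that the predicate matched what you expected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (InvoiceId INTEGER, Item1 TEXT)")
conn.executemany(
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, "Track A"), (2, "Track B"), (3, "Track C")],
)

# Only rows satisfying the predicate are removed.
cursor = conn.execute("DELETE FROM invoices WHERE InvoiceId = 2")
rows_deleted = cursor.rowcount

remaining = [
    row[0] for row in conn.execute("SELECT InvoiceId FROM invoices ORDER BY InvoiceId")
]
print(rows_deleted, remaining)
```

Leaving off the WHERE clause would delete every row in the table, so it is worth running the predicate in a SELECT first.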

UPDATE: Modifying Data

• What it does: Changes values in a table
• Basic syntax:

    UPDATE relation SET attr1=val1, attr2=val2, … WHERE predicate;

• Examples:

    UPDATE employee SET Address = '2468 39th Avenue' WHERE FirstName = 'Steve';

    UPDATE customer SET Company = '2U' WHERE CustomerId = '17';
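The first example can be tried on a cut-down employee table with made-up rows. As with DELETE, the predicate decides which rows change: every employee named Steve would be updated, so in practice a unique key makes a safer predicate than a first name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (FirstName text, LastName text, Address text)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Steve", "Sample", "1 Old Road"), ("Ada", "Example", "2 Other Road")],
)

# Change one attribute in every row that satisfies the predicate.
cursor = conn.execute(
    "UPDATE employee SET Address = ? WHERE FirstName = ?",
    ("2468 39th Avenue", "Steve"),
)
rows_changed = cursor.rowcount

addresses = dict(conn.execute("SELECT FirstName, Address FROM employee"))
print(rows_changed, addresses)
```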
