Tutorial Day: Nov 11th, 9:00a - 12:30p
Dave Stokes, MySQL Community Manager
SQL For PHP Programmers
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle's products remains at the sole discretion of Oracle.
The Problem with PHP Programmers
You are up to date on the latest version of PHP
But roughly 2-3% have had any training in Structured Query Language (SQL)
So what is SQL?!?!??!?!??!??!
SQL (/ˈɛs kjuː ˈɛl/ or /ˈsiːkwəl/; Structured Query Language) is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). Originally based upon relational algebra and tuple relational calculus, SQL consists of a data definition language and a data manipulation language. The scope of SQL includes data insert, query, update and delete, schema creation and modification, and data access control.
He Said 'relational algebra' and 'tuple relational calculus'!
Relational algebra is a family of algebras with well-founded semantics, used for modelling the data stored in relational databases and for defining queries on it.
To organize the data, redundant data and repeating groups of data are removed first; this is called normalizing. The result is data organized, or normalized, into what is called first normal form (1NF). Typically a logical data model documents and standardizes the relationships between data entities (and their elements). A primary key uniquely identifies an instance of an entity, also known as a record.
Relational Algebra Continued
Once the data is normalized and in sets of data (entities and tables), the main operations of the relational algebra can be performed: the set operations (such as union, intersection, and cartesian product), selection (keeping only some rows of a table), and projection (keeping only some columns). In SQL, selection is done in the WHERE clause, which is also where one set of data is commonly related to another set of data.
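The operations named above can be tried out in a few lines. This is only a sketch: it uses Python's built-in sqlite3 module with an in-memory database rather than MySQL, and the php_devs and sql_devs tables, their columns, and the city values are made up for illustration.

```python
import sqlite3

# In-memory SQLite database, used purely for illustration (assumption:
# the talk targets MySQL, but these statements are plain standard SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE php_devs (name TEXT, city TEXT);
    CREATE TABLE sql_devs (name TEXT, city TEXT);
    INSERT INTO php_devs VALUES ('Heather', 'Austin'), ('Eli', 'Dallas');
    INSERT INTO sql_devs VALUES ('Eli', 'Dallas'), ('Oscar', 'Austin');
""")


def rows(sql):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Selection: keep only some rows.
selection = rows("SELECT name, city FROM php_devs WHERE city = 'Austin'")

# Projection: keep only some columns.
projection = rows("SELECT name FROM php_devs ORDER BY name")

# Union and intersection: set operations across two tables.
union = rows(
    "SELECT name FROM php_devs UNION SELECT name FROM sql_devs ORDER BY name"
)
intersection = rows(
    "SELECT name FROM php_devs INTERSECT SELECT name FROM sql_devs"
)

# Cartesian product: every row of one table paired with every row of the other.
product = rows("SELECT p.name, s.name FROM php_devs p CROSS JOIN sql_devs s")
```

Note that UNION and INTERSECT are their own keywords, while selection lives in the WHERE clause and projection in the column list.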
Database Normalization Forms
1NF
  No columns with repeated or similar data
  Each data item cannot be broken down further
  Each row is unique (has a primary key)
  Each field has a unique name
2NF
  Move non-key attributes that only depend on part of the key to a new table
  Ignore tables with simple keys or no non-key attributes
3NF
  Move any non-key attributes that are more dependent on other non-key attributes than the table key to a new table
  Ignore tables with zero or only one non-key attribute
In more better English, por favor!
3NF means there are no transitive dependencies.
A transitive dependency is when two columnar relationships imply another relationship. For example, person -> phone# and phone# -> ringtone, so person -> ringtone
A -> B
It is not the case that B -> A
B -> C
And the rarely seen 4nf & 5nf
You can break the information down further, but very rarely do you need to go to 4nf or 5nf
So why do all this normalization?
Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals, as they reduce the amount of space a database consumes and ensure that data is logically stored.
Name     Gender  Color  Model
Heather  F       Blue   Mustang
Heather  F       White  Challenger
Eli      M       Blue   F-type
Oscar    M       Blue   911
Dave     M       Blue   Mustang
There is redundant information across multiple rows but each row is unique
2nf split into tables
Name     Gender
Heather  F
Eli      M
Oscar    M
Dave     M

Color  Model       Owner
Blue   Mustang     Heather
White  Challenger  Heather
Blue   F-type      Eli
Blue   911         Oscar
Blue   Mustang     Dave
Split data into two tables, one for owner data and one for car data
3nf split owner and car info into different tables
Car_ID Color Model Owner_ID
1 Blue Mustang 1
2 White Challenger 1
3 Blue F-type 2
4 Blue 911 3
5 Blue Mustang 4
The owner info is separated from the car info. Note that the car table has a column for the owner's ID from the owner table.
Owner_ID Name Gender
1 Heather F
2 Eli M
3 Oscar M
4 Dave M
But what if the Blue Mustang is shared? Or 4nf
Owner_ID Name Gender
1 Heather F
2 Eli M
3 Oscar M
4 Dave M

Car_id Model Color
1 Mustang Blue
2 Challenger White
3 F-type Blue
4 911 Blue
Tables for Owner, Car, & Ownership data
Now we have a flexible way to search data about owners, cars, and their relations.
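A rough sketch of the three-table layout, again using Python's sqlite3 module in place of MySQL. The owner and car rows come from the tables above. The ownership table's name, columns, and rows are assumptions: they are reconstructed from the earlier 3nf car table by treating its two Blue Mustang rows as one shared car.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name TEXT, gender TEXT);
    CREATE TABLE car   (car_id   INTEGER PRIMARY KEY, model TEXT, color TEXT);

    -- Hypothetical linking table: one row per (car, owner) pair.
    CREATE TABLE ownership (
        car_id   INTEGER REFERENCES car(car_id),
        owner_id INTEGER REFERENCES owner(owner_id),
        PRIMARY KEY (car_id, owner_id)
    );

    INSERT INTO owner VALUES
        (1, 'Heather', 'F'), (2, 'Eli', 'M'), (3, 'Oscar', 'M'), (4, 'Dave', 'M');
    INSERT INTO car VALUES
        (1, 'Mustang', 'Blue'), (2, 'Challenger', 'White'),
        (3, 'F-type', 'Blue'), (4, '911', 'Blue');
    -- Assumed rows: car 1 (the Blue Mustang) is shared by owners 1 and 4.
    INSERT INTO ownership VALUES (1, 1), (2, 1), (3, 2), (4, 3), (1, 4);
""")

# Who owns a Mustang?
mustang_owners = conn.execute("""
    SELECT owner.name
    FROM owner
    JOIN ownership ON ownership.owner_id = owner.owner_id
    JOIN car       ON car.car_id = ownership.car_id
    WHERE car.model = 'Mustang'
    ORDER BY owner.name
""").fetchall()

# Which cars does Heather own?
heather_cars = conn.execute("""
    SELECT car.model
    FROM car
    JOIN ownership ON ownership.car_id = car.car_id
    JOIN owner     ON owner.owner_id = ownership.owner_id
    WHERE owner.name = 'Heather'
    ORDER BY car.model
""").fetchall()
```

The same three tables answer questions in either direction (owner to cars, car to owners) without storing any owner or car twice.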
So now what!!!
By normalizing to 3nf (or 4nf), we are storing the data with no redundancies (or very, very few)
Now we need a way to define how the data is stored
And a way to manipulate it.
SQL is a declarative language made up of
  DDL: Data Definition Language
  DML: Data Manipulation Language
SQL was one of the first commercial languages for Edgar F. Codd's relational model, as described in his influential 1970 paper, "A Relational Model of Data for Large Shared Data Banks." --Wikipedia
Codd, Edgar F (June 1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM (Association for Computing Machinery) 13 (6): 377-387. doi:10.1145/362384.362685. Retrieved 2007-06-09.
Cod versus Codd
SQL is declarative
Describe what you want, not how to process it
Hard to tell if a query is efficient just by looking at it
Optimizer picks a GPS-like best route
  Can pick wrong: traffic, new construction, washed out roads, and road kill! Oh my!!
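One way to see the optimizer choosing a route is to ask it for its plan. The sketch below is SQLite-specific (EXPLAIN QUERY PLAN, run through Python's sqlite3 module); MySQL has its own EXPLAIN statement with different output. The car table reuses the earlier example data and the index name idx_car_color is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE car (car_id INTEGER PRIMARY KEY, model TEXT, color TEXT)"
)
conn.executemany(
    "INSERT INTO car (model, color) VALUES (?, ?)",
    [("Mustang", "Blue"), ("Challenger", "White"), ("F-type", "Blue"), ("911", "Blue")],
)

# The query text never changes; only the available "roads" do.
query = "SELECT model FROM car WHERE color = 'White'"


def plan(sql):
    """Return the optimizer's plan as one string (SQLite-specific syntax)."""
    cursor = conn.execute("EXPLAIN QUERY PLAN " + sql)
    # The last column of each plan row holds the human-readable detail.
    return " | ".join(row[-1] for row in cursor)


plan_without_index = plan(query)  # expected to describe a full table scan

conn.execute("CREATE INDEX idx_car_color ON car (color)")
plan_with_index = plan(query)     # expected to mention idx_car_color

result = conn.execute(query).fetchall()
```

The query is identical both times; what changes is the route the optimizer picks once an index exists.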
SQL is made up of two parts
Data Definition Language (DDL)
  For defining data structures
  CREATE, DROP, ALTER, and RENAME
Data Manipulation Language (DML)
  Used to SELECT, INSERT, DELETE, and UPDATE data
Useful commands
  DESC[ribe] table
  SHOW CREATE TABLE table
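A short sketch of both halves in action, run through Python's sqlite3 module rather than MySQL. The table and column names are illustrative. Two caveats: SQLite spells the rename as ALTER TABLE ... RENAME TO (MySQL also has RENAME TABLE), and DESC and SHOW CREATE TABLE are MySQL commands that SQLite does not have, so they are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define and change the structure.
conn.execute("CREATE TABLE owner (owner_id INTEGER PRIMARY KEY, name VARCHAR(30))")
conn.execute("ALTER TABLE owner ADD COLUMN gender CHAR(1)")
conn.execute("ALTER TABLE owner RENAME TO owners")

# DML: work with the rows inside that structure.
conn.execute("INSERT INTO owners (name, gender) VALUES ('Heather', 'F')")
conn.execute("INSERT INTO owners (name, gender) VALUES ('Dave', 'M')")
conn.execute("UPDATE owners SET name = 'David' WHERE name = 'Dave'")
conn.execute("DELETE FROM owners WHERE name = 'Heather'")
remaining = conn.execute("SELECT owner_id, name, gender FROM owners").fetchall()

# DDL again: remove the structure (and its data) entirely.
conn.execute("DROP TABLE owners")
tables_left = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
```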
The stuff in the parentheses
CHAR(30) or VARCHAR(30) will hold strings up to 30 characters long.
SQL MODE (more later) tells the server to truncate or return an error if a value is longer than 30 characters
INT(5) tells the server to show five digits of data
DECIMAL(5,3) stores five digits with three decimals, i.e. -99.999 to 99.999
FLOAT(7,4) -999.9999 to 999.9999
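The ranges above follow from simple arithmetic: with M total digits and D of them after the decimal point, the largest value is (10^M - 1) / 10^D. A small sketch in Python; the helper name declared_range is made up, and it only computes the declared range, not how any server stores or rounds the value.

```python
from decimal import Decimal


def declared_range(digits, decimals):
    """Return (min, max) for a column declared with (digits, decimals)."""
    # Largest integer with `digits` digits, shifted right by `decimals` places.
    largest = Decimal(10 ** digits - 1).scaleb(-decimals)
    return (-largest, largest)


decimal_5_3 = declared_range(5, 3)  # DECIMAL(5,3)
float_7_4 = declared_range(7, 4)    # FLOAT(7,4)
```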
Another look at DESC City
NULL No Value
Null is used to indicate a lack of value or no data
  Gender: Male, Female, NULL
Nulls are very messy in B-tree indexing, try to avoid
Math with NULLs is best avoided
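A few lines show why NULL needs care. This sketch uses Python's sqlite3 module, where SQL NULL comes back as Python None. The owner 'Pat' with no recorded gender is a made-up row for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE owner (name TEXT, gender TEXT)")
conn.executemany(
    "INSERT INTO owner VALUES (?, ?)",
    [("Heather", "F"), ("Eli", "M"), ("Pat", None)],  # None is stored as NULL
)

# Math with NULL yields NULL.
null_math = conn.execute("SELECT 1 + NULL").fetchone()[0]

# NULL is not even equal to itself; the comparison yields NULL, not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]

# Rows with a NULL gender silently drop out of ordinary comparisons.
not_male = conn.execute("SELECT name FROM owner WHERE gender <> 'M'").fetchall()

# IS NULL is the way to find them.
unknown_gender = conn.execute(
    "SELECT name FROM owner WHERE gender IS NULL"
).fetchall()

# COUNT(*) counts rows; COUNT(column) skips NULLs.
counts = conn.execute("SELECT COUNT(*), COUNT(gender) FROM owner").fetchone()
```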
DESC City in detail
Describe table tells us the names of the columns (Fields), the data type, if the column is NULLABLE, Keys, any default value, and Extras.
Data types vary with vendor
Usually have types for text, integers, BLOBs, etc.
Refer to manual
MySQL World Database
http://dev.mysql.com/doc/index-other.html
Used in MySQL documentation, books, online tutorials, etc.
Three tables
  City
  Country
  CountryLanguage
Join two tables
To get a query that provides the names of the cities and the names of their countries, JOIN the two tables on data common to a column in each table (columns that are hopefully indexed!)
http://i.imgur.com/hhRDO4d.png Get a copy!!!
Both City and Country have columns that can be used for JOINs
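A minimal sketch of such a JOIN, run with Python's sqlite3 module instead of MySQL. The tables are a trimmed-down, hand-typed stand-in for the World database: only a few columns and rows are included, and the index name idx_city_countrycode is made up. The join assumes City.CountryCode matches Country.Code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Cut-down stand-ins for the World database tables (assumption:
    -- only the columns needed for the join are reproduced here).
    CREATE TABLE Country (Code CHAR(3) PRIMARY KEY, Name CHAR(52));
    CREATE TABLE City (ID INTEGER PRIMARY KEY, Name CHAR(35), CountryCode CHAR(3));

    -- Index the join column on the City side; Country.Code is the primary key.
    CREATE INDEX idx_city_countrycode ON City (CountryCode);

    INSERT INTO Country VALUES ('AFG', 'Afghanistan'), ('NLD', 'Netherlands');
    INSERT INTO City VALUES
        (1, 'Kabul', 'AFG'), (5, 'Amsterdam', 'NLD'), (6, 'Rotterdam', 'NLD');
""")

# City names paired with country names, matched on the shared country code.
city_country = conn.execute("""
    SELECT City.Name, Country.Name
    FROM City
    JOIN Country ON City.CountryCode = Country.Code
    ORDER BY City.Name
""").fetchall()
```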
What happens when you send a query
Server receives the query
The user is authenticated for permissions
  Database, table, and/or column level
Syntax
Optimizer
  Statistics on data
  Cost model
  Pick cheapest option (DISK I/O)
  Cardinality of indexes
Get the data
Sorting/Grouping/etc
Data retu