Principles of Information Systems
Session 03: Data Organization
Part 01 – Data Representation
Going Digital
• A computer is:
  – A CD player
  – Video player
  – Family photo album
  – And more…
• Digitizing means converting analog signals to 1s and 0s
[Figure: Analog vs. Digital]
Binary Digits: 1 and 0
• Binary digits represent letters, numbers, colours, shapes, and more
• The on state is 1
• The off state is 0
• RAM
  – Presence or absence of an electrical charge
• Disk
  – Magnetic arrangement
• CD
  – Permanent microscopic pits
• Fiber optic
  – Pulses of light
[Figure: only 2 states possible, On and Off]
Octal
• Octal, or base-8, is used as a shorthand for writing binary data in older 8-bit computers (e.g. vintage Atari video games)
• A single octal digit represents three binary digits
001 111 110 111 010 (binary) = 17672 (octal)
Hexadecimal
• Hexadecimal, or base-16, is used as a shorthand to display the binary contents of RAM or disk storage
• A single hexadecimal digit represents four binary digits
• Two hexadecimal digits can be used to represent an eight-bit byte
0011 1111 0111 1010 (binary) = 3F7A (hexadecimal)
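The grouping rule for octal and hexadecimal can be sketched in a few lines of Python. The helper names `group_bits` and `bits_to_base` are hypothetical, chosen for illustration: the bit string is padded on the left with zeros, split into groups of 3 bits (octal) or 4 bits (hexadecimal), and each group is replaced by one digit.

```python
DIGITS = "0123456789ABCDEF"

def group_bits(bits: str, size: int) -> list[str]:
    """Split a bit string into groups of `size` bits, padding on the left with zeros."""
    bits = bits.replace(" ", "")
    pad = (-len(bits)) % size          # zeros needed to make the length a multiple of size
    bits = "0" * pad + bits
    return [bits[i:i + size] for i in range(0, len(bits), size)]

def bits_to_base(bits: str, size: int) -> str:
    """Convert bits to octal (size=3) or hexadecimal (size=4) by grouping."""
    return "".join(DIGITS[int(group, 2)] for group in group_bits(bits, size))

print(bits_to_base("001 111 110 111 010", 3))   # octal example from the slide
print(bits_to_base("0011 1111 0111 1010", 4))   # hexadecimal example from the slide
```

Because each group maps to exactly one digit, no arithmetic on the whole number is needed, which is why these bases work well as a shorthand for binary.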
Decimal, Binary, Octal and Hexadecimal
Decimal  Binary  Octal  Hex    Decimal  Binary  Octal  Hex
0        00000   0      0      9        01001   11     9
1        00001   1      1      10       01010   12     A
2        00010   2      2      11       01011   13     B
3        00011   3      3      12       01100   14     C
4        00100   4      4      13       01101   15     D
5        00101   5      5      14       01110   16     E
6        00110   6      6      15       01111   17     F
7        00111   7      7      16       10000   20     10
8        01000   10     8      17       10001   21     11
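Any row of the table can be reproduced or checked with Python's built-in format specifiers (`"05b"` for 5-digit binary, `"o"` for octal, `"X"` for uppercase hexadecimal). The function name `number_row` is a hypothetical helper for this sketch.

```python
def number_row(n: int) -> tuple[str, str, str, str]:
    """Return n in decimal, 5-digit binary, octal and hexadecimal."""
    return (str(n), format(n, "05b"), format(n, "o"), format(n, "X"))

# Rebuild the 0-17 table, one row per number.
for n in range(18):
    print("  ".join(number_row(n)))
```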
Converting Decimal to Binary
To convert a decimal value to binary, divide it by 2 and write down the remainder. Then divide the quotient by 2 again, and repeat until the quotient reaches 0. For example, let's take the decimal value 157:
157 ÷ 2 = 78 with a remainder of 1
78 ÷ 2 = 39 with a remainder of 0
39 ÷ 2 = 19 with a remainder of 1
19 ÷ 2 = 9 with a remainder of 1
9 ÷ 2 = 4 with a remainder of 1
4 ÷ 2 = 2 with a remainder of 0
2 ÷ 2 = 1 with a remainder of 0
1 ÷ 2 = 0 with a remainder of 1
Next, write down the remainders from bottom to top (in other words, write down the bottom remainder first and work your way up the list), which gives:
10011101 = 157
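The repeated-division method can be written as a short loop. This is a minimal sketch for non-negative whole numbers; `decimal_to_binary` is a hypothetical name, and in practice Python's built-in `format(n, "b")` does the same job.

```python
def decimal_to_binary(n: int) -> str:
    """Repeatedly divide by 2, collecting remainders, then read them bottom to top."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative numbers")
    if n == 0:
        return "0"
    remainders = []
    while n > 0:
        n, remainder = divmod(n, 2)    # the quotient carries on, the remainder is the next bit
        remainders.append(str(remainder))
    # The first remainder is the rightmost bit, so reverse the list.
    return "".join(reversed(remainders))

print(decimal_to_binary(157))  # 10011101
```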
Converting Binary to Decimal
Converting binary to decimal is just as simple. Say we want to convert the 8-bit value 10011101 into a decimal value. We can write out the place values like this:
2^7   2^6   2^5   2^4   2^3   2^2   2^1   2^0
128   64    32    16    8     4     2     1
1     0     0     1     1     1     0     1
As you can see, we have written the powers of two (1, 2, 4, 8, 16, 32, 64, 128) from largest to smallest, and then written the binary value below them. To convert, take the value from the top row wherever there is a 1 below it, and add those values together. In our example, 128 + 16 + 8 + 4 + 1 = 157. For a 16-bit value you would use the powers of two 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768. Because binary is base 2, the calculation above can also be written as:
$$1 \times 2^7 + 0 \times 2^6 + 0 \times 2^5 + 1 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = 157$$
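The place-value method can be sketched as a loop that adds $2^{\text{position}}$ for every 1 bit, counting positions from the right starting at 0. `binary_to_decimal` is a hypothetical name; Python's built-in `int(bits, 2)` gives the same result.

```python
def binary_to_decimal(bits: str) -> int:
    """Add the power of two for every position that holds a 1."""
    total = 0
    for position, bit in enumerate(reversed(bits)):   # position 0 is the rightmost bit
        if bit == "1":
            total += 2 ** position
    return total

print(binary_to_decimal("10011101"))  # 128 + 16 + 8 + 4 + 1 = 157
```

The same function handles 16-bit values unchanged, since the loop simply runs over more positions.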
Encoding Systems: Bits and Bytes
Part 02 – Data Organization (File System)
The Hard Disk
• Modern disks are organized at two levels:
• Physical organization
  – “Plan” for using the medium
  – Sectors, cylinders, clusters, blocks
  – This is fixed at the hardware level
• Logical organization
  – “Plan” for storing data on the disk
  – Partitions, folders and files
Hard Disk Organization
The Hard Disk
• Fixed magnetic disk
• Hard disk
  – 1 to 5.25 inches in diameter
  – 20 GB to 2 TB
  – Contains 12 disk platters stacked on a spindle
  – Disks spin beneath the read/write heads
  – Access arms float the heads over the disk surfaces
• Portable hard disk
  – External hard disk
• Interchangeable hard disk
  – Portable hard disk to swap out
Hard Disk – Physical Organization
• Data are stored in tracks
  – 80 tracks on a diskette
  – Thousands on hard disks
• Sectors are used to store and retrieve data
  – The recording surface is divided into pie slices
  – Hard disks have thousands of sectors
• Adjacent sectors form clusters
  – Each cluster is numbered
Hard Disk – Physical Organization
The process of accessing data has four steps:
1. Seek
2. Rotate
3. Settle
4. Data transfer
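Adding up the four steps gives the total time to read one piece of data. The sketch below assumes the common approximation that the rotate step averages half a revolution; the drive figures (9 ms seek, 7200 rpm, 0.1 ms settle, 150 MB/s transfer, one 4096-byte cluster) are hypothetical values for illustration, not specifications from this course.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average wait for the sector to rotate under the head: half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def access_time_ms(seek_ms: float, rpm: float, settle_ms: float,
                   transfer_bytes: int, transfer_rate_mb_s: float) -> float:
    """Sum of the four steps: seek + rotate + settle + data transfer."""
    transfer_ms = transfer_bytes / (transfer_rate_mb_s * 1_000_000) * 1000
    return seek_ms + rotational_latency_ms(rpm) + settle_ms + transfer_ms

# Hypothetical drive reading one 4096-byte cluster.
print(round(access_time_ms(9.0, 7200, 0.1, 4096, 150), 2))  # 13.29
```

In this example the mechanical steps (seek and rotate) dominate, while the transfer itself takes only a few hundredths of a millisecond.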
Hard Disk – Physical Organization
At the logical level, disks are organized as a series of files, within directories, located on a disk partition.
Disk Partition
With improvements in disk addressability, partitioning of disks is now used mainly to host more than one file system on a single physical medium.
Hard Disk – Logical Organization
File System
Each operating system supports its own list of file systems. For example, Windows supports FAT, FAT16, FAT32 and NTFS, as well as file systems for CDs and DVDs.
A file system is created when you “format” a partition.
This file system provides a boot record (to help the system start), an index (table) of positions on the disk where files can be located, and a table connecting those file positions to each file’s name and parent directory.
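The two tables described above can be sketched as a toy, FAT-style layout: a directory table maps each file name to its parent directory and first cluster, and an allocation table maps each cluster to the next cluster of the same file. This is a simplified illustration under those assumptions, not the real on-disk FAT format; the file names and cluster numbers are hypothetical.

```python
# Hypothetical, simplified FAT-style layout (not the real on-disk format).
END = -1  # marks the last cluster of a file

# Directory table: file name -> (parent directory, first cluster)
directory = {
    "report.txt": ("/docs", 2),
    "photo.jpg": ("/pics", 5),
}

# Allocation table: cluster number -> next cluster of the same file
fat = {2: 3, 3: 4, 4: END, 5: 6, 6: 9, 9: END}

def clusters_of(name: str) -> list[int]:
    """Find the first cluster in the directory, then follow the chain."""
    _parent, cluster = directory[name]
    chain = []
    while cluster != END:
        chain.append(cluster)
        cluster = fat[cluster]
    return chain

print(clusters_of("report.txt"))  # [2, 3, 4]
print(clusters_of("photo.jpg"))   # [5, 6, 9]
```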
Hard Disk – Logical Organization
File Fragmentation
As the system allocates space on the disk (a cluster of blocks) over time, files eventually become non-contiguous or “fragmented”.
This slows down work with the files, because the four access steps (seek, rotate, settle, data transfer) must be repeated for each separate fragment.
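Given the list of clusters a file occupies, in order, the number of fragments is the number of runs of consecutive cluster numbers; each extra run means another seek, rotate and settle. A minimal sketch, with hypothetical cluster numbers:

```python
def count_fragments(chain: list[int]) -> int:
    """Count runs of consecutive cluster numbers in a file's cluster chain."""
    if not chain:
        return 0
    fragments = 1
    for previous, current in zip(chain, chain[1:]):
        if current != previous + 1:   # a jump means the head must move again
            fragments += 1
    return fragments

print(count_fragments([2, 3, 4, 5]))        # 1: contiguous file
print(count_fragments([2, 3, 10, 11, 40]))  # 3: fragmented file
```

Defragmentation tools aim to rewrite files so that this count drops back to 1.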
Hard Disk – Logical Organization
Part 03 – Data Organization (Database)
Chapter 05
Principles and Learning Objectives
• Data management and modeling are key aspects of organizing data and information
  – Define general data management concepts and terms, highlighting the advantages of the database approach to data management
  – Describe the relational database model and outline its basic features
Principles and Learning Objectives (continued)
• A well-designed and well-managed database is an extremely valuable tool in supporting decision making
  – Identify the common functions performed by all database management systems and identify popular user database management systems
Principles and Learning Objectives (continued)
• The number and types of database applications will continue to evolve and yield real business benefits
  – Identify and briefly discuss current database applications
Why Learn About Database Systems?
• Database systems process and organize large amounts of data
• Examples:
  – Marketing manager can access customer data
  – Corporate lawyer can access past cases and opinions
Introduction
• Database: an organized collection of data
• Database management system (DBMS): group of programs to manage a database
  – Manipulates the database
  – Provides an interface between the database and the user of the database and other application programs
• Database administrator (DBA): skilled IS professional who directs all activities related to an organization’s database
Data Management
• Without data and the ability to process it, an organization could not successfully complete most business activities
• Data consists of raw facts
• To transform data into useful information, it must first be organized in a meaningful way
The Hierarchy of Data
• Bit (a binary digit): represents a circuit that is either on or off
• Byte: typically made up of eight bits
• Character: the basic building block of information; in simple encodings such as ASCII, one byte represents one character
  – Can be an uppercase letter, lowercase letter, numeric digit, or special symbol
• Field: typically a name, number, or combination of characters that describes an aspect of a business object or activity
The Hierarchy of Data (continued)
• Record: collection of related data fields
• File: collection of related records
• Database: collection of integrated and related files
• Hierarchy of data – Bits, characters, fields, records, files, and databases
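The hierarchy can be walked from the top down in a few lines of Python. The employee data below is hypothetical, and a Python dictionary and list stand in for a record and a file; the sketch assumes ASCII text, where one character fits in one byte.

```python
# Hypothetical employee data, used only to walk down the hierarchy of data.
record = {"emp_id": "052", "name": "Jones", "dept": "Sales"}   # record: related fields
employee_file = [record, {"emp_id": "053", "name": "Lee", "dept": "IT"}]  # file: related records

field = record["name"]                 # field: one aspect of the employee
character = field[0]                   # character: 'J'
byte = character.encode("ascii")       # byte: in ASCII, one character is one byte
bits = format(byte[0], "08b")          # bits: the eight binary digits in that byte

print(len(employee_file), field, character, byte, bits)
```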
The Hierarchy of Data (continued)
Figure 5.1: The Hierarchy of Data
Data Entities, Attributes, and Keys
• Entity: generalized class of people, places, or things (objects) for which data is collected, stored, and maintained
• Attribute: characteristic of an entity
• Data item: specific value of an attribute
• Key: field or set of fields in a record that is used to identify the record
• Primary key: field or set of fields that uniquely identifies the record
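These terms map directly onto a relational table. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `employee` entity: the columns are attributes, each inserted value is a data item, and declaring `emp_id` as the primary key makes the DBMS reject a second record with the same key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Entity: employee. Attributes become columns; emp_id is the primary key.
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,   -- uniquely identifies each record
        name    TEXT NOT NULL,
        dept    TEXT
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'Jones', 'Sales')")  # data items

rejected = False
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Lee', 'IT')")   # same key again
except sqlite3.IntegrityError as err:
    rejected = True
    print("Rejected:", err)
```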
Data Entities, Attributes, and Keys (continued)
Figure 5.2: Keys and Attributes
The Database Approach
• Traditional approach to data management: separate data files are created and stored for each application program
• Database approach to data management: a pool of related data is shared by multiple application programs
  – Offers significant advantages over the traditional file-based approach
The Database Approach (continued)
Figure 5.3: The Database Approach to Data Management
The Database Approach (continued)
Table 5.1: Advantages of the Database Approach
The Database Approach (continued)
Table 5.1: Advantages of the Database Approach (continued)
The Database Approach (continued)
Table 5.2: Disadvantages of the Database Approach
Data Modeling and the Relational Database Model
• When building a database, an organization must consider:
  – Content: What data should be collected and at what cost?
  – Access: What data should be provided to which users and when?
  – Logical structure: How should data be arranged so that it makes sense to a given user?
  – Physical organization: Where should data be physically located?
Data Modeling
• Building a database requires two types of designs
  – Logical design: abstract model of how the data should be structured and arranged to meet an organization’s information needs
  – Physical design: starts from the logical database design and fine-tunes it for performance and cost considerations
Data Modeling (continued)
• Data model: diagram of data entities and their relationships
• Entity-relationship (ER) diagrams: data models that use basic graphical symbols to show the organization of and relationships between data
Data Modeling (continued)
Figure 5.4: An Entity-Relationship (ER) Diagram for a Customer Order Database
The Relational Database Model
• Relational model: describes data in which all data elements are placed in two-dimensional tables, called relations, that are the logical equivalent of files
  – Each row of a table represents a data entity
  – Columns of the table represent attributes
  – Domain: allowable values for data attributes
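A domain can be enforced by the DBMS itself. In this sketch (Python's `sqlite3`, hypothetical `project` table) each row is one project and each column an attribute, and a `CHECK` constraint restricts the domain of `status` to three allowed values, so a row with any other value is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical relation: each row is a project, each column an attribute.
# The CHECK clause defines the domain of the status attribute.
conn.execute("""
    CREATE TABLE project (
        project_no INTEGER PRIMARY KEY,
        title      TEXT NOT NULL,
        status     TEXT CHECK (status IN ('planned', 'active', 'closed'))
    )
""")
conn.execute("INSERT INTO project VALUES (155, 'Payroll', 'active')")

try:
    conn.execute("INSERT INTO project VALUES (156, 'Website', 'maybe')")  # outside the domain
    accepted = True
except sqlite3.IntegrityError:
    accepted = False
print(accepted)  # False
```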
The Relational Database Model (continued)
Figure 5.5: A Relational Database Model
The Relational Database Model (continued)
• Selecting: eliminates rows according to certain criteria
• Projecting: eliminates columns in a table
• Joining: combines two or more tables
• Linking: manipulating two or more tables that share at least one common data attribute to provide useful information and reports
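The three operations correspond to familiar SQL patterns: a `WHERE` clause selects rows, the column list projects columns, and a `JOIN` on a shared attribute links tables. A sketch with Python's `sqlite3` module; the department and project tables and their values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_no INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE project (project_no INTEGER PRIMARY KEY, title TEXT,
                          dept_no INTEGER REFERENCES department(dept_no));
    INSERT INTO department VALUES (257, 'Sales'), (632, 'Manufacturing');
    INSERT INTO project VALUES (155, 'Payroll', 257), (498, 'Widgets', 632),
                               (226, 'Sales Manual', 257);
""")

# Selecting: keep only the rows that meet a condition.
selected = conn.execute(
    "SELECT * FROM project WHERE dept_no = 257 ORDER BY project_no").fetchall()

# Projecting: keep only some columns.
projected = conn.execute("SELECT title FROM project ORDER BY project_no").fetchall()

# Joining/linking: combine tables through the shared dept_no attribute.
joined = conn.execute("""
    SELECT p.title, d.dept_name
    FROM project AS p JOIN department AS d ON p.dept_no = d.dept_no
    ORDER BY p.project_no
""").fetchall()

print(selected)
print(projected)
print(joined)
```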
The Relational Database Model (continued)
Figure 5.6: A Simplified ER Diagram Showing the Relationship Between the Manager, Department, and Project Tables
The Relational Database Model (continued)
Figure 5.7: Linking Data Tables to Answer an Inquiry
The Relational Database Model (continued)
• Data cleanup: process of looking for and fixing inconsistencies to ensure that data is accurate and complete
  – Eliminate redundancies and anomalies
The Relational Database Model (continued)
Table 5.3: Fitness Center Dues
The Relational Database Model (continued)
Table 5.4: Fitness Center Members
Table 5.5: Dues Paid
Database Management Systems (DBMSs)
• Creating and implementing the right database system ensures that the database will support both business activities and goals
• DBMS: a group of programs used as an interface between a database and application programs or a database and the user
Overview of Database Types
• Flat file
  – Simple database program whose records have no relationship to one another
• Single user
  – Only one person can use the database at a time
  – Examples: Access, FileMaker, and InfoPath
• Multiple user
  – Allows dozens or hundreds of people to access the same database system at the same time
  – Examples: Oracle, Sybase, and IBM
Providing a User View
• Schema: description of the entire database
  – Typically used by large database systems to define tables and other database features associated with a person or user
• A DBMS can reference a schema to find where to access the requested data in relation to another piece of data
Creating and Modifying the Database
• Data definition language (DDL): collection of instructions and commands used to define and describe data and relationships in a specific database
  – Allows the database’s creator to describe the data and relationships that are to be contained in the schema
• Data dictionary: detailed description of all the data used in the database
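In SQL, `CREATE TABLE` is a typical DDL statement. The sketch below defines a hypothetical `customer` table with Python's `sqlite3` module and then reads back the column descriptions SQLite keeps through `PRAGMA table_info`. This metadata only loosely resembles a data dictionary; a full data dictionary usually also records descriptions, owners, and usage rules.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# DDL: describe the data and its constraints; the DBMS stores this in its schema.
conn.execute("""
    CREATE TABLE customer (
        cust_no  INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        city     TEXT DEFAULT 'Unknown'
    )
""")

# Each row from table_info is (cid, name, type, notnull, default, pk).
for cid, name, col_type, not_null, default, pk in conn.execute("PRAGMA table_info(customer)"):
    print(name, col_type, "NOT NULL" if not_null else "", default, "PK" if pk else "")
```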
Creating and Modifying the Database (continued)
Figure 5.10: Using a Data Definition Language to Define a Schema
Creating and Modifying the Database (continued)
Figure 5.11: A Typical Data Dictionary Entry
Storing and Retrieving Data
• When an application program requests data from the DBMS, the application program follows a logical access path
• When the DBMS goes to a storage device to retrieve the requested data, it follows a path to the physical location (physical access path) where the data is stored
Storing and Retrieving Data (continued)
Figure 5.12: Logical and Physical Access Paths
Manipulating Data and Generating Reports
• Data manipulation language (DML): commands that manipulate the data in a database
• Structured Query Language (SQL)
  – Adopted by the American National Standards Institute (ANSI) as the standard query language for relational databases
• Once a database has been set up and loaded with data, it can produce reports, documents, and other outputs
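The core SQL DML statements are `INSERT`, `UPDATE`, `DELETE` and `SELECT`. This sketch runs them against a hypothetical `member` table with Python's `sqlite3` module and prints a tiny report; the `?` placeholders pass values safely instead of pasting them into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_no INTEGER PRIMARY KEY, name TEXT, dues REAL)")

# DML statements change or read the data, not the table structure.
conn.execute("INSERT INTO member VALUES (?, ?, ?)", (1, "Ortiz", 30.0))
conn.execute("INSERT INTO member VALUES (?, ?, ?)", (2, "Brown", 30.0))
conn.execute("UPDATE member SET dues = 35.0 WHERE member_no = ?", (2,))
conn.execute("DELETE FROM member WHERE member_no = ?", (1,))

# A simple report: every remaining member and the dues owed.
for name, dues in conn.execute("SELECT name, dues FROM member ORDER BY name"):
    print(f"{name:<10}{dues:>8.2f}")
```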
Manipulating Data and Generating Reports (continued)
Table 5.6: Examples of SQL Commands
Database Administration
• Role of the database administrator (DBA): plan, design, create, operate, secure, monitor, and maintain databases
• DBA works with both users and programmers
• A data administrator is responsible for defining and implementing consistent principles for a variety of data issues, including setting data standards and data definitions; a nontechnical position
Popular Database Management Systems
• Popular DBMSs for end users include Microsoft’s Access and FileMaker Pro
• Complete database management software market includes:
  – Software for professional programmers
  – Databases for midrange, mainframe, and supercomputers
• Examples of open-source database systems: PostgreSQL and MySQL
• Many traditional database programs are now available on open-source operating systems
Special-Purpose Database Systems
• Some specialized database packages are used for specific purposes or in specific industries
  – Israeli Holocaust Database (www.yadvashem.org)
  – Hazmat database
  – Art and Antique Organizer Deluxe
• Special-purpose database by Tableau can be used to store and process visual images
Selecting a Database Management System
• Important characteristics of databases to consider
  – Database size
  – Database cost
  – Concurrent users
  – Performance
  – Integration
  – Vendor
Using Databases with Other Software
• DBMSs can act as front-end or back-end applications
  – Front-end applications interact directly with people or users
  – Back-end applications interact with other programs or applications
Database Applications
• Today’s database applications manipulate the content of a database to produce useful information
• Common manipulations are searching, filtering, synthesizing, and assimilating the data contained in a database using a number of database applications
Linking Databases to the Internet
• Linking databases to the Internet is important for many organizations and people
• Semantic Web
  – Developing a seamless integration of traditional databases with the Internet
  – Allows people to access and manipulate a number of traditional databases at the same time through the Internet
Data Warehouses, Data Marts, and Data Mining
• Data warehouse: database that collects business information from many sources in the enterprise, covering all aspects of the company’s processes, products, and customers
• Data mart: subset of a data warehouse
• Data mining: information-analysis tool that involves the automated discovery of patterns and relationships in a data warehouse
Data Warehouses, Data Marts, and Data Mining (continued)
Figure 5.17: Elements of a Data Warehouse
Data Warehouses, Data Marts, and Data Mining (continued)
Table 5.8: Common Data-Mining Applications
Business Intelligence
• Business intelligence (BI): process of gathering enough of the right information in a timely manner and usable form and analyzing it to have a positive impact on business strategy, tactics, or operations
  – Turns data into useful information that is then distributed throughout an enterprise
Business Intelligence (continued)
• Competitive intelligence: aspect of business intelligence limited to information about competitors and the ways that knowledge affects strategy, tactics, and operations
• Counterintelligence: steps an organization takes to protect information sought by “hostile” intelligence gatherers
Distributed Databases
• Distributed database
  – Database in which the data may be spread across several smaller databases connected via telecommunications devices
  – Gives corporations more flexibility in how databases are organized and used
• Replicated database
  – Database that holds a duplicate set of frequently used data
Online Analytical Processing (OLAP)
• Software that allows users to explore data from a number of different perspectives
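Exploring "from different perspectives" means summarizing the same facts along different dimensions. Real OLAP tools build multidimensional cubes; this much simpler sketch only approximates the idea with SQL `GROUP BY` in Python's `sqlite3` module, over a hypothetical `sales` table with region, product and quarter dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (region TEXT, product TEXT, quarter TEXT, amount REAL);
    INSERT INTO sales VALUES
        ('East', 'Shoes', 'Q1', 100), ('East', 'Hats', 'Q1', 40),
        ('West', 'Shoes', 'Q1', 80),  ('West', 'Shoes', 'Q2', 120),
        ('East', 'Hats', 'Q2', 60);
""")

DIMENSIONS = {"region", "product", "quarter"}

def totals_by(dimension: str) -> list[tuple]:
    """Summarize the same sales facts from one perspective (one dimension)."""
    if dimension not in DIMENSIONS:   # only allow known column names in the SQL text
        raise ValueError(dimension)
    sql = f"SELECT {dimension}, SUM(amount) FROM sales GROUP BY {dimension} ORDER BY {dimension}"
    return conn.execute(sql).fetchall()

for dim in ("region", "product", "quarter"):
    print(dim, totals_by(dim))
```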
Online Analytical Processing (OLAP) (continued)
Table 5.9: Comparison of OLAP and Data Mining
Object-Oriented and Object-Relational Database Management Systems
• Object-oriented database: database that stores both data and its processing instructions
  – Method: procedure or action
  – Message: request to execute or run a method
Object-Oriented and Object-Relational Database Management Systems (continued)
• Object-oriented database management system (OODBMS): group of programs that manipulate an object-oriented database and provide a user interface and connections to other application programs
• Object-relational database management system (ORDBMS): DBMS capable of manipulating audio, video, and graphical data
Visual, Audio, and Other Database Systems
• Databases for storing images
• Databases for storing sound
• Virtual database systems: allow different databases to work together as a unified database system
• Other special-purpose database systems
  – Spatial data technology: stores and accesses data according to the locations it describes and permits spatial queries and analysis