  • DATA MANAGEMENT PRINCIPLES IN APPLICATION DEVELOPMENT

    Database Development, Data Structures and Sorting Algorithms

    Tanasorn (Mimi) Chindasook, Jacobs University, M.Sc. Data Engineering

    Student ID: 30002281, [email protected]

  • Acknowledgements

    I respectfully acknowledge Dr. Bendick Mahleko and Nilabhra Roy Chowdhury for their support, advice

    and input on the project; Prateek K. Choudhary and Shengchen Dong for their support during the course

    and critical review of the paper.

    Abstract

    Data management is a crucial skill that every data engineer should possess in order to effectively

    implement and maintain database systems within an organisation. A relational database is an efficient

    way to store data to perform queries that can be used in application development. Data structures and

    sorting algorithms are also a crucial part in application development as both can be used to optimise

    performance when correctly implemented.

    Introduction

    With the increasing volume of data being generated each day, database management systems (DBMS) exist to assist users in maintaining and utilising large collections of data. Without proper management, collected data cannot be utilised to its full potential, as information retrieval and analysis would be incredibly difficult tasks to accomplish. This paper explores the various factors in

    DBMS creation through an implementation of a DBMS for a start-up company looking to create an

    employees database. There are many factors to consider in order to correctly implement an effective

    DBMS. Therefore, it is imperative that research be done on the system that is to be modelled, and that the requirements be correctly and thoroughly collected, as it is difficult to change the structure of a DBMS

    once implemented.

    Furthermore, to be able to effectively develop applications, research on the appropriate data structures

    and sorting algorithms must also be conducted so that they can be appropriately implemented.

    Therefore, this paper also aims to introduce different data structures by comparing their usage across various applications, and to provide an overview of sorting algorithms by evaluating their efficiency using time-complexity comparisons.

    The rest of the paper is structured as follows: Section 1 provides a brief literature review for data

    management principles, data structures and sorting algorithms, Section 2 details the creation of an

    example DBMS system for employees in a start-up company along with some query examples, Section 3

    explores key data structures used in data management, Section 4 compares various naïve and efficient

    sorting algorithms through a time complexity analysis and describes their applications, and finally,

    Section 5 concludes the paper.

  • 1. Literature Review

    In 1960, Charles Bachman designed the IDS (Integrated Data Store) to improve performance, which significantly influenced the development of other DBMS systems. Later, in 1966, IBM released the IMS (Information Management System), which was based on the hierarchical model and was intended for storing large bills of material for aerospace projects such as the Apollo space vehicle (Shagufta, 2017).

    The relational data model was then proposed by E.F. Codd during his time at IBM in 1970. The idea was

    that the data would be represented through table form and thus would allow the possibility of

    incorporating many-to-many joins, unlike the hierarchical data model. He released several papers after his initial theoretical work which detailed aspects of the relational model, such as relational algebra (Shagufta, 2017).

    The ER Model was then developed by Peter Chen in 1976 (Chen, 1976). This model represents the world

    in terms of entities and relationships, and is the model that is used in abstraction to assist in database

    design. This led to the relational model being adopted as the standard approach for DBMS in the 1980s, along with the development of SQL as a query language and its adoption by ANSI and ISO. Several relational DBMS were developed, such as Informix and Oracle.

    Another significant aspect of database management and application development is data structures. Data structures can be characterised into primitive and non-primitive types. Primitive data structures are used in data management to define, using DDL, the type of data that should be stored. Non-primitive data structures are crucial in application development, as the differing characteristics of each data structure allow for various implementations.

    Data structures such as queues have always existed as part of the fundamental logic in batch processing.

    However, stacks were developed in 1946 in Turing’s computer design and linked lists were developed by

    Newell, Shaw and Simon for RAND Corporation’s Information Processing Language.

    Sorting algorithms also play an important role in application development and data management as a

    data pre-processing step. Efficient sorting algorithms differ in stability and memory usage and are

    implemented in several DBMS user interfaces. Efficient sorting algorithms appeared early: Merge sort was conceptualised by John von Neumann in 1945 (Knuth, 1998), and its more widely used counterpart, Quick sort, was developed by Tony Hoare in 1959 (Hoare, 1961).

    2. Relational Database Concepts and Database Design

    A database is defined as a structured collection of data that describes the different components of one

    or more related organisations, and can be stored or accessed in various ways (Ramakrishnan, 2003).

    Relational databases are a type of database that typically utilises the ANSI-SPARC architecture in data

    management which was first proposed in 1975 (Brodie, 1975). The ANSI-SPARC architecture is defined at

    three levels of abstraction which enable the end user to achieve logical and physical data independence.

    Logical data independence protects users from alterations in the logical structure of the data, whilst

    physical data independence refers to end user protection from changes at the physical storage as the

    modifications are transformed through mapping techniques in the conceptual schema.

  • Fig 1: Structure of a DBMS (Mahleko, 2018)

    Before beginning any database design, a requirements analysis must be carried out to ensure that all

    data is represented in the appropriate format in the database. For example, the upper management at a

    small start-up company would like to implement a relational database to store its employee

    information. The requirements analysis for this particular case will be answered in the following

    manner:

    What data must be stored in the database?

    All data pertaining to information on an employee in relation to the company, along with some personal

    information must be stored in the database (e.g. name, birthdate, start date, end date, salary, email,

    department). As employees can be promoted or change departments, records of how long an employee

    has worked in which department at which position must also be kept. The company also provides an

    extended health insurance policy to dependents of the employee.

    Who will use this database and what do these users want from the database?

    Upper management and HR will be the primary users of this database. The primary use of this database

    is to be able to easily retrieve information on each employee in the company when issuing monthly

    payments, preparing for company audits and providing an overview of the employees.

    What operations are to be performed on the database? Which of these operations are

    frequently performed?

    Operations that will be mostly performed on the database are:

    1.) Viewing of a set of employees and their corresponding information

    2.) Updating information when an employee changes departments, gets promoted, or leaves

    3.) Addition of new employees

    Once the data requirements have been thoroughly analysed, the relational database design can then

    commence.

  • Fig 1.1: Overview of ANSI-SPARC architecture (Abidin, 2010)

    Conceptual Schema

    The conceptual schema (or Data Modelling) is the first level of abstraction in ANSI-SPARC architecture

    and consists of the definition of the data's logical structure. In the conceptual schema, database designers define the tables that the database will be established upon, along with the entities that should be included in those tables, their datatypes, the chosen relations between those entities and any

    constraints on the data. The process of representing the data as a set of tables is denoted as the

    conceptual database design.

    Conceptual Design

    The conceptual design aspect focuses on describing the data and customer intention. Entity Relationship

    Models (ERM) are a high-level abstraction that represents the world in terms of relationships and

    entities (Chilson, 1983). ER diagrams are a semiformal way of representing the data using the ERM

    concept. Although ER diagrams cannot be immediately translated into database format, they are an effective way to visualise the relationships between entities. The following ER diagram depicts the

    Employees database in terms of the ER Model:

  • Fig 2: ER Diagram for the employees database

    Entity: An object that can be distinctly identified. In the diagram, an employee is considered to be an entity. A department is also considered to be an entity (Chen, 1976).

    Weak entity: An entity that can be identified only by considering the primary key of its owner (Chen, 1976). In this ER diagram a dependent is considered to be a weak entity because a dependent is only related to the company through the employee that works there.

    Attribute: A descriptive fact about an entity. In this ER diagram, birthdate, email, name and salary are all descriptive attributes of an employee. Department name is a descriptive attribute of a department (Chen, 1976).

    Primary key: The unique identifier for an entity. No two entities can have the same primary key (Chen, 1976). In the ER diagram, the primary keys are eid (employee ID) and did (department ID).

    Weak entity identifier: Weak entities can only be identified by considering the primary key of another related entity (Chen, 1976). As represented in the ER diagram, dependents is a weak entity that can only be identified by the employee ID through the company health insurance policy. In this case, the weak identifier for dependents is the employee ID.

    Relationship: Denotes a relationship between two entities. In this ER diagram, the “works for” relationship represents the relation between employees and departments (Chen, 1976).

    Identifying relationship: Denotes an identifying relationship. In the case of the ER diagram, Manages is an identifying relationship, as given a department, its manager can be uniquely identified. Policy is also an identifying relationship, as given a dependent, its related employee can be uniquely identified.

  • Key Constraint. The arrow points to the direction that is constrained. In the case of the employee database, only one employee can manage a department at any given time. Therefore, the arrow is pointed in the direction of the employee.

    One-to-Many Relationship. As represented in the ER diagram, one employee can have many dependents.

    Many-to-Many Relationship. As represented in the ER diagram, one employee can work for many departments, and one department can have many employees.

    An aspect that is not included in this design but should be mentioned is the participation constraint. An example of a participation constraint is that all employees must work for at least one department, or all departments must have at least one employee. An example of a one-sided participation constraint is demonstrated in the Manages relationship, where each department must have a manager, but not every employee must have a managing role. Because companies change structure constantly (especially at the start-up stage, as in this case), participation constraints are not enforced in this database.

    Logical Design

    The logical model is constructed based on the conceptual model with the addition of the datatypes

    (Abidin, 2010). The logical design focuses on the abstract and disregards the implementation. For the

    Employees database, the logical design is as follows:

    Employees(eid:integer, name:string, email:string, birthdate:date, salary:real)

    Department(did:integer, dname:string, managerid:integer)

    Dependent(eid:integer, dependent_name:string)

    Works(eid:integer, did:integer, start_date:date, end_date:date)

    Manages(did:integer, eid:integer, start_date:date, end_date:date)

    Policy(pid:integer, eid:integer)

    Physical Schema

    After the conceptual schema has been created, the physical schema should then be considered. The

    physical schema is where the files and indexes used are defined. This step denotes how the data, such as the relations defined in the conceptual schema, will be represented and stored on secondary storage by systems such as Oracle, Postgres, SQL Server and MySQL. These systems all utilise Structured Query Language (SQL) as a means of interaction but differ in syntax. The statements that are referenced in the body of this report and its appendix are in MySQL syntax.

    Physical Design

    The physical design and DDL of the employees database can be found in Appendix [1.1].

    External Schema

    Following the definition of the physical schema is the external schema. The external schema represents

    the different views of the data that can be seen by the end-user (Ramakrishnan, 2003; Brodie, 1975). For

  • example, students at a university should not be allowed to view the salaries of the professors.

    Therefore, permissions and roles of each user must be defined at this level. An example command in

    DDL to grant SELECT access on all tables in the employees database to user tdaoruang is:

    GRANT SELECT ON employees_n.* TO 'tdaoruang';

    In Fig. 4, the user mchindasook holds the role of Database Administrator and has permission for all

    aspects of the database. In comparison, the user lhaller is the top manager in the HR department and

    can view employee information, add new employees and update an employee’s information through an

    application. Finally, the user tdaoruang is an employee in the HR department and can only view

    information through an application. It is important to note that HR will only view the database through

    an application as HR will not have direct access to the employees database in reality, but only through a

    user-friendly interface. The external schema is used in application development as the external view is

    not stored, but rather computed as it is accessed (Ramakrishnan, 2003).

    Fig. 4: External schema for employees database

    Languages that can be used to implement the conceptual and physical schema are data definition

    language (DDL) and data manipulation language (DML).

    DDL: used to define conceptual and external schemas

    o CREATE

    DML: used to perform operations on the data

    o INSERT, UPDATE, DELETE

    For more examples of using DDL and DML to create a database and insert, update and delete data, along

    with some basic queries that can be performed on databases, please refer to Appendix [1.2] and

    Appendix [1.3].

    Database Querying

    Relational Algebra and Database Queries

    The rudimentary operations of relational algebra are projection, selection, set union, set intersection,

    set difference and Cartesian product (Ramakrishnan, 2003). Relational algebra is primarily used in data

    modelling and database querying. The types of joins that are most used in database querying are: inner

    join, left outer join, right outer join and full-outer join.

    The natural join is one of the most essential operations in relational algebra, as it is the relational equivalent of the logical AND: it returns the set of all combinations of tuples that agree on their common attributes.

  • Fig 5: Example of a natural join followed by a query

    Fig 5 depicts the process of finding the employee with the highest salary through the use of a join. This

    join can be achieved in two ways:

    Query A:

    SELECT e.name, d.dname, e.salary
    FROM employees e, departments d, works w
    WHERE e.eid = w.eid
      AND d.did = w.did
      AND w.end_date IS NOT NULL
      AND e.salary = (SELECT MAX(salary) FROM employees);

    Query B:

    SELECT e.name, d.dname, e.salary
    FROM employees e
    LEFT JOIN works w ON w.eid = e.eid AND w.end_date IS NOT NULL
    LEFT JOIN departments d ON d.did = w.did
    ORDER BY e.salary DESC
    LIMIT 1;

    It should be noted that, in this case, there are other ways to produce identical results, each with variations in efficiency. An example of a highly inefficient query to achieve the result above is to perform a Cartesian join and then find the tuple that satisfies the conditions. The example queries above depict two different ways to join the tables in the database. In this instance, Query A is less efficient than Query B, as it selects the highest salary with the use of a subquery. Queries that use subqueries in this manner are often subject to slower performance, as the subquery has to finish running before the outer query can begin. Another notable difference is that Query B's LEFT JOIN keeps every employee regardless of whether they are assigned to a department: if an employee is not assigned to a department, the joined tuple contains the selected information from the base table (employees) and NULL everywhere else.

    For more examples of database SELECT queries, please refer to Appendix [2.1] and Appendix [2.2].

    The advantages of employing a DBMS include improvements in data integrity and security, as it enforces integrity constraints on data that is accessed or input and enforces access control for its users.

    Furthermore, a DBMS also protects its users from being affected by system failure through crash

    recovery mechanisms.

    3. Data Structures

    Data structures are vital components in data management and application development, as they pertain to storing data in an effective manner (Shaffer, 2009). Data structures can be characterised into primitive and non-primitive types; primitive types refer to datatypes such as Boolean or Integer, and non-primitive types refer to structures such as arrays, stacks or queues, where data is referenced through the structure rather than stored directly (Shaffer, 2009). The various non-primitive data structures differ in how data is inserted, deleted and queried, leading to diverse applications in data management.

    Linear Data Structures

    Stacks

    A stack utilises the last-in-first-out (LIFO) principle and allows only two operations: the push of an item

    onto the stack, and the pop of an item from the stack (Shaffer, 2009). A stack is considered a limited

    access data structure, as items can only be added and removed from the top of the stack. It is also a recursive data structure, as it is either empty or has a top element and the rest, which is itself a stack (Shaffer, 2009).

    Fig 6: Visualisation of a stack (Techspirited.com, 2018)

    Applications of stacks in data management include backtracking or undoing and runtime memory

    management. Backtracking refers to the undo mechanism in text editors; this is accomplished by storing all of the text changes in a stack. When a user presses undo, the stack pops off the top element, and the remaining stack represents the document minus the last change.
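The undo mechanism described above can be sketched with a standard stack. This is a minimal, illustrative sketch: the class and method names are assumptions, and `java.util.ArrayDeque` stands in for the stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of a text editor's undo history backed by a stack (LIFO).
// Class and method names are illustrative; ArrayDeque provides push/pop.
class UndoHistory {
    private final Deque<String> changes = new ArrayDeque<>();

    // push: record an edit on top of the stack
    void record(String change) {
        changes.push(change);
    }

    // pop: remove and return the most recent edit, or null if there is nothing to undo
    String undo() {
        return changes.isEmpty() ? null : changes.pop();
    }
}
```

Because only the top of the stack is ever touched, each undo is a constant-time operation.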

    Queue

    A queue is a vital data structure in data management that follows the first-in-first-out (FIFO) principle

    (Barnett, 2008). The item that is stored in the front of the queue can be removed and insertion can

    occur only at the back of the queue. A traditional queue is allowed three operations: enqueue inserts an

    item at the back of the queue, dequeue removes an item from the front of the queue, and peek allows

    the user to view the item at the front of the queue without actually removing it (Barnett, 2008).

  • Fig 7: Visualisation of a Queue Data Structure (Techspirited.com, 2018)

    Queues are effective in situations where data is transferred between processes. Typical data

    management applications are data transmission and disk scheduling. One significant application of

    queues that is commonly seen in typical web applications is online ticket purchasing. Queues are often used to determine the order in which customers are allowed to purchase tickets; this is applied across various industries, such as airline tickets, concert tickets and limited-edition footwear releases.
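The ticket-purchasing scenario above maps directly onto the three queue operations. A minimal sketch (names are illustrative; `java.util.ArrayDeque` supplies the FIFO behaviour):

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal sketch of a FIFO ticket-purchase queue.
class TicketQueue {
    private final Queue<String> waiting = new ArrayDeque<>();

    void enqueue(String customer) { waiting.add(customer); } // join at the back
    String peek() { return waiting.peek(); }                 // view the front without removing
    String dequeue() { return waiting.poll(); }              // serve the customer at the front
}
```

Customers are served strictly in arrival order, which is exactly the fairness property ticketing systems rely on.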

    Linked Lists

    Linked lists are a collection of nodes that are linearly linked to each other through pointers (Barnett,

    2008). The first node in the list is referred to as the head. A characteristic feature of linked lists is that

    each node is made out of two (or three for doubly linked lists) components: the data that is stored in the

    node and the memory address(es) of the node(s) that it points to (Barnett, 2008). The node addresses need not be contiguous in memory. The two types of linked lists are singly linked lists and doubly linked lists, the only differentiating factor being that nodes in a singly linked list point only to the next node, whereas nodes in a doubly linked list hold pointers to both the previous node and the next node.

    Fig 8: Visualisation of a Singly Linked List (Techspirited.com, 2018)

    Applications of linked lists in data management can be found in the history section of web browsers and in collision resolution by chaining in hash tables. The history section of web browsers employs doubly linked lists to allow users to traverse through and fetch data of previously visited sites. When a user presses

    the back button, the previous node’s data is returned; similarly, when the forward button is pressed, the

    next node’s data is returned. In hash tables, linked lists are used for resolving collisions when one bucket

    has more than one data point allocated to it. The collision will be resolved by the bucket referencing a

    linked list that contains all the elements that have been assigned to the specific bucket.
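The browser-history use described above can be sketched as a doubly linked list, where back and forward simply follow the `prev` and `next` pointers. This is an illustrative sketch; the class and field names are assumptions.

```java
// Minimal sketch of browser history as a doubly linked list (names illustrative).
class History {
    static class Node {
        String url;
        Node prev, next;   // pointers to the neighbouring nodes
        Node(String url) { this.url = url; }
    }

    private Node current;

    void visit(String url) {                  // append a new node after the current one
        Node n = new Node(url);
        if (current != null) { current.next = n; n.prev = current; }
        current = n;
    }

    String back() {                           // follow the prev pointer (back button)
        if (current != null && current.prev != null) current = current.prev;
        return current == null ? null : current.url;
    }

    String forward() {                        // follow the next pointer (forward button)
        if (current != null && current.next != null) current = current.next;
        return current == null ? null : current.url;
    }
}
```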

    Tree Data Structures

    Heap

    A heap is a simple tree data structure where all the nodes in the tree are arranged in a specific order; the data structure is commonly represented as an array. There are two types of heaps, the max heap and the min heap (Cormen, 1989). Min heaps are typically used for queueing jobs in the CPU; the heap data structure is essential in the implementation of priority queues for operating systems. Max heaps and min heaps follow similar approaches, where each node has at most a left child and a right child. In a max heap, the root of the heap is the first item in the array; a parent node and its children are related by the following rules:

    Parent: A[⌊i/2⌋] ≥ A[i]

    Left child of A[i]: A[2i]

    Right child of A[i]: A[2i + 1]

    Fig 9: Heap data structure (Hackerearth.com, 2018)
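The CPU job-queueing use of a min heap mentioned above can be sketched with `java.util.PriorityQueue`, which is heap-based. The priority scheme (lower number = more urgent) and the names are illustrative assumptions:

```java
import java.util.PriorityQueue;

// Minimal sketch of a CPU job queue built on a binary min heap.
class JobQueue {
    // each entry is {priority, jobId}; the comparator orders by priority
    private final PriorityQueue<int[]> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));

    void submit(int priority, int jobId) { heap.add(new int[]{priority, jobId}); }

    // poll removes the root of the min heap: the job with the smallest priority value
    int next() { return heap.poll()[1]; }
}
```

Insertion and removal both cost O(log n), which is why heaps are the standard backing structure for priority queues.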

    Another type of tree data structure that should be mentioned is the B-Tree. A B-Tree is an approximately balanced rooted tree with O(log n) height. This data structure is typically used to index external storage by storing multiple keys per node based on some criteria. Data in a B-Tree is stored in the leaf nodes, which makes it efficient for insertion and searching, leading to its primary use for caching objects (Cormen, 1989).

    Hash tables

    A hash table is a special type of data structure that implements a hash function to map keys to actual

    values (Larson, 1988). Hashing can be implemented via the division or multiplication method. The division method assigns keys using a hash function that takes the remainder of dividing the key k by the number of available slots in the table m, i.e. h(k) = k mod m. For an effective hash function, m should be a large prime number so that fewer keys share the same remainder, thus reducing collisions. The best case search time for an element in a hash table is O(1), and the worst case is O(n) (Cormen, 1989). The largest problem that hashing faces is collisions, which occur when two or more keys hash to the same slot. The two approaches to resolving collisions are:

    1.) Chaining

    The chaining method handles collision resolution by putting all of the elements that collide into a linked list (Cormen, 1989). When implemented correctly, the hash function should not assign all of the elements to the same slot; mapping all elements to the same slot causes the hash table to degenerate into a linked list, and in this worst case scenario the search time for the hash table is O(n). Chaining has the advantage that the hash table's capacity is not limited. In general, chaining is preferred over open addressing for this reason.
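The division method and chaining described above can be combined into one small sketch. The table size m = 13 (a prime) and the names are illustrative assumptions:

```java
import java.util.LinkedList;

// Minimal sketch of a hash table using the division method, h(k) = k mod m,
// with collisions resolved by chaining colliding entries in linked lists.
class ChainedHashTable {
    private final int m = 13;                         // prime table size
    @SuppressWarnings("unchecked")
    private final LinkedList<int[]>[] buckets = new LinkedList[m];

    private int hash(int key) { return Math.floorMod(key, m); }  // division method

    void put(int key, int value) {
        int i = hash(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (int[] e : buckets[i]) if (e[0] == key) { e[1] = value; return; }
        buckets[i].add(new int[]{key, value});        // colliding keys share one chain
    }

    Integer get(int key) {
        LinkedList<int[]> chain = buckets[hash(key)];
        if (chain != null) for (int[] e : chain) if (e[0] == key) return e[1];
        return null;                                  // key not present
    }
}
```

Keys 7 and 20 both hash to slot 7 (20 mod 13 = 7), so they end up in the same chain yet remain individually retrievable.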

    2.) Open Addressing

    Open addressing deals with collisions by continuously searching the array, incrementing the index until a free slot is found (Cormen, 1989). This searching method is called probing. Probing can be linear, quadratic, or based on double hashing. The advantage of using open addressing is that no additional data structures are required; however, an inefficient hash function increases the possibility of keys clustering, which subsequently increases the required search time.

    Hashing has many imperative implementations in application programming as it can be used to protect

    or verify information. The most universal example of hashing is in password storage. When

    programming an application, password storage is essential in allowing users access to their account.

    However, a password cannot simply be stored as the string that was input. Instead, once the user chooses a password, the string is hashed and the hash is stored in the system to prevent security vulnerabilities. When the user logs in, the entered password is hashed again and compared against the stored hash.

    4. Sorting Algorithms

    Sorting algorithms are another essential part of application development and data management, as they are commonly used in the processing of data. In DML, the statement that invokes a sorting mechanism is ORDER BY. The efficiency of sorting algorithms is an important aspect of data management, as choosing the best sort is imperative when sorting extremely large datasets. The efficiency of sorting algorithms can be evaluated using asymptotic notation, and performance can be represented graphically by a time complexity comparison graph to see how a sort fares with larger sets of data. The worst case asymptotic notation is typically used as the basis for efficiency comparison. Sorting algorithms can be categorised into two major groups: naïve and efficient algorithms.

    Naïve Sorting Algorithms

    Naïve sorting algorithms encompass Bubble Sort, Selection Sort and Insertion Sort. These algorithms are considered naïve as they sort each element by searching for its position amongst the other sorted elements (Wirth, 1986). The distinguishing differences between the three sorts are as follows:

    Bubble sort compares neighbouring items in the array and swaps them when A[i] < A[i-1].

    Selection sort finds the smallest value in the unsorted part of the array and swaps it with the item in the first unsorted position.

    Insertion sort takes each element from the array and inserts it into its correct position among the already-sorted elements.

    All three naïve sorting algorithms exhibit quadratic worst-case behaviour: Bubble sort, Selection sort and Insertion sort are all O(n²) in the worst case.

    For examples of the implementation of the sorts in Java, please refer to Appendix [3.1] for Bubble sort,

    Appendix [3.2] for Selection sort and Appendix [3.3] for Insertion sort.
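As an illustration of the naïve approach, insertion sort can be sketched in a few lines (a minimal in-place sketch, separate from the appendix implementations):

```java
// Minimal sketch of insertion sort: grow a sorted prefix in place by inserting
// each element into its correct position among the already-sorted elements.
class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {  // shift larger sorted elements right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;                 // insert into the sorted prefix
        }
    }
}
```

The nested loops make the quadratic worst case visible: on a reverse-sorted array every element is shifted past the whole sorted prefix.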

  • Fig 10: Time complexity comparison for Naïve sorting algorithms using the same dataset.

    Fig 10 shows that out of the three sorts, insertion sort is the best performer, followed by selection sort, then bubble sort. All three sorts exhibit quadratic behaviour, in line with their worst-case performance. In bubble sort, there is not much evident difference between the worst case and the best case. Insertion sort exhibits the most variation between the best case and the worst case, as its performance is highly dependent on whether the array is already partially sorted. Bubble sort performs the worst because it performs the same number of comparisons on every unsorted value, regardless of whether the array is somewhat pre-sorted, as it does not take into account the order of the remaining items. Insertion sort performs the best here as it divides the array into sorted and unsorted elements, therefore making comparisons only on the sorted values.

    These naïve sorts are rarely implemented in real-life applications, as more efficient sorts have replaced them. However, they served as the foundation upon which these efficient algorithms were developed. For the Java implementations of Bubble sort, Selection sort and Insertion sort used to obtain the data for Fig 10, please refer to Appendix [3.4], Appendix [3.5] and Appendix [3.6] respectively.

    Efficient Sorting Algorithms

    Many efficient sorting algorithms solve problems by recursion. A recursive function is defined as having a base case, and a recursive case that will eventually resolve to the base case when called with smaller arguments. The recursive algorithm works in three stages:

    1.) Divide the problem into smaller sub-problems

    2.) Solve the sub-problems through recursion, if the problem is small enough, return a value

    3.) Combine the solutions to the sub-problems

    This approach is called the divide and conquer principle and is applied in efficient sorting algorithms

    such as Merge sort and Quick sort.

    Recursive Algorithm Efficiency Calculation

    The efficiency of a recursive algorithm can be evaluated using two methods: the recursion tree and the master theorem.

    The recursion tree method represents the recurrence as a tree with nodes that represent the sub-problem cost. The overall efficiency of the recursive algorithm is then calculated by aggregating the costs at all levels. The recursion tree is a highly effective method for visualising how a recursive algorithm works; however, the limitation of this method is that other methods, such as substitution, must be used to verify its solutions.

The Master theorem evaluates the efficiency of recursive sorting algorithms using three evaluation cases (see Appendix [4.1]). The solution is determined by the larger of f(n) and n^(log_b a), where a and b are constants that satisfy the conditions a ≥ 1, b > 1, and f(n) > 0 (Cormen, 1989). The limitation of the master theorem is that it does not cover all cases, but its intuitive reasoning makes it an easy method for evaluating algorithm efficiency.

Examples of using the master theorem to find the asymptotic notation for recursive algorithms are as follows (Cormen, 1989):

Case 1 example: T(n) = 16T(n/4) + n ⇒ T(n) = Θ(n²)

Case 2 example: T(n) = 4T(n/2) + n² ⇒ T(n) = Θ(n² log n)

Case 3 example: T(n) = T(n/2) + 2^n ⇒ T(n) = Θ(2^n)

Unsolvable example: T(n) = 0.5T(n/2) + 1/n ⇒ does not apply (a < 1)

    Merge Sort

Merge sort is a divide and conquer algorithm that divides the unsorted array into n sub-arrays of one element each (Knuth, 1998). The sub-arrays are then repeatedly merged to produce new sorted sub-arrays until only one sorted array remains. Merge sort is extremely consistent, with the worst case, average case and best case all equal to O(n log n). However, the drawback of this sort is that it requires O(n) additional memory in order to duplicate the elements that must be sorted as sub-arrays. As it is an out-of-place sort, the memory required grows with the dataset, which can lead to memory allocation issues for large datasets.

For an example implementation of Merge sort, please refer to Appendix [4.2].

    Quick Sort

As an alternative that combats Merge sort's memory allocation disadvantage, Quick sort can be implemented. Quick sort is also a divide and conquer sorting algorithm: it divides an array into smaller sub-arrays around a partition value and recursively sorts those sub-arrays (Hoare, 1961). It can be more efficient than Merge sort when correctly implemented; the choice of partition value is key to the efficiency of the Quick sort algorithm. Quick sort has a best case and average case of O(n log n), whilst its worst case is O(n²). The advantage of Quick sort is that it only uses O(log n) additional memory, so it can efficiently sort large datasets without causing memory allocation issues.

Due to Quick sort's advantages, it is the algorithm of choice for many practical applications. For example, Java's primary system sort for primitive arrays, the Arrays.sort() method, uses a tuned Quick sort variant (a dual-pivot Quick sort since Java 7).
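A brief usage example of the built-in system sort (illustrative values only, added here for clarity):

```java
import java.util.Arrays;

// Using Java's built-in system sort on a primitive array.
public class SystemSortDemo {
    public static void main(String[] args) {
        int[] values = {42, 7, 19, 3, 88};
        Arrays.sort(values);                         // tuned quicksort for primitives
        System.out.println(Arrays.toString(values)); // prints [3, 7, 19, 42, 88]
    }
}
```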

    For an example implementation of Quick sort, please refer to Appendix [4.3]

Fig 11: Time complexity comparisons for efficient sorting algorithms using the same dataset.

Fig 11 shows that Merge sort is the more stable algorithm in terms of running time. Many of Quick sort's data points still lie in the same range as Merge sort's, meaning the two sorts are generally comparable in efficiency. However, stability of running time is not the only factor to consider, and a limitation of the time-complexity comparison graph is that it does not show the memory allocation used. The trade-off in choosing Quick sort over Merge sort is more efficient memory usage at the cost of guaranteed worst-case performance. Therefore, when memory allocation and practicality are factored in, Quick sort is typically the better-performing sort.

    For the Java implementation of Merge sort and Quick sort on a large dataset that was used to obtain

    data for Fig 11, please refer to Appendix [4.2] and Appendix [4.3].

    For a table that compares the worst case, average case and best case time complexity comparison for

    naïve sorting algorithms and efficient sorting algorithms, please refer to Appendix [4.4].

    5. Conclusions

    The methods detailed in the report provide a basis for the foundations of data management and

    application development and should be studied extensively before implementation. The factors that

    should be considered before any DBMS implementation are the requirements analysis, conceptual

    schema, physical schema and external schema. Once a database is developed, the data in the database

can be accessed through different views using SQL queries. Queries can be written in different formats with varying efficiency; it is therefore essential that best practices be studied.

Apart from databases, data structures also play a crucial role in data management and application development, as they determine how data is stored. Sorting algorithms must likewise be evaluated by their performance efficiency on large datasets. Ultimately, more sophisticated methods can be applied to this topic; however, it is imperative that all data engineers understand these fundamental concepts in order to establish a solid foundation for future research.

Worst case: Quick sort – O(n²); Merge sort – O(n log n)

  • References

    [1] Mahleko, B. (2018). “MMM010-340163 Data Management for Graduate Students – Lecture 02”.

    Jacobs University. pp 20.

[2] Abidin, Siti & Ahmad, Suzana & M S Yafooz, Wael. (2010). A new system architecture for flexible

database conversion. WSEAS Transactions on Computers. 9.

    [3] Chilson, D., & Kudlac, M. (1983). Database design: a survey of logical and physical design techniques.

    ACM SIGMIS Database, 15(1), pp.13

    [4] Chen, P. (1976). The entity-relationship model—toward a unified view of data. ACM Transactions on

    Database Systems (TODS), 1(1),

    [5] Ramakrishnan, R. & Gehrke, J. (2003). Database Management Systems (pp. 3-50). 3rd edition. New

    York: McGraw-Hill.

    [6] Brodie, M. & Schmidt, J. (1975) ANSI/X3/SPARC Study Group on Data Base Management

    Systems. Interim Report. FDT, ACM SIGMOD bulletin. Volume 7, No. 2

    [7] Barnett, G. & Del Tongo, L. (2008). Data Structures and Algorithms: Annotated Reference with

    Examples. First Edition Copyright.

    [8] Shaffer, C. (2009). A Practical Introduction to Data Structures and Algorithm Analysis Third Edition

    (Java). Department of Computer Science. Virginia Tech Blacksburg, VA 24061.

    [9] Wirth, Niklaus (1986), Algorithms & Data Structures, Upper Saddle River, NJ: Prentice-Hall, pp. 76–

    77, ISBN 0130220051

    [10] Shagufta, P. & Chandra, U. & Wani, A. (2017). A Literature Review on Evolving Database.

    International Journal of Computer Applications (0975 – 8887). Volume 162, No 9

    [11] Knuth, D. (1998). "Section 5.2.4: Sorting by Merging". Sorting and Searching. The Art of Computer Programming. 3 (2nd ed.). Addison-Wesley. pp. 158–168. ISBN 0-201-89685-0.

    [12] Hoare, C. A. R. (1961). "Algorithm 64: Quicksort". Comm. ACM. 4 (7): 321. doi:10.1145/366622.366644.

    [13] Larson, P. (1988). Dynamic Hash Tables. Commun. ACM, 31, 446-457.

    [14] Cormen, T & Leiserson, C. & Rivest, R. & Stein, C. (1989). Introduction to Algorithms Third Edition.

    pp 151 – 484

    [15] Heaps/Priority Queues Tutorials & Notes | Data Structures. (2018). Retrieved from

    https://www.hackerearth.com/practice/data-structures/trees/heapspriority-queues/tutorial. Accessed

    on 22 November 2018

    [16] Types of Data Structures in Computer Science and Their Applications. (2018).

    https://techspirited.com/types-of-data-structures-in-computer-science-their-applications. Accessed on

    22 November 2018


  • APPENDIX

    Appendix [1.1]

    CREATE TABLE employees (eid INT PRIMARY KEY NOT NULL auto_increment,

    ename VARCHAR(128),

    email VARCHAR(128),

    birthdate DATE,

    salary FLOAT(25));

    CREATE TABLE departments (did INT PRIMARY KEY NOT NULL auto_increment,

    dname VARCHAR(128));

    CREATE TABLE dependents (eid INT NOT NULL,

    dependent_name VARCHAR(256));

    CREATE TABLE manages (eid INT NOT NULL,

    did INT NOT NULL,

    start_date DATE,

    end_date DATE,

    PRIMARY KEY(did),

    FOREIGN KEY(eid) REFERENCES employees(eid),

    FOREIGN KEY(did) REFERENCES departments(did));

CREATE TABLE works (eid INT NOT NULL,

did INT NOT NULL,

start_date DATE,

end_date DATE,

FOREIGN KEY(eid) REFERENCES employees(eid),

FOREIGN KEY(did) REFERENCES departments(did));

    CREATE TABLE policy (eid INT,

    pid VARCHAR(128),

    FOREIGN KEY(eid) REFERENCES employees(eid));

  • Appendix [1.2]

    Consider the following relational schema. An employee can work in more than one department; the

    pct time field of the Works relation shows the percentage of time that a given employee works in a

    given department.

Emp(eid: integer, ename: string, age: integer, salary: real)

Works(eid: integer, did: integer, pcttime: integer)

Dept(did: integer, dname: string, budget: real, managerid: integer)

    Create a database based on the above schema.

    SHOW DATABASES;

    CREATE DATABASE employees_new;

    SHOW DATABASES;

    USE employees_new;

    CREATE TABLE emp (eid INT PRIMARY KEY NOT NULL auto_increment,

    ename VARCHAR(128),

    age INT,

    salary FLOAT(25));

    CREATE TABLE dept (did INT PRIMARY KEY NOT NULL auto_increment,

    dname VARCHAR(128),

    budget FLOAT(25),

    managerid INT,

    FOREIGN KEY(managerid) REFERENCES emp(eid));

    CREATE TABLE works (eid INT,

    did INT,

    pcctime INT,

    FOREIGN KEY(eid) REFERENCES emp(eid),

    FOREIGN KEY(did) REFERENCES dept(did));

    SHOW TABLES;

  • DESC emp;

    +-----------+------------------+-------+------+----------+------------------------+

    | Field | Type | Null | Key | Default | Extra |

    +-----------+------------------+-------+------+----------+------------------------+

    | eid | int(11) | NO | PRI | NULL | auto_increment |

    | ename | varchar(128) | YES | | NULL | |

    | age | int(11) | YES | | NULL | |

    | salary | double | YES | | NULL | |

    +-----------+------------------+-------+------+-----------+-----------------------+

    INSERT INTO emp (ename,age,salary) VALUES

    ("Mimi",23,250000),("Akeem",35,21816),("Alexis",58,17439),("Jin",35,26836),("Clare",61,42221786),("El

    eanor",27,5758651),("Murphy",65,232610),("Shad",61,1580),("Tobias",46,454323.50),("Randall",40,422

    71.21),("Gray",31,12368.60);

    SELECT * FROM emp;

    +------+-----------+------+--------------+

    | eid | ename | age | salary |

    +------+-----------+------+--------------+

    | 1 | Mimi | 23 | 250000 |

    | 2 | Akeem | 35 | 21816 |

    | 3 | Alexis | 58 | 17439 |

    | 4 | Jin | 35 | 26836 |

    | 5 | Clare | 61 | 42221786 |

    | 6 | Eleanor | 27 | 5758651 |

    | 7 | Murphy | 65 | 232610 |

    | 8 | Shad | 61 | 1580 |

    | 9 | Tobias | 46 | 454323.5 |

    | 10 | Randall | 40 | 42271.21 |

    | 11 | Gray | 31 | 12368.6 |

    +------+-----------+------+--------------+

    DESC dept;

  • +----------------+-----------------+--------+-------+-----------+--------------------+

    | Field | Type | Null | Key | Default | Extra |

    +----------------+-----------------+--------+-------+-----------+---------------------+

    | did | int(11) | NO | PRI | NULL | auto_increment |

    | dname | varchar(128) | YES | | NULL | |

    | budget | double | YES | | NULL | |

    | managerid | int(11) | YES | MUL | NULL | |

    +---------------+------------------+-------+--------+-----------+----------------------+

    INSERT INTO dept (dname,budget,managerid) VALUES ("Software",60000,1),("Hardware",

    10000000,4),("HR",5000,7),("Marketing",70000,2);

    SELECT * FROM dept;

    +-----+---------------+--------------+----------------+

    | did | dname | budget | managerid |

    +-----+---------------+--------------+----------------+

    | 1 | Software | 60000 | 1 |

    | 2 | Hardware | 10000000 | 4 |

    | 3 | HR | 5000 | 7 |

    | 4 | Marketing | 70000 | 2 |

    +-----+---------------+--------------+----------------+

  • DESC works;

    +------------+----------+-------+--------+-----------+---------+

    | Field | Type | Null | Key | Default | Extra |

    +------------+----------+-------+--------+-----------+---------+

    | eid | int(11) | YES | MUL | NULL | |

    | did | int(11) | YES | MUL | NULL | |

    | pcctime | int(11) | YES | | NULL | |

    +------------+----------+-------+-------+-----------+----------+

    INSERT INTO works (eid,did,pcctime) VALUES

    (1,1,100),(2,1,50),(2,2,50),(3,4,100),(4,3,90),(4,4,10),(5,1,75),(5,2,25),(6,3,100),(7,4,100),(8,2,60),(8,1,10)

    ,(8,3,30),(9,1,25),(9,2,25),(9,3,25),(9,4,25),(10,4,100),(11,2,100);

    SELECT * FROM works;

    +------+------+---------+

    | eid | did | pcctime |

    +------+------+---------+

    | 1 | 1 | 100 |

    | 2 | 1 | 50 |

    | 2 | 2 | 50 |

    | 3 | 4 | 100 |

    | 4 | 3 | 90 |

    | 4 | 4 | 10 |

    | 5 | 1 | 75 |

    | 5 | 2 | 25 |

    | 6 | 3 | 100 |

    | 7 | 4 | 100 |

    | 8 | 2 | 60 |

    | 8 | 1 | 10 |

    | 8 | 3 | 30 |

  • | 9 | 1 | 25 |

    | 9 | 2 | 25 |

    | 9 | 3 | 25 |

    | 9 | 4 | 25 |

    | 10 | 4 | 100 |

    | 11 | 2 | 100 |

    +------+------+---------+

    Write the following queries in SQL:

    a. Print the names and ages of each employee who works in both the Hardware department and

    the Software department

    SELECT e.ename, e.age FROM emp e WHERE e.eid in (SELECT eid FROM works WHERE did = 1) AND e.eid

    in (SELECT eid FROM works WHERE did = 2);

    +-----------+------+

    | ename | age |

    +-----------+------+

    | Akeem | 35 |

    | Clare | 61 |

    | Shad | 61 |

    | Tobias | 46 |

    +-----------+-------+

    b. Find the managerids of managers who manage only departments with budgets greater than

    $1 million

SELECT managerid FROM dept GROUP BY managerid HAVING MIN(budget) > 1000000;

    +-----------+

    | managerid |

    +-----------+

    | 4 |

    +-----------+

  • c. Find the enames of managers who manage the departments with the largest budgets.

    SELECT ename FROM emp WHERE eid = (SELECT managerid FROM dept WHERE budget = (SELECT

    max(budget) FROM dept));

    +-------+

    | ename |

    +-------+

    | Jin |

    +-------+

  • Appendix [1.3]

1. Find the number of employees hired in the year 2000.

SELECT COUNT(*) FROM employees WHERE YEAR(hire_date) = 2000;

    +----------+

    | COUNT(*) |

    +----------+

    | 13 |

    +----------+

    2. Find the average age (in years) of employees who were hired in the year 2000

    SELECT AVG(TIMESTAMPDIFF(year,birth_date,curdate())) AS avg FROM employees e WHERE YEAR(hire_date) = "2000";

    +-----------------+

    | avg |

    +------------------+

    | 60.7692 |

    +-------------------+

3. Create a table called millennial_hires consisting of the following fields:

a. id (auto increment, unsigned int(6), not null, primary key)

b. first_name (varchar(30))

c. dob (date)

Describe the table you just created and validate that the description matches the specification.

CREATE TABLE millennial_hires (id INT(6) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT,

first_name VARCHAR(30),

dob DATE);

    DESC millennial_hires;

    +------------+---------------------+-------+------+-----------+--------------------+

    | Field | Type | Null | Key | Default | Extra |

    +------------+---------------------+-------+------+-----------+--------------------+

    | id | int(6) unsigned | NO | PRI | NULL | auto_increment |

  • | first_name | varchar(30) | YES | | NULL | |

    | dob | date | YES | | NULL | |

    +---------------+-----------------+-------+------+-----------+------------------+

4. Insert the first name and birth date of all the people hired in the year 2000 into the table created in the last task. Add your details to the table. Print out all the values.

INSERT INTO millennial_hires(first_name, dob)

(SELECT first_name, birth_date FROM employees WHERE YEAR(hire_date) = 2000);

INSERT INTO millennial_hires(first_name, dob) VALUES ("Mimi", "1995-02-07");

    SELECT * FROM millennial_hires;

    +----+-------------+----------------+

    | id | first_name | dob |

    +----+-------------+----------------+

    | 1 | Ulf | 1960-09-09 |

    | 2 | Seshu | 1964-04-21 |

    | 3 | Randi | 1953-02-09 |

    | 4 | Mariangiola | 1955-04-14 |

    | 5 | Ennio | 1960-09-12 |

    | 6 | Volkmar | 1959-08-07 |

    | 7 | Xuejun | 1958-06-10 |

    | 8 | Shahab | 1954-11-17 |

    | 9 | Jaana | 1953-04-09 |

    | 10 | Jeong | 1953-04-27 |

    | 11 | Yucai | 1957-05-09 |

    | 12 | Bikash | 1964-06-12 |

    | 13 | Hideyuki | 1954-05-06 |

    | 16 | Mimi | 1995-02-07 |

    +----+---------------+----------------+

    SELECT COUNT(*) FROM millennial_hires;

  • +----------+

    | COUNT(*) |

    +----------+

    | 14 |

    +----------+

5. From the table, delete the entries of employees who were born in or after the year 1960. Find the number of records in the table after deletion.

DELETE FROM millennial_hires WHERE YEAR(dob) >= 1960;

SELECT COUNT(*) FROM millennial_hires;

    +----------+

    | COUNT(*) |

    +----------+

    | 9 |

    +----------+

6. Add a new column called birth_year to the table. Put the birth year of each person as the values in this column and delete the dob column. Print the resulting table.

ALTER TABLE millennial_hires ADD COLUMN birth_year INT(4);

UPDATE millennial_hires SET birth_year = YEAR(dob);

ALTER TABLE millennial_hires DROP COLUMN dob;

    SELECT * FROM millennial_hires;

    +----+-----------------+------------+

    | id | first_name | birth_year |

    +----+-----------------+------------+

    | 3 | Randi | 1953 |

    | 4 | Mariangiola | 1955 |

    | 6 | Volkmar | 1959 |

    | 7 | Xuejun | 1958 |

    | 8 | Shahab | 1954 |

    | 9 | Jaana | 1953 |

    | 10| Jeong | 1953 |

    | 11| Yucai | 1957 |

    | 13| Hideyuki | 1954 |

    +----+----------------+------------+

  • Appendix 2.1

    1. Print all the tables in the database.

    mysql> SHOW TABLES;

    +--------------------------------+

    | Tables_in_employees |

    +--------------------------------+

    | current_dept_emp |

    | departments |

    | dept_emp |

    | dept_emp_latest_date |

    | dept_manager |

    | employees |

    | salaries |

    | titles |

    +--------------------------------+

    8 rows in set (0.00 sec)

    2. Find and understand the schemas of all the tables.

    mysql> DESC current_dept_emp; DESC departments; DESC dept_emp; DESC dept_emp_latest_date;

    DESC dept_manager; DESC employees; DESC salaries; DESC titles;

    +----------------+---------+------+-----+------------+-------+

    | Field | Type | Null | Key | Default | Extra |

    +----------------+---------+------+-----+-------------+-------+

    | emp_no | int(11) | NO | | NULL | |

    | dept_no | char(4) | NO | | NULL | |

    | from_date | date | YES | | NULL | |

    | to_date | date | YES | | NULL | |

    +----------------+-----------+------+----+----------+-------+

    4 rows in set (0.00 sec)

  • +----------------+----------------+-------+------+-----------+-------+

    | Field | Type | Null | Key | Default | Extra |

    +----------------+-----------------+------+------+------------+-------+

    | dept_no | char(4) | NO | PRI | NULL | |

    | dept_name | varchar(40) | NO | UNI | NULL | |

    +----------------+-----------------+-------+-------+----------+-------+

    2 rows in set (0.00 sec)

    +---------------+-----------+-------+-----+-----------+-----------+

    | Field | Type | Null | Key | Default | Extra |

    +---------------+-----------+-------+------+----------+-----------+

    | emp_no | int(11) | NO | PRI | NULL | |

    | dept_no | char(4) | NO | PRI | NULL | |

    | from_date | date | NO | | NULL | |

    | to_date | date | NO | | NULL | |

    +----------------+-----------+-------+-----+----------+-----------+

    4 rows in set (0.00 sec)

    +---------------+----------+-------+-----+----------+----------+

    | Field | Type | Null | Key | Default | Extra |

    +---------------+----------+-------+-----+----------+----------+

    | emp_no | int(11) | NO | | NULL | |

    | from_date | date | YES | | NULL | |

    | to_date | date | YES | | NULL | |

    +----------------+---------+-------+-----+----------+----------+

    3 rows in set (0.00 sec)

  • +----------------+----------+-------+------+-----------+--------+

    | Field | Type | Null | Key | Default | Extra |

    +---------------+-----------+-------+------+-----------+--------+

    | emp_no | int(11) | NO | PRI | NULL | |

    | dept_no | char(4) | NO | PRI | NULL | |

    | from_date | date | NO | | NULL | |

    | to_date | date | NO | | NULL | |

    +---------------+-----------+-------+------+----------+--------+

    4 rows in set (0.00 sec)

    +---------------+-------------------+--------+-----+-----------+---------+

    | Field | Type | Null | Key | Default | Extra |

    +---------------+-------------------+--------+-----+-----------+---------+

    | emp_no | int(11) | NO | PRI | NULL | |

    | birth_date | date | NO | | NULL | |

    | first_name | varchar(14) | NO | | NULL | |

    | last_name | varchar(16) | NO | | NULL | |

    | gender | enum('M','F') | NO | | NULL | |

    | hire_date | date | NO | | NULL | |

    +---------------+-------------------+-------+-----+-----------+----------+

    6 rows in set (0.00 sec)

    +----------------+---------+--------+-----+------------+-------+

    | Field | Type | Null | Key | Default | Extra |

    +----------------+---------+--------+-----+------------+-------+

    | emp_no | int(11) | NO | PRI | NULL | |

    | salary | int(11) | NO | | NULL | |

    | from_date | date | NO | PRI | NULL | |

  • | to_date | date | NO | | NULL | |

    +-------------+---------+--------+-----+------------+-------+

    4 rows in set (0.00 sec)

    +-----------+-------------+------+------+-----------+-------+

    | Field | Type | Null | Key | Default | Extra |

    +-----------+-------------+------+------+-----------+-------+

    | emp_no | int(11) | NO | PRI | NULL | |

    | title | varchar(50) | NO | PRI | NULL | |

    | from_date | date | NO | PRI | NULL | |

    | to_date | date | YES | | NULL | |

    +-----------+-------------+------+------+-----------+-------+

    4 rows in set (0.00 sec)

    3. Find the number of employees in the database.

    mysql> SELECT COUNT(*) FROM employees;

    +----------+

    | COUNT(*) |

    +----------+

    | 300024 |

    +----------+

    1 row in set (0.19 sec)

    4. List all the departments and their number.

    mysql> SELECT DISTINCT dept_name, dept_no FROM departments ORDER BY 2;

    +--------------------+---------+

    | dept_name | dept_no |

    +--------------------+---------+

    | Marketing | d001 |

    | Finance | d002 |

    | Human Resources | d003 |

  • | Production | d004 |

    | Development | d005 |

    | Quality Management | d006 |

    | Sales | d007 |

    | Research | d008 |

    | Customer Service | d009 |

    +--------------------+---------+

    9 rows in set (0.01 sec)

    5. Find the number of female employees.

    mysql> SELECT COUNT(*) FROM employees e where e.gender LIKE 'F';

    +----------+

    | COUNT(*) |

    +----------+

    | 120051 |

    +----------+

    1 row in set (0.16 sec)

6. Print the maximum and the minimum salary.

    mysql> SELECT MAX(salary),MIN(salary) FROM salaries;

    +-------------+-------------+

    | MAX(salary) | MIN(salary) |

    +-------------+-------------+

    | 158220 | 38623 |

    +-------------+-------------+

    1 row in set (1.44 sec)

    7. Print the department number and the corresponding number of employees who have ever worked

    there.

    mysql> SELECT d.dept_no, COUNT(d.emp_no) FROM dept_emp d GROUP BY 1 ORDER BY 1;

    +---------+-----------------+

    | dept_no | COUNT(d.emp_no) |

  • +---------+-----------------+

    | d001 | 20211 |

    | d002 | 17346 |

    | d003 | 17786 |

    | d004 | 73485 |

    | d005 | 85707 |

    | d006 | 20117 |

    | d007 | 52245 |

    | d008 | 21126 |

    | d009 | 23580 |

    +---------+-----------------+

    9 rows in set (2.94 sec)

    8. Print the department name and the corresponding number of employees who have ever worked

    there.

    mysql> SELECT dp.dept_name, COUNT(d.emp_no) FROM dept_emp d, departments dp WHERE

    d.dept_no = dp.dept_no GROUP BY 1 ORDER BY 1;

    +-------------------------+-----------------+

    | dept_name | COUNT(d.emp_no) |

    +--------------------------+-----------------+

    | Customer Service | 23580 |

    | Development | 85707 |

    | Finance | 17346 |

    | Human Resources | 17786 |

    | Marketing | 20211 |

    | Production | 73485 |

    | Quality Management | 20117 |

    | Research | 21126 |

    | Sales | 52245 |

    +-------------------------+-----------------+

  • 9 rows in set (0.63 sec)

    9. Print the department names and their corresponding average salaries.

    mysql> SELECT dp.dept_name, AVG(s.salary) FROM dept_emp d, departments dp, salaries s WHERE

    d.dept_no = dp.dept_no AND s.emp_no = d.emp_no GROUP BY 1 ORDER BY 1;

    +-------------------------+--------------------+

    | dept_name | AVG(s.salary) |

    +-------------------------+-------------------+

    | Customer Service | 58770.3665 |

    | Development | 59478.9012 |

    | Finance | 70489.3649 |

    | Human Resources | 55574.8794 |

    | Marketing | 71913.2000 |

    | Production | 59605.4825 |

    | Quality Management | 57251.2719 |

    | Research | 59665.1817 |

    | Sales | 80667.6058 |

    +------------------------+-------------------+

    9 rows in set (11.60 sec)

    10. Print the employee name, employee id and the maximum salary earned by him or her. Only report

    for the employees with the top 5 highest salaries.

    mysql> SELECT CONCAT(e.first_name," ",e.last_name) AS full_name, e.emp_no, MAX(s.salary) FROM

    employees e, salaries s WHERE e.emp_no = s.emp_no GROUP BY 2 ORDER BY 3 DESC LIMIT 5;

    +-------------------------+-----------+---------------+

    | full_name | emp_no | MAX(s.salary) |

    +-------------------------+-----------+---------------+

    | Tokuyasu Pesch | 43624 | 158220 |

    | Honesty Mukaidono | 254466 | 156286 |

    | Xiahua Whitcomb | 47978 | 155709 |

    | Sanjai Luders | 253939 | 155513 |

  • | Tsutomu Alameldin | 109334 | 155377 |

    +---------------------------+------------+---------------+

    5 rows in set (4.18 sec)

  • Appendix [2.2] Consider the following schema:

    Suppliers(sid: integer, sname: string, address: string) Parts(pid: integer, pname: string, color: string)

    Catalog(sid: integer, pid: integer, cost: real)

    The Catalog relation lists the prices charged for parts by Suppliers. Create a database based on the

    above schema.

    CREATE TABLE suppliers (sid INT PRIMARY KEY NOT NULL AUTO_INCREMENT, sname VARCHAR(128),

    address VARCHAR(256));

    CREATE TABLE parts(pid INT PRIMARY KEY NOT NULL AUTO_INCREMENT, pname VARCHAR(128), colour

    VARCHAR(56));

    CREATE TABLE catalog(sid INT,pid INT,cost FLOAT(25),FOREIGN KEY(sid) REFERENCES

    suppliers(sid),FOREIGN KEY(pid) REFERENCES parts(pid));

    INSERT INTO suppliers (sname,address) VALUES ("Martena","P.O. Box 872, 8417 Tellus.

    St."),("Tatum","P.O. Box 216, 7552 Lacus, St."),("Gillian","4443 Donec Rd."),("Dylan","266-4108 Eu,

    St."),("Austin","1114 Imperdiet St."),("Alice","P.O. Box 926, 5519 Feugiat. Avenue"),("Irma","P.O. Box

    902, 5166 Pulvinar Rd."),("Gloria","P.O. Box 518, 5152 Tortor Av."),("Russell","Ap #428-7669 Sed

    St."),("Amanda","1868 Orci. Ave");

    INSERT INTO parts(pname,colour) VALUES ("Q6M-1V2","green"),("A8B-1P0","green"),("D7B-

    6D3","blue"),("D1N-2E4","violet"),("F6B-8L7","orange"),("N2O-7V1","indigo"),("F8T-

    1V3","indigo"),("T6F-0R9","indigo"),("T1Y-9M4","indigo"),("R0N-6N2","red");

    INSERT INTO catalog (sid,pid,cost) VALUES

    (7,7,"9220.76"),(2,7,"9833.77"),(1,9,"6641.91"),(8,10,"2505.70"),(4,5,"1601.36"),(3,10,"3887.97"),(10,9,"

    2324.85"),(6,3,"6497.71"),(4,10,"2088.35"),(3,9,"9718.23"),

    (5,5,"2442.83"),(6,6,"204.87"),(1,4,"716.85"),(8,10,"5489.20"),(7,1,"148.81"),(2,5,"713.07"),(5,6,"4232.0

    9");

    Write the following queries in SQL:

    a.) Find the pnames of parts for which there is some supplier

SELECT pname FROM parts WHERE pid IN (SELECT pid FROM catalog WHERE sid IS NOT NULL);

    +--------------+

    | pname |

    +--------------+

    | Q6M-1V2 |

    | D7B-6D3 |

    | D1N-2E4 |

  • | F6B-8L7 |

    | N2O-7V1 |

    | F8T-1V3 |

    | T1Y-9M4 |

    | R0N-6N2 |

    +-------------+

b.) Find the snames of suppliers who supply every red part.

SELECT sname FROM suppliers s WHERE NOT EXISTS (SELECT * FROM parts p WHERE p.colour LIKE 'red'

AND NOT EXISTS (SELECT * FROM catalog c WHERE c.sid = s.sid AND c.pid = p.pid));

    +------------+

    | sname |

    +------------+

    | Gillian |

    | Dylan |

    | Gloria |

    +------------+

    c.) Find the sids of suppliers who supply only red parts.

    SELECT DISTINCT sid FROM (SELECT sid,colour from catalog c, parts p where c.pid = p.pid) t1 WHERE NOT

    EXISTS (SELECT * FROM (SELECT sid, colour from catalog c, parts p where c.pid = p.pid) t2 where t1.sid =

    t2.sid and t2.colour!='red');

    +------+

    | sid |

    +------+

    | 8 |

    +------+

    d.) Find the sids of suppliers who supply a red part and a green part.

    SELECT sid FROM (SELECT sid,colour FROM catalog c, parts p WHERE p.pid = c.pid AND colour LIKE 'red')

    t1 WHERE sid in (SELECT sid FROM catalog c, parts p WHERE p.pid = c.pid AND colour LIKE 'green');

  • Empty set (0.00 sec)

    e.) Find the sids of suppliers who supply a red part or a green part.

    SELECT DISTINCT sid FROM (SELECT sid,colour FROM catalog c, parts p WHERE p.pid = c.pid AND (colour

    LIKE 'red' OR colour LIKE 'green')) t1;

    +------+

    | sid |

    +------+

    | 7 |

    | 8 |

    | 3 |

    | 4 |

    +------+

  • Appendix [3.1]

    Bubble Sort Using Java
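The original listing for this appendix is not present in the extracted text; as a stand-in, a minimal Bubble sort sketch in the same spirit (class and method names are illustrative, not the original code):

```java
// Bubble sort: repeatedly swap adjacent out-of-order pairs until sorted.
public class BubbleSort {
    static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            for (int j = 0; j < a.length - 1 - i; j++) {
                if (a[j] > a[j + 1]) {          // swap the out-of-order pair
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // prints [1, 2, 4, 5, 8]
    }
}
```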

    Appendix [3.2]

    Selection Sort Using Java
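The original listing for this appendix is not present in the extracted text; as a stand-in, a minimal Selection sort sketch in the same spirit (class and method names are illustrative, not the original code):

```java
// Selection sort: repeatedly select the smallest unsorted value
// and move it to the front of the unsorted portion.
public class SelectionSort {
    static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;                         // index of smallest unsorted value
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) min = j;
            }
            int tmp = a[i];                      // move it into its final place
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] data = {64, 25, 12, 22, 11};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // prints [11, 12, 22, 25, 64]
    }
}
```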

    Appendix [3.3]

    Insertion Sort Using Java
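The original listing for this appendix is not present in the extracted text; as a stand-in, a minimal Insertion sort sketch in the same spirit (class and method names are illustrative, not the original code):

```java
// Insertion sort: grow a sorted prefix by inserting each new value
// into its correct position among the already-sorted elements.
public class InsertionSort {
    static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];                      // next unsorted value
            int j = i - 1;
            while (j >= 0 && a[j] > key) {       // shift larger sorted values right
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;                      // insert into the sorted prefix
        }
    }

    public static void main(String[] args) {
        int[] data = {12, 11, 13, 5, 6};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // prints [5, 6, 11, 12, 13]
    }
}
```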

  • Appendix [3.4]

    Bubble Sort Implementation on the INPUT dataset

  • Appendix [3.5]

    Selection Sort Implementation on the INPUT dataset

  • Appendix [3.6]

    Insertion Sort Implementation on the INPUT dataset

  • Appendix [4.1]*

*Taken from: Cormen, T. H., Leiserson, C. E., Rivest, R. L. & Stein, C. (2001) Introduction to Algorithms, Second edition. Cambridge, Massachusetts: MIT Press. Chapter 4.
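The referenced table is reproduced from Cormen et al. as an image and is not present in the extracted text; as a stand-in, the three cases of the master theorem for a recurrence T(n) = aT(n/b) + f(n), with a ≥ 1 and b > 1, can be stated as:

```latex
\[
T(n) =
\begin{cases}
\Theta\!\left(n^{\log_b a}\right)
  & \text{if } f(n) = O\!\left(n^{\log_b a - \varepsilon}\right)
    \text{ for some } \varepsilon > 0,\\[4pt]
\Theta\!\left(n^{\log_b a} \log n\right)
  & \text{if } f(n) = \Theta\!\left(n^{\log_b a}\right),\\[4pt]
\Theta\!\left(f(n)\right)
  & \text{if } f(n) = \Omega\!\left(n^{\log_b a + \varepsilon}\right)
    \text{ for some } \varepsilon > 0 \text{ and } a\,f(n/b) \le c\,f(n)
    \text{ for some } c < 1.
\end{cases}
\]
```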

  • Appendix [4.2]

    Merge Sort Implementation on INPUT dataset

    (Completed by Tanasorn Chindasook, Prateek Choudhary and Shengchen Dong)
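The group's original listing is not present in the extracted text; as a stand-in, a minimal top-down Merge sort sketch matching the description in the paper (class and method names are illustrative, not the original code):

```java
import java.util.Arrays;

// Merge sort: recursively split the array in half, sort each half,
// then merge the two sorted halves (uses O(n) auxiliary memory).
public class MergeSort {
    static void sort(int[] a) {
        if (a.length < 2) return;                        // base case
        int mid = a.length / 2;
        int[] left = Arrays.copyOfRange(a, 0, mid);      // O(n) auxiliary copies
        int[] right = Arrays.copyOfRange(a, mid, a.length);
        sort(left);
        sort(right);
        merge(a, left, right);                           // combine step
    }

    static void merge(int[] a, int[] left, int[] right) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)      // take the smaller head
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) a[k++] = left[i++];      // drain leftovers
        while (j < right.length) a[k++] = right[j++];
    }

    public static void main(String[] args) {
        int[] data = {38, 27, 43, 3, 9, 82, 10};
        sort(data);
        System.out.println(Arrays.toString(data)); // prints [3, 9, 10, 27, 38, 43, 82]
    }
}
```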

  • Appendix [4.3]

    Quick Sort Implementation on INPUT dataset

    (Completed by Tanasorn Chindasook, Prateek Choudhary and Shengchen Dong)
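The group's original listing is not present in the extracted text; as a stand-in, a minimal Quick sort sketch using a Lomuto-style partition (the original may have used a different partition scheme; names are illustrative):

```java
// Quick sort: partition around a pivot, then recursively sort the
// sub-arrays on either side of it (O(log n) expected stack depth).
public class QuickSort {
    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;               // base case: 0 or 1 elements
        int p = partition(a, lo, hi);       // pivot lands at its final index p
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    // Lomuto partition: uses the last element as the pivot.
    static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi];
        int i = lo;                         // boundary of the "smaller" region
        for (int j = lo; j < hi; j++) {
            if (a[j] < pivot) {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;   // place pivot at the boundary
        return i;
    }

    public static void main(String[] args) {
        int[] data = {10, 80, 30, 90, 40, 50, 70};
        sort(data);
        System.out.println(java.util.Arrays.toString(data)); // prints [10, 30, 40, 50, 70, 80, 90]
    }
}
```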

  • Appendix [4.4]*

    *Taken from "Know Thy Complexities!" Big-O Algorithm Complexity Cheat Sheet (Know Thy

    Complexities!) @ericdrowell. Accessed December 09, 2018. http://bigocheatsheet.com/.

