DATA MANAGEMENT PRINCIPLES IN APPLICATION DEVELOPMENT
Database Development, Data Structures and Sorting Algorithms
Tanasorn (Mimi) Chindasook Jacobs University, M.Sc. Data Engineering
Student ID: 30002281 [email protected]
Acknowledgements
I respectfully acknowledge Dr. Bendick Mahleko and Nilabhra Roy Chowdhury for their support, advice
and input on the project; Prateek K. Choudhary and Shengchen Dong for their support during the course
and critical review of the paper.
Abstract
Data management is a crucial skill that every data engineer should possess in order to effectively
implement and maintain database systems within an organisation. A relational database is an efficient
way to store data and to perform queries that can be used in application development. Data structures
and sorting algorithms are also crucial in application development, as both can be used to optimise
performance when correctly implemented.
Introduction
With the increasing volume of data being generated each day, database management systems (DBMS)
exist to assist users in maintaining and utilising large collections of data. Without proper
management, collected data cannot be used to its full potential, as information retrieval and
analysis would be incredibly difficult tasks to accomplish. This paper explores the various factors in
DBMS creation through the implementation of a DBMS for a start-up company looking to create an
employees database. There are many factors to consider in order to correctly implement an effective
DBMS. It is therefore imperative that research be done on the system that is to be modelled, and that
the requirements be correctly and thoroughly collected, as it is difficult to change the structure of a
DBMS once implemented.
Furthermore, to be able to effectively develop applications, research on the appropriate data structures
and sorting algorithms must also be conducted so that they can be appropriately implemented.
Therefore, this paper also aims to introduce different data structures by comparing their usage across
various applications, and to provide an overview of sorting algorithms by evaluating their efficiency
using time-complexity comparisons.
The rest of the paper is structured as follows: Section 1 provides a brief literature review for data
management principles, data structures and sorting algorithms, Section 2 details the creation of an
example DBMS system for employees in a start-up company along with some query examples, Section 3
explores key data structures used in data management, Section 4 compares various naïve and efficient
sorting algorithms through a time complexity analysis and describes their applications, and finally,
Section 5 concludes the paper.
1. Literature Review
In 1960, Charles Bachman designed the IDS (Integrated Data Store), which significantly influenced the
development of later DBMS systems. In 1966, IBM released the IMS (Information Management System),
which was based on the hierarchical database model and was intended for storing large bills of material
for aerospace projects such as the Apollo space vehicle (Shagufta, 2017).
The relational data model was then proposed by E.F. Codd during his time at IBM in 1970. The idea was
that the data would be represented through table form and thus would allow the possibility of
incorporating many-to-many joins unlike the hierarchical data model. He released several papers after
his initial theoretical work which detailed aspects of the relational model, such as relational algebra
(Shagufta, 2017).
The ER Model was then developed by Peter Chen in 1976 (Chen, 1976). This model represents the world
in terms of entities and relationships, and is the model that is used in abstraction to assist in database
design. This led to the relational model being adopted as the standard approach for DBMS in 1980,
along with the development of SQL as a query language and its adoption by ANSI and ISO. Several
relational DBMS were developed, such as Informix and Oracle.
Another significant aspect of database management and application development is data structures.
Data structures can be characterised into primitive and non-primitive types. Primitive data structures
are used in data management to define the type of data that should be stored using DDL. Non-primitive
data structures are crucial in application development as the differing characteristics of each data
structure allows for various implementations.
Data structures such as queues have long existed as part of the fundamental logic of batch processing.
Stacks were proposed in Turing's 1946 computer design, and linked lists were developed by Newell,
Shaw and Simon for RAND Corporation's Information Processing Language.
Sorting algorithms also play an important role in application development and data management as a
data pre-processing step. Efficient sorting algorithms differ in stability and memory usage and are
implemented in several DBMS user interfaces. Merge sort, for example, was conceptualised by John von
Neumann in 1945 (Knuth, 1998), and its more widely used counterpart, Quick sort, was developed by
Tony Hoare in 1959 (Hoare, 1961).
2. Relational Database Concepts and Database Design
A database is defined as a structured collection of data that describes the different components of one
or more related organisations, and can be stored or accessed in various ways (Ramakrishnan, 2003).
Relational databases are a type of database that typically utilises the ANSI-SPARC architecture in data
management which was first proposed in 1975 (Brodie, 1975). The ANSI-SPARC architecture is defined at
three levels of abstraction which enable the end user to achieve logical and physical data independence.
Logical data independence protects users from alterations in the logical structure of the data, whilst
physical data independence refers to end user protection from changes at the physical storage as the
modifications are transformed through mapping techniques in the conceptual schema.
Fig 1: Structure of a DBMS (Mahleko, 2018)
Before beginning any database design, a requirements analysis must be carried out to ensure that all
data is represented in the appropriate format in the database. For example, the upper management at a
small start-up company would like to implement a relational database to store its employee
information. The requirements analysis for this particular case will be answered in the following
manner:
What data must be stored in the database?
All data pertaining to information on an employee in relation to the company, along with some personal
information must be stored in the database (e.g. name, birthdate, start date, end date, salary, email,
department). As employees can be promoted or change departments, records of how long an employee
has worked in which department at which position must also be kept. The company also provides an
extended health insurance policy to dependents of the employee.
Who will use this database and what do these users want from the database?
Upper management and HR will be the primary users of this database. The primary use of this database
is to be able to easily retrieve information on each employee in the company when issuing monthly
payments, preparing for company audits and providing an overview of the employees.
What operations are to be performed on the database? Which of these operations are
frequently performed?
Operations that will be mostly performed on the database are:
1.) Viewing of a set of employees and their corresponding information
2.) Updating information when an employee changes departments, gets promoted, or leaves
3.) Addition of new employees
Once the data requirements have been thoroughly analysed, the relational database design can then
commence.
Fig 1.1: Overview of ANSI-SPARC architecture (Abidin, 2010)
Conceptual Schema
The conceptual schema (or Data Modelling) is the first level of abstraction in ANSI-SPARC architecture
and consists of the definition of the data's logical structure. In the conceptual schema, database
designers define the tables that the database will be established upon, along with the entities that
should be included in those tables, their datatypes, the relations between those entities and any
constraints on the data. The process of representing the data as a set of tables is denoted as the
conceptual database design.
Conceptual Design
The conceptual design aspect focuses on describing the data and customer intention. Entity Relationship
Models (ERM) are a high-level abstraction that represents the world in terms of relationships and
entities (Chilson, 1983). ER diagrams are a semiformal way of representing the data using the ERM
concept. Although ER diagrams cannot be immediately automated into database format, they are an
effective way to visualise the relationships between entities. The following ER diagram depicts the
Employees database in terms of the ER Model:
Fig 2: ER Diagram for the employees database
Entity: An object that can be distinctly identified. In the diagram, an employee is considered to be an entity; a department is also considered to be an entity (Chen, 1976).
Weak entity: An entity that can be identified only by considering the primary key of its owner (Chen, 1976). In this ER diagram a dependent is considered to be a weak entity, because a dependent is only related to the company through the employee that works there.
Attribute: A descriptive fact about an entity. In this ER diagram, birthdate, email, name and salary are all descriptive attributes of an employee; department name is a descriptive attribute of a department (Chen, 1976).
Primary key: The unique identifier for an entity; no two entities can have the same primary key (Chen, 1976). In the ER diagram, the primary keys are eid (employee ID) and did (department ID).
Weak identifier: Weak entities can only be identified by considering the primary key of another related entity (Chen, 1976). As represented in the ER diagram, the weak identifier for dependents is the employee id, since a dependent can only be identified through the employee via the company health insurance policy.
Relationship: Denotes a relationship between two entities. In this ER diagram, the "works for" relationship represents the relation between employees and departments (Chen, 1976).
Identifying relationship: Denotes a relationship that uniquely identifies an entity. In the ER diagram, Manages is an identifying relationship, as given a department, its manager can be uniquely identified; Policy is also an identifying relationship, as given a dependent, its related employee can be uniquely identified.
Key constraint: The arrow points in the direction that is constrained. In the employees database, only one employee can manage a department at any given time; therefore, the arrow points towards the employee.
One-to-many relationship: As represented in the ER diagram, one employee can have many dependents.
Many-to-many relationship: As represented in the ER diagram, an employee can work for many departments (for example, over time), and one department can have many employees.
An aspect that is not included in this design but should be mentioned is the participation constraint. An
example of a participation constraint is that all employees must work for at least one department, or all
departments must have at least one employee. An example of a one-sided participation constraint is
demonstrated in the Manages relationship, where each department must have a manager, but not all
employees must have a managing role. Because companies change structure constantly (especially at
the start-up stage, as in this case), participation constraints are not enforced in this database.
Logical Design
The logical model is constructed based on the conceptual model with the addition of the datatypes
(Abidin, 2010). The logical design focuses on the abstract and disregards the implementation. For the
Employees database, the logical design is as follows:
Employees(eid:integer, name:string, email:string, birthdate:date, salary:real)
Department(did:integer, dname:string, managerid:integer)
Dependent(eid:integer, dependent_name:string)
Works(eid:integer, did:integer, start_date:date, end_date:date)
Manages(did:integer, eid:integer, start_date:date, end_date:date)
Policy(pid:integer, eid:integer)
Physical Schema
After the conceptual schema has been created, the physical schema should then be considered. The
physical schema is where the files and indexes used are defined. This step denotes how the data, such as
the relations defined in the conceptual schema, will be represented and stored in secondary storage.
Storage is managed by database management systems such as Oracle, Postgres, SQL Server and MySQL,
which all utilise Structured Query Language (SQL) as a means of interaction but differ slightly in syntax.
The statements that are referenced in the body of this report and its appendix are in MySQL syntax.
Physical Design
The physical design and DDL of the employees database can be found in Appendix [1.1].
External Schema
Following the definition of the physical schema is the external schema. The external schema represents
the different views of the data that can be seen by the end-user (Ramakrishnan, 2003; Brodie, 1975). For
example, students at a university should not be allowed to view the salaries of the professors.
Therefore, permissions and roles of each user must be defined at this level. Example commands to grant
SELECT access on all tables in the employees database to the user tdaoruang (in MySQL syntax) are:
GRANT USAGE ON employees_n.* TO 'tdaoruang';
GRANT SELECT ON employees_n.* TO 'tdaoruang';
In Fig. 4, the user mchindasook holds the role of Database Administrator and has permission for all
aspects of the database. In comparison, the user lhaller is the top manager in the HR department and
can view employee information, add new employees and update an employee’s information through an
application. Finally, the user tdaoruang is an employee in the HR department and can only view
information through an application. It is important to note that HR will only view the database through
an application as HR will not have direct access to the employees database in reality, but only through a
user-friendly interface. The external schema is used in application development as the external view is
not stored, but rather computed as it is accessed (Ramakrishnan, 2003).
Fig. 4: External schema for employees database
Languages that can be used to implement the conceptual and physical schema are the data definition
language (DDL) and the data manipulation language (DML).
DDL: used to define conceptual and external schemas
o CREATE
DML: used to perform operations on the data
o INSERT, UPDATE, DELETE
For more examples of using DDL and DML to create a database and insert, update and delete data, along
with some basic queries that can be performed on databases, please refer to Appendix [1.2] and
Appendix [1.3].
Database Querying
Relational Algebra and Database Queries
The rudimentary operations of relational algebra are projection, selection, set union, set intersection,
set difference and Cartesian product (Ramakrishnan, 2003). Relational algebra is primarily used in data
modelling and database querying. The types of joins that are most used in database querying are: inner
join, left outer join, right outer join and full-outer join.
The natural join is one of the most essential operations in relational algebra: it is the relational
equivalent of the logical AND, and it returns the set of all combinations of tuples that agree on their
common attributes.
Fig 5: Example of a natural join followed by a query
Fig 5 depicts the process of finding the employee with the highest salary through the use of a join. This
join can be achieved in two ways:
Query A:

SELECT e.name, d.dname, e.salary
FROM employees e, departments d, works w
WHERE e.eid = w.eid
AND d.did = w.did
AND w.end_date IS NOT NULL
AND e.salary = (SELECT MAX(salary) FROM employees);

Query B:

SELECT e.name, d.dname, e.salary
FROM employees e
LEFT JOIN works w ON w.eid = e.eid AND w.end_date IS NOT NULL
LEFT JOIN departments d ON d.did = w.did
ORDER BY e.salary DESC
LIMIT 1;

It should be noted that, in this case, there are other ways to produce identical results, each with
variations in efficiency. An example of a highly inefficient query to achieve the result above is to
perform a Cartesian join and then find the tuple that satisfies the conditions. The example queries above
depict two different ways to join tables in the database. In this instance, query A is less efficient than
query B, as it selects the highest salary with the use of a subquery. Queries that use subqueries in this
manner are often subject to slower performance, as the subquery has to finish running before the outer
query can initialise. Another notable difference is that Query B's left joins will return employees
regardless of whether or not they are assigned to a department. In Query B, if an employee is not
assigned to a department, the tuple that is returned for that employee will contain all the selected
information from the base table (employees) and NULL everywhere else.
For more examples of database SELECT queries, please refer to Appendix [2.1] and Appendix [2.2].
The advantages of employing a DBMS include improvements in data integrity and security: a DBMS
enforces integrity constraints on data that is accessed or input, and enforces access control for its users.
Furthermore, a DBMS also protects its users from being affected by system failure through crash
recovery mechanisms.
3. Data Structures
Data structures are vital components in data management and application development, as they pertain
to storing data in an effective manner (Shaffer, 2009). Data structures can be characterised into
primitive and non-primitive types; primitive types refer to datatypes such as Boolean or Integer, and
non-primitive types refer to structures such as arrays, stacks or queues, where data is referenced using
an index and not directly stored (Shaffer, 2009). The various non-primitive data structures differ in
how data is inserted, deleted and queried, leading to diverse applications in data management.
Linear Data Structures
Stacks
A stack utilises the last-in-first-out (LIFO) principle and allows only two operations: the push of an item
onto the stack, and the pop of an item from the stack (Shaffer, 2009). A stack is considered a limited
access data structure as items can only be added and removed from the top of the stack. It is also a
recursive data structure as it can either be empty, or has a top element and the rest which is the stack
(Shaffer, 2009).
Fig 6: Visualisation of a stack (Techspirited.com, 2018)
Applications of stacks in data management include backtracking (undoing) and runtime memory
management. Backtracking refers to the undo mechanism in text editors; this is accomplished by storing
all of the text changes in a stack. When a user presses undo in a text editor, the stack pops off the top
element, and the remaining stack is the text that remains minus the last change.
Queue
A queue is a vital data structure in data management that follows the first-in-first-out (FIFO) principle
(Barnett, 2008). The item that is stored at the front of the queue can be removed, and insertion can
occur only at the back of the queue. A traditional queue allows three operations: enqueue inserts an
item at the back of the queue, dequeue removes an item from the front of the queue, and peek allows
the user to view the item at the front of the queue without actually removing it (Barnett, 2008).
Fig 7: Visualisation of a Queue Data Structure (Techspirited.com, 2018)
Queues are effective in situations where data is transferred between processes. Typical data
management applications are data transmission and disk scheduling. One significant application of
queues that is commonly seen in typical web applications is online ticket purchasing. Queues are often
used to determine the order in which customers are allowed to purchase tickets; this is applied across
various industries, such as airline tickets, concert tickets or limited edition footwear purchases.
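A ticket queue of this kind can be sketched as follows (the class and customer names are illustrative), using java.util.ArrayDeque behind the standard Queue interface:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Minimal FIFO sketch: customers are served in the order they arrived.
public class TicketQueue {
    public static void main(String[] args) {
        Queue<String> queue = new ArrayDeque<>();
        queue.add("alice");                 // enqueue at the back
        queue.add("bob");
        queue.add("carol");

        System.out.println(queue.peek());   // view the front without removing: alice
        System.out.println(queue.remove()); // dequeue from the front: alice
        System.out.println(queue.remove()); // next in line: bob
    }
}
```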
Linked Lists
Linked lists are a collection of nodes that are linearly linked to each other through pointers (Barnett,
2008). The first node in the list is referred to as the head. A characteristic feature of linked lists is that
each node is made up of two (or three, for doubly linked lists) components: the data that is stored in the
node and the memory address(es) of the node(s) that it points to (Barnett, 2008). The memory
addresses are arbitrarily assigned. The two types of linked lists are singly linked lists and doubly linked
lists, with the only differentiating factor being that nodes in a singly linked list only point to the next
node, whereas nodes in a doubly linked list have pointers to both the previous node and the next node.
Fig 8: Visualisation of a Singly Linked List (Techspirited.com, 2018)
Applications of linked lists in data management can be found in the history mechanism of web browsers
and in collision resolution by chaining in hash tables. The history mechanism of web browsers can
employ doubly linked lists to allow users to traverse through and fetch data of previously visited sites.
When a user presses the back button, the previous node's data is returned; similarly, when the forward
button is pressed, the next node's data is returned. In hash tables, linked lists are used for resolving
collisions when one bucket has more than one data point allocated to it. The collision is resolved by the
bucket referencing a linked list that contains all the elements that have been assigned to that specific
bucket.
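The node structure that underlies these applications can be sketched as a minimal singly linked list (all names are illustrative), where each node stores its data plus a reference to the next node:

```java
// Minimal singly linked list sketch: data plus a "next" pointer per node.
public class SinglyLinkedList {
    static class Node {
        int data;
        Node next;                 // reference to the following node
        Node(int data) { this.data = data; }
    }

    Node head;                     // first node in the list

    // Insert a new node at the front of the list (O(1)).
    void pushFront(int value) {
        Node node = new Node(value);
        node.next = head;
        head = node;
    }

    // Traverse the list from the head, collecting values in order.
    String traverse() {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            sb.append(n.data).append(" ");
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        SinglyLinkedList list = new SinglyLinkedList();
        list.pushFront(3);
        list.pushFront(2);
        list.pushFront(1);
        System.out.println(list.traverse()); // 1 2 3
    }
}
```

A doubly linked list, as used for browser history, would simply add a `prev` reference to each node so traversal can also go backwards.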
Tree Data Structures
Heap
A heap is a simple tree data structure in which all the nodes are arranged in a specific order; the data
structure is typically represented as an array. There are two types of heaps, a max heap and a min heap
(Cormen, 1989). Min heaps are typically used for queueing jobs in the CPU; the heap data structure is
essential in the implementation of priority queues for operating systems. Max heaps and min heaps
follow similar approaches, where each node has a left child and a right child. In a max heap, the root of
the heap is the first item in the array, and the parent and children of a node are determined by the
following rules (using 1-based array indices):
A[Parent(i)] ≥ A[i], where Parent(i) = ⌊i/2⌋
Left child of A[i] = A[2i]
Right child of A[i] = A[2i + 1]
Fig 9: Heap data structure (Hackerearth.com, 2018)
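The priority-queue behaviour described above can be illustrated with java.util.PriorityQueue, which is backed by a binary min heap; the job priorities below are illustrative:

```java
import java.util.PriorityQueue;

// Min heap sketch: poll() always returns the smallest remaining element,
// so jobs are served in priority order, not insertion order.
public class MinHeapExample {
    public static void main(String[] args) {
        PriorityQueue<Integer> jobs = new PriorityQueue<>();
        jobs.add(5);
        jobs.add(1);
        jobs.add(3);

        System.out.println(jobs.poll()); // 1
        System.out.println(jobs.poll()); // 3
        System.out.println(jobs.poll()); // 5
    }
}
```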
Another type of tree data structure that should be mentioned is the B-Tree. A B-Tree is a balanced
rooted tree with O(log n) height. This data structure is typically used to index external storage by
storing multiple keys per node based on some criteria. Data in a B-Tree is stored in the leaf nodes,
which makes it efficient for insertion and searching, leading to its primary use for caching objects
(Cormen, 1989).
Hash tables
A hash table is a special type of data structure that implements a hash function to map keys to actual
values (Larson, 1988). Hashing can be implemented via the division or multiplication method. The
division method assigns slots using a hash function that takes the remainder of the division of the key k
by the number of available slots in the table m, i.e. h(k) = k mod m. For an effective hash function, m
should be a large prime number so that fewer keys share the same remainder, thus reducing collisions.
The best case search time for an element in a hash table is O(1), and the worst case is O(n) (Cormen,
1989). The largest problem that hashing faces is collisions, which occur when two or more keys hash to
the same slot. The two approaches to resolving collisions are:
1.) Chaining
The chaining method handles collision resolution by putting all of the elements that collide into a linked
list (Cormen, 1989). When implemented correctly, the hash function should not assign all of the
elements to the same slot; mapping all elements to the same slot causes the hash table to degenerate
into a linked list. In this worst case scenario, the search time for the hash table is O(n). Chaining has the
advantage that the hash table's capacity is not limited. In general, chaining is preferred over open
addressing for this reason.
2.) Open Addressing
Open addressing deals with collisions by continuously searching the array, incrementing the index until
a free slot is found (Cormen, 1989). This searching method is called probing; probing can be linear,
quadratic, or based on double hashing. The advantage of using open addressing is that no additional
data structures are required; however, an inefficient hash function increases the possibility of keys
clustering, which subsequently increases the required search time.
Hashing has many important applications in application programming, as it can be used to protect or
verify information. The most universal example of hashing is password storage. When programming an
application, password storage is essential in allowing users access to their accounts. However, a
password cannot simply be stored as the string that has been input. Instead, once the user chooses a
password, the string is hashed and the hash is stored in the system to prevent security vulnerabilities. A
hash table can then be used as an efficient lookup structure to retrieve the stored hash when the user
logs in.
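The store-the-hash-not-the-password idea can be sketched as follows. SHA-256 via java.security.MessageDigest is used purely for illustration; real systems should use a slow, salted password hash (e.g. bcrypt, scrypt or Argon2), and the class and method names here are illustrative.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

// Password hashing sketch: only the hash is stored; a login attempt is
// hashed the same way and compared against the stored hash.
public class PasswordHashExample {
    public static String sha256Hex(String input) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (java.security.NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // SHA-256 is guaranteed on the JVM
        }
    }

    public static void main(String[] args) {
        String stored = sha256Hex("hunter2");   // only this hash is stored
        String attempt = sha256Hex("hunter2");  // hash the login attempt
        System.out.println(stored.equals(attempt)); // true: login succeeds
    }
}
```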
4. Sorting Algorithms
Sorting algorithms are another essential part of application development and data management, as they
are commonly used in the processing of data. In DML, the statement that invokes a sorting mechanism is
ORDER BY. The efficiency of sorting algorithms is an important aspect of data management, as choosing
the best sort is imperative when sorting extremely large datasets. The efficiency of sorting algorithms
can be evaluated using asymptotic notation, and performance is represented graphically by a time
complexity comparison graph to see how a sort fares with larger sets of data. The worst case
asymptotic notation is typically used as the basis for efficiency comparison. Sorting algorithms can be
categorised into two major groups: naïve and efficient algorithms.
Naïve Sorting Algorithms
Naïve sorting algorithms encompass Bubble Sort, Selection Sort and Insertion Sort. These algorithms are
considered naïve as they sort each element by searching for its position amongst the other sorted
elements (Wirth, 1986). The distinguishing differences between the three sorts are as follows:
Bubble sort compares neighbouring items in the array and swaps them when A[i] < A[i-1].
Selection sort finds the smallest value in the array and swaps it with the item in the first
position.
Insertion sort takes elements from the array and inserts them into the correct position in a new
array.
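The insertion step described above can be sketched as follows. This is an illustrative in-place variant (shifting within the same array rather than copying into a new one), separate from the appendix implementations:

```java
import java.util.Arrays;

// Insertion sort sketch: each element from the unsorted portion is
// inserted into its correct position among the already-sorted elements.
public class InsertionSortExample {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            // Shift larger sorted elements right to open a slot for key.
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 4, 6, 1, 3};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 3, 4, 5, 6]
    }
}
```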
All three naïve sorting algorithms exhibit quadratic worst case behaviour: Bubble sort, Selection sort
and Insertion sort are all O(n²) in the worst case.
For examples of the implementation of the sorts in Java, please refer to Appendix [3.1] for Bubble sort,
Appendix [3.2] for Selection sort and Appendix [3.3] for Insertion sort.
Fig 10: Time complexity comparison for Naïve sorting algorithms using the same dataset.
Fig 10 shows that, of the three sorts, insertion sort is the best performer, followed by selection sort,
then bubble sort. All three sorts exhibit quadratic behaviour, in line with their worst case performance.
In bubble sort, there is no evident difference between the worst case and the best case. Selection sort
exhibits the most variation between the best case and the worst case, as its performance is highly
dependent on whether or not the array is already partially pre-sorted. Bubble sort performs the worst
because it will perform the same number of comparisons on every unsorted value, regardless of
whether the array is somewhat pre-sorted, as it does not take into account the order of the remaining
items. Insertion sort performs the best here as it divides the array into sorted and unsorted elements,
therefore making comparisons only on the sorted values.
These naïve sorts are rarely implemented on their own in real-life applications, as more efficient sorts
have largely replaced them. However, they served as the foundation upon which these efficient
algorithms were developed. For the Java implementation of Bubble sort, Selection sort and Insertion
sort used to obtain the
data for Fig 10, please refer to Appendix [3.4], Appendix [3.5] and Appendix [3.6] respectively.
Efficient Sorting Algorithms
Many efficient sorting algorithms solve problems by recursion. A recursive function is defined as having
a base case, and a recursive case that will eventually resolve itself to the base case when input with
smaller arguments. The recursive algorithm works in three stages:
1.) Divide the problem into smaller sub-problems
2.) Solve the sub-problems through recursion, if the problem is small enough, return a value
3.) Combine the solutions to the sub-problems
This approach is called the divide and conquer principle and is applied in efficient sorting algorithms
such as Merge sort and Quick sort.
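The three stages can be illustrated with a small divide and conquer sketch that finds the maximum of an array; the example and its names are illustrative, not drawn from the appendix:

```java
// Divide and conquer sketch: split the range in half, recurse on each
// half, then combine the two partial results.
public class RecursiveMax {
    public static int max(int[] a, int lo, int hi) {
        if (lo == hi) return a[lo];             // base case: one element
        int mid = (lo + hi) / 2;                // 1.) divide
        int leftMax = max(a, lo, mid);          // 2.) solve sub-problems
        int rightMax = max(a, mid + 1, hi);
        return Math.max(leftMax, rightMax);     // 3.) combine
    }

    public static void main(String[] args) {
        int[] data = {4, 9, 2, 7};
        System.out.println(max(data, 0, data.length - 1)); // 9
    }
}
```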
Recursive Algorithm Efficiency Calculation
The efficiency of a recursive algorithm can be evaluated using two methods: the recursion tree and the
master theorem.
The recursion tree method represents the recurrence as a tree with nodes that represent the
sub-problem costs. The overall efficiency of the recursive algorithm is then calculated by aggregating
the costs at all levels. The recursion tree is a highly effective method for visualising how a recursive
algorithm works; however, the limitation of this method is that another method, such as substitution,
must be used to verify its solutions.
The Master theorem evaluates the efficiency of recursive sorting algorithms using three evaluation cases
(see Appendix [4.1]). The solution is determined by the larger of f(n) and n^(log_b a), where a and b are
positive constants that satisfy the conditions a ≥ 1, b > 1, and f(n) > 0 (Cormen, 1989). The limitation of
the master theorem is that it does not cover all cases, but its intuitive reasoning makes it an easy
method for evaluating algorithm efficiency.
Examples of using the master theorem to find the asymptotic notation for recursion algorithms are as
follows (Cormen, 1989):
Case 1 Example: T(n) = 16T(n/4) + n ⇒ T(n) = Θ(n²)
Case 2 Example: T(n) = 4T(n/2) + n² ⇒ T(n) = Θ(n² log n)
Case 3 Example: T(n) = T(n/2) + 2ⁿ ⇒ T(n) = Θ(2ⁿ)
Unsolvable Example: T(n) = 0.5T(n/2) + 1/n ⇒ Does not apply (a < 1)
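As a worked illustration, the master theorem can also be applied to the recurrence T(n) = 2T(n/2) + Θ(n), which describes Merge sort (two half-size sub-problems plus a linear merge):

```latex
% Merge sort recurrence: a = 2, b = 2, f(n) = \Theta(n)
T(n) = 2\,T(n/2) + \Theta(n)
% Compare f(n) with n^{\log_b a} = n^{\log_2 2} = n.
% They match, so case 2 of the master theorem applies:
T(n) = \Theta(n \log n)
```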
Merge Sort
Merge sort is a divide and conquer algorithm that divides the unsorted array into n sub arrays that
contain 1 element (Knuth, 1998). The sub arrays are then merged into a new sorted array. Merge sort is
an extremely stable sort, with the worst case, average case and best case all equating to n log n.
However, the drawbacks to this sort is that it requires O(n) amount of memory in order to duplicate the
elements that must be sorted as sub arrays. As it is an out-of-place sort, the amount of memory
required increases as the dataset increases, leading to memory allocation issues for large datasets.
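The divide, recurse and merge steps can be sketched as follows; this is a minimal top-down illustration (distinct from the appendix implementation), with the auxiliary copies showing where the O(n) extra memory goes:

```java
import java.util.Arrays;

// Top-down Merge sort sketch: split the array, recursively sort both
// halves, then merge them back together.
public class MergeSortExample {
    public static void sort(int[] a) {
        if (a.length < 2) return;              // base case: 0 or 1 element
        int mid = a.length / 2;
        int[] left = Arrays.copyOfRange(a, 0, mid);    // O(n) extra memory
        int[] right = Arrays.copyOfRange(a, mid, a.length);
        sort(left);                            // recursive case
        sort(right);
        merge(a, left, right);
    }

    private static void merge(int[] out, int[] left, int[] right) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) out[k++] = left[i++];
        while (j < right.length) out[k++] = right[j++];
    }

    public static void main(String[] args) {
        int[] data = {38, 27, 43, 3, 9, 82, 10};
        sort(data);
        System.out.println(Arrays.toString(data)); // [3, 9, 10, 27, 38, 43, 82]
    }
}
```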
For an example implementation of Merge sort, please refer to Appendix [4.2]
Quick Sort
As an alternative to combat Merge sort’s memory allocation disadvantage, Quick sort can be
implemented. Quick sort is also a divide and conquer sorting algorithm that divides an array into smaller
sub arrays using partition and recursively sorts those arrays (Hoare, 1961). It can be more efficient than
merge sort when correctly implemented; the partition value is key in the efficiency of the Quick sort
algorithm. Quicksort has a best case and average case of n log n, whilst its worst case is n2. The
advantage to using Quick sort is that it only uses O(log n) memory, therefore it is the preferred method
as it can efficiently sort large datasets without causing any memory allocation issues.
Due to Quick sort’s advantages, it is the algorithm of choice for many practical implementations. For
example, Java’s system sort for primitive types, the Arrays.sort() method, uses a tuned Quick sort
variant: historically a 3-way partitioned Quick sort, and a dual-pivot Quick sort since Java 7.
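As a brief illustration using only the standard library, the same Arrays.sort() call dispatches to different algorithms depending on the element type (the class name below is illustrative):

```java
import java.util.Arrays;

public class SystemSortDemo {
    public static void main(String[] args) {
        // Primitive arrays are sorted with a tuned (dual-pivot) Quick sort
        int[] primitives = {5, 3, 9, 1};
        Arrays.sort(primitives);

        // Object arrays are sorted with a stable merge-sort variant (TimSort)
        Integer[] boxed = {5, 3, 9, 1};
        Arrays.sort(boxed);

        System.out.println(Arrays.toString(primitives)); // [1, 3, 5, 9]
        System.out.println(Arrays.toString(boxed));      // [1, 3, 5, 9]
    }
}
```

The choice of a non-stable, in-place sort for primitives is safe because primitive values that compare equal are indistinguishable, so stability is only needed for the object overload.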
For an example implementation of Quick sort, please refer to Appendix [4.3]
Fig 11: Time complexity comparisons for efficient sorting algorithms using the same dataset.
Fig 11 portrays how Merge sort is the more consistent algorithm, although many of Quick sort’s data
points lie in the same range as Merge sort’s, meaning that the two sorts are generally comparable in
efficiency. However, runtime consistency is not the only factor to consider, and a limitation of the
time-complexity comparison graph is that it does not show the memory allocation used. Choosing Quick
sort over Merge sort trades guaranteed O(n log n) performance for more efficient memory usage.
Therefore, when factoring in memory allocation and practicality, Quick sort is typically the better
performing sort.
For the Java implementation of Merge sort and Quick sort on a large dataset that was used to obtain
data for Fig 11, please refer to Appendix [4.2] and Appendix [4.3].
For a table that compares the worst case, average case and best case time complexity comparison for
naïve sorting algorithms and efficient sorting algorithms, please refer to Appendix [4.4].
5. Conclusions
The methods detailed in this report provide a basis for the foundations of data management and
application development and should be studied extensively before implementation. The factors that
should be considered before any DBMS implementation are the requirements analysis, the conceptual
schema, the physical schema and the external schema. Once a database is developed, the data in the
database can be accessed in different views through SQL queries. Queries can be written in different
formats with varying efficiency, therefore it is essential that best practices be studied.
Apart from databases, data structures also play a crucial role in data management and application
development, as they determine how the data is stored. Sorting algorithms must also be evaluated by
their performance on large datasets. Ultimately, there are more sophisticated methods that can be
applied to this topic; however, it is imperative that these fundamental concepts be understood by all
data engineers in order to establish a solid foundation for future research.
Worst Case:
Quick Sort – O(n²)
Merge Sort – O(n log n)
References
[1] Mahleko, B. (2018). “MMM010-340163 Data Management for Graduate Students – Lecture 02”.
Jacobs University. pp 20.
[2] Abidin, Siti & Ahmad, Suzana & M S Yafooz, Wael. (2010). A new system architecture for flexible
database conversion. WSEAS Transactions on Computers, 9.
[3] Chilson, D., & Kudlac, M. (1983). Database design: a survey of logical and physical design techniques.
ACM SIGMIS Database, 15(1), pp.13
[4] Chen, P. (1976). The entity-relationship model—toward a unified view of data. ACM Transactions on
Database Systems (TODS), 1(1),
[5] Ramakrishnan, R. & Gehrke, J. (2003). Database Management Systems (pp. 3-50). 3rd edition. New
York: McGraw-Hill.
[6] Brodie, M. & Schmidt, J. (1975) ANSI/X3/SPARC Study Group on Data Base Management
Systems. Interim Report. FDT, ACM SIGMOD bulletin. Volume 7, No. 2
[7] Barnett, G. & Del Tongo, L. (2008). Data Structures and Algorithms: Annotated Reference with
Examples. First Edition Copyright.
[8] Shaffer, C. (2009). A Practical Introduction to Data Structures and Algorithm Analysis Third Edition
(Java). Department of Computer Science. Virginia Tech Blacksburg, VA 24061.
[9] Wirth, Niklaus (1986), Algorithms & Data Structures, Upper Saddle River, NJ: Prentice-Hall, pp. 76–
77, ISBN 0130220051
[10] Shagufta, P. & Chandra, U. & Wani, A. (2017). A Literature Review on Evolving Database.
International Journal of Computer Applications (0975 – 8887). Volume 162, No 9
[11] Knuth, D. (1998). "Section 5.2.4: Sorting by Merging". Sorting and Searching. The Art of Computer Programming. 3 (2nd ed.). Addison-Wesley. pp. 158–168. ISBN 0-201-89685-0.
[12] Hoare, C. A. R. (1961). "Algorithm 64: Quicksort". Comm. ACM. 4 (7): 321. doi:10.1145/366622.366644.
[13] Larson, P. (1988). Dynamic Hash Tables. Commun. ACM, 31, 446-457.
[14] Cormen, T., Leiserson, C., Rivest, R. & Stein, C. (1989). Introduction to Algorithms, Third Edition.
pp. 151–484.
[15] Heaps/Priority Queues Tutorials & Notes | Data Structures. (2018). Retrieved from
https://www.hackerearth.com/practice/data-structures/trees/heapspriority-queues/tutorial. Accessed
on 22 November 2018
[16] Types of Data Structures in Computer Science and Their Applications. (2018).
https://techspirited.com/types-of-data-structures-in-computer-science-their-applications. Accessed on
22 November 2018
APPENDIX
Appendix [1.1]
CREATE TABLE employees (eid INT PRIMARY KEY NOT NULL auto_increment,
ename VARCHAR(128),
email VARCHAR(128),
birthdate DATE,
salary FLOAT(25));
CREATE TABLE departments (did INT PRIMARY KEY NOT NULL auto_increment,
dname VARCHAR(128));
CREATE TABLE dependents (eid INT NOT NULL,
dependent_name VARCHAR(256));
CREATE TABLE manages (eid INT NOT NULL,
did INT NOT NULL,
start_date DATE,
end_date DATE,
PRIMARY KEY(did),
FOREIGN KEY(eid) REFERENCES employees(eid),
FOREIGN KEY(did) REFERENCES departments(did));
CREATE TABLE works (eid INT NOT NULL,
did INT NOT NULL,
start_date DATE,
end_date DATE,
FOREIGN KEY(eid) REFERENCES employees(eid),
FOREIGN KEY(did) REFERENCES departments(did));
CREATE TABLE policy (eid INT,
pid VARCHAR(128),
FOREIGN KEY(eid) REFERENCES employees(eid));
Appendix [1.2]
Consider the following relational schema. An employee can work in more than one department; the
pct time field of the Works relation shows the percentage of time that a given employee works in a
given department.
Emp(eid: integer, ename: string, age: integer, salary: real)
Works(eid: integer, did: integer, pcttime: integer)
Dept(did: integer, dname: string, budget: real, managerid: integer)
Create a database based on the above schema.
SHOW DATABASES;
CREATE DATABASE employees_new;
SHOW DATABASES;
USE employees_new;
CREATE TABLE emp (eid INT PRIMARY KEY NOT NULL auto_increment,
ename VARCHAR(128),
age INT,
salary FLOAT(25));
CREATE TABLE dept (did INT PRIMARY KEY NOT NULL auto_increment,
dname VARCHAR(128),
budget FLOAT(25),
managerid INT,
FOREIGN KEY(managerid) REFERENCES emp(eid));
CREATE TABLE works (eid INT,
did INT,
pcttime INT,
FOREIGN KEY(eid) REFERENCES emp(eid),
FOREIGN KEY(did) REFERENCES dept(did));
SHOW TABLES;
DESC emp;
+--------+--------------+------+-----+---------+----------------+
| Field  | Type         | Null | Key | Default | Extra          |
+--------+--------------+------+-----+---------+----------------+
| eid    | int(11)      | NO   | PRI | NULL    | auto_increment |
| ename  | varchar(128) | YES  |     | NULL    |                |
| age    | int(11)      | YES  |     | NULL    |                |
| salary | double       | YES  |     | NULL    |                |
+--------+--------------+------+-----+---------+----------------+
INSERT INTO emp (ename,age,salary) VALUES
("Mimi",23,250000),("Akeem",35,21816),("Alexis",58,17439),("Jin",35,26836),("Clare",61,42221786),("El
eanor",27,5758651),("Murphy",65,232610),("Shad",61,1580),("Tobias",46,454323.50),("Randall",40,422
71.21),("Gray",31,12368.60);
SELECT * FROM emp;
+------+-----------+------+--------------+
| eid | ename | age | salary |
+------+-----------+------+--------------+
| 1 | Mimi | 23 | 250000 |
| 2 | Akeem | 35 | 21816 |
| 3 | Alexis | 58 | 17439 |
| 4 | Jin | 35 | 26836 |
| 5 | Clare | 61 | 42221786 |
| 6 | Eleanor | 27 | 5758651 |
| 7 | Murphy | 65 | 232610 |
| 8 | Shad | 61 | 1580 |
| 9 | Tobias | 46 | 454323.5 |
| 10 | Randall | 40 | 42271.21 |
| 11 | Gray | 31 | 12368.6 |
+------+-----------+------+--------------+
DESC dept;
+-----------+--------------+------+-----+---------+----------------+
| Field     | Type         | Null | Key | Default | Extra          |
+-----------+--------------+------+-----+---------+----------------+
| did       | int(11)      | NO   | PRI | NULL    | auto_increment |
| dname     | varchar(128) | YES  |     | NULL    |                |
| budget    | double       | YES  |     | NULL    |                |
| managerid | int(11)      | YES  | MUL | NULL    |                |
+-----------+--------------+------+-----+---------+----------------+
INSERT INTO dept (dname,budget,managerid) VALUES ("Software",60000,1),("Hardware",
10000000,4),("HR",5000,7),("Marketing",70000,2);
SELECT * FROM dept;
+-----+---------------+--------------+----------------+
| did | dname | budget | managerid |
+-----+---------------+--------------+----------------+
| 1 | Software | 60000 | 1 |
| 2 | Hardware | 10000000 | 4 |
| 3 | HR | 5000 | 7 |
| 4 | Marketing | 70000 | 2 |
+-----+---------------+--------------+----------------+
DESC works;
+---------+---------+------+-----+---------+-------+
| Field   | Type    | Null | Key | Default | Extra |
+---------+---------+------+-----+---------+-------+
| eid     | int(11) | YES  | MUL | NULL    |       |
| did     | int(11) | YES  | MUL | NULL    |       |
| pcttime | int(11) | YES  |     | NULL    |       |
+---------+---------+------+-----+---------+-------+
INSERT INTO works (eid,did,pcttime) VALUES
(1,1,100),(2,1,50),(2,2,50),(3,4,100),(4,3,90),(4,4,10),(5,1,75),(5,2,25),(6,3,100),(7,4,100),(8,2,60),
(8,1,10),(8,3,30),(9,1,25),(9,2,25),(9,3,25),(9,4,25),(10,4,100),(11,2,100);
SELECT * FROM works;
+------+------+---------+
| eid  | did  | pcttime |
+------+------+---------+
| 1 | 1 | 100 |
| 2 | 1 | 50 |
| 2 | 2 | 50 |
| 3 | 4 | 100 |
| 4 | 3 | 90 |
| 4 | 4 | 10 |
| 5 | 1 | 75 |
| 5 | 2 | 25 |
| 6 | 3 | 100 |
| 7 | 4 | 100 |
| 8 | 2 | 60 |
| 8 | 1 | 10 |
| 8 | 3 | 30 |
| 9 | 1 | 25 |
| 9 | 2 | 25 |
| 9 | 3 | 25 |
| 9 | 4 | 25 |
| 10 | 4 | 100 |
| 11 | 2 | 100 |
+------+------+---------+
Write the following queries in SQL:
a. Print the names and ages of each employee who works in both the Hardware department and
the Software department
SELECT e.ename, e.age FROM emp e WHERE e.eid in (SELECT eid FROM works WHERE did = 1) AND e.eid
in (SELECT eid FROM works WHERE did = 2);
+-----------+------+
| ename     | age  |
+-----------+------+
| Akeem     |   35 |
| Clare     |   61 |
| Shad      |   61 |
| Tobias    |   46 |
+-----------+------+
b. Find the managerids of managers who manage only departments with budgets greater than
$1 million
SELECT managerid FROM dept GROUP BY managerid HAVING MIN(budget) > 1000000;
+-----------+
| managerid |
+-----------+
| 4 |
+-----------+
c. Find the enames of managers who manage the departments with the largest budgets.
SELECT ename FROM emp WHERE eid = (SELECT managerid FROM dept WHERE budget = (SELECT
max(budget) FROM dept));
+-------+
| ename |
+-------+
| Jin |
+-------+
Appendix [1.3]
1. Find the number of employees hired in the year 2000.
SELECT COUNT(*) FROM employees e WHERE YEAR(hire_date) = "2000";
+----------+
| COUNT(*) |
+----------+
| 13 |
+----------+
2. Find the average age (in years) of employees who were hired in the year 2000
SELECT AVG(TIMESTAMPDIFF(year,birth_date,curdate())) AS avg FROM employees e WHERE YEAR(hire_date) = "2000";
+---------+
| avg     |
+---------+
| 60.7692 |
+---------+
3. Create a table called millennial_hires consisting of the following fields:
a. id (auto increment, unsigned int(6), not null, primary key)
b. first_name (varchar(30))
c. dob (date)
Describe the table you just created and validate if the description matches the specification.
CREATE TABLE millennial_hires (id INT(6) UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT, first_name VARCHAR(30), dob DATE);
DESC millennial_hires;
+------------+---------------------+-------+------+-----------+--------------------+
| Field | Type | Null | Key | Default | Extra |
+------------+---------------------+-------+------+-----------+--------------------+
| id | int(6) unsigned | NO | PRI | NULL | auto_increment |
| first_name | varchar(30) | YES | | NULL | |
| dob | date | YES | | NULL | |
+------------+---------------------+-------+------+-----------+--------------------+
4. Insert the first name and birth date of all the people hired in the year 2000 into the table created in the last task. Add your details to the table. Print out all the values.
INSERT INTO millennial_hires(first_name, dob) (SELECT first_name, birth_date FROM employees WHERE YEAR(hire_date) = "2000");
INSERT INTO millennial_hires(first_name, dob) VALUES ("Mimi", "1995-02-07");
SELECT * FROM millennial_hires;
+----+-------------+----------------+
| id | first_name | dob |
+----+-------------+----------------+
| 1 | Ulf | 1960-09-09 |
| 2 | Seshu | 1964-04-21 |
| 3 | Randi | 1953-02-09 |
| 4 | Mariangiola | 1955-04-14 |
| 5 | Ennio | 1960-09-12 |
| 6 | Volkmar | 1959-08-07 |
| 7 | Xuejun | 1958-06-10 |
| 8 | Shahab | 1954-11-17 |
| 9 | Jaana | 1953-04-09 |
| 10 | Jeong | 1953-04-27 |
| 11 | Yucai | 1957-05-09 |
| 12 | Bikash | 1964-06-12 |
| 13 | Hideyuki | 1954-05-06 |
| 16 | Mimi | 1995-02-07 |
+----+-------------+----------------+
SELECT COUNT(*) FROM millennial_hires;
+----------+
| COUNT(*) |
+----------+
| 14 |
+----------+
5. From the table, delete the entries of employees who were born in or after the year 1960. Find the number of records in the table after deletion.
DELETE FROM millennial_hires WHERE YEAR(dob) >= "1960";
SELECT COUNT(*) FROM millennial_hires;
+----------+
| COUNT(*) |
+----------+
| 9 |
+----------+
6. Add a new column called birth_year in the table. Put the birth year of each person as the values in this column and delete the dob column. Print the resulting table.
ALTER TABLE millennial_hires ADD COLUMN birth_year INT(4);
UPDATE millennial_hires SET birth_year = YEAR(dob);
ALTER TABLE millennial_hires DROP COLUMN dob;
SELECT * FROM millennial_hires;
+----+-------------+------------+
| id | first_name  | birth_year |
+----+-------------+------------+
|  3 | Randi       |       1953 |
|  4 | Mariangiola |       1955 |
|  6 | Volkmar     |       1959 |
|  7 | Xuejun      |       1958 |
|  8 | Shahab      |       1954 |
|  9 | Jaana       |       1953 |
| 10 | Jeong       |       1953 |
| 11 | Yucai       |       1957 |
| 13 | Hideyuki    |       1954 |
+----+-------------+------------+
Appendix [2.1]
1. Print all the tables in the database.
mysql> SHOW TABLES;
+--------------------------------+
| Tables_in_employees |
+--------------------------------+
| current_dept_emp |
| departments |
| dept_emp |
| dept_emp_latest_date |
| dept_manager |
| employees |
| salaries |
| titles |
+--------------------------------+
8 rows in set (0.00 sec)
2. Find and understand the schemas of all the tables.
mysql> DESC current_dept_emp; DESC departments; DESC dept_emp; DESC dept_emp_latest_date;
DESC dept_manager; DESC employees; DESC salaries; DESC titles;
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   |     | NULL    |       |
| dept_no   | char(4) | NO   |     | NULL    |       |
| from_date | date    | YES  |     | NULL    |       |
| to_date   | date    | YES  |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| dept_no   | char(4)     | NO   | PRI | NULL    |       |
| dept_name | varchar(40) | NO   | UNI | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
2 rows in set (0.00 sec)
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| dept_no   | char(4) | NO   | PRI | NULL    |       |
| from_date | date    | NO   |     | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   |     | NULL    |       |
| from_date | date    | YES  |     | NULL    |       |
| to_date   | date    | YES  |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
3 rows in set (0.00 sec)
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| dept_no   | char(4) | NO   | PRI | NULL    |       |
| from_date | date    | NO   |     | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
+------------+---------------+------+-----+---------+-------+
| Field      | Type          | Null | Key | Default | Extra |
+------------+---------------+------+-----+---------+-------+
| emp_no     | int(11)       | NO   | PRI | NULL    |       |
| birth_date | date          | NO   |     | NULL    |       |
| first_name | varchar(14)   | NO   |     | NULL    |       |
| last_name  | varchar(16)   | NO   |     | NULL    |       |
| gender     | enum('M','F') | NO   |     | NULL    |       |
| hire_date  | date          | NO   |     | NULL    |       |
+------------+---------------+------+-----+---------+-------+
6 rows in set (0.00 sec)
+-----------+---------+------+-----+---------+-------+
| Field     | Type    | Null | Key | Default | Extra |
+-----------+---------+------+-----+---------+-------+
| emp_no    | int(11) | NO   | PRI | NULL    |       |
| salary    | int(11) | NO   |     | NULL    |       |
| from_date | date    | NO   | PRI | NULL    |       |
| to_date   | date    | NO   |     | NULL    |       |
+-----------+---------+------+-----+---------+-------+
4 rows in set (0.00 sec)
+-----------+-------------+------+-----+---------+-------+
| Field     | Type        | Null | Key | Default | Extra |
+-----------+-------------+------+-----+---------+-------+
| emp_no    | int(11)     | NO   | PRI | NULL    |       |
| title     | varchar(50) | NO   | PRI | NULL    |       |
| from_date | date        | NO   | PRI | NULL    |       |
| to_date   | date        | YES  |     | NULL    |       |
+-----------+-------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
3. Find the number of employees in the database.
mysql> SELECT COUNT(*) FROM employees;
+----------+
| COUNT(*) |
+----------+
| 300024 |
+----------+
1 row in set (0.19 sec)
4. List all the departments and their number.
mysql> SELECT DISTINCT dept_name, dept_no FROM departments ORDER BY 2;
+--------------------+---------+
| dept_name | dept_no |
+--------------------+---------+
| Marketing | d001 |
| Finance | d002 |
| Human Resources | d003 |
| Production | d004 |
| Development | d005 |
| Quality Management | d006 |
| Sales | d007 |
| Research | d008 |
| Customer Service | d009 |
+--------------------+---------+
9 rows in set (0.01 sec)
5. Find the number of female employees.
mysql> SELECT COUNT(*) FROM employees e where e.gender LIKE 'F';
+----------+
| COUNT(*) |
+----------+
| 120051 |
+----------+
1 row in set (0.16 sec)
6. Print the maximum and the minimum salary.
mysql> SELECT MAX(salary),MIN(salary) FROM salaries;
+-------------+-------------+
| MAX(salary) | MIN(salary) |
+-------------+-------------+
| 158220 | 38623 |
+-------------+-------------+
1 row in set (1.44 sec)
7. Print the department number and the corresponding number of employees who have ever worked
there.
mysql> SELECT d.dept_no, COUNT(d.emp_no) FROM dept_emp d GROUP BY 1 ORDER BY 1;
+---------+-----------------+
| dept_no | COUNT(d.emp_no) |
+---------+-----------------+
| d001 | 20211 |
| d002 | 17346 |
| d003 | 17786 |
| d004 | 73485 |
| d005 | 85707 |
| d006 | 20117 |
| d007 | 52245 |
| d008 | 21126 |
| d009 | 23580 |
+---------+-----------------+
9 rows in set (2.94 sec)
8. Print the department name and the corresponding number of employees who have ever worked
there.
mysql> SELECT dp.dept_name, COUNT(d.emp_no) FROM dept_emp d, departments dp WHERE
d.dept_no = dp.dept_no GROUP BY 1 ORDER BY 1;
+--------------------+-----------------+
| dept_name          | COUNT(d.emp_no) |
+--------------------+-----------------+
| Customer Service   |           23580 |
| Development        |           85707 |
| Finance            |           17346 |
| Human Resources    |           17786 |
| Marketing          |           20211 |
| Production         |           73485 |
| Quality Management |           20117 |
| Research           |           21126 |
| Sales              |           52245 |
+--------------------+-----------------+
9 rows in set (0.63 sec)
9. Print the department names and their corresponding average salaries.
mysql> SELECT dp.dept_name, AVG(s.salary) FROM dept_emp d, departments dp, salaries s WHERE
d.dept_no = dp.dept_no AND s.emp_no = d.emp_no GROUP BY 1 ORDER BY 1;
+--------------------+---------------+
| dept_name          | AVG(s.salary) |
+--------------------+---------------+
| Customer Service   |    58770.3665 |
| Development        |    59478.9012 |
| Finance            |    70489.3649 |
| Human Resources    |    55574.8794 |
| Marketing          |    71913.2000 |
| Production         |    59605.4825 |
| Quality Management |    57251.2719 |
| Research           |    59665.1817 |
| Sales              |    80667.6058 |
+--------------------+---------------+
9 rows in set (11.60 sec)
10. Print the employee name, employee id and the maximum salary earned by him or her. Only report
for the employees with the top 5 highest salaries.
mysql> SELECT CONCAT(e.first_name," ",e.last_name) AS full_name, e.emp_no, MAX(s.salary) FROM
employees e, salaries s WHERE e.emp_no = s.emp_no GROUP BY 2 ORDER BY 3 DESC LIMIT 5;
+-------------------+--------+---------------+
| full_name         | emp_no | MAX(s.salary) |
+-------------------+--------+---------------+
| Tokuyasu Pesch    |  43624 |        158220 |
| Honesty Mukaidono | 254466 |        156286 |
| Xiahua Whitcomb   |  47978 |        155709 |
| Sanjai Luders     | 253939 |        155513 |
| Tsutomu Alameldin | 109334 |        155377 |
+-------------------+--------+---------------+
5 rows in set (4.18 sec)
Appendix [2.2] Consider the following schema:
Suppliers(sid: integer, sname: string, address: string)
Parts(pid: integer, pname: string, color: string)
Catalog(sid: integer, pid: integer, cost: real)
The Catalog relation lists the prices charged for parts by Suppliers. Create a database based on the
above schema.
CREATE TABLE suppliers (sid INT PRIMARY KEY NOT NULL AUTO_INCREMENT, sname VARCHAR(128),
address VARCHAR(256));
CREATE TABLE parts(pid INT PRIMARY KEY NOT NULL AUTO_INCREMENT, pname VARCHAR(128), colour
VARCHAR(56));
CREATE TABLE catalog(sid INT,pid INT,cost FLOAT(25),FOREIGN KEY(sid) REFERENCES
suppliers(sid),FOREIGN KEY(pid) REFERENCES parts(pid));
INSERT INTO suppliers (sname,address) VALUES ("Martena","P.O. Box 872, 8417 Tellus.
St."),("Tatum","P.O. Box 216, 7552 Lacus, St."),("Gillian","4443 Donec Rd."),("Dylan","266-4108 Eu,
St."),("Austin","1114 Imperdiet St."),("Alice","P.O. Box 926, 5519 Feugiat. Avenue"),("Irma","P.O. Box
902, 5166 Pulvinar Rd."),("Gloria","P.O. Box 518, 5152 Tortor Av."),("Russell","Ap #428-7669 Sed
St."),("Amanda","1868 Orci. Ave");
INSERT INTO parts(pname,colour) VALUES ("Q6M-1V2","green"),("A8B-1P0","green"),("D7B-
6D3","blue"),("D1N-2E4","violet"),("F6B-8L7","orange"),("N2O-7V1","indigo"),("F8T-
1V3","indigo"),("T6F-0R9","indigo"),("T1Y-9M4","indigo"),("R0N-6N2","red");
INSERT INTO catalog (sid,pid,cost) VALUES
(7,7,"9220.76"),(2,7,"9833.77"),(1,9,"6641.91"),(8,10,"2505.70"),(4,5,"1601.36"),(3,10,"3887.97"),(10,9,"
2324.85"),(6,3,"6497.71"),(4,10,"2088.35"),(3,9,"9718.23"),
(5,5,"2442.83"),(6,6,"204.87"),(1,4,"716.85"),(8,10,"5489.20"),(7,1,"148.81"),(2,5,"713.07"),(5,6,"4232.0
9");
Write the following queries in SQL:
a.) Find the pnames of parts for which there is some supplier
SELECT pname FROM parts WHERE pid in (SELECT pid FROM catalog WHERE sid IS NOT NULL);
+---------+
| pname   |
+---------+
| Q6M-1V2 |
| D7B-6D3 |
| D1N-2E4 |
| F6B-8L7 |
| N2O-7V1 |
| F8T-1V3 |
| T1Y-9M4 |
| R0N-6N2 |
+---------+
b.) Find the snames of suppliers who supply every red part.
SELECT sname FROM suppliers s WHERE NOT EXISTS (SELECT * FROM parts p WHERE p.colour LIKE 'red'
AND NOT EXISTS (SELECT * FROM catalog c WHERE c.sid = s.sid AND c.pid = p.pid));
+------------+
| sname |
+------------+
| Gillian |
| Dylan |
| Gloria |
+------------+
c.) Find the sids of suppliers who supply only red parts.
SELECT DISTINCT sid FROM (SELECT sid,colour from catalog c, parts p where c.pid = p.pid) t1 WHERE NOT
EXISTS (SELECT * FROM (SELECT sid, colour from catalog c, parts p where c.pid = p.pid) t2 where t1.sid =
t2.sid and t2.colour!='red');
+------+
| sid |
+------+
| 8 |
+------+
d.) Find the sids of suppliers who supply a red part and a green part.
SELECT sid FROM (SELECT sid,colour FROM catalog c, parts p WHERE p.pid = c.pid AND colour LIKE 'red')
t1 WHERE sid in (SELECT sid FROM catalog c, parts p WHERE p.pid = c.pid AND colour LIKE 'green');
Empty set (0.00 sec)
e.) Find the sids of suppliers who supply a red part or a green part.
SELECT DISTINCT sid FROM (SELECT sid,colour FROM catalog c, parts p WHERE p.pid = c.pid AND (colour
LIKE 'red' OR colour LIKE 'green')) t1;
+------+
| sid |
+------+
| 7 |
| 8 |
| 3 |
| 4 |
+------+
Appendix [3.1]
Bubble Sort Using Java
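The appendix figure is not reproduced here; the following is a minimal, self-contained Java sketch of a bubble sort (the class name and sample data are illustrative, not the report's original code):

```java
import java.util.Arrays;

// Bubble sort: repeatedly swaps adjacent out-of-order elements;
// stops early once a full pass makes no swaps.
public class BubbleSort {
    public static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            boolean swapped = false;
            for (int j = 0; j < a.length - 1 - i; j++) {
                if (a[j] > a[j + 1]) {
                    int tmp = a[j];
                    a[j] = a[j + 1];
                    a[j + 1] = tmp;
                    swapped = true;
                }
            }
            if (!swapped) break; // array already sorted
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 1, 4, 2, 8};
        sort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 4, 5, 8]
    }
}
```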
Appendix [3.2]
Selection Sort Using Java
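The appendix figure is not reproduced here; the following is a minimal, self-contained Java sketch of a selection sort (the class name and sample data are illustrative, not the report's original code):

```java
import java.util.Arrays;

// Selection sort: on each pass, finds the minimum of the unsorted
// suffix and swaps it into position i.
public class SelectionSort {
    public static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) min = j;
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }

    public static void main(String[] args) {
        int[] data = {64, 25, 12, 22, 11};
        sort(data);
        System.out.println(Arrays.toString(data)); // [11, 12, 22, 25, 64]
    }
}
```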
Appendix [3.3]
Insertion Sort Using Java
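The appendix figure is not reproduced here; the following is a minimal, self-contained Java sketch of an insertion sort (the class name and sample data are illustrative, not the report's original code):

```java
import java.util.Arrays;

// Insertion sort: grows a sorted prefix by shifting larger elements
// right and inserting each key into its correct slot.
public class InsertionSort {
    public static void sort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int key = a[i];
            int j = i - 1;
            while (j >= 0 && a[j] > key) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = key;
        }
    }

    public static void main(String[] args) {
        int[] data = {12, 11, 13, 5, 6};
        sort(data);
        System.out.println(Arrays.toString(data)); // [5, 6, 11, 12, 13]
    }
}
```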
Appendix [3.4]
Bubble Sort Implementation on the INPUT dataset
Appendix [3.5]
Selection Sort Implementation on the INPUT dataset
Appendix [3.6]
Insertion Sort Implementation on the INPUT dataset
Appendix [4.1]*
*Taken from: Cormen, T. H., Leiserson, C. E., Rivest, R. L. & Stein, C. (2001) Introduction to Algorithms, Second edition. Cambridge, Massachusetts: MIT
Press. Chapter 4.
Appendix [4.2]
Merge Sort Implementation on INPUT dataset
(Completed by Tanasorn Chindasook, Prateek Choudhary and Shengchen Dong)
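The dataset-driven implementation itself is not reproduced in this copy of the appendix; as a stand-in, the sketch below shows the core of a Merge sort in Java under the O(n)-extra-memory scheme described in the report (class name and sample data are illustrative):

```java
import java.util.Arrays;

public class MergeSort {
    // Recursively split, sort each half, then merge the sorted halves.
    public static void sort(int[] a) {
        if (a.length < 2) return;
        int mid = a.length / 2;
        int[] left = Arrays.copyOfRange(a, 0, mid);      // O(n) extra memory
        int[] right = Arrays.copyOfRange(a, mid, a.length);
        sort(left);
        sort(right);
        merge(a, left, right);
    }

    // Merge two sorted arrays back into a; <= keeps the sort stable.
    private static void merge(int[] a, int[] left, int[] right) {
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) a[k++] = left[i++];
        while (j < right.length) a[k++] = right[j++];
    }

    public static void main(String[] args) {
        int[] data = {38, 27, 43, 3, 9, 82, 10};
        sort(data);
        System.out.println(Arrays.toString(data)); // [3, 9, 10, 27, 38, 43, 82]
    }
}
```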
Appendix [4.3]
Quick Sort Implementation on INPUT dataset
(Completed by Tanasorn Chindasook, Prateek Choudhary and Shengchen Dong)
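The dataset-driven implementation itself is not reproduced in this copy of the appendix; as a stand-in, the sketch below shows an in-place Quick sort in Java using a Lomuto partition (class name, partition scheme and sample data are illustrative):

```java
import java.util.Arrays;

public class QuickSort {
    public static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Sorts a[lo..hi] in place; only the recursion stack uses extra memory.
    private static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);
        sort(a, p + 1, hi);
    }

    // Lomuto partition with the last element as pivot: moves smaller
    // elements left of the pivot and returns the pivot's final index.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[hi], i = lo;
        for (int j = lo; j < hi; j++)
            if (a[j] < pivot) swap(a, i++, j);
        swap(a, i, hi);
        return i;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        int[] data = {10, 80, 30, 90, 40, 50, 70};
        sort(data);
        System.out.println(Arrays.toString(data)); // [10, 30, 40, 50, 70, 80, 90]
    }
}
```

A fixed last-element pivot degrades to O(n²) on already-sorted input; production variants pick the pivot randomly or by median-of-three.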
Appendix [4.4]*
*Taken from "Know Thy Complexities!" Big-O Algorithm Complexity Cheat Sheet (Know Thy
Complexities!) @ericdrowell. Accessed December 09, 2018. http://bigocheatsheet.com/.