Page 1: FYP1 Progress Report (final)

FYP I Final Report

Parallel - D

Project Code: CS-491

Project Supervisor: Prof. Hasina Khatoon

Project Team: Muhammad Waqas Khan – 12k-2466

Submission Date: 22nd Dec 2015

____________________________________________________

Signature of the Project Supervisor

CS-491 FYP I Final Report 2.0

Document Information

Category           Information

Customer           NUCES-FAST
Project Title      Parallel-D
Document           FYP I Final Report
Document Version   2.0
Identifier         CS-491
Status             Final
Author(s)          Abeeha, Ali Shah and Waqas Khan
Approver(s)        Prof. Hasina Khatoon
Issue Date         4th Dec 2015

Definition of Terms, Acronyms and Abbreviations

This section defines the terms, acronyms and abbreviations required to interpret this document properly.

Term     Description
JDBC     Java Database Connectivity
GPU      Graphics Processing Unit
CPU      Central Processing Unit
DBMS     Database Management System
DB       Database
RAM      Random Access Memory / Main Memory
OpenCL   Open Computing Language
HSA      Heterogeneous System Architecture

Project Coordination Office Page 2 of 17


Contents

1 INTRODUCTION
  1.1 Purpose of Document
  1.2 Intended Audience

2 CONTEXT AND PRELIMINARY INVESTIGATION
  2.1 Project Selection
  2.2 Project Background
  2.3 Project Feasibility Analysis
    2.3.1 Economic Feasibility
    2.3.2 Technical Feasibility
    2.3.3 Operational Feasibility
    2.3.4 Schedule Feasibility
    2.3.5 Conclusion of Feasibility Analysis
  2.4 Project Scope
  2.5 Project Objectives
  2.6 Stakeholders
  2.7 Operating Environment

3 REQUIREMENT ANALYSIS
  3.1 User Requirements / Use Cases
  3.2 Domain Model
  3.3 Use-Case Diagram
  3.4 System Specifications
    3.4.1 Non-Functional Requirements
    3.4.2 Quality Requirements
    3.4.3 Interface Requirements

4 SYSTEM DESIGN
  4.1 Hardware Component Design
  4.2 System Architecture Design
  4.3 Application Design
    4.3.1 Sequence Diagram
    4.3.2 State Diagram
  4.4 Strategy
    4.4.1 Future System Extension
    4.4.2 System Reuse
    4.4.3 Data Management
  4.5 Methodology
    4.5.1 Reading Data into Memory
    4.5.2 Partial Processing of Queries
    4.5.3 Bitonic Sorting Algorithm
    4.5.4 Data in RAM

5 PROBLEMS FACED DURING THE DEVELOPMENT

6 ROADMAP FOR FINAL YEAR PROJECT – 2
  6.1 Tools and Techniques Selection


  6.2 Limitations
    6.2.1 Hardware Limitation
    6.2.2 Software Limitation
  6.3 Future Development Plan During Final Year Project II

7 REFERENCES


1 INTRODUCTION

1.1 Purpose of Document
The purpose of this document is to introduce the reader to the project to be developed: its requirements, the goals to be achieved, how the application will interact with the system hardware, and what the system requirements will be.

1.2 Intended Audience
This document is intended for the development team, stakeholders, the supervisor, and anyone involved in the development or evaluation of the project.

2 CONTEXT AND PRELIMINARY INVESTIGATION

2.1 Project Selection

The availability of GPU resources within the university premises motivated this project. Our initial research showed that a considerable amount of work has been done on GPUs and on database management systems individually, but that much remains to be done in combining the two.

Our research also revealed that most heterogeneous systems have been built as clusters of CPUs and GPUs, which raises the important factor of power consumption: GPUs draw a great deal of power as the price of superior speed and performance, while CPU clusters use less power but sacrifice much of that performance.

Our project takes these factors into consideration. The functionalities and the specifications will be discussed in the sections that follow.


2.2 Project Background

Modern workloads rely heavily on fast computation, but continuously increasing CPU performance has run into the power wall [1]. To deal with this, processors now have a constrained power budget, and with limited power available, the industry is moving towards heterogeneous computing [1].

The most common and accessible approach is to use GPUs to accelerate processing. Big Data has become a popular term, and much research has been conducted on how to process it optimally: by processing Big Data with a hybrid GPU/CPU model [2], by optimizing queries using GPUs [3], and by pipelining the data transfer between CPU and GPU [4].

Multi-core systems have gained popularity and are used by many applications, including database systems [7]. GPUs are far more parallel than CPUs, which instead have deeply pipelined, heavily optimized cores [5]. Increasing the number of cores in a multi-core processor increases parallelism but also introduces more room for cache conflicts and performance degradation [7]; GPUs do not suffer from this problem to the same extent.

CUDA allows programmers to write programs for the GPUs that support it [8], letting them take advantage of the GPU's high computational power [8]. CUDA also offers advanced features such as allocating device memory inside a running kernel and letting the CPU and GPU share the same address space, which avoids the bottleneck of copying data from CPU to GPU [6].

Our project takes all of these facts into consideration and targets the use of the GPU for query processing and the CPU for query scheduling.

2.3 Project Feasibility Analysis

2.3.1 Economic Feasibility
The project requires a GPU, additional RAM and more hard disk space, so it has a cost; however, all of these are already available for implementing the project.

2.3.2 Technical Feasibility
The project is feasible with the technology currently available. The main technical risk is that the algorithm used may require more computing power than the system provides, so this must be resolved against the system specifications.

2.3.3 Operational Feasibility
The project is operationally stable. It requires a GPU and standard computer components to operate.

2.3.4 Schedule Feasibility
The project is expected to be completed within the allotted time.

2.3.5 Conclusion of Feasibility Analysis
Based on the above analysis, we conclude that the project is feasible and will be


2.4 Project Scope
The features of the DBMS are as under:

- Utilization of the Graphics Processing Unit (GPU)
  The project uses the GPU for query processing, which will make the DBMS faster than databases that use a CPU or CPU clusters.

- Multi-Platform Environment Support
  The project will support multiple platforms, i.e. on a shared server a user can issue a query from any supported DBMS front-end and get results in seconds or less.

- Multi-User Support
  Multiple users can be served by this database engine, because it is optimized for massively parallel data processing with a focus on performance.

- Parallel and Big Data Processing
  As the project is based on the GPU, its main purpose is to process big data in parallel.

Other features include:

- Scheduling queries for processing
- Transaction log management and maintenance
- Managing RAM to act as a cache for the DBMS
- Utilizing the CPU for the algorithms used by the DBMS and for managing the RAM cache

2.5 Project Objectives

To implement a multi-platform DBMS that utilizes the GPU for massively parallel data processing, and to utilize other resources to optimize performance and process big data in seconds.

2.6 Stakeholders

The primary stakeholders are those who will use the system; this includes DB administrators and the organizations that use the DBMS for their work.

The secondary stakeholders are those associated with the project; this includes the project team itself, the supervisor and others connected with the project.

2.7 Operating Environment
The environments in which the software will operate are as follows:

Hardware Platform:
- A CUDA- or OpenCL-capable GPU
- A compatible CPU
- RAM equal to or greater than the GPU memory, plus 1 GB extra
- Enough hard disk space to store the database and transaction logs

Operating System:
- Linux, preferably Ubuntu


3 REQUIREMENT ANALYSIS

3.1 User Requirements / Use Cases
The requirement for this project is simple: the user enters a query, and the software must use the GPU to compute the result in the minimum processing time.

3.2 Domain Model

3.3 Use-Case Diagram


3.4 System Specifications

3.4.1 Non-Functional Requirements

3.4.1.1 Nature of the Users
Users are assumed to be DB experts or to have prior knowledge of databases, including how to load data into the DBMS and how to run queries.

3.4.1.2 Error Handling
Whenever the user types an invalid query, it will not be executed, and the user will be told what was wrong with it. If a fatal error occurs at runtime, the operation will be aborted; the DBMS will remain in a stable state, and the error will be resolved without involving the user.

3.4.1.3 Performance Constraints
The DBMS must outperform equivalent processing on a CPU or CPU cluster.

3.4.2 Quality Requirements

3.4.2.1 Maintainability
The future project (FYP-2) will be designed to be self-maintaining: like other DBMSs, it will run and maintain its data without user intervention.

3.4.2.2 Simplicity
The user interface will be as simple as possible. There will be a console-style interface for interacting with the DBMS in real time, aimed at advanced users, as well as a simplified, easy interface for users with less knowledge.

3.4.3 Interface Requirements

3.4.3.1 Hardware Interface
The following are the hardware interfaces and their characteristics:

- CPU: Manages the overall algorithm, e.g. how queries are queued, how data is retrieved and stored in RAM, and in what manner it is sent for processing.
- RAM: Holds the session logs, data loaded from the hard disk awaiting processing, and processed data waiting to be synchronized or returned to the user.
- Hard Disk / Database: Holds all the database files, logs and user data; when a request is made, the stored data is sent to RAM.
- GPU: Processes the data of requested queries in parallel; it can process multiple queries at a time, or a single query divided into several parts and processed in parallel.

3.4.3.2 Software Interface
The following are the software interfaces and their characteristics:

- Query Analyzer


  Responsible for checking whether the query is correct: it checks the syntax and whether the referenced entities and tables exist. Valid queries are sent to the DB engine (which we term the Query Engine) for further processing.

- Query Engine
  When a query is received, it goes through a number of steps. First, the CPU estimates how long it will take to compute the final result set. If it would take too long, the CPU divides the table in two and splits the query accordingly; logs of such splits are kept as backup. The queries are then placed in a waiting queue; meanwhile the data is loaded from the hard disk into RAM, and when a query is ready for processing this data is synchronized with GPU memory and the computation begins. The computed result is sent back to the query engine, where the CPU merges the result sets coming from the GPU (including those of any partial queries it generated), and the query engine then returns the result to the user.
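The divide-in-two scheduling step above can be sketched as follows. The linear cost model (`cost_per_row`) and the time `budget` are hypothetical placeholders for the CPU's real estimate, not the engine's actual parameters:

```python
def schedule(query_rows, cost_per_row, budget):
    """Split a query's row range in half until each piece fits the time budget.

    The cost model (cost_per_row * rows) stands in for the CPU's estimate;
    real scheduling would also account for GPU transfer time.
    """
    pending = [(0, query_rows)]           # (start_row, row_count) work items
    ready = []
    while pending:
        start, count = pending.pop()
        if count * cost_per_row <= budget or count == 1:
            ready.append((start, count))  # small enough: enqueue for the GPU
        else:
            half = count // 2             # too costly: divide table and query
            pending.append((start, half))
            pending.append((start + half, count - half))
    return sorted(ready)
```

Each returned piece would then wait in the queue while its rows are staged from disk into RAM.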

4 SYSTEM DESIGN

4.1 Hardware Component Design
The figure below describes the components of the project and the software components that interact with the system hardware components:

4.2 System Architecture Design

Figure 3: The diagram shows how the components will interact with each other.


4.3 Application Design
The application is designed to answer queries taken from the user; the following sequence and state diagrams show how the system responds to a user query:

4.3.1 Sequence Diagram

4.3.2 State Diagram

Figure 3.2: Layered Approach System Architecture

Figure 5.1: The user submits a query to the query engine; if the query fails verification, a dialog informs the user that the query is not correct. After verification, the CPU fetches data from the DBMS and processes it. On a data failure, the CPU fetches the data again before sending it to the GPU for computation. When the computation completes, the GPU sends the result back to the CPU, which saves it in RAM and passes it to the query engine, where the user sees the result.


4.4 Strategy

4.4.1 Future System Extension
The future extension of the system will have an interactive design and will be used for massive data processing through the database.

4.4.2 System Reuse
The system will use the same GUI design as other DBMSs. The methods for maintaining backups and databases will be reused from already available sources, with changes to make them compatible with the GPU environment.

4.4.3 Data Management
The databases reside on the hard drive; when a query arrives, the system retrieves the relevant data and stores it in RAM for processing. Data management is performed when fetching the data into RAM and when writing it back to disk. We will maintain a log that records failures and query processing times; this helps in maintaining the data and in keeping out data generated by transaction-processing errors.
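The failure-and-timing log described above can be sketched as a thin wrapper around query execution. The record fields here are illustrative, not the project's actual log format:

```python
import time

class QueryLog:
    """Records each query's duration and any failure, as a monitoring log."""

    def __init__(self):
        self.records = []

    def run(self, query, execute):
        """Execute `query` via `execute`, logging duration and outcome."""
        start = time.perf_counter()
        try:
            result = execute(query)
            status = "ok"
        except Exception as exc:          # a failure is logged, not re-raised
            result, status = None, f"failed: {exc}"
        self.records.append({
            "query": query,
            "status": status,
            "seconds": time.perf_counter() - start,
        })
        return result
```

Inspecting `records` afterwards gives exactly the two signals the text mentions: which queries failed and how long each took.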

4.5 Methodology
Some of the methodology for designing the DBMS is as under:

4.5.1 Reading Data into Memory
Since the data lives on the hard drive, it has to be stored in a structure the program can read efficiently. We want to skip data that is not needed for processing. For instance:

Query: SELECT name FROM table1 WHERE age = 60;

Figure 5.2: There are four states: User, Engine, CPU and GPU. In the User state, the query is sent to the Query Engine. In the Query Engine state, the engine verifies the query. In the CPU state, the data is fetched from the DBMS and sent to the GPU. In the GPU state, the data is computed and the result is sent back to the CPU, which saves the result and forwards it to the engine, where the user sees it.


ID  Name     Age  Salary
1   Sample1  50   40000
2   Sample2  60   30000

Here salary is an unnecessary column, and we do not want to waste time reading that data. The file structure has to be designed so that the reader knows how many bytes to skip in order to load only the required data into RAM. This loads less data and saves both time and memory.
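As an illustration of such byte-skipping, the sketch below reads a single column out of packed fixed-width rows. The field names and widths are hypothetical, not the project's actual on-disk format:

```python
import struct

# Hypothetical row layout: id:int32, name:16 bytes, age:int32, salary:int32.
ROW_FORMAT = struct.Struct("<i16sii")   # 28 bytes per row, no padding

def read_column(raw, field):
    """Extract one column from packed rows, skipping the bytes we don't need."""
    offsets = {"id": 0, "name": 4, "age": 20, "salary": 24}
    sizes = {"id": 4, "name": 16, "age": 4, "salary": 4}
    out = []
    for start in range(0, len(raw), ROW_FORMAT.size):
        begin = start + offsets[field]          # jump straight to the field
        chunk = raw[begin:begin + sizes[field]]
        if field == "name":
            out.append(chunk.rstrip(b"\0").decode())
        else:
            out.append(int.from_bytes(chunk, "little", signed=True))
    return out
```

With a real file the same arithmetic would drive `seek()` calls, so the salary bytes never even reach RAM.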

4.5.2 Partial Processing of Queries
Massive data can range from MBs to GBs, even TBs. Loading this data from the hard disk into RAM and synchronizing it with GPU memory can be very time-consuming. To resolve this, whatever has already been loaded into RAM, the engine computes on and stores as a partial result set; meanwhile more data is loaded, and the cycle continues. We therefore parallelize the loading and processing of data. The advantage is that a partial result can be shown as soon as the user starts examining the results, further results appear below it, and the time that would otherwise be wasted waiting on loading is saved.
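A minimal sketch of this load/process overlap uses a bounded queue between a loader thread and a consumer. The consumer filters on the CPU here purely as a stand-in for the GPU computation:

```python
import queue
import threading

def pipelined_filter(chunks, predicate):
    """Overlap 'loading' chunks with processing them, emitting partial results."""
    buf = queue.Queue(maxsize=2)          # bounded: loader stays ahead, not unbounded
    partial_results = []

    def loader():
        for chunk in chunks:
            buf.put(chunk)                # simulates hard disk -> RAM transfer
        buf.put(None)                     # sentinel: no more data

    t = threading.Thread(target=loader)
    t.start()
    while (chunk := buf.get()) is not None:
        # Each chunk yields a partial result set the user could already see.
        partial_results.append([row for row in chunk if predicate(row)])
    t.join()
    return partial_results
```

Each element of the returned list is one partial result set, available as soon as its chunk has been processed.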

4.5.3 Bitonic Sorting Algorithm

Sorting is one of the most common operations in a database, so we use various sorting algorithms; here we use the bitonic algorithm. Bitonic sort is a parallel sorting algorithm and is very efficient on heterogeneous systems: the data is distributed among multiple processors and then sorted in parallel.

Figure 4.1: Partial Processing of Data


Bitonic sort operates on bitonic sequences (sequences that first increase and then decrease, or vice versa). If the input is not a bitonic sequence, it is first converted into one. The complexity of the bitonic algorithm is as follows:

Best case performance: O(log^2 n) parallel time
Worst case performance: O(log^2 n) parallel time
Average case performance: O(log^2 n) parallel time
Worst case space complexity: O(n log^2 n) comparators

For example, 3, 2, 4, 1 sorts to 1, 2, 3, 4; and 11, 13, 16, 35, 15, 4, 3, 2, 1 sorts to 1, 2, 3, 4, 11, 13, 15, 16, 35.
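The sorting network can be sketched sequentially as below; on a GPU every compare-exchange in a stage runs in parallel, which is where the O(log^2 n) parallel time comes from. This is a generic sketch of the standard network, not the project's kernel:

```python
def bitonic_sort(data):
    """Sort a list whose length is a power of two using the bitonic network.

    The two nested while-loops enumerate the network's stages; the inner
    for-loop is the part a GPU would execute as one parallel step.
    """
    a = list(data)
    n = len(a)
    assert n and (n & (n - 1)) == 0, "bitonic network needs a power-of-two size"
    k = 2
    while k <= n:          # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:      # compare-exchange distance within this stage
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0   # block direction alternates
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a
```

Inputs that are not a power of two would first be padded (e.g. with sentinel values), matching the text's note that non-bitonic input must be converted first.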

4.5.4 Data in RAM

RAM can be divided into two parts: i) the part that contains frequently used data, and ii) the part that contains the data currently being used. We can optimize the first part: one strategy is to store half of each table's data (i.e. if a table contains 1000 rows, we store 500 rows), so that we can hold rows from a larger number of different tables. Here we can reserve 1 GB of RAM for the frequently used data and 3 GB for the data currently being used, managed by a most-frequently-used


data algorithm. This saves the time for loading the data and improves the response time to the user, because the data is already present in part one. If the data is not present in the first part, it is loaded again from the hard disk into RAM.
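The two-part layout can be sketched as a small cache class. The promote-on-second-access rule and the row-count capacity are illustrative assumptions, not the project's actual replacement policy:

```python
from collections import Counter, OrderedDict

class TwoTierCache:
    """Tier 1 holds frequently used fragments; misses fall back to 'disk'."""

    def __init__(self, hot_capacity, load_from_disk):
        self.hot = OrderedDict()               # tier 1: frequently used data
        self.hot_capacity = hot_capacity
        self.hits = Counter()                  # access counts drive promotion
        self.load_from_disk = load_from_disk   # fallback when tier 1 misses

    def get(self, key):
        self.hits[key] += 1
        if key in self.hot:                    # served from tier 1: no disk read
            return self.hot[key]
        value = self.load_from_disk(key)       # miss: fetch into the working tier
        if self.hits[key] >= 2:                # promote data that is re-requested
            if len(self.hot) >= self.hot_capacity:
                self.hot.popitem(last=False)   # evict the oldest hot entry
            self.hot[key] = value
        return value
```

Once a fragment is promoted, repeat queries are answered from tier 1 without touching the disk, which is the response-time gain the text describes.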

5 PROBLEMS FACED DURING THE DEVELOPMENT

We faced a number of problems due to the unavailability of GPUs in the Syslab. We had access to the GPUs for about two months, after which the access was revoked. Because of that, we had to consider other options for continuing our project. We decided to move to OpenCL, since it also lets us program the GPU. However, there have been many problems with the OpenCL SDK, and because of them we have moved our platform to Linux.

6 ROADMAP FOR FINAL YEAR PROJECT – 2

6.1 Tools and Techniques Selection
We wish to continue our work in CUDA, and we are trying to obtain a research grant from AWS; if this does not work out, we will complete our project in OpenCL. Our hardware and operating system requirements are specified below:

Hardware Platform:
- A CUDA- or OpenCL-capable GPU
- A compatible CPU
- RAM equal to or greater than the GPU memory, plus 1 GB extra
- Enough hard disk space to store the database and transaction logs


Operating System:
- Linux (preferably Ubuntu)

6.2 Limitations

6.2.1 Hardware Limitation
Due to the unannounced withdrawal of the resource, we no longer have supported hardware on which to continue CUDA GPU programming, so we have had to move to OpenCL. The only remaining problem is setting up the environment for it, which should be resolved soon.

6.2.2 Software Limitation
As mentioned above, we no longer have hardware on which to continue using CUDA, and we do not have the official OpenCL SDK for GPU programming. A number of tutorials are available, but none works in our GPU context.

6.3 Future Development Plan During Final Year Project II

The above limitations will be resolved before the FYP-2 semester starts. Once we have overcome these issues, we will download the source of an open-source, CPU-based DBMS (MariaDB), revise its code with our algorithms and GPU programming, and then run tests on both versions and compare them by execution time. After that, we will download existing GPU-based DBMSs and compare ours with them, again by time and by accuracy, since the GPU DBMSs we examined were found to be less accurate in fetching data from the database (fewer operations can be performed on the GPU owing to its smaller instruction set).

The timeline is shown in the table below:

S.No  Task                                             Expected Completion
1     Fix problems with GPU SDK                        Before the start of FYP-2
2     Analyze code (open-source DBMS)                  January 2016
3     Edit code – support multiple platforms           March 2016
4     Edit code – compute on the GPU                   March 2016
5     Edit code – write routines for logs, cache etc.  March 2016
6     Complete the database with error handling        March 2016
7     Run benchmarks and finalize our work             April 2016

7 REFERENCES

[1] S. Breß, M. Heimel, N. Siegmund, GPU-accelerated Database Systems: Survey and Open Challenges, 12-Dec-2014, pp. 1-35.


[2] P. Przymus, K. Kaczmarski, K. Stencel, A Bi-Objective Optimization Framework for Heterogeneous CPU/GPU Query Plans, Fundamenta Informaticae – Concurrency Specification and Programming (CS&P'13), Vol. 135, Issue 4, October 2014, pp. 483-501.

[3] M. Heimel, V. Markl, A First Step Towards GPU-assisted Query Optimizations, 2012

[4] L. Beyer, P. Bientinesi, Streaming Data from HDD to GPUs for Sustained Peak Performance, 18-Feb-2013

[5] P. Bakkum, K. Skadron, Accelerating SQL Database Operations on a GPU with CUDA, GPGPU-3, pp. 94-103.

[6] NVIDIA, NVIDIA CUDA C Programming Guide, http://docs.nvidia.com/cuda/pdf/CUDA_C_Programming_Guide.pdf, Version 6.0, 2014, pp. 31-36, 40, 213-216. [Online; accessed 21-Apr-2014].

[7] R. Lee, X. Ding, F. Chen, Q. Lu, X. Zhang, MCC-DB: Minimizing Cache Conflicts in Multi-core Processors for Databases, 24-Aug-2009, China.

[8] M. Christiansen, C. E. Hansen, CUDA DBMS, GPGPU Programming, 10-Jun-2009, pp. 1-71.
