STUDENT PART TIME JOB AS TUTOR SYSTEM
USING K-MEANS ALGORITHM
NUR ZARITH AKILLA BINTI AMBOAKA
BACHELOR OF COMPUTER SCIENCE
(INTERNET COMPUTING)
UNIVERSITI SULTAN ZAINAL ABIDIN
2018
STUDENT PART TIME JOB AS TUTOR SYSTEM USING K-MEANS
ALGORITHM
NUR ZARITH AKILLA BINTI AMBOAKA
Bachelor of Computer Science (Internet Computing)
Faculty of Informatics and Computing
Universiti Sultan Zainal Abidin, Terengganu, Malaysia
MAY 2018
DECLARATION
I hereby declare that this report is based on my original work except for quotations
and citations, which have been duly acknowledged. I also declare that it has not been
previously or concurrently submitted for any other degree at Universiti Sultan Zainal
Abidin or other institutions.
________________________________
Name : Nur Zarith Akilla Binti Amboaka
Date : ..................................................
CONFIRMATION
This is to confirm that:
The research conducted and the writing of this report were under my supervision.
________________________________
Name : ..................................................
Date : ..................................................
DEDICATION
In the name of Allah, the Most Gracious and the Most Merciful. All praise is due
to Him that the documentation and the system for the subject CSB 35102, Projek
Ilmiah 2018/2019, were completed on time. I would like to take this opportunity
to thank my kind supervisor, Dr. Suhailan Bin Dato’ Safei, for the valuable
ideas, time, support, advice, and guidance given throughout the research until
phase one of the project was completed. I also dedicate my appreciation to my
beloved family, who supported and motivated me while I finished this project.
I would also like to thank the friends who were willing to lend a hand with the
project. Lastly, thank you to everyone who was directly or indirectly involved
in the making of the system and documentation.
ABSTRACT
Many university students need extra pocket money to support their life at
university. One way to earn it is to work as a part-time tutor, either for their
friends at university or for school students outside the university. Being a
part-time tutor helps them build self-esteem and gain experience for their
future career. However, some of them hesitate to teach because they do not know
how to assess their ability in a specific subject. Moreover, they need to prove
to prospective clients or students that they are capable of teaching the
subject. Therefore, this project was built to classify their ability to teach a
subject based on their achievement in the courses they take at university. This
classification helps convince other students who need a tutor in a specific
subject. To realize this project, a clustering technique is applied using a
centroid-based clustering algorithm, K-means. K-means is an unsupervised
learning method: the data carry no prescribed labels, and no class values
denoting an a priori grouping of the data instances are given.
ABSTRAK
At present, there are students who need additional pocket money to support
their life at university. One way to obtain it is to become a part-time tutor,
either among their friends at university or among school students outside the
university. Working as a part-time tutor is very good for raising their
self-confidence and for gaining experience for their future career. However,
some of them are still unsure about teaching because they do not know how to
assess their ability in a particular subject. Furthermore, they need to prove
to their prospective clients or students that they are able to teach the
subject. Therefore, this project was built to classify their ability to teach a
subject based on their achievement in the courses they take at university. This
project is important for convincing other students who need a tutor in a
particular subject. To realize this project, a clustering technique is applied
using a centroid-based clustering algorithm, K-means. K-means is often called
unsupervised learning, because no labels are prescribed in the data and no
class values denoting an a priori grouping of the data instances are given.
CONTENTS
DECLARATION
CONFIRMATION
DEDICATION
ABSTRACT
ABSTRAK
CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS
CHAPTER I INTRODUCTION
1.1 Background
1.2 Problem Statement
1.3 Objectives
1.4 Scope
1.4.1 Scope Admin
1.4.2 Scope Student
1.5 Limitation of Work
1.6 Expected Outcome
1.7 Report Structure
CHAPTER II LITERATURE REVIEW
2.1 Introduction
2.2 Similar System
2.3 K-Means Clustering Algorithm
2.3.1 What is Clustering Technique
2.3.2 Introduction to K-Means Clustering Algorithm
2.3.3 K-Means Clustering Algorithm
CHAPTER III METHODOLOGY
3.1 Introduction
3.2 Iterative Model
3.2.1 Requirement Phase
3.3 Analysis and Design Phase
3.3.1 Framework Design
3.3.2 System Design
3.3.3 Data Model
3.3.4 Technique
3.3.5 Interface Design
REFERENCES
LIST OF TABLES
TABLE TITLE
3.1 List of Software
3.2 List of Hardware
3.3 Admin Data Model
3.4 Student Data Model
3.5 Subject Data Model
3.6 Subject Mark Data Model
LIST OF FIGURES
FIGURE TITLE
2.1 Nearest cluster assignment formula
2.2 Centroids update formula
3.1 Iterative Model
3.2 System Framework
3.3 Context Diagram
3.4 Data Flow Diagram Level-0 [Admin]
3.5 Data Flow Diagram Level-0 [Student]
3.6 Entity Relationship Diagram
3.7 Main Page
3.8 Dashboard Page
3.9 Application Page
LIST OF ABBREVIATIONS / TERMS / SYMBOLS
CD Context Diagram
DFD Data Flow Diagram
ERD Entity Relationship Diagram
FYP Final year project
CHAPTER I
INTRODUCTION
1.1 Background
Student Part Time Job as Tutor System using K-Means Algorithm is a web-based
application. The system is built around students' academic achievement in a
subject. It helps students who want to work as part-time tutors to teach
subjects that fit their skills. The problem is how to choose a tutor based on
academic achievement in a particular subject. For example, a student who wants
to tutor the Data Structure subject must have good results in the basic
programming and object-oriented programming subjects. The system measures how
well applicants performed in these subjects and clusters them. To realize the
system, the K-Means Clustering Algorithm is used. To apply for a tutor job,
students fill in their subject grades, and the system compares the grades with
the cluster centroids to determine whether they belong to the right tutor
group.
1.2 Problem Statement
How to correctly classify tutors among students according to their achievement
in certain subjects.
1.3 Objectives
There are three main objectives in developing this system:
1.3.1 To analyze the current problems in student part-time jobs as tutors.
1.3.2 To design the proposed Student Part Time Job as Tutor system
based on subject grades using the K-Means technique.
1.3.3 To develop the Student Part Time Job as Tutor system based on
subject grades using the K-Means technique.
1.4 Scope
There are two scopes in this system:
1.4.1 Scope Admin
1.4.1.1 Admin can log in to the system.
1.4.1.2 Admin can manage the profiles of student part-time
tutors.
1.4.1.3 Admin can create, update, and delete user profiles.
1.4.2 Scope Student
1.4.2.1 Student can register in the system.
1.4.2.2 Student can add, update and delete their details in the
system.
1.4.2.3 Student needs to fill in the profile form and the
educational form in the system.
1.4.2.4 Student can view the subjects recommended for them to
teach in the system.
1.5 Limitation of Work
1.5.1 The subject marks are entered manually by the students. It is up
to the management to validate the data.
1.5.2 The system can only cluster the results and give a
recommendation to the part-time tutor.
1.6 Expected Outcome
This system is expected to group part-time tutors based on similar course
achievement and assign them a subject to teach that suits their skills.
Finally, students will be given a list of recommended subjects that suit the
group they fall into.
1.7 Report Structure
This report has six (6) chapters. Chapter 1 consists of the project background,
problem statement, objectives and system scope. Chapter 2 is the literature
review, which examines previous systems. Chapter 3 describes the research
methodology; this research used the iterative model. Chapter 4 explains the
system’s framework and design. Chapter 5 covers implementation, testing and
results. Lastly, Chapter 6 concludes the whole project.
CHAPTER II
LITERATURE REVIEW
2.1 Introduction
This chapter reviews the literature on the technique used to develop the
Student Part Time Job as Tutor System, which is based on students’ subject
achievement and uses the K-Means Clustering Algorithm.
2.2 Similar System
The Student Part Time Job as Tutor system is a project built to help an
organization choose the best tutors among students. The system chooses a tutor
based on the subjects they are good at: tutors are selected according to their
achievement in a particular subject, which is calculated from their grades in
that subject. This matters because not every student is good at every subject
they take. Some have a high level of understanding and good achievement only in
particular subjects. These are the students the system looks for, so that they
can teach others who are not good at the subject. Currently, the normal
procedure for selecting a tutor, lecturer or teacher is based on CGPA and an
interview session. This method does not fully guarantee that the selected tutor
is good at the job scope given. Selection based on achievement in specific
subjects is lacking.
2.3 K-Means Clustering Algorithm
2.3.1 What is clustering technique
Clustering is a technique for finding similarity groups in data, called
clusters. It attempts to group individuals in a population together by
similarity, but is not driven by a specific purpose. Clustering is often called
unsupervised learning, as there are no prescribed labels in the data and no
class values denoting an a priori grouping of the data instances are given
(Manu Jeevan, 2017). K-Means clustering was proposed by J.B. MacQueen (Zhang
Yufang, 2003).
2.3.2 Introduction to K-Means Clustering Algorithm
K-Means is a method of clustering observations into a specified number of
disjoint clusters. The ‘K’ refers to the number of clusters specified. Various
distance measures exist to determine which observation is assigned to which
cluster. The algorithm aims to minimize the distance between each observation
and the centroid of its cluster by iteratively assigning observations to
clusters, and it terminates when the distance measure can no longer be
reduced.
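As an illustration of one such distance measure, the sketch below computes the Euclidean distance between a student's marks and a cluster centroid. The function name, the two-subject feature layout and the mark values are assumptions made for this example only; they are not taken from the system.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of features")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical marks (out of 100) in two prerequisite subjects.
student = [85.0, 78.0]
centroid = [82.0, 74.0]

print(euclidean_distance(student, centroid))  # sqrt(3^2 + 4^2) = 5.0
```

A smaller value means the student's marks lie closer to that centroid, so the student is more likely to be placed in that cluster.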
2.3.3 K-Means Clustering Algorithm
K-Means defines a prototype in terms of a centroid, which is usually the mean
of a group of points and is typically applied to objects in a continuous n-
dimensional space. The K-Means clustering technique is simple and we begin
with a description of the basic algorithm.
2.3.3.1 Initial Centroids Selection
We first choose K initial centroids. A centroid (k) is a cluster centre,
represented by the feature values of the group of nearby objects assigned to
it. It is also used as the reference point when assigning objects to a cluster
based on their distance to the nearest centroid. At the beginning of the
assignment process, a set of K initial centroids needs to be predetermined so
that the objects can be assigned accordingly. In basic K-Means, these initial
centroids are randomly selected from among the objects.
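The random selection described above can be sketched as follows. This is a minimal illustration, assuming each object is a list of numeric marks; the function name and the optional `seed` parameter are hypothetical and are included only to make the example repeatable.

```python
import random


def choose_initial_centroids(points, k, seed=None):
    """Pick k distinct objects at random to serve as the initial centroids."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    # Copy each chosen point so later centroid updates do not alter the data.
    return [list(p) for p in rng.sample(points, k)]


# Hypothetical marks for five students in two subjects.
marks = [[90, 85], [88, 92], [40, 35], [45, 50], [65, 70]]
initial = choose_initial_centroids(marks, k=2, seed=1)
print(initial)
```

Because the starting centroids are random, two runs without a fixed seed may begin from different points and can end in different clusters.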
2.3.3.2 Nearest Cluster Assignment
Each point is then assigned to the closest centroid, and each collection of
points assigned to a centroid forms a cluster. The clustering process begins by
measuring the distance from each object to each centroid (mk).
Figure 2.1 Nearest cluster assignment formula
where Sik is the set of objects in cluster k, k = 0 to K, and d is a feature.
Each object is assigned to the cluster whose centroid is closest to it. The
distance is measured using the Euclidean distance, the typical K-Means measure
for the nearest object.
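The formula shown in Figure 2.1 is not reproduced in this text. A common way to write the nearest-cluster assignment with the squared Euclidean distance is given below. The symbols are assumptions and may differ from those in the figure: x_i is object i, x_{id} is its value for feature d, m_{kd} is the value of centroid k for feature d, and S_k is the set of objects assigned to cluster k.

```latex
S_k = \left\{\, x_i \;:\; \sum_{d} \left( x_{id} - m_{kd} \right)^2
      \;\le\; \sum_{d} \left( x_{id} - m_{jd} \right)^2
      \ \text{ for every cluster } j \,\right\}
```

In words, object x_i joins cluster k when no other centroid is closer to it than m_k.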
2.3.3.3 Centroids Update
Then, the centroid of each cluster is updated based on the points assigned to
the cluster. We repeat the assignment and update steps until no point changes
cluster or, equivalently, until the centroids remain the same. In this final
step, once the objects have been re-assigned, the centroid of each cluster
needs to be re-calculated.
Figure 2.2 Centroids update formula
where M is the total number of objects in cluster k, k = 0 to K and d = 0 to D.
This step ensures that every object currently assigned to a cluster really
belongs to that cluster (i.e. it is nearest to its newly calculated centroid)
and is far from the other clusters. If an object turns out to be nearer to
another centroid, it needs to be reassigned to that nearest cluster. Thus, the
whole cycle from the nearest cluster assignment step (2.3.3.2) to the centroids
update step (2.3.3.3) is repeated until the centroids of all clusters no longer
change.
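The formula shown in Figure 2.2 is not reproduced in this text. The usual centroid update takes the mean of every feature over the objects in the cluster, as written below. The symbols are assumptions and may differ from those in the figure: S_k is the set of objects in cluster k, M is the number of objects in S_k, x_{id} is the value of object i for feature d, and m_{kd} is the updated value of centroid k for feature d.

```latex
m_{kd} = \frac{1}{M} \sum_{x_i \in S_k} x_{id}
```

For example, if a cluster holds two students with marks 80 and 90 in a subject, the updated centroid value for that subject is (80 + 90) / 2 = 85.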
2.3.3.4 Basic K-Means Algorithm
Select K points as initial centroids.
repeat
    Form K clusters by assigning each point to its closest centroid.
    Recompute the centroid of each cluster.
until the centroids do not change.
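The basic algorithm above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in the system: the function name, the `seed` and `max_iterations` parameters and the sample marks are hypothetical, and the loop stops when no point changes cluster, which is equivalent to the centroids no longer changing. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math
import random


def k_means(points, k, seed=0, max_iterations=100):
    """Basic K-Means: returns (centroids, labels) for a list of points."""
    rng = random.Random(seed)
    # Select K points as initial centroids.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Form K clusters by assigning each point to its closest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop when no point changed cluster (centroids would stay the same).
        if new_labels == labels:
            break
        labels = new_labels
        # Recompute the centroid of each cluster as the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


# Hypothetical marks in two subjects: three strong and three weak students.
marks = [[90, 85], [88, 92], [95, 89], [40, 35], [45, 50], [38, 42]]
centroids, labels = k_means(marks, k=2)
print(centroids)
print(labels)
```

The cluster numbers themselves carry no meaning; only the grouping does. In this sample the three strong students end up in one cluster and the three weak students in the other.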
CHAPTER III
METHODOLOGY
3.1 Introduction
This chapter discusses the methodology used to develop the system from the
beginning until its completion. The methodology is very important in developing
the system because it describes, step by step, how the system is developed, and
it serves as a reference for those who will later extend or study the system.
In addition, a methodology is a formalized approach to implementing the
Software Development Life Cycle (SDLC). Various models have been defined and
designed for the software development process. The SDLC model chosen to develop
this system is the Iterative Model. Every phase involved in the development of
this system is explained in this chapter.
3.2 Iterative Model
Figure 3.1 Iterative Model
In this model, the process starts from the requirements and iteratively
enhances them until the final software is implemented. Development begins by
specifying and implementing just part of the software, which can then be
reviewed in order to identify further requirements. This process is then
repeated, producing a new version of the software in each cycle of the model.
The model works in four phases: the requirement phase, design phase,
implementation phase and evaluation phase. This model was chosen because it
allows better testing at each iteration. In addition, the model does not
involve high complexity and feedback is generated quickly. However, it requires
planning at the technical level and is not easy to understand.
3.2.1 Requirement Phase
In this phase, the requirements for the software are gathered and analyzed.
Iteration should eventually result in a requirements phase that produces a
complete and final specification of requirements.
3.2.1.1 Software Requirement
The software used to develop the Student Part Time Job as Tutor System.
Table 3.1 List of Software
3.2.1.2 Hardware Requirement
The hardware used to develop the Student Part Time Job as Tutor System.
Hardware    Description
Laptop      HP 15-r236TX
            Processor: Intel® Core™ i3-4005U CPU @ 1.7 GHz
            RAM: 8.00 GB
            OS: Windows 10
            GPU: NVIDIA GeForce 820M
Table 3.2 List of Hardware
3.3 Analysis and Design Phase
In this phase, the software solution that meets the requirements is designed.
The system framework diagram, Context Diagram (CD), Data Flow Diagram (DFD)
and Entity Relationship Diagram (ERD) are built to clarify the actual system.
3.3.1 Framework Design
Figure 3.2 System Framework
The figure above shows an overview of the system. Both admin and student
register and log in to the system. The admin updates the tutor subjects
available in the system, and students can view and apply for as many subjects
as they want. When applying for a subject, students enter their marks for the
required subjects, and the system processes the marks using the K-Means
technique. Once the calculation is done, the result is given to the admin for
evaluation, and the admin updates the student on whether the application is
successful.
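The calculation step in this flow can be pictured as comparing an applicant's marks with the cluster centroids and reporting the nearest cluster to the admin. The sketch below is only an illustration under assumptions: the centroid values, the cluster names and the function name are hypothetical and do not come from the system. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def nearest_cluster(marks, centroids):
    """Return the index of the centroid closest to the applicant's marks."""
    return min(range(len(centroids)), key=lambda c: math.dist(marks, centroids[c]))


# Hypothetical centroids for [basic programming, object-oriented programming].
centroids = [[45.0, 50.0], [68.0, 70.0], [88.0, 91.0]]
cluster_names = ["not recommended", "borderline", "recommended"]

# Hypothetical marks entered by a student applying to tutor Data Structure.
applicant = [84.0, 87.0]
result = cluster_names[nearest_cluster(applicant, centroids)]
print(result)  # recommended
```

In the flow described above, the admin would still review this result before the student is told whether the application succeeded.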
3.3.2 System Design
3.3.2.1 Context Diagram
A system context diagram (CD) is a diagram that defines the boundary
between the system, or part of a system, and its environment, showing the
entities that interact with it. This diagram is a high-level view of a system.
Figure 3.3 Context Diagram
The figure above shows the overall flow of the whole system, which includes 2
entities: Student and Admin.
3.3.2.2 Data Flow Diagram
A data flow diagram (DFD) is a graphical representation of the “flow” of data
through an information system, modeling its process aspects. A DFD is often
used as a preliminary step to create an overview of the system without going
into great detail, which can later be elaborated.
3.3.2.2.1 Data Flow Diagram Level – 0
Figure 3.4 Data Flow Diagram Level-0 [Admin]
The figure above shows the DFD Level-0 for Admin, which includes 6 processes.
Figure 3.5 Data Flow Diagram Level-0 [Student]
The figure above shows the DFD Level-0 for Student, which includes 6
processes.
3.3.2.3 Entity Relationship Diagram
Entity relationship diagram (ERD) is a graphical representation of entities and
their relationships to each other, typically used in computing in regard to the
organization of data within databases or information systems.
Figure 3.6 Entity Relationship Diagram
The figure above shows the ERD of the system, which includes 5 entities and 6
relationships.
3.3.3 Data Model
A data model is an abstract model that organizes elements of data and
standardizes how they relate to one another and to the properties of
real-world entities.
3.3.3.1 Admin
Table 3.3 Admin Data Model
Table above shows the details of admin data.
3.3.3.2 Student
Table 3.4 Student Data Model
Table above shows the details of student data.
3.3.3.3 Subject
Table 3.5 Subject Data Model
Table above shows the details of subject data.
3.3.3.4 Subject Mark
Table 3.6 Subject Mark Data Model
Table above shows the details of subject mark data.
3.3.4 Technique
3.3.4.1 K-Means Clustering
K-Means Clustering is one of the simplest unsupervised learning techniques for
solving the clustering problem. The procedure classifies a given data set into
a certain number of clusters (assume k clusters) fixed a priori, using the
following steps:
1. Define k centroids, one for each cluster. These centroids should be placed
   carefully, because different starting locations lead to different results.
   It is therefore better to place them as far away from each other as
   possible.
2. Take each point belonging to the data set and associate it with the nearest
   centroid. When no point is pending, the first pass is done.
3. Recalculate k new centroids as the centers of the clusters resulting from
   the previous step.
4. With these k new centroids, make a new binding between the same data points
   and the nearest new centroid.
5. Repeat steps 3 and 4. In this loop the k centroids change their location
   step by step until no more changes occur. In the simplest words, the
   centroids do not move any more.
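One way to place the starting centroids far away from each other is a farthest-first selection, sketched below. This is a swapped-in illustration, not necessarily the method used in the system: the function name, the choice of the first data point as the first centroid and the sample marks are all assumptions. It uses `math.dist`, which requires Python 3.8 or later.

```python
import math


def farthest_first_centroids(points, k):
    """Pick k starting centroids that are spread far apart from each other."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: start from the first point in the data set.
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Choose the point whose nearest chosen centroid is farthest away.
        next_point = max(
            points,
            key=lambda p: min(math.dist(p, c) for c in centroids),
        )
        centroids.append(list(next_point))
    return centroids


# Hypothetical marks for five students in two subjects.
marks = [[90, 85], [88, 92], [40, 35], [45, 50], [65, 70]]
print(farthest_first_centroids(marks, k=3))
# [[90, 85], [40, 35], [65, 70]]
```

The remaining steps (assigning points, recalculating centroids and repeating) then proceed from these starting centroids exactly as described above.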
3.3.5 Interface Design
This phase is when the software is coded, integrated and tested for prototyping
purposes.
3.3.5.1 Student Interface Prototype
Figure 3.7 Main Page
The figure above shows the Main page of the system, where the student needs to
log in or register.
Figure 3.8 Dashboard Page
The figure above shows the student Dashboard page, where students can view the
subjects available to teach and apply for them.
Figure 3.9 Application Page
The figure above shows the Application page, where students need to enter the
marks for the required subjects themselves.
REFERENCES
Ju, C., & Xu, C. (2013). A New Collaborative Recommendation Approach Based on
Users Clustering Using Artificial Bee Colony Algorithm, 2013.
Kodinariya, T. M., & Makwana, P. R. (2013). Review on determining number of
Cluster in K-Means Clustering. International Journal of Advance Research in
Computer Science and Management Studies, 1(6), 2321–7782.
Li, C. S. (2011). Cluster center initialization method for K-means algorithm
over data sets with two clusters. Procedia Engineering, 24, 324–328.
https://doi.org/10.1016/j.proeng.2011.11.2650
Li, Y., & Wu, H. (2012). A Clustering Method Based on K-Means Algorithm.
Physics Procedia, 25, 1104–1109. https://doi.org/10.1016/j.phpro.2012.03.206
Yadav, S., Bharadwaj, B., & Pal, S. (2012). Data mining applications: A
comparative study for predicting student’s performance. International Journal
of Innovative Technology & Creative Engineering, 1(12), 13–19. Retrieved from
http://arxiv.org/abs/1202.4815