CITEE 2012 Yogyakarta, 12 July 2012 ISSN: 2088-6578
Data Warehouse for Study Program Evaluation Reporting
Based on Self Evaluation (EPSBED) using EPSBED
Data Warehouse Model:
Case Study Budi Luhur University
Indra, Yudho Giri Sucahyo, Windarto Faculty of Information Technology of Universitas Budi Luhur, Faculty of Computer Science of Universitas
Indonesia, Faculty of Information Technology of Universitas Budi Luhur
Jl.Ciledug Raya, Post Code 12260, Jakarta, Indonesia
[email protected], [email protected], [email protected]
Abstract- Each study program at a university in
Indonesia is required to report the results of its academic
activities each semester to the Directorate General of
Higher Education, Ministry of National Education
of the Republic of Indonesia (DIKTI) through the
Coordinator of Private Higher Education (KOPERTIS).
The reports are used to measure the performance
of the study programs of each university in Indonesia.
This reporting process is known as Study Program
Evaluation Based on Self Evaluation (EPSBED).
Until now, the data produced by EPSBED processing
have not been fully used by the executives of Budi Luhur
University as a reference for
academic decisions. For this reason, analysis of the
EPSBED data warehouse may become an important
component to be considered in decision-making by
the executives of Budi Luhur University. Moreover,
with the EPSBED data warehouse, report generation
becomes faster, taking a matter of minutes,
because the process is automated and scheduled.
The methodology of this research
consists of several stages. The first stage is analyzing the
information needs of the executives of
Budi Luhur University. The second stage is collecting
data to meet those information needs. The third stage is
designing the data warehouse using the star
schema technique, with the community (open source)
edition of Pentaho as the tool. The last stage is
implementing Online Analytical Processing (OLAP) on
top of the data warehouse.
Keywords: EPSBED, data warehouse, star
schema, study program, college
I. INTRODUCTION
A. Background
Since 2002, every college in Indonesia has carried
out its obligation to report study program
performance using a particular database structure.
The database structure has been formalized and
packaged in a reporting system called Study Program
Evaluation Based on Self Evaluation, known as
EPSBED to all universities in Indonesia. This is in
accordance with what was stipulated in Director
General of Higher Education Decree
No.34/DIKTI/Kep/2001.
For the EPSBED reporting process, Budi
Luhur University (UBL) must run many queries to
retrieve the required data from its database. This
is because the UBL database has a different structure from
the EPSBED database. Moreover, a data
cleansing process is needed because many
records are incomplete.
The results of the EPSBED reporting process have
been accumulating for many years in
the UBL database. To date, there has been no data
warehouse to show the historical results of
EPSBED. The EPSBED data have not been fully used
as material for consideration by the executives in making
decisions.
B. Problem Formulation
From those core issues, the
fundamental question is: “How to design a data
warehouse to facilitate and accelerate the
reporting process, and how to implement the
EPSBED data warehouse so that it can be taken into
consideration in decision-making by
the executives of Budi Luhur University?”
C. Research Objectives
The purposes of this research are to:
1) Describe the design and development of a data
warehouse to facilitate and accelerate the
process of EPSBED reporting.
2) Build a cube (fact table) and OLAP (Online
Analytical Processing) views to examine the details of
EPSBED reports using roll up and drill down
features.
II. LITERATURE REVIEW
A. EPSBED
1) EPSBED Definition
EPSBED is a reporting medium through which
the study programs of each college report
to the Directorate General of Higher Education,
Ministry of National Education of the Republic of
Indonesia (DIKTI). Under the applicable provisions and
legislation, each study program must report
its ongoing academic
activities each semester. Since the academic year
2002-2003, study program
activities have been reported as electronic data, and the
reporting covers institutional aspects,
curriculum, lecturers, students, and the
infrastructure accessed by the study
program (Ilah, http://evaluasi.dikti.go.id/).
2) The Legal Basis of EPSBED
EPSBED is based on the Decree of the Director General of
Higher Education Number 08/DIKTI/Kep/2002
on Technical Guidelines for the National
Education Decree Number 184/U/2001 on
Guidelines for the Monitoring, Control and
Development of Diploma, Bachelor and
Master Programs in Higher Education (including
the provision of certificates and transcripts).
These decrees are among the legal bases for the
implementation of EPSBED at every
university in Indonesia.
3) EPSBED Workflow (Figure 2.1)
The EPSBED workflow is a sequence of data
migration processes from each college's internal
database to the DIKTI EPSBED database. In
accordance with the reference of the Higher
Education Development Data Base (PDPT), the
EPSBED workflow is shown in Figure 2.1.
Figure 2.1
EPSBED Workflow [1]
B. Data Warehouse
1) Data Warehouse Definition
A data warehouse is a collection of data used
for management decision making that is subject
oriented, integrated, time variant and
non-volatile (Inmon, 2005). Turban, Sharda
and Delen (2011) explain that a data
warehouse also serves as a central repository of
past and current data of potential interest to
the managers of an organization.
2) Data Warehouse Modeling Techniques
(Figure 2.2)
This research uses a multi
dimensional model, in which each data warehouse
has two kinds of tables:
fact tables and dimension tables. Fact tables
generally hold foreign keys and measurements.
A measurement is a field with a numeric value
that is used as a measure, while each
foreign key in the fact table refers to the
primary key of the corresponding dimension
table. The data warehouse is modeled
using the star join approach, which
resembles a star, with the fact table in
the center and the dimension tables surrounding it.
This approach can be seen in Figure 2.2.
Figure 2.2
Star Join Approach Multidimensional Data Model [3]
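The star join idea can be illustrated with a minimal sketch. It assumes
Python's built-in sqlite3 module as a stand-in database; the table and
column names (dim_program, dim_year, fact_activity, gpa) are hypothetical
and are not the tables used in this research.

```python
import sqlite3

def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables."""
    conn.executescript("""
        CREATE TABLE dim_program (program_id INTEGER PRIMARY KEY, program_name TEXT);
        CREATE TABLE dim_year (year_id INTEGER PRIMARY KEY, academic_year TEXT);
        CREATE TABLE fact_activity (
            program_id INTEGER REFERENCES dim_program(program_id),  -- foreign key
            year_id INTEGER REFERENCES dim_year(year_id),           -- foreign key
            gpa REAL                                                -- measurement
        );
    """)

def average_gpa_by_program(conn):
    """Join the fact table to a dimension and aggregate the measurement."""
    rows = conn.execute("""
        SELECT p.program_name, AVG(f.gpa)
        FROM fact_activity f
        JOIN dim_program p ON p.program_id = f.program_id
        GROUP BY p.program_name
    """).fetchall()
    return dict(rows)

# Hypothetical sample data, for illustration only.
conn = sqlite3.connect(":memory:")
build_star_schema(conn)
conn.executemany("INSERT INTO dim_program VALUES (?, ?)",
                 [(1, "Information Systems"), (2, "Information Management")])
conn.execute("INSERT INTO dim_year VALUES (1, '2010/2011')")
conn.executemany("INSERT INTO fact_activity VALUES (?, ?, ?)",
                 [(1, 1, 3.0), (1, 1, 3.5), (2, 1, 2.5)])
result = average_gpa_by_program(conn)
```

Every analytical query follows the same shape: start from the fact table,
join outward to the dimensions that label the result, and aggregate the
measurement.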
C. Data Warehouse Architecture
The data warehouse architecture is shown
in Figure 2.3:
Figure 2.3 Data Warehouse Architecture [2]
Figure 2.3 shows that the data warehouse is divided
into four parts:
1) Source Data System
Data come from the various transactions
and outputs of the operational applications that
the company runs every day. Transactional
data are still regular, raw data.
2) Data Staging Area
Before entering the data warehouse, the data are first
extracted and placed in the staging area. At this
stage the data are cleansed, reconciled, matched,
and standardized so that they are free from
defects; this process is commonly known as
transform.
3) Data & Metadata Storage
Once the data are cleansed, they are inserted
(loaded) into the data warehouse. Data in the data
warehouse can be used as material for
determining policy (decision support) by the
executives on a variety of issues.
4) End User Presentation Tools
This final stage builds on the
existing data warehouse. One example is using
the data warehouse for business intelligence.
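The work done in the staging area can be sketched as follows. This is a
minimal illustration in plain Python, not the procedure used in this
research; the field names student_id and program_code and the sample
records are assumptions.

```python
def standardize(record):
    """Trim whitespace and upper-case every string field of a raw record."""
    return {key: value.strip().upper() if isinstance(value, str) else value
            for key, value in record.items()}

def stage(raw_records, key_fields=("student_id", "program_code")):
    """Return standardized records, dropping incomplete and duplicate ones."""
    seen = set()
    clean = []
    for record in map(standardize, raw_records):
        if any(not record.get(field) for field in key_fields):
            continue  # cleanse: a key field is missing or empty
        key = tuple(record[field] for field in key_fields)
        if key in seen:
            continue  # match: the same entity was already staged
        seen.add(key)
        clean.append(record)
    return clean

# Hypothetical raw records from a source system.
raw = [
    {"student_id": " 001 ", "program_code": "si"},
    {"student_id": "001", "program_code": "SI"},  # duplicate once standardized
    {"student_id": "002", "program_code": ""},    # incomplete
    {"student_id": "003", "program_code": "mi"},
]
staged = stage(raw)
```

Standardizing before matching matters: the first two records only compare
equal after whitespace and letter case have been normalized.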
D. Data Warehouse Tools
Pentaho Schema Workbench is used to design the data
warehouse. Online Analytical Processing (OLAP) is
implemented using JPivot, which is already
integrated with the Pentaho BI Server. Both of
these tools can be downloaded from
http://www.sourceforge.net.
III. RESEARCH METHODOLOGY
1) Information Requirement Analysis
At this stage, the research conducts
an in-depth analysis of the information
required by the executives. These information needs
become the basis for data collection at a later stage.
2) Data Collection Techniques
At this stage, data are collected through
observation, literature study and
interviews with relevant parties. Interviews were
conducted with the Head of the Information and
Technology bureau and the chairs of the Information
Techniques, Information Systems and
Information Management (Diploma 3) study programs. The
result of this stage is the transactional (OLTP) data that
will be retrieved for use in designing the data warehouse.
3) Designing The Data Warehouse
At this stage the data are extracted from the
transactional database and then cleansed
to eliminate empty or redundant data.
After cleansing, the data are transformed
to fit the tables defined in the relational data
source. After transformation, the data are
loaded into the
data warehouse. The data warehouse is
designed using a star schema model. The design
yields star schema fact
tables that are expected to support EPSBED
reporting.
4) Results of Data Warehouse Processing Analysis
At this stage, the results of data warehouse
processing are developed for use by the executives
as analysis material in making decisions. The results
are presented in the form of OLAP
(Online Analytical Processing), which offers more detailed
and dynamic views through roll up and drill down features.
The four stages above are summarized in the
flow diagram of Figure 3.1:
Figure 3.1 Research Methodology Stages
IV. DATA WAREHOUSE ARCHITECTURE
1. Logical Data Warehouse Architecture (Figure 4.1)
Figure 4.1 shows the logical
architecture of the data warehouse for EPSBED UBL
reporting needs.
Figure 4.1 Logical Architecture of EPSBED UBL Data
Warehouse
The data source holds the data from
all academic transaction processes at UBL. The
data source uses licensed Oracle 9i software. In
the first stage, the tables needed to design the data
warehouse are selected
in accordance with the planned dimension
tables and fact tables; this is known as the selection
process. The selected tables are then extracted, and
the data from each table that
need to be inserted into the data warehouse are mapped;
this process is known as extraction.
2. Physical Architecture of EPSBED UBL Data
Warehouse
Figure 4.2 Physical Architecture of EPSBED UBL Data Warehouse
The UBL operational database uses Oracle 9i with
SID: SYSTEM, while the database used for the
data warehouse uses Oracle 9i with SID: SIDIKTI.
The ETL process uses Pentaho Data
Integration (Kettle) as its tool, and the cube is built with
Pentaho Schema Workbench and Pentaho Analysis
Services OLAP.
V. DESIGNING THE DATA WAREHOUSE
The EPSBED data warehouse
contains multiple fact tables, including the fact table
of students' academic activities.
1) Design of Students' Academic Activities Fact
Table (FACT_TRAKM)
Figure 5.1 Design of Students' Academic Activities Fact Table
(FACT_TRAKM)
FACT_TRAKM is the fact table
used to generate reports on the distribution of GPA, the
distribution of IPS (semester grade point average) and the
number of credits students have earned during their
studies in each subject, as well as
student status reports. The FACT_TRAKM fact table
contains measurements and foreign keys.
A measurement is a numeric field used as a
measure in the fact table. A foreign key refers to the
primary key of the corresponding dimension.
Figure 5.1 shows the measurements and
foreign keys of the FACT_TRAKM
fact table.
The dimensions related to the academic
activities fact table (FACT_TRAKM) are the student
dimension (DIM_MSMHS), college dimension
(DIM_MSPTI), GPA condition dimension
(DIM_KONDISI_IPK), IPS condition dimension
(DIM_KONDISI_IPS), academic year dimension
(DIM_TAHUNAJARAN), study level dimension
(DIM_JENJANG), study program dimension
(DIM_MSPST) and student status dimension
(DIM_STATUS_MHS).
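How a condition dimension such as DIM_KONDISI_IPK could drive a GPA
distribution report is sketched below. The GPA bands and the sample rows
are assumptions made for illustration; the actual categories stored in
the dimension are not stated in this paper.

```python
from collections import Counter

# Hypothetical GPA condition bands as (inclusive lower bound, label),
# ordered from highest to lowest.
GPA_BANDS = [
    (3.5, "3.50 and above"),
    (3.0, "3.00 to below 3.50"),
    (2.0, "2.00 to below 3.00"),
    (0.0, "below 2.00"),
]

def gpa_condition(gpa):
    """Map a GPA value to the label of the band it falls into."""
    for lower_bound, label in GPA_BANDS:
        if gpa >= lower_bound:
            return label
    raise ValueError(f"invalid GPA: {gpa}")

def gpa_distribution(fact_rows):
    """Count students per GPA condition, as a distribution report would."""
    return Counter(gpa_condition(row["gpa"]) for row in fact_rows)

# Hypothetical fact rows: one row per student per semester.
fact_rows = [
    {"student": "A", "gpa": 3.7},
    {"student": "B", "gpa": 3.2},
    {"student": "C", "gpa": 3.1},
    {"student": "D", "gpa": 1.8},
]
distribution = gpa_distribution(fact_rows)
```

In a star schema the band is typically resolved once during loading and
stored as a foreign key, so that the report becomes a simple count
grouped by the condition dimension.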
VI. RESEARCH DISCUSSION
A. Staging Process of Extract, Transform, and
Loading
After designing the fact tables and dimension tables,
the next stage is to perform extraction, transformation
and loading (ETL) to obtain valid data to be stored
in the data warehouse. The ETL processes can be
described as follows:
1) Extraction Process
The extraction process takes data
from the data source, as fields or tables of the
transactional database, that are required in the
EPSBED data warehouse. This process uses two
methods: a manual method and the
Kettle method. The manual method is used when
fewer than 20 records are taken, and
relies only on manually written queries to
retrieve the data.
The Kettle method extracts from the data source by
selecting fields or tables with the Kettle tool. The results
of the extraction process can be seen in Figure 6.1.
Figure 6.1 Extract Scheme using Kettle
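The manual extraction method amounts to a query that selects only the
required fields of a source table. A minimal sketch follows, assuming
Python's sqlite3 module in place of the Oracle source; the table
students and its columns are hypothetical.

```python
import sqlite3

def extract(conn, table, fields):
    """Pull only the listed fields of one table from the source database."""
    # Identifiers cannot be bound as query parameters, so they are
    # restricted to simple names before being placed in the SQL text.
    for name in (table, *fields):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    cursor = conn.execute(f"SELECT {', '.join(fields)} FROM {table}")
    return [dict(zip(fields, row)) for row in cursor]

# Hypothetical source table with one column the warehouse does not need.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE students (nim TEXT, name TEXT, address TEXT)")
source.execute("INSERT INTO students VALUES ('0911500001', 'Ani', 'Jakarta')")
records = extract(source, "students", ["nim", "name"])
```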
2) Transformation Process
The transformation process adjusts
the field names from the data source to the attributes or
fields of the dimension and fact tables in accordance with
the requirements of the EPSBED data warehouse. The
adjustment is needed because the database
structure of the data source differs from the data warehouse
structure. The results of the transformation process can be
seen in Figure 6.2.
Figure 6.2 Transform Process Scheme
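The field-name adjustment can be sketched as a mapping applied to each
extracted record. The target names below are the c_trakm attributes
named in this paper; the source field names on the left are assumptions,
since the structure of the UBL source tables is not listed here.

```python
# Source field name (hypothetical) -> warehouse field name (from c_trakm).
FIELD_MAP = {
    "nim": "nimhstrakm",
    "kode_pt": "kdptitrakm",
    "kode_jenjang": "kdjentrakm",
    "kode_prodi": "kdpsttrakm",
}

def transform(record, field_map=FIELD_MAP):
    """Rename source fields to warehouse fields, dropping unmapped ones."""
    missing = [source for source in field_map if source not in record]
    if missing:
        raise KeyError(f"source record lacks fields: {missing}")
    return {target: record[source] for source, target in field_map.items()}

# Hypothetical source record; "alamat" is not needed in the warehouse.
row = transform({"nim": "001", "kode_pt": "031036", "kode_jenjang": "E",
                 "kode_prodi": "57401", "alamat": "Jakarta"})
```

Failing loudly on a missing source field keeps incomplete records from
reaching the warehouse unnoticed.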
3) Loading Process
The loading process is the final process of the data
warehouse development stages: after passing through
the extraction, transformation and
cleansing phases, the data are inserted into the data
warehouse. This loading process uses the Pentaho Data
Integration (Kettle) tool. The complete scheme of the ETL
process is described below (see Figure 6.3).
Figure 6.3 ETL Scheme using Kettle
After the ETL process, the data
inserted into the data warehouse are subject-oriented,
time-variant and integrated. The results
of this data warehouse process can be used as
consideration material by the executives in making
decisions.
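The loading step can be sketched as a single transactional insert into
the fact table. This sketch assumes Python's sqlite3 module rather than
Kettle and Oracle, and the column ipk is a hypothetical measurement
column.

```python
import sqlite3

def load(conn, rows):
    """Insert transformed rows into the fact table in one transaction."""
    with conn:  # commits on success, rolls back if any insert fails
        conn.executemany(
            "INSERT INTO fact_trakm (nimhstrakm, kdpsttrakm, ipk) "
            "VALUES (:nimhstrakm, :kdpsttrakm, :ipk)",
            rows,
        )
    return conn.execute("SELECT COUNT(*) FROM fact_trakm").fetchone()[0]

# Hypothetical warehouse table and transformed rows.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_trakm (nimhstrakm TEXT, kdpsttrakm TEXT, ipk REAL)")
loaded = load(warehouse, [
    {"nimhstrakm": "001", "kdpsttrakm": "57401", "ipk": 2.9},
    {"nimhstrakm": "002", "kdpsttrakm": "57401", "ipk": 3.4},
])
```

Wrapping the batch in one transaction means a failed load leaves the
warehouse unchanged instead of partially filled.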
B. TRAKM Cube Schema
After the ETL stages are completed, the dimension
tables and the fact tables contain the valid
data required for designing the OLAP layer of the data
warehouse. Each dimension is linked to a fact
table to form a star schema that is used in the
data warehouse implementation. A tool named
Pentaho Schema Workbench is used to create the star
schema. In Pentaho Schema Workbench, the schema contains
a single fact table (cube) with its relevant
dimension tables. The fact table contains the
attributes and measurements shown in
Figure 6.4:
Figure 6.4 Cube Scheme in Workbench
As Figure 6.4 shows, the cube (fact
table) named c_trakm contains the attributes
nimhstrakm, kdptitrakm, kdjentrakm and kdpsttrakm, and
its measurements are the averages of GPA and IPS.
The dimension tables associated with the c_trakm
cube are dim_mahasiswa (students), dim Perguruan Tinggi
Indonesia (dim_PTI, colleges), the study level dimension
and the study program (prodi) dimension. After the cube is
completed, the next step is to publish the schema to the
Pentaho BI server to generate the necessary
OLAP views for EPSBED data warehouse analyses.
C. Application Results of The EPSBED Data
Warehouse Model
This part explains the implementation results
of the EPSBED data warehouse model. The information
is used to visualize the EPSBED reporting
process and is displayed in Online
Analytical Processing (OLAP) form. The results contain
information on the Cumulative Grade Point
Average, the Semester Grade Point Average, the distribution
of the Cumulative Grade Point Average, and the distribution
of the Semester Grade Point Average. Figure 6.5
shows the OLAP visualization of the Cumulative
Grade Point Average and Semester Grade Point
Average.
Figure 6.5 Cumulative Grade Point Average and Semester
Grade Point Average
Figure 6.5 shows a drill down on the
Cumulative Grade Point Average and Semester
Grade Point Average for the odd semester of academic
year 2010/2011 for the Information Management
study program (Diploma 3) at UBL. The
drill down shows that the Cumulative Grade
Point Average is 2.935 and that the Semester
Grade Point Average is 2.979. Drill down makes
it possible to examine the Cumulative Grade Point
Average and the Semester Grade Point Average of each
study program in a more
dynamic and detailed way.
Besides drilling down, Mondrian
can also roll up. Figure 6.6 shows the roll up of the
Cumulative Grade Point Average and the Semester Grade
Point Average.
Figure 6.6 Roll Up of the Cumulative Grade Point Average and
the Semester Grade Point Average
Figure 6.6 shows the result of
the roll-up feature applied to the Cumulative
Grade Point Average and the Semester Grade
Point Average of the Information Management study program
in the academic year 2010/2011. The
Cumulative Grade Point Average in the academic
year 2010/2011 is 2.91 and the
Semester Grade Point Average is 2.792. These
values are an accumulation of all
students' grade points in the Information Management
study program in the academic year 2010/2011.
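The relation between drill down and roll up can be sketched as the same
aggregation run at two levels of detail. The rows below are made-up
values, not UBL data, and the function average_by is a hypothetical
helper rather than part of Mondrian.

```python
from collections import defaultdict

def average_by(rows, *levels):
    """Average the 'gpa' measurement grouped by the given dimension levels."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum, count]
    for row in rows:
        key = tuple(row[level] for level in levels)
        totals[key][0] += row["gpa"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Hypothetical fact rows: one row per student per semester.
rows = [
    {"program": "Information Management", "year": "2010/2011",
     "semester": "odd", "gpa": 3.0},
    {"program": "Information Management", "year": "2010/2011",
     "semester": "odd", "gpa": 2.0},
    {"program": "Information Management", "year": "2010/2011",
     "semester": "even", "gpa": 4.0},
]

# Drill down: include the semester level. Roll up: leave it out.
drill_down = average_by(rows, "program", "year", "semester")
roll_up = average_by(rows, "program", "year")
```

In this sketch the rolled-up average is computed over all underlying
rows (3.0) and is not the mean of the two semester averages (3.25),
which is why an average measure has to be aggregated from the fact rows
rather than from already averaged values.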
D. The EPSBED Data Warehouse Facilitates and
Accelerates The Reporting Process of EPSBED
Once the EPSBED data warehouse has been
built, it contains the historical data required
for EPSBED reporting. The
EPSBED report for each semester (reporting the academic
data of each study program) is simply drawn from the
existing EPSBED data warehouse. Formerly, the EPSBED
reporting process used manual queries to
retrieve data from tables scattered across the academic
transactional database, which usually took a long time,
about five days.
With this EPSBED data warehouse, the
EPSBED reporting process becomes faster and can be
completed in a matter of minutes, because the required
data have been prepared in dimension tables and fact
tables. Likewise, the data of previous semesters are
documented in this EPSBED data warehouse as
well.
VII. CONCLUSION
Based on the research that has been done,
the following can be concluded:
1) The data warehouse implementation at UBL helps
to complete EPSBED
reporting quickly. Before the data warehouse
was implemented, data collection
(extracting, transforming and loading) was
done with queries written for EPSBED reporting needs.
It usually took a month to complete all the
reports using queries. With the data
warehouse, all EPSBED data
to be reported to DIKTI have passed through the
extract, transform and load stages using
Kettle. Thus, the data in the EPSBED application are
presented more quickly and are valid, and
completing the EPSBED report takes only
two hours.
2) The EPSBED reporting process runs
automatically and can be scheduled using the job
components of Kettle, which simplifies and speeds
up the work of the EPSBED reporting team.
3) The results of EPSBED data warehouse
processing can be used as material for
consideration by the executives in determining
policies. The information is presented in the form
of reports on the distribution of the Cumulative Grade
Point Average, the distribution of the Semester
Grade Point Average, student
status and graduation rates, the
number of active tenured lecturers, and the
number of tenured lecturers by
most recent education in each study program.
VIII. REFERENCES
[1] DIKTI. (2010). Pengembangan Pangkalan Data
Pendidikan Tinggi. May 12, 2011. Direktorat
Jenderal Pendidikan Tinggi, Kementerian
Pendidikan Nasional Republik Indonesia.
http://bapsi.ub.ac.id/documents/Paparan_PDPT_Dikti_Hery.ppt
[2] Turban, E., et al. (2007). Decision Support
and Business Intelligence Systems. Pearson.
[3] Inmon, W.H. (2005). Building the Data
Warehouse. New York: John Wiley and Sons,
Inc.
[4] Ilah. (2010). Evaluasi Program Studi
Berdasarkan Evaluasi Diri (EPSBED). May 10,
2011. Direktorat Jenderal Pendidikan Tinggi,
Kementerian Pendidikan Nasional Republik
Indonesia. http://evaluasi.dikti.go.id