Components for the health information system DHIS 2 - Muni


MASARYK UNIVERSITY

FACULTY OF INFORMATICS

Components for the health information system DHIS 2

BACHELOR THESIS

Tomáš Krajča

Brno, spring 2010


Declaration

Hereby I declare, that this paper is my original authorial work, which I have worked out by my own. All sources, references and literature used or excerpted during elaboration of this work are properly cited and listed in complete reference to the due source.

Advisor: Ing. RNDr. Barbora Bühnová, Ph.D. (MU Brno) and Universitetslektor Lars Helge Øverland (UiO Oslo, DHIS 2 core developer)

Acknowledgement

I would like to thank my supervisors Ing. RNDr. Barbora Bühnová, Ph.D. and Universitetslektor Lars Helge Øverland for their guidance, support and many valuable pieces of advice.

Many thanks also belong to Magnus Funder Halldal, Diyar Amin and Magnus Hørven, my colleagues from the Open Source Software development course who cooperated with me on the excelimport project.

I am grateful to Jenna Linehan for her never-ending patience when helping me with my English.

Last but not least, special gratitude goes to my family, Pavel Krajca, Martin Krajca and Vera Krajcova, for their help, encouragement and love throughout this endeavor and my stay in Norway.

Finally, thanks to everyone else who made my life easier in the course of writing this thesis.

Abstract

This research was conducted in order to develop two components for District Health Information Software 2. The first component implements functionality for data import from Excel sheets. The purpose of the second component is to allow for data visualisation via pivot tables. Emphasis was placed on a study of web-based pivot table solutions and their potential application and integration as a component into the District Health Information Software 2 system. Furthermore, the reader is introduced to the problems of health information systems from an IT perspective. Last but not least, the history, architecture and development process of District Health Information Software 2 are discussed.

Keywords

DHIS 2, District Health Information Software 2, Health information systems programme, Excel sheet data import, Medical information systems, Web-based pivot tables, Open-source software, Online analytical processing, Java frameworks

Contents

1 Introduction
2 Information systems
    2.1 Information and Information systems
    2.2 Specifics of health information systems
    2.3 District Health Information Software
        2.3.1 Health information systems programme
        2.3.2 Objectives of DHIS
3 DHIS 2 system overview
    3.1 Technologies
        3.1.1 Frameworks
        3.1.2 Tools
    3.2 Architecture
        3.2.1 Domain model
        3.2.2 Service layer
    3.3 Development
4 Excelsheet data import
    4.1 Motivation
    4.2 Requirements
    4.3 Analysis
        4.3.1 Importers
        4.3.2 Converters
        4.3.3 jExcel
        4.3.4 Apache POI
        4.3.5 jxls
    4.4 Design
    4.5 Implementation
    4.6 Tests
5 Pivot tables
    5.1 Background
    5.2 Motivation
    5.3 Specification
    5.4 Analysis
    5.5 Review
        5.5.1 Testing environment
        5.5.2 JPivot and Mondrian
        5.5.3 DHIS 2 reporting module
        5.5.4 Pentaho analysis tool
        5.5.5 Results
    5.6 Integration
    5.7 Observation
6 Conclusion
Appendix A
    A.1 Excel import module
    A.2 OrgUnit hierarchy converter
Appendix B
    B.1 Pivot table loose integration

Chapter 1

Introduction

Medical information systems are an essential part of health care services. They are used to store, validate, aggregate or analyze vast amounts of captured data in order to provide key information for health-care analysts. Their decisions form medical programmes and strategies.

In the past, developing countries often suffered from the lack of such systems. Moreover, IT infrastructure and the cost of software can still be a problem. Hence, an open-source information system with low hardware requirements such as DHIS 2 is a convenient solution.

Data import is a crucial function in data-processing systems. There can be no analysis without data. Filling in Excel sheets is an easy-to-learn way of making reports. Furthermore, unlike classical paper-based records, they can be directly processed by computers to generate relevant statistics. Data visualisation is another core functionality. The better the reports an analyst gets, the more easily he or she can see the patterns and make appropriate decisions. The concept of pivot tables makes data manipulation and visualisation easy to achieve and understand, even for analysts without an IT background.

The goal of this thesis was to explore District Health Information Software 2 and research possible approaches to implementing modules for Excel sheet data import and pivot table visualisations.

The document is split into parts. Chapter 2 introduces the reader to the problems of information systems and the specifics of District Health Information Software, followed by a description of the system architecture in chapter 3. Chapter 4 starts the practical part of this thesis and describes the approaches used for developing the excelimport module. Chapter 5 describes research in the area of web-based pivot tables and discusses the difficulties of their integration into DHIS 2. The last chapter (6) reviews the results and outlines possible approaches for future work.

Motto: Life is an interesting game.

Chapter 2

Information systems

The objective of this chapter is to give an introduction to the problems of (health) information systems (IS). In the first part, the reader is briefly acquainted with the background of data systems. The second part focuses on the characteristics of health information systems (HIS) and presents the District Health Information Software system.

2.1 Information and Information systems

First of all, one should understand what the term information system means.

“The purpose of an information system is to collect, store, process and distribute information. Information systems comprise both humans and computers, and their interplay.” [23]

However, Solvberg et al. [23] further argue that the concept of information is not clearly defined. They define information as the part of a data object which humans are able to interpret in such a way that one can obtain some knowledge. But what, then, is the relationship among knowledge, information and data? Solvberg et al. [23] consider this question to be unclear as well.

Another definition of an information system is given in [5].

“Information system is an integrated set of components for collecting, storing, processing, and communicating information.”

As computers have become less expensive and more powerful, they have been used in more and more areas. Nowadays, people can hardly imagine a world without computers or ISs.

According to [20], organizations can benefit from using a computerized IS over a paper-based one mainly in the following areas.

• Quick analysis and processing of large amounts of data
• Production of a wide variety of reports from a single data set
• Reduction of duplicated work
• Data quality improvement (e.g. by automatic validation during data entry)
• Analysis and presentation improvements (they further facilitate interpretation and use)

Modern information systems are increasingly built through reuse, modification, integration or interfacing of available software components and frameworks. New methodologies also put more stress on the design phases than on coding itself [23].

Data systems are used in practically every area of human interest, especially in business. Examples of large information processing systems are listed in [23] as follows.

• A passenger booking system of an airline
• A transaction processing system of a bank
• A flight control system of a modern airplane
• A university information system
• A health information system

2.2 Specifics of health information systems

Health information systems have become a standard for manipulating medical data. A well-functioning HIS is vital for efficient administration and for improving the delivery of health-care services.

A health information system can be defined as

“a system comprising all computer-based components which are used to enter, store, process, communicate, and present health related or patient related information, and which are used by health care professionals or patients themselves in the context of inpatient or outpatient care.” [14]

Conrick [19] outlines his chapter about health data systems by saying:

“Healthcare is an information-intensive industry in which quality and timely information is a critical resource.”

Health information systems, as an application of information systems in the health area, are sources of decisions for health management [20].

2.3 District Health Information Software

District Health Information Software (DHIS) is an open-source medical system used mainly in developing countries. It is one of the core elements needed for the Health Information Systems Programme (HISP) to succeed.

A summary of the startup of HISP, which later led to the development of DHIS, is offered in [18]. After the end of the apartheid regime in South Africa in 1994, the country inherited one of the least equitable health care systems in the world. Therefore, the new democratic government launched a programme to reconstruct the health sector. As a result, a pilot project to develop a district health and management information system was proposed in early 1995 [17]. HISP was established in three pilot districts in and around Cape Town in 1996. District Health Information Software originated as a collaboration between the University of Oslo and the University of Western Cape, funded by NORAD¹. The first version of DHIS was released in 1998. DHIS was used to capture and analyze monthly data reports at district, regional and provincial levels in Western Cape [18].

DHIS 1 was developed for Microsoft Access in Visual Basic for the MS Windows platform. MS Office and especially MS Excel were used for purposes of analysis. The system was published through the HISP website². Even though DHIS 1 was released as freeware³, it needed commercial software to run [16].

In the fall of 2003, many core actors within the HISP project criticised the current state of DHIS 1.3. As a result, HISP hired a full-time researcher who started to work on DHIS 2 in 2004. According to [17], the first release of DHIS 2 (version 2.0-M1) was announced in February 2006. The current stable version 2.0.3-SNAPSHOT was published in December 2009 (see figure 2.1).

2.3.1 Health information systems programme

“HISP aims to support the improvement of health care systems in the southern hemisphere by increasing the capacity of health care workers to make decisions based on accurate information.” [4]

HISP is a non-profit organization which focuses on improving health systems at district levels. It encourages the use of health data at local levels in order to make decisions in connection with the local context. HISP uses DHIS to collect and process these data. DHIS is an important factor for HISP to succeed since the system makes changes in the institutions possible. However, it is not a driving factor, as [24] further remarks.

1. NORwegian Agency for Development Cooperation
2. located on <www.hisp.info>
3. free of charge software, see <http://en.wikipedia.org/wiki/Freeware>

Figure 2.1: District Health Information Software 2 web-based interface

The HISP organization supports or is responsible for HIS implementation in many countries, mainly developing ones. These include South Africa, Nigeria, Botswana, Zanzibar, Zambia, 3-4 states in India, Malawi, Mozambique, Vietnam, Ethiopia, Tajikistan and Sierra Leone [17]. The areas involved in the Health Information Systems Programme are shown in figure 2.2.

HISP cooperates with ministries of health, universities, private companies and non-governmental organizations. Together they support integration of information systems through open standards and data exchange mechanisms. Focussing on local solutions for developing-country contexts, they subscribe to a free and open-source philosophy of sharing their products such as training materials and software solutions [4].

“Databases using our District Health Information Software contain data representing over one billion patient visits.” [4]

Figure 2.2: HISP in the world (as of 01/2002) [17]

2.3.2 Objectives of DHIS

In 2004, development of a new system called DHIS 2 started and new conceptual goals were set.

Firstly, DHIS 2 should be platform-independent and fully open-source. The new software should also work with most relational databases. According to [22], a standard database cannot always be used due to local needs. For instance, there may be licence issues or a lack of trained staff. Secondly, DHIS 2 is supposed to be web-based with support for both network and standalone environments. Furthermore, it should be designed with a modular architecture, a dynamic data model and a flexible user interface [28]. The new system shall support all existing functionality of DHIS 1.4⁴.

4. last version of DHIS 1

Chapter 3

DHIS 2 system overview

This chapter gives an overview of the core technologies and architecture of DHIS 2. The first part summarises frameworks and tools. The next part presents the database design and service layer of the system. Last but not least, development-related questions are discussed.

3.1 Technologies

DHIS 2 takes advantage of many open-source software tools, frameworks and technologies. Using them makes the application more independent and flexible, and it simplifies and speeds up the development process. Tools like Bazaar or the Launchpad web portal can also make distributed development easier and more effective.

DHIS 2 uses the Java 6 (JDK 1.6) programming environment. The Java programming language is designed to offer cross-platform compatibility, which is one of the goals of DHIS 2. Java is developed by Sun Microsystems and licensed under the GNU General Public License¹. DHIS 2 is designed to run on any J2EE-compatible servlet container. As [24] says, the J2EE development platform offers a wide range of services for the development of Java applications. Among others, it simplifies the development of multi-tiered applications with a thin client, such as web applications.

A J2EE web application is usually packaged as a WAR² file which can be deployed to a Java web-application container and then accessed through the HTTP protocol [24].

3.1.1 Frameworks

Spring framework³ is used to handle a concept called dependency injection. Instead of hard-coding modules directly into an application, the modules which should be used can be defined in a configuration file. Spring framework then injects the modules into the application, which simplifies switching among various implementations of one specific task handled by a module. This concept is a form of inversion of control; it is distinct from aspect-oriented⁴ programming, which Spring also supports. Spring framework is developed by the SpringSource community and licensed under the Apache License 2.0⁵.

1. Open-source license, for details see <http://www.gnu.org/licenses/gpl.html>
2. Web Application aRchive
3. also referred to as Spring IoC container

“Spring is a layered Java/J2EE application framework with a lightweight container that takes care of assembling of the DHIS 2 modules, and the communication between them.” [24]
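The idea behind dependency injection can be illustrated without Spring itself. The following is only a minimal sketch: the ImportModule interface, its two implementations and the TinyContainer class are hypothetical names invented for this example, and a plain map stands in for the configuration file. Spring's real container is far more capable.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical module contract; this is not a DHIS 2 interface.
interface ImportModule {
    String describe();
}

class CsvImportModule implements ImportModule {
    public String describe() { return "csv"; }
}

class XmlImportModule implements ImportModule {
    public String describe() { return "xml"; }
}

// The application depends only on the interface and receives the module from outside.
class Application {
    private final ImportModule module;

    Application(ImportModule module) {
        this.module = module;
    }

    String run() {
        return "importing with " + module.describe();
    }
}

// Stand-in for a container: maps a configured name to a factory and injects the result.
class TinyContainer {
    private final Map<String, Supplier<ImportModule>> registry = new HashMap<>();

    TinyContainer() {
        registry.put("csv", CsvImportModule::new);
        registry.put("xml", XmlImportModule::new);
    }

    Application assemble(String configuredModule) {
        Supplier<ImportModule> factory = registry.get(configuredModule);
        if (factory == null) {
            throw new IllegalArgumentException("Unknown module: " + configuredModule);
        }
        return new Application(factory.get());
    }
}

class DependencyInjectionSketch {
    public static void main(String[] args) {
        // Only the configured value changes; Application stays untouched.
        System.out.println(new TinyContainer().assemble("csv").run());
        System.out.println(new TinyContainer().assemble("xml").run());
    }
}
```

Switching from the CSV module to the XML module changes only the configured name, which is the property the paragraph above attributes to dependency injection.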

Hibernate framework is used in the persistence layer of DHIS 2 to provide database independence. The most important data objects have their own store implementation which provides methods for the CRUD⁶ operations and queries. For instance, the class HibernateDataElementStore offers methods such as addDataElement, deleteDataElement, etc. [27]. Hibernate is an object-relational mapping library for relational database management systems. Its primary focus is to map Java objects to database tables and vice versa. Hibernate is compatible with most currently used relational databases such as Oracle, Microsoft SQL Server, MySQL, PostgreSQL and many more. The framework has its own query language, HQL⁷, which gives an abstraction from a particular database environment. Hibernate is issued by Red Hat under the GNU Lesser General Public License.

In the presentation layer, DHIS 2 developers use Struts 2⁸ as a model-view-controller framework in connection with Velocity⁹ as a template engine. These two frameworks support reuse and decoupling of the application's presentation (view) layer from its logic (model) layer [24]. This concept is also known as model-view architecture. Both Struts 2 and Velocity are maintained by the Apache Software Foundation and issued under the Apache License 2.0. All web modules of DHIS 2 can be run as standalone applications or assembled into one web portal.

Another useful piece of software is JUnit. It is a popular Java unit-testing framework which simplifies the test phases of a development process and encourages writing repeatable tests. JUnit tests are compiled and then linked as JAR¹⁰ files at compile-time. The framework is released under the Common Public License¹¹ and maintained by a small group of developers.

4. For details see <http://en.wikipedia.org/wiki/Aspect-oriented>
5. Open-source license <http://www.apache.org/licenses/LICENSE-2.0.html>
6. Create, Read, Update, Delete
7. Hibernate Query Language
8. re-brand of WebWork, <http://struts.apache.org/2.x/index.html>
9. Homepage at <http://velocity.apache.org/>

3.1.2 Tools

Maven is a project management and build-automation tool produced by the Apache Software Foundation. It uses a construct known as the POM¹² to support flexible configuration of a modular application. The POM describes how to build the project, the relationships among its internal modules and its dependencies on external modules. DHIS 2 uses Maven to couple its modules together and build a running release of the system [24], as well as for automated unit testing with JUnit.

The native revision control system for DHIS 2 is Bazaar. It is an easy-to-use, open-source and flexible tool for distributed revision control. Revision control is a concept used in almost any software project with a team of developers. It simplifies undoing changes, comparing various versions of software, branching and history tracking. Bazaar is developed by Canonical Ltd. and its community.

The most commonly used servlet container for District Health Information Software 2 is Jetty. Another open-source alternative is, for example, Tomcat. Jetty is a small and easily embeddable software application. It can be considered both an HTTP server and a JSP/servlet container, and it is used for deployment of standard J2EE web applications. Jetty is embedded in many popular projects like JBoss or Geronimo.

3.2 Architecture

DHIS 2 is designed according to a 3-layer architecture (see figure 3.1). The 3-layer architecture is a client-server concept which strictly splits the application architecture into presentation, application¹³ and data¹⁴ layers. All these layers are supposed to be independent of each other. That allows for reimplementing just one specific part or layer of the software without touching the others. The presentation layer provides the implementation of the user interface, the application layer implements functional-process logic, and finally the data layer provides functionality for communicating with computer data storage (often a database server).

10. Java Archive File
11. Open-source license <http://www.ibm.com/developerworks/library/os-cpl.html>
12. Project Object Model
13. also called business or logic layer
14. also called persistence layer

Figure 3.1: Three-layer architecture of DHIS 2 [27]

The design of DHIS 2 is made up of 42 Maven projects, out of which 18 are web modules. Having the system split into a number of modules with limited functionality makes maintenance or development of new modules easier. There is no need to redesign the whole system in order to make changes, as opposed to monolithic systems such as DHIS 1 [24], [27].

3.2.1 Domain model

The domain model of DHIS 2 is designed to be flexible in all dimensions in order to allow capturing of any type of data (see diagram 3.2). The core unit is DataValue which can be captured for any DataElement, Period and Source. Hence, DataValue represents a captured item in a certain time period reported by a certain organisation unit [27].

Figure 3.2: Simplified domain model of DHIS 2 [28]

The central concept for data analysis and reporting is Indicator. It is basically a mathematical formula consisting of DataElements and numbers. An Indicator captures the difference between absolute and relative numbers. For example, analysts need to know the value in the context of the whole population (relative), not only the count for the group of people on which the data was sampled (absolute).
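To make the distinction between absolute and relative numbers concrete, the following sketch evaluates one simple indicator. It assumes a numerator/denominator/factor shape for the formula, which the text does not specify; the SimpleIndicator class, the data element keys and the sample figures are all invented for illustration and are not taken from DHIS 2.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified indicator: (numerator / denominator) * factor.
class SimpleIndicator {
    private final String numeratorElement;
    private final String denominatorElement;
    private final double factor;

    SimpleIndicator(String numeratorElement, String denominatorElement, double factor) {
        this.numeratorElement = numeratorElement;
        this.denominatorElement = denominatorElement;
        this.factor = factor;
    }

    // Looks up both data element values and returns the relative number.
    double evaluate(Map<String, Double> valuesByDataElement) {
        Double numerator = valuesByDataElement.get(numeratorElement);
        Double denominator = valuesByDataElement.get(denominatorElement);
        if (numerator == null || denominator == null) {
            throw new IllegalArgumentException("Missing data element value");
        }
        if (denominator == 0.0) {
            throw new ArithmeticException("Denominator is zero");
        }
        return numerator / denominator * factor;
    }
}

class IndicatorSketch {
    public static void main(String[] args) {
        // Absolute numbers as they might be captured (illustrative figures only).
        Map<String, Double> values = new HashMap<>();
        values.put("immunised", 450.0);
        values.put("under_one", 600.0);

        // Relative number: share of the whole group, expressed as a percentage.
        SimpleIndicator coverage = new SimpleIndicator("immunised", "under_one", 100.0);
        System.out.println("Coverage: " + coverage.evaluate(values) + " %");
    }
}
```

The absolute count of 450 says little on its own; relating it to the 600 children in the group gives the analyst a figure that can be compared across districts.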

The AggregatedDataValue and AggregatedIndicatorValue entities represent processed data from the DataValue and IndicatorValue tables. The whole DHIS 2 database model consists of 173 tables. The interested reader can download a sample PostgreSQL dump from the homepage of District Health Information Software 2.

3.2.2 Service layer

This layer is responsible for the main logic of the whole system. Its main tasks include functionality for administration, settings, datamart, import, export, GIS¹⁵ mapping, reporting and user management [28]. The import-export and reporting projects are described further as they were the topic of this thesis.

The import-export project provides an interface for producing and consuming interchange-format files. The data-import operation includes import, preview or analysis of data. DHIS 2 currently supports the DHIS eXchange Format (XML), Indicator eXchange Format (XML), DHIS 1.4 XML format (XML), DHIS 1.4 Datafile format (XML), Comma Separated Values format (CSV) and Portable Document Format (PDF).

15. Geographic Information System

The core components for importing and exporting are converter classes. They provide a read(Reader, ImportParams) method for data import customized by import parameters (e.g. type or strategy, details in section 4.3.1), or a write(Writer, ExportParams) method for data export customized by export parameters. Every read method is supposed to call the abstract read method inherited from the AbstractConverter class. This method dispatches processed objects to the analysis, preview or import routines depending on the import parameters. The ImportService or ExportService interface is then responsible for invoking an appropriate converter with its import or export method. Each set of converters for the same data object is inherited from the abstract converter class for this object. This allows simple extensibility since converters for new formats extend the corresponding abstract converter class and reuse its functionality [27]. The system-flow diagram is shown in figure 4.2.
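The converter pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the DHIS 2 source: AbstractConverterSketch, its dispatch method (standing in for the inherited read method), the ImportType values and the one-name-per-line input format are hypothetical, and the routines are reduced to log entries.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical import types; the text names import, preview and analysis.
enum ImportType { IMPORT, PREVIEW, ANALYSIS }

// Hypothetical, heavily reduced parameter object.
class ImportParams {
    final ImportType type;

    ImportParams(ImportType type) {
        this.type = type;
    }
}

// Shared base class: routes each parsed object according to the import type.
abstract class AbstractConverterSketch<T> {
    final List<String> routed = new ArrayList<>();

    protected void dispatch(T object, ImportParams params) {
        switch (params.type) {
            case IMPORT:
                routed.add("import:" + label(object));
                break;
            case PREVIEW:
                routed.add("preview:" + label(object));
                break;
            case ANALYSIS:
                routed.add("analysis:" + label(object));
                break;
            default:
                throw new IllegalStateException("Unhandled import type: " + params.type);
        }
    }

    protected abstract String label(T object);
}

// Format-specific converter: parses its own format, then reuses the shared dispatch.
class LineDataElementConverter extends AbstractConverterSketch<String> {
    void read(Reader reader, ImportParams params) {
        try {
            BufferedReader lines = new BufferedReader(reader);
            String line;
            while ((line = lines.readLine()) != null) {
                String name = line.trim();
                if (!name.isEmpty()) {
                    dispatch(name, params);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    protected String label(String object) {
        return object;
    }
}

class ConverterDispatchSketch {
    public static void main(String[] args) {
        LineDataElementConverter converter = new LineDataElementConverter();
        converter.read(new StringReader("Malaria cases\nBirths\n"),
                new ImportParams(ImportType.PREVIEW));
        System.out.println(converter.routed);
    }
}
```

A converter for a new format would only need its own read method; the routing to the import, preview or analysis routine stays in the base class.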

The reporting project consists of the ReportTable, Chart, Document and PivotTable submodules. The pivot table module offers functionality for filtering (slice and dice) and pivoting the data (these functions are further explained in section 5.1). The business logic (data manipulation) of pivot tables is implemented in JavaScript and located in the presentation layer. That is not considered convenient because it requires a fat client. The data filtering and retrieval is programmed in the service layer in Java [27].

3.3 Development

DHIS 2 is licensed under the Berkeley Software Distribution¹⁶ (BSD) license. As opposed to DHIS 1, it strictly keeps to the idea of requiring only open-source software to run. The BSD license allows users to freely develop add-ons or derived products which are not necessarily open-source.

DHIS 2 has many different contributors. Many of them are students, master students, doctoral students or professors from the University of Oslo. In addition, the University of Oslo sends many students abroad to help develop the project further from there and to train, integrate and form local development teams. These teams can then customize the system according to local needs [16].

The current state of DHIS 2 development is tracked on the Launchpad web portal <https://launchpad.net/dhis2>, where one can find information about main milestones, bug reports, tasks or other developer-related issues. The official homepage for DHIS 2 is located at <http://dhis2.com/>. Moreover, the reader can explore a live demo of the system at <http://dhis.uio.no/demo>.

16. Open-source license <http://www.linfo.org/bsdlicense.html>

Chapter 4

Excelsheet data import

This chapter introduces the practical part of the thesis. The task was to extend DHIS 2 with functionality for importing data from Excel sheets.

Firstly, the reader is provided with the motivation for the task. Secondly, specific requirements and problems are outlined. The third section analyses already implemented modules and describes several APIs for manipulating data in Excel sheets. The last part summarizes the implementation of the module and concludes with the achievements.

4.1 Motivation

An Excel sheet is a de facto standard and widely used type of file produced, for instance, by Microsoft Office. In many developing countries, Excel sheets are still a common way of saving medical data and creating reports. Besides, filling in Excel-sheet data templates is in many cases the easiest way of submitting medical reports. Hence, the functionality for importing from Excel-sheet data templates was a desired extension of DHIS 2.

4.2 Requirements

The task was divided into sub-tasks. The first one was to define a format for data representation in Excel sheets. That format was supposed to be intuitive and easy to use. The second task was to implement conversion from Excel sheets to Java domain objects. Finally, those objects should be persisted to the database, integrating and using the existing framework for import operations in DHIS 2.

Reported data were supposed to be in the following structures:

• Data element (uuid, name, alternativeName, shortName, code, description, active, type, aggregationOperator)
• Period (periodType, startDate, endDate)
• Organisation unit (uuid, name, shortName, organisationUnitCode, openingDate, closedDate, active, comment)
• Hierarchy of organisation units (parent, child0, child1)
• Data value (Data element, Period, Organisation unit, value, storedBy, timeStamp, comment)

As the reader can see, the structure for Data value contains other data structures, namely Data element, Organisation unit and Period. Furthermore, there is a two-level hierarchy of Organisation units.
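The nesting of these structures can be sketched with a few plain Java classes. The class names (DataElementRow, PeriodRow, OrgUnitRow, DataValueRow) are hypothetical and only a subset of the listed fields is shown; the sample values such as "sum" and "Monthly" are illustrative and not taken from the actual sheet format.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class names; only the field names follow the list above.
class DataElementRow {
    final String uuid;
    final String name;
    final String aggregationOperator;

    DataElementRow(String uuid, String name, String aggregationOperator) {
        this.uuid = uuid;
        this.name = name;
        this.aggregationOperator = aggregationOperator;
    }
}

class PeriodRow {
    final String periodType;
    final LocalDate startDate;
    final LocalDate endDate;

    PeriodRow(String periodType, LocalDate startDate, LocalDate endDate) {
        this.periodType = periodType;
        this.startDate = startDate;
        this.endDate = endDate;
    }
}

class OrgUnitRow {
    final String uuid;
    final String name;
    OrgUnitRow parent; // null for a top-level unit
    final List<OrgUnitRow> children = new ArrayList<>();

    OrgUnitRow(String uuid, String name) {
        this.uuid = uuid;
        this.name = name;
    }

    // Links a child to this unit, giving the two-level parent/child hierarchy.
    OrgUnitRow addChild(OrgUnitRow child) {
        child.parent = this;
        children.add(child);
        return this;
    }
}

// A data value refers to the three other structures instead of copying their fields.
class DataValueRow {
    final DataElementRow dataElement;
    final PeriodRow period;
    final OrgUnitRow organisationUnit;
    final String value;

    DataValueRow(DataElementRow dataElement, PeriodRow period,
                 OrgUnitRow organisationUnit, String value) {
        this.dataElement = dataElement;
        this.period = period;
        this.organisationUnit = organisationUnit;
        this.value = value;
    }
}

class DataStructureSketch {
    static DataValueRow example() {
        OrgUnitRow district = new OrgUnitRow("ou-1", "District A");
        OrgUnitRow clinic0 = new OrgUnitRow("ou-2", "Clinic 0");
        OrgUnitRow clinic1 = new OrgUnitRow("ou-3", "Clinic 1");
        district.addChild(clinic0).addChild(clinic1);

        DataElementRow births = new DataElementRow("de-1", "Births", "sum");
        PeriodRow january = new PeriodRow("Monthly",
                LocalDate.of(2010, 1, 1), LocalDate.of(2010, 1, 31));
        return new DataValueRow(births, january, clinic1, "12");
    }

    public static void main(String[] args) {
        DataValueRow v = example();
        System.out.println(v.dataElement.name + " at " + v.organisationUnit.name
                + " (" + v.organisationUnit.parent.name + "): " + v.value);
    }
}
```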

According to user demands, the following requirements were set (illustrated in figure 4.1):

• Import data from an Excel sheet
• Preview data from an Excel sheet
• Analyse data from an Excel sheet

In addition, the system specification was given as follows:

• Reuse current functionality for importing in DHIS 2
• Find a suitable API for working with Excel sheets
• Follow the schema of already implemented modules for importing
• Use Spring framework for dependency injection
• Define a standard format of Excel sheets for the data structures
• Reuse GUI¹ of DHIS 2

4.3 Analysis

As already stated, the resulting solution was required to integrate nicelywith the existing import-export framework of DHIS 2. Every import-exportmodule in DHIS 2 consists of converters and importers and is collectedas a Spring bean2. This simplifies integration of new modules.

Every request for an import-export module triggers execute method ofthe ImportAction class (see class-hierarchy diagram 4.2). The request is pro-cessed through the org.amplecode.cave3 framework and passed to an in-stance of the ImportInternalProcess class. According to the type of import-ed/exported file, an appropriate import/export module is further executed.

1. Graphical User Interface
2. bean denotes a class which is managed by Spring IoC container
3. further reading at <http://www.amplecode.org/>

Figure 4.1: Use case diagram of the import module

For example, for an excel sheet file, the importData method of the DefaultXLSImportService class is executed. This method then calls the relevant converters, which finally access the Hibernate framework through the HibernateImportObjectStore class.

Furthermore, an API for loading data from excel sheets needed to be selected. The selection finally boiled down to jExcel⁴, jXLS⁵ and Apache POI⁶, which are discussed below.

4.3.1 Importers

The objective of importers is to define the importData method, whose prototype is specified in the importService interface. The first argument of this method is a set of import parameters. These parameters determine the import type (preview, analysis, import), whether to check for matching data values, whether to import data values, and the strategy for accepting incoming records. The second argument denotes an input stream. This is a very general type; it can therefore be an XML stream, a file with comma-separated values, a zip file, an excel sheet or another file type.

The main goal of the importData method is to prepare the input stream for further processing in converters. The other aim is to call the right, properly initialized converter.
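The importer contract described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the DHIS 2 source: ImportParamsSketch, SheetConverter and ExcelImporterSketch are hypothetical names, and the real interface may declare different types and further parameters.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the import parameters (not the DHIS 2 class).
class ImportParamsSketch {
    enum ImportType { PREVIEW, ANALYSIS, IMPORT }

    final ImportType type;
    final boolean importDataValues;

    ImportParamsSketch(ImportType type, boolean importDataValues) {
        this.type = type;
        this.importDataValues = importDataValues;
    }
}

// Hypothetical converter contract: one converter per data structure.
interface SheetConverter {
    String name();

    void read(InputStream in, ImportParamsSketch params);
}

// An importer prepares the stream and calls the registered converters in order.
class ExcelImporterSketch {
    private final List<SheetConverter> converters = new ArrayList<SheetConverter>();
    final List<String> log = new ArrayList<String>();

    void register(SheetConverter converter) {
        converters.add(converter);
    }

    void importData(ImportParamsSketch params, InputStream in) {
        // Prepare the stream once; a real importer would build a workbook here.
        InputStream prepared = new BufferedInputStream(in);
        for (SheetConverter converter : converters) {
            // Data values are skipped when the parameters say so.
            if (!params.importDataValues && converter.name().equals("DataValue")) {
                continue;
            }
            converter.read(prepared, params);
            log.add(params.type + ":" + converter.name());
        }
    }
}
```

The log field exists only to make the call order visible in this sketch; it has no counterpart in the description above.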

4.3.2 Converters

The converter classes are supposed to implement at least one of the read or write methods (see appendix A.2).

The read method takes as arguments an input stream and import parameters. It implements functionality for mapping data from input sources into Java objects. These objects are later saved into a database via the read method from the org.hisp.dhis.importexport.converter package.

The write method takes an output stream and export parameters as arguments. This method loads data objects from the database and writes them into an output stream. Depending on the type of export, it can create an XML file, excel sheet, CSV file, etc.
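The read and write roles of a converter can be illustrated with a simplified round trip. The sketch below is an assumption-laden example rather than a DHIS 2 converter: it uses one "uuid;name" text line per object instead of a real sheet, omits the import and export parameters and the database, and the class names DataElementSketch and DataElementConverterSketch are invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Simplified data element with two of the fields listed in the specification.
class DataElementSketch {
    final String uuid;
    final String name;

    DataElementSketch(String uuid, String name) {
        this.uuid = uuid;
        this.name = name;
    }
}

// Hypothetical converter: one "uuid;name" line per object.
class DataElementConverterSketch {
    // read: map the input source to Java objects.
    List<DataElementSketch> read(InputStream in) {
        List<DataElementSketch> result = new ArrayList<DataElementSketch>();
        BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(";", 2);
                if (parts.length < 2) {
                    continue; // skip empty or malformed lines
                }
                result.add(new DataElementSketch(parts[0], parts[1]));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // write: serialise the objects to the output stream.
    void write(OutputStream out, List<DataElementSketch> elements) {
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
        for (DataElementSketch element : elements) {
            writer.print(element.uuid + ";" + element.name + "\n");
        }
        writer.flush();
    }
}
```

Writing a list to a byte array stream and reading it back should yield objects with the same uuid and name values, which is a convenient way to test a converter pair.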

4. Homepage located at <http://jexcelapi.sourceforge.net/>
5. Homepage at <http://jxls.sourceforge.net/>
6. Homepage at <http://jakarta.apache.org/poi/>

Figure 4.2: Class hierarchy diagram of import-export modules

4.3.3 jExcel

jExcel⁷ API is an open-source project which allows developers to dynamically read or generate excel spreadsheets. It is licensed under the GNU Lesser General Public Licence⁸.

JExcel API requires Java 2 JDK to run. When dealing with large spreadsheets, allocation of extra memory space for the Java virtual machine is often needed.

This API is quite simple to use, as can be seen, for instance, in [21]. It supports reading operations for Excel 95, 97 and 2000 file formats and generating workbooks for Excel 97 and later. It can also read and write formulas. Last but not least, it handles font, number and date formatting.

4.3.4 Apache POI

Apache POI⁹ is a complex open-source project which provides the programmer with modules for working with files based on Microsoft’s OLE 2 Compound Document Format¹⁰ or Office Open XML file format¹¹. These include Microsoft Office documents from version Office 97 onwards. This project is licensed under Apache Software Licence 2.0.

Apache POI requires Java 1.5 JDK or newer JDK versions.

POI provides a fairly complex API for dealing with spreadsheets. It works with low-level structures and provides an event-model API for efficient read-only access as well as a full user-model API for creating, reading and modifying excel-sheet files.

When dealing with complex, richly formatted worksheets, a considerable amount of Java code usually has to be written. The code created using this approach is hard to debug and the whole process is time-consuming.

4.3.5 jxls

jXLS¹² is a small and easy-to-use open-source Java library for manipulating Excel files. It uses functionality of the Apache POI API. jXLS is published under the GNU Lesser General Public Licence.

7. Homepage at <http://jexcelapi.sourceforge.net/>
8. Open source license <http://www.gnu.org/copyleft/lesser.html>
9. Homepage at <http://poi.apache.org/>
10. MS Office 2000 or earlier versions <http://sc.openoffice.org/compdocfileformat.pdf>
11. MS Office 2007 or later versions <http://en.wikipedia.org/wiki/Office_Open_XML>
12. Homepage at <http://jxls.sourceforge.net/>

jXLS requires Apache POI 3.2 and Java 1.5 JDK or their later versions.

This API uses XLS templates, therefore an acceptable amount of code is required for the creation of complex workbooks. Besides, it is already incorporated in DHIS 2.

jXLS works with the same workbooks as Apache POI since it uses it as a back-end for reading and writing operations.

4.4 Design

First of all, the format of excel sheets was to be outlined. To make it simple, every data structure was defined on its own sheet. Values of period, organisation unit, hierarchy of organisation units and data element structures could then be placed into individual columns since they were not dependent on any other structures.

Secondly, the format of the sheet with Data values, which are dependent on Organisation units, Periods and Data elements, was to be defined. Organisation units and Data elements could be referenced by their uuids since those were unique. But there was no unique single value in the Period structure. Thus, Periods could be referenced either by a composed key from periodType, startDate and endDate, or by adding an extra unique value to the Period structure. The first option with a composed key was chosen because repeating values was considered more acceptable to human users than creating additional ones.
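The composed key for Periods can be sketched as a small value class whose equality covers all three columns. This is a minimal illustration, assuming dates are kept as ISO text; PeriodKey and PeriodIndex are hypothetical names and do not come from DHIS 2.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Composed key for a period: no single column is unique, but the triple is.
final class PeriodKey {
    private final String periodType;
    private final String startDate; // ISO dates kept as text for brevity
    private final String endDate;

    PeriodKey(String periodType, String startDate, String endDate) {
        this.periodType = periodType;
        this.startDate = startDate;
        this.endDate = endDate;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof PeriodKey)) {
            return false;
        }
        PeriodKey that = (PeriodKey) other;
        return periodType.equals(that.periodType)
                && startDate.equals(that.startDate)
                && endDate.equals(that.endDate);
    }

    @Override
    public int hashCode() {
        return Objects.hash(periodType, startDate, endDate);
    }
}

// Hypothetical lookup used while loading the Data value sheet.
class PeriodIndex {
    private final Map<PeriodKey, Integer> idsByKey = new HashMap<PeriodKey, Integer>();

    void register(PeriodKey key, int periodId) {
        idsByKey.put(key, periodId);
    }

    // Returns the period id, or null when the three values match no known period.
    Integer resolve(String periodType, String startDate, String endDate) {
        return idsByKey.get(new PeriodKey(periodType, startDate, endDate));
    }
}
```

A Data value row then repeats the three period columns, and the importer resolves them to a period with a single map lookup.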

A few requirements were established when choosing the API for accessing worksheets. The API was supposed to be open-source, simple to use and compatible at least with Microsoft’s excel file formats. These requirements eliminated Apache POI since it was too complex. Finally, jXLS was chosen because it had already been used in DHIS 2 and therefore an additional dependency was avoided.

Afterwards, the importData method was designed (see diagram 4.3). Firstly, it converts an input stream into a workbook object. Next, it finds sheets with appropriate data structures (referenced case-insensitively by name) and sorts them so that the Data value sheet is loaded last (it contains references to the other sheets).
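The sheet selection and ordering step can be sketched as follows. The sheet names in KNOWN are assumptions made for this example, since the exact names are not listed here, and the sketch works on plain name strings rather than on a workbook object.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SheetOrderSketch {
    // Hypothetical sheet names, one per data structure.
    static final List<String> KNOWN = Arrays.asList(
            "dataelement", "period", "organisationunit", "hierarchy", "datavalue");

    // Keeps only recognised sheets (matched case-insensitively) and
    // moves the data value sheet to the end of the import order.
    static List<String> importOrder(List<String> sheetNames) {
        List<String> ordered = new ArrayList<String>();
        String dataValueSheet = null;
        for (String name : sheetNames) {
            String key = name.trim().toLowerCase(Locale.ROOT);
            if (!KNOWN.contains(key)) {
                continue; // unrelated sheets are ignored
            }
            if (key.equals("datavalue")) {
                dataValueSheet = name;
            } else {
                ordered.add(name);
            }
        }
        if (dataValueSheet != null) {
            ordered.add(dataValueSheet);
        }
        return ordered;
    }
}
```

Loading the Data value sheet last means that every uuid or period key it references has already been read from the other sheets.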

The following task was to create converters for individual sheets. Every column of a sheet is referenced by its name from the appropriate data structure. Hence, it is easy to find a row of cells and load it into a Java object. As a result, the sheets can contain extra columns and their order does not matter.
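Referencing columns by name rather than by position can be sketched with a header lookup. The example below is a simplified illustration in which rows are plain lists of strings instead of workbook cells; HeaderRowSketch is a hypothetical name, and exact matching of column names (after trimming) is an assumption.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class HeaderRowSketch {
    private final Map<String, Integer> indexByName = new HashMap<String, Integer>();

    // Remembers the position of every column name found in the header row.
    HeaderRowSketch(List<String> headerCells) {
        for (int i = 0; i < headerCells.size(); i++) {
            indexByName.put(headerCells.get(i).trim(), i);
        }
    }

    // Returns the cell under the given column name, or null if the column is absent.
    String value(List<String> row, String columnName) {
        Integer index = indexByName.get(columnName);
        if (index == null || index >= row.size()) {
            return null;
        }
        return row.get(index);
    }
}
```

Because the lookup goes through the header, a sheet whose columns are reordered or which carries additional columns is read in the same way.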

Figure 4.3: State diagram for the excel importer

4.5 Implementation

All the specified requirements were successfully fulfilled. Consequently, the component for importing data from Microsoft’s excel sheets is working and fully integrated into DHIS 2 (see screenshot 4.4). Operations for previewing, importing or analysis of workbooks can be accessed from the standard DHIS 2 web GUI.

A small problem was experienced in the last phase of the development. When importing data with preview, Java exceptions were reported. The DataElementCategoryCombo property of the DataElement object was not properly set, therefore there was an undefined cross-reference in the database. There was supposed to be an implicit value for the DataElementCategoryCombo property but it did not work. After this strange behaviour had been discussed, the problem was reported as a bug.

At the time of writing this thesis, the excel-import module has not yet been merged into the trunk of DHIS 2. Instead, alternative approaches to this module were discussed.

“Not sure yet, we are considering whether we should have a programmatic approach (POI and excel-import module) or XSLT/transformations . . . ” (Lars Helge Øverland, personal communication, March 03, 2010)

XSLT¹³ is a declarative, XML-based language which is used to transform XML sources. An XSLT processor, for example SAXON¹⁴ or Xalan¹⁵ for Java applications, takes XSLT style-sheet modules and XML documents, and produces an output document, for instance in HTML or PDF format.

The advantage of using XSLT transformations is that no Java development is required. Moreover, different XML sources can be transformed into a common format which can later be processed by a set of universal XSLT templates. Consequently, development of redundant style-sheets is avoided. The downside of this strategy is that the powerful features of Java libraries such as jXLS are not exploited. The XSLT approach could also benefit from the new excel-sheet file format introduced in Office 2007 which is, in fact, a zipped package of XML documents. The analysis of these techniques is left for future development.

13. eXtensible Stylesheet Language Transformations (<http://en.wikipedia.org/wiki/Xslt>)
14. Homepage at <http://saxon.sourceforge.net/>
15. Homepage at <http://xml.apache.org/xalan-j/>

Figure 4.4: DHIS 2 excel-import user interface

4.6 Tests

The module for importing from excel sheets was successfully tested with Microsoft Excel 97/2000/XP file formats. Nevertheless, it does not work with OpenOffice.org’s ODF spreadsheet files because the jXLS API does not support them yet. This is left for future development.

Chapter 5

Pivot tables

This chapter describes the design and implementation of a pivot table module for the DHIS 2 system. The task was to get familiar with the concepts of pivot tables, explore current alternatives in this field and propose a suitable solution for the DHIS 2 context. A possible integration into DHIS 2 was also considered.

“I think the point of view for this chapter is that not many existing solutions for web-based pivot tables exist, and not a lot of info can be obtained” (Lars Helge Øverland, personal communication, March 01, 2010).

The first part of this chapter explains the basic concepts and describes the purpose of having a module for pivot tables in DHIS 2. The next part outlines the requirements for that module. Section 5.4 reviews current solutions, from which a few suitable candidates are selected for further exploration. They are compared with the current DHIS 2 module in the next part. Section 5.6 describes approaches for loose integration of JPivot and DHIS 2. Finally, the conclusion is drawn.

5.1 Background

A pivot table report is an interactive way to quickly summarize, analyze, explore or present a summary of large amounts of data (see an example report in figure 5.1). Such reports can further be visualised by charts or graphs to provide a convenient and easy way to see comparisons, patterns and trends. A tool for creating pivot tables aggregates, sorts and sums data independently of the original database layout. Thus, the results are presented in the context of multidimensional data sources. A data value is an example of such a source, because it is determined by its indicator, organisation unit and period dimensions.

Figure 5.1: An example of web-based pivot table application (DHIS 2)

“The user sets up and changes the summary’s structure by dragging and dropping fields graphically. This "rotation" or pivoting of the summary table gives the concept its name.” [12]

The District Health Information Software development team already tried to research this area and integrate a JPivot-based solution into DHIS 2 in 2006. However, this project does not seem to be alive any more. The reader can have a look at the project website¹ for further references.

Online analytical processing (OLAP) is a technology that allows users of multidimensional databases to generate on-line² descriptive or comparative summaries (views) of data and other analytic queries in an efficient way [3].

At the core of any OLAP system is the concept of an OLAP cube³ (see figure 5.2) [10]. An OLAP cube is a data structure that allows fast manipulation and analysis of data from multiple perspectives. It is comparable with a table in a relational database. The design of relational databases is suited to efficient data storage, whereas OLAP databases are designed for efficient data retrieval.

1. Homepage on <http://208.76.222.114/confluence/display/REP/Web+Pivot+Module>
2. in the meaning of immediate
3. also referred to as multidimensional cube or hypercube

Figure 5.2: An illustration image of an OLAP-cube slice [9]

An OLAP cube consists of dimensions and facts. A dimension represents descriptive categories of data or their groupings, such as Period type or Indicator. Dimensions can also be divided into levels. Measures denote a dimension for the actual data values in the cells (mostly numeric), for example the value 5 in figure 5.2. Slicing, dicing and rotating (pivoting) are three important operations associated with OLAP technologies. A slice is a subset of an OLAP cube corresponding to a single value of one or more dimensions. This single value is, for example, denoted by the Chiefdom organisation unit level. A related operation is dicing, which defines a sub-cube of the original space. That is, for instance, a sub-cube of indicator Malaria, period type weekly and organisation unit level District, which denotes a calculated number of people suffering from Malaria weekly on a district level. This small sub-cube can contain smaller sub-cubes depending on the level of accuracy. Pivoting changes the dimensional orientation of the cube. The Period type and Indicator dimensions can, for instance, be swapped [9].
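The three operations can be illustrated on a toy in-memory cube. The sketch below is not how an OLAP server stores data; it is a minimal Java example over a flat list of facts, and the class names, the choice of dimensions used by each method and any sample members are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// One fact: three dimension members plus a numeric measure.
class FactSketch {
    final String indicator;
    final String period;
    final String orgUnit;
    final double value;

    FactSketch(String indicator, String period, String orgUnit, double value) {
        this.indicator = indicator;
        this.period = period;
        this.orgUnit = orgUnit;
        this.value = value;
    }
}

class ToyCube {
    // Slice: fix a single member of one dimension (here the organisation unit).
    static List<FactSketch> slice(List<FactSketch> facts, String orgUnit) {
        List<FactSketch> result = new ArrayList<FactSketch>();
        for (FactSketch f : facts) {
            if (f.orgUnit.equals(orgUnit)) {
                result.add(f);
            }
        }
        return result;
    }

    // Dice: restrict two dimensions at once, which yields a smaller sub-cube.
    static List<FactSketch> dice(List<FactSketch> facts, String indicator, String orgUnit) {
        List<FactSketch> result = new ArrayList<FactSketch>();
        for (FactSketch f : facts) {
            if (f.indicator.equals(indicator) && f.orgUnit.equals(orgUnit)) {
                result.add(f);
            }
        }
        return result;
    }

    // Pivot: sum the measure into a two-dimensional table; the flag decides
    // whether indicators or periods form the rows, i.e. it swaps the orientation.
    static Map<String, Map<String, Double>> pivot(List<FactSketch> facts, boolean indicatorOnRows) {
        Map<String, Map<String, Double>> table = new TreeMap<String, Map<String, Double>>();
        for (FactSketch f : facts) {
            String row = indicatorOnRows ? f.indicator : f.period;
            String column = indicatorOnRows ? f.period : f.indicator;
            Map<String, Double> cells = table.get(row);
            if (cells == null) {
                cells = new TreeMap<String, Double>();
                table.put(row, cells);
            }
            Double old = cells.get(column);
            cells.put(column, old == null ? f.value : old + f.value);
        }
        return table;
    }
}
```

Calling pivot with true and then with false on the same facts gives the same sums with rows and columns exchanged, which corresponds to swapping the Period type and Indicator dimensions mentioned above.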

A comparison of different categories of OLAP systems is given in [10]:

• Multidimensional MOLAP systems store data in an optimized multidimensional array storage, rather than in a relational database. Hence, they require pre-computation and storage of information in the cube. MOLAP generally delivers better performance due to specialized indexing and storage optimizations. It also requires less storage space in comparison with ROLAP because the specialized storage typically includes compression techniques.

• Relational ROLAP systems work directly with relational databases. The base data and the dimension tables are stored as relational tables and new tables are created to hold the aggregated information. ROLAP is generally more scalable; however, this is not always the case because pre-processing of large volumes of data is difficult to implement efficiently.

• Hybrid HOLAP systems divide data between relational and specialized storage. Basically, it is a combination of MOLAP and ROLAP philosophies where the exact implementation varies from vendor to vendor.

What SQL is for relational databases, MultiDimensional eXpressions (MDX⁴) are for OLAP systems. MDX is a query and calculation language for manipulating data stored in OLAP cubes. Moreover, MDX became the de facto standard for OLAP systems [8]. This technology allows the creation of complex queries which can simulate the functionality of excel pivot tables. XML for Analysis (XMLA) is another standard used to query OLAP systems. It is essentially an XML wrapper for MDX, SQL or DMX expressions.

5.2 Motivation

Data visualisation is a convenient way to present various statistical results. Having a good quality overview of medical reports is a key feature to enhance health care programmes. Health environments gather vast amounts of data every day, hence making useful conclusions and decisions based on them can be a challenge. Pivot tables are a user-friendly concept to visualise and manipulate large quantities of aggregated statistical data.

4. A good tutorial can be found at [13]

Pivot tables were already supported in DHIS 1 since it used MS Excel, which provided the functionality. DHIS 2 is designed to use only open-source tools and be platform-independent, thus the logical conclusion is to replace MS Excel with an open-source web-based solution.

The attentive reader might have noticed that visualisation with pivot tables is already implemented in the reporting module of DHIS 2. However, it is a rather simple Javascript-based solution which offers a much narrower range of functionality than MS Excel. With MS Excel, a pivot table can be, for instance, based on database views with joins through all the resource tables; moreover, filters on any of the dimensions can be specified. Thus, totals and subtotals inside the pivot table can be obtained, and it is therefore possible to include an organisation unit hierarchy and filter on its various levels. As a result, sums for a province, district etc. are obtained.

5.3 Specification

The initial consultation set the main goals.

“So the current solution is good for a simple pivot view, and the ambition of that solution is to be just that. But in order to replace Excel we must provide the users with something more powerful . . . So the objective is to make a solution that provides as much of the functionality from excel as possible in a web environment . . . ” (Lars Helge Øverland, personal communication, March 03, 2010).

The objectives were to

• get familiar with concepts of pivot tables in the web environment,

• research current solutions for web-based pivot tables,

• test suitable candidates and compare them with the current module in DHIS 2,

• propose a solution,

• implement and integrate the solution into DHIS 2.

When trying to find a suitable framework, the emphasis was placed on platform and DBMS⁵ independence, a non-commercial and open-source license, sufficient documentation and a stable developer community. The solution should preferably be written in Java as DHIS 2 is programmed in Java. Besides, the tool should provide functionality close to that of MS Excel and be light-weight in terms of system resources.

5. DataBase Management System

5.4 Analysis

The very first challenge was to explore currently used solutions in order to decide whether it was more convenient to implement a custom solution, extend the DHIS 2 pivot-table module or make use of an existing framework.

Firstly, a suitable OLAP server implementation for creating OLAP cubes needed to be found. A nice overview of OLAP servers can be found at [2]. As the reader can see there, JPalo and Mondrian are the only open-source alternatives.

The Mondrian server is a ROLAP engine written in Java which supports most database management systems. It is now being developed as a part of the Pentaho BI⁶ package, but it can also be used without the rest of the Pentaho software. Mondrian has good documentation and support forums. The project supports MDX queries, XMLA and Java OLAP Interface specifications. Pentaho also provides graphical tools for creating MDX queries as well as defining OLAP cubes. The Mondrian project is licenced under the Eclipse Public License⁷ [7] [26].

“Users are reporting that Mondrian performance is good even when handling hundreds of gigabytes data with hundreds of millions rows in industrial settings.” [26]

Palo is a MOLAP server being developed by Jedox AG⁸. It is available under the GNU General Public Licence with well-written documentation. Active developer forums can also be found. Palo supports XMLA and MDX APIs for connectivity, as well as OLE DB⁹ for an OLAP interface. Palo can be used with most DBMS systems. Moreover, clients for complete modelling and administration capabilities are developed by the JPalo project. Palo loads its data sets completely into memory, therefore the memory size limits the amount of data that can be processed [26] [11].

“The most rewarding aspect of the in-memory technology is the speed advantage. Compared to an old-style ROLAP system in-memory technologies have the potential to be as much as 100 times faster than an OLAP system with a disk based relational database.” [6]

6. Business Intelligence
7. Open-source license, <http://en.wikipedia.org/wiki/Eclipse_Public_License>
8. Homepage at <http://www.jedox.com/>
9. Object Linking and Embedding, DataBase, further at <http://en.wikipedia.org/wiki/OLE_DB>

The next task was to determine a framework or software tool which would provide an easy-to-use API to define a web-page structure and communicate with (query) an OLAP server. There are quite a lot of such tools available but most of them are commercial, platform-dependent or closed-source. These criteria immediately eliminated Pentaho BI suite, IBM rational portfolio manager, JMagallanes, Tableau server, Aqua fold products, ASPxPivotGrid, Pivot table light, and Flex OLAP tools and Pivot charts. The open-source project JRubik was not suitable either since it was mainly a Swing¹⁰ GUI front-end for JPivot and the aim was to develop a web application.

The open-source alternatives JPivot, Pentaho Analysis Tool, Jasper, FreeAnalysis and JPalo are described further. All of these projects provide functionality for slicing, dicing and pivoting. Moreover, they provide an interface for MDX or XMLA queries, which offers wide possibilities to define the data retrieved from an OLAP-compatible server.

Pentaho Analysis Tool¹¹ (PAT) is a quite young and ambitious project which aims to create a web-based OLAP analysis tool to replace JPivot.

“The basic idea is to replicate and then extend the function of JPivot in Google Web Toolkit and other wonderful technologies to help bring Mondrian into the 21st Century. At the moment the code is very rough . . . ” (as of 01/2009) [15]

The current version of the tool is 0.5.1 and it is licenced under the GNU General Public Licence. PAT uses the Olap4j API¹² to communicate with an OLAP server, which is natively Mondrian. Not a lot of documentation, tutorials or support forums can be found. PAT uses JSP tags to render HTML documents.

JPivot¹³ is a JSP custom tag library. Together with Mondrian, it is probably the most often used open-source BI platform. JPivot renders OLAP tables and lets users perform typical OLAP navigations like slice and dice, drill-down or roll-up. JPivot can be used with any XMLA-enabled server and is also involved in the Olap4j specification. JPivot or its parts are included in other BI suites or clients. Besides, lots of tutorials and how-to articles can be found on the internet. JPivot is provided under the Common Public License [26].

10. widget toolkit for Java <http://en.wikipedia.org/wiki/Swing_(Java)>
11. Homepage at <http://code.google.com/p/pentahoanalysistool/>
12. Basically the Java database connectivity API for multi-dimensional data-sources
13. Homepage at <http://jpivot.sourceforge.net/>

JasperReports is another open-source reporting library. It uses its own OLAP server and is designed to be compatible with any data source provider. JasperReports comes with the GNU Lesser General Public Licence and a sufficient amount of tutorials. The library provides the necessary features to generate dynamic reports, including data retrieval using JDBC¹⁴, as well as support for parameters, expressions, variables, and groups. JasperReports also includes advanced features such as custom data sources, sub-reports and scriptlets¹⁵ [25].

FreeAnalysis¹⁶ is a complete Java application that provides OLAP functions against the Mondrian OLAP Server and other MDX/XMLA compliant cube data-sources. It supports ROLAP as well as MOLAP technologies.

“FreeAnalysis platform gives choice for designer deployment strategy and to allow to choose between flexibility with « live data » and rigidity/performance with data store in cubes.” [1]

However, the licence of the package is unclear. Previously, the Mozilla Public License (MPL)¹⁷ and licenses derived from it were used. The current project created on Google Code also states that the license is MPL, but only a little documentation can be obtained. The reports in FreeAnalysis consist of pivot tables and graphs which are rendered by JFreeChart [26].

JPalo Web Client¹⁸ is an AJAX¹⁹-based web-application tool to visualize and model data of Palo or other XMLA-enabled sources. JPalo is issued with good quality documentation, and is available under a commercial or GPL license. JPalo is primarily designed to work with the Palo OLAP server and offers browsing functionality to view and edit Palo server data with the possibility of rendering charts [26].

“I have been looking a bit myself but I can’t seem to find any good alternatives.” (Lars Helge Øverland, personal communication, March 11, 2010)

14. Java DataBase Connectivity
15. Blocks of Java code inside JSP scripts
16. Homepage at <http://sourceforge.net/projects/freeanalysis/> or <http://ns36005.ovh.net/web/guest/products/free_analysis>
17. Open-source license <http://www.mozilla.org/MPL/MPL-1.1.html>
18. Homepage at <http://www.jpalo.com/en/index.html>
19. Asynchronous Javascript And XML

5.5 Review

The JPalo distribution comes with its own Apache and Tomcat server configurations. This makes the possible integration with DHIS 2 heavy and quite complicated. Moreover, JPalo uses the Palo MOLAP engine to deal with OLAP cubes, which increases RAM requirements. Therefore, JPalo was not considered to be a suitable candidate for further exploration.

FreeAnalysis was not considered to be a good candidate either. Primarily, the license terms were not clear and the application had different distributions for Linux and Windows environments. Usable manuals or tutorials for setting the pivot module up were not found either.

Furthermore, JasperReports uses JPivot and Mondrian as an OLAP renderer in its open-source distribution. Since the desire was to add as little overhead as possible and also due to easier integration, JPivot and Mondrian themselves were considered instead.

JPivot and Pentahoanalysistool are further explored and analysed as they were suitable candidates for integration. The following few sections describe their memory, CPU and response analysis.

5.5.1 Testing environment

First of all, an application server needed to be selected. Tomcat and Jetty are two easily configurable and usable servlet containers which can be used for deploying WAR applications. Finally, Jetty was chosen due to its lower complexity and easier configuration for memory and CPU analysis. Moreover, configuration issues were experienced when setting up the connection between PostgreSQL and Tomcat²⁰. The PostgreSQL DBMS was installed mainly because the sample database dump²¹ of DHIS 2 was in the PostgreSQL format.

Secondly, an OLAP cube was specified to correspond to the current functionality of the pivot-table module in DHIS 2 (see schema 5.3). The aggregatedindicatorvalue table was used instead of its non-aggregated equivalent because it contained already processed data (not just captured data). This entity was used as the core fact table. Organisationunit, orgunitstructure and orgunitlevel, showing orgUnitShortname and orgUnitLevel, represent one dimension; period and periodtype, showing periodType, correspond to another dimension; and the last dimension consists of indicator, indicatorgroup and indicatorgroupmembers, which show indicatorgroup and indicatorShortname.

20. further reading, for instance, at <http://www.fankhausers.com/tomcat/jdbc/>
21. Available at <http://dhis2.com/download/dhis2demo.backup>

Figure 5.3: Schema of the basic OLAP cube for DHIS 2

5.5.2 JPivot and Mondrian

The testing was done with JPivot version 1.8.0²², which also included an integrated Mondrian server. Installation of JPivot is straightforward, but in addition an OLAP cube needs to be defined. That can be done either manually, since it is an XML file, or via the schema-workbench²³ utility. Schema-workbench has some issues when creating a schema with multiple joined tables as a dimension; this mostly needs to be done manually. The next task is to set up an MDX query and a database connection. Having that done, JPivot can be deployed into the Jetty application server.

22. Available at <http://sourceforge.net/projects/jpivot/files/JPivot%20Web%20Frontend/1.8.0/jpivot-1.8.0.zip/download>
23. To download at <http://sourceforge.net/projects/mondrian/files/schema%20workbench/workbench-2.3.2.9247/workbench-2.3.2.9247.zip/download>

5.5.3 DHIS 2 reporting module

DHIS 2.0.3²⁴ was used for testing. A WAR file of the reporting module can be easily built using Maven. Furthermore, as described in the development section of the DHIS 2 homepage, the Hibernate configuration file and the environment variable DHIS2_HOME need to be set (refer to appendixes A.1, B).

5.5.4 Pentahoanalysistool

The installation of Pentahoanalysistool²⁵ is straightforward. The testing was done with version 0.6. The application can be started straight away after placing its WAR distribution into the webapps folder because all the settings are done through its web GUI. The same OLAP cube schema as with JPivot can be used since both tools use the Mondrian OLAP server. However, when uploading that schema, one can get a warning about its non-validity even though it is valid. That is a bug in PAT. A database connection is set from the web GUI and an MDX query is executed via the Cube button in the web-GUI menu.

5.5.5 Results

To measure the CPU usage, the time utility (/usr/bin/time) was used.

cd $JETTY && /usr/bin/time java -jar start.jar

The JConsole²⁶ monitoring tool was used to track the memory usage.

cd $JETTY && java -jar start.jar & jconsole

The server response was taken using firefox plugin Firebug27 and its netmodule because it can measure even time response of AJAX requests.

The first test was to get the data forIndicator group: all, Start date: 2009-01-01, End date: 2010-01-01, Period type:Monthly, Organisation unit level: Districtand then pivot on Indicators, Periods and Org units at once.The other test loaded data forIndicator group: ANC, Start date: 2008-01-01, End date: 2010-01-01, Period type:Monthly, Organisation unit level: Chiefdom

24. Available at <http://dhis2.com/download/dhis2-2.0.3-source.zip>25. Accesible on <http://pentahoanalysistool.googlecode.com/files/pat-0.6.war>26. Part of the standard Java 2 platform27. References at <http://getfirebug.com/>

36

Page 42: Components for the health information system DHIS 2 - Muni

5. PIVOT TABLES

CPU (s) Mem (MB) Resp1 (ms) Resp2 (ms)DHIS 2 module (1) 31.34 103.92 1133 837JPivot + Mondrian (1) 8.69 38.44 2101 1204PAT (1) 16.00 104.40 1060 988DHIS 2 module (2) 32.13 98.94 2407 2070JPivot + Mondrian (2) 13.53 81.23 16591 14674PAT (2) 16.48 128.33 9180 8931

Table 5.1: Results of the DHIS 2, JPivot and PAT analysis

Figure 5.4: Comparison graph of the tools

and then pivoted firstly only on Indicators, secondly only on Periods andfinally on Org units.

The data were loaded twice (Resp1, Resp2) in order to take into account caching and other performance-boosting features. Furthermore, to reduce random deviations, the tests were repeated three times and their results were averaged. The Jetty application server was started separately for each of the tests and shut down after three minutes in order to obtain comparable results.

The final results were somewhat surprising (see table 5.1 and figure 5.4). The DHIS 2 reporting module used the most processor time and a large amount of memory but had very good response times. JPivot with Mondrian used the least system resources but had poor response times, especially in the second test. Pentahoanalysistool used somewhat more memory than DHIS 2, and its response times were much slower in the second test.
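The differences in response time can be made more tangible by relating the figures of table 5.1 to each other. The following is only an illustrative sketch: the class and method names are made up for this example, and the input values are the Resp1 and Resp2 figures taken from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative helper (hypothetical name) relating the response times of table 5.1. */
final class ResponseComparison {

    /** How many times slower a tool responded than the chosen baseline. */
    static double slowdown(double toolMs, double baselineMs) {
        if (baselineMs <= 0) {
            throw new IllegalArgumentException("baseline must be positive");
        }
        return toolMs / baselineMs;
    }

    /** Fraction of the first-load time saved on the second (cached) load. */
    static double cacheGain(double resp1Ms, double resp2Ms) {
        if (resp1Ms <= 0) {
            throw new IllegalArgumentException("first response time must be positive");
        }
        return (resp1Ms - resp2Ms) / resp1Ms;
    }

    public static void main(String[] args) {
        // Resp2 values (ms) of the second test, copied from table 5.1.
        Map<String, Double> resp2 = new LinkedHashMap<>();
        resp2.put("DHIS 2 module", 2070.0);
        resp2.put("JPivot + Mondrian", 14674.0);
        resp2.put("PAT", 8931.0);

        double baseline = resp2.get("DHIS 2 module");
        for (Map.Entry<String, Double> entry : resp2.entrySet()) {
            System.out.printf("%-18s %.1fx%n",
                entry.getKey(), slowdown(entry.getValue(), baseline));
        }
    }
}
```

With these figures, JPivot with Mondrian answers the second load of the second test roughly seven times slower than the DHIS 2 module, and Pentahoanalysistool roughly four times slower.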

The DHIS 2 reporting module offers limited functionality in comparison with JPivot or Pentahoanalysistool. The complexity of these two tools makes their responses slower but, on the other hand, provides a wide range of features, including MDX queries for data definitions.

5.6 Integration

Following the results of the previous section (5.5) and a discussion of the integration possibilities, the decision was to implement a loose integration of JPivot and Mondrian with DHIS 2. Unlike Pentahoanalysistool, JPivot is widely used and has a stable developer community, which was the key aspect since the performance analysis of the tools brought comparable results. Loose integration was chosen because implementing full integration would be rather complicated and the DHIS 2 developers did not want to incorporate Mondrian directly. A custom implementation or an extension of the DHIS 2 module would also be quite complex; besides, it would require redundant work since an OLAP server would also have to be implemented.

“I can say that the disadvantages of mondrian are that it introduces a lot of complexity and requires a lot of work to integrate into the application. Also the application becomes "heavier" (larger, more memory footprint). The ideal solution would be to avoid it and use the aggregated* tables directly but I am not sure if this is possible. Regarding JPivot the disadvantages are that it requires Mondrian (which we ideally want to avoid - but I am not sure) and its a bit hard to integrate into DHIS 2 since its based on JSP (we use velocity/struts)” (Lars Helge Øverland, personal communication, March 01, 2010).

The first approach was to make use of veltag²⁸, which allows JSP and Velocity tags to be used together. As a result, the Velocity templates of DHIS 2 could be merged with the JSP templates of JPivot. The last step would then be to modify the DHIS 2 web.xml descriptor to handle both applications. This way of merging the code bases was later considered too messy; furthermore, the veltag project did not seem to be alive since the library did not even build. Therefore, the new approach was to integrate the systems only at the GUI level by making hyperlinks between them and packaging them as separate WAR files. This approach was successfully implemented (see screenshots 5.5 and B.1).
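Integration at the GUI level amounts to one web application linking to another one deployed in the same servlet container. The sketch below shows how such a link could be derived from the host and port of the current request. It is an assumption-laden illustration, not the code of the implemented module: the class name, the context path /jpivot and the page index.jsp are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Illustrative helper (hypothetical name) building a link to a sibling web application. */
final class SiblingWebappLink {

    /**
     * Builds an absolute link to a page of another web application running
     * on the same host and port. The query may be null.
     */
    static URI linkTo(String scheme, String host, int port,
                      String contextPath, String page, String query) {
        try {
            // Standard seven-argument java.net.URI constructor:
            // scheme, userInfo, host, port, path, query, fragment.
            return new URI(scheme, null, host, port, contextPath + "/" + page, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("cannot build link", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical deployment: JPivot unpacked under the context path /jpivot.
        System.out.println(linkTo("http", "localhost", 8080, "/jpivot", "index.jsp", null));
    }
}
```

Because both applications stay separate WAR files, the link is the only coupling between them; authentication and session state are not shared, which is the source of the security concerns of this approach.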

28. Homepage located at <http://velocity.apache.org/engine/releases/velocity-1.4/veltag.html>


Figure 5.5: Screenshot of a loosely-integrated pivot-table-report module


5.7 Observation

To achieve complex pivot-table features close to MS Excel functionality, an OLAP server with MDX-query support is a logical approach even though it brings extra overhead. Light integration of DHIS 2 and JPivot is a good and stable solution; nevertheless, having two separate applications brings certain overlap and security-related issues. A similar principle of having an “application inside an application” has already been used for the GIS module²⁹. Another possible strategy is to develop a custom Velocity-based front-end for Mondrian, similar to JPivot, which would eliminate the redundant dependence on JSP tag libraries and make the integration tighter. Conversely, a custom implementation of the whole pivot-table framework seems to be unnecessary extra work since similar software modules have already been developed.

Taking into account the requirement of open-source tools only, JPivot seems to be the best available solution. Nevertheless, Pentahoanalysistool is a young and ambitious project which aims to take over the leading role of JPivot. To sum up, Mondrian is a reliable back-end and together with JPivot offers a stable solution, whereas using Pentahoanalysistool as a front-end is still experimental.

29. See the demo application for more details (<http://dhis.uio.no/demo/dhis-web-mapping/mapping/index.html>)


Chapter 6

Conclusion

The aim of this thesis was to explore the medical information system District Health Information Software 2 in order to implement a component for data import from Excel sheets, as well as to propose and integrate a solution for web-based pivot-table visualisations.

The excelimport task was primarily a programming assignment. The design patterns of the other import-export modules of DHIS 2 were studied and 3 APIs for manipulating workbook data were explored. Finally, a suitable solution using the jXLS API was designed and implemented in Java. The resulting module includes javadoc¹ documentation and is fully integrated into DHIS 2. Moreover, merging it into the DHIS 2 trunk is under discussion.

The pivot tables were mainly a research project intended to investigate and compare the alternatives for web-based pivot tables on the market. The task was to conduct an experiment with functional prototypes of pivot-table applications to see whether and how they can be integrated with DHIS 2. As a result, 2 OLAP servers and 14 pivot-table frameworks were considered. Furthermore, 5 frameworks were reviewed more closely, and finally JPivot with Mondrian and Pentahoanalysistool were installed and compared with the DHIS 2 pivot-table module. Consequently, JPivot and Mondrian were chosen to be loosely integrated into DHIS 2. This approach was successfully implemented.

Loose coupling of JPivot with DHIS 2 is a prototype solution. It illustrates one possible way of achieving MS Excel pivot-table functionality in the web-based environment of DHIS 2. The security questions in particular need to be considered, and the module can also be improved by including predefined MDX queries to simulate the functionality of MS Excel pivot tables. Conversely, the excel-import module is a stable working solution. Nevertheless, it can be improved to support OpenOffice spreadsheet documents as they are becoming popular. Last but not least, the analysis of alternative approaches such as XSLT transformations was left for future development.

1. API documentation in HTML for Java applications


Bibliography

[1] BPM Conseil – FreeAnalysis. Retrieved on 2010-03-02 from <http://ns36005.ovh.net/web/guest/products/free_analysis>.

[2] Comparison of OLAP servers. Retrieved on 2010-02-25 fromWikipedia: <http://en.wikipedia.org/wiki/Comparison_of_OLAP_Servers>.

[3] Data Mining, Predictive Modeling, Techniques. Retrieved on2010-02-23 from <http://www.statsoft.com/textbook/data-mining-techniques/>.

[4] Health Information Systems Programme. Retrieved on 2010-01-15from <http://www.hisp.org>.

[5] Information system. Retrieved on 2010-01-18 from Britannica:<http://www.britannica.com/EBchecked/topic/287895/information-system>.

[6] Jedox AG – Business Intelligence – Freedom from Excel Limita-tions. Retrieved on 2010-03-01 from <http://www.jedox.com/en/products/Palo-Suite/palo-olap-server.html>.

[7] Mondrian OLAP server. Retrieved on 2010-02-26 from Wikipedia:<http://en.wikipedia.org/wiki/Mondrian_OLAP_server>.

[8] MultiDimensional eXpressions. Retrieved on 2010-03-23from Wikipedia: <http://en.wikipedia.org/wiki/MultiDimensional_eXpressions>.

[9] OLAP Tutorial. Retrieved on 2010-03-21 from <http://training.inet.com/OLAP/OLAP.htm>.

[10] Online analytical processing. Retrieved on 2010-02-27 from Wikipedia:<http://en.wikipedia.org/wiki/Olap>.

42

Page 48: Components for the health information system DHIS 2 - Muni

6. CONCLUSION

[11] Palo (OLAP database). Retrieved on 2010-02-27 from Wikipedia:<http://en.wikipedia.org/wiki/Palo_(OLAP_database)>.

[12] Pivot table. Retrieved on 2010-03-01 from Wikipedia: <http://en.wikipedia.org/wiki/Pivot_table>.

[13] TUTORIAL: Introduction to Multidimensional Expressions (MDX).Retrieved on 2010-03-23 from <http://www.fing.edu.uy/inco/grupos/csi/esp/Cursos/cursos_act/2005/DAP_SistDW/Material/2-SDW-Laboratorio1-2005.pdf>.

[14] Elske Ammenwerth and Nicolette de Keizer. A web-based inventoryof evaluation studies in medical informatics. Retrieved on 2010-01-17 from <http://evaldb.umit.at/>, 2005. University for HealthSciences, Medical Informatics an Technology.

[15] Tom Barber. Pentaho Musings: Pentaho Analysis Tool. Retrievedon 2010-02-26 from <http://pentahomusings.blogspot.com/2009/01/pentaho-analysis-tool.html>, January 2009.

[16] Eivind Anders Berg. The challenges of implementing a health infor-mation system in Vietnam. Master’s thesis, University of Oslo, 2007.

[17] Jørn Braa. HISP – Health information systems programme [PDFdocument]. Retrieved on 2009-11-13 from Lecture Notes On-line: <http://www.uio.no/studier/emner/matnat/ifi/INF5750/h07/undervisningsmateriale/hisp.pdf>, 2009.University of Oslo.

[18] Jørn Braa and Calle Hedberg. The Struggle for District-based HealthInformation Systems in South Africa. Guide for the DFID Healthresource center, University of Oslo and University of Western Cape,2002. Retrieved on 2010-01-15 from <http://folk.uio.no/patrickr/refdoc/BraaHedberg02.pdf>.

[19] Moya Conrick. Health informatics: Transforming health care withtechnology. Nelson Thornes Ltd, 2006. ISBN: 0170127311, 978-0170127318.

[20] Hirut Gebrekidan Damitew. Sustainability and optimal use of healthinformation systems : an action research study on implementationof an integrated district-based health information system in Ethiopia.Master’s thesis, University of Oslo, 2005.

43

Page 49: Components for the health information system DHIS 2 - Muni

6. CONCLUSION

[21] Andy Khan. Java Excel API Tutorial. Retrieved on 2009-10-23 from<http://www.andykhan.com/jexcelapi/tutorial.html>.

[22] Kristian Nordal. The challenge of Being Open – Building an OpenSource Development Network. Master’s thesis, University of Oslo,2006.

[23] Arne Solvberg and David C. H. Kung. Information systems engineer-ing: an introduction. Springer-Verlag, 1993. ISBN: 3-540-56310-5, 0-387-56310-5.

[24] Therese Steensen and Hanne Vibekk. DHIS and Joly: two distributedsystems under development: design and technology. Master’s thesis,University of Oslo, 2006.

[25] Erik Swenson. Reports made easy with JasperReports – JavaWorld.Retrieved on 2010-03-05 from <http://www.javaworld.com/javaworld/jw-09-2002/jw-0920-opensourceprofile.html>, February 2009.

[26] Christian Thomsen and Torben Bach Pedersen. A Survey of OpenSource Tools for Business Intelligence. Technical report, Department ofComputer Science, Aalborg University, September 2008. Retrieved on2010-02-23 from <http://dbtr.cs.aau.dk/DBPublications/DBTR-23.pdf>.

[27] Lars Helge Øverland. DHIS 2 Technical Architecture. Tech-nical report, Health information systems programme, 2009.<http://www.dhis2.com/download/userdocs/pdf/dhis2_technical_architecture_guide.pdf>.

[28] Lars Helge Øverland. District Health Information Software 2[PDF document]. Retrieved on 2009-11-13 from Lecture NotesOnline: <http://www.uio.no/studier/emner/matnat/ifi/INF5750/h07/undervisningsmateriale/dhis_2.pdf>, 2009.University of Oslo.


Appendix A

A

A.1 Excel import module

The attached source code xkrajca_thesis_excelimport.zip represents the excelimport module implementation discussed in chapter 4.

To build and execute the module, one needs Java SDK 6¹, Subversion² and Maven³.

A copy of the excelimport module can be obtained by executing

svn co 'https://svn.ifi.uio.no/repos/projects/dhis2/branches2009/excelimport/'

To build the source code, invoke

mvn -Dmaven.test.skip=true clean install

from the root (/dhis-2/) and web directory (/dhis-2/dhis-web/) of DHIS 2.

To deploy the web portal into the Jetty application server, execute

mvn jetty:run-war

from /dhis-2/dhis-web-portal to run the whole portal, or from /dhis-2/dhis-web-portal/dhis-web-portal-importexport to run the excelimport web interface only. The application can then be accessed with a web browser on <http://localhost:8080/> (see screenshot 4.4).

Environment variables might need to be set

export JAVA_OPTS="-Xms512m -Xmx512m -XX:PermSize=512m -XX:MaxPermSize=512m"

export MAVEN_OPTS=$JAVA_OPTS

in order to allocate more memory to the Java or Maven process.

1. http://java.sun.com/javase/downloads/index.jsp
2. http://subversion.apache.org/packages.html
3. http://maven.apache.org/download.html


A.2 OrgUnit hierarchy converter

Below is a commented demonstration of the Organisation unit hierarchy converter (package org.hisp.dhis.importexport.xls.converter).

/**
 * Read operation for creating an OrganisationUnitHierarchy
 * object by parsing each row in the imported Excel sheet.
 *
 * @param workbook the imported workbook
 * @param params the import parameters
 * @return int return code (0 on success)
 */
public int read( Workbook workbook, ImportParams params )
{
    Map<String, Integer> columnMap = new HashMap<String, Integer>();
    int columnCounter = 0; // check if there are all the columns

    // get the appropriate sheet or report an error state
    Sheet sheet = workbook.getSheet( SHEETNAME );
    if ( sheet == null ) return 1;

    // locate column positions of the items
    Cell cells[] = sheet.getRow( 0 );

    for ( Cell cell : cells )
        for ( String name : columnames )
            if ( name.equalsIgnoreCase( cell.getContents() ) )
                if ( !columnMap.containsKey( name.toLowerCase() ) ) {
                    columnCounter++;
                    columnMap.put( name.toLowerCase(), cell.getColumn() );
                }
    // not all required columns found
    if ( columnCounter != COLUMNREQCOUNT ) return 2;

    // iterate through relations
    for ( int i = 1; i < sheet.getRows(); i++ ) {
        GroupMemberAssociation association;
        // get a reference to the parent organisationUnit
        String orgunitName = sheet.getCell( columnMap.get( PARENT ), i ).
            getContents();

        // get already stored organisationUnit
        final Source source = organisationUnitService.
            getOrganisationUnitByName( orgunitName );
        if ( source == null ) return 3;

        // get child organisationUnits
        Source sourceChild;
        association = new GroupMemberAssociation( AssociationType.SET );
        association.setGroupId( source.getId() );
        // get the first child organisationUnit, ignore if not found
        orgunitName = sheet.getCell( columnMap.get( CHILD0 ), i ).
            getContents();

        // get already stored organisationUnit
        sourceChild = organisationUnitService.
            getOrganisationUnitByName( orgunitName );
        if ( sourceChild == null ) continue;

        // attach child organisation unit to its parent
        association.setMemberId( sourceChild.getId() );
        // call inherited read method to persist the hierarchy
        read( association, GroupMemberType.
            ORGANISATIONUNITRELATIONSHIP, params );

        // get similarly the second child organisationUnit (CHILD1)
        // ... code reduced
    }
    return 0; // success
}
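The header-matching step of the converter above can be tried out in isolation, without the spreadsheet library or the DHIS 2 services. The following is a minimal sketch under stated assumptions: the header row is given as a plain array of strings, and the class name, the method name and the column names parent and child0 are hypothetical stand-ins for the constants of the real module.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Illustrative helper (hypothetical name) locating required columns in a header row. */
final class HeaderLocator {

    /**
     * Maps each required column name (lower-cased) to the index of the first
     * header cell that matches it, ignoring case. Returns an empty Optional
     * when at least one required column is missing from the header row.
     */
    static Optional<Map<String, Integer>> locate(String[] headerRow, String[] required) {
        Map<String, Integer> positions = new HashMap<>();
        for (int col = 0; col < headerRow.length; col++) {
            for (String name : required) {
                String key = name.toLowerCase(Locale.ROOT);
                // Keep only the first occurrence of every required column.
                if (name.equalsIgnoreCase(headerRow[col]) && !positions.containsKey(key)) {
                    positions.put(key, col);
                }
            }
        }
        return positions.size() == required.length
            ? Optional.of(positions)
            : Optional.<Map<String, Integer>>empty();
    }

    public static void main(String[] args) {
        // Hypothetical header row of an organisation unit hierarchy sheet.
        String[] header = { "Child0", "PARENT", "Comment" };
        String[] required = { "parent", "child0" };
        System.out.println(locate(header, required));
    }
}
```

The empty result plays the role of the return code 2 in the converter: the import is refused before any row is read when a required column cannot be found.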


Appendix B

B

B.1 Pivot table loose integration

Files needed for the loose coupling of JPivot with DHIS 2 (see screenshots 5.5 and B.1 or section 5.6) can be found in file xkrajca_thesis_pivottable.zip.

Firstly, download the WAR files of DHIS 2 and JPivot with Mondrian.

wget 'http://dhis2.com/download/dhis2-2.0.3.war'
wget 'http://sourceforge.net/projects/jpivot/files/JPivot%20Web%20Frontend/1.8.0/jpivot-1.8.0.zip/download'

The dhis2-2.0.3.war and jpivot.war¹ files should then be extracted into the Jetty webapps folder (supposing Jetty is used as the application server). The next step is to unzip the attached file and copy its content into the JPivot root folder. The file menu.vm needs to be copied into the DHIS 2 dhis-web-reporting folder. The application is then ready to be deployed into the Jetty application server.

cd $JETTY_DIR && java -jar start.jar

Sample database data for PostgreSQL from the Sierra Leone district is available at <http://dhis2.com/download/dhis2demo.backup>. To import the data, PostgreSQL version 8.4 or higher is required.

pg_restore -C -d postgres dhis2demo.backup

In order to let DHIS 2 know about the primary database source, the DHIS2_HOME environment variable needs to be set to the directory containing the hibernate.properties file.

1. content of file jpivot-1.8.0.zip


Figure B.1: Screenshot of a loosely-integrated pivot-table-report module

An example of the hibernate.properties file is below.

hibernate.dialect = org.hibernate.dialect.PostgreSQLDialect
hibernate.connection.driver_class = org.postgresql.Driver
hibernate.connection.url = jdbc:postgresql:dhis2demo
hibernate.connection.username = username
hibernate.connection.password = password
hibernate.hbm2ddl.auto = update
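Before starting the server, it can be useful to verify that the file found under DHIS2_HOME really contains the keys listed above. The following sketch does this with the standard java.util.Properties class only. It is not part of DHIS 2: the class name, the method names and the choice of which keys count as required are assumptions made for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Illustrative check (hypothetical name) of a hibernate.properties file. */
final class HibernateConfigCheck {

    // Keys taken from the example file; treating them as required is an assumption.
    static final String[] REQUIRED = {
        "hibernate.dialect",
        "hibernate.connection.driver_class",
        "hibernate.connection.url",
        "hibernate.connection.username",
        "hibernate.connection.password"
    };

    /** Parses text in the java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try (Reader in = new StringReader(text)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the required keys that are absent or have an empty value. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException {
        String home = System.getenv("DHIS2_HOME");
        if (home == null) {
            System.err.println("DHIS2_HOME is not set");
            return;
        }
        Path file = Paths.get(home, "hibernate.properties");
        // Properties files are traditionally encoded in ISO 8859-1.
        String text = new String(Files.readAllBytes(file), StandardCharsets.ISO_8859_1);
        System.out.println("Missing keys: " + missingKeys(parse(text)));
    }
}
```

An empty list means that all the keys of the example file are present; the check says nothing about whether the database itself is reachable.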
