Developing an online database for the UvA light scattering experiment

for the Bachelor's Degree in Computer Science

by

M. Lankamp

June 16th, 2004

Supervisors:

H. Volten

I. Bethke

Abstract

At the astronomical institute “Anton Pannekoek” of the University of Amsterdam (UvA)

researchers used an online database to publish their data of the light scattering experiment.

This database, however, was a simple database and was not designed to handle large amounts

of data because it consisted entirely of static pages which needed to be rewritten when new

data had to be entered. Because it was expected that large amounts of data would soon need to

be entered in the system, the system had to be improved. This new system had to be capable

of handling the large amounts of data that were expected and improve the ease of use of this

data for fellow researchers. This thesis will describe the development of the system that was

created to meet these demands. To this end, the system will employ server side scripts, a

database, applets and binaries.

Table of Contents

1. INTRODUCTION
2. BACKGROUND
3. REQUIREMENTS AND SOLUTIONS
    3.1. DYNAMIC EDITING, SEARCHING AND VIEWING
    3.2. ACCESS CONTROL
    3.3. QUALITY CONTROL
    3.4. MISCELLANEOUS
4. CLIENT SIDE
    4.1. BROWSER COMPATIBILITY
    4.2. APPLETS
5. SERVER SIDE
    5.1. TRANSACTIONS
    5.2. SESSIONS
    5.3. ACCESS CONTROL
    5.4. SCRIPT LAYOUT
    5.5. BINARIES
6. FUTURE WORK
7. CONCLUSION
APPENDIX A: ERROR PROPAGATION FORMULAS
APPENDIX B: CODE FRAGMENTS
    CODE FRAGMENT 1: SESSION HANDLING FUNCTIONS
    CODE FRAGMENT 2: CHECKDBSTATE FUNCTION
    CODE FRAGMENT 3: DATABASE.PHP TOP
    CODE FRAGMENT 4: APPLET AXIS SCALING AND DRAWING
    CODE FRAGMENT 5: NUMBER CLASS

1. Introduction

At the astronomical institute “Anton Pannekoek” of the University of Amsterdam (UvA)

research is being conducted on the optical properties of very small ‘dust’ particles, which

occur in many places in the cosmos, such as in planetary atmospheres or around stars. A

unique scientific experiment is available in which the light scattering of these particles is

measured. The scientists involved want to make this data available in an electronic form to

stimulate the use of the data in future research. By placing it on the web the data becomes

easily accessible to scientists all over the world. At the moment they have chosen a very simple solution. However, the expectation is that the amount of data will increase considerably over the next couple of years, among other reasons because a new light scattering facility is currently being built by Spanish collaborators. This increase calls for a substantially improved online

database.

This thesis describes the system that was developed to meet these needs. The system makes it

possible for users all over the world to add measured data from experiments to the database

and to delete this data. It becomes possible to search the database for information concerning

particular samples. The system also has the ability to display the data obtained from these

searches in graphical representations. Because of the desired quality control over added data,

the system does not allow everybody to add data. Instead, only privileged users are able to

add data. In addition, the data will only be accepted into the system after approval from

another privileged user. To enforce these privileges, the system implements user accounts.

Apart from personal information about the user, these accounts hold rights to indicate what

the user can and cannot do.

2. Background

To understand the choices made during the development of the system, it is important to

understand what kinds of data the system will handle, what their meaning and importance are, and how the different kinds of data are related.

The setup of the experiment with which the main data are obtained consists of a horizontal

ring on which a detector can move. A vertical stream of sample particles is located in the center of this ring. A beam of light is emitted by a laser in the same plane as the ring. This laser

light is incident on the stream and part of it is scattered by the particles. The detector moves

around the ring to measure the amount and polarization properties of this scattered light at

various angles [1]. In this way a light scattering matrix is obtained: a 4x4 matrix, traditionally

denoted by F, with its 16 elements Fij, describing the transformation from the incident light to

the scattered light. This matrix depends on the type of particles, the angle at which it was

measured, and the wavelength of the laser light. Researchers are interested in the values of the

elements of these matrices in relation to the angles and wavelength at which they were

measured.

To clarify this relationship, the data are often plotted in a

graph. In such a graph, the angle is represented on the

horizontal axis and the value of the matrix element on the

vertical axis. Figure 2-1 shows such a graph, displaying the

negated matrix element F12 divided by the matrix element

F11 from the sample Feldspar for 37 angles ranging from 5

to 173 degrees, using laser light with a wavelength of 441.6

nm. The matrix element is divided by F11 because

researchers are most interested in the properties of the

scattered light with respect to the intensity of the total

scattered light. This intensity is represented by F11. The

quantity −F12/F11 can be interpreted as the degree of linear polarization for unpolarized

incident light, which can also be observed for astronomical objects such as comets. Also note

that, because the values are measured values, they have a measuring error. This is displayed in

the graph by vertical bars through each point indicating the size of this error.

[1] Usually only half a circle is measured because the scattering of the light is symmetric about the axis of the original light beam.

Figure 2-1: The scattering matrix element −F12/F11 for the sample Feldspar at 441.6 nm.

To know what kind of particles a sample is made of is of particular interest to researchers

since this will enable them to relate the properties of the scattered light to the size and shape

of the particles. A size distribution is used to present the size information. It is obtained by

measuring, for example, the volumes of particles as a function of (equivalent sphere) radius.

These measurements result in a specific size distribution called a volume distribution. This

distribution can be plotted in a graph, with the radius [2] displayed on the horizontal axis and the

effective volume on the vertical axis. It is also useful to calculate the particle number and

surface area distribution from the volume distribution.

Figure 2-2: The size distributions for the Feldspar sample

Figure 2-2 shows such size distributions for the Feldspar sample. The displayed distributions

are denoted (as is usual) with V for the volume distribution, S for the surface distribution and

N for the number distribution. Note that the vertical axis is dimensionless because all three

distributions have been normalized so that the area under each curve is 1.
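The relation between the three distributions can be sketched as follows. This is an illustrative Python sketch, not code from the thesis. It assumes equivalent spheres, so that the volume of a particle scales with r^3 and its surface area with r^2, and it normalizes each curve to unit area on a linear radius axis using the trapezoidal rule. The function names and the sample values are hypothetical.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through the points (xs, ys)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


def normalize(xs, ys):
    """Scale ys so that the area under the curve equals 1."""
    area = trapezoid_area(xs, ys)
    return [y / area for y in ys]


def derive_distributions(radii, volume):
    """Derive number and surface distributions from a volume distribution.

    Assumption: equivalent spheres, so v(r) is proportional to r^3 * n(r)
    and s(r) is proportional to r^2 * n(r).
    """
    number = [v / r ** 3 for r, v in zip(radii, volume)]
    surface = [n * r ** 2 for r, n in zip(radii, number)]
    return {
        "n": normalize(radii, number),
        "s": normalize(radii, surface),
        "v": normalize(radii, volume),
    }


# Hypothetical volume distribution on a small radius grid (micrometres).
radii = [0.5, 1.0, 2.0, 4.0]
volume = [0.1, 0.4, 0.4, 0.1]
dists = derive_distributions(radii, volume)
```

Because small particles contribute little volume but may be numerous, the number distribution derived this way is weighted much more strongly towards small radii than the volume distribution.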

Further, images are provided to give the visitor an idea of the shape of the particles and the

color of the sample.

Therefore, it is conceptually best to view samples as a container for the scattering matrices

grouped by wavelength, size distributions, images, article references and additional metadata

indicating where the data originated, what certain properties such as sample color and particle shape are and, of course, what the sample’s name is. Article references are provided to enable visitors to find the articles in which the data were originally published.

[2] It is common to plot the radius on a logarithmic scale since this improves readability of the graph. The values on the vertical axis are then, of course, adjusted appropriately.

3. Requirements and solutions

As the new system was designed to improve upon the old system, various requirements were

set on the functionality of the system. These requirements are listed below and for each of the

requirements the adopted solution is discussed.

3.1. Dynamic editing, searching and viewing

The most important improvement needed was the ability for data to be added dynamically,

without manually rewriting any of the website sources. It should be possible for users of the

website to point-and-click their way through the addition procedure. This meant that the

website would have to be written in a server-side scripting language and the data would have

to be stored on the server. PHP was a suitable server-side scripting language for this problem, and it was also mandated because it was the only scripting language supported on the target web server. For data storage, a relational database management system was the obvious choice given the complex relations between the components of the system. MySQL in particular was chosen because it was likewise mandated and is often used in combination with PHP, in which role it has proven its reliability. The interface between the user and the system would be web pages with forms to submit the data the users want to add. This method was chosen because it has repeatedly proven to work on various other websites.

Because the database was created to hold large numbers of samples, it was necessary to implement a search function. With this search function visitors could search for samples by entering the criteria for the samples they wish to browse. This search function was easy to

realize since the underlying system already used SQL. The search function simply extended

the query used to retrieve all samples by including a where-clause that limited the samples to

the criteria entered by the visitor.
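The idea of extending the base query with a where-clause can be sketched as follows. The thesis system uses PHP and MySQL; this sketch uses Python with an in-memory SQLite database as a stand-in. The table layout, the column names, the colors and the samples other than Feldspar are hypothetical. Values are passed as bound parameters rather than pasted into the SQL string, which is an assumption about good practice and not a statement about the original scripts.

```python
import sqlite3

BASE_QUERY = "SELECT name FROM samples"
# Only these (hypothetical) columns may be used as search criteria.
SEARCHABLE = ("name", "color")


def build_search_query(criteria):
    """Extend the base query with a WHERE clause for the given criteria."""
    clauses, params = [], []
    for column in SEARCHABLE:
        value = criteria.get(column)
        if value:
            clauses.append(column + " LIKE ?")
            params.append("%" + value + "%")
    query = BASE_QUERY
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    return query + " ORDER BY name", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (name TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?, ?)",
    [("Feldspar", "white"), ("Red clay", "red"), ("Quartz", "white")],
)

query, params = build_search_query({"color": "white"})
results = [row[0] for row in conn.execute(query, params)]
```

With no criteria the function returns the unmodified query for all samples, so browsing and searching share a single code path.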

The system had to be able to reproduce the graphs that are used by researchers to display the

matrix elements belonging to a sample. There are, however, various ways to present these

graphs. In certain cases the graph might be negated, as seen for the –F12/F11 example in Figure

2-1 in the previous chapter. Also, some researchers do not use θ as the angle, but α. This is a different representation of the angle, which can be calculated as α = 180° − θ. It is also

interesting for researchers to have the plots of the matrix elements of several samples shown

in one graph. This allows them to easily analyse the differences and similarities of the

samples. Following this, it is also useful for the researchers to view the average of these plots.
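The presentation options described above amount to simple transformations of the plotted points. The following Python sketch is illustrative only: the function names and the sample values are hypothetical, measurement errors are left out, and averaging assumes that all curves were measured at the same angles.

```python
def apply_options(angles, values, negate=False, invert_angle=False):
    """Return the plot points for one matrix element under the chosen options."""
    xs = [180.0 - a for a in angles] if invert_angle else list(angles)
    ys = [-v for v in values] if negate else list(values)
    # Sort by the horizontal coordinate so the curve is drawn left to right.
    return sorted(zip(xs, ys))


def average_curves(curves):
    """Point-wise average of several curves measured at the same angles."""
    count = len(curves)
    return [sum(points) / count for points in zip(*curves)]


# Hypothetical F12/F11 values at three angles.
angles = [5.0, 90.0, 173.0]
f12_over_f11 = [-0.01, -0.12, -0.02]
points = apply_options(angles, f12_over_f11, negate=True, invert_angle=True)
```

Since each option is a cheap transformation of data that is already available, applying it on the client avoids a new page request for every change of view.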

One possible solution could have been to create a script that dynamically generates the graph

using PHP’s GD module. Which samples to draw and which options such as negate, average,

invert angle and show errors to apply, could be passed as GET parameters. However, this

would result in visitors sending page requests for every kind of graph they wish to view. This

would be very cumbersome since visitors would be likely to switch options and matrix

elements to try out different perspectives.

So instead, Java applets were chosen as the solution. Such an applet would be passed the

sample data in a single page request after which the applet would present the user with the

graph and options to change that graph. Once different options, samples or matrix elements

are selected, the graph is immediately updated accordingly. This solution would improve both

speed and usability of this tool. To enable researchers to still use the generated graphs for

other purposes, the graphs are downloadable as standard images. This has been implemented

by creating a button on the applet which would redirect the visitor’s browser to a script that

generates the graph they have currently selected in the applet.

To view the size distributions of a sample, the same reasoning and solution were used.

Now, there is only one set of data and the options are a choice of displaying N(log r), V(log r),

S(log r), n(r), v(r) and s(r). Once again, the generated graphs can be downloaded by pressing a

button on the applet.

3.2. Access control

Access control on the system is enforced by user accounts. The main reason for introducing

these was that not everyone should be allowed to add data; only a select group of people should be able to add and delete data. This meant that these people would have to be authenticated in such a way that the system could determine whether the current page request came from such an authorized person.

To this end it was chosen to implement user accounts, where each person who needs authorized access to the system is given a username, uniquely identifying him or her, and a password, preventing anyone else from using his or her account. User accounts in

particular were chosen because they are a standard solution for this sort of problem, used in

many other systems. This means that the concept will most likely be understood by everyone

using the system, especially since most, if not all, users are from scientific institutes where

they have to log into workstations using usernames and passwords.

The solution of user accounts presented the new problem of determining who should be able

to create or delete the accounts in case new users want to join the system or existing users

want to leave. One option would be a single, undeletable user who performs these operations. However, if many users join and leave, this would place a heavy burden on that one person, who would have to process all of these requests. It was therefore chosen to let that one user assign other

users the same right. By enabling this feature, the special user could assign other users to be

user creators. The load could then be distributed over these users. To implement this feature,

user rights were introduced. This meant that every user had a set of rights describing what he

or she could and could not do in the system. Also, in order to identify the special user it was

named ‘root’. This decision is based upon the UNIX operating system since it was likely that

the users of the system were familiar with it and could thus more easily understand the

relationship between the users in the system.

In the interest of scalability it was chosen to extend this rights mechanism to allow the root to

assign users who could assign rights to other users. These ‘granters’ could be added to relieve

the root of his or her granting tasks should he or she feel the need for this. The addition of this possibility meant that the root could assign both normal rights and rights to grant rights to users. Because this mechanism is clearly recursive, the entire rights system was generalized by dividing each right into three ‘levels’ named L0 to L2. The level 0 rights were the actual rights that enabled the users to perform the tasks on the data in the system. The level 1 and 2 rights are the rights to grant level 0 and level 1 rights respectively, or, more formally, the L(n) right is the right to grant the L(n-1) right for n > 0. This meant that for every right two values had to be stored, one for level 0 and one for level 1. The level 2 rights did not have to be explicitly stored, since they were implied by the username: only the root has

these rights. Because it is the intention that users with granting rights should only be able to

grant rights to other users and not to themselves, the latter possibility has been explicitly

disabled. Also, any operations on the root user have been disabled since it is obvious that this

user should always exist with all possible rights.

Displaying these rights could be easily done in a tabular representation where each row

represents a single right and each column a level.
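The level mechanism can be summarized in a few lines of code. The following Python sketch is not the PHP implementation of the system; the class, the function and the right name used in the example are hypothetical. It encodes three rules from the text: the L(n) right is needed to grant the L(n-1) right, users cannot grant rights to themselves, and the root cannot be modified.

```python
ROOT = "root"


class User:
    def __init__(self, name):
        self.name = name
        # Maps a right name to the set of explicitly stored levels (0 and/or 1).
        self.levels = {}

    def has(self, right, level):
        """True if this user holds `right` at the given level."""
        if self.name == ROOT:
            return True  # root implicitly holds every right at every level
        return level in self.levels.get(right, set())


def grant(granter, target, right, level):
    """Grant `right` at `level` to `target` if `granter` is allowed to."""
    if level not in (0, 1):
        raise ValueError("only levels 0 and 1 are stored")
    if target.name == ROOT or target.name == granter.name:
        return False  # no operations on root, no granting to oneself
    if not granter.has(right, level + 1):
        return False  # the L(n) right is needed to grant the L(n-1) right
    target.levels.setdefault(right, set()).add(level)
    return True


root, alice, bob = User(ROOT), User("alice"), User("bob")
grant(root, alice, "add data", 1)  # root makes alice a granter
grant(alice, bob, "add data", 0)   # alice lets bob add data
```

Level 2 never appears in the stored data: the check for level 2 succeeds only through the username test in `has`, which mirrors the statement that these rights are implied by the username.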

Figure 3-1: Example of user rights

Figure 3-1 shows an example of user rights in such a tabular manner as presented on the

website. The system currently uses seven types of rights. As can be seen, this user can add

data, delete his or her data and grant both approve rights to other users. The precise meaning

and function of these approve rights are discussed in the next section.

3.3. Quality control

All data that ends up in the database by addition is meant to be viewed by the international

scientific community. This means that data should not just be added without some form of

peer review. This quality control is enforced by hiding changes to the data from unauthorized

visitors until an authorized user has approved the changes. It is conceivable, however, that the system will host users who should not be allowed to approve these changes. To prevent

these users from doing so, two new rights were introduced to the previously discussed rights

system: approve additions and approve deletions. Only users with these rights are able to

approve additions and deletions, respectively. Any additions or deletions made by other users [3]

remain hidden from visitors of the site. Instead, for users who are privileged to approve these

alterations, the samples in question have an added remark to indicate what has changed. These

users can then view the changes and approve or reject them. Of course, the system does not allow users to approve their own changes, even if they hold both the alteration right and the approve right, since that would defeat the purpose of peer review.

[3] These additions or deletions are, of course, only allowed if the users have the appropriate rights.
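The approval rule can be expressed as a single check. This is a minimal Python sketch with hypothetical names; rights are represented as a plain set of strings, which is an assumption made for illustration and not the storage format of the system.

```python
# Right required to approve each kind of pending change.
REQUIRED_RIGHT = {
    "addition": "approve additions",
    "deletion": "approve deletions",
}


def can_approve(user_name, user_rights, change_kind, change_author):
    """Decide whether a user may approve a pending addition or deletion."""
    if REQUIRED_RIGHT[change_kind] not in user_rights:
        return False
    # Peer review: nobody may approve their own change.
    return user_name != change_author


reviewer_rights = {"add data", "approve additions"}
```

For example, a user holding `reviewer_rights` may approve an addition made by a colleague, but neither one of their own additions nor any deletion.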

3.4. Miscellaneous

The requirements discussed in the previous paragraphs make up the core of the system. But

there were additional requirements that, although not essential to the working of the database,

provided various other benefits.

A guestbook was one of these requirements. It was felt that a guestbook would increase

participation by fellow researchers by enabling them to post remarks about the samples,

recommendations, links to other interesting websites and much more. This guestbook was implemented using a straightforward table with a single tuple per guestbook entry. Visitors could supply their name, the message and, optionally, an e-mail and website address. Besides the time and date of the entry, the IP address of the visitor who posted the entry is also stored. The purpose of this IP address is to provide a way for the root to track abusers of the guestbook; for privacy reasons it is hidden from everyone except the root.

A help system was also implemented to increase the user friendliness of the system. This help

system contained various help texts explaining how the system works and what visitors should and could do in certain situations. A search function and references to other help topics were also implemented to help the user navigate the help topics. Although it was very unlikely that the help topics would ever need to be changed once they were entered in the database, a

mechanism was implemented that allowed the root to add and delete help topics, change

existing topics and add or remove topic references.

4. Client side

The developed system is a website, so the development can be split into two areas: the client

side and the server side. The client side will cover everything that runs on or is interpreted by

the software of the client while the server side will cover everything that runs on or is

interpreted by software on the server. This chapter will discuss the problems and issues that

have been encountered on the client side and the solutions that have been implemented for

them.

4.1. Browser compatibility

To make the website compliant with modern day standards, XHTML 1.1 was chosen as the

Document Type Definition, combined with CSS. This solution was chosen because it was the

latest mark-up language standard at the creation of the system and it is still compatible with

browsers using the older HTML. Unfortunately, some of the older browsers that had to be

supported did not support CSS. To partially counter this problem, the website was constructed

in a way that left the site usable when viewed without full CSS support. As an example,

consider the main page menu of the website. Figure 4-1 shows this menu rendered on a

browser with correct CSS support while figure 4-2 shows the same menu when rendered in an

older browser.

Figure 4-1: Menu rendered with correct CSS support

Figure 4-2: Menu rendered with no CSS support.

As can be seen, by using certain XHTML and CSS constructs, the menu is visible and usable in both older and newer browsers. This does not, however, hold for every older browser. Some of these, like Internet Explorer 4 and Netscape 4, have incorrect CSS support. This would result in parts of the page being rendered wrongly or even left out completely. The only way to solve this problem would be to completely abandon XHTML and CSS and write the website in HTML 4 without CSS. This presented the problem of choosing between the newer and theoretically better solution and the older and better supported solution. After careful consideration it was decided to use XHTML and CSS instead of HTML 4. This decision was based on the fact that XHTML and CSS are a better solution than HTML with respect to future development in the area of the World Wide Web. For the visitors of the website who use older browsers with incorrect CSS support, a help topic has been created, explaining that the website might not render correctly on their browser and that they are advised to upgrade it. This is, of course, only helpful when the site renders correctly enough for the visitors to read the help text.

4.2. Applets

The system features two kinds of applets, as described in section 3.1: an applet for viewing

the scattering matrix graphs of one or more samples, and an applet for viewing the size

distributions graph of a sample. Both applets work according to the same principle, but have

slightly different calculations and GUI elements depending on their function.

Both applets use the init() method to read the data from the HTML parameters and initialize

the GUI elements. Should any step of this process fail, a flag will be set. The paint method of

the applet will check this flag and draw an error message if it is set. Otherwise, when

everything was loaded and initialized correctly, it calls the paint method of the child elements.

The paint method of the Graph class, responsible for drawing the graph, first determines the

scales of both axes by using the maximum and minimum values of the horizontal and vertical

axis. This algorithm differs slightly between the two applets because one applet plots the radius on the horizontal axis, which is preferably scaled in units of 0.1 µm, or any power of 10. The other applet, however, plots the angle on the horizontal axis, which might be better scaled using units

of 5 degrees. Code fragment 4 in appendix B is the axis scaling and drawing algorithm of the

latter applet.
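A generic way to derive an axis unit from the minimum and maximum values is sketched below. This is not the algorithm of code fragment 4 and it is written in Python rather than Java. It is a common "round unit" heuristic, assuming that a unit of 1, 2 or 5 times a power of ten is acceptable and that at most `max_ticks` intervals should fit on the axis. All names are hypothetical.

```python
import math


def choose_unit(lowest, highest, max_ticks=10, multipliers=(1, 2, 5)):
    """Pick the smallest round unit that needs at most max_ticks intervals."""
    span = highest - lowest
    if span <= 0:
        raise ValueError("highest must exceed lowest")
    # Start at the largest power of ten that is certainly not too coarse.
    exponent = math.floor(math.log10(span / max_ticks))
    while True:
        for m in multipliers:
            unit = m * 10.0 ** exponent
            if span / unit <= max_ticks:
                return unit
        exponent += 1


def axis_range(lowest, highest, unit):
    """Round the axis limits outwards to whole multiples of the unit."""
    return (math.floor(lowest / unit) * unit,
            math.ceil(highest / unit) * unit)


# Angles from 5 to 173 degrees, as in the Feldspar example of chapter 2.
unit = choose_unit(5.0, 173.0)
limits = axis_range(5.0, 173.0, unit)
```

An applet that prefers a particular family of units, such as multiples of 5 degrees or pure powers of ten, could restrict the `multipliers` argument accordingly.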

As explained in chapter 2, the values of the scattering matrices have errors associated with

them. When calculating with these values, the errors must be propagated using the error

propagation rules (see appendix A for these formulas). To this end, a Number class has been

constructed to represent this value and error pair. Methods were created to represent the basic

operations such as addition, subtraction, division, multiplication and logarithms. These

methods adjust the values in the class while applying the appropriate error propagation

formulas. Code fragment 5 in appendix B shows this class.
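A value and error pair of this kind can be sketched as follows. This is not code fragment 5: it is written in Python instead of Java, and it assumes the standard propagation rules for independent errors (quadrature sums), which may differ from the formulas in appendix A. Unlike the class described above, the operations return new objects instead of adjusting the stored values. The sample values are hypothetical.

```python
import math


class Number:
    """A measured value together with its absolute measurement error."""

    def __init__(self, value, error=0.0):
        self.value = value
        self.error = abs(error)

    def __add__(self, other):
        return Number(self.value + other.value,
                      math.hypot(self.error, other.error))

    def __sub__(self, other):
        return Number(self.value - other.value,
                      math.hypot(self.error, other.error))

    def __mul__(self, other):
        # d(ab) combines b*da and a*db in quadrature.
        return Number(self.value * other.value,
                      math.hypot(other.value * self.error,
                                 self.value * other.error))

    def __truediv__(self, other):
        # d(a/b) combines da/b and a*db/b^2 in quadrature.
        return Number(self.value / other.value,
                      math.hypot(self.error / other.value,
                                 self.value * other.error / other.value ** 2))

    def __neg__(self):
        return Number(-self.value, self.error)

    def log10(self):
        # d(log10 a) = da / (|a| * ln 10)
        return Number(math.log10(self.value),
                      self.error / (abs(self.value) * math.log(10)))


# Hypothetical measurement: the quantity -F12/F11 at one angle.
f12 = Number(-0.2, 0.03)
f11 = Number(1.0, 0.04)
ratio = -(f12 / f11)
```

Writing the operations as operator overloads keeps expressions such as `-(f12 / f11)` close to the notation used in the text, while the error is carried along automatically.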

One of the problems found was that after determining the unit size on an axis, values might

not be properly displayed due to rounding errors. For example, when the vertical axis ranges

from -0.3 to 0.2 with a unit size of 0.1 the resulting values on the vertical axis could be

displayed as -0.3, -0.199999999, -0.099999999, 0, 0.100000003 and 0.2. These errors were

caused by floating point round off errors in the calculations. To compensate for this problem,

Java’s DecimalFormat class was used to print the numbers while forcing a fixed number of

decimals. This solution eliminated the display of the rounding errors. This same principle was

applied to the horizontal axis as well. Another problem, found during tests of the size

distribution applet, was that the applet appeared to flicker. With the size distribution applet,

the graph needed to be repainted whenever the user moved the mouse over it. This, combined

with non-buffered painting, resulted in the flickering. The solution was to create a back

buffer; an instance of the Image class in which all the painting would be done and which

would be copied to the canvas in the paint method. Because this copying would be done

atomically, the flickering was removed. This solution was also applied to the scattering matrix

applet.
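The rounding problem on the axis labels described above can be reproduced and fixed in a few lines. The applets use Java's DecimalFormat class; this Python sketch swaps in a fixed-precision format specifier to show the same idea. The function name is hypothetical, and deriving the number of decimals from the unit size is an assumption.

```python
import math


def tick_labels(lowest, unit, count):
    """Format tick values with a fixed number of decimals derived from the unit."""
    decimals = max(0, -math.floor(math.log10(unit)))
    labels = []
    for i in range(count):
        value = lowest + i * unit  # may carry floating-point round-off
        text = "{:.{}f}".format(value, decimals)
        if float(text) == 0:
            # Avoid printing a tiny negative round-off value as "-0.0".
            text = "{:.{}f}".format(0.0, decimals)
        labels.append(text)
    return labels


raw_values = [-0.3 + i * 0.1 for i in range(6)]
labels = tick_labels(-0.3, 0.1, 6)
```

Here `raw_values` contains entries such as -0.19999999999999998, whereas `labels` holds the rounded strings that should appear next to the axis.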

5. Server side

This chapter will discuss the problems and issues that have been encountered on the server

side and the solutions that have been implemented for them.

5.1. Transactions

Data integrity is an absolute must for the system. In the case of server failure or script abortion, there should be no incomplete sample additions or deletions. Given the fact that a database management system was used, the obvious solution was to use transactions, which ensure exactly the data integrity that was needed. Unfortunately, the server on which the system was to run used MySQL 3.23.54, without support for transactions.

However, once data was added to the system it was not allowed to be altered because it

involved officially reviewed data. This meant that updates involving multiple tables were not

required. This simplified the transactions to dealing with just insertions and deletions. These

could be easily transactionalized by including a state column. This would be a two-valued

column describing whether the tuple in question has been committed or not. When applying

this change to the table designs and adding the appropriate code to the scripts, transactions

could be emulated with the same guarantee of data integrity.

Because most tables already had a state column with possible values of 0 or higher to indicate

if that item was already approved or not (also see paragraph 3.3), it was chosen to extend

these values to include -1 for the state ‘uncommitted’. Then, when an insertion involving

multiple tables is required, the tuple in the main table is inserted with the state column set to

the value -1. Then all other tuples in the other tables are inserted with normal states (0 or

higher, depending on the situation). Finally, when all these tuples are successfully inserted,

the first inserted tuple’s state column is updated to the ‘committed’ state (0 or higher).

This algorithm ensures that the state of the first tuple is 0 or higher if and only if all tuples

depending on it have been successfully inserted. Should the transaction be interrupted because

the server crashes or the user cancels the page request, the sample tuple remains in the table

with its state set to the value -1. Then, the next time someone sends a page request to the web

server, the script on that page calls a routine that checks the state of the database and rolls

back any unfinished transactions. It does this by selecting all sample tuples with a state value

of -1. It then deletes all tuples in the tables that depend on those samples. And only when

these deletions have been successfully completed, will it delete the sample tuple itself. After

this routine, the database has been returned to its state before the insertion and the page can

resume as normal. Code fragment 2 in appendix B shows this routine.

Transactions on deletions are achieved by first setting the state of the main tuple to -1. After

this, all tuples depending on the main tuple are deleted after which the main tuple itself is

deleted. This algorithm uses the assumptions of the insertion algorithm mentioned above.

Namely, should the transaction be interrupted after the main tuple’s state has been set to -1, it

will seem as if an insertion was interrupted and the routine mentioned above will roll back the

failed ‘insertion’ on the next page request. This rollback then finishes what was interrupted

before, namely deleting the tuples.
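The insertion, rollback and deletion steps can be sketched together as follows. This is not code fragment 2: it is a Python sketch in which an in-memory SQLite database in autocommit mode stands in for MySQL without transaction support. The table layout, the function names, the `fail_midway` switch and the second wavelength are hypothetical.

```python
import sqlite3

UNCOMMITTED = -1


def open_db():
    # isolation_level=None selects autocommit mode, so every statement takes
    # effect immediately, as in a database without transactions.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE samples "
                 "(id INTEGER PRIMARY KEY, name TEXT, state INTEGER)")
    conn.execute("CREATE TABLE matrices (sample_id INTEGER, wavelength REAL)")
    return conn


def add_sample(conn, name, wavelengths, state=0, fail_midway=False):
    """Insert the main tuple as uncommitted, then its dependents, then commit."""
    cur = conn.execute("INSERT INTO samples (name, state) VALUES (?, ?)",
                       (name, UNCOMMITTED))
    sample_id = cur.lastrowid
    for i, wavelength in enumerate(wavelengths):
        if fail_midway and i == 1:
            raise RuntimeError("simulated crash")
        conn.execute("INSERT INTO matrices VALUES (?, ?)",
                     (sample_id, wavelength))
    conn.execute("UPDATE samples SET state = ? WHERE id = ?",
                 (state, sample_id))
    return sample_id


def remove_sample_rows(conn, sample_id):
    """Delete dependent tuples first and the main tuple last."""
    conn.execute("DELETE FROM matrices WHERE sample_id = ?", (sample_id,))
    conn.execute("DELETE FROM samples WHERE id = ?", (sample_id,))


def check_db_state(conn):
    """Roll back every unfinished insertion or deletion."""
    rows = conn.execute("SELECT id FROM samples WHERE state = ?",
                        (UNCOMMITTED,)).fetchall()
    for (sample_id,) in rows:
        remove_sample_rows(conn, sample_id)


def delete_sample(conn, sample_id):
    """Mark the main tuple as uncommitted, then remove all of its rows."""
    conn.execute("UPDATE samples SET state = ? WHERE id = ?",
                 (UNCOMMITTED, sample_id))
    remove_sample_rows(conn, sample_id)


conn = open_db()
feldspar_id = add_sample(conn, "Feldspar", [441.6, 632.8])
try:
    add_sample(conn, "Broken", [441.6, 632.8], fail_midway=True)
except RuntimeError:
    pass  # the half-inserted sample stays behind with state -1
check_db_state(conn)  # runs at the start of the next page request
```

The order of the statements matters: the main tuple is the first to be written and the last to be removed, so a sample with a committed state always has all of its dependent tuples.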

5.2. Sessions

Users who are allowed to alter data in the system must be authenticated. To avoid repeating this authentication on every page request, or passing the username and password in the URI of each page (where they have a much higher chance of being intercepted or decrypted), the system must remember that a user has logged in. Also, since visitors of the website can add samples to their selection, this selection must be remembered across page requests. Server sessions are the ideal mechanism for implementing these requirements.

However, instead of simply using PHP’s default session handler functions, the website uses PHP’s session handler override function (session_set_save_handler) to define a custom set of session handler functions. This has been done because PHP’s default session handler functions store the data unencrypted in a publicly readable file in a temporary directory (on UNIX, this is usually /tmp), whose filename literally contains the session identifier. This way of handling sessions presents a considerable security risk: anyone authorized to read the temporary directory of the system (usually every user that can log in) can find out the session identifier of a new session by monitoring the files in that directory. They can then send custom page requests to the server using that session identifier. In this way, unauthorized users could pretend to be logged in to the system and perform otherwise unauthorized actions on the data.

The custom session handlers defined by the system store the session data in a table in the database and use additional access control to prevent anyone from using stolen session identifiers. This access control mechanism compares the host address from which the page request originated to the host address stored for the session. Only if these match, i.e. the page request came from the same host that created the session, will the session data be read or written. Using these techniques, unauthorized users cannot find out which session identifiers are currently in use, because even read access to the database is restricted. And should they acquire a usable session identifier, e.g. by eavesdropping on the connection of an authorized user, they would not be able to use it from another host. Code fragment 1 in appendix B shows these session handling functions.
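The host check can be illustrated with a small in-memory sketch. It mirrors the behaviour of the read, write and garbage collection handlers in code fragment 1, but it is only an illustration: the class HostBoundSessions and its method names are hypothetical, and a map stands in for the sessions table.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

/** In-memory stand-in for the sessions table (illustrative names, not the system's API). */
class HostBoundSessions {
    private static final class Row {
        final String host;    // host that created the session
        String value;         // serialized session data
        long modified;        // time of last write, in seconds

        Row(String host, String value, long modified) {
            this.host = host;
            this.value = value;
            this.modified = modified;
        }
    }

    private final Map<String, Row> rows = new HashMap<>();   // session key -> row

    /** Returns the session data, or "" if the key is unknown or the host differs. */
    String read(String key, String host) {
        Row row = rows.get(key);
        return (row != null && row.host.equals(host)) ? row.value : "";
    }

    /** Creates the session, or updates it only when the request comes from the owning host. */
    boolean write(String key, String host, String value, long now) {
        Row row = rows.get(key);
        if (row == null) {
            rows.put(key, new Row(host, value, now));
            return true;
        }
        if (!row.host.equals(host)) {
            return false;                 // identifier used from another host: refuse
        }
        row.value = value;
        row.modified = now;
        return true;
    }

    /** Removes sessions not modified within maxLifetime seconds; returns the number removed. */
    int collectGarbage(long now, long maxLifetime) {
        int removed = 0;
        for (Iterator<Row> it = rows.values().iterator(); it.hasNext();) {
            if (it.next().modified < now - maxLifetime) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

A request that presents a valid key from a different host address simply sees an empty session, which is the same outcome as not being logged in.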

5.3. Access control

Access control for the system is realized by assigning an account to every user who is intended to use the website. Each account holds the username, password hash, personal information and access rights.

To incorporate the user accounts into the scripts, a User class has been constructed. An instance of this class is created with the name of the user it should represent, or an empty string for anonymous users. On creation, the class retrieves the tuple from the users table of the database and defines several functions that are used to query this tuple. The most important of these functions is the HasRights function, which checks whether the user has any or all of the rights passed to it. The scripts use this function to determine whether the user is allowed to perform certain actions. After the correct function has been determined by examining the HTTP GET arguments, that function is passed the User class instance for the currently logged in user (also see paragraph 5.4). The function checks whether that user has the rights to do what he or she wants to do. If this is not the case, the function prints an error describing the lack of rights and exits.
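A check for "any or all" rights could be sketched as follows, assuming the rights are stored as bit flags. That storage format is an assumption, as are the flag values and the names PRIV_APPROVE and PRIV_DELETE; only the name PRIV_INSERT appears in the system's code (code fragment 3).

```java
/** Rights modelled as bit flags; values and most names are hypothetical. */
class Rights {
    static final int PRIV_INSERT  = 1;   // name taken from code fragment 3, value assumed
    static final int PRIV_APPROVE = 2;   // hypothetical
    static final int PRIV_DELETE  = 4;   // hypothetical

    /**
     * Returns true if userRights contains all of the required flags
     * (requireAll == true) or at least one of them (requireAll == false).
     */
    static boolean hasRights(int userRights, int required, boolean requireAll) {
        int held = userRights & required;    // the required flags the user actually holds
        return requireAll ? held == required : held != 0;
    }
}
```

An anonymous user would be represented by a rights value of 0, for which every non-empty check fails.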

Since the username of the currently logged in user is stored in the session data, which cannot be forged (see paragraph 5.2), it is guaranteed that a user cannot do anything that his or her rights prevent.

5.4. Script layout

All scripts are structured in a way that allows for ease of maintenance and extension. Every script starts by including the various include files that provide frequently used procedures such as database querying, user right checking, session handling and error handling.

In particular, every script includes the page include file, page.inc. This file defines the functions needed to output the actual requested page. The site is constructed in such a way that the page-specific output is a single table cell and can thus be composed without knowing the design of the rest of the webpage. Using this abstraction, the WriteBody function in the page include file takes the current menu tab index, page title and page contents as arguments and outputs the entire HTML body tag and its contents. Should the design of the website ever need to be changed, only the page include file has to be altered instead of every single page of the website.

After including these files, the script reads the HTTP GET and session data to determine which user is logged in and which function needs to be called. This function writes its output using the standard output functions such as echo or printf. This output is, however, not immediately written to the client; it is buffered using PHP’s output buffering functions. After all appropriate functions have been called, the contents of this output buffer are copied and the buffer is discarded. The copy is then used as the page contents argument to the WriteBody function. This mechanism simplifies writing and maintaining the functions, since they can behave as if they are writing directly to the client instead of appending their output to a variable that is then returned.
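The buffering idea is independent of PHP. The sketch below captures the output of a page function in a buffer and only then hands it to a layout function. The class BufferedPage, the render method and the HTML produced by writeBody are hypothetical; the real WriteBody in page.inc produces the site's actual layout.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.function.Consumer;

/** Captures page-specific output in a buffer before wrapping it in the site layout. */
class BufferedPage {
    /** Runs the page function against a buffer, then wraps the captured text. */
    static String render(int tab, String title, Consumer<PrintWriter> pageFunction) {
        StringWriter buffer = new StringWriter();
        PrintWriter out = new PrintWriter(buffer);
        pageFunction.accept(out);          // the function writes as if directly to the client
        out.flush();
        return writeBody(tab, title, buffer.toString());
    }

    /** Placeholder layout: the page contents end up in a single table cell. */
    static String writeBody(int tab, String title, String contents) {
        return "<body><!-- tab " + tab + " --><h1>" + title + "</h1>"
             + "<table><tr><td>" + contents + "</td></tr></table></body>";
    }
}
```

For example, `BufferedPage.render(1, "Sample Database", out -> out.print("<p>42 samples</p>"))` returns the complete body with the paragraph placed inside the table cell, without the page function knowing anything about the surrounding layout.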

Up to the point where the output has been copied and discarded, nothing has yet been written to the client. To return the resulting page to the requesting client, the script includes a file, header.inc, which simply contains the HTML header and head tag contents, writes any custom head tag contents such as a style sheet and finally calls WriteBody.

Like the page include file, header.inc was created for ease of maintenance and extension. Should anything in the first part of a page ever need to change, like the page title, meta tags or Document Type Definition, only header.inc needs to be changed. Finally, note that this file does not include the closing head tag; it has to be added explicitly by every file that includes header.inc. This allows pages to add their own contents to the head tag, such as scripts, style sheets or additional meta tags, as mentioned above. Code fragment 3 in appendix B shows the top of a script, database.php, structured as described above.

5.5. Binaries

At certain points the website performs various calculations: the effective radius, effective variance and size distribution of several samples. These are obtained by numerically integrating a function over several hundreds of thousands of steps. Testing showed that implementing these calculations in PHP resulted in execution times beyond the web server’s maximum allowed execution time for scripts (which was 30 seconds at the time of testing). Because a speed-up was clearly needed, the calculations were rewritten in C and compiled into binaries. These binaries took significantly less time to execute than their PHP equivalents (around one second at the time of testing). To interface the binaries with the scripts, the scripts execute the binaries, write the input data to the binary’s standard input through a pipe and read the results from the binary’s standard output through another pipe.

Unfortunately, this solution is not without problems. The drawback is that the binaries have to be recompiled when the system is moved to another web server or when the current web server is altered in a way that requires recompilation. Despite this drawback, the binaries solution has been adopted for the single reason that the execution time would otherwise be too high, resulting in an error from the web server.
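To give an impression of the kind of calculation involved, the sketch below integrates numerically over a size distribution n(r). It assumes the commonly used definitions of the effective radius (the ratio of the third to the second moment of n(r)) and the effective variance; this section does not state the exact formulas the binaries implement, so the definitions, the midpoint rule and all names here are assumptions.

```java
import java.util.function.DoubleUnaryOperator;

/** Numerical integration over a size distribution n(r); definitions are assumed, see text. */
class SizeStatistics {
    /** Midpoint-rule integral of f over [a, b] using the given number of steps. */
    static double integrate(DoubleUnaryOperator f, double a, double b, int steps) {
        double h = (b - a) / steps;
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            sum += f.applyAsDouble(a + (i + 0.5) * h);
        }
        return sum * h;
    }

    /** r_eff = integral of r^3 n(r) dr divided by integral of r^2 n(r) dr. */
    static double effectiveRadius(DoubleUnaryOperator n, double rMin, double rMax, int steps) {
        double area = integrate(r -> r * r * n.applyAsDouble(r), rMin, rMax, steps);
        double volume = integrate(r -> r * r * r * n.applyAsDouble(r), rMin, rMax, steps);
        return volume / area;
    }

    /** v_eff = integral of (r - r_eff)^2 r^2 n(r) dr divided by (r_eff^2 * integral of r^2 n(r) dr). */
    static double effectiveVariance(DoubleUnaryOperator n, double rMin, double rMax, int steps) {
        double rEff = effectiveRadius(n, rMin, rMax, steps);
        double area = integrate(r -> r * r * n.applyAsDouble(r), rMin, rMax, steps);
        double spread = integrate(
                r -> (r - rEff) * (r - rEff) * r * r * n.applyAsDouble(r), rMin, rMax, steps);
        return spread / (rEff * rEff * area);
    }
}
```

For a flat distribution n(r) = 1 on [0, 1] the analytic results are r_eff = 3/4 and v_eff = 1/15, which makes a convenient check on the integration.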


6. Future work

The system as it has been developed provides all facilities for adding data, approving data, viewing data and searching for data. However, the search algorithm only allows visitors to search for samples matching the entered criteria. For researchers it could also be of interest to search on matrix element values. They could use this feature to search for a sample whose scattering matrices closely resemble their own: the system could generate a list of samples matching the uploaded scattering matrices, sorted by degree of resemblance.

Should the system ever grow to a point where it is desirable to have more than just a web interface, a network protocol, server application and client application could be created to enable visitors to manipulate the data directly over the internet using the client application, with the server application enforcing the quality and access control as the scripts do now. By making the network protocol publicly available, it could be integrated into other applications, which could then directly use the data from the database.


7. Conclusion

The purpose of the system was to improve the old database by supporting the expected growth of data and simplifying the use of this data for researchers. Through the use of dynamic data addition, deletion, quality control, user accounts and applets, all of the requirements have been realized. The threshold for researchers to use the system has been lowered by implementing search functions, applets and help topics. The participation of and communication between researchers has been stimulated by implementing a guestbook. And although there were some problems, the technology behind the system has made it secure, extensible, fast, easy to maintain and ready for the future. Overall, the new system has succeeded in improving the old system and will soon replace it at http://www.astro.uva.nl/scatter.


Appendix A: Error propagation formulas

Given a function f(x, y) with absolute errors $x_{\mathrm{err}}$ and $y_{\mathrm{err}}$ in x and y respectively, and z = f(x, y), the error in z ($z_{\mathrm{err}}$) for uncorrelated x and y is defined as:

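$$
z_{\mathrm{err}} = \sqrt{ x_{\mathrm{err}}^2 \left( \frac{\partial f}{\partial x} \right)^2 + y_{\mathrm{err}}^2 \left( \frac{\partial f}{\partial y} \right)^2 } \tag{1}
$$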

For addition (z = f (x, y) = x + y) and subtraction (z = f (x, y) = x - y) Eq. (1) works out to:

$$
z_{\mathrm{err}} = \sqrt{ x_{\mathrm{err}}^2 + y_{\mathrm{err}}^2 } \tag{2}
$$

For division (z = f (x, y) = x / y) Eq. (1) works out to:

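$$
\begin{aligned}
z_{\mathrm{err}} &= \sqrt{ x_{\mathrm{err}}^2 \left( \frac{1}{y} \right)^2 + y_{\mathrm{err}}^2 \left( -\frac{x}{y^2} \right)^2 }
  = \sqrt{ \left( \frac{x_{\mathrm{err}}}{y} \right)^2 + \left( \frac{x\,y_{\mathrm{err}}}{y^2} \right)^2 } \\
\frac{z_{\mathrm{err}}}{z} &= \sqrt{ \left( \frac{y}{x} \right)^2 \left( \frac{x_{\mathrm{err}}}{y} \right)^2 + \left( \frac{y}{x} \right)^2 \left( \frac{x\,y_{\mathrm{err}}}{y^2} \right)^2 }
  = \sqrt{ \left( \frac{y\,x_{\mathrm{err}}}{x\,y} \right)^2 + \left( \frac{x\,y\,y_{\mathrm{err}}}{x\,y^2} \right)^2 } \\
&= \sqrt{ \left( \frac{x_{\mathrm{err}}}{x} \right)^2 + \left( \frac{y_{\mathrm{err}}}{y} \right)^2 }
\end{aligned}
\tag{3}
$$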

For multiplication (z = f (x, y) = xy) Eq. (1) works out to the same formula:

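$$
z_{\mathrm{err}} = \sqrt{ x_{\mathrm{err}}^2\, y^2 + y_{\mathrm{err}}^2\, x^2 }
$$

$$
\frac{z_{\mathrm{err}}}{z} = \frac{1}{xy} \sqrt{ x_{\mathrm{err}}^2\, y^2 + y_{\mathrm{err}}^2\, x^2 }
  = \sqrt{ \left( \frac{x_{\mathrm{err}}\, y}{x y} \right)^2 + \left( \frac{y_{\mathrm{err}}\, x}{x y} \right)^2 }
  = \sqrt{ \left( \frac{x_{\mathrm{err}}}{x} \right)^2 + \left( \frac{y_{\mathrm{err}}}{y} \right)^2 } \tag{4}
$$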

Note that for Eqs. (3) and (4) the resulting error is the relative error, which must be multiplied by z to obtain the absolute error. For the natural logarithm (z = f(x) = ln x) the y component can be discarded and Eq. (1) works out to:

$$
z_{\mathrm{err}} = \sqrt{ x_{\mathrm{err}}^2 \left( \frac{1}{x} \right)^2 } = \frac{x_{\mathrm{err}}}{x} \tag{5}
$$
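As an illustrative check of the division rule in Eq. (3), take x = 10 with an absolute error of 0.3 and y = 5 with an absolute error of 0.2. These numbers are made up for the example and are not taken from the experiment.

```latex
\[
z = \frac{x}{y} = 2, \qquad
\frac{z_{\mathrm{err}}}{z} = \sqrt{ \left( \frac{0.3}{10} \right)^2 + \left( \frac{0.2}{5} \right)^2 }
  = \sqrt{ 0.0009 + 0.0016 } = 0.05, \qquad
z_{\mathrm{err}} = 0.05 \cdot 2 = 0.1
\]
```

The same relative error of 0.05 results for the product xy = 50, giving an absolute error of 2.5.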


Appendix B: Code fragments

Code fragment 1: Session handling functions

<?php
//
// By inclusion of this file sessions will be stored in the
// database, ensuring that no one else can read the
// (possibly sensitive) information.
// It also enables load distribution over multiple servers,
// since they can now all read the same session data by
// connecting to the same database server.
// And finally, it makes garbage collection trivial by use
// of queries.
// Also, the address of the requesting host is stored to
// prevent people from hijacking a session from another host.
//
require_once('database.inc');

function sess_open($sess_path, $session_name)
{
    // PHP expects the open handler to report success or failure
    return ConnConnect() ? TRUE : FALSE;
}

function sess_read($key)
{
    // The key comes from the client, so escape it before use in a query
    $key  = ConnEscapeString($key);
    $host = $_SERVER['REMOTE_ADDR'];
    $rs = ConnExecute("SELECT value FROM sessions WHERE sesskey='$key' AND host='$host'");
    if ($rs !== NULL)
    {
        return $rs->Item('value');
    }
    return '';
}

function sess_close()
{
    return TRUE;
}

function sess_destroy($key)
{
    $key  = ConnEscapeString($key);
    $host = $_SERVER['REMOTE_ADDR'];
    $rs = ConnExecute("DELETE FROM sessions WHERE sesskey='$key' AND host='$host'");
    if ($rs !== NULL)
    {
        $rs->Close();
        return TRUE;
    }
    return FALSE;
}

function sess_write($key, $data)
{
    if ($data == '')
    {
        // No need to save anything
        return TRUE;
    }

    $modified = time();
    $key      = ConnEscapeString($key);
    $value    = ConnEscapeString($data);
    $host     = $_SERVER['REMOTE_ADDR'];

    // Try inserting (new session)
    $rs = ConnExecute("INSERT INTO sessions VALUES ('$key', $modified, '$host', '$value')");
    if ($rs === NULL)
    {
        // Duplicate key, session already exists; update.
        // The host condition ensures only the owning host can overwrite it.
        $rs = ConnExecute("UPDATE sessions SET modified=$modified, host='$host', value='$value' WHERE sesskey='$key' AND host='$host'");
    }

    if ($rs !== NULL)
    {
        $rs->Close();
        return TRUE;
    }
    return FALSE;
}

function sess_gc( $maxlifetime )
{
    // Parenthesised so the subtraction is evaluated before the concatenation
    $rs = ConnExecute("DELETE FROM sessions WHERE modified < ". (time() - $maxlifetime) );
    if ($rs !== NULL)
    {
        $rs->Close();
        return TRUE;
    }
    return FALSE;
}

session_set_save_handler('sess_open', 'sess_close', 'sess_read', 'sess_write', 'sess_destroy', 'sess_gc');
?>


Code fragment 2: CheckDbState function

//
// This physically removes all data from a sample, assuming the state
// is set to STATE_UNCOMMITTED.
// Write locks on all involved tables are assumed.
//
function CleanupSample( $id )
{
    $rs = ConnExecute("SELECT ID FROM wavelengths WHERE sample=$id");
    if ($rs !== NULL)
    {
        $wids = array();
        while (!$rs->EOF)
        {
            $wids[] = $rs->Item(0);
            // Advance the cursor; without this the loop never terminates
            $rs->MoveNext();
        }
        $rs->Close();

        // Delete the dependent tuples first; each step only runs if the
        // previous one succeeded, and the sample itself is deleted last.
        if ((count($wids) == 0) || (ConnExecute("DELETE FROM matrices WHERE wavelength IN (". implode(",", $wids) .")") !== NULL))
        if (ConnExecute("DELETE FROM wavelengths WHERE sample=$id") !== NULL)
        if (ConnExecute("DELETE FROM articles WHERE sample=$id") !== NULL)
        if (ConnExecute("DELETE FROM images WHERE sample=$id") !== NULL)
        if (ConnExecute("DELETE FROM minerals WHERE ID=$id") !== NULL)
        if (ConnExecute("DELETE FROM size_distr WHERE ID=$id") !== NULL)
        {
            ConnExecute("DELETE FROM samples WHERE ID=$id");
        }
    }
}

//
// This function checks the state of the sample database
// Should the server crash during inserts, this function
// deletes the records that were being inserted.
// Make sure this gets called before any other function to ensure
// that the database is in a consistent state.
//
// This is because transactions were not an option at the time of
// development.
//
function CheckDbState()
{
    $rs = ConnExecute("LOCK TABLES samples WRITE, size_distr WRITE, wavelengths WRITE, matrices WRITE, images WRITE, minerals WRITE, articles WRITE");
    if ($rs === NULL)
    {
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }
    $rs->Close();

    //
    // First we check for uncommitted samples and size distributions
    //
    $rs = ConnExecute("SELECT ID, state, size_state FROM samples WHERE state = ".STATE_UNCOMMITTED." OR size_state = ".STATE_UNCOMMITTED );
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }

    while (!$rs->EOF)
    {
        if ($rs->Item("state") == STATE_UNCOMMITTED)
        {
            CleanupSample( $rs->Item("ID") );
        }
        else if ($rs->Item("size_state") == STATE_UNCOMMITTED)
        {
            ConnExecute("DELETE FROM size_distr WHERE ID = ". $rs->Item("ID") );
        }
        $rs->MoveNext();
    }
    $rs->Close();

    //
    // Then we check for uncommitted wavelengths
    //
    $rs = ConnExecute("SELECT ID FROM wavelengths WHERE state = ". STATE_UNCOMMITTED);
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }

    while (!$rs->EOF)
    {
        $id = $rs->Item(0);
        if (ConnExecute("DELETE FROM matrices WHERE wavelength=". $id ) !== NULL)
        {
            ConnExecute("DELETE FROM wavelengths WHERE ID=". $id );
        }
        $rs->MoveNext();
    }
    $rs->Close();

    //
    // And the same for images
    //
    $rs = ConnExecute("SELECT ID, ext FROM images WHERE state= ". STATE_UNCOMMITTED);
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }

    while (!$rs->EOF)
    {
        $pid = $rs->Item("ID");
        $ext = $rs->Item("ext");
        // Remove the image file first; only then delete its tuple
        if (unlink("dbimages/image$pid.$ext"))
        {
            ConnExecute("DELETE FROM images WHERE ID=". $pid );
        }
        $rs->MoveNext();
    }
    $rs->Close();

    //
    // And articles (just to be safe, but at the moment this is actually
    // impossible since articles are never inserted uncommitted)
    //
    $rs = ConnExecute("SELECT ID FROM articles WHERE state = ". STATE_UNCOMMITTED);
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }

    while (!$rs->EOF)
    {
        ConnExecute("DELETE FROM articles WHERE ID=". $rs->Item("ID") );
        $rs->MoveNext();
    }
    $rs->Close();

    ConnExecute("UNLOCK TABLES");
    return TRUE;
}


Code fragment 3: Database.php top

<?php
require_once('session.inc');
require_once('page.inc');

session_start();
ob_start();

AddAction("help.php?id=4", "Learn about the sample database");

if (ConnConnect())
{
    $user   = new User( $_SESSION['username']);
    $action = $_GET['action'];
    $sample = $_GET['sample'];
    $view   = $_GET['view'];

    // Check database state
    CheckDbState();

    if ($user !== NULL)
    {
        // Pre-read some rights
        $insertpriv = $user->HasRights( PRIV_INSERT );
        if ($insertpriv)
        {
            AddAction("database.php?action=add", "Add a new sample");
        }
    }

    if (($action != '') && ($user !== NULL) && (!$user->IsAnonymous()))
    {
        // Only authorized users may perform special actions
        if ($sample != 0)
        {
            // We're doing something on a sample
            if ($action == "approve1")
                ApproveSampleInsert( $user, $sample, TRUE );
            else if ($action == "approve2")
                ApproveSampleDelete( $user, $sample, TRUE );
            else if ($action == "approve3")
                ApproveSampleAddition( $user, $sample, TRUE );
            else if ($action == "reject1")
                ApproveSampleInsert( $user, $sample, FALSE );
            else if ($action == "reject2")
                ApproveSampleDelete( $user, $sample, FALSE );
            else if ($action == "reject3")
                ApproveSampleAddition( $user, $sample, FALSE );
            else if ($action == "delete")
                DeleteSample( $user, $sample );
            else if ($action == "restore")
                RestoreSample( $user, $sample );
            else if ($action == "chown")
                ChangeOwner( $user, $sample );
            else if ($action == "addsize")
                AddSizeDistribution( $user, $sample );
            else if ($action == "addfreq")
                AddFrequency( $user, $sample );
            else if ($action == "addarticle")
                AddArticle( $user, $sample );
            else if ($action == "addimage")
                AddImage( $user, $sample );
        }
        else if (($action == "add") && ($insertpriv))
        {
            AddSample( $user );
        }
    }
    else if ($sample != 0)
    {
        if ($view == 'sizedist')
        {
            // Show the size distribution
            ShowSizeDistribution( $user, $sample );
        }
        else
        {
            // Show the sample
            ShowSample( $user, $sample );
        }
    }
    else if ((int)$view != 0)
    {
        // Show a frequency
        ShowFrequency( $user, (int)$view );
    }
    else
    {
        // Browse the samples
        ShowSamples( $user );
    }
}

$output = ob_get_contents();
ob_end_clean();

require_once('header.inc');
?>
<style type="text/css"></style>
</head>
<?php
WriteBody( 1, "Sample Database", $output );
?>
</html>


Code fragment 4: Applet axis scaling and drawing

// The minValue, maxValue, minAngle and maxAngle are the minima and maxima
// of the vertical and horizontal axis, respectively. g is the Graphics
// instance of the back buffer and size is the size of the back buffer.
DecimalFormatSymbols dfs = new DecimalFormatSymbols();
dfs.setDecimalSeparator('.');
DecimalFormat df = new DecimalFormat();
df.setMaximumFractionDigits(6);
df.setDecimalFormatSymbols( dfs );
FontMetrics fm = g.getFontMetrics();

double miny, maxy, minx, maxx;
double step, range, scalex, scaley, tickSize;
int nTicks;

//
// Calculate and draw Y axis
//
step = 1;
if (!logarithmic)
{
    // We take as step the first power of 10 below the range
    step = Math.pow( 10, Math.floor( log10( maxValue - minValue ) ) - 1 );
}

// Round the maximum and minimum to whole steps
miny = Math.floor( minValue / step );
maxy = Math.ceil( maxValue / step );

// This loop calculates the best distance between two 'ticks'.
// It does so by continuously removing a tick from the maximum amount of
// ticks until the range fits entirely in a whole number of ticks.
// Should this not be possible for the current range, the range is
// increased with one step.
while (true)
{
    range = (int)maxy - (int)miny;
    nTicks = size.height / 25;
    while ((nTicks > 0) && (range % nTicks != 0))
    {
        nTicks--;
    }
    if (nTicks > 0)
    {
        break;
    }
    maxy++;
}

// Calculate some values we use
miny = miny * step;
maxy = maxy * step;
tickSize = size.height / (double)nTicks;
scaley = size.height / (maxy - miny);

// Draw the axis
boolean zero = false;
for (int i = 0; i <= nTicks; i++)
{
    double val = miny + i * range * step / nTicks;
    if (logarithmic)
    {
        val = Math.pow(10, val);
    }

    StringBuffer sb = new StringBuffer();
    df.format( val, sb, new java.text.FieldPosition(0) );
    String s = sb.toString();
    int y = graphY - (int)(i * tickSize);

    // Draw zero-line
    // We parse the resulting string instead of the double itself
    // to avoid floating point inaccuracies.
    if (Double.parseDouble( s ) == 0)
    {
        g.setColor( Color.lightGray );
        g.drawLine( graphX, y, graphX + size.width - 1, y );
        g.setColor( Color.black );
        zero = true;
    }

    g.drawString( s, graphX - 3 - fm.stringWidth(s), y + fm.getAscent() / 2 );
    g.drawLine( graphX, y, graphX + 5, y );
}

if ((!logarithmic) && (!zero) && (!showAverage))
{
    // We didn't draw the zero-line in the loop, because "0.0" was
    // not found, so now we draw it by calculating its position.
    int y = graphY - (int)(-minValue * scaley);
    g.setColor( Color.lightGray );
    g.drawLine( graphX, y, graphX + size.width, y );
    g.setColor( Color.black );
}

//
// Calculate and draw X axis, same principle as Y axis
//
step = 5;
minx = Math.floor( minAngle / step );
maxx = Math.ceil( maxAngle / step );
while (true)
{
    range = (int)maxx - (int)minx;
    nTicks = size.width / 50;
    while ((nTicks > 0) && (range % nTicks != 0))
    {
        nTicks--;
    }
    if (nTicks > 0)
    {
        break;
    }
    maxx++;
}

minx = minx * step;
maxx = maxx * step;
tickSize = size.width / (double)nTicks;
scalex = size.width / (maxx - minx);

for (int i = 0; i <= nTicks; i++)
{
    String s = String.valueOf( (int)(minAngle + i * range * step / nTicks) );
    int x = graphX + (int)(i * tickSize);
    g.drawString( s, x - fm.stringWidth(s) / 2, graphY + 3 + fm.getAscent() );
    g.drawLine( x, graphY, x, graphY - 5 );
}


Code fragment 5: Number class

/*
 * This class represents a number and its error.
 * It has functions that properly deal with error propagation.
 * The error propagation for a function f is:
 *   error_f = sqrt( error_x^2 * (df/dx)^2 + error_y^2 * (df/dy)^2 )
 *
 * For addition and subtraction:
 *   error_f = sqrt( error_x^2 + error_y^2 )
 *
 * For division and multiplication:
 *   error_f/f = sqrt( (error_x/x)^2 + (error_y/y)^2 );
 *
 * For natural logarithm:
 *   error_f = error_x / x
 *
 * Also see: http://mathworld.wolfram.com/ErrorPropagation.html
 */
public class Number
{
    private double value;
    private double error;

    public double getValue() { return value; }
    public double getError() { return error; }

    public void negate()
    {
        this.value = -this.value;
    }

    public void add( Number y )
    {
        this.value += y.getValue();
        this.error = Math.sqrt( this.error * this.error + y.getError() * y.getError() );
    }

    public void subtract( Number y )
    {
        this.value -= y.getValue();
        this.error = Math.sqrt( this.error * this.error + y.getError() * y.getError() );
    }

    public void divide( Number y )
    {
        this.error = Math.sqrt( Math.pow(this.error / this.value, 2) + Math.pow(y.getError() / y.getValue(), 2) );
        // The error is now the relative error (error_f/f)
        this.value /= y.getValue();
        // And now absolute; the magnitude is used so the error stays non-negative
        this.error *= Math.abs(this.value);
    }

    public void multiply( Number y )
    {
        this.error = Math.sqrt( Math.pow(this.error / this.value, 2) + Math.pow(y.getError() / y.getValue(), 2) );
        // The error is now the relative error (error_f/f)
        this.value *= y.getValue();
        // And now absolute; the magnitude is used so the error stays non-negative
        this.error *= Math.abs(this.value);
    }

    public void log10()
    {
        // log10(x) = ln(x) / ln(10), so its derivative is 1 / (x ln 10):
        // the ln-based error (error_x / x) must also be divided by ln(10).
        this.error /= this.value * Math.log(10);
        this.value = Math.log( value ) / Math.log(10);
    }

    public Number( Number n )
    {
        this.value = n.value;
        this.error = n.error;
    }

    public Number( double value, double error )
    {
        this.value = value;
        this.error = error;
    }
}
