Developing an online database for the UvA light scattering experiment
for the Bachelors Degree in Computer Science
by
M. Lankamp
June 16th, 2004
Supervisors:
H. Volten
I. Bethke
Abstract
At the astronomical institute “Anton Pannekoek” of the University of Amsterdam (UvA),
researchers used an online database to publish the data of their light scattering experiment.
This database was not designed to handle large amounts of data: it consisted entirely of
static pages, which had to be rewritten whenever new data were entered. Because large amounts
of data were expected to be entered soon, the system had to be improved. The new system had to
be capable of handling the expected amounts of data and make the data easier to use for fellow
researchers. This thesis describes the development of the system that was created to meet
these demands. To this end, the system employs server-side scripts, a database, applets and
binaries.
Table of Contents
1. INTRODUCTION
2. BACKGROUND
3. REQUIREMENTS AND SOLUTIONS
    3.1. DYNAMIC EDITING, SEARCHING AND VIEWING
    3.2. ACCESS CONTROL
    3.3. QUALITY CONTROL
    3.4. MISCELLANEOUS
4. CLIENT SIDE
    4.1. BROWSER COMPATIBILITY
    4.2. APPLETS
5. SERVER SIDE
    5.1. TRANSACTIONS
    5.2. SESSIONS
    5.3. ACCESS CONTROL
    5.4. SCRIPT LAYOUT
    5.5. BINARIES
6. FUTURE WORK
7. CONCLUSION
APPENDIX A: ERROR PROPAGATION FORMULAS
APPENDIX B: CODE FRAGMENTS
    CODE FRAGMENT 1: SESSION HANDLING FUNCTIONS
    CODE FRAGMENT 2: CHECKDBSTATE FUNCTION
    CODE FRAGMENT 3: DATABASE.PHP TOP
    CODE FRAGMENT 4: APPLET AXIS SCALING AND DRAWING
    CODE FRAGMENT 5: NUMBER CLASS
1. Introduction
At the astronomical institute “Anton Pannekoek” of the University of Amsterdam (UvA)
research is being conducted on the optical properties of very small ‘dust’ particles, which
occur in many places in the cosmos, such as in planetary atmospheres or around stars. A
unique scientific experiment is available in which the light scattering of these particles is
measured. The scientists involved want to make this data available in an electronic form to
stimulate the use of the data in future research. By placing it on the web the data becomes
easily accessible to scientists all over the world. At the moment they have opted for a very
simple solution. However, the amount of data is expected to increase considerably over the
next couple of years, among other reasons because a new light scattering facility is currently
being built by Spanish collaborators. This increase calls for a substantially improved online
database.
This thesis describes the system that was developed to meet these needs. The system makes it
possible for users all over the world to add measured data from experiments to the database
and to delete this data. It becomes possible to search the database for information concerning
particular samples. The system also has the ability to display the data obtained from these
searches in graphical representations. Because of the desired quality control over added data,
the system does not allow everybody to add data. Instead, only privileged users are able to
add data. In addition, the data will only be accepted into the system after approval from
another privileged user. To enforce these privileges, the system implements user accounts.
Apart from personal information about the user, these accounts hold rights to indicate what
the user can and cannot do.
2. Background
To understand the choices made during the development of the system, it is important to
understand what kind of data the system will handle, what their meaning and importance are,
and how the different kinds of data are related.
The setup of the experiment with which the main data are obtained consists of a horizontal
ring on which a detector can move. A vertical stream of sample particles is located in the
center of this ring. A laser emits a beam of light in the same plane as the ring. This laser
light is incident on the stream and part of it is scattered by the particles. The detector moves
around the ring to measure the amount and polarization properties of this scattered light at
various angles¹. In this way a light scattering matrix is obtained: a 4x4 matrix, traditionally
denoted by F, whose 16 elements Fij describe the transformation from the incident light to
the scattered light. This matrix depends on the type of particles, the angle at which it was
measured, and the wavelength of the laser light. Researchers are interested in the values of the
elements of these matrices in relation to the angles and wavelength at which they were
measured.
To clarify this relationship, the data are often plotted in a
graph. In such a graph, the angle is represented on the
horizontal axis and the value of the matrix element on the
vertical axis. Figure 2-1 shows such a graph, displaying the
negated matrix element F12 divided by the matrix element
F11 from the sample Feldspar for 37 angles ranging from 5
to 173 degrees, using laser light with a wavelength of 441.6
nm. The matrix element is divided by F11 because
researchers are most interested in the properties of the
scattered light with respect to the intensity of the total
scattered light. This intensity is represented by F11. The
quantity −F12/F11 can be interpreted as the degree of linear polarization for unpolarized
incident light, which can also be observed for astronomical objects such as comets. Note also
that, because the values are measured, they have a measurement error. This is displayed in
the graph by vertical bars through each point indicating the size of this error.
¹ Usually only half a circle is measured because the scattering of the light is symmetric about the axis of the original light beam.
Figure 2-1: The scattering matrix element −F12/F11 for the sample Feldspar at 441.6 nm.
To know what kind of particles a sample is made of is of particular interest to researchers
since this will enable them to relate the properties of the scattered light to the size and shape
of the particles. A size distribution is used to present the size information. It is obtained by
measuring, for example, the volumes of particles as a function of (equivalent sphere) radius.
These measurements result in a specific size distribution called a volume distribution. This
distribution can be plotted in a graph, with the radius² displayed on the horizontal axis and the
effective volume on the vertical axis. It is also useful to calculate the particle number and
surface area distribution from the volume distribution.
Figure 2-2: The size distributions for the Feldspar sample
Figure 2-2 shows such size distributions for the Feldspar sample. The displayed distributions
are denoted (as is usual) with V for the volume distribution, S for the surface distribution and
N for the number distribution. Note that the vertical axis is dimensionless because all three
distributions have been normalized so that the area under each curve is 1.
Further, images are provided to give the visitor an idea of the shape of the particles and the
color of the sample.
It is therefore conceptually best to view a sample as a container for scattering matrices
grouped by wavelength, size distributions, images, article references and additional metadata:
where the data originated, properties such as sample color and particle shape and, of course,
the sample’s name. Article references are provided to enable visitors to find the articles in
which the data were originally published.
² It is common to plot the radius on a logarithmic scale since this improves readability of the graph. The values on the vertical axis are then, of course, adjusted appropriately.
3. Requirements and solutions
As the new system was designed to improve upon the old system, various requirements were
set on the functionality of the system. These requirements are listed below and for each of the
requirements the adopted solution is discussed.
3.1. Dynamic editing, searching and viewing
The most important improvement needed was the ability to add data dynamically,
without manually rewriting any of the website sources. It should be possible for users of the
website to point-and-click their way through the addition procedure. This meant that the
website would have to be written in a server-side scripting language and the data would have
to be stored on the server. PHP is a good server-side scripting language for this problem, but
the choice was also imposed: it was the only scripting language supported on the target web
server. For data storage, a relational database management system was the obvious choice
given the complex relations between the components of the system. MySQL in particular was
chosen because it too was imposed and because it is often used in combination with PHP, a
combination that has proven reliable. The interface between the user and the system would be
web pages with forms to submit the data the users want to add. This method was chosen because
it has proven itself on many other websites.
Because the database was created to hold large numbers of samples, it was necessary to
implement a search function. With this search function visitors can search for samples by
entering the criteria for the samples they wish to browse. This search function was easy to
realize since the underlying system already used SQL. The search function simply extended
the query used to retrieve all samples by including a where-clause that limited the samples to
the criteria entered by the visitor.
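The way a where-clause extends the base query can be sketched as follows. This is a minimal illustration in Java (the language of the applets), not the PHP code of the website: the table name `samples`, the column names and the `SearchQuery` class are assumptions made for the example. Placeholders are used instead of pasting the entered text into the SQL string, which is a common precaution against SQL injection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds the sample query and the list of values for its placeholders.
final class SearchQuery {
    final String sql;
    final List<String> parameters;

    private SearchQuery(String sql, List<String> parameters) {
        this.sql = sql;
        this.parameters = parameters;
    }

    // criteria maps a column name to the text entered by the visitor.
    // The column names must come from the application itself, never from the visitor.
    static SearchQuery forCriteria(Map<String, String> criteria) {
        StringBuilder sql = new StringBuilder("SELECT * FROM samples");
        List<String> parameters = new ArrayList<>();
        for (Map.Entry<String, String> entry : criteria.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.trim().isEmpty()) {
                continue; // an empty field does not restrict the search
            }
            // The first criterion opens the where-clause, later ones narrow it further.
            sql.append(parameters.isEmpty() ? " WHERE " : " AND ");
            sql.append(entry.getKey()).append(" LIKE ?");
            parameters.add("%" + value.trim() + "%");
        }
        return new SearchQuery(sql.toString(), parameters);
    }
}
```

With no criteria the query is the unrestricted one that retrieves all samples; every filled-in field adds one condition.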
The system had to be able to reproduce the graphs that are used by researchers to display the
matrix elements belonging to a sample. There are, however, various ways to present these
graphs. In certain cases the graph might be negated, as seen for the −F12/F11 example in Figure
2-1 in the previous chapter. Also, some researchers do not plot against the angle itself but
against its supplement, a different representation of the angle that is calculated by
subtracting the angle from 180 degrees. It is also interesting for researchers to have the
plots of the matrix elements of several samples shown in one graph. This allows them to easily
analyse the differences and similarities of the samples. Following this, it is also useful for
the researchers to view the average of these plots.
One possible solution could have been to create a script that dynamically generates the graph
using PHP’s GD module. Which samples to draw and which options such as negate, average,
invert angle and show errors to apply, could be passed as GET parameters. However, this
would result in visitors sending page requests for every kind of graph they wish to view. This
would be very cumbersome since visitors would be likely to switch options and matrix
elements to try out different perspectives.
So instead, Java applets were chosen as the solution. Such an applet would be passed the
sample data in a single page request after which the applet would present the user with the
graph and options to change that graph. Once different options, samples or matrix elements
are selected, the graph is immediately updated accordingly. This solution would improve both
speed and usability of this tool. To enable researchers to still use the generated graphs for
other purposes, the graphs are downloadable as standard images. This has been implemented
by creating a button on the applet which would redirect the visitor’s browser to a script that
generates the graph they have currently selected in the applet.
To view the size distributions of a sample, the same reasoning and solution were used.
Here there is only one set of data, and the options are a choice between displaying N(log r),
V(log r), S(log r), n(r), v(r) and s(r). Once again, the generated graphs can be downloaded by
pressing a button on the applet.
3.2. Access control
Access control on the system is enforced by user accounts. The main reason for introducing
these was that not everyone should be allowed to add data; only a select group of
people should be able to add and delete data. This meant that these people would have to be
authorized by the system in such a way that the system is able to determine that
the current page request comes from such an authorized person.
To this end user accounts were implemented: each person who needs authorized access to the
system is given a username, uniquely identifying him or her, and a password, preventing
anyone else from using his or her account. User accounts in particular were chosen because
they are a standard solution for this sort of problem, used in many other systems. This means
that the concept will most likely be understood by everyone using the system, especially since
most, if not all, users are from scientific institutes where they have to log into
workstations using usernames and passwords.
The solution of user accounts presented the new problem of determining who should be able
to create or delete accounts when new users want to join the system or existing users
want to leave. One option would be a single, undeletable user who performs these
operations. However, if many users join and leave, this would place a heavy burden on that
single person, who would have to process all of these requests. It was therefore chosen to let
that one user assign the same right to other users. With this feature, the special user can
appoint other users as user creators, and the load can be distributed over these users. To
implement this feature, user rights were introduced: every user has a set of rights describing
what he or she can and cannot do in the system. To identify the special user, it was
named ‘root’. This name is taken from the UNIX operating system, since the users of the
system were likely to be familiar with it and could thus more easily understand the
relationship between the users in the system.
In the interest of scalability, this rights mechanism was extended to allow the root to
appoint users who can in turn assign rights to other users. These ‘granters’ can be added to
relieve the root of his or her granting tasks, should he or she feel the need for this. This
addition meant that the root could assign both normal rights and rights to grant rights to
users. Because this mechanism is recursive, the entire rights system was generalized by
dividing each right into three ‘levels’ named L0 to L2. The level 0 rights
are the actual rights that enable users to perform tasks on the data in the system.
The level 1 and level 2 rights are the rights to grant level 0 and level 1 rights respectively,
or, more formally, the L(n) right is the right to grant the L(n-1) right for n > 0. This meant
that for every right two values had to be stored, one for level 0 and one for level 1. The level
2 rights did not have to be stored explicitly, since they are implied by the username: only
the root has these rights. Because users with granting rights should only be able to
grant rights to other users and not to themselves, the latter possibility has been explicitly
disabled. Also, all operations on the root user have been disabled, since this user
should always exist with all possible rights.
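The level rule described above can be sketched in a few lines. The sketch below is in Java and is only an illustration: the class names `Account` and `Rights`, the method names and the four right names in the `Right` enum are assumptions, since the actual rights are stored in the database and handled by the PHP scripts.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical right names; the actual system has its own list of rights.
enum Right { ADD_DATA, DELETE_DATA, APPROVE_ADDITIONS, APPROVE_DELETIONS }

final class Account {
    final String name;
    final Set<Right> level0 = EnumSet.noneOf(Right.class); // rights to act on the data
    final Set<Right> level1 = EnumSet.noneOf(Right.class); // rights to grant level 0 rights

    Account(String name) {
        this.name = name;
    }

    boolean isRoot() {
        return name.equals("root");
    }

    // True if this account holds the given right at the given level (0, 1 or 2).
    boolean holds(Right right, int level) {
        if (level < 0 || level > 2) {
            return false;
        }
        if (isRoot()) {
            return true; // the root always has every right at every level
        }
        if (level == 0) {
            return level0.contains(right);
        }
        if (level == 1) {
            return level1.contains(right);
        }
        return false; // level 2 is implied by the username and never stored
    }
}

final class Rights {
    // L(n) is the right to grant L(n-1), so granting at a level requires holding level + 1.
    static boolean canGrant(Account granter, Account target, Right right, int level) {
        if (level < 0 || level > 1) {
            return false; // only level 0 and level 1 rights can be granted
        }
        if (target.isRoot()) {
            return false; // operations on the root are disabled
        }
        if (granter.name.equals(target.name)) {
            return false; // users cannot grant rights to themselves
        }
        return granter.holds(right, level + 1);
    }

    // Applies the grant if it is allowed and reports whether it was applied.
    static boolean grant(Account granter, Account target, Right right, int level) {
        if (!canGrant(granter, target, right, level)) {
            return false;
        }
        if (level == 0) {
            target.level0.add(right);
        } else {
            target.level1.add(right);
        }
        return true;
    }
}
```

In this sketch the root can make a user a granter with `Rights.grant(root, alice, Right.ADD_DATA, 1)`, after which that user can hand the level 0 right to others but not to herself.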
Displaying these rights could be easily done in a tabular representation where each row
represents a single right and each column a level.
Figure 3-1: Example of user rights
Figure 3-1 shows an example of user rights in such a tabular manner as presented on the
website. The system currently uses seven types of rights. As can be seen, this user can add
data, delete his or her data and grant both approve rights to other users. The precise meaning
and function of these approve rights are discussed in the next section.
3.3. Quality control
All data added to the database are meant to be viewed by the international
scientific community. This means that data should not be added without some form of
peer review. This quality control is enforced by hiding changes to the data from unauthorized
visitors until an authorized user has approved the changes. It is conceivable, however, that the
system will host users who should not be allowed to approve these changes. To prevent
these users from doing so, two new rights were introduced to the previously discussed rights
system: approve additions and approve deletions. Only users with these rights are able to
approve additions and deletions, respectively. Any additions or deletions made by other users³
remain hidden from visitors of the site. For users who are privileged to approve these
alterations, however, the samples in question carry an added remark indicating what has
changed. These users can then view the changes and approve or reject them. Of course, the
system does not allow users to approve their own changes, even if they hold both the alteration
right and the approve right, since that would defeat the purpose of peer review.
³ These additions or deletions are, of course, only allowed if the users have the appropriate rights.
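The approval rule of this section comes down to two checks, which can be sketched as follows. The Java class `PendingChange` and its method are hypothetical names chosen for the example; the website itself performs these checks in its PHP scripts.

```java
// Hypothetical model of a change that is waiting for approval.
final class PendingChange {
    enum Kind { ADDITION, DELETION }

    final Kind kind;
    final String author; // username of the user who made the change

    PendingChange(Kind kind, String author) {
        this.kind = kind;
        this.author = author;
    }

    // A user may approve or reject a change only if he or she holds the matching
    // approve right and did not make the change.
    static boolean mayApprove(String username, boolean approveAdditions,
                              boolean approveDeletions, PendingChange change) {
        if (username.equals(change.author)) {
            return false; // nobody reviews his or her own change
        }
        return change.kind == Kind.ADDITION ? approveAdditions : approveDeletions;
    }
}
```

The author check comes first, so holding both the alteration right and the approve right never allows self-approval.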
3.4. Miscellaneous
The requirements discussed in the previous sections make up the core of the system. But
there were additional requirements that, although not essential to the working of the database,
provided various other benefits.
A guestbook was one of these requirements. It was felt that a guestbook would increase
participation by fellow researchers by enabling them to post remarks about the samples,
recommendations, links to other interesting websites and much more. This guestbook was
implemented using a straightforward table with a single tuple per guestbook entry. Visitors
can supply their name, the message and, optionally, an e-mail and website address. Besides the
time and date of the entry, the IP address of the visitor who posted the entry
is also stored. The purpose of this IP address is to provide a way for the root to track abusers
of the guestbook; for privacy reasons it is hidden from everyone except the root.
A help system was also implemented to increase the user friendliness of the system. This help
system contains various help texts explaining how the system works and what visitors should
and can do in certain situations. A search function and references to other help topics are
also implemented to help the user navigate the help topics. Although it was very unlikely that
the help topics would ever need to be changed once they were entered in the database, a
mechanism was implemented that allows the root to add and delete help topics, change
existing topics and add or remove topic references.
4. Client side
The developed system is a website, so the development can be split into two areas: the client
side and the server side. The client side will cover everything that runs on or is interpreted by
the software of the client while the server side will cover everything that runs on or is
interpreted by software on the server. This chapter will discuss the problems and issues that
have been encountered on the client side and the solutions that have been implemented for
them.
4.1. Browser compatibility
To make the website compliant with modern day standards, XHTML 1.1 was chosen as the
Document Type Definition, combined with CSS. This solution was chosen because it was the
latest mark-up language standard at the creation of the system and it still is compatible with
browsers using the older HTML. Unfortunately, some of the older browsers that had to be
supported did not support CSS. To partially counter this problem, the website was constructed
in a way that left the site usable when viewed without full CSS support. As an example,
consider the main page menu of the website. Figure 4-1 shows this menu rendered on a
browser with correct CSS support while figure 4-2 shows the same menu when rendered in an
older browser.
Figure 4-1: Menu rendered with correct CSS support
Figure 4-2: Menu rendered with no CSS support.
As can be seen, by using certain XHTML and CSS constructs, the menu is visible and usable in
both older and newer browsers. This does not, however, hold for every older browser. Some of
these, like Internet Explorer 4 and Netscape 4, have incorrect CSS support, which results in
parts of the page being rendered wrongly or even left out completely. The only way to solve
this problem would be to abandon XHTML and CSS completely and write the website in HTML 4
without CSS. This presented a choice between the newer and theoretically better solution
and the older and better supported solution. After careful consideration it was decided to use
XHTML and CSS instead of HTML 4. This decision was based on the fact that XHTML and CSS are a
better solution with respect to future development in the area of the World Wide Web than
HTML. For the visitors of the website who use older browsers with incorrect CSS support, a
help topic has been created, explaining that the website might not render correctly on their
browser and that they are advised to upgrade it. This is, of course, only helpful when the
site renders correctly enough for the visitors to read the help text.
4.2. Applets
The system features two kinds of applets, as described in section 3.1: an applet for viewing
the scattering matrix graphs of one or more samples, and an applet for viewing the size
distributions graph of a sample. Both applets work according to the same principle, but have
slightly different calculations and GUI elements depending on their function.
Both applets use the init() method to read the data from the HTML parameters and initialize
the GUI elements. Should any step of this process fail, a flag will be set. The paint method of
the applet will check this flag and draw an error message if it is set. Otherwise, when
everything was loaded and initialized correctly, it calls the paint method of the child elements.
The paint method of the Graph class, responsible for drawing the graph, first determines the
scales of both axes by using the maximum and minimum values of the horizontal and vertical
axis. This algorithm differs slightly between the applets because one applet plots the radius
on the horizontal axis, which is preferably scaled in units of 0.1 µm, or any power of 10. The
other applet, however, plots the angle on the horizontal axis, which might be better scaled
using units of 5 degrees. Code fragment 4 in appendix B is the axis scaling and drawing
algorithm of the latter applet.
As explained in chapter 2, the values of the scattering matrices have errors associated with
them. When calculating with these values, the errors must be propagated using the error
propagation rules (see appendix A for these formulas). To this end, a Number class has been
constructed to represent this value and error pair. Methods were created to represent the basic
operations such as addition, subtraction, division, multiplication and logarithms. These
methods adjust the values in the class while applying the appropriate error propagation
formulas. Code fragment 5 in appendix B shows this class.
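To illustrate the idea of a value and error pair, the sketch below shows a small Java class with a few operations. It is not the Number class of code fragment 5: the name `Measured` and its methods are made up for this example, and it assumes the common rules for independent errors (errors combined in quadrature), which may differ from the formulas in appendix A.

```java
// Hypothetical value-with-error type, assuming independent errors combined in quadrature.
final class Measured {
    final double value;
    final double error; // absolute error of the value, stored as a non-negative number

    Measured(double value, double error) {
        this.value = value;
        this.error = Math.abs(error);
    }

    // Negation changes the sign of the value but not the size of the error.
    Measured negate() {
        return new Measured(-value, error);
    }

    // Sum and difference: the absolute errors add in quadrature.
    Measured plus(Measured o) {
        return new Measured(value + o.value, Math.hypot(error, o.error));
    }

    Measured minus(Measured o) {
        return new Measured(value - o.value, Math.hypot(error, o.error));
    }

    // Product a*b: error sqrt((b*da)^2 + (a*db)^2), which stays defined when a or b is zero.
    Measured times(Measured o) {
        return new Measured(value * o.value, Math.hypot(o.value * error, value * o.error));
    }

    // Quotient a/b: error sqrt((da/b)^2 + (a*db/b^2)^2).
    Measured dividedBy(Measured o) {
        double quotient = value / o.value;
        return new Measured(quotient, Math.hypot(error / o.value, quotient * o.error / o.value));
    }

    // Base 10 logarithm: error da / (|a| * ln 10).
    Measured log10() {
        return new Measured(Math.log10(value), error / (Math.abs(value) * Math.log(10.0)));
    }
}
```

With such a class, a plotted quantity like −F12/F11 would be obtained as `f12.negate().dividedBy(f11)`, where `f12` and `f11` are hypothetical `Measured` instances for one angle.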
One of the problems found was that after determining the unit size on an axis, values might
not be properly displayed due to rounding errors. For example, when the vertical axis ranges
from -0.3 to 0.2 with a unit size of 0.1, the resulting values on the vertical axis could be
displayed as -0.3, -0.199999999, -0.099999999, 0, 0.100000003 and 0.2. These errors were
caused by floating point round-off in the calculations. To compensate for this problem,
Java’s DecimalFormat class was used to print the numbers with a fixed number of
decimals. This solution eliminated the display of the rounding errors. The same principle was
applied to the horizontal axis as well.
Another problem, found during tests of the size distribution applet, was that the applet
appeared to flicker. With the size distribution applet, the graph needed to be repainted
whenever the user moved the mouse over it. This, combined with non-buffered painting, resulted
in the flickering. The solution was to create a back buffer: an instance of the Image class in
which all the painting is done and which is copied to the canvas in the paint method. Because
the finished image is copied in a single operation, the flickering was removed. This solution
was also applied to the scattering matrix applet.
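The fix for the axis labels described above can be sketched as follows. The class and method names are hypothetical and the snippet is not taken from the applets; it only shows how DecimalFormat with a fixed number of decimals hides the round-off in the printed value.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper that prints axis values with a fixed number of decimals.
final class AxisLabels {
    static String label(double value, int decimals) {
        // Locale.ROOT fixes the decimal separator to '.' regardless of the system locale.
        DecimalFormat format = new DecimalFormat("0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        format.setMinimumFractionDigits(decimals);
        format.setMaximumFractionDigits(decimals);
        return format.format(value);
    }
}
```

For a unit size of 0.1 one decimal is enough, so `AxisLabels.label(-0.199999999, 1)` gives "-0.2" and `AxisLabels.label(0.100000003, 1)` gives "0.1".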
5. Server side
This chapter will discuss the problems and issues that have been encountered on the server
side and the solutions that have been implemented for them.
5.1. Transactions
Data integrity is an absolute must for the system. In the case of server failure or script
abortion, there should be no incomplete sample additions or deletions. Given that a
database management system was used, the obvious solution was to use transactions, which
ensure exactly the data integrity that was needed. Unfortunately, the server on which the
system was to run used MySQL 3.23.54 without support for transactions.
However, once data was added to the system it was not allowed to be altered because it
involved officially reviewed data. This meant that updates involving multiple tables were not
required, which reduced the transactions to insertions and deletions. These
could easily be made transactional by including a state column: a two-valued
column describing whether the tuple in question has been committed or not. By applying
this change to the table designs and adding the appropriate code to the scripts, transactions
could be emulated with the same guarantee of data integrity.
Because most tables already had a state column with possible values of 0 or higher to indicate
if that item was already approved or not (see also section 3.3), it was chosen to extend
these values to include -1 for the state ‘uncommitted’. Then, when an insertion involving
multiple tables is required, the tuple in the main table is inserted with the state column set to
the value -1. Then all other tuples in the other tables are inserted with normal states (0 or
higher, depending on the situation). Finally, when all these tuples are successfully inserted,
the first inserted tuple’s state column is updated to the ‘committed’ state (0 or higher).
This algorithm ensures that the state of the first tuple is 0 or higher if and only if all tuples
depending on it have been successfully inserted. Should the transaction be interrupted because
the server crashes or the user cancels the page request, the sample tuple remains in the table
with its state set to the value -1. The next time someone sends a page request to the web
server, the script on that page calls a routine that checks the state of the database and rolls
back any unfinished transactions. It does this by selecting all sample tuples with a state value
of -1 and deleting all tuples in the tables that depend on those samples. Only when
these deletions have been successfully completed does it delete the sample tuple itself. After
this routine, the database has been returned to its state before the insertion and the page can
resume as normal. Code fragment 2 in appendix B shows this routine.
Transactions on deletions are achieved by first setting the state of the main tuple to -1. After
this, all tuples depending on the main tuple are deleted after which the main tuple itself is
deleted. This algorithm uses the assumptions of the insertion algorithm mentioned above.
Namely, should the transaction be interrupted after the main tuple’s state has been set to -1, it
will seem as if an insertion was interrupted and the routine mentioned above will roll back the
failed ‘insertion’ on the next page request. This rollback then finishes what was interrupted
before, namely deleting the tuples.
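The insertion, deletion and rollback steps of this section can be tried out with a small in-memory model. The Java sketch below is only such a model and not the PHP routine of code fragment 2: the class `StateColumnStore`, its two "tables" and the simulated crashes are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a main sample table with a state column and one dependent table.
final class StateColumnStore {
    static final int UNCOMMITTED = -1;

    final Map<Integer, Integer> sampleState = new HashMap<>(); // sample id -> state
    final List<int[]> dependentRows = new ArrayList<>();       // rows of {sample id, payload}

    // Insertion: main tuple first with state -1, dependent tuples next, commit last.
    // crashAfterRows simulates an interruption once that many dependent rows are stored
    // (a negative value means no interruption).
    void insertSample(int id, int[] payloads, int committedState, int crashAfterRows) {
        sampleState.put(id, UNCOMMITTED);
        for (int i = 0; i < payloads.length; i++) {
            if (i == crashAfterRows) {
                return; // simulated crash in the middle of the insertion
            }
            dependentRows.add(new int[] { id, payloads[i] });
        }
        if (crashAfterRows == payloads.length) {
            return; // simulated crash just before the commit
        }
        sampleState.put(id, committedState); // the single update that commits
    }

    // Deletion: mark the main tuple as uncommitted, then delete dependents and the tuple.
    void deleteSample(int id, boolean crashAfterMarking) {
        sampleState.put(id, UNCOMMITTED);
        if (crashAfterMarking) {
            return; // simulated crash; the rollback routine will finish the deletion
        }
        removeSampleAndDependents(id);
    }

    // Called at the start of every page request: removes everything still at state -1.
    void rollBackUnfinished() {
        List<Integer> unfinished = new ArrayList<>();
        for (Map.Entry<Integer, Integer> entry : sampleState.entrySet()) {
            if (entry.getValue() == UNCOMMITTED) {
                unfinished.add(entry.getKey());
            }
        }
        for (int id : unfinished) {
            removeSampleAndDependents(id);
        }
    }

    private void removeSampleAndDependents(int id) {
        dependentRows.removeIf(row -> row[0] == id); // dependent tuples first
        sampleState.remove(id);                      // main tuple last
    }
}
```

An interrupted insertion and an interrupted deletion leave the same trace, a main tuple with state -1, which is why one rollback routine handles both cases.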
5.2. Sessions
Users who are allowed to alter data in the system must be authenticated. To avoid repeating
this authentication on every page request, or passing the username and password
in the URI of each page (where they have a much higher chance of being intercepted or
decrypted), the system must remember that a user has logged in. Also, since visitors of the
website can add samples to their selection, this selection must be remembered across page
requests. Server sessions are the ideal mechanism for implementing these requirements.
However, instead of simply using PHP’s default session handler functions, the website uses
PHP’s session handler override function to define a custom set of session handler functions.
This has been done because PHP’s default session handler functions store the data
unencrypted in a publicly readable file in a temporary directory (on UNIX, this is usually /tmp),
whose filename literally contains the session identifier. This way of handling sessions
presents a considerable security risk: anyone authorized to read the temporary directory of
the system (usually every user that can log in) could find out the session identifier of
a new session by monitoring the files in that directory. That person could then send custom page
requests to the server using that session identifier. This way, unauthorized users could pose
as logged-in users and perform otherwise unauthorized actions on the data.
The custom session handlers defined by the system store the session data in a table in the
database and use additional access control to prevent anyone using stolen session identifiers.
This access control mechanism consists of comparing the host address where the page request
originated from to the stored host address for the session. Only if these match, i.e. the page
request came from the same host that created the session, will the session data be read or
written. Using these techniques, unauthorized users cannot find out which session identifiers
are currently in use, because even read access to the database is restricted. And should they
acquire a usable session identifier, e.g. by eavesdropping on the connection of an authorized
user, they would not be able to use it from another host. Code fragment 1 in appendix B shows
these session handling functions.
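The host check described above can be illustrated with a small in-memory model. This is a sketch in Java, not the PHP handlers of code fragment 1: the class HostBoundSessionStore, its map and its method names are hypothetical stand-ins for the sessions table and the sess_read, sess_write and sess_gc functions.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the sessions table (hypothetical; the site uses a database table).
class HostBoundSessionStore {
    private static final class Entry {
        final String host;   // address of the host that created the session
        String value;        // serialized session data
        long modified;       // time of the last write, in seconds

        Entry(String host, String value, long modified) {
            this.host = host;
            this.value = value;
            this.modified = modified;
        }
    }

    private final Map<String, Entry> sessions = new HashMap<>();

    // Returns the session data, or "" when the key is unknown or the host differs.
    String read(String key, String host) {
        Entry e = sessions.get(key);
        if (e == null || !e.host.equals(host)) {
            return "";
        }
        return e.value;
    }

    // Creates the session on the first write; later writes must come from the same host.
    boolean write(String key, String host, String value, long now) {
        Entry e = sessions.get(key);
        if (e == null) {
            sessions.put(key, new Entry(host, value, now));
            return true;
        }
        if (!e.host.equals(host)) {
            return false;
        }
        e.value = value;
        e.modified = now;
        return true;
    }

    // Removes all sessions that have not been written for more than maxLifetime seconds.
    void gc(long now, long maxLifetime) {
        sessions.values().removeIf(e -> e.modified < now - maxLifetime);
    }
}
```

A request that presents a valid session identifier from a different host address is treated exactly like a request with an unknown identifier: it reads empty session data and cannot overwrite the stored data.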
5.3. Access control
Access control for the system is realized by assigning accounts to every user who is intended
to use the website. This account holds the username, password hash, personal information and
access rights.
To incorporate the user accounts into the scripts, a User class has been constructed. An instance
of this class is created with the name of the user it should represent, or with an empty string for
anonymous users. On creation, the instance retrieves the user’s tuple from the users table of the
database; it defines several functions that are used to query this tuple. The most important of these
functions is the HasRights function, which checks whether the user has any or all of
the rights passed to it. The scripts use this function to determine whether the user is
allowed to perform certain actions. After the correct function has been determined by
examining the HTTP GET arguments, that function is passed the User class instance for the
currently logged in user (also see paragraph 5.4). The function checks whether that user has
the rights to perform the requested action. If this is not the case, the function prints an error
describing the lack of rights and exits.
Since the username of the currently logged in user is stored in the session data, which cannot
be forged (see paragraph 5.2), it is guaranteed that a user cannot do anything that his or her
rights prevent.
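The "any or all" check performed by HasRights can be sketched as follows. The sketch is written in Java and assumes that rights are stored as a bit mask, which the text does not state. The constant PRIV_INSERT is named in code fragment 3, but its value here, the other two constants, the class UserRightsSketch and the guarded deleteSample function are all hypothetical.

```java
// Hypothetical model of a user account with a bit mask of rights.
class UserRightsSketch {
    static final int PRIV_INSERT  = 1;  // name from code fragment 3, value assumed
    static final int PRIV_APPROVE = 2;  // hypothetical
    static final int PRIV_DELETE  = 4;  // hypothetical

    private final String name;
    private final int rights;

    UserRightsSketch(String name, int rights) {
        this.name = name;
        this.rights = rights;
    }

    // Anonymous users are represented by an empty name, as in the text.
    boolean isAnonymous() {
        return name.isEmpty();
    }

    // True when the user has at least one of the rights in the mask.
    boolean hasAnyRights(int mask) {
        return (rights & mask) != 0;
    }

    // True when the user has every right in the mask.
    boolean hasAllRights(int mask) {
        return (rights & mask) == mask;
    }

    // A page function checks the rights itself before doing any work.
    static String deleteSample(UserRightsSketch user, int sample) {
        if (user.isAnonymous() || !user.hasAllRights(PRIV_DELETE)) {
            return "Error: you do not have the right to delete samples";
        }
        return "Sample " + sample + " deleted";
    }
}
```

Because every page function performs this check on the instance it receives, the check cannot be bypassed by requesting an action directly through the HTTP GET arguments.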
5.4. Script layout
All scripts are structured in a way that allows for ease of maintenance and extension. Every
script starts by including the various include files that provide often used procedures such as
database querying, user right checking, session handling and error handling.
In particular, every script includes the page include file, page.inc. This file defines the
functions needed to output the actual requested page. The site is constructed in such a way
that the actual page-specific output is a single table cell and can thus be composed without
knowing the design of the rest of the webpage. Using this abstraction, the WriteBody function
in the page include file takes the current menu tab index, page title and page contents as
arguments and outputs the entire HTML body tag and contents. Should the design of the
website ever need to be changed, only the page include file would have to be altered instead
of every single page of the website.
After including these files, the script reads the HTTP GET and session data to determine
which user is logged in and what function needs to be called. This function writes its output
using the standard output functions such as echo or printf. This output is, however, not
immediately written to the client. It is buffered using PHP’s output buffering functions. After
all appropriate functions have been called, the contents of this output buffer are copied and
the buffer is discarded. The copy is then used as the page contents argument to the WriteBody function.
Using this mechanism simplifies writing and maintaining the functions, since they can now
pretend that they’re writing directly to the client instead of adding the output to a variable
which would then be returned.
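The buffering mechanism described above can be sketched in Java by temporarily redirecting standard output. This is an analogy only: the scripts use PHP’s output buffering functions, and the names OutputBufferSketch, capture, showSample and the simplified writeBody are hypothetical. The real WriteBody also draws the menu and the rest of the page design, which is omitted here.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

class OutputBufferSketch {
    // Hypothetical page function; it simply prints, as if writing to the client.
    static void showSample(int sample) {
        System.out.print("<p>Sample " + sample + "</p>");
    }

    // Simplified stand-in for WriteBody: wraps the contents in the page layout.
    // The menu tab index is accepted but not used in this sketch.
    static String writeBody(int tab, String title, String contents) {
        return "<body><h1>" + title + "</h1><table><tr><td>"
             + contents + "</td></tr></table></body>";
    }

    // Runs a page function while its output is captured instead of sent.
    static String capture(Runnable pageFunction) {
        PrintStream original = System.out;
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        PrintStream capturing = new PrintStream(buffer, true);
        System.setOut(capturing);          // start buffering
        try {
            pageFunction.run();
        } finally {
            capturing.flush();
            System.setOut(original);       // stop buffering
        }
        return buffer.toString();          // the copy of the buffered output
    }

    static String renderPage(int sample) {
        String contents = capture(() -> showSample(sample));
        return writeBody(1, "Sample Database", contents);
    }
}
```

The page function never needs to know that its output is being captured, which is the simplification the paragraph above refers to.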
Up to the point where the output has been copied and discarded, nothing has yet been written to the
client. To return the resulting page to the requesting client, the script includes a file,
header.inc, which simply contains the HTML header and head tag contents, writes any
custom head tag contents such as a style sheet and finally calls WriteBody.
Like the page include file, header.inc was also created for ease of maintenance and extension.
Should anything of the first part of a page ever need to change, like the page title, meta tags or
Document Type Definition, only header.inc would need to be changed. Finally, note that this
file does not include the closing head tag; it has to be explicitly added by every file including
header.inc. This has been done to allow pages to add their own contents in the head tag like
scripts, style sheets or additional meta tags as mentioned above. Code fragment 3 in appendix
B shows the top of a script, database.php, structured as described above.
5.5. Binaries
At certain points the website performs various calculations: it computes the effective radius,
effective variance, and size distribution of several samples.
These values are obtained by numerically integrating a function over several hundred thousand steps.
Testing showed that implementing these calculations in PHP resulted in execution times
beyond the web server’s maximum allowed execution time for scripts (which was 30 seconds
at the time of testing). Because a speed-up was clearly needed, the calculations were rewritten in C and
compiled into binaries. These binaries took significantly less time to execute than their PHP
equivalents (around one second at the time of testing). To interface the binaries with the
scripts, the scripts execute the binaries, write the input data to the binary’s standard input
through a pipe and read the results from the binary’s standard output through another pipe.
Unfortunately, this solution is not without problems. The drawback is that the binaries would
have to be recompiled when the system is moved to another web server or when the current
web server is altered in a way that requires a recompilation. However, despite this drawback,
the binaries solution has still been adopted for the single reason that the execution time would
otherwise simply be too high, resulting in an error from the web server.
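To give an impression of the kind of calculation involved, the sketch below integrates numerically over a size distribution n(r) to obtain the effective radius and effective variance. It assumes the commonly used definitions of Hansen and Travis, in which both quantities are weighted by the geometric cross section; whether the binaries use exactly these definitions, or the midpoint rule used here, is not stated in this section. The sketch is in Java rather than the C of the binaries, and all names in it are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

class EffectiveRadiusSketch {
    // Midpoint-rule integration of f over [a, b] in the given number of steps.
    static double integrate(DoubleUnaryOperator f, double a, double b, int steps) {
        double h = (b - a) / steps;
        double sum = 0.0;
        for (int i = 0; i < steps; i++) {
            sum += f.applyAsDouble(a + (i + 0.5) * h);
        }
        return sum * h;
    }

    // r_eff = integral of r^3 n(r) dr divided by integral of r^2 n(r) dr
    static double effectiveRadius(DoubleUnaryOperator n, double rMin, double rMax, int steps) {
        double numerator = integrate(r -> r * r * r * n.applyAsDouble(r), rMin, rMax, steps);
        double denominator = integrate(r -> r * r * n.applyAsDouble(r), rMin, rMax, steps);
        return numerator / denominator;
    }

    // v_eff = integral of (r - r_eff)^2 r^2 n(r) dr
    //         divided by (r_eff^2 times integral of r^2 n(r) dr)
    static double effectiveVariance(DoubleUnaryOperator n, double rMin, double rMax, int steps) {
        double rEff = effectiveRadius(n, rMin, rMax, steps);
        double numerator = integrate(
            r -> (r - rEff) * (r - rEff) * r * r * n.applyAsDouble(r), rMin, rMax, steps);
        double denominator = integrate(r -> r * r * n.applyAsDouble(r), rMin, rMax, steps);
        return numerator / (rEff * rEff * denominator);
    }
}
```

For a uniform distribution n(r) = 1 on the interval from 0 to 1 the integrals can be solved by hand, giving an effective radius of 3/4 and an effective variance of 1/15, which makes a convenient check of the numerical result.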
6. Future work
The system as it has been developed provides all facilities for adding data, approving data,
viewing data and searching for data. However, the search algorithm only allows visitors to
search for samples matching the entered criteria. For researchers it could also be of interest to
be able to search on matrix element values. The researchers could use this feature to search
for a sample with scattering matrices closely resembling their own. The system could generate
a list of possible samples matching the uploaded scattering matrices, sorted on the degree of
resemblance.
Should the system ever grow to a point where it might be desirable to have more than just
a web interface, it could be possible to create a network protocol, server application and client
application to enable visitors to manipulate the data directly over the internet using the client
application, with the server application enforcing the quality and access control as the scripts
do now. By making the network protocol publicly available, it could be integrated into other
applications which could then directly use the data from the database.
7. Conclusion
The purpose of the system was to improve the old database by supporting the expected growth
of data and simplifying the use of this data for researchers. Through the use of dynamic data
addition, deletion, quality control, user accounts and applets all of the requirements have been
realized. The threshold for researchers to use the system has been lowered by implementing
search functions, applets and help topics. The participation of and communication between
researchers has been stimulated by implementing a guestbook. And although there were some
problems, the technology behind the system has made it secure, extensible, fast, easy to
maintain and ready for the future. Overall, the new system has succeeded in improving the old
system and will soon replace it at http://www.astro.uva.nl/scatter.
Appendix A: Error propagation formulas
Given a function f(x, y) with absolute errors x_err and y_err in x and y respectively, and z
being f(x, y), the error in z (z_err) for uncorrelated x and y is defined as:
\[ z_{err} = \sqrt{ x_{err}^2 \left( \frac{df}{dx} \right)^2 + y_{err}^2 \left( \frac{df}{dy} \right)^2 } \qquad (1) \]
For addition (z = f (x, y) = x + y) and subtraction (z = f (x, y) = x - y) Eq. (1) works out to:
\[ z_{err} = \sqrt{ x_{err}^2 + y_{err}^2 } \qquad (2) \]
For division (z = f (x, y) = x / y) Eq. (1) works out to:
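\[ z_{err} = \sqrt{ x_{err}^2 \left( \frac{1}{y} \right)^2 + y_{err}^2 \left( -\frac{x}{y^2} \right)^2 } = \sqrt{ \left( \frac{x_{err}}{y} \right)^2 + \left( \frac{x \, y_{err}}{y^2} \right)^2 } \]
\[ \frac{z_{err}}{z} = \frac{y}{x} \sqrt{ \left( \frac{x_{err}}{y} \right)^2 + \left( \frac{x \, y_{err}}{y^2} \right)^2 } = \sqrt{ \left( \frac{y}{x} \right)^2 \left( \frac{x_{err}}{y} \right)^2 + \left( \frac{y}{x} \right)^2 \left( \frac{x \, y_{err}}{y^2} \right)^2 } \]
\[ \frac{z_{err}}{z} = \sqrt{ \left( \frac{y \, x_{err}}{x y} \right)^2 + \left( \frac{x y \, y_{err}}{x y^2} \right)^2 } = \sqrt{ \left( \frac{x_{err}}{x} \right)^2 + \left( \frac{y_{err}}{y} \right)^2 } \qquad (3) \]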
For multiplication (z = f (x, y) = xy) Eq. (1) works out to the same formula:
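\[ z_{err} = \sqrt{ x_{err}^2 y^2 + y_{err}^2 x^2 } \]
\[ \frac{z_{err}}{z} = \frac{1}{xy} \sqrt{ x_{err}^2 y^2 + y_{err}^2 x^2 } = \sqrt{ \left( \frac{x_{err} \, y}{x y} \right)^2 + \left( \frac{y_{err} \, x}{x y} \right)^2 } = \sqrt{ \left( \frac{x_{err}}{x} \right)^2 + \left( \frac{y_{err}}{y} \right)^2 } \qquad (4) \]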
Note that for Eqs. (3) and (4) the resulting error is the relative error, which must be multiplied
by z to obtain the absolute error. For the natural logarithm (z = f (x) = ln x) the y component
can be discarded and Eq. (1) works out to:
\[ z_{err} = \sqrt{ x_{err}^2 \left( \frac{1}{x} \right)^2 } = \frac{x_{err}}{x} \qquad (5) \]
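As a worked illustration of Eqs. (2), (3) and (5), consider the values x = 10 with error 0.3 and y = 5 with error 0.4. These numbers are chosen for the example only and do not come from the experiment.

```latex
% Addition, Eq. (2): z = x + y = 15
\[ z_{err} = \sqrt{0.3^2 + 0.4^2} = \sqrt{0.09 + 0.16} = 0.5 \]

% Division, Eq. (3): z = x / y = 2
\[ \frac{z_{err}}{z} = \sqrt{\left(\frac{0.3}{10}\right)^2 + \left(\frac{0.4}{5}\right)^2}
   = \sqrt{0.0009 + 0.0064} \approx 0.0854,
   \qquad z_{err} \approx 2 \times 0.0854 \approx 0.171 \]

% Natural logarithm, Eq. (5): z = \ln 10 \approx 2.303
\[ z_{err} = \frac{0.3}{10} = 0.03 \]
```

The division case shows the final step mentioned in the note above: the relative error of about 0.0854 has to be multiplied by z = 2 to obtain the absolute error.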
Appendix B: Code fragments
Code fragment 1: Session handling functions
<?php
//
// By inclusion of this file sessions will be stored in the
// database, ensuring that no one else can read the
// (possibly sensitive) information.
// It also enables load distribution over multiple servers,
// since they can now all read the same session data by
// connecting to the same database server.
// And finally, it makes garbage collection trivial by use
// of queries.
// Also, the address of the requesting host is stored to
// prevent people from hijacking a session from another host.
//
require_once('database.inc');

function sess_open($sess_path, $session_name)
{
    // Report to PHP whether the database connection could be made
    return ConnConnect() ? TRUE : FALSE;
}

function sess_read($key)
{
    // The session key is supplied by the client, so escape it
    $key  = ConnEscapeString($key);
    $host = $_SERVER['REMOTE_ADDR'];
    $rs = ConnExecute("SELECT value FROM sessions WHERE sesskey='$key' AND host='$host'");
    if ($rs !== NULL)
    {
        return $rs->Item('value');
    }
    return '';
}

function sess_close()
{
    return TRUE;
}

function sess_destroy($key)
{
    $key  = ConnEscapeString($key);
    $host = $_SERVER['REMOTE_ADDR'];
    $rs = ConnExecute("DELETE FROM sessions WHERE sesskey='$key' AND host='$host'");
    if ($rs !== NULL)
    {
        $rs->Close();
        return TRUE;
    }
    return FALSE;
}

function sess_write($key, $data)
{
    if ($data == '')
    {
        // No need to save anything
        return TRUE;
    }
    $modified = time();
    $key      = ConnEscapeString($key);
    $value    = ConnEscapeString($data);
    $host     = $_SERVER['REMOTE_ADDR'];

    // Try inserting (new session)
    $rs = ConnExecute("INSERT INTO sessions VALUES ('$key', $modified, '$host', '$value')");
    if ($rs === NULL)
    {
        // Duplicate key, session already exists; update
        $rs = ConnExecute("UPDATE sessions SET modified=$modified, host='$host', value='$value' WHERE sesskey='$key' AND host='$host'");
    }
    if ($rs !== NULL)
    {
        $rs->Close();
        return TRUE;
    }
    return FALSE;
}

function sess_gc( $maxlifetime )
{
    // The parentheses are required: without them the subtraction is
    // applied to the result of the string concatenation
    $rs = ConnExecute("DELETE FROM sessions WHERE modified < ". (time() - $maxlifetime) );
    if ($rs !== NULL)
    {
        $rs->Close();
        return TRUE;
    }
    return FALSE;
}

session_set_save_handler('sess_open', 'sess_close', 'sess_read', 'sess_write', 'sess_destroy', 'sess_gc');
?>
Code fragment 2: CheckDbState function
//
// This physically removes all data from a sample, assuming the state
// is set to STATE_UNCOMMITTED.
// Write locks on all involved tables are assumed.
//
function CleanupSample( $id )
{
    $rs = ConnExecute("SELECT ID FROM wavelengths WHERE sample=$id");
    if ($rs !== NULL)
    {
        $wids = array();
        while (!$rs->EOF)
        {
            $wids[] = $rs->Item(0);
            // Advance to the next record; without this the loop never ends
            $rs->MoveNext();
        }
        $rs->Close();

        // Each deletion is only attempted when the previous one succeeded,
        // so the sample tuple itself is always deleted last
        if ((count($wids) == 0) || (ConnExecute("DELETE FROM matrices WHERE wavelength IN (". implode(",", $wids) .")") !== NULL))
        if (ConnExecute("DELETE FROM wavelengths WHERE sample=$id") !== NULL)
        if (ConnExecute("DELETE FROM articles WHERE sample=$id") !== NULL)
        if (ConnExecute("DELETE FROM images WHERE sample=$id") !== NULL)
        if (ConnExecute("DELETE FROM minerals WHERE ID=$id") !== NULL)
        if (ConnExecute("DELETE FROM size_distr WHERE ID=$id") !== NULL)
        {
            ConnExecute("DELETE FROM samples WHERE ID=$id");
        }
    }
}

//
// This function checks the state of the sample database
// Should the server crash during inserts, this function
// deletes the records that were being inserted.
// Make sure this gets called before any other function to ensure
// that the database is in a consistent state.
//
// This is because transactions were not an option at the time of
// development.
//
function CheckDbState()
{
    $rs = ConnExecute("LOCK TABLES samples WRITE, size_distr WRITE, wavelengths WRITE, matrices WRITE, images WRITE, minerals WRITE, articles WRITE");
    if ($rs === NULL)
    {
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }
    $rs->Close();

    //
    // First we check for uncommitted samples and size distributions
    //
    $rs = ConnExecute("SELECT ID, state, size_state FROM samples WHERE state = ".STATE_UNCOMMITTED." OR size_state = ".STATE_UNCOMMITTED );
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }
    while (!$rs->EOF)
    {
        if ($rs->Item("state") == STATE_UNCOMMITTED)
        {
            CleanupSample( $rs->Item("ID") );
        }
        else if ($rs->Item("size_state") == STATE_UNCOMMITTED)
        {
            ConnExecute("DELETE FROM size_distr WHERE ID = ". $rs->Item("ID") );
        }
        $rs->MoveNext();
    }
    $rs->Close();

    //
    // Then we check for uncommitted wavelengths
    //
    $rs = ConnExecute("SELECT ID FROM wavelengths WHERE state = ". STATE_UNCOMMITTED);
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }
    while (!$rs->EOF)
    {
        $id = $rs->Item(0);
        if (ConnExecute("DELETE FROM matrices WHERE wavelength=". $id ) !== NULL)
        {
            ConnExecute("DELETE FROM wavelengths WHERE ID=". $id );
        }
        $rs->MoveNext();
    }
    $rs->Close();

    //
    // And the same for images
    //
    $rs = ConnExecute("SELECT ID, ext FROM images WHERE state= ". STATE_UNCOMMITTED);
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }
    while (!$rs->EOF)
    {
        $pid = $rs->Item("ID");
        $ext = $rs->Item("ext");
        if (unlink("dbimages/image$pid.$ext"))
        {
            ConnExecute("DELETE FROM images WHERE ID=". $pid );
        }
        $rs->MoveNext();
    }
    $rs->Close();

    //
    // And articles (just to be safe, but at the moment this is actually
    // impossible since articles are never inserted uncommitted)
    //
    $rs = ConnExecute("SELECT ID FROM articles WHERE state = ". STATE_UNCOMMITTED);
    if ($rs === NULL)
    {
        ConnExecute("UNLOCK TABLES");
        SetLastError( E_INTERNAL_ERROR );
        return FALSE;
    }
    while (!$rs->EOF)
    {
        ConnExecute("DELETE FROM articles WHERE ID=". $rs->Item("ID") );
        $rs->MoveNext();
    }
    $rs->Close();

    ConnExecute("UNLOCK TABLES");
    return TRUE;
}
Code fragment 3: Database.php top
<?php
require_once('session.inc');
require_once('page.inc');

session_start();
ob_start();

AddAction("help.php?id=4", "Learn about the sample database");

if (ConnConnect())
{
    $user   = new User( $_SESSION['username']);
    $action = $_GET['action'];
    $sample = $_GET['sample'];
    $view   = $_GET['view'];

    // Check database state
    CheckDbState();

    if ($user !== NULL)
    {
        // Pre-read some rights
        $insertpriv = $user->HasRights( PRIV_INSERT );
        if ($insertpriv)
        {
            AddAction("database.php?action=add", "Add a new sample");
        }
    }

    if (($action != '') && ($user !== NULL) && (!$user->IsAnonymous()))
    {
        // Only authorized users may perform special actions
        if ($sample != 0)
        {
            // We're doing something on a sample
            if ($action == "approve1")
                ApproveSampleInsert( $user, $sample, TRUE );
            else if ($action == "approve2")
                ApproveSampleDelete( $user, $sample, TRUE );
            else if ($action == "approve3")
                ApproveSampleAddition( $user, $sample, TRUE );
            else if ($action == "reject1")
                ApproveSampleInsert( $user, $sample, FALSE );
            else if ($action == "reject2")
                ApproveSampleDelete( $user, $sample, FALSE );
            else if ($action == "reject3")
                ApproveSampleAddition( $user, $sample, FALSE );
            else if ($action == "delete")
                DeleteSample( $user, $sample );
            else if ($action == "restore")
                RestoreSample( $user, $sample );
            else if ($action == "chown")
                ChangeOwner( $user, $sample );
            else if ($action == "addsize")
                AddSizeDistribution( $user, $sample );
            else if ($action == "addfreq")
                AddFrequency( $user, $sample );
            else if ($action == "addarticle")
                AddArticle( $user, $sample );
            else if ($action == "addimage")
                AddImage( $user, $sample );
        }
        else if (($action == "add") && ($insertpriv))
        {
            AddSample( $user );
        }
    }
    else if ($sample != 0)
    {
        if ($view == 'sizedist')
        {
            // Show the size distribution
            ShowSizeDistribution( $user, $sample );
        }
        else
        {
            // Show the sample
            ShowSample( $user, $sample );
        }
    }
    else if ((int)$view != 0)
    {
        // Show a frequency
        ShowFrequency( $user, (int)$view );
    }
    else
    {
        // Browse the samples
        ShowSamples( $user );
    }
}

$output = ob_get_contents();
ob_end_clean();

require_once('header.inc');
?>
<style type="text/css"><!--
.sizedist th {
    background: inherit;
    border-bottom: 1px solid black;
    padding: 0px 10px;
}
.sizedist td {
    padding: 0px 10px;
    text-align: right;
    white-space: nowrap;
}
.matrix {
    border-top: solid 1px black;
    border-right: solid 1px black;
}
.matrix td {
    padding: 0px 5px;
    border-left: solid 1px black;
    border-bottom: solid 1px black;
    text-align: center;
    width: 120px;
    font-size: 8pt;
}
.samplename {
    text-align: left;
    font-weight: bold;
    border-bottom: solid 1px #C0C0C0;
}
.sampledesc td {
    font-size: 8pt;
    padding: 1px 0px 1px 10px;
    white-space: nowrap;
}
#details {
    border: 1px solid black;
    margin-bottom: 20px;
}
#details td {
    padding: 0px 5px;
    text-align: left;
}
#images {
    border: 1px solid black;
    margin-bottom: 20px;
    width: 500px;
}
.input {
    border: 1px solid black;
    margin-bottom: 20px;
}
.input td {
    text-align: left;
    vertical-align: middle;
    padding: 1px 5px;
}
.input td small {
    display: block;
    margin-bottom: 10px;
}
.error {
    font-weight: bold;
    color: red;
}
--></style>
</head>
<?php WriteBody( 1, "Sample Database", $output ); ?>
</html>
Code fragment 4: Applet axis scaling and drawing
// The minValue, maxValue, minAngle and maxAngle are the minima and maxima
// of the vertical and horizontal axis, respectively. g is the Graphics
// instance of the back buffer and size is the size of the back buffer.
DecimalFormatSymbols dfs = new DecimalFormatSymbols();
dfs.setDecimalSeparator('.');
DecimalFormat df = new DecimalFormat();
df.setMaximumFractionDigits(6);
df.setDecimalFormatSymbols( dfs );
FontMetrics fm = g.getFontMetrics();

double miny, maxy, minx, maxx;
double step, range, scalex, scaley, tickSize;
int nTicks;

//
// Calculate and draw Y axis
//
step = 1;
if (!logarithmic)
{
    // We take as step the first power of 10 below the range
    step = Math.pow( 10, Math.floor( log10( maxValue - minValue ) ) - 1 );
}

// Round the maximum and minimum to whole steps
miny = Math.floor( minValue / step );
maxy = Math.ceil( maxValue / step );

// This loop calculates the best distance between two 'ticks'.
// It does so by continuously removing a tick from the maximum amount of
// ticks until the range fits entirely in a whole number of ticks.
// Should this not be possible for the current range, the range is
// increased with one step.
while (true)
{
    range = (int)maxy - (int)miny;
    nTicks = size.height / 25;
    while ((nTicks > 0) && (range % nTicks != 0))
    {
        nTicks--;
    }
    if (nTicks > 0)
    {
        break;
    }
    maxy++;
}

// Calculate some values we use
miny = miny * step;
maxy = maxy * step;
tickSize = size.height / (double)nTicks;
scaley = size.height / (maxy - miny);

// Draw the axis
boolean zero = false;
for (int i = 0; i <= nTicks; i++)
{
    double val = miny + i * range * step / nTicks;
    if (logarithmic)
    {
        val = Math.pow(10, val);
    }
    StringBuffer sb = new StringBuffer();
    df.format( val, sb, new java.text.FieldPosition(0) );
    String s = sb.toString();
    int y = graphY - (int)(i * tickSize);

    // Draw zero-line
    // We parse the resulting string instead of the double itself
    // to avoid floating point inaccuracies.
    if (Double.parseDouble( s ) == 0)
    {
        g.setColor( Color.lightGray );
        g.drawLine( graphX, y, graphX + size.width - 1, y );
        g.setColor( Color.black );
        zero = true;
    }
    g.drawString( s, graphX - 3 - fm.stringWidth(s), y + fm.getAscent() / 2 );
    g.drawLine( graphX, y, graphX + 5, y );
}

if ((!logarithmic) && (!zero) && (!showAverage))
{
    // We didn't draw the zero-line in the loop, because "0.0" was
    // not found, so now we draw it by calculating its position.
    int y = graphY - (int)(-minValue * scaley);
    g.setColor( Color.lightGray );
    g.drawLine( graphX, y, graphX + size.width, y );
    g.setColor( Color.black );
}

//
// Calculate and draw X axis, same principle as Y axis
//
step = 5;
minx = Math.floor( minAngle / step );
maxx = Math.ceil( maxAngle / step );
while (true)
{
    range = (int)maxx - (int)minx;
    nTicks = size.width / 50;
    while ((nTicks > 0) && (range % nTicks != 0))
    {
        nTicks--;
    }
    if (nTicks > 0)
    {
        break;
    }
    maxx++;
}
minx = minx * step;
maxx = maxx * step;
tickSize = size.width / (double)nTicks;
scalex = size.width / (maxx - minx);

for (int i = 0; i <= nTicks; i++)
{
    // Label from the rounded axis minimum (minx), as the Y axis does with
    // miny, so that the labels match the tick positions even when minAngle
    // is not a whole number of steps
    String s = String.valueOf( (int)(minx + i * range * step / nTicks) );
    int x = graphX + (int)(i * tickSize);
    g.drawString( s, x - fm.stringWidth(s) / 2, graphY + 3 + fm.getAscent() );
    g.drawLine( x, graphY, x, graphY - 5 );
}
Code fragment 5: Number class
/*
 * This class represents a number and its error.
 * It has functions that properly deal with error propagation.
 * The error propagation for a function f is:
 *   error_f = sqrt( error_x^2 * (df/dx)^2 + error_y^2 * (df/dy)^2 )
 *
 * For addition and subtraction:
 *   error_f = sqrt( error_x^2 + error_y^2 )
 *
 * For division and multiplication:
 *   error_f/f = sqrt( (error_x/x)^2 + (error_y/y)^2 );
 *
 * For natural logarithm:
 *   error_f = error_x / x
 *
 * Also see: http://mathworld.wolfram.com/ErrorPropagation.html
 */
public class Number
{
    private double value;
    private double error;

    public double getValue() { return value; }
    public double getError() { return error; }

    public void negate()
    {
        this.value = -this.value;
    }

    public void add( Number y )
    {
        this.value += y.getValue();
        this.error = Math.sqrt( this.error * this.error + y.getError() * y.getError() );
    }

    public void subtract( Number y )
    {
        this.value -= y.getValue();
        this.error = Math.sqrt( this.error * this.error + y.getError() * y.getError() );
    }

    public void divide( Number y )
    {
        this.error = Math.sqrt( Math.pow(this.error / this.value, 2) + Math.pow(y.getError() / y.getValue(), 2) );
        // The error is now the relative error (error_f/f)
        this.value /= y.getValue();
        this.error *= this.value;    // And now absolute
    }

    public void multiply( Number y )
    {
        this.error = Math.sqrt( Math.pow(this.error / this.value, 2) + Math.pow(y.getError() / y.getValue(), 2) );
        // The error is now the relative error (error_f/f)
        this.value *= y.getValue();
        this.error *= this.value;    // And now absolute
    }

    public void log10()
    {
        // log10(x) = ln(x) / ln(10), so the error of the natural logarithm
        // (error_x / x) is scaled by the same constant factor 1 / ln(10)
        this.error /= this.value * Math.log(10);
        this.value = Math.log( value ) / Math.log(10);
    }

    public Number( Number n )
    {
        this.value = n.value;
        this.error = n.error;
    }

    public Number( double value, double error )
    {
        this.value = value;
        this.error = error;
    }
};