
Atlas of Living Australia Collections Project Report · 2011-12-06 · Atlas of Living Australia...

Atlas of Living Australia Collections Project Report


Implementing Specify 6 in Australia

December 2011

Ben Richardson Western Australian Herbarium, Science Division, Department of Environment and Conservation

Piers Higgs Gaia Resources


Executive Summary

In mid–late 2010 a number of institutions were separately considering Specify Software’s collections management tool Specify as a migration path for their current Collection Management System (CMS).

The Atlas of Living Australia (ALA) commissioned Ben Richardson and Piers Higgs to undertake some work in this area. An initial meeting to determine the parameters of the project decided that it would “determine the resources available and those required on a per-institution basis and return a set of recommendations to ALA on how it can best help.” This report represents the completion of this task.

Upon beginning work it quickly became obvious that—given the number, size and complexity of collection databases, the lack of Australian knowledge of Specify, and the tight time frame associated with ALA projects—a project to migrate even a medium-sized collection would find it difficult to meet a mid-2012 deadline.

Instead, this project evolved into an evaluation of Specify that included:

Gauging the interest in conducting training workshops;

Conducting a follow-up survey of interested institutions;

Collating requirements from the community;

Discussing with Specify Software how those requirements could be implemented; and

Writing this report.

Project Outcomes

The outcomes from this project were:

1. Raised Awareness: Institutions are more aware of the capability of Specify as a result of this project. We will continue to raise awareness about Specify through the blog developed during this project, located at http://alacollections.wordpress.com/.

2. Determined the Demand: We have determined that 13 institutions across Australia, holding over 17 million specimens between them, are interested in Specify. Several institutions performed their own evaluation and removed themselves from the listing.

3. Evaluated Suitability: From our discussions with interested parties, and our own evaluation, Specify is well suited to many of the uses envisaged by contributors to this project, although we did identify some barriers to entry. Specify seems to be particularly well suited to relatively small collections with no existing Collections Management System, but not well suited to seed-banking and cultural collections.

4. Determined the Barriers to Entry: We identified several barriers to entry, such as features that are missing from Specify but needed by institutions, and issues with the potential collaboration pathways through Specify Software and local developers.

5. Determined Collaboration Pathways: We now understand the terms by which Specify Software would engage or work with contract developers. This is critical for ongoing support, and to resolve the barriers to entry for the institutions involved.

A Way Forward for the Atlas of Living Australia

A key outcome of this project was to develop a way forward for ALA support of the Specify platform within Australia, both up to and after June 2012, when the current core ALA funding ceases. This way forward also had to take into account the current life of the ALA and the fact that the budget for the existing project was fully committed.

Given the remaining funding for this project, we identify two possible ways for ALA to support the uptake of Specify through to June 2012:

1. Continued support for the blog (http://alacollections.wordpress.com/); and

2. Potentially a small trial of targeted support.

We also identify three main ways for this project to continue, should the ALA receive additional funding and be recommissioned at the end of the current funding cycle. These are:

1. Development of missing features;

2. Targeted support to interested parties; and

3. Other potential sources of support.


Table of Contents

Atlas of Living Australia Collections Project Report

Executive Summary

Project Outcomes

A Way Forward for the Atlas of Living Australia

Raising Awareness

What is Specify?

National Herbarium of Victoria

Determining the Demand

Gauging Interest in Specify

Interest from the Collections

Other Interest

Training Workshops

Evaluating Suitability

Training Materials

Barriers to Entry

Summary of Missing and Needed Features

Collaboration Pathways

Specify Software

Other Australian Developers

A Way Forward for the Atlas of Living Australia

Continued Support for the Blog

Targeted Support Trial

Development of Missing Features

Targeted ALA Support

Other Potential Methods of Support

Appendices


Raising Awareness

An important part of this project was raising awareness across the community that there are additional tools that can be used for collection management.

Natural history institutions around the country already use a variety of tools to manage their specimen collections including: CSIRO’s BioLink, University of Connecticut’s Biota, Knowledge Engineering’s (KE) EMu, Specify Software’s Specify, and Vernon Systems’ Vernon CMS. Rather than use an off-the-shelf product such as these, a number of institutions have developed their collection management capability directly within a database platform, where the database structure is completely bespoke. Database platforms such as KE Texpress, MySQL, Oracle, Microsoft SQL Server, Microsoft Access and FileMaker Pro are common in this scenario. Also, some collections are managed in this way because they were built prior to the existence of CMS applications.

The wide variety of tools in use, even within some institutions, leads us to suggest that migrating many smaller collections into one tool would offer institutions benefits in terms of standardisation, reduced support overhead, better cross-collection data sharing and Internet readiness.

ALA has provided support to three collections management systems:

1. BioLink has been re-engineered: its code has been modernised, the source code released under an open source license, and a number of outstanding issues fixed. This provides a simple upgrade path for those already using BioLink.

2. BioloMICS was selected as the preferred CMS for micro-organism collections. ALA is supporting the rollout of BioloMICS in 12 institutions.

3. Specify. ALA introduced Australian collections to Specify and initiated this project to determine how ALA could best help.

What is Specify?

Specify is a GNU GPL-licensed [1] (i.e. open source) CMS. As a result, it can be used by anyone without charge. Although no direct fee is charged to use Specify, there are substantial costs associated with installing and configuring the software, training staff, entering and maintaining data, and maintaining the infrastructure needed to run it.

A database management system (DBMS), by comparison, is a storage technology with one or more client programs and application programming interfaces (APIs) that provide connectivity to it. A DBMS has no inherent understanding of the data it stores. Building a CMS on a DBMS alone requires an institution first to design the data structure, and then to build the customised client forms and reports that enable staff to manage their collection. Examples of DBMSs used to manage collection data include KE Texpress, Microsoft Access, MySQL, Oracle and SQL Server.

A CMS such as Specify, in contrast, goes beyond a DBMS by providing features specific to its area of specialisation, including:

The data structure into which a collection is stored;

Import from and export to known biodiversity informatics standards;

Data-aware reporting;

Validation required to properly manage collection data.
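To make the DBMS/CMS distinction concrete, the sketch below shows the kind of validation a CMS layers over raw storage, which an institution building on a bare DBMS would have to write itself. It is a minimal Python illustration only: the record fields and the rules are hypothetical examples chosen for this sketch, not Specify's data model or its actual validation logic.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class SpecimenRecord:
    # Hypothetical minimal record; field names are illustrative only and
    # do not reflect Specify's actual schema.
    catalog_number: str
    collected_on: Optional[date] = None
    latitude: Optional[float] = None
    longitude: Optional[float] = None


def validate(record: SpecimenRecord, today: date) -> List[str]:
    """Return a list of problems; an empty list means the record passed.

    A bare DBMS table would store any of these values without complaint;
    a CMS applies checks like these before saving a record.
    """
    problems = []
    if not record.catalog_number.strip():
        problems.append("catalog number is required")
    if record.collected_on is not None and record.collected_on > today:
        problems.append("collection date is in the future")
    if record.latitude is not None and not -90 <= record.latitude <= 90:
        problems.append("latitude must be between -90 and 90")
    if record.longitude is not None and not -180 <= record.longitude <= 180:
        problems.append("longitude must be between -180 and 180")
    if (record.latitude is None) != (record.longitude is None):
        problems.append("latitude and longitude must be supplied together")
    return problems


if __name__ == "__main__":
    bad = SpecimenRecord("", date(2030, 1, 1), latitude=-95.0)
    for problem in validate(bad, today=date(2011, 12, 6)):
        print(problem)
```

Returning a list of problems, rather than stopping at the first one, lets a data-entry form report every issue with a record at once.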

[1] The GNU Project General Public License, http://www.gnu.org/copyleft/gpl.html.


Specify uses the MySQL DBMS as its data store, with data held in a large number of related tables [2]. Collection management systems in use in Australia include BioLink, Biota, EMu, Specify and Vernon CMS.

The open source license also allows other developers to directly change the way their copy of Specify works, either to contribute fixes for program errors (i.e. “bugs”) or to implement features not yet available. Contributing this source code back to the managers of a project—in the hope that it becomes part of the official source code—is something that commonly occurs in the development of open source software. This would be beneficial to the Specify community.

There are, however, a number of complications associated with this collaboration pathway that we address in the sections “Evaluating Suitability” and “Barriers to Entry”.

National Herbarium of Victoria

In 2010, the National Herbarium of Victoria at the Royal Botanic Gardens in Melbourne began migrating its collection data from Knowledge Engineering’s Texpress to Specify. The Herbarium’s collection comprises 1.2 million plant, algae and fungi specimens from around the world, of which approximately 820,000 [3] are databased. The database now has two database administrators, 11 users with direct access and 14 others with query-only privileges. The migration was completed in the first quarter of 2011.

In preparing for the migration, the National Herbarium of Victoria developed a business case as well as a migration plan. We were given permission to include these documents in this report, and they are provided as Appendices 1 and 2. The entire project was timed to be completed before the International Botanical Congress in Melbourne, 23–30 July 2011. The team developed documentation and made it available to the public via their implementation blog at http://bit.ly/oKNt34 and also at http://bit.ly/qV3I3F.

Of the migration, the implementation team note that:

Alison Vaughan and Niels Klazenga each spent 12 months at approximately 0.5 FTE completing the work;

Some tables in the Specify database schema are not accessible yet from the GUI, which precludes some staff from making use of them;

Some links between database tables and the GUI are not completed;

Some work-flow requirements may still require either direct MySQL access, or code to be developed outside Specify.

Alison and Niels have made themselves, their mapping, customised forms and other technical details available to others, particularly university collections. Theirs was a major project to migrate from an existing system to Specify, which included a range of development, data cleaning and other tasks. In comparison, Gaia Resources [4] has completed a much smaller project (a direct migration without cleaning), which took much less time but still involved the kind of cleaning processes that complicate even the smallest collection migration project. This work exemplifies the kind of effort needed to catalyse a self-sufficient Specify community in Australia.

[2] See Specify Software’s online database schema at http://bit.ly/nLmxxq.

[3] A. Vaughan, pers. comm., 20 September 2011.

[4] Note also that one of the authors of this report, Piers Higgs, is the Director of Gaia Resources, and a Research Associate of the Western Australian Museum.


Determining the Demand

Gauging Interest in Specify

This project began with an unfortunate implicit assumption that Specify was suitable for Australian collection institutions. It transpired that a number of institutions had performed a basic analysis of the available CMS and DBMS options and Specify had compared well. Unfortunately, the most comprehensive analysis, done by the Canadian Heritage Information Network (CHIN; see http://bit.ly/oOMPzT), did not include Specify.

Prior to the inception of this project, the Atlas, through John Tann, sponsored 40 people from 30 Australian biological collection institutions to attend workshops around Australia in 2010. Specify Software’s Andy Bentley was sponsored to present Specify at the workshops. Of the 30 institutions represented, 20 responded to a follow-up survey about their interest in Specify. Twelve institutions were wholly agreeable to implementing Specify and five more had caveats on their interest. A significant number of institutions either did not respond to the survey or indicated that they were not interested in Specify.

Figure 1. Response to the mid-2010 post-workshop survey of attendees’ interest in Specify.

The reasons that workshop attendees gave for a favourable analysis of Specify included:

the source code is GNU GPL-licensed (i.e. open source) and is thus directly accessible to staff or contractors hired to fix bugs;

the lack of infrastructure funding available to some institutions precludes the purchase of more expensive options;

access to technical support isn’t deemed to be important;

the present system (ignoring its upgrade options) doesn’t handle modern database structures or character sets well;

the present system isn’t able to communicate with biodiversity web services tools.


The reasons that workshop attendees gave for an unfavourable analysis of Specify included:

it isn’t able to interact with one or more important pre-existing applications;

it isn’t able to store data on objects from a seed bank or cultural collection;

it isn’t able to connect to a pre-existing database platform;

support for the software isn’t very accessible to Australian institutions.

Interest from the Collections

This section lists the collections that indicated an interest in Specify, the collection contact(s), the management tool being used and the status of the collection. As outlined in Table 1, below, these collections (plus the existing installation at the National Herbarium of Victoria) comprise over 17 million specimens, of which 2.75 million are currently shared through ALA.

Table 1. Collections with an interest in Specify, and some associated statistics.

| Collection | Estimated collection size | Collection held digitally | Records in ALA |
|---|---|---|---|
| Australian National Insect Collection | 12,000,000 | 500,000 | 133,052 |
| Australian National Wildlife Collection | 200,000 | 119,723 | 115,073 |
| Australian National Fish Collection | 148,000 | Unknown | 29,970 |
| Western Australian Department of Agriculture and Food | 400,000 | 142,089 | 0 |
| Western Australian Museum | 1,386,600 | Unknown | 265,175 |
| Curtin University Entomology | 11,216 | 11,216 | 0 |
| Western Australian Herbarium | 729,500 | 729,500 | 961,668 |
| Department of Primary Industries, Parks, Water & Environment, Tasmania | 150,000 | 80,000 | 0 |
| La Trobe University Herbarium, Melbourne | 25,000 | Unknown | 0 |
| University of Sydney Herbarium | 71,503 | Unknown | 0 |
| University of Melbourne Herbarium | 100,000 | 9,000 | 0 |
| Brisbane Botanic Gardens | Unknown | Unknown | 0 |
| Department of Environment and Natural Resources, South Australia | 946,000 | 600,000 | 688,876 |
| National Herbarium of Victoria | 1,250,212 | 803,000 | 560,707 |
| Totals | 17,418,031 | 2,994,528 | 2,754,521 |

(Information from the ALA Collectory, as well as direct from collection contacts.)
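The totals in Table 1 can be reproduced by summing each column while skipping entries recorded as Unknown. The short Python check below transcribes the table's figures, with `None` standing for Unknown; because unknown values are excluded, the first two totals are best read as lower bounds rather than exact counts.

```python
# Figures transcribed from Table 1: (collection, estimated size,
# held digitally, records in ALA). None stands for "Unknown".
TABLE_1 = [
    ("Australian National Insect Collection", 12_000_000, 500_000, 133_052),
    ("Australian National Wildlife Collection", 200_000, 119_723, 115_073),
    ("Australian National Fish Collection", 148_000, None, 29_970),
    ("Western Australian Department of Agriculture and Food", 400_000, 142_089, 0),
    ("Western Australian Museum", 1_386_600, None, 265_175),
    ("Curtin University Entomology", 11_216, 11_216, 0),
    ("Western Australian Herbarium", 729_500, 729_500, 961_668),
    ("Department of Primary Industries, Parks, Water & Environment, Tasmania",
     150_000, 80_000, 0),
    ("La Trobe University Herbarium, Melbourne", 25_000, None, 0),
    ("University of Sydney Herbarium", 71_503, None, 0),
    ("University of Melbourne Herbarium", 100_000, 9_000, 0),
    ("Brisbane Botanic Gardens", None, None, 0),
    ("Department of Environment and Natural Resources, South Australia",
     946_000, 600_000, 688_876),
    ("National Herbarium of Victoria", 1_250_212, 803_000, 560_707),
]


def column_total(rows, index):
    """Sum one numeric column, skipping entries recorded as Unknown (None)."""
    return sum(row[index] for row in rows if row[index] is not None)


if __name__ == "__main__":
    size, digital, in_ala = (column_total(TABLE_1, i) for i in (1, 2, 3))
    print(f"{size:,} {digital:,} {in_ala:,}")  # 17,418,031 2,994,528 2,754,521
```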


Australian National Insect Collection (ANIC), CSIRO Ecosystem Sciences, Canberra (contact: Beth Mantle)

The ANIC collection is managed in BioLink. The decision on whether to migrate to Specify is in the hands of CSIRO Information Management & Technology.

Australian National Wildlife Collection (ANWC), CSIRO Ecosystem Sciences, Canberra (contact: Margaret Cawsey)

The ANWC collection is managed in Microsoft SQL Server, using Access as a client. The decision on whether to migrate to Specify is in the hands of CSIRO Information Management & Technology.

Australian National Fish Collection (ANFC), CSIRO Marine Sciences, Hobart (contact: Alastair Graham)

The ANFC collection is managed in Texpress. ANFC has no plan to migrate to Specify because it cannot see a benefit that would outweigh the existing issues and the resourcing required to migrate.

Entomology, Department of Agriculture and Food, Western Australia (DAFWA), Perth (contact: Rob Emery)

The Entomology collection is managed in Microsoft Access. DAFWA has put the idea of migrating to Specify on hold because it lacks the resources to do the work in house. DAFWA Entomology doesn’t have a full-time database operator, and no development work has taken place on the database in about a decade.

Various Collections, Western Australian Museum (WAM), Perth (contacts: Morgan Strong, Piers Higgs)

The Museum manages a number of collections primarily in Microsoft Access, but also FileMaker Pro and Vernon. Some collections are not databased. WAM has commenced migration of several small collections into Specify. WAM has also begun evaluating Specify for use with cultural collections, by letting a tender which was won by Gaia Resources.

Barrow Island Project, Resource Management, Curtin University, Perth (contact: Jonathan Majer)

The collection is managed in Biota. Curtin University has no plan to migrate to Specify unless it can get direct help to carry out the migration, as the resources previously available for this task have already been spent adopting Biota.

Western Australian Herbarium, Department of Environment and Conservation, Perth (contact: Ben Richardson)

The collection is managed in KE Texpress. Some Specify evaluation meetings have taken place, but any plan to migrate is on hold until several crucial features are added to Specify, including Oracle support, external taxonomies and batch editing.


Entomology, Biosecurity and Plant Health Branch, Department of Primary Industries, Parks, Water & Environment (DPIPWE), Launceston (contact: Guy Westmore)

The collection is managed in BioLink. DPIPWE Entomology is still considering Specify, but resourcing to perform the actual migration is needed: “Our commitment to move to Specify was dependent on ALA’s offer of technical/financial support to help us.”

Herbarium, Department of Botany, La Trobe University, VIC (contact: Alison Kellow)

The collection was until recently not databased. The herbarium has installed Specify and is now seeking funding for a technician to enter data. It is using the customised forms and taxonomy tree developed by Royal Botanic Gardens, Victoria.

John Ray Herbarium, School of Biological Sciences, The University of Sydney, NSW (contact: Murray Henwood)

The collection is currently not databased. The herbarium is hoping to have a server-based instance of Specify running in 2011. Like La Trobe University, it is planning to use the taxonomy tree developed by Royal Botanic Gardens, Victoria.

Herbarium, School of Botany, University of Melbourne, VIC (contact: Gillian Brown)

The collection of around 100,000 records is currently managed in a FileMaker Pro database designed by third-year IT students. A project to migrate to Specify is on hold while other tasks are completed. The herbarium is receiving help from Niels Klazenga and Alison Vaughan at the National Herbarium of Victoria.

Seed Bank, Brisbane Botanic Gardens, QLD (contact: Philip Cameron)

The collection is managed in Microsoft Access 97. The seed bank is waiting to see what the Australian Seed Bank Partnership project decides before making any decision.


State Herbarium of South Australia, Department of Environment and Natural Resources, Adelaide (contact: Stuart Pillman)

The collection is managed in Texpress with links to Oracle. Oracle support is almost certainly a requirement, because of the number of pre-existing databases the Herbarium already holds in Oracle. The Herbarium may consider migrating anyway if the benefits outweigh the drawbacks.

Other Interest

A number of other collections were either initially interested in Specify but chose not to take it any further, or indicated a general interest in the project with no current involvement. These included:

Australian National Herbarium (ANH)

The collection is currently managed in Oracle. ANH rejected Specify as an option because it felt Specify did not match the work-flow used in the Herbarium.

Forestry Tasmania

Forestry Tasmania is interested in using Specify 6 to manage its insect collection.

Western Australian Threatened Flora Seed Centre (WATFSC)

WATFSC was interested in Specify for seed-banking, but Specify lacks the functionality to act as a seed-banking module without major development, which is not feasible within WATFSC’s budget.

Training Workshops

In mid-February, Piers and John Tann met with several Specify Software employees (Andy Bentley, Rod Spears and Jim Beach). As a result of this meeting, we resolved to invite interested institutions to attend three training workshops. The workshops were targeted at three groups of people based on their experience, technical knowledge, and the kind of input they could make to the growth of a Specify community in Australia. The workshops proposed were:

Train the Expert, for invitees who:

Were part of an Australian institution considering Specify; and

Would be able to answer a couple of queries about Specify from other institutions within a year of attending the course.

Train the Trainers, for invitees who:

Were part of an Australian institution considering Specify; and

Would be able to run an in-house or other training course within a year of attending the course.

Train the Developer, for invitees who:

Were part of an Australian institution considering Specify;

Were able to develop software in Java; and

Agreed to submit a change to the Specify code-base within a year of attending the course.

There were nine respondents to this invitation: four from CSIRO, two from Gaia Resources, and one each from the Australian Quarantine and Inspection Service (AQIS), the University of Melbourne Herbarium and the University of Sydney Herbarium. All nine wanted to attend the Train the Expert course, six were interested in the Train the Trainers course, and one in the Train the Developer course. Given this low level of interest, we decided not to proceed with this training in conjunction with the ALA.

We briefly considered developing Specify training materials as part of this project, but given that Gaia Resources was already liaising with Specify Software to obtain their training material and to work up additional materials, we decided not to pursue this avenue. This was part of the ongoing projects that Gaia Resources is undertaking for the Western Australian Museum.


Evaluating Suitability

Given the limited time frame for this study, we conducted a “light” evaluation of how Specify could work for Australian institutions; a pilot or prototype software installation was not possible. The evaluation consisted of further investigation of the requirements of several institutions, including the WA Herbarium, WA Museum and the WA Threatened Flora Seed Centre, combined with the results of our interviews with interested parties, some of whom also commented on the blog.

The pertinent findings from this review were:

1. A number of functions or features missing from Specify form significant “barriers to entry” for some institutions, as outlined in the next section;

2. Specify is not a capable seed bank collection management system, although Specify Software has indicated an interest in changing this [5]; and

3. Specify is not a capable cultural collection management system, although the Western Australian Museum has indicated an interest in changing this.

Also, a project seeking to use Specify must take note of a number of assumptions made by the software. Some examples include:

Specify expects numeric identifiers to be zero-filled integers (that is, padded with leading zeros). Because migrating to Specify will therefore change any numeric identifiers not already in this form, it is crucial that institutions are aware that changing a database’s numeric identifiers often has a deleterious impact on connectivity with related systems.

Agents, Identifications and other parts of a specimen record are separate entities in the data model. Older systems may not have separated these, thus separating them becomes a necessary part of the migration and may significantly increase the workload associated with migrating to Specify.

The data model does not yet cater to the needs of seed banks or cultural collections.
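The first two assumptions above can be illustrated with a short Python sketch: zero-filling legacy identifiers while keeping a crosswalk from old to new values, and splitting a flat legacy row into a specimen part and separate determination records. It assumes a hypothetical pad width of nine digits and hypothetical legacy column names (`taxon_1`, `determiner_1`, and so on); neither is taken from Specify, so check the catalogue number format actually configured in your Specify collection before relying on anything like this.

```python
from typing import Dict, Iterable, List, Tuple

# ASSUMPTION: a nine-digit pad width, chosen only for illustration.
PAD_WIDTH = 9


def zero_fill(legacy_id: str, width: int = PAD_WIDTH) -> str:
    """Left-pad a purely numeric legacy identifier with zeros."""
    digits = legacy_id.strip()
    if not digits.isdigit():
        raise ValueError(f"not a purely numeric identifier: {legacy_id!r}")
    if len(digits) > width:
        raise ValueError(f"{legacy_id!r} has more than {width} digits")
    return digits.zfill(width)


def build_crosswalk(legacy_ids: Iterable[str]) -> Dict[str, str]:
    """Map each legacy identifier to its zero-filled form, so that related
    systems can still resolve old identifiers after migration."""
    crosswalk: Dict[str, str] = {}
    seen: Dict[str, str] = {}  # new identifier -> legacy identifier
    for legacy_id in legacy_ids:
        new_id = zero_fill(legacy_id)
        if new_id in seen and seen[new_id] != legacy_id:
            # e.g. "42" and "0042" would both become "000000042"
            raise ValueError(
                f"{legacy_id!r} and {seen[new_id]!r} both map to {new_id}")
        seen[new_id] = legacy_id
        crosswalk[legacy_id] = new_id
    return crosswalk


def split_determinations(
    row: Dict[str, str], max_dets: int = 3
) -> Tuple[Dict[str, str], List[Dict[str, str]]]:
    """Split a flat legacy row into specimen fields and separate determination
    records. The 'taxon_N' / 'determiner_N' column names are hypothetical."""
    specimen = {k: v for k, v in row.items()
                if not k.startswith(("taxon_", "determiner_"))}
    determinations = []
    for n in range(1, max_dets + 1):
        taxon = row.get(f"taxon_{n}")
        if taxon:  # skip empty determination slots
            determinations.append(
                {"taxon": taxon, "determiner": row.get(f"determiner_{n}", "")})
    return specimen, determinations
```

Raising an error on identifier collisions, rather than silently merging records, is a deliberate choice here: two legacy records that map to the same new identifier need a human decision.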

Training Materials

Specify Software does not have comprehensive, up-to-date training materials that other parties could use to deliver training. It has several generic presentations and some additional user help materials (such as online videos). Gaia Resources is now drafting its own training materials and will contribute these to the community.

[5] “I would be interested in looking at use cases and requirements.” Rod Spears (Specify Software), http://bit.ly/pgU968.


Barriers to Entry

Aside from the already-identified threats to suitability, we found a range of barriers that complicate a successful migration to Specify. Those we discovered in the course of this project are detailed in the following sections.

Summary of Missing and Needed Features

A number of the surveyed institutions indicated that Specify was missing one or more features they considered necessary for them to migrate their collection(s). It was also clear that some institutional contacts had done only a preliminary investigation of the suitability of Specify to their needs.

One of the aims we set ourselves for this project was to set out potential methods of support for the decision-making process within institutions. As part of this, we generated an article summarising the features missing and needed in Specify and published it in the project’s blog at http://bit.ly/otjNme. A link to the article (titled “Summary of Issues in Specify 6”) was then emailed to each of the contacts we’d developed previously. The intention of this article was to inform institutions of potential pitfalls should some begin a migration without a full analysis of Specify. Subsequently, Rod Spears (the lead developer of Specify) and Niels Klazenga (National Herbarium of Victoria) added some important comments to the article. Rod Spears’ replies made it clear that they had requested funding to resolve a number of these issues but had not received enough funding to do so. Specify Software’s prioritisation process has so far caused the features we identified to be put aside so that those with a higher priority (perhaps those more relevant to US institutions) could be implemented.

As a result of the information we received from institutions, we were able to compile a list of the features that were missing in Specify but that were considered a priority by at least one Australian institution.

To prioritise the missing features, we asked institutions to complete a questionnaire by email. Answers to the following question enabled us to rank the features using a first-past-the-post voting method:

“In relation to your collections database(s), which 3 software issues (of those available at http://bit.ly/otjNme, or others) do you consider the highest priority to be fixed?”

A chart summarising the 11 responses is presented in Figure 2.
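The ranking can be reproduced mechanically. The Python sketch below assumes that each of a respondent’s three choices counts as one vote. It ranks features by total votes and breaks ties alphabetically. The tie-break rule and the placeholder ballots are assumptions made for illustration. They are not the survey data behind Figure 2.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Tuple


def rank_features(ballots: Iterable[Sequence[str]]) -> List[Tuple[str, int]]:
    """Rank features by the number of ballots that name them.

    Each ballot is one respondent's list of up to three priority features;
    a feature repeated within a ballot is counted once. Ties are broken
    alphabetically (an assumption) so the ordering is deterministic.
    """
    tally: Counter = Counter()
    for ballot in ballots:
        tally.update(set(ballot))
    return sorted(tally.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder ballots, not real responses.
    ballots = [
        ["Feature A", "Feature B", "Feature C"],
        ["Feature A", "Feature C", "Feature D"],
        ["Feature B", "Feature A", "Feature E"],
    ]
    for feature, votes in rank_features(ballots):
        print(feature, votes)
```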


Figure 2. Priority Specify features as determined by Australian institutions.


Collaboration Pathways

There are two main collaboration pathways: through Specify Software, or through other Australian developers.

Specify Software

The Specify Software project is funded by the Biology stream of the US National Science Foundation (NSF). This arrangement imposes some constraints on what the grant recipient may do with the money, including:

the NSF Biology stream does not encourage Specify Software to directly add or develop non-biological features6;

Specify Software receives no explicit support from NSF to support non-US collections7. With 300 US and around 75 international institutions to serve, Specify Software is likely to respond to an Australian institution more slowly than an Australian company would;

features that could be implemented are being put on hold because the level of funding being attracted to the project is insufficient.

Generally, while Specify Software is supportive of international collaboration, there are problems with this pathway.

In an email communication with Piers Higgs, Jim Beach of Specify Software noted that Gaia Resources could not sub-contract Specify to undertake work due to complications with their core US NSF funding. While Specify remain in favour of international collaboration, their participation in this would jeopardise their core funding. This directly affects the previously mentioned Western Australian Museum project to add support for cultural collections to Specify.

When asked for costings for the development of additional features, Specify Software noted in our communication with them that:

co-development, with programmers sitting in offices in Australia, is something they are interested in pursuing, as they already do this with various other projects;

every so often, 1–2 weeks of face-to-face time is useful for managing the project;

short-term programming contracts, i.e. less than 1 year, will be more difficult for them to manage.

Given the remaining time-frame and budget of the ALA, it was not possible to continue any further collaboration down this pathway.

Other Australian Developers

Gaia Resources was the only group interested in the “Train the Developer” courses offered earlier in this project. This may be interpreted as a general lack of willingness to undertake development in the Australian community, although it should be noted that the call was only put out to those who attended the workshops.

6 “..support biological collections only with [..] funding [from NSF]”, Jim Beach (Specify Software), pers. comm., 17 August, 2011.

7 “We also receive no explicit support from NSF for supporting non-U.S. collections...”, Jim Beach (Specify Software), pers. comm., 17 August, 2011.

Gaia Resources was the successful tenderer for the Western Australian Museum project to extend Specify for cultural collections. As part of this project it will develop a significant understanding of Specify, and it would be prepared to work with other institutions and other developers to build a knowledge base in Australia.

However, developing additional Specify expertise and providing additional services to the community is not something that Gaia Resources would do without a budget and resources to undertake this work.


A Way Forward for the Atlas of Living Australia

Given the remaining funding, we have identified two possible ways forward for this project in the current ALA funding cycle, namely:

1. Continued support for the blog, and

2. Potentially a small trial of targeted support.

We have also identified three main ways for this project to continue, should the ALA receive additional funding and be recommissioned at the end of the current funding cycle. These are:

1. Development of missing features,

2. Targeted support to interested parties, and

3. Other potential sources of support.

Details of these are provided in the sections below.

Continued Support for the Blog

Given the funding available to this project, one available step is to provide continued support for the blog (http://alacollections.wordpress.com/), where the authors of this report can write articles and solicit them from the community. Other groups (notably CSIRO’s IM&T group, and Gaia Resources) will continue to move ahead with new implementations of Specify. Their experiences could provide additional material for the blog and help keep those interested in Specify in Australia engaged. This would be a minor activity associated with the authors’ ongoing work in the Australian collections community.

Targeted Support Trial

Should the remaining funding in this project be adequate, it may be possible to undertake a small targeted support trial for an institution. This would involve finding a small institution that is willing to move to Specify and installing Specify for it. A proposed project plan for this could include:

Ensuring that the institution is aware of the limitations of Specify, and the bigger picture for the Specify community,

Installing Specify in the organisation,

Undertaking a review of the existing data in preparation for a move to Specify, and working with the organisation to ensure that the data is brought across to Specify,

Importing the existing data to Specify,

Ensuring that the staff at the organisation have been trained in the use of Specify, and have available support for the rest of the ALA project (or as agreed).

As a rough estimate, Gaia Resources indicated that moving a small, denormalised Microsoft Access database for the Western Australian Museum into Specify took approximately two weeks of 1 FTE. This did not include data cleaning, training (including the need to develop training materials) or other processes, such as those undertaken by the National Herbarium of Victoria, which would add significant time to the project. A rough timeline of three months from start to finish is envisaged for the trial.

A good candidate for this trial would be a small collection based on an Access database that is not already funded separately and whose custodians have indicated an interest in moving to Specify.


Development of Missing Features

As detailed in the section “Barriers to Entry”, Australian institutions would like a number of features added to Specify before they consider it to be ready for their collections.

Once it was clear that Specify was of interest to Australian institutions, we sought to cost some of the most popular items listed in our “Summary of Missing and Needed Specify Features” in Appendix 3. The items and an approximate cost in development time are provided below.

Oracle Support

Resources: 6 months for 1 programmer, 3 months for 2. Employing further programmers would probably not result in a quicker outcome.

Batch Editing

Resources: none; Specify Software are working on this now, and it should be delivered by early 2012.

External Taxonomies

Resources: 6 months full-time for an experienced developer; completed in a manner suitable for use by other name services, such as ALA’s National Species Lists (NSL) project.

It was agreed that this feature request would be limited to the automatic synchronisation of the taxon tree in Specify with that of an external application. The determinations assigned to individual specimens would not be changed automatically, as changing the determination of one or more specimens is commonly a human decision, or is an event that forms part of a specimen management work-flow.

In the absence of a push messaging architecture, an external taxonomy source would need to be implemented as a web service, and Specify would interact with this source by polling it. Specify would check for changes and then give the user interactive controls for accepting or rejecting each change to their taxon tree.
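The polling approach could be sketched in Python as follows. Each poll fetches a snapshot of the external taxonomy, compares it with the local taxon tree and applies only the changes a user accepts. Specimen determinations are deliberately left untouched. Several parts of this sketch are hypothetical: the flat `{taxon_id: (name, parent_id)}` representation, the `fetch_remote` and `accept` callables, and every other name used. A real client would call the name service’s web API on a schedule rather than a stub.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical flat representation of a taxon tree: id -> (name, parent id).
TaxonTree = Dict[str, Tuple[str, Optional[str]]]


@dataclass(frozen=True)
class Change:
    kind: str  # "add", "update" or "remove"
    taxon_id: str
    old: Optional[Tuple[str, Optional[str]]]
    new: Optional[Tuple[str, Optional[str]]]


def diff_trees(local: TaxonTree, remote: TaxonTree) -> List[Change]:
    """List the differences between the local tree and a remote snapshot."""
    changes: List[Change] = []
    for taxon_id in sorted(local.keys() | remote.keys()):
        old, new = local.get(taxon_id), remote.get(taxon_id)
        if old is None:
            changes.append(Change("add", taxon_id, None, new))
        elif new is None:
            changes.append(Change("remove", taxon_id, old, None))
        elif old != new:
            changes.append(Change("update", taxon_id, old, new))
    return changes


def poll_once(local: TaxonTree,
              fetch_remote: Callable[[], TaxonTree],
              accept: Callable[[Change], bool]) -> TaxonTree:
    """Run one polling cycle and return the updated local taxon tree.

    Only the taxon tree changes; determinations attached to specimens are
    left for curators to revise as part of their own workflow.
    """
    updated = dict(local)
    for change in diff_trees(local, fetch_remote()):
        if not accept(change):  # the user rejected this change
            continue
        if change.kind == "remove":
            del updated[change.taxon_id]
        else:
            updated[change.taxon_id] = change.new
    return updated
```

Returning a new tree, rather than editing the local one in place, keeps rejected changes from leaking into the local tree. It also makes a cycle easy to preview before committing it.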

Record Import Limits

Resources: 3–4 months full-time for an experienced developer.

“OR” Queries

Resources: Possibly 2 months.

It was not possible to define the work for this item properly, given the lack of understanding of Specify in Australia. Specify can already perform a logical “OR” query within a single field, using the “IN” operator in the Query tool. It is likely that institutions were interested in querying across two or more fields. Specify Software feels they might be able to make this work with the “ANY” operator, but we need a clearer understanding of the feature request before Specify Software can firm up the quote.

To summarise, a table outlining these development tasks is included below. It gives an approximate developer cost as well as a time estimate. The hourly rate of $100 is an attempted balance between senior and junior rates and approximates the actual cost; it is not a quoted figure from Specify or any other institution. Costs assume 160 working hours per month.
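The distinction between the two kinds of query can be illustrated in plain SQL. The Python sketch below uses an in-memory SQLite table with hypothetical `collector` and `determiner` columns. It does not use Specify’s Query tool or schema. It only shows the difference between an `IN` match within one field and an `OR` condition across two fields.

```python
import sqlite3
from typing import List, Sequence


def make_demo_db() -> sqlite3.Connection:
    """Build a small in-memory table of hypothetical specimen records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE specimen (catalog_number TEXT, collector TEXT, determiner TEXT)")
    conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", [
        ("000000001", "Smith", "Jones"),
        ("000000002", "Brown", "Smith"),
        ("000000003", "Jones", "Brown"),
    ])
    return conn


def within_one_field(conn: sqlite3.Connection, names: Sequence[str]) -> List[str]:
    """'OR' within a single field: collector matches any of several values."""
    placeholders = ", ".join("?" for _ in names)
    sql = (f"SELECT catalog_number FROM specimen "
           f"WHERE collector IN ({placeholders}) ORDER BY catalog_number")
    return [row[0] for row in conn.execute(sql, list(names))]


def across_fields(conn: sqlite3.Connection, name: str) -> List[str]:
    """'OR' across fields: the name appears as collector or as determiner."""
    sql = ("SELECT catalog_number FROM specimen "
           "WHERE collector = ? OR determiner = ? ORDER BY catalog_number")
    return [row[0] for row in conn.execute(sql, (name, name))]


if __name__ == "__main__":
    conn = make_demo_db()
    print(within_one_field(conn, ["Smith", "Jones"]))  # ['000000001', '000000003']
    print(across_fields(conn, "Smith"))                # ['000000001', '000000002']
```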


Table 2. Estimated cost of development tasks.

Task | Estimated timeframe (1 FTE) | Estimated cost ($100/hr)
Oracle Support | 6 months | $96,000
Batch Editing | Already under development | $0
External Taxonomies | 6 months | $96,000
Record Import Limits | 4 months | $64,000
“OR” Queries | 2 months | $32,000
TOTALS | 18 months | $288,000
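The figures in Table 2 follow from a simple calculation, which the Python sketch below recomputes. The report does not state the figure of 160 working hours per month, but the table implies it ($96,000 for 6 months at $100/hr). Record Import Limits uses the upper end of its 3–4 month estimate.

```python
HOURLY_RATE = 100      # $/hr, the report's approximate blended rate
HOURS_PER_MONTH = 160  # implied by Table 2: $96,000 / (6 months x $100/hr)

# Estimated effort in months for 1 FTE; Batch Editing is already under way.
TASKS = {
    "Oracle Support": 6,
    "Batch Editing": 0,
    "External Taxonomies": 6,
    "Record Import Limits": 4,  # upper end of the 3-4 month estimate
    "OR Queries": 2,
}


def task_cost(months: int) -> int:
    """Cost in dollars of `months` of full-time developer effort."""
    return months * HOURS_PER_MONTH * HOURLY_RATE


if __name__ == "__main__":
    for task, months in TASKS.items():
        print(f"{task:<22}{months:>3} months  ${task_cost(months):>8,}")
    total_months = sum(TASKS.values())
    print(f"{'TOTALS':<22}{total_months:>3} months  ${task_cost(total_months):>8,}")
```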

Targeted ALA Support

From the outset, many of the interested parties were of the opinion that ALA should directly support the implementation of Specify in various institutions. However, it was clear that the funding being made available would not make this possible. Also, several factors meant that in many cases direct migration of institutional collection databases would be difficult or impossible. These were the size and nature of the suggested collection databases, whether Specify supported the data in each collection, the time-line for the ALA project itself, and the availability of developers. For example, Specify does not yet support seed bank and cultural collections, so they should not be considered for this funding.

To make it possible to support the community with a limited budget, criteria could be used to rank institutions by their need for ALA support. We do not have the funding to act on such a prioritisation, so we recommend that it be done as a first step should this work be taken forward into a new funded project.

Firstly, it is useful to separate the collections into two broad groups:

Group 1: Medium-large, well-maintained and supported collections. Funding is commonly found within the institution to maintain the content and the structure of the database. More than 5 staff edit the database daily, and its content is available via the web.

Group 2: Small- or medium-sized collections with little or no technical support. An off-the-shelf package such as Microsoft Access or FileMaker Pro is used rather than an enterprise-grade DBMS. The database structure or controlled vocabularies may not have been upgraded in several years. In some cases the database administrator, curator and technical officer are the same person.

These groups will have different needs for ALA support:

Group 1: An interest in plug-in development to enhance the collection’s interaction directly with the community, such as Annotations Support.

Group 2: An interest in having someone migrate the entire collection into Specify and be involved in training workshops to help current staff make the most from the new environment.

Criteria that would usefully separate institutions include:

size of collection;

size of databased portion of collection;

years since the database structure and/or controlled vocabulary was last changed;

number of staff directly involved in the maintenance of the collection data;


whether an upgrade to the database software in use is available but has not yet been applied;

whether the collection is a seed bank or is in some part cultural, as Specify is not capable of managing these collections.
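The following Python sketch illustrates one way these criteria might be applied. It first excludes seed bank and cultural collections. It then assigns the two broad groups using the Group 1 description above (more than 5 staff editing daily, with content available on the web). Finally, it orders the remaining collections so that those most likely to need support come first. The field names are assumptions, as is the ordering rule: longest since the structure last changed, then fewest staff, then a pending upgrade. The report does not specify weights.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Collection:
    name: str
    years_since_structure_change: int
    data_staff: int          # staff directly maintaining the collection data
    upgrade_pending: bool    # software upgrade available but not applied
    web_available: bool
    seed_bank_or_cultural: bool


def broad_group(c: Collection) -> int:
    """Group 1 per the description above; everything else is Group 2.

    Assumption: data_staff stands in for 'staff who edit the database daily'.
    """
    return 1 if c.data_staff > 5 and c.web_available else 2


def shortlist(collections: List[Collection]) -> List[Collection]:
    """Drop collections Specify cannot manage, then order by assumed need."""
    eligible = [c for c in collections if not c.seed_bank_or_cultural]
    return sorted(
        eligible,
        key=lambda c: (-c.years_since_structure_change,  # stalest first
                       c.data_staff,                     # fewest staff first
                       not c.upgrade_pending),           # pending upgrade first
    )


if __name__ == "__main__":
    # Hypothetical collections for illustration only.
    candidates = [
        Collection("Large museum", 2, 12, False, True, False),
        Collection("Small Access database", 9, 1, True, False, False),
        Collection("Seed bank", 10, 2, False, False, True),
    ]
    for c in shortlist(candidates):
        print(c.name, "Group", broad_group(c))
```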

The aim of any criteria would be to separate the well-managed collection databases from those in desperate need of support. It may be the case, however, that the more poorly-managed databases need not only to be upgraded, but to become a part of the wider institution’s ICT policy.

The funding required for this approach would depend on the individual institutions, and could likely be estimated from the “Targeted Support Trial” outlined above.

Regarding actual implementations: if the ALA were to receive additional funding, and ALA partners deemed additional support for Specify (and/or other CMSs) a priority, the ALA could provide additional support for overall implementation. In general, drawing upon the experience of the National Herbarium of Victoria, any CMS project would include:

Business case establishment;

Project Plan – including time and resources for:

Requirements gathering;

Gap analysis – between requirements and CMS capability;

Data Analysis – including data migration plan;

Migration plan – including training, actual switchover planning;

Maintenance plan;

Initial Pilot;

Conduct final implementation and switchover;

Project review.

Other Potential Methods of Support

Data Schema and Migration Assistance

A new funded project could assist with the migration process without performing the migration directly. This support would only benefit institutions with an in-house technical resource capable of performing the migration. Project staff might, for example, explain how best to map a database structure to that of Specify. We made a start on this in the blog with an initial posting on mapping columns (see http://bit.ly/oqosac). Other ALA staff are undertaking similar work with other standards on tasks such as seed-banking and mobilising data from collection institutions, and this may be a logical extension.
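A column mapping of this kind might be recorded as a simple lookup table. The Python sketch below splits one flat legacy row into values grouped by target table and reports any columns that still need a decision. The legacy column names are hypothetical. The Specify table and field names (for example `collectionobject.CatalogNumber` and `locality.Latitude1`) are assumed to match the Specify 6 schema and should be checked against the installed schema before use.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical legacy columns mapped to assumed Specify 6 (table, field) pairs.
COLUMN_MAP: Dict[str, Tuple[str, str]] = {
    "REGNO": ("collectionobject", "CatalogNumber"),
    "TAXON": ("taxon", "FullName"),
    "COLLDATE": ("collectingevent", "StartDate"),
    "LOCALITY": ("locality", "LocalityName"),
    "LAT": ("locality", "Latitude1"),
    "LONG": ("locality", "Longitude1"),
}


def map_row(row: Dict[str, str]) -> Tuple[Dict[str, Dict[str, str]], List[str]]:
    """Split a flat legacy row into per-table values.

    Returns the mapped values grouped by target table, plus the legacy
    columns that have no mapping yet (to be resolved before migration).
    """
    tables: Dict[str, Dict[str, str]] = defaultdict(dict)
    unmapped: List[str] = []
    for column, value in row.items():
        target = COLUMN_MAP.get(column)
        if target is None:
            unmapped.append(column)
            continue
        table, field = target
        tables[table][field] = value
    return dict(tables), unmapped


if __name__ == "__main__":
    mapped, todo = map_row({"REGNO": "000004521", "LAT": "-31.95", "NOTES": "old label"})
    print(mapped)
    print(todo)  # ['NOTES']
```

Listing unmapped columns explicitly, rather than silently dropping them, makes any gaps between the legacy structure and Specify visible during the data analysis stage.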

Strategic Roadmap for Australian Research Infrastructure

Senator the Hon Kim Carr, Minister for Innovation, Industry, Science and Research, recently announced the release of the 2011 Strategic Roadmap for Australian Research Infrastructure. This roadmap notes that “a Digitisation Infrastructure capability will be implemented by assembling state-of-the-art digitisation technology and expertise to provide high-throughput digitisation services to the Australian research community to achieve priority research outcomes”. This may encourage future funding for digitisation efforts, or provide a means to access it, but is not itself a source of funding. The Roadmap can be found at http://www.innovation.gov.au/Science/ResearchInfrastructure/Pages/default.aspx.


Appendices


Appendix 1. Business Case developed by the National Herbarium of Victoria


C:\Documents and Settings\benr\My Documents\ALA Specify\MEL\BusinessCase.doc  12/6/2011 

Future-proofing MELISR
Toward a robust and standards-compliant herbarium management system
for the National Herbarium of Victoria

Executive summary

MELISR is the collections database of the National Herbarium of Victoria. MELISR has not undergone any major development in the last 8+ years, in order not to interrupt data entry for Australia’s Virtual Herbarium (AVH). Consequently, there are now many outstanding issues that need a major redevelopment effort to resolve. These include unreliability of the back-up system, issues with data capture and data integrity, and issues with data delivery to AVH.

In order for MELISR to optimally fulfil the business needs of the Plant Sciences and Biodiversity Division (PS&B), MELISR will also need to incorporate the loans and exchange database and the MEL herbarium census. It will need to interface efficiently with nomenclatural or taxonomic databases such as VicList, APNI and the Australian Plant Census (APC), and to communicate with mapping software such as ArcGIS. As not all these objectives can be met under the current database management system, KE Texpress, other database options have been investigated.

It is concluded that the Royal Botanic Gardens (RBG) does not currently have the expertise to develop its own herbarium management system. In any case, this option would take the greatest development effort and lead to the largest loss of productivity during development, while there are pre-built collections management systems available, some of them at little or no cost.

KE EMu is the successor of Texpress and still uses Texpress as its back-end database, and therefore would represent a logical upgrade path. However, it is very expensive and has a history of implementation problems at other herbaria. As EMu is designed for all kinds of museum collections, customisation costs would also be the highest.

BRAHMS was developed especially for herbarium management, is free, and would probably require the least customisation to accommodate the MELISR data structure. However, BRAHMS uses a database management system that is soon to be phased out, and it is unclear how BRAHMS will develop after that. Because of the rather weak back-end database, there are issues with robustness, scalability and extensibility.

Specify is developed for natural history collections, including herbaria and natural history museums. Specify has a highly structured, standards-compliant data model and was found to be usable and robust. It is also the only system that is completely open source and hence can be adapted to our future needs. Implementation of Specify requires only minimal changes to existing Information Services (IS) infrastructure. It is recommended that the RBG adopt Specify as its new herbarium management system.

Converting to a new collections management system will involve a significant commitment of time and resources to retrain existing staff who use MELISR. In order to minimise loss of productivity during the implementation phase, it is recommended that the new herbarium management system be implemented prior to initiating any further large-scale data capture projects, such as databasing the foreign collection or undertaking a specimen imaging program. On a broader scale, it is a good time to upgrade our database system, as TDWG standards are now at a mature stage and are being adopted globally.

 


Background 

The MELISR database supports the primary business of PS&B. MELISR has been running on the Texpress DBMS since its inception in the early 1990s. The Texpress DBMS (produced by KE Software) is simple and efficient, and at the time it was adopted by most Australian herbaria. However, with increasing requirements for data sharing between herbaria and the emergence of global biodiversity data standards, the limitations of the Texpress DBMS are becoming apparent.

The MELISR database design was largely borrowed from other existing database systems (namely CANB and NSW), leading to some parts being less than optimal for MEL’s requirements. Despite these shortcomings, MELISR has not had any major development work done for the last 8+ years, in order to minimise disruption to the Australia’s Virtual Herbarium (AVH) databasing project. There are many outstanding issues with the database that will need a considerable development investment to resolve. It is therefore prudent to consider whether this effort would be better directed toward upgrading MELISR to another database platform.

The MELISR database is a specimen database only; its simplistic structure cannot incorporate all the specimen-related or name-related information and curatorial tools used in the herbarium. In addition to MELISR, the herbarium maintains the Loans and Exchange database (an MS Access database), the Census database (MS Access), the VicList database (MS Access), a table of scheduled taxa (Texpress and MySQL) and an Authors of taxonomic names database (MS Access). The limited querying capability of Texpress, and its inability to interface directly with the software required to present MEL’s specimen data over the internet, have made it necessary to establish a duplicate of MELISR using the open source MySQL database. This is an inefficient solution. It requires extra room on the server and resources to keep the two databases in sync, and it duplicates effort whenever changes are made to MELISR.

An important application of MELISR data is the production of distribution maps. The inability of Texpress to interoperate with GIS software makes mapping MELISR data very laborious.

The label printing program within Texpress is primitive and cannot be tailored to meet our needs without engaging programming assistance from KE Software. There is often a conflict between the requirement to capture data and the requirement to print data. Because label printing in MELISR cannot be customised, certain data (e.g. quarantine messages or acknowledgements of funding support) must be edited into or out of records either before or after printing. A more sophisticated, and easily customisable, printing system would allow the requirements for curatorial and specimen-related label data to be managed more consistently and more efficiently.

The reporting program in Texpress also leaves much to be desired. New reports within MELISR need to be individually programmed. Because of the deficiencies of the Texpress system, several curatorial reporting requirements are handled by other programs, such as the MySQL copy of MELISR, the Census database and the Loans and Exchange database. Loan information is only recorded in MELISR records for the duration of the loan, making it impossible to track the loan history of a specimen. The absence of loan history data for individual specimens means that the MEL collections cannot be accurately audited to meet reporting requirements (such as those of the bi-annual AVH Board report).

Our past experience with the Texpress system has demonstrated that it lacks robustness, making it susceptible to data loss. In March 2006, a low-level hardware error resulted in the loss of several records. The cause of the problem was difficult to pinpoint and remedy. It resulted in a three-week disruption to data entry and retrieval at a time when ten staff were employed specifically to undertake data entry. As well as resulting in significant loss of productivity, this problem highlighted the inadequacy of the Texpress backup system; not all the lost records could be recovered and restored.


In addition to the technical deficiencies of the system, the current data structure is too simplistic to allow for accurate capture of taxonomic information, determination history and geographical information. This reduces the utility of our specimen database and compromises the integrity of our data. Texpress imposes constraints on the way that data can be structured, with all information for each object (or specimen) recorded in a single database record. This creates single records containing large numbers of fields, with the same information entered numerous times across different records, resulting in inefficient data entry. As well as improving the way that specimen data is recorded, the database needs to be able to track data validation efforts and allow for better documentation of changes to records. Appendix A lists some of the improvements that need to be made to existing fields in MELISR, as well as suggested new fields that would improve the searchability and standards-compliance of the database. Staff and external clients have requested many of these improvements. The key areas requiring improvement are expanded upon below.

One critical area that requires improvement is the way that taxonomic information is entered and stored in MELISR. The flat data structure of the Texpress system means that taxonomic names, which should ideally stand alone, cannot be separated from the specimen-related data (such as determination annotations and hybrid information) associated with individual records. Individual components of names are currently entered by the database operator from look-up tables. While this partially restricts the content of the taxonomic name fields, any user with data entry privileges can easily edit the tables. There is therefore much scope for errors in data entry and inconsistencies in the application of names. MELISR handles uncertain determinations and cultivar, hybrid and informal names poorly, and it cannot adequately restrict the content of the taxonomy fields. Together, these weaknesses reduce the quality and reliability of the name data associated with our database records.

The current approach to recording taxonomic names makes it necessary to maintain a separate list of names that reflects the content of MEL’s collection (the Census database). This represents a considerable duplication of effort. It could be eliminated by using a comprehensive herbarium management system based on an authoritative list of names, rather than a collection of separate databases each with its own name list. A major benefit of having MELISR linked to such lists is that it would allow searching by synonyms, which is a powerful tool both for specimen curation and for data interrogation.

As well as improving the way that current names are recorded in MELISR, it would be valuable from both a taxonomic and a curatorial point of view to record the determination history of a specimen. Currently, there is no facility in MELISR for recording original determinations and subsequent re-determinations. Although additional fields could be added to the existing system, they would suffer the same shortcomings as the existing taxonomic name fields. Determination histories would therefore be better handled by a relational database with an underlying taxonomy table.

Another major weakness of the current MELISR database is that specimen records from the core collection (the specimen component of the State Botanical Collection) cannot be distinguished from records from non-core collections. These include the Victorian Reference Set, the Horticultural Reference Set and the Victorian Conservation Seedbank. These collections must be kept separate on the database, so that data retrieval for loan enquiries, electronic data requests and specimen retrieval reflects the location and accessibility of the specimens. Database records from the Victorian Conservation Seedbank may not be associated with a vouchered specimen. This undermines the integrity of MELISR as a collections database based on verifiable specimens. While it makes sense to use the same database to record data for these distinct collections, the structure of the Texpress system is too simple to reflect the different purpose, location and accessibility issues associated with these collections.
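The relational alternative described above can be sketched in a few lines of Python. Each specimen keeps a list of determinations that point to shared taxon records, and a re-determination is appended to that history rather than overwriting it. The class and field names are hypothetical and are not taken from MELISR or Specify.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Taxon:
    """One row of a shared taxonomy table."""
    taxon_id: int
    full_name: str


@dataclass
class Determination:
    taxon: Taxon
    determiner: str
    determined_on: Optional[date]
    is_current: bool = False


@dataclass
class Specimen:
    catalog_number: str
    determinations: List[Determination] = field(default_factory=list)

    def redetermine(self, taxon: Taxon, determiner: str,
                    determined_on: Optional[date] = None) -> None:
        """Record a new determination, keeping earlier ones as history."""
        for det in self.determinations:
            det.is_current = False
        self.determinations.append(
            Determination(taxon, determiner, determined_on, is_current=True))

    def current(self) -> Optional[Determination]:
        """Return the current determination, if any."""
        return next((d for d in self.determinations if d.is_current), None)
```

Because each determination refers to a taxon record rather than copying the name, a correction to that taxon record is seen by every specimen that refers to it.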


The shortcomings of the current version of MELISR outlined above could be overcome by employing a comprehensive, relational herbarium management system that encompasses a specimen database, loans and exchange database, scheduled taxa database and herbarium census. The taxonomic tables that would be the basis of the herbarium management system could also encompass the VicList database, thus reducing duplication of effort in taxonomic name management at the RBG and streamlining curatorial practices associated with keeping VicList data up to date.

While many of the desired improvements to MELISR require a more sophisticated database management system, some improvements to the handling of primary collecting data could be implemented in the existing Texpress system. However, given the considerable shortcomings of the existing system and the effort required to replicate any changes in the duplicate version of MELISR, this development effort would be more judiciously applied to incorporating MELISR into a new herbarium management system, rather than adding further workarounds to the existing database.

Options 

1. Keep existing Texpress system

The Texpress application currently used for MELISR is no longer widely used and is likely to become obsolete. The only technical support available is from KE Software, and this is unlikely to remain available in the future.

The standard data exchange protocols used to link into global biodiversity initiatives (such as BioCASE and TAPIR) cannot interface directly with Texpress. Keeping the existing Texpress system for our collections database puts the RBG at risk of not being able to participate in emerging biodiversity information initiatives such as GBIF, ALA and EOL, and poses a risk to the organisation’s reputation as a quality data custodian.

The loss of corporate knowledge associated with using a near‐obsolete system is a great risk to the organisation. Administering and maintaining the Texpress system requires specialised skills and experience that cannot easily be replaced. Because Texpress is so archaic, skills developed in other modern database systems are not transferable to it.

There may also be financial ramifications for the RBG if we persist with Texpress, given that licensing and support costs may increase because of the small number of users. Texpress development costs have also been very high in the past, and these costs are likely to rise in the future. The RBG currently pays $6,300 per annum for 25 user licenses. Although this cost could be reduced to $4,800 for 15 licenses, the cost of purchasing new licenses when needed is much higher than the amount saved this way. Any redevelopment in Texpress will require purchasing the TexAPI package at a cost of $18,000 plus $2,315 per annum for licenses.
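The Texpress figures above can be combined into a simple cost sketch. It uses only the prices quoted in this section and assumes they stay flat over time, which the text suggests is optimistic; the function and constant names are illustrative.

```python
# Texpress figures quoted in the text above (AUD).
LICENCES_25_PA = 6_300      # current: 25 user licenses per annum
LICENCES_15_PA = 4_800      # option: 15 user licenses per annum
TEXAPI_PURCHASE = 18_000    # one-off, required for any redevelopment
TEXAPI_LICENCES_PA = 2_315  # TexAPI licenses per annum


def texpress_cost(years, redevelop=False, licences_pa=LICENCES_25_PA):
    """Total Texpress spend over `years`, optionally with a TexAPI redevelopment.

    Assumes prices stay flat, which the text suggests is optimistic.
    """
    total = licences_pa * years
    if redevelop:
        total += TEXAPI_PURCHASE + TEXAPI_LICENCES_PA * years
    return total


if __name__ == "__main__":
    print("Annual saving from dropping to 15 licenses:", LICENCES_25_PA - LICENCES_15_PA)
    print("First year with TexAPI redevelopment:", texpress_cost(1, redevelop=True))
    print("Five years with TexAPI redevelopment:", texpress_cost(5, redevelop=True))
```

Under these flat‐price assumptions, dropping to 15 licenses saves $1,500 pa, a TexAPI redevelopment brings first‐year spending to $26,615, and five years of Texpress with TexAPI comes to $61,075.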

2. Change to a custom‐built collections management system

One option is to develop a custom collections management system. The greatest benefit of this option is that it would be specifically tailored to our needs. Along with this benefit come the risks of using a system that is not widely used and thus lacks a global community of users and developers. Developing such a system would be expensive in terms of staff time, and would take staff away from other duties.

At the implementation stage, there is a risk of exceeding the anticipated cost of development and of data migration from Texpress to the new system. The temporary loss of productivity during the changeover period would also be greatest with this option. All future support and development would most likely have to come from within the organisation, and the RBG does not have the expertise to develop a custom‐built front end in‐house.


3. Change to an existing collections management system

The third option is to transfer MELISR to an existing collections management system. This is potentially an expensive option, but the cost of different collections management systems varies enormously. The major benefit of this option is that it will allow us to select a proven system that has a global community of users and developers. It will also be less expensive in terms of staff time than developing a custom‐built system.

Risks associated with this option include high development/customisation, migration and training costs. These costs can be minimised by choosing a system that most closely meets our needs, uses a database management system we are already familiar with, and has a user‐friendly interface. There is also a risk of data loss during migration, which can be reduced by keeping good back‐ups and by rigorously testing the data model of the new application. We also need to minimise the risk that the new software will not be supported in the future, and make sure the database can be modified and extended to meet the RBG’s future business needs.
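One way of testing for data loss during migration is to export the same records from the old and new systems and compare them record by record. The sketch below does this with per‐record SHA‐256 digests from Python’s standard library. It assumes that both exports have already been mapped to the same field names and share a unique key (the hypothetical catalogue_no field); it is not a feature of any particular collections management system.

```python
import hashlib
import json


def record_digest(record):
    """Stable SHA-256 digest of one record: keys sorted and values compared as
    text, so field order and int-vs-string differences do not matter."""
    canonical = json.dumps(
        {k: str(v) for k, v in record.items()}, sort_keys=True, ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compare_exports(source, target, key="catalogue_no"):
    """Compare two exports (lists of dicts) keyed on `key`.

    Returns (missing from target, extra in target, changed), each sorted.
    """
    src = {r[key]: record_digest(r) for r in source}
    tgt = {r[key]: record_digest(r) for r in target}
    missing = sorted(src.keys() - tgt.keys())
    extra = sorted(tgt.keys() - src.keys())
    changed = sorted(k for k in src.keys() & tgt.keys() if src[k] != tgt[k])
    return missing, extra, changed
```

An empty result in all three lists would be one piece of evidence that a migration run lost or altered nothing; a real test plan would also check field mappings and record counts per collection.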

3.1. KE EMu

KE EMu is a Windows‐based collections management application, designed for all kinds of museums, that uses Texpress as its back‐end database. While this is a powerful system and represents a logical upgrade path, it has serious limitations. It is very expensive and has a history of implementation problems in herbaria and botanic gardens (e.g. NSW and BM). It is also closed source, meaning that any customisations will have to be performed by KE Software and will be costly.

Table 1. Features of KE EMu 

Operating system   

Front‐end  MS Windows 

Back‐end  Linux 

Open source  KE EMu is completely closed source; any customisation will have to be performed by KE Software.

Back‐end database  KE EMu uses KE Texpress as its back‐end database. This is the same database MELISR currently uses, but with some extensions. 

User‐friendly interface  Yes 

Customisability  KE EMu is fully customisable, but this customisability comes at a cost, as it has to be carried out by KE Software. Because EMu is designed for use by all kinds of museums, it will need more customisation than the other applications considered.

Scalability  EMu scales well. 

Extensibility  EMu is fairly self‐contained; extension is possible, but will have to be custom‐designed by KE. 

Interoperability  Native interoperability is poor 

Support  Support is probably good, but expensive; one would expect some support to come with the licenses. However, many of the problems that other herbaria have had with EMu are likely to have been caused by poor communication between herbarium staff and application developers.

Startup costs  $130,000 (25 licenses plus initial customisation) 


Ongoing costs  $120,000 pa (25 licenses) 

url  http://www.kesoftware.com/content/view/512/356/lang,en/ 

 

3.2. BRAHMS

BRAHMS is a specialised herbarium collections management system developed by the Oxford University Herbaria and used by many herbaria around the world, for instance the National Herbarium of the Netherlands (L, WAG). BRAHMS was found to be rich in features, but lacking in usability, robustness and extensibility. BRAHMS has just undergone a major redevelopment, but is still built around the same DBMS, Microsoft Visual FoxPro, which is not an enterprise database. BRAHMS will only run on Microsoft Windows platforms.

Table 2. Features of BRAHMS 

Operating system   

Front‐end  MS Windows 

Back‐end  MS Windows 

Open source  BRAHMS is closed source 

Back‐end database  BRAHMS uses Microsoft Visual FoxPro. FoxPro is a legacy DBMS that is no longer actively developed by Microsoft and will not be supported after 2014.

User‐friendly interface  No 

Customisability  BRAHMS has limited customisability. However, the application is specifically designed for herbaria, so little customisation will be necessary to accommodate the MELISR data model.

Scalability  BRAHMS scales poorly, mostly due to its rather weak back‐end database. 

Extensibility  BRAHMS is fairly self‐contained. However, it is very feature‐rich and is specifically designed for herbarium management, so the data model should be sufficient to accommodate all necessary fields at least for the near future. 

Interoperability  BRAHMS has built‐in interoperability with some other applications, such as ArcView and DIVA‐GIS. The file format in which it saves its data (.DBF) can be read by some Windows applications. An extension for online publishing is available. The National Herbarium of the Netherlands is a member of EDIT and BioCASE and delivers data to GBIF, so dynamic data delivery through a BioCASE provider is evidently possible.

Support  Support for BRAHMS is provided by the BRAHMS Project at the Oxford University Herbaria. As herbarium taxonomists are involved in the project, there should be few communication problems. A support contract costs $US600 pa.

Startup costs  – 

Ongoing costs  $US600 pa for support (optional) 

url  http://dps.plants.ox.ac.uk/bol/home/default.aspx 


 

3.3. Specify

Specify (University of Kansas) is designed for herbaria and zoological museums and was found to be usable and robust. The Specify software is currently being completely rewritten and will be released as open source. The new version, Specify6, is built in Java using Hibernate to abstract the database layer, and will therefore run on many database management systems, including MySQL, PostgreSQL and Microsoft SQL Server, all of which already run on RBG servers. MySQL is used for the web interface of all botanical databases and for the AVH interface and advanced querying of MELISR; MySource Matrix, the content management system for the RBG website, uses PostgreSQL; and Hummingbird, the records‐keeping software, uses MS SQL Server. Specify6 will be released on 27 February 2009. Specify has a worldwide user community and is currently used by 112 institutions, including 34 herbaria. Development of Specify has been supported by the US National Science Foundation for the last twenty years. Judging from the proceedings of the 2008 TDWG annual conference, Specify is very much at the forefront of collections management systems.

Specify has a strongly structured data model that is DarwinCore compliant (GBIF uses DarwinCore) and therefore most likely also ABCD compliant. Specify6 contains 138 tables and 1658 fields. While some MEL‐specific fields in MELISR will not already be in the Specify data model, there are several unused fields that can be used for these. Given that the database layer is abstracted from the front end, and that the database management systems that can serve as the back end are very powerful, we expect excellent scalability and extensibility, as well as interoperability with other applications (e.g. GIS, electronic flora, image storage).
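DarwinCore compliance essentially means that each stored field can be mapped to a standard term when data are shared, for example with GBIF. The sketch below illustrates such a mapping in Python. The DarwinCore term names (catalogNumber, scientificName, recordedBy, recordNumber, eventDate, locality, decimalLatitude, decimalLongitude, institutionCode, basisOfRecord) are standard terms, but the MELISR‐style source field names on the left of FIELD_MAP are hypothetical placeholders, and this is not Specify’s own export mechanism.

```python
# Hypothetical MELISR-style field names (left) mapped to Darwin Core terms (right).
FIELD_MAP = {
    "catalogue_no": "catalogNumber",
    "current_name": "scientificName",
    "collector": "recordedBy",
    "collector_no": "recordNumber",
    "collecting_date": "eventDate",
    "locality": "locality",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
}


def to_darwin_core(record, institution_code="MEL"):
    """Build a Darwin Core dictionary from one specimen record.

    Empty values are omitted, and fields not listed in FIELD_MAP (for example
    internal curatorial notes) are never exported.
    """
    dwc = {"institutionCode": institution_code, "basisOfRecord": "PreservedSpecimen"}
    for source_field, term in FIELD_MAP.items():
        value = record.get(source_field)
        if value not in (None, ""):
            dwc[term] = value
    return dwc
```

Keeping the mapping in one table, rather than scattered through export code, makes it easy to review against the standard.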

Specify optionally comes with a web interface that is fully customisable using CSS alone. The Specify front end, which includes all the forms and reports (including labels), is fully customisable without programming. If in future we want to make changes to the data model, the associated changes in the front end would require programming in Java. However, given the growing user community, we expect that changes to the data model necessitated by outside factors would be taken care of in new minor versions of Specify.


Table 3. Features of Specify 

Operating system   

Front‐end  MS Windows, Mac OS X or Linux

Back‐end  MS Windows, Mac OS X or Linux

Open source  Specify is open source. Source code for everything except the label printing application can be obtained from the developers. Specify6 is entirely written in Java, which means that in future we will actually be able to make changes to the source code if necessary. 

Back‐end database  Because the front‐end user interface is separate from the back‐end database, Specify6 can use most of the major database management systems, such as MySQL, PostgreSQL, MS SQL Server or Oracle. Specify6 by default uses MySQL. 

User‐friendly interface  Yes 

Customisability  Specify6’s graphical user interface is entirely customisable: fields can be chosen, their format or type changed, and even their names changed (similar to forms in MS Access).

Scalability  Because it can run on very powerful back‐end database systems, Specify6 should scale very well.

Extensibility  Specify has good extensibility. While extension of the data model at the back end is easy and only requires knowledge of SQL, the associated changes in the front end require more knowledge of Java than is currently available at the RBG. However, the Specify data model is very rich, with many customisable fields, so it should easily be able to accommodate all MELISR fields, at least in the near future.

Interoperability  Specify comes with a web interface (which we may not use) and a DiGIR provider (which we definitely will not use). MySQL interoperates very well with PHP for dynamic web applications and, through the MySQL ODBC driver, with MS Access and ArcGIS.

Support  Specify is free and offers free support to registered users. While priority support is given to US institutions, Specify is happy to provide support to non‐US institutions as resources allow. Part of the support is migration of data into Specify. With Specify5.2 there was a waiting time of 2–3 months between registration and migration. We expect waiting times to be longer once Specify6 is released, as existing Specify users will need to have their data migrated as well.

Startup costs  – 

Ongoing costs  – 

url  http://www.specifysoftware.org/Specify; http://specify6.specifysoftware.org/ (temporary Specify6 website) 

 


Recommendation

Given that the Texpress system is nearing obsolescence and that changing to a custom‐built system is an inefficient and expensive option, we recommend changing to an existing collections management system. Based on our comparison of existing collections management systems, we recommend upgrading to Specify. Finally, we recommend that a new collections database management system be implemented before initiating any further large‐scale data capture projects, such as databasing the foreign collection or imaging herbarium specimens.

Of the three collections management systems considered, KE EMu was discarded early because of its high implementation, customisation and licensing costs, and because of the problems other herbaria have experienced with it. BRAHMS was found to be feature‐rich and, because it was designed especially for herbaria, its data model is probably currently the most compatible with the structure of MELISR. However, we are concerned about the back‐end database management system BRAHMS uses, and that its limited scalability and extensibility will prevent us from adapting BRAHMS to meet the RBG’s future needs.

Specify can employ a very powerful database management system and can therefore make use of that DBMS’s back‐up and security facilities. It has a highly structured data model that can include most MELISR fields as is, and all fields after some modification. Specify comes with a very user‐friendly, fully customisable front end and with extensions such as a web interface and a DiGIR provider. Its diverse worldwide user and development community makes it likely that Specify will adapt to future needs better than the other systems. Specify also fits best from an infrastructure perspective, as all the required infrastructure is already in place at the RBG (although a more detailed analysis of infrastructure requirements will be part of the project planning).

We would like to emphasise that, while upgrading to a new collection management system is urgent in order to safeguard the quality and integrity of our collections data and the RBG’s reputation as a quality data custodian, the process is not going to be painless. The implementation of a new collection management system will require a large time commitment, especially from the Programmer, Information Services and Collections Information Officer, and will affect all MELISR users. To ensure data integrity, it will not be possible to run the old and new systems concurrently, and MELISR will therefore not be accessible for a period of time during the implementation phase. The loans and exchange administration system will also not be accessible during this period, as it will be included in the new collections management system.

The temporary loss of productivity during the implementation period will be more than made up for by improved efficiency and increased productivity once the new herbarium management system has been implemented. On a broader scale, this is a good time to upgrade our collections database system, as international data standards have come of age and only minor changes are expected in the near future.

Appendix B describes an implementation roadmap that aims to ensure the implementation period is as short as possible and to minimise loss of productivity during this period. 

 


 

Table 5. Summary of features of the different herbarium management systems considered. The current KE Texpress collections management system is included for comparison.

Feature | KE Texpress | KE EMu | BRAHMS | Specify
Operating system (front‐end) | Linux shell emulator | Windows | Windows | Windows, Linux or Mac OS X
Operating system (back‐end) | Linux | Linux | Windows | Windows, Linux or Unix
Open source | No | No | No | Yes
Back‐end database | Texpress | Texpress | Microsoft Visual FoxPro | MySQL, PostgreSQL, MS SQL Server, Oracle or any other server‐side DBMS
User‐friendly interface | No | Yes | No | Yes
Customisability | Poor | Good, but very expensive | Good | Good
Scalability | Poor | Good | Poor | Excellent
Extensibility | Poor | Good, but very expensive | Poor | Good
Interoperability | Poor | Poor, but with ample inbuilt functionality | Good | Good
Support | Very limited¹ | Good | Good | Good
Startup costs | N/A + $18,000 (TexAPI) | $130,000 | – | –
Ongoing costs | $6,300 pa + $2,315 pa (TexAPI) | $102,000 pa | $US600 pa² | –

¹ Support for Texpress is very limited, as Texpress as a stand‐alone application is being phased out and replaced by EMu.

² $US600 pa is for support; there are no licensing costs.

 


Appendix A  — Summary of suggested improvements to MELISR  

Field/s  Suggested improvements 

Taxonomy fields  Link specimen names to an authoritative table of taxonomic names. Information related to individual specimens (determination annotations, qualifiers and hybrid information) to be stored with individual records, rather than with names.  

Improve handling of higher‐level taxonomy, particularly for fungi and algae.

 

Determinations  Include original determination and subsequent determination history.   

Collector/Add. coll.  Create separate fields for recording verbatim label data and standardised data, which would link to an authoritative list of collectors. 

 

Collecting date  Add a memo date field to record non‐standard collecting dates, e.g. 13‐17 June; late March; Spring; Christmas etc. 

 

Additional collector  Add new fields to record collecting numbers of additional collectors.   

Geocode  Allow for recording of geocode as originally provided (DMS or decimal). 

Add new fields for recording AMG references, and enable autoconversion of AMG to geocode. 

Add a new field for recording error measure when provided by collector. 

Improve handling of geocode source data.  

Cultivated data  Improve handling of locality data for cultivated records. Currently, provenance data is entered in the Notes field, and cultivating locality details are entered in the locality fields (minus geocode).  Need to record both cultivated and provenance locality data in a way that allows them to be queried and mapped (or excluded from queries or mapping) on request. 

Add new fields to cater for Plant Occurrence and Status Scheme values.

Unit relationship field  Add new fields to record the range of relationships between herbarium sheets. Currently, the only way of recording a relationship between one or more herbarium sheets is to multisheet them. Need to convert the multisheet field to a unit relationship field that allows for other types of relationships to be recorded (e.g. cultivated seedling and wild‐collected parent plant; uncertain links between foreign specimens).  

 

Duplicates/Specimen Received from/Original herbarium (for images)  Apply a restricted vocabulary to these fields to prevent the inclusion of non‐standard entries.

 

Protologue  Separate the publication title and the page and date citation into two distinct fields, so that the publication title can be linked to an authoritative list of publication titles (e.g. BPH/TL‐2) to avoid incorrect and inconsistent entries.

 

Precision  Check that our precision code values are sensible and add a built‐in guide to help data entry personnel to use them correctly. 

 

Depth  Stop decimal places from being automatically appended to values in this field, as it gives an inaccurate impression of the precision of the measurement.

 

Notes  Divide this field into two (or more) categories to allow the collector’s notes to be recorded separately from annotations on the specimen and curatorial or explanatory notes added by the database operator.  

 

Managed habitats  Improve plant occurrence fields to better document specimens collected from managed habitats, e.g. plants that have self‐established in a botanic garden context and should be treated neither as wild‐collected nor as cultivated.

 

Validation level  Add new fields to represent the level of validation of a database record (includes identification, geocode, distribution, predictive distribution).  

 

Images  Differentiate between images of the sheet and images of the plant in its habitat etc. Also need to be able to record when we have produced a digital image of a specimen to send to another institution.

Add a field to record file paths or URLs for digital images.  

Vic. Ref. Set  Improve flagging of Vic. Ref. Set specimens. Vic. Ref. Set specimens are currently listed as duplicates, despite the Vic. Ref. Set not being an official, accessible collection. It would be better to flag these records in a different way than duplicate specimens are flagged.  

 

Type status determination  Move type status determination data. This information is currently stored in Extra Info., but would be better stored as part of the determination history, with the type status of the determination recorded.

 

Verbatim label field   Add a new field to record verbatim label data for foreign‐language labels. 

Allow Unicode characters to be captured so that foreign‐language data can be recorded more accurately (not possible in Texpress).

 

Original language field  Add a new field to record the original language that a label is written in (if non‐English). This will make it easy to query by language (e.g. for batch translation of labels).

 

Global gazetteer  Link to a global gazetteer, e.g. the GEOnet Names Server files, to simplify geocoding of foreign collections.

 

Library catalogue no.  Add fields to enter call numbers for additional information stored in the library, e.g. letters, photos, colour transparencies etc.  

 

Quarantine notes  Add a new field to enter quarantine notes that are not printed on any labels or exported, and are only used by curation staff.  Currently, these messages must be deleted prior to labels being printed, then re‐entered into the record. 

 

Destructive sampling  Add a new field to record when material has been removed for destructive sampling. 

 

Ethnobotanical information  Add a new field to flag the presence of ethnobotanical data associated with a record.

 

Indigenous name  Add a new field to record indigenous plant names when provided by the collector. 
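The Geocode improvements above (recording a geocode as originally provided, in DMS or decimal form) imply keeping the verbatim value alongside a derived decimal value. The following Python sketch shows the standard degrees/minutes/seconds conversion with that pairing; the class and function names are hypothetical. It does not cover AMG conversion, which requires a map projection and datum handling.

```python
from dataclasses import dataclass


def dms_to_decimal(degrees, minutes=0.0, seconds=0.0, hemisphere="N"):
    """Convert degrees, minutes and seconds to signed decimal degrees.

    South and west are negative, following the usual convention.
    """
    hemisphere = hemisphere.upper()
    if hemisphere not in {"N", "S", "E", "W"}:
        raise ValueError(f"unknown hemisphere {hemisphere!r}")
    if not (0 <= minutes < 60 and 0 <= seconds < 60):
        raise ValueError("minutes and seconds must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60 + seconds / 3600
    limit = 90 if hemisphere in {"N", "S"} else 180
    if value > limit:
        raise ValueError(f"{value} exceeds {limit} degrees")
    return -value if hemisphere in {"S", "W"} else value


@dataclass(frozen=True)
class Coordinate:
    """Keeps the value as supplied alongside the derived decimal value."""
    verbatim: str           # exactly as on the label or in the field notes
    original_format: str    # "DMS" or "decimal"
    decimal_degrees: float  # derived; used for mapping and queries


def from_dms(verbatim, degrees, minutes, seconds, hemisphere):
    """Record a DMS coordinate without discarding the original text."""
    return Coordinate(verbatim, "DMS", dms_to_decimal(degrees, minutes, seconds, hemisphere))
```

For example, a label reading 37°49'30"S would be stored verbatim, with a derived value of about -37.825.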


 

Appendix B — Implementation roadmap  

The major stages required to implement this project are outlined below: 

Project planning 

Registration with Specify; 

Development of a detailed implementation plan, by September 2009. 

 

Comprehensive needs analysis 

Consultation with MELISR users regarding the current MELISR database structure to determine what fields need changing, and what new elements are required;

Mapping of fields in MELISR against the HISPID5 (ABCD) standard to ensure compliance; 

Preparation of draft MELISR data entry manual. 

 

Data preparation 

Mapping of MELISR fields against Specify data model; 

Major quality assurance work on non‐compliant fields in MELISR to make data migration to Specify as smooth as possible.

 

Implementation and testing 

Installation of Specify on MEL server and work stations; 

Data migration; 

Comprehensive testing of the new system and revision of the MELISR data entry manual. 

 

Training 

Training of MELISR users in the use of the new system. This will need to be undertaken incrementally, starting with those staff whose work is most reliant on the database.

 

Configuration of provider software 

Configuration of TAPIR and BioCASE providers for data delivery to AVH, the ALA and GBIF.
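The data preparation stage calls for quality assurance on non‐compliant MELISR fields before migration. A minimal Python sketch of such checks follows, covering coordinate ranges, a restricted vocabulary for duplicate herbaria and non‐standard collecting dates (which Appendix A suggests moving to a memo date field). The record layout, field names and the small acronym list are hypothetical and illustrative only; a real vocabulary would come from an authoritative source.

```python
from datetime import date

# Illustrative subset of herbarium acronyms; a real vocabulary would come
# from an authoritative list such as Index Herbariorum.
ALLOWED_HERBARIA = frozenset({"AD", "BRI", "CANB", "HO", "NSW", "PERTH"})


def qa_problems(record):
    """List problems that would stop a (hypothetical) MELISR export record
    from migrating cleanly. An empty list means no problems were found."""
    problems = []
    lat, lon = record.get("latitude"), record.get("longitude")
    if (lat is None) != (lon is None):
        problems.append("only one of latitude/longitude present")
    if lat is not None and not -90 <= lat <= 90:
        problems.append(f"latitude out of range: {lat}")
    if lon is not None and not -180 <= lon <= 180:
        problems.append(f"longitude out of range: {lon}")
    for code in record.get("duplicates", []):
        if code not in ALLOWED_HERBARIA:
            problems.append(f"non-standard duplicate herbarium: {code!r}")
    text = record.get("collecting_date")
    if text:
        try:
            date.fromisoformat(text)
        except ValueError:
            problems.append(f"non-ISO collecting date {text!r}: move to a memo date field")
    return problems
```

Running checks like these over a full export before migration would give a worklist of records to fix, rather than discovering the problems after the data are in the new system.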


Appendix C — Glossary  

ABCD  Access to Biological Collections Data – a comprehensive TDWG standard for access to and exchange of primary biodiversity data

ALA  Atlas of Living Australia – a project funded under the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) to develop a biodiversity data management system for Australia’s biological knowledge

ArcGIS  a group of geographic information system (GIS) software product lines produced by ESRI

AVH  Australia’s Virtual Herbarium – an online botanical information resource that provides access to data associated with scientific plant specimens in Australia’s major herbaria

BioCASE  Biological Collections Access Service for Europe – a transnational network of European biological data providers

BioCASE provider  Data publishing software developed by BioCASE that implements the BioCASE protocol. The BioCASE provider abstracts data from a database and turns it into a standard format, such as ABCD or DarwinCore. MEL and AD (Adelaide) use the BioCASE provider to deliver data to AVH. Other similar protocols are TAPIR and DiGIR.

BM  Acronym of The Natural History Museum, London (British Museum) 

BRAHMS  Botanical Research And Herbarium Management System 

CSS  Cascading Style Sheets – a stylesheet language used to apply formatting to web pages 

customisable  able to be customised to our particular needs, without the need for extensive programming 

DarwinCore  a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections

DBMS  database management system – examples of database management systems are: MS Access, MySQL, MS SQL Server

DiGIR  Distributed Generic Information Retrieval – a client/server protocol for retrieving information from distributed resources

EMu  Electronic Museum – collections management software developed by KE Software

EoL  Encyclopedia of Life – a project to create an online reference source and database for the 1.8 million named and known species on earth

extensible  able to be extended and adapted: an extensible database allows for expansion of the data structure, i.e. additional tables and fields

future proofing  the selection of physical media and data formats that best ensure the continued accessibility of data into the future. This process involves anticipating future developments and ensuring that only well‐documented formats, standards and specifications are used to store and describe data.

GBIF  Global Biodiversity Information Facility – an international organisation that focuses on making scientific data on biodiversity available via the Internet using web services. The data are provided by many institutions from around the world; GBIF's information architecture makes these data accessible and searchable through a single portal. Data available through the GBIF portal are primarily distribution data on plants, animals, fungi and microbes for the world, and scientific names data.

GIS  geographic information system – captures, stores, analyses, manages, and presents data that refers to or is linked to location. In a more generic sense, GIS applications are tools that allow users to create interactive queries (user-created searches), analyse spatial information, edit data and maps, and present the results of all these operations. 

HISPID  Herbarium Information Standards and Protocols for the Interchange of Data – a standard format for the interchange of electronic herbarium specimen information, initially developed by the Australian herbaria and later adopted as a TDWG standard. The current version – HISPID5 – is ABCD compliant. 

Java  an object‐oriented programming language 

MEL   acronym of The National Herbarium of Victoria 

MELISR  MEL Information System Register – the National Herbarium of Victoria’s specimen database 

Microsoft SQL Server  a relational database management system, used as the back-end database of Specify 5 

MySource Matrix  an open source content management system (CMS) written in PHP, used for the new RBG website  

MySQL  a powerful, open source, relational database management system 

NSW  The National Herbarium of New South Wales 

PostgreSQL  an object-relational database management system 

PS&B  Plant Sciences and Biodiversity Division, Royal Botanic Gardens (Melbourne) 

RBG  Royal Botanic Gardens  

roadmap  a plan for change execution 

robust  able to withstand pressures or changes in procedure or circumstance 

scalable  able to handle growth without having to replace the existing platform or architecture 

Specify  research software application, database and network interface for biological collections information 

TAPIR  TDWG Access Protocol for Information Retrieval – a computer protocol designed for the discovery, search and retrieval of distributed data over the internet 

TDWG  Taxonomic Databases Working Group (also referred to as Biodiversity Information Standards) 

Texpress  an object-oriented multi-user database management system developed by KE Software 

usability  the efficiency with which a user can perform tasks in a given application 

VicList  Census of the Vascular Plants of Victoria, an up‐to‐date list of the species and infraspecific taxa of vascular plants occurring in Victoria 

 


Appendix 2. Royal Botanic Gardens MELISR migration, project implementation plan


Royal Botanic Gardens Melbourne

MELISR migration

Project implementation plan Prepared by Alison Vaughan and Niels Klazenga, 4 December 2009


Contents

1. Project definition

2. Objectives

3. Scope

4. Deliverables

5. Stakeholders

6. Roles and responsibilities

7. Timeframes

8. Resources

9. Implementation plan

10. Risk management

11. Stakeholder management strategy

Appendix A. Work breakdown schedule

Appendix B. Project schedule

Appendix C. Test cases


1. Project definition

This implementation plan describes the migration of MELISR into a new collections management system.

MELISR is the collections database of the National Herbarium of Victoria (MEL), and is a crucial component of the collections management and research activities of the Royal Botanic Gardens Melbourne (RBG). MELISR is currently implemented on the KE Texpress database management system (DBMS). There are a number of outstanding issues with the Texpress system, including the unreliability of the back-up system, issues with data capture and data integrity, and inability to interface with other key business systems. Furthermore, Texpress is no longer being developed or adequately supported. The MELISR Business Case recommended that Specify be adopted as the new collections management system for MEL.

2. Objectives

The objectives of the MELISR migration project are:

to improve storage and retrieval of specimen information

to improve robustness and extensibility of the collections database

to reduce duplication of effort arising from maintaining multiple databases relating to the same information

to streamline delivery of specimen data to Australia’s Virtual Herbarium (AVH)

to allow MELISR to interface effectively with other taxonomic and nomenclatural databases.

3. Scope

A key benefit of shifting to Specify is its greater extensibility. Specify can be more easily customised than Texpress, which means that we can add new fields to MELISR to allow for more accurate capture of specimen data. However, while new fields will be added, populating these fields retrospectively would require a greater investment of time and effort than is within the scope of this project. The only new fields that will be populated are those whose data can easily be derived by merging or splitting existing fields, or taken from associated databases. The tasks that are in and out of scope in this project are outlined below.

In scope:

migrating MELISR data from Texpress to Specify

cleaning MELISR data to improve data quality and aid migration

storing taxonomic names, personal names and locality information in separate tables and fields

improving handling of higher level taxonomy

adding determination history fields

improving recording of collectors (verbatim label data and interpreted label data)

adding fields to allow for more accurate recording of spatial data

improving recording of plant occurrence status

allowing for subsets of the collection to be appropriately flagged


applying restricted vocabularies where feasible

improving recording of geocode precision

ensuring fields comply with HISPID standards

adding verification level fields

improving the capacity to record data about images associated with specimens

linking MELISR to global gazetteers

adding ethnobotanical information fields

adding a field to record the indigenous name of a plant

integrating herbarium census and loans and exchange databases with MELISR collections database.

Out of scope:

parsing ‘Notes’ data into ‘Collectors notes’ and ‘Other notes’ fields

parsing habitat, substrate etc. fields

populating the ‘Language’ field

populating all instances of ‘Verbatim collector’s name’

populating determination history fields

populating plant occurrence status for all records

populating the ‘Ethnobotanical information’ field

populating the ‘Indigenous name’ field

customisation of the Specify database model and interface other than options already available in the application.

The Atlas of Living Australia (ALA) is looking into Specify to replace BioNet, which is currently used as the collection management system by many entomological collections. If the ALA decides to support Specify, changes to the Specify data model that would make it more suitable for botanical data may come into scope.

4. Deliverables

The deliverables include:

a fully functioning herbarium management system

a comprehensive user manual

staff training.

5. Stakeholders

The MELISR database supports the primary business of the Plant Sciences and Biodiversity Division (PS&B). The implementation of a new collection management system will require a large time commitment, especially from the Programmer, Information Technology and the Collections Information Officer, and will affect all MELISR users.

Currently, five Collections Branch staff undertake data entry, data cleaning and other curation tasks in MELISR on a daily basis. A further two Collections staff use MELISR at least twice a week, and one Collections volunteer uses MELISR on a weekly basis. Six Plant Sciences staff regularly perform queries and data entry in MELISR, and several more have read-only accounts for data querying. Outside PS&B, the online interface of MELISR is used by staff in both RBG Melbourne and RBG Cranbourne.

In order to ensure data integrity, it will not be possible to run the old and new systems concurrently. Consequently, MELISR will be unavailable for use during part of the implementation phase. The loans and exchange administration system will likewise be unavailable during this period.

Although the migration of MELISR to Specify will cause disruption to staff in the short term, this temporary loss of productivity will be offset by the increased efficiency and reliability of the new system. The project implementation plan has been carefully structured to minimise disruption to database users.

6. Roles and responsibilities

Project managers

Niels Klazenga (Programmer Information Technology, Biodiversity Information Officer)

Alison Vaughan (Collections Information Officer)

Responsible for project planning and quality assurance.

Project team

Niels Klazenga (Programmer)

Alison Vaughan (Collections Information Officer)

Ed Jarrett (IT Project Officer)

Responsible for project implementation, testing and training.

Reference group

David Cantrill (Chief Botanist and Director, PS&B)

Sabine Glissman-Gough (Manager, IS)

Pina Milne (Manager, Collections)

Catherine Gallagher (Co-ordinator, Curation)

7. Timeframes

This project is due for completion by 30 June 2011. It is critical that any disruption to access to MELISR does not coincide with preparations for the International Botanical Congress (IBC), which will be held in Melbourne in July 2011. If unforeseen delays in the implementation schedule occur, parts of the implementation plan may need to be held over until after the IBC.

8. Resources

The migration of MELISR from Texpress to Specify will require a large time commitment from the Programmer, Information Technology and the Collections Information Officer. All infrastructure required to run Specify is already in place at the RBG. The Specify software is free of charge, so the MELISR migration project will be budget neutral.

9. Implementation plan

The phases of the implementation plan are outlined below. This information is also represented as a work breakdown schedule (WBS, Appendix A) that details the timeframes required for each phase and outlines which tasks in the plan are dependent upon the completion of other tasks. A Gantt chart (Appendix B) provides an overview of the project schedule by month. For an up-to-date project schedule, see S:\PS&B\MELISR\MELISR redevelopment\MELISR project schedule.xls.

1 Installation and configuration

1.1 Install and configure Specify on RBG network (Niels Klazenga)

Specify will be installed and configured on the RBG network.

2 Testing and customisation

2.1 Testing

Specify will be thoroughly tested prior to data migration. Testing the functioning and capabilities of the new system at this stage will help inform the user needs analysis, customisation requirements and user acceptance testing.

2.1.1 Create test data set of 2000 records (Alison Vaughan)

A set of 2000 records will be extracted from MELISR to be used for the testing and customisation of Specify. All fields in the Texpress implementation of MELISR will be represented in the test data set to ensure that all data storage requirements are accounted for when mapping onto the new data model. Two thousand records is the maximum that can be loaded into Specify using the Workbench.
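One way to make sure every MELISR field appears in a test set of at most 2000 records is a greedy set-cover selection. The Python sketch below is illustrative only: it assumes the records have already been exported from Texpress as dictionaries of field name to text value, and the function names are hypothetical. It repeatedly takes the record that fills the most still-unrepresented fields, then pads the set with further records up to the Workbench limit quoted above.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

WORKBENCH_LIMIT = 2000  # per-upload Workbench maximum quoted in this plan


def populated(record: Dict[str, str]) -> FrozenSet[str]:
    """Names of the fields that actually hold a value in this record."""
    return frozenset(f for f, v in record.items() if v)


def select_test_set(
    records: List[Dict[str, str]], limit: int = WORKBENCH_LIMIT
) -> Tuple[List[Dict[str, str]], Set[str]]:
    """Greedy set cover over populated fields, then pad up to `limit`.

    Returns the selected records and any fields left unrepresented
    (non-empty only if the limit was reached before full coverage).
    """
    filled = [populated(r) for r in records]
    uncovered: Set[str] = set().union(*filled)
    chosen: List[int] = []
    taken: Set[int] = set()
    while uncovered and len(chosen) < limit:
        # Pick the record that fills the most still-unrepresented fields.
        best = max(
            (i for i in range(len(records)) if i not in taken),
            key=lambda i: len(uncovered & filled[i]),
            default=None,
        )
        if best is None or not (uncovered & filled[best]):
            break
        chosen.append(best)
        taken.add(best)
        uncovered -= filled[best]
    # Pad with ordinary records in source order until the limit is reached.
    for i in range(len(records)):
        if len(chosen) >= limit:
            break
        if i not in taken:
            chosen.append(i)
            taken.add(i)
    return [records[i] for i in chosen], uncovered
```

Greedy selection does not guarantee the smallest covering set, but it is simple, and any fields returned as unrepresented show immediately that the limit was reached before every field could be included.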

2.1.2 Clean test data set (Niels Klazenga & Alison Vaughan)

The taxon name, collector, additional collectors, determiner, confirmer, country and state fields in the test data set will be cleaned and normalised to allow this data to be correctly migrated into Specify.
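The normalisation in 2.1.2 might look something like the following minimal Python sketch. The state variant table and the initials rule are assumptions for illustration; the real rules would come from inspecting the MELISR values. Unknown state values are returned unchanged so they can be reviewed by hand rather than guessed.

```python
import re
from typing import Dict

# Hypothetical variant table; the real list would be built from MELISR data.
STATE_VARIANTS: Dict[str, str] = {
    "vic": "Victoria", "vic.": "Victoria", "victoria": "Victoria",
    "nsw": "New South Wales", "n.s.w.": "New South Wales",
    "new south wales": "New South Wales",
    "tas": "Tasmania", "tas.": "Tasmania", "tasmania": "Tasmania",
}


def normalise_person_name(name: str) -> str:
    """Collapse runs of whitespace and close up spaced initials,
    e.g. 'J. H.  Willis' -> 'J.H. Willis'."""
    name = re.sub(r"\s+", " ", name.strip())
    # Remove a single space that sits between two initials such as 'J.' and 'H.'.
    return re.sub(r"(?<=\b[A-Z]\.) (?=[A-Z]\.)", "", name)


def normalise_state(value: str) -> str:
    """Map a known variant to its canonical state name; leave unknown values as-is."""
    return STATE_VARIANTS.get(value.strip().lower(), value.strip())
```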

2.1.3 Map test data on Specify data model (Niels Klazenga & Alison Vaughan)

The MELISR fields will be mapped onto the Specify data model, using the test data set.
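A mapping of this kind can be kept as a simple lookup table. In the sketch below the MELISR field names are hypothetical, and the Specify 6 `table.field` targets are assumptions that would need to be checked against the actual Specify schema. Fields without a mapping are collected separately so that gaps show up during testing instead of being silently dropped.

```python
from collections import defaultdict
from typing import Dict

# Hypothetical MELISR field names -> assumed Specify 6 "table.field" targets.
FIELD_MAP: Dict[str, str] = {
    "accession_no": "collectionobject.CatalogNumber",
    "collector_no": "collectingevent.StationFieldNumber",
    "collection_date": "collectingevent.StartDate",
    "locality": "locality.LocalityName",
    "latitude": "locality.Latitude1",
    "longitude": "locality.Longitude1",
    "taxon_name": "taxon.FullName",
}


def split_by_table(record: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group one flat MELISR record into per-table dictionaries.

    Unmapped fields go under '_unmapped' so that holes in the mapping
    are visible when the test data set is run through it.
    """
    out: Dict[str, Dict[str, str]] = defaultdict(dict)
    for field, value in record.items():
        target = FIELD_MAP.get(field)
        if target is None:
            out["_unmapped"][field] = value
        else:
            table, column = target.split(".", 1)
            out[table][column] = value
    return dict(out)
```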

2.1.4 Upload test data set into Specify (Niels Klazenga & Alison Vaughan)

The test data set will be uploaded into Specify, using the Specify Workbench and the data mapping resulting from 2.1.3.

2.1.5 Test (Niels Klazenga & Alison Vaughan)

Specify will be tested using the test cases listed in Appendix C. Additional cases will be added during the test phase and user acceptance testing (2.5).

2.1.6 Refine mapping (Niels Klazenga & Alison Vaughan)

The mapping provided in 2.1.3 will be evaluated using the test data set, and refined as necessary.

2.2 User needs analysis

The migration of MELISR from Texpress to Specify provides an opportunity to improve the way that specimen data is recorded in MELISR, and to add additional functionality. A comprehensive user needs analysis will be undertaken to ensure that the stakeholder needs are met wherever possible.

2.2.1 Consult MELISR users regarding new and altered fields (Alison Vaughan & PS&B staff)

The project managers will consult MELISR users in PS&B to determine which fields need changing, how to best implement proposed new fields, and what restrictions should be placed on the use of existing and additional fields.

2.2.2 Determine loans and exchange database requirements (Alison Vaughan, Niels Klazenga, Pina Milne & Catherine Gallagher)

The project managers will consult the Manager, Collections and the Co-ordinator, Curation to determine the database requirements of the loans and exchange program, and to ascertain whether all loans and exchange requirements can be met by Specify.


2.3 Customisation

As Specify was developed primarily for use by entomological collections, some customisation will be required to optimise it for use with herbarium collections. For instance, it is likely that Specify will need to be customised to cater for those elements of HISPID that are not represented in ABCD. Specify may also need to be customised to deal with curatorial fields in MELISR specific to MEL.

2.3.1 Customise forms (Niels Klazenga & Alison Vaughan)

The data entry forms in Specify will be customised to meet in-house databasing requirements and to ensure databasing can be undertaken efficiently and intuitively.

2.3.2 Customise labels (Niels Klazenga & Alison Vaughan)

The specimen labels in Specify will be customised to be as consistent with Texpress-generated labels as possible. Annotation labels will also be programmed.

2.3.3 Customise reports (Niels Klazenga & Alison Vaughan)

The standard quality control, auditing and collections management reports used in Texpress will be replicated in Specify or MySQL, and existing Specify reports will be customised to reflect the organisation and composition of MEL’s collection.

2.4 Peripherals

Specify will be configured for use with the RBG’s barcode scanners and printers.

2.4.1 Test barcode readers and resolve any problems (Niels Klazenga, Alison Vaughan & Ed Jarrett)

MEL accession numbers consist of three parts, which are stored as three fields in Texpress but will be stored in one field in Specify. Because MEL’s barcode readers only read numerical data, there may be issues with using the barcode readers with Specify that need to be resolved.
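The barcode issue can be illustrated with a small sketch. It assumes, hypothetically, that the three Texpress parts are a prefix, a number and an optional letter suffix, and that a scanner returns only the digits. If two specimens share a number but differ in suffix, a scan alone cannot tell them apart, which is the kind of problem this task would need to resolve.

```python
import re
from typing import List, Optional


def compose_catalog_number(prefix: str, number: int, suffix: Optional[str] = None) -> str:
    """Join the three Texpress parts into one Specify value.
    Assumed layout: prefix, a space, the number, then any suffix."""
    return f"{prefix} {number}{suffix or ''}"


def barcode_payload(catalog_number: str) -> str:
    """Digits only, on the assumption that the scanners read numbers only."""
    digits = re.sub(r"\D", "", catalog_number)
    if not digits:
        raise ValueError(f"no numeric part in {catalog_number!r}")
    return digits


def match_scanned(scanned: str, catalog_numbers: List[str]) -> List[str]:
    """Catalogue numbers whose numeric part equals the scanned digits.
    More than one match means the suffix is needed to pick the specimen."""
    return [c for c in catalog_numbers if barcode_payload(c) == scanned]
```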

2.4.2 Configure print settings and test printing (Niels Klazenga & Ed Jarrett)

As Specify runs on RBG workstations, configuring the printers will likely be straightforward.

2.5 User acceptance testing (UAT)

User acceptance testing (UAT) will be conducted prior to the migration of data to ensure that Specify meets user needs.

2.5.1 Develop UAT plan (Niels Klazenga & Alison Vaughan)

A UAT plan will be developed to ensure that user acceptance testing is comprehensive and that testing outcomes are properly documented.

2.5.2 UAT (Niels Klazenga, Alison Vaughan & PS&B staff)

UAT will be carried out by a range of stakeholders from PS&B, including Collections staff who do the bulk of the data entry into MELISR.

2.5.3 Refine according to outcome of UAT (Niels Klazenga & Alison Vaughan)

If necessary, the customisation of Specify will be refined in response to the outcome of the UAT.

3 Data migration

3.1 Preparation

Because the Specify data structure is very different to the current flat structure of the MELISR database, we need to prepare the MELISR data for migration to Specify. This involves parsing some existing fields into new fields and data cleaning of selected fields. This phase of the project is expected to take the most time. In order to minimise disruption to database users, a snapshot of MELISR will be taken at the start, so staff can continue to use MELISR while the data cleaning is progressing.
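Parsing a flat taxon name into separate fields, as the normalised Specify structure requires, could start from something like this Python sketch. It assumes names are stored without authorship and uses an illustrative set of rank abbreviations; hybrids, 'sp.' names and other irregular strings would need additional rules or manual review.

```python
from typing import Dict, Optional

# Rank abbreviations assumed to occur in MELISR name strings (illustrative only).
RANK_MARKERS: Dict[str, str] = {"subsp.": "subspecies", "var.": "variety", "f.": "forma"}


def parse_taxon_name(name: str) -> Dict[str, Optional[str]]:
    """Split e.g. 'Eucalyptus viminalis subsp. cygnetensis' into genus,
    species, rank and infraspecific epithet. Authorship is not handled."""
    parts = name.split()
    result: Dict[str, Optional[str]] = {
        "genus": None, "species": None, "rank": None, "infraspecific": None,
    }
    if not parts:
        return result
    result["genus"] = parts[0]
    if len(parts) > 1:
        result["species"] = parts[1]
    # Only recognise an infraspecific name when a known rank marker is present.
    if len(parts) > 3 and parts[2] in RANK_MARKERS:
        result["rank"] = RANK_MARKERS[parts[2]]
        result["infraspecific"] = parts[3]
    return result
```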


3.1.1 Map new and existing fields on Specify data model (Niels Klazenga & Alison Vaughan)

The test dataset will be mapped onto the Specify data model using the Workbench.

3.1.2 Data cleaning (Niels Klazenga & Alison Vaughan)

Data cleaning will be restricted to what is necessary to make the collections database standards-compliant and will focus on three areas: collector information, taxon names and geographical information. Parsing habitat and notes fields is out of the scope of the project.
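For the geographical part of this cleaning, a simple automated check can flag suspect coordinates for manual review. The bounding box below is a rough approximation of mainland Australia and Tasmania chosen for illustration, not an authoritative boundary, and the reversed-sign test reflects one common data entry error rather than any known MELISR problem.

```python
from typing import Optional

# Approximate extent of mainland Australia and Tasmania in decimal degrees (illustrative).
AUS_LAT = (-44.0, -10.0)
AUS_LON = (112.0, 154.0)


def in_australia_box(lat: float, lon: float) -> bool:
    return AUS_LAT[0] <= lat <= AUS_LAT[1] and AUS_LON[0] <= lon <= AUS_LON[1]


def coordinate_problem(country: str, lat: Optional[float], lon: Optional[float]) -> Optional[str]:
    """Describe a suspected coordinate problem, or return None if nothing is flagged."""
    if lat is None or lon is None:
        return None  # no coordinates recorded, nothing to check
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return "out of range"
    if country != "Australia" or in_australia_box(lat, lon):
        return None
    if in_australia_box(-lat, lon):
        return "latitude sign probably reversed"
    return "outside approximate Australian bounding box"
```

Flagged records would be reviewed rather than corrected automatically, since offshore localities legitimately fall outside the box.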

3.2 Migration

Migration will be trialled on a copy of the MELISR database (see above). All SQL statements will be saved, so that the actual migration can take place as quickly as possible.

3.2.1 Create MySQL table with cleaned MELISR dataset (Niels Klazenga)

A single table with all cleaned MELISR data will be created. This table will be similar to the current MySQL duplicate of MELISR, but with the field structure resulting from steps 2.1.3 and 2.1.6.

3.2.2 Trace data mapping (Niels Klazenga)

The Specify data model is highly normalised and consists of a large number of different tables. The mapping of each MELISR field needs to be traced through the data model in order to put the right foreign keys in the right tables, so that all tables will be linked correctly.
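Tracing the mapping through a normalised model amounts to finding an insertion order in which every referenced table is loaded before the tables that point to it. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9 or later) on a simplified, assumed subset of Specify 6 tables; the real dependency list would have to be read from the Specify schema.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Simplified, assumed subset of Specify 6 foreign-key dependencies:
# each table maps to the tables it references (which must be loaded first).
DEPENDS_ON: Dict[str, Set[str]] = {
    "geography": set(),
    "taxon": set(),
    "agent": set(),
    "locality": {"geography"},
    "collectingevent": {"locality"},
    "collector": {"collectingevent", "agent"},
    "collectionobject": {"collectingevent"},
    "determination": {"collectionobject", "taxon"},
}


def insert_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order tables so every referenced table precedes the tables pointing at it.
    Raises graphlib.CycleError if the dependencies are circular."""
    return list(TopologicalSorter(depends_on).static_order())


def respects_dependencies(order: List[str], depends_on: Dict[str, Set[str]]) -> bool:
    """Check that a proposed load order never inserts a child before its parent."""
    pos = {table: i for i, table in enumerate(order)}
    return all(pos[dep] < pos[table] for table, deps in depends_on.items() for dep in deps)
```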

3.2.3 Write and run SQL INSERT statements (Niels Klazenga)

The SQL INSERT statements that populate the Specify tables, following the mapping traced in 3.2.2, will be composed and executed.
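The INSERT pattern can be sketched with Python's built-in `sqlite3` module standing in for MySQL. The three tables are a deliberately simplified, hypothetical version of the Specify structure: each parent row is inserted (or reused) first, and its generated key is carried into the child's foreign-key column.

```python
import sqlite3
from typing import Dict, List

# Hypothetical, heavily simplified stand-ins for three Specify tables.
TOY_SCHEMA = """
CREATE TABLE IF NOT EXISTS locality (
    LocalityID INTEGER PRIMARY KEY, LocalityName TEXT UNIQUE);
CREATE TABLE IF NOT EXISTS collectingevent (
    CollectingEventID INTEGER PRIMARY KEY, StationFieldNumber TEXT,
    LocalityID INTEGER REFERENCES locality(LocalityID));
CREATE TABLE IF NOT EXISTS collectionobject (
    CollectionObjectID INTEGER PRIMARY KEY, CatalogNumber TEXT UNIQUE,
    CollectingEventID INTEGER REFERENCES collectingevent(CollectingEventID));
"""


def migrate(rows: List[Dict[str, str]], conn: sqlite3.Connection) -> None:
    """Insert flat cleaned rows, parents before children."""
    cur = conn.cursor()
    cur.executescript(TOY_SCHEMA)
    for row in rows:
        # Reuse an existing locality row if this name has already been inserted.
        cur.execute("INSERT OR IGNORE INTO locality (LocalityName) VALUES (?)", (row["locality"],))
        loc_id = cur.execute(
            "SELECT LocalityID FROM locality WHERE LocalityName = ?", (row["locality"],)
        ).fetchone()[0]
        cur.execute(
            "INSERT INTO collectingevent (StationFieldNumber, LocalityID) VALUES (?, ?)",
            (row["collector_no"], loc_id),
        )
        event_id = cur.lastrowid  # key generated for the new collecting event
        cur.execute(
            "INSERT INTO collectionobject (CatalogNumber, CollectingEventID) VALUES (?, ?)",
            (row["accession_no"], event_id),
        )
    conn.commit()
```

With MySQL the same pattern applies, using the driver's `lastrowid` or MySQL's `LAST_INSERT_ID()`. Keeping these statements in a saved script also fits the plan's aim of trialling the SQL on a copy before the final run.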

3.2.4 Migration of MELISR data to Specify (Niels Klazenga)

Final data migration will take place after the necessary SQL has been written and trialled on a copy of MELISR. Because records entered or changed in the Texpress implementation during this phase would not be migrated, MELISR will be unavailable for its duration. The herbarium census will likewise be unavailable.
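After the final migration, a quick reconciliation of catalogue numbers can confirm that nothing was lost, added or duplicated. This is a minimal sketch; it assumes the catalogue numbers have already been read from the cleaned MySQL table and from Specify into Python lists.

```python
from collections import Counter
from typing import Dict, Iterable, Set


def reconcile(source_ids: Iterable[str], migrated_ids: Iterable[str]) -> Dict[str, Set[str]]:
    """Compare catalogue numbers before and after migration."""
    migrated = list(migrated_ids)
    src, dst = set(source_ids), set(migrated)
    duplicated = {cat for cat, n in Counter(migrated).items() if n > 1}
    return {"missing": src - dst, "unexpected": dst - src, "duplicated": duplicated}
```

All three sets should be empty before MELISR is reopened for use.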

4 Project finalisation

4.1 Installation

During testing and migration the Specify client program will be installed only on the Programmer’s and Collections Information Officer’s workstations. After successful migration Specify will be installed on all MELISR users’ workstations. A backup program will be installed on a different server.

4.1.1 Installation of Specify on workstations (Alison Vaughan, Ed Jarrett, Upul Molligoda & Niels Klazenga)

Specify will be installed on the workstations of all RBG staff and volunteers who require access to the collections database.

4.1.2 Install Specify data backup program on server (Niels Klazenga & Ed Jarrett)

A backup program that will make daily backups of the MELISR database will be installed on a different server, in order to ensure security of the database.
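A daily backup of the MySQL database behind Specify could be scripted around `mysqldump`, the standard MySQL dump utility. The sketch below only builds the command and prunes old dump files; the database name, backup directory and retention count are hypothetical, and credentials are assumed to come from a MySQL option file rather than the command line. The `--single-transaction` option takes a consistent snapshot of InnoDB tables without locking them.

```python
import datetime as dt
from pathlib import Path
from typing import List


def dump_command(database: str, out_dir: Path, day: dt.date) -> List[str]:
    """Build a mysqldump invocation that writes one dated file per day."""
    out_file = out_dir / f"{database}-{day.isoformat()}.sql"
    return ["mysqldump", "--single-transaction", f"--result-file={out_file}", database]


def prune_old_dumps(out_dir: Path, database: str, keep: int) -> List[Path]:
    """Delete all but the newest `keep` dated dumps and return the deleted paths.
    ISO dates sort correctly as text, so sorting by name sorts by date."""
    dumps = sorted(out_dir.glob(f"{database}-*.sql"))
    doomed = dumps[:-keep] if keep > 0 else dumps
    for path in doomed:
        path.unlink()
    return doomed
```

The command list could then be passed to `subprocess.run(..., check=True)` from a scheduled task on the backup server, followed by a call to `prune_old_dumps`.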

4.2 Training

Training staff in the use of Specify is a key aspect of the project; it is imperative that staff who use MELISR are well-supported during the transition from Texpress to Specify. Training will be provided in stages, starting with those staff whose work is most reliant on the database.

4.2.1 Revise MELISR data entry manual (Alison Vaughan)

The MELISR manual will be refined throughout the customisation and testing phases to ensure that all aspects of database use are covered. Data entry procedures will be updated to reflect changes to the data structure and the addition of controlled vocabularies.

4.2.2 Train Collections staff (Alison Vaughan)

Staff in the Collections Branch are most reliant upon MELISR, and thus will be the first users to be trained in the use of Specify.

4.2.3 Train Plant Sciences staff and Collections volunteers (Alison Vaughan)

Of next priority in the training schedule are Plant Sciences staff and Collections volunteers who use MELISR for data entry, followed by those Plant Sciences staff who use MELISR for data querying only.

4.2.4 Train other MELISR users throughout the organisation (Alison Vaughan)

MELISR users from outside PS&B will be trained last.

4.3 Configure provider software

Currently MEL only delivers data to AVH, using the BioCASE provider software. This provider needs to be modified.

4.3.1 Configure TAPIR and BioCASE providers (Niels Klazenga)

The BioCASE provider will need to be reconfigured. As the underlying data model has changed, this involves a new table structure for the BioCASE table.
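The new BioCASE table could be a flat view that joins the normalised Specify tables back into one row per specimen, which is one convenient shape for the provider to map onto ABCD or DarwinCore elements. This sketch uses `sqlite3` with simplified, assumed table and column names; the output columns use Darwin Core term names for readability, and the `IsCurrent` flag used to pick the current determination is an assumption about the schema.

```python
import sqlite3

# Simplified, assumed subset of the Specify tables (not the real schema).
TOY_SCHEMA = """
CREATE TABLE taxon (TaxonID INTEGER PRIMARY KEY, FullName TEXT);
CREATE TABLE locality (LocalityID INTEGER PRIMARY KEY, LocalityName TEXT,
                       Latitude1 REAL, Longitude1 REAL);
CREATE TABLE collectingevent (CollectingEventID INTEGER PRIMARY KEY, LocalityID INTEGER);
CREATE TABLE collectionobject (CollectionObjectID INTEGER PRIMARY KEY,
                               CatalogNumber TEXT, CollectingEventID INTEGER);
CREATE TABLE determination (DeterminationID INTEGER PRIMARY KEY, CollectionObjectID INTEGER,
                            TaxonID INTEGER, IsCurrent INTEGER);
"""

# One flat row per specimen; only the current determination supplies the name.
FLAT_VIEW = """
CREATE VIEW biocase_specimen AS
SELECT co.CatalogNumber AS catalogNumber,
       t.FullName       AS scientificName,
       l.LocalityName   AS locality,
       l.Latitude1      AS decimalLatitude,
       l.Longitude1     AS decimalLongitude
FROM collectionobject co
JOIN collectingevent ce ON ce.CollectingEventID = co.CollectingEventID
JOIN locality l ON l.LocalityID = ce.LocalityID
LEFT JOIN determination d
       ON d.CollectionObjectID = co.CollectionObjectID AND d.IsCurrent = 1
LEFT JOIN taxon t ON t.TaxonID = d.TaxonID;
"""


def build_toy_db() -> sqlite3.Connection:
    """Create an in-memory database with the toy tables and the flat view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(TOY_SCHEMA + FLAT_VIEW)
    return conn
```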

4.3.2 Organise update of MEL data in AVH (Niels Klazenga)

Because the new collections database has a different field structure and the data will be cleaned, all records will effectively be updated. An update of the MEL records already in the AVH cache, by means other than the BioCASE provider, needs to be arranged.

The relationship between the different activities in the project implementation plan is shown as a Program Evaluation and Review Technique (PERT) diagram (Fig. 1).

Figure 1. PERT network depicting the sequence of activities (diagram not reproduced; its nodes are tasks 1.1 to 4.3.2 of the implementation plan above).

10. Risk management

The main risks associated with migrating MELISR to Specify are:

Specify not meeting the business needs of the herbarium

loss of data

delay in implementation

changes to scope

conflicting operational priorities

variable stakeholder expectations

lack of stakeholder participation.

The risks have been minimised by selecting a new system that closely meets our needs, using a DBMS that we are already familiar with, choosing a system with a user-friendly interface, and carrying out comprehensive project planning.


Table 1. Risk management strategy

Risk: Specify not meeting the business needs of the herbarium
Impact: The options for improving MELISR would need to be reassessed, which will lead to a delay in implementing MELISR on a new, robust system
Probability: Unlikely
Consequence: Moderate
Mitigation: Undertake rigorous testing prior to mass data cleaning and migration to ensure Specify meets our needs before progressing with the most resource-intensive phase of the project

Risk: Loss of data
Impact: Reduction of the quality of the MEL collection data
Probability: Unlikely
Consequence: Major
Mitigation: Reduce the risk of data loss during migration by keeping good back-ups and by rigorous testing of the Specify data model

Risk: Delay in implementation
Impact: Phases of the project might need to be held over until after the IBC; a delay in implementation would increase the risk of conflicting operational priorities
Probability: Likely
Consequence: Moderate
Mitigation: There are four months between the scheduled completion of data migration and the IBC, which will allow for some slippage

Risk: Changes to scope
Impact: Implementation will be delayed
Probability: Likely
Consequence: Moderate
Mitigation: Carefully plan and define scope at the start to minimise scope creep

Risk: Conflicting operational priorities
Impact: The project team would be unable to meet project deadlines, thereby delaying subsequent phases of the project
Probability: Likely
Consequence: Moderate
Mitigation: The prioritisation of the MELISR migration implementation project at organisational level should minimise this risk

Risk: Variable stakeholder expectations
Impact: The new implementation might not meet the expectations of all stakeholders
Probability: Likely
Consequence: Moderate
Mitigation: Consult stakeholders at various stages of the project to minimise the risk of stakeholder dissatisfaction

Risk: Lack of stakeholder participation
Impact: Missed opportunity for Specify to best meet the business needs of the National Herbarium of Victoria
Probability: Unlikely
Consequence: Minor
Mitigation: Encourage stakeholders to contribute to the user needs analysis and user acceptance testing


11. Stakeholder management strategy

The success of the MELISR migration project is dependent upon the involvement and support of the project’s stakeholders. It is imperative that stakeholders’ interests in the project are identified and that they have the opportunity to contribute to the decision making processes relating to those interests. The stakeholder management strategy (Table 2) details the interests of, and input required from, different stakeholder groups. By maintaining good communication with stakeholders throughout the project, we can minimise resistance to change and promote greater investment in, and acceptance of, the project’s outcomes.

Table 2. Stakeholder management strategy

Stakeholder

Interest in project

Input required

Potential barriers

Engagement strategy

Curation Officers, Curation Co-ordinator

As the primary users of the database, the curation staff have a high level of interest in any changes to work processes resulting from the migration of MELISR to Specify.

input on changes to database fields and data entry requirements

input on changes to the loans and exchange database

feedback on data entry manual and database usability

resistance to change, and reluctance to learn new databasing procedures, especially if these are more complex than current practices

consult curation staff on proposed changes and invite feedback throughout the implementation process

provide clear justification for any changes made

provide clear instructions and training in the use of the new system

Botanists with data entry privileges, databasing volunteer

Many of the botanists at MEL database their own collections and, as such, have an interest in changes to databasing procedures.

input on changes to database fields and data entry requirements

feedback on data entry manual and database usability

reluctance to learn new databasing procedures, especially if these are more complex than current practices

consult staff on proposed changes and invite feedback throughout the implementation process

provide clear justification for any changes made

provide clear instructions and training in the use of the new system

Other MELISR users

Several RBG staff have read-only MELISR accounts, and will be affected by changes to querying the database.

input on possible improvements to query functionality

reluctance to learn new system

provide clear instructions and training in the use of the new system


MELISR web query users

It is foreseeable that the migration of MELISR from Texpress to Specify will necessitate changes to the MELISR web query. Any such changes will need to be communicated to RBG staff who use the MELISR web query.

provide clear and timely information on any changes

provide training if required

IS

IS will no longer have to maintain a separate database management system to accommodate MELISR.

provision of programming and technical support

conflicting operational priorities

Programmer to liaise with IS

provide regular project updates

HISCOM

Many HISCOM members will be interested in the migration procedure because other herbaria might consider implementing Specify themselves.

input on standards compliance

provide report at end of migration process

invite members to view new system


Appendix A. Work breakdown schedule

Task Resource Dependent on Start date Finish date

1 Installation and configuration

1.1 Install and configure Specify on RBG network NK - 1/07/2009 1/07/2009

2 Testing and customisation

2.1 Testing 15/10/2009 26/02/2010

2.1.1 Create test data set of 2000 records AV 1.1 15/10/2009 15/10/2009
2.1.2 Clean test data set NK & AV 2.1.1 12/11/2009 3/12/2009
2.1.3 Map test data on Specify data model NK & AV 2.1.2 4/01/2010 22/01/2010
2.1.4 Upload test data set into Specify NK & AV 2.1.3 27/01/2010 29/01/2010
2.1.5 Test NK & AV 2.1.4 1/02/2010 12/02/2010
2.1.6 Refine mapping NK & AV 2.1.5 15/02/2010 26/02/2010

2.2 User needs analysis 1/03/2010 28/05/2010

2.2.1 Consult MELISR users regarding new and altered fields AV & PSB - 1/03/2010 28/05/2010
2.2.2 Determine loans and exchange database requirements AV, NK, CG & PM - 1/03/2010 5/03/2010

2.3 Customisation 1/03/2010 2/07/2010

2.3.1 Customise forms NK & AV 2.1 12/04/2010 14/05/2010
2.3.2 Customise labels NK & AV 2.1 12/04/2010 14/05/2010
2.3.3 Customise reports NK & AV 2.1 12/04/2010 14/05/2010

2.4 Peripherals 1/03/2010 21/05/2010

2.4.1 Test barcode readers and resolve any problems NK, AV & EJ 2.1 1/03/2010 14/05/2010
2.4.2 Configure print settings and test printing NK & EJ 2.1, 2.3.2 17/05/2010 21/05/2010

2.5 User acceptance testing (UAT) 8/03/2010 2/07/2010

2.5.1 Develop UAT plan NK & AV 2.2 8/03/2010 26/03/2010
2.5.2 UAT NK, AV & PSB 2.1–2.4, 2.5.1 17/05/2010 18/06/2010
2.5.3 Refine according to outcome of UAT NK & AV 2.5.2 21/06/2010 2/07/2010

3 Data migration

3.1 Preparation 5/07/2010 3/09/2010

3.1.1 Map new and existing fields on Specify data model NK & AV 2.1.3, 2.1.5 5/07/2010 3/09/2010
3.1.2 Data cleaning NK & AV - 5/07/2010 3/09/2010

3.2 Migration 6/09/2010 28/01/2011

3.2.1 Create MySQL table with cleaned MELISR data NK 3.1 6/09/2010 10/09/2010
3.2.2 Trace data mapping NK 3.1 6/09/2010 24/09/2010
3.2.3 Write and run SQL INSERT statements NK 3.2.2 27/09/2010 31/12/2010
3.2.4 Migration of MELISR data to Specify NK 3.2.3 20/12/2010 28/01/2011


4 Project finalisation

4.1 Installation 5/07/2010 18/03/2011

4.1.1 Installation of Specify on workstations NK, AV & EJ 1.1 7/02/2011 18/02/2011
4.1.2 Install Specify data backup program on server NK & EJ 1.1 7/02/2011 25/02/2011

4.2 Training 7/02/2011 18/03/2011

4.2.1 Revise MELISR data entry manual AV 2 5/07/2010 4/02/2011
4.2.2 Train Collections staff AV 2.5.3, 4.1 7/02/2011 4/03/2011
4.2.3 Train Plant Sciences staff and Collections volunteers AV 2.5.3, 4.1 14/02/2011 11/03/2011
4.2.4 Train other MELISR users throughout the organisation AV 2.5.3, 4.1 21/02/2011 18/03/2011

4.3 Configure provider software 28/02/2011 29/04/2011

4.3.1 Configure TAPIR and BioCASE providers NK 3.2, 4.1 28/02/2011 11/03/2011
4.3.2 Organise update of MEL data in AVH NK 3.2 14/03/2011 29/04/2011


Appendix B. Project schedule

[Gantt chart: monthly schedule, 2009 to 2011, for tasks 1 to 4.3.2 as listed in Appendix A]


Appendix C. Test cases

taxon name tables can deal with botanical names

infraspecific rank

hybrid names and formulae

cultivated plant names

database must be able to store determination history

data entry forms are navigable by keyboard as well as mouse

data model must be able to accommodate all required fields

verbatim collector

verbatim foreign language label information

data structure must be able to store unit relationships


Appendix 3. Summary of Missing and Needed Specify Features


This appendix is a copy of http://bit.ly/otjNme (minus blog comments). Some items, namely Desktop Application, Annotation Data and Annotation Service, have been added or updated.

Oracle Support

Oracle and other corporate database environments are in regular use Australia-wide. In some institutions, the Information and Communications Technology (ICT) policy forbids the use of open source database platforms for corporate data; in others, there is pre-existing data in a platform such as Oracle, and it makes sense to continue using it rather than adopt a new platform solely for Specify. It is thus critical to a number of institutions that a collections management tool support a corporate database environment.

Specify Software Project (Rod Spears):

“Porting to additional DBMS systems was originally part of our Specify 6 plan. In our most recent renewal we were cut down from 3 developers to 1 and are [fortunate] that Tim's funding [was] available from other sources. We no longer have the resources to do any DBMS porting ourselves. If an organization like yourself would like to do the port themselves we would be more than willing to support that effort.”

External Taxonomies

Specify will import a taxonomy of names, but thereafter the taxonomy cannot be updated from that external source. It is critical that Specify be capable of synchronising its taxonomy tree with an external source, so that the tool can be implemented alongside the mature taxonomy applications already in place in Australian institutions.

Specify Software Project (Rod Spears):

“This has been in the plan from the beginning. We wanted to solve the more generic problem of providing external resources for:

“Taxonomic names

Agent names

Geography

Stratigraphy

“This is [currently] not on our development plan [due] to lack of resources.”
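The lighter, one-way form of synchronisation that RBG Melbourne describes in its response (checking a name added to the taxon tree against an external list such as the NSL and importing the authorship) can be sketched as follows. This is a minimal illustration only: the `Taxon` class, the function name and the dictionary standing in for the external nomenclator are hypothetical, and no real NSL or Specify interface is modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Taxon:
    """Minimal stand-in for a row in a local taxon table (hypothetical)."""
    full_name: str
    author: Optional[str] = None


def check_against_nomenclator(local: List[Taxon], external: Dict[str, str]) -> List[str]:
    """Fill missing authorship from an external name list.

    `external` maps a full name to its authorship and stands in for a
    nomenclator web service; no real service API is modelled here.
    Returns the local names that the external source does not hold.
    """
    not_found = []
    for taxon in local:
        if taxon.full_name not in external:
            # e.g. informal or phrase names found on specimens but in no nomenclator
            not_found.append(taxon.full_name)
        elif taxon.author is None:
            taxon.author = external[taxon.full_name]
    return not_found


local = [Taxon("Eucalyptus regnans"), Taxon("Acacia sp. A")]
external = {"Eucalyptus regnans": "F.Muell."}
missing = check_against_nomenclator(local, external)
print(missing)          # ['Acacia sp. A']
print(local[0].author)  # F.Muell.
```

Names that the external source does not know about are reported rather than discarded, reflecting the point that many names on specimens appear in no nomenclator.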


RBG Melbourne (Niels Klazenga):

“There are different forms of synchronisation between taxonomies. I would be happy with a form of synchronisation between the Specify taxon tree and, say, the National Species List where when a taxon name is added to the taxon tree, it can be checked against the NSL and the authorship and protologue can be imported from NSL. I have the feeling Specify can already do that. A form of synchronisation where the taxon tree in Specify is automatically updated when a change is made in an external database I will be happy to implement for our in-house databases (Ausmoss, Interactive Catalogue of Australian Fungi, Census of Vascular Plants of Victoria), but not for anything else.

“This form of synchronisation may be useful for institutes that have collections from a limited geographical area and a small taxonomic spectrum, but not for something like the National Herbarium of Victoria that has collections from all over the world from five different Kingdoms. We would either have to synchronise with many different external nomenclators, many of which do not even exist, or something like the Catalogue of Life (and who would want that?). Our Specify taxon tree also contains only names that are in our collections; if we would include all names in all taxonomic groups of which we hold collections and for all geographic regions, our taxon table would be bigger than our entire collections database is now. Also, there are many “names” on specimens in every institute that you will not find in any nomenclator.

“Synchronisation is pretty easily achieved through the back end and does not really require anything extra from Specify. It is the external nomenclator that needs to make its data available through a web service or something.

“As for agents, the desirability of having a collectors (or people of interest to Australian botany) database that is shared between herbaria is something that has come up in discussions on several of the CHAH-ALA collaboration themes, so we can make a strong case to CHAH that the Australian herbaria want to have something like that. Once there it will be very easy to synchronise with Specify or any other database that has a table with agents.


“Geography. This doesn’t change so fast, so I do not see a need for synchronisation. There is another problem with the geography tree in Specify. The geography tree in Specify comes with continents, ISO countries, states and counties. This ISO system does not make much sense in a natural history collections database. We had initially thrown out this geography tree and replaced it with TDWG-WGS2, but then we found out that the way GEOLocate is implemented in Specify it requires ISO countries to be in a single column in the geography tree. All ISO countries are in WGS2, but they are not all of the same rank. The WGS continents are used for storage at MEL, so now we have a geography tree which is somewhere in between ISO and WGS2. And it is not pretty. It would be much better if we could have a translation table between the geography tree and the GEOLocate plugin that translates any entry in the geography tree to an ISO country.”
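The translation table suggested above could work by walking up the geography tree from any node until it reaches a node that has been mapped to an ISO country, whatever that node's rank. The sketch below assumes a simplified, hypothetical tree and mapping; it is not the GEOLocate plug-in's actual interface or Specify's geography schema.

```python
from typing import Dict, Optional

# Hypothetical, simplified geography tree: node -> parent (None at the root).
PARENT: Dict[str, Optional[str]] = {
    "Australasia": None,
    "Australia": "Australasia",
    "Victoria": "Australia",
    "New Zealand": "Australasia",
    "North Island": "New Zealand",
}

# Hypothetical translation table: only nodes that correspond to an
# ISO 3166-1 country need an entry, regardless of their rank in the tree.
ISO_COUNTRY: Dict[str, str] = {
    "Australia": "AU",
    "New Zealand": "NZ",
}


def to_iso_country(node: Optional[str]) -> Optional[str]:
    """Walk up from `node` until an entry in the translation table is found."""
    while node is not None:
        if node in ISO_COUNTRY:
            return ISO_COUNTRY[node]
        node = PARENT[node]
    return None  # node lies above country level, e.g. a continent


print(to_iso_country("Victoria"))      # AU
print(to_iso_country("North Island"))  # NZ
print(to_iso_country("Australasia"))   # None
```

Because the lookup follows parent links, the geography tree itself could keep a TDWG-WGS2 structure while the plug-in receives only ISO country codes.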

Batch Editing

(According to http://specifysoftware.org/content/specify-63-features-and-enhancements, batch editing of any number of records to a new determination is now possible in version 6.3.)

It is currently not possible to edit the same field for more than one record at a time. Batch editing matters for both efficiency and occupational health and safety, and must be implemented. Competing tools, e.g. Texpress, already support it.

The National Herbarium of Victoria works around this by performing batch edits directly on the underlying MySQL database, and regards the limitation as a feature, as it better protects the database. A potential side-effect of a well-normalised database structure is a reduced need for batch edits: a single Agent’s details can be changed in one place and are then reused throughout the application. This differs from some collections databases, in which the same details are stored separately in every record.

Issue raised by: WA Herbarium.
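Why a normalised structure reduces the need for batch edits can be shown with a toy example: the collector's name is stored once in an agent table, so a single UPDATE reaches every specimen that references it. SQLite stands in for MySQL here, and the tables are a simplified, hypothetical schema, not Specify's actual data model.

```python
import sqlite3

# SQLite stands in for MySQL; the tables are a simplified, hypothetical
# schema, not Specify's actual data model.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent (
        agent_id   INTEGER PRIMARY KEY,
        first_name TEXT,
        last_name  TEXT
    );
    CREATE TABLE collection_object (
        co_id          INTEGER PRIMARY KEY,
        catalog_number TEXT,
        collector_id   INTEGER REFERENCES agent(agent_id)
    );
""")
conn.execute("INSERT INTO agent VALUES (1, 'F.', 'Mueller')")
conn.executemany(
    "INSERT INTO collection_object (catalog_number, collector_id) VALUES (?, 1)",
    [("MEL 0001",), ("MEL 0002",), ("MEL 0003",)],
)

# One UPDATE inside a transaction: `with conn` commits on success and
# rolls back if an exception is raised.
with conn:
    conn.execute("UPDATE agent SET first_name = ? WHERE agent_id = ?",
                 ("Ferdinand", 1))

# Every specimen now reports the corrected name via the join.
rows = conn.execute("""
    SELECT c.catalog_number, a.first_name || ' ' || a.last_name
    FROM collection_object AS c
    JOIN agent AS a ON a.agent_id = c.collector_id
    ORDER BY c.catalog_number
""").fetchall()
for row in rows:
    print(row)
```

In a flat table where each record held its own copy of the collector's name, the same correction would have to be repeated for every record, which is exactly the situation batch editing is meant to address.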

Specify Software Project (Rod Spears):

“Our approach in Specify 6 will be different than in Specify 5. Specify 5's batch edit was essentially a search and replace for nearly every field in every table. As you can [imagine], this approach is very powerful, but at the same time, very dangerous for the novice user. Additionally, the average user was not very disciplined about making back ups.

“Specify 6's approach will involve the user 'exporting' the data to the WorkBench where it can be edited, or even [exported] once again to Excel for editing. Then the data is re-uploaded via the WorkBench [where] it can be validated. We will also be providing additional tools for specific clean up use cases, for example, duplicate Agent clean up.”

Batch editing is slated for release in March 2011. It includes a “batch re-identify” feature.


RBG Melbourne (Niels Klazenga):

“I like Rod’s approach. I would be very unhappy with Specify 5’s batch edit, which is basically Texpress’s batch edit or with a situation where every user with editing privileges on a table would also be able to batch edit. Uploading workbench datasets is a different privilege from editing a table, so this would work very well for us.

“At the moment more useful than batch editing would be the shortcuts we could have in Texpress, i.e. short combinations of characters that would be immediately translated into much longer text strings, similar to the autocorrect in MS Office.
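The shortcut feature described above amounts to whole-word text expansion. A minimal sketch follows; the function name and the shortcut list are hypothetical and are not taken from Texpress or Specify.

```python
import re
from typing import Dict


def expand_shortcuts(text: str, shortcuts: Dict[str, str]) -> str:
    """Replace whole-word shortcuts with their expansions.

    Assumes shortcuts consist of word characters, so that the \\b word
    boundaries below apply at both ends.
    """
    if not shortcuts:
        return text
    # Longest keys first, so a longer shortcut wins over a shorter prefix.
    keys = sorted(shortcuts, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: shortcuts[m.group(1)], text)


# Hypothetical shortcut list; each institution would define its own.
SHORTCUTS = {"gnp": "Grampians National Park", "fl": "flowering"}
print(expand_shortcuts("gnp; fl, fr", SHORTCUTS))
# Grampians National Park; flowering, fr
```

Matching on word boundaries keeps a shortcut such as "fl" from being expanded inside ordinary words like "flat".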

Access to Support

A number of people have already mentioned that email to support is not answered in a timely fashion. This does not inspire confidence that Australian institutions will have ready access to experts. There is certainly a need for a few Australian points of contact for expert Specify consulting.

Issue raised by: Peter Doherty (ALA); Piers Higgs (Gaia Resources).

RBG Melbourne (Niels Klazenga):

“I have erased us from the line above, as our issue was caused by their emails back to us bouncing.

“Specify is used by over 200 institutes world wide and Specify Software has only one helpdesk person. Moreover, their NSF funding only allows them to support US institutions.

“We need to create a Specify community in Australia where larger institutes with more resources and more experience support the smaller ones. We are not quite experts at Specify, but Alison and I are both happy to advise and discuss. We will also make our mapping, forms, reports and data entry manual available. We will not be able to make forms for everybody or migrate everybody’s data.

“People like Piers and Peter should contact Andy directly.”

Record Display Limits

The maximum number of records displayed in the result of a search is set to 5,000. This limit is apparently configurable, but any fixed limit is inappropriate for very large datasets: some records may never fall within the first 5,000 results, and so remain inaccessible unless the user knows enough about their content to devise a query that returns them.

Issue raised by: Australian National Wildlife Collection (ANWC); Royal Botanic Gardens (RBG), Melbourne; WA Herbarium.

Specify Software Project (Rod Spears):

“[T]his was part of the original plan for Specify 6. Specify 5 did not have a limit. Specify 6 at the moment loads all the results into memory, thus the limit.


“Our plan is to do paging, meaning a set number of records on each page and then move page by page through the results. Or to have a moving 'window' [through] the results where only the 'window' plus or minus rows on each side [will] reside in memory. My guess is that we will probably use the paging approach.”

“We did not have time to fully complete this before shipping the first version of Specify6. (The selection of 5000 was completely arbitrary).”
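Paging of the kind Rod Spears describes can be sketched with keyset pagination, one common way to page through a large result set. Each query resumes after the last primary key already seen, so only one page is held in memory at a time and no record is unreachable. This is a generic technique, not Specify's implementation; SQLite stands in for MySQL and the table is hypothetical.

```python
import sqlite3
from typing import Iterator, List, Tuple


def iter_pages(conn: sqlite3.Connection, page_size: int) -> Iterator[List[Tuple[int, str]]]:
    """Yield query results one page at a time (keyset pagination).

    Only the current page is held in memory. Each query resumes after
    the last primary key seen, so every row is eventually returned
    however large the table is.
    """
    last_id = -1  # assumes non-negative integer keys
    while True:
        page = conn.execute(
            "SELECT co_id, catalog_number FROM collection_object "
            "WHERE co_id > ? ORDER BY co_id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not page:
            return
        yield page
        last_id = page[-1][0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE collection_object (co_id INTEGER PRIMARY KEY, catalog_number TEXT)")
conn.executemany("INSERT INTO collection_object (catalog_number) VALUES (?)",
                 ((f"MEL {n:07d}",) for n in range(12_345)))
print([len(page) for page in iter_pages(conn, 5000)])  # [5000, 5000, 2345]
```

Unlike a fixed 5,000-row cap, the page size here bounds memory use rather than the number of records a user can reach.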

Record Import Limits

The maximum number of records that can be imported at a time is limited to 2,000. This is an inappropriate limit for large collections seeking to migrate to Specify. It may also indicate that the underlying software is not well implemented, as competing collections management tools handle memory well.

Issue raised by: WA Herbarium.

Specify Software Project (Rod Spears):

“There is a preference that can be set to increase the limit beyond 2000. We have uploaded as many [as] 10,000 rows at a time. The uploader was originally developed using Hibernate instead of straight SQL. The end result was portable DBMS code, but Hibernate is very memory intensive. So the limit is really the number of rows multiplied by the number [of] columns, then factor in roughly how much data is in the rows/columns and finally how much memory is on the machine running Specify. We punted and arbitrarily [chose] 2000 rows.

“Also, as a point of history, the WorkBench was designed to [enable] collectors to easily upload field notebook information. The interesting thing is that it has been primarily used for migrating data into Specify.”

RBG Melbourne (Niels Klazenga):

“I suggest larger institutes like the WA Herbarium do not use the workbench for migrating data into Specify. Way too scary: you do not know what is happening. Also, not all fields can be imported through the workbench, so you’ll have to use SQL at some point anyway.”
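Whether data goes in through the WorkBench or through SQL, a large migration is usually split into batches. Following Rod Spears's observation that memory use scales roughly with rows multiplied by columns, the sketch below sizes batches against a cell budget. The budget figure and function names are hypothetical, chosen only so that 30 columns reproduces the 2,000-row default; Specify does not necessarily compute its limit this way.

```python
from typing import Iterator, Sequence


def rows_per_batch(columns: int, cell_budget: int) -> int:
    """Rows per batch so that rows x columns stays within `cell_budget`."""
    return max(1, cell_budget // columns)


def batches(rows: Sequence, size: int) -> Iterator[Sequence]:
    """Split `rows` into consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


# Hypothetical budget of 60,000 cells: 30 columns gives 2,000 rows per
# batch; 6 columns would allow 10,000 rows per batch.
size = rows_per_batch(columns=30, cell_budget=60_000)
print(size)                                           # 2000
print([len(b) for b in batches(range(4_500), size)])  # [2000, 2000, 500]
```

In practice the budget would have to be tuned to the data volume per cell and the memory available on the machine doing the import.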

Desktop Application

Aside from the basic EZDB version (see footnote 1), Specify has a clear client-server design. In this design, the desktop application resides on a user’s PC (the “client”) and the database resides in a MySQL instance on a separate machine (the “server”). This benefits multi-user installations, as it avoids having multiple copies of a database in multiple places, and makes it easier to back up the data outside staff business hours.

1 http://bit.ly/mOAVX9: “Specify EZDB eliminates the need to install and administer a MySQL server, making it well-suited for small collections and single-user databases.” Despite this, we wouldn’t recommend using it in this manner for any institutional collection.


One problem with this approach is that the desktop application and the MySQL database need to be upgraded at the same time, as some versions of Specify make changes in the desktop application that are incompatible with an older database structure. One way to address this is to use desktop virtualisation (e.g. Citrix) and run the client software in a virtual desktop for each staff member with access to the database. The virtual desktops can then be locked and upgraded at the same time as the database, with ICT support staff installing or upgrading the client from their own location. This presumes that staff members are prevented from updating the client software themselves. Alternatively, a web client, if one exists, can be used, so that the only desktop requirement is a web browser.

RBG Melbourne (Niels Klazenga):

“This should be remedied when the Specify web client comes out later this year. From what I can see from the sneak preview this web client will have all the features of the desktop client.”
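The coupling between client and database described above can be pictured as a version check performed when the client connects. The rule below (client and schema must agree on major.minor version) and the function names are hypothetical assumptions for illustration; they are not Specify's actual versioning policy.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a dotted version string such as '6.3.1' into (6, 3, 1)."""
    return tuple(int(part) for part in version.split("."))


def client_can_open(client_version: str, schema_version: str) -> bool:
    """Hypothetical rule: client and schema must agree on major.minor."""
    return parse_version(client_version)[:2] == parse_version(schema_version)[:2]


print(client_can_open("6.3.0", "6.3.2"))  # True
print(client_can_open("6.2.5", "6.3.0"))  # False: the client must be upgraded too
```

Under any rule of this kind, every client must be upgraded whenever the database is, which is why centrally managed virtual desktops or a web client simplify upgrades.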

Query Across Collections

For institutions with multiple collections, it is critical that all collections be included in Specify searches.

Issue raised by: ANWC, RBG Melbourne.

RBG Melbourne (Niels Klazenga):

“We want this too! So far, the absence of this capability has prevented us to have more than one collection. I think this is somewhere on the road map.”

“OR” Queries

Specify cannot do “OR” queries via its interface. Combining conditions with OR (e.g. records of either of two taxa) is a way to limit the search results to exactly those of interest in a single query.

Issue raised by: ANWC, RBG Melbourne.

RBG Melbourne (Niels Klazenga):

“We want this too. Looking at how the query tool in Specify is set up, I have the feeling it may be quite hard to achieve.”
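Until the interface supports it, an “OR” query can be run against the back end. The example below shows one reason this is hard to express in a query builder: OR conditions usually need explicit grouping when combined with AND. SQLite stands in for MySQL, and the table and data are hypothetical.

```python
import sqlite3

# SQLite stands in for the MySQL back end; the table and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER PRIMARY KEY, taxon TEXT, state TEXT)")
conn.executemany("INSERT INTO specimen (taxon, state) VALUES (?, ?)", [
    ("Eucalyptus regnans", "VIC"),
    ("Eucalyptus regnans", "TAS"),
    ("Acacia dealbata", "VIC"),
    ("Acacia dealbata", "NSW"),
])

# The parentheses matter: without them AND binds tighter than OR, and
# the query would also return the Acacia dealbata specimen from NSW.
rows = conn.execute(
    "SELECT id, taxon, state FROM specimen "
    "WHERE state = ? AND (taxon = ? OR taxon = ?) ORDER BY id",
    ("VIC", "Eucalyptus regnans", "Acacia dealbata"),
).fetchall()
print(rows)
# [(1, 'Eucalyptus regnans', 'VIC'), (3, 'Acacia dealbata', 'VIC')]
```

A query interface therefore needs some way of grouping conditions, not just a list of criteria that are all joined with AND.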

Seedbank Support

Specify needs to support the data collected by seedbanks, as these institutions often work closely with specimen collections and link their data to specimens. Some seedbanks have poorly performing databases with no link to the biodiversity network.

Issue raised by: Threatened Flora Seed Centre (WA).

RBG Melbourne (Niels Klazenga):

“It would be good to have a seedbank module in Specify that can deal with germination trials and all the other stuff seedbanks do. Some of it may already be possible. Have a look at the treatment events and conservator description and comments tables.”


Annotation Data

There is apparently no support for attaching annotations to a specimen in Specify. Here, annotations are free-text notes associated with the day-to-day curation of a collection, not a service associated with this data (see Annotation Service). An annotation is commonly associated with a Collection Object, but can also be associated with other parts of the database, such as an Agent or a Taxon.

Annotation Service

Once a collection is available to the public, it is common to receive email from collectors and others noting data errors. Within an institution, these notes are often appended to the database as an annotation (see Annotation Data). One future source of such notes is ALA itself, which currently makes an annotation feature available to site users.

A powerful addition to Specify would be a plug-in that was capable of receiving and processing annotations and presenting these to the database custodian within Specify. It would close the loop on corrections made from the cloud and enhance the efficiency of data cleaning and annotating tasks.

Rod Spears has noted that “Specify can easily interact with a set of Web Services to 'pull' annotations. We are currently working with the 'Filtered Push' grant in the States which will communicate annotations. Our Specify 7 grant proposal had such a feature, but we were not funded in such a way to implement the new features in the grant.”
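A plug-in that pulls annotations would, at minimum, need to parse what a service returns and present it to the custodian record by record. The sketch below assumes an invented JSON payload: its field names do not follow Filtered Push or any other real annotation schema, and no network access is attempted.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical payload from an annotation web service; the field names
# are invented for illustration and do not follow any real schema.
PAYLOAD = """[
  {"catalogNumber": "MEL 0001", "field": "locality", "note": "Should read Mt Buffalo"},
  {"catalogNumber": "MEL 0002", "field": "collector", "note": "Collector name misspelt"},
  {"catalogNumber": "MEL 0001", "field": "date", "note": "Day and month transposed"}
]"""


def group_for_review(payload: str) -> Dict[str, List[str]]:
    """Group incoming annotations by catalogue number for the custodian."""
    queue: Dict[str, List[str]] = defaultdict(list)
    for item in json.loads(payload):
        queue[item["catalogNumber"]].append(f'{item["field"]}: {item["note"]}')
    return dict(queue)


for record, notes in group_for_review(PAYLOAD).items():
    print(record, notes)
```

Grouping by record lets the custodian review all outstanding corrections to a specimen together before deciding which to apply.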


Recommended