Atlas of Living Australia Collections Project Report
Implementing Specify 6 in Australia
December 2011
Ben Richardson, Western Australian Herbarium, Science Division, Department of Environment and Conservation
Piers Higgs, Gaia Resources
Executive Summary
In mid–late 2010 a number of institutions were separately considering Specify Software’s collections management tool Specify as a migration path for their current Collection Management System (CMS).
The Atlas of Living Australia (ALA) commissioned Ben Richardson and Piers Higgs to undertake some work in this area. An initial meeting to determine the parameters of the project decided that it would “determine the resources available and those required on a per-institution basis and return a set of recommendations to ALA on how it can best help.” This report represents the completion of this task.
Upon beginning work it quickly became obvious that—given the number, size and complexity of collection databases, the lack of Australian knowledge of Specify, and the tight time frame associated with ALA projects—a project to migrate even a medium-sized collection would find it difficult to meet a mid-2012 deadline.
Instead, this project evolved into an evaluation of Specify that included:
Gauging the interest in conducting training workshops;
Conducting a follow-up survey of interested institutions;
Collation of requirements from the community;
Discussions with Specify about implementing those requirements; and
Writing this report.
Project Outcomes
The outcomes from this project were:
1. Raised Awareness: Institutions are more aware of the capability of Specify as a result of this project. We will continue to raise awareness about Specify through the blog developed during this project, located at http://alacollections.wordpress.com/.
2. Determined the Demand: We have determined that 13 institutions, with over 17 million specimens between them, are interested in Specify across Australia. Several institutions performed their own evaluation and removed themselves from the listing.
3. Evaluated Suitability: From our discussions with interested parties, and our own evaluation, Specify is well suited to many of the uses envisaged by contributors to this project, although we did identify some barriers to entry. Specify seems to be particularly well suited to relatively small collections with no existing Collections Management System, but not well suited to seed-banking and cultural collections.
4. Determined the Barriers to Entry: We identified several barriers to entry, such as the missing and needed features in Specify, and the issues with the potential collaboration pathways through Specify Software and local developers.
5. Determined Collaboration Pathways: We now understand the terms by which Specify Software would engage or work with contract developers. This is critical for ongoing support, and to resolve the barriers to entry for the institutions involved.
A Way Forward for the Atlas of Living Australia
A key outcome of this project was to develop a way forward for the ALA to support the Specify platform within Australia both up to and after June 2012, when the current core ALA funding ceases. This way forward also had to take into account the current life of the ALA, and the full commitment of the budget for the existing project.
Given the remaining funding for this project, we identify two possible ways for ALA to support the uptake of Specify through to June 2012:
1. Continued support for the blog (http://alacollections.wordpress.com/); and
2. Potentially a small trial of targeted support.
We also identify three main ways for this project to continue, should the ALA receive additional funding and be recommissioned at the end of the current funding cycle. These are:
1. Development of missing features;
2. Targeted support to interested parties; and
3. Other potential sources of support.
Table of Contents
Executive Summary
  Project Outcomes
  A Way Forward for the Atlas of Living Australia
Raising Awareness
  What is Specify?
  National Herbarium of Victoria
Determining the Demand
  Gauging Interest in Specify
  Interest from the Collections
  Other Interest
  Training Workshops
Evaluating Suitability
  Training Materials
Barriers to Entry
  Summary of Missing and Needed Features
Collaboration Pathways
  Specify Software
  Other Australian Developers
A Way Forward for the Atlas of Living Australia
  Continued Support for the Blog
  Targeted Support Trial
  Development of Missing Features
  Targeted ALA Support
  Other Potential Methods of Support
Appendices
Raising Awareness
An important part of this project was raising awareness across the community that there are additional tools that can be used for collection management.
Natural history institutions around the country already use a variety of tools to manage their specimen collections including: CSIRO’s BioLink, University of Connecticut’s Biota, Knowledge Engineering’s (KE) EMu, Specify Software’s Specify, and Vernon Systems’ Vernon CMS. Rather than use an off-the-shelf product such as these, a number of institutions have developed their collection management capability directly within a database platform, where the database structure is completely bespoke. Database platforms such as KE Texpress, MySQL, Oracle, Microsoft SQL Server, Microsoft Access and FileMaker Pro are common in this scenario. Also, some collections are managed in this way because they were built prior to the existence of CMS applications.
The wide variety of tools in use, even within some institutions, leads us to suggest that migrating many smaller collections into one tool would benefit the institution in terms of standardisation, reduced support overhead, better cross-collection data sharing and Internet readiness.
ALA has provided support to three collections management systems:
1. BioLink has been re-engineered and its code modernised, the source code released under an open source license, and a number of outstanding issues fixed. This provides a simple upgrade path for those already using BioLink.
2. BioloMICS was selected as the preferred CMS for micro-organism collections. ALA is supporting the rollout of BioloMICS in 12 institutions.
3. Specify. ALA introduced Australian collections to Specify and initiated this project to determine how ALA could best help.
What is Specify?
Specify is a GNU GPL-licensed [1] (i.e. open source) CMS. As a result, it can be used by anyone without charge. Although no direct fee is charged to use Specify, there are substantial costs associated with installation and configuration of the software, staff training, data entry and maintenance, and maintenance of the infrastructure needed to run it.
By comparison, a database management system (DBMS) is a storage technology with one or more client programs and application programming interfaces (APIs) that provide connectivity to it. A DBMS has no inherent understanding of the data it stores. Building a CMS directly on a DBMS requires that an institution first design the data structure, then build the customised client forms and reports that enable staff to manage their collection. Examples of DBMSs used to manage collection data include: KE Texpress, Microsoft Access, MySQL, Oracle and SQL Server.
A CMS such as Specify, in contrast, provides features specific to its area of specialisation, including:
The data structure into which a collection is stored;
Import from and export to known biodiversity informatics standards;
Data-aware reporting;
Validation required to properly manage collection data.
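The difference between the two can be illustrated with a minimal sketch. It is not taken from Specify: the table, the field names, the sample record and the helper functions `validate` and `to_darwin_core` are all hypothetical, and SQLite stands in for whatever DBMS an institution uses. The only external names relied on are the Darwin Core terms `catalogNumber`, `scientificName`, `decimalLatitude` and `decimalLongitude`. The DBMS accepts an impossible latitude without complaint, while a thin CMS-style layer can flag it and export the record using standard terms.

```python
import sqlite3

# Hypothetical flat specimen record; field names are illustrative only.
record = {
    "catalog_number": "000012345",
    "scientific_name": "Banksia attenuata",
    "latitude": 123.4,   # impossible: latitude must lie between -90 and 90
    "longitude": 115.86,
}

# 1. A bare DBMS stores whatever it is given.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimen "
    "(catalog_number TEXT, scientific_name TEXT, latitude REAL, longitude REAL)"
)
conn.execute(
    "INSERT INTO specimen VALUES "
    "(:catalog_number, :scientific_name, :latitude, :longitude)",
    record,
)
rows_stored = conn.execute("SELECT COUNT(*) FROM specimen").fetchone()[0]


# 2. A CMS adds domain knowledge on top, for example validation.
def validate(rec):
    """Return a list of problems found in a specimen record."""
    problems = []
    if not rec.get("catalog_number"):
        problems.append("catalog_number is required")
    lat = rec.get("latitude")
    if lat is not None and not -90 <= lat <= 90:
        problems.append("latitude must be between -90 and 90")
    lon = rec.get("longitude")
    if lon is not None and not -180 <= lon <= 180:
        problems.append("longitude must be between -180 and 180")
    return problems


# 3. A CMS also knows how to export to a community standard.
DARWIN_CORE_TERMS = {
    "catalog_number": "catalogNumber",
    "scientific_name": "scientificName",
    "latitude": "decimalLatitude",
    "longitude": "decimalLongitude",
}


def to_darwin_core(rec):
    """Rename local fields to their Darwin Core terms."""
    return {DARWIN_CORE_TERMS[k]: v for k, v in rec.items() if k in DARWIN_CORE_TERMS}


print(rows_stored, validate(record), to_darwin_core(record))
```

An institution that builds directly on a DBMS has to write and maintain this kind of layer itself, which is part of the cost a CMS is meant to remove.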
[1] The GNU Project General Public License, http://www.gnu.org/copyleft/gpl.html.
Specify uses the MySQL DBMS as its data store, holding the data in a large number of related tables [2]. Collection management systems in use in Australia include: BioLink, Biota, EMu, Specify and Vernon CMS.
The open source license also allows other developers to directly change the way their copy of Specify works, either to contribute fixes for program errors (i.e. “bugs”) or to implement features not yet available. Contributing this source code back to the managers of a project—in the hope that it becomes part of the official source code—is something that commonly occurs in the development of open source software. This would be beneficial to the Specify community.
There are, however, a number of complications associated with this collaboration pathway that we address in the sections “Evaluating Suitability” and “Barriers to Entry”.
National Herbarium of Victoria
In 2010, the National Herbarium of Victoria at the Royal Botanic Gardens in Melbourne began the process of migrating its collection data to Specify from Knowledge Engineering’s Texpress. The Herbarium’s collection comprises 1.2 million plant, algae and fungi specimens from around the world, of which approximately 820,000 [3] are databased. The database is now maintained by two database administrators and used by 11 users with direct access and 14 others with query-only privileges. The migration was completed in the first quarter of 2011.
In preparing for the migration the National Herbarium of Victoria developed a business case as well as a migration plan. We were given permission to include these documents in the report, and they are provided as Appendices 1 and 2. The entire project was timed to complete prior to the International Botanical Congress in Melbourne, 23–30 July 2011. Documentation was developed by the team and made available to the public via their implementation blog at http://bit.ly/oKNt34 and also at http://bit.ly/qV3I3F.
Of the migration, the implementation team note that:
They each (Alison Vaughan and Niels Klazenga) spent 12 months at approximately 0.5 FTE completing the work;
Some tables in the Specify database schema are not accessible yet from the GUI, which precludes some staff from making use of them;
Some links between database tables and the GUI are not completed;
Some work-flow requirements may still require either direct MySQL access, or code to be developed outside Specify.
Alison and Niels have made themselves, their mapping, customised forms and other technical details available to others, particularly university collections. Theirs was a major project to migrate from an existing system to Specify, and it included a range of development, data cleaning and other tasks. In comparison, Gaia Resources [4] has completed a much smaller project, a direct migration without cleaning, which took much less time, although it sits within similar cleaning processes that complicate even the smallest collection migration project. This work exemplifies the kind of effort needed to catalyse a self-sufficient Specify community in Australia.
[2] See Specify Software’s online database schema at http://bit.ly/nLmxxq.
[3] A. Vaughan, pers. comm., 20 September, 2011.
[4] Note also that one of the authors of this report, Piers Higgs, is the Director of Gaia Resources, and a Research Associate of the Western Australian Museum.
Determining the Demand
Gauging Interest in Specify
This project began with an unfortunate implicit assumption that Specify was suitable for Australian collection institutions. It transpired that a number of institutions had performed a basic analysis of the available CMS and DBMS options and Specify had compared well. Unfortunately, the most comprehensive analysis, done by the Canadian Heritage Information Network (CHIN; see http://bit.ly/oOMPzT), did not include Specify.
Prior to the inception of this project, the Atlas, through John Tann, sponsored 40 people from 30 Australian biological collection institutions to attend workshops around Australia in 2010. Specify Software’s Andy Bentley was sponsored to present Specify at the workshops. Of the 30 institutions represented, 20 responded to a follow-up survey about their interest in Specify. Twelve institutions were wholly agreeable to implementing Specify and five more had caveats on their interest. A significant number of institutions either did not respond to the survey or indicated they were not interested in Specify.
Figure 1. Response to the mid-2010 post-workshop survey of attendees’ interest in Specify.
The reasons that workshop attendees gave for a favourable analysis of Specify included:
the source code is GNU GPL-licensed (i.e. open source) and is thus directly accessible to staff or contractors hired to fix bugs;
the lack of infrastructure funding available to some institutions precludes the purchase of more expensive options;
access to technical support isn’t deemed to be important;
the present system (ignoring its upgrade options) doesn’t handle modern database structures or character sets well;
the present system isn’t able to communicate with biodiversity web services tools.
The reasons that workshop attendees gave for an unfavourable analysis of Specify included:
it isn’t able to interact with one or more important pre-existing applications;
it isn’t able to store data on objects from a seed bank or cultural collection;
it isn’t able to connect to a pre-existing database platform;
support for the software isn’t very accessible to Australian institutions.
Interest from the Collections
This section lists the collections that indicated an interest in Specify, the collection contact(s), the management tool being used and the status of the collection. As outlined in Table 1, below, these collections (plus the existing installation at the National Herbarium of Victoria) comprise over 17 million specimens, of which 2.75 million are currently shared through ALA.
Table 1. Collections with an interest in Specify, and some associated statistics.
Collection                                                              Estimated Collection Size  Collection Held Digitally  Records In ALA
Australian National Insect Collection                                   12,000,000                 500,000                    133,052
Australian National Wildlife Collection                                 200,000                    119,723                    115,073
Australian National Fish Collection                                     148,000                    Unknown                    29,970
Western Australian Department of Agriculture and Food                   400,000                    142,089                    0
Western Australian Museum                                               1,386,600                  Unknown                    265,175
Curtin University Entomology                                            11,216                     11,216                     0
Western Australian Herbarium                                            729,500                    729,500                    961,668
Department of Primary Industries, Parks, Water & Environment, Tasmania  150,000                    80,000                     0
La Trobe University Herbarium, Melbourne                                25,000                     Unknown                    0
University of Sydney Herbarium                                          71,503                     Unknown                    0
University of Melbourne Herbarium                                       100,000                    9,000                      0
Brisbane Botanic Gardens                                                Unknown                    Unknown                    0
Department of Environment and Natural Resources, South Australia        946,000                    600,000                    688,876
National Herbarium of Victoria                                          1,250,212                  803,000                    560,707
Totals                                                                  17,418,031                 2,994,528                  2,754,521
(Information from the ALA Collectory, as well as direct from collection contacts.)
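The totals in Table 1 can be checked with a short script. This is a sketch for the reader rather than part of the original analysis: the figures are copied from the table, and the assumption (ours) is that an “Unknown” entry simply contributes nothing to its column total.

```python
# Rows of Table 1: (collection, estimated size, held digitally, records in ALA).
# None stands for an "Unknown" entry and is skipped when summing (an assumption).
TABLE_1 = [
    ("Australian National Insect Collection", 12_000_000, 500_000, 133_052),
    ("Australian National Wildlife Collection", 200_000, 119_723, 115_073),
    ("Australian National Fish Collection", 148_000, None, 29_970),
    ("Western Australian Department of Agriculture and Food", 400_000, 142_089, 0),
    ("Western Australian Museum", 1_386_600, None, 265_175),
    ("Curtin University Entomology", 11_216, 11_216, 0),
    ("Western Australian Herbarium", 729_500, 729_500, 961_668),
    ("Department of Primary Industries, Parks, Water & Environment, Tasmania",
     150_000, 80_000, 0),
    ("La Trobe University Herbarium, Melbourne", 25_000, None, 0),
    ("University of Sydney Herbarium", 71_503, None, 0),
    ("University of Melbourne Herbarium", 100_000, 9_000, 0),
    ("Brisbane Botanic Gardens", None, None, 0),
    ("Department of Environment and Natural Resources, South Australia",
     946_000, 600_000, 688_876),
    ("National Herbarium of Victoria", 1_250_212, 803_000, 560_707),
]


def column_total(rows, index):
    """Sum one numeric column, ignoring Unknown (None) entries."""
    return sum(row[index] for row in rows if row[index] is not None)


total_size = column_total(TABLE_1, 1)
total_digital = column_total(TABLE_1, 2)
total_in_ala = column_total(TABLE_1, 3)

# Share of the estimated holdings that is currently visible through ALA.
share_in_ala = total_in_ala / total_size

print(f"{total_size:,} specimens, {total_digital:,} held digitally, "
      f"{total_in_ala:,} in ALA ({share_in_ala:.1%})")
```

Because several collections report “Unknown”, the digital total is a lower bound rather than an exact count.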
Australian National Insect Collection (ANIC), CSIRO Ecosystem Sciences, Canberra
Contact: Beth Mantle
The ANIC collection is managed in BioLink. The decision on whether to migrate to Specify is in the hands of CSIRO Information Management & Technology.
Australian National Wildlife Collection (ANWC), CSIRO Ecosystem Sciences, Canberra
Contact: Margaret Cawsey
The ANWC collection is managed in Microsoft SQL Server, using Access as a client. The decision on whether to migrate to Specify is in the hands of CSIRO Information Management & Technology.
Australian National Fish Collection (ANFC), CSIRO Marine Sciences, Hobart
Contact: Alastair Graham
The ANFC collection is managed in Texpress. ANFC has no plan to migrate to Specify because it sees no benefit that would outweigh the existing issues and the resourcing required to migrate.
Entomology, Department of Agriculture and Food, Perth
Contact: Rob Emery
The Entomology collection is managed in Microsoft Access. DAFWA has put the idea of migrating to Specify on hold because it lacks the resources to do the work in house. DAFWA Entomology doesn’t have a full-time database operator, and no development work has taken place on the database in about a decade.
Various Collections, Western Australian Museum, Perth
Contacts: Morgan Strong, Piers Higgs
The Museum manages a number of collections primarily in Microsoft Access, but also FileMaker Pro and Vernon. Some collections are not databased. WAM has commenced migration of several small collections into Specify. WAM has also begun evaluating Specify for use with cultural collections, by letting a tender which was won by Gaia Resources.
Barrow Island Project, Resource Management, Curtin University, Perth
Contact: Jonathan Majer
The collection is managed in Biota. Curtin University has no plan to migrate to Specify unless they can get direct help to carry out the migration, as the resource previously available for this task has already been expended adopting Biota.
Western Australian Herbarium, Department of Environment and Conservation, Perth
Contact: Ben Richardson
The collection is managed in KE Texpress. Some Specify evaluation meetings have taken place, but any plan to migrate is on hold until several crucial features are added to Specify, including Oracle support, external taxonomies and batch editing.
Entomology, Biosecurity and Plant Health Branch, Department of Primary Industries, Parks, Water & Environment, Launceston
Contact: Guy Westmore
The collection is managed in BioLink. DPIPWE Entomology is still considering Specify, but resourcing to perform the actual migration is needed: “Our commitment to move to Specify was dependent on ALA’s offer of technical/financial support to help us.”
Herbarium, Department of Botany, La Trobe University, VIC
Contact: Alison Kellow
The collection was until recently not databased. The herbarium has installed Specify and is now seeking funding for a technician to enter data. It is using the customised forms and taxonomy tree developed by Royal Botanic Gardens, Victoria.
John Ray Herbarium, School of Biological Sciences, The University of Sydney, NSW
Contact: Murray Henwood
The collection is currently not databased. The herbarium is hoping to have a server-based instance of Specify running in 2011. Like La Trobe University, it is planning to use the taxonomy tree developed by Royal Botanic Gardens, Victoria.
Herbarium, School of Botany, University of Melbourne, VIC
Contact: Gillian Brown
The collection of around 100,000 records is currently managed in a FileMaker Pro database designed by third year IT students. A project to migrate to Specify is on hold while other tasks are completed. The herbarium is receiving help from Niels Klazenga and Alison Vaughan at the National Herbarium of Victoria.
Seed Bank, Brisbane Botanic Gardens, QLD
Contact: Philip Cameron
The collection is managed in Microsoft Access 97. The seed bank is waiting to see what the Australian Seed Bank Partnership project decides before making any decision.
State Herbarium of South Australia, Department of Environment and Natural Resources, Adelaide
Contact: Stuart Pillman
The collection is managed in Texpress with links to Oracle. Oracle support is an almost guaranteed requirement, due to the number of pre-existing databases already held in Oracle. The herbarium may consider migrating anyway if the benefits outweigh the drawbacks.
Other Interest
A number of other collections were either initially interested in Specify but chose not to take it any further, or indicated a general interest in the project with no current involvement. These included:
Australian National Herbarium (ANH)
The collection is currently managed in Oracle. ANH rejected Specify as an option because it felt Specify did not match the work-flow used in the Herbarium.
Forestry Tasmania
Forestry Tasmania is interested in using Specify 6 to manage its insect collection.
Western Australian Threatened Flora Seed Centre (WATFSC)
WATFSC was interested in Specify for seed-banking, but unfortunately Specify cannot act as a seed-banking system without major development, which is not feasible within WATFSC’s budget.
Training Workshops
In mid-February, Piers Higgs and John Tann met with several Specify Software employees (Andy Bentley, Rod Spears, and Jim Beach). As a result of this meeting, we resolved to invite interested institutions to attend three training workshops. The workshops were targeted at three groups of people based on their experience, technical knowledge, and the kind of input they could make to the growth of a Specify community in Australia. The workshops proposed were:
Train the Expert, for those invitees who were/had:
Part of an Australian institution considering Specify;
Ability to answer a couple of queries about Specify from other institutions within a year of attending the course.
Train the Trainers, for those invitees who were/had:
Part of an Australian institution considering Specify;
The ability to run an in-house or other training course within a year of attending the course.
Train the Developer, for those invitees who were/had:
Part of an Australian institution considering Specify;
The ability to develop software in Java;
Agreed to submit a change to the Specify code-base within a year of attending the course.
There were nine respondents to this invitation: four from CSIRO, two from Gaia Resources, and one each from AQIS, the University of Melbourne Herbarium and the University of Sydney Herbarium. All nine wanted to attend the Train the Expert course, six were interested in the Train the Trainers course, and one in the Train the Developer course. Given this low level of interest, the decision was made not to proceed with this training in conjunction with the ALA.
We briefly considered developing Specify training materials as part of this project, but given that Gaia Resources was already liaising with Specify Software to obtain their training material and to work up additional materials, we decided not to pursue this avenue. That liaison is part of the ongoing projects that Gaia Resources is undertaking for the Western Australian Museum.
Evaluating Suitability
Given the limited time frame for this study, a “light” evaluation of how Specify could work for Australian institutions was conducted; a pilot or prototype software installation was not possible. This evaluation consisted of further investigation of the requirements of several institutions, including the WA Herbarium, WA Museum and the WA Threatened Flora Seed Centre. This was also combined with the results of our interviews with interested parties, some of whom also provided comment in the blog.
The pertinent findings from this review were:
1. A number of functions or features are missing from Specify, forming quite significant “barriers to entry” for some institutions, as outlined in the next section;
2. Specify isn’t a capable seed bank collection management system, but Specify Software has indicated an interest in changing this [5]; and
3. Specify isn’t a capable cultural collection management system, but the Western Australian Museum has indicated an interest in changing this.
Also, a project seeking to use Specify must take note of a number of assumptions made by the software. Some examples include:
Specify expects numeric identifiers to be zero-filled integers. Because existing identifiers that do not fit this pattern will change as a consequence of migrating to Specify, it is crucial that institutions are aware that changing a database’s numeric identifiers often has a deleterious impact on connectivity with related systems.
Agents, Identifications and other parts of a specimen record are separate entities in the data model. Older systems may not have separated these, so separating them becomes a necessary part of the migration and may significantly increase the workload associated with migrating to Specify.
The data model does not yet cater to the needs of seed banks or cultural collections.
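The first two assumptions can be prepared for before a migration starts. The following is a minimal sketch under stated assumptions, not a description of Specify’s own import tools: the function names, the record layout and the identifier width of 9 digits are all hypothetical and chosen only for illustration. It zero-fills identifiers while keeping a crosswalk from old to new values, so that related systems can be updated, and it separates collector names from flat specimen rows into their own agent entities.

```python
def zero_fill(identifier, width=9):
    """Return a numeric identifier as a zero-filled string of fixed width.

    The width of 9 is an assumption for illustration only.
    """
    digits = str(identifier).strip()
    if not digits.isdigit():
        raise ValueError(f"not a plain numeric identifier: {identifier!r}")
    if len(digits) > width:
        raise ValueError(f"identifier {identifier!r} is longer than {width} digits")
    return digits.zfill(width)


def build_crosswalk(old_identifiers, width=9):
    """Map every old identifier to its zero-filled form.

    Keeping this mapping lets related systems that still hold the old
    identifiers be updated, or redirected, after the migration.
    """
    return {old: zero_fill(old, width) for old in old_identifiers}


def split_agents(flat_records):
    """Separate collector names from flat specimen rows.

    Returns (agents, specimens): agents maps each distinct name to a new
    integer key, and each specimen row refers to its collector by that key.
    """
    agents = {}
    specimens = []
    for record in flat_records:
        name = record["collector"].strip()
        agent_id = agents.setdefault(name, len(agents) + 1)
        specimens.append(
            {"catalog_number": record["catalog_number"], "agent_id": agent_id}
        )
    return agents, specimens


# Hypothetical legacy rows in which the collector is stored as free text.
legacy = [
    {"catalog_number": "42", "collector": "A. Collector"},
    {"catalog_number": "1234", "collector": "A. Collector "},
    {"catalog_number": "77", "collector": "B. Botanist"},
]
crosswalk = build_crosswalk(r["catalog_number"] for r in legacy)
agents, specimens = split_agents(legacy)
print(crosswalk)
print(agents)
```

Real collector strings are rarely this tidy. Matching variant spellings of the same person is a data cleaning task in its own right and is one reason the separation step can add significantly to the workload.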
Training Materials
Specify Software does not have comprehensive, up-to-date training materials for other parties to provide training. Specify Software has several generic presentations, and some additional user help materials (such as on-line videos). Gaia Resources is now drafting its own training materials and will contribute these to the community.
[5] “I would be interested in looking at use cases and requirements.”, Rod Spears (Specify Software), http://bit.ly/pgU968.
Barriers to Entry
Aside from the already-identified threats to suitability, we found a range of barriers that complicate a successful migration to Specify. Those we discovered in the course of this project are detailed in the following sections.
Summary of Missing and Needed Features
A number of the surveyed institutions indicated that Specify was missing one or more features they considered necessary for them to migrate their collection(s). It was also clear that some institutional contacts had done only a preliminary investigation of the suitability of Specify to their needs.
One of the aims we set ourselves for this project was to set out potential methods of support for the decision-making process within institutions. As part of this, we wrote an article summarising the features missing and needed in Specify and published it in the project’s blog at http://bit.ly/otjNme. A link to the article (titled “Summary of Issues in Specify 6”) was then emailed to each of the contacts we had developed previously. The intention of this article was to inform institutions of potential pitfalls should any begin a migration without a full analysis of Specify. Subsequently, Rod Spears (the lead developer of Specify) and Niels Klazenga (National Herbarium of Victoria) added some important comments to the article. Rod Spears’ replies made it clear that Specify Software had requested funding to resolve a number of these issues but had not received enough to do so. Specify Software’s prioritisation process has so far caused the features we identified to be put aside so that those with a higher priority (perhaps those more relevant to US institutions) could be implemented.
As a result of the information we received from institutions, we were able to compile a list of the features that were missing in Specify but that were considered a priority by at least one Australian institution.
To prioritise the features that were missing in Specify, we asked institutions to complete a questionnaire by email. Answers to the following question enabled us to rank the features using a first-past-the-post voting method:
“In relation to your collections database(s), which 3 software issues (of those available at http://bit.ly/otjNme, or others) do you consider the highest priority to be fixed?”
A chart summarising the 11 responses is presented in Figure 2.
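The ranking method can be sketched as follows. The responses shown are invented purely to illustrate the tally and are not the survey data; the feature names are borrowed from issues mentioned elsewhere in this report, and `rank_features` is a hypothetical helper. Every pick counts as one vote with no weighting by the order in which an institution listed its three issues, and the feature with the most votes ranks first.

```python
from collections import Counter

# Invented example responses (NOT the survey results): each institution
# names up to three issues it considers the highest priority.
responses = {
    "Institution A": ["Oracle support", "Batch editing", "External taxonomies"],
    "Institution B": ["Oracle support", "Batch editing", "Seed-bank data model"],
    "Institution C": ["Oracle support", "External taxonomies",
                      "Cultural collections data model"],
}


def rank_features(responses, picks_allowed=3):
    """Tally one vote per pick and return features ordered by vote count."""
    votes = Counter()
    for picks in responses.values():
        # De-duplicate while keeping order, then cap at the allowed picks,
        # so no institution can vote twice for the same feature.
        unique_picks = list(dict.fromkeys(picks))[:picks_allowed]
        votes.update(unique_picks)
    return votes.most_common()


ranking = rank_features(responses)
for feature, count in ranking:
    print(f"{count:2d}  {feature}")
```

A simple tally like this is easy to explain to respondents, at the cost of ignoring how strongly each institution felt about its first choice compared with its third.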
Figure 2. Priority Specify features as determined by Australian institutions.
Collaboration Pathways
There are two main collaboration pathways: through Specify Software, or through collaboration with other Australian developers.
Specify Software
The Specify Software project is funded by the Biology stream of the US National Science Foundation (NSF). This arrangement imposes some constraints on what a grant recipient may do with the money, including:
the NSF Biology stream does not encourage Specify Software to directly add or develop non-biological features [6];
Specify Software receives no explicit support from NSF to support non-US collections [7]. With 300 US and around 75 international institutions to serve, support for an Australian institution is likely to be slower than the support it may already receive from an Australian company;
features that could be implemented are being put on hold because the level of funding being attracted to the project is insufficient.
Generally, while Specify Software is supportive of international collaboration, there are problems with this pathway.
In an email communication with Piers Higgs, Jim Beach of Specify Software noted that Gaia Resources could not sub-contract Specify Software to undertake work, due to complications with its core US NSF funding. While Specify Software remains in favour of international collaboration, participating in this way would jeopardise its core funding. This directly affects the previously mentioned Western Australian Museum project to add support for cultural collections to Specify.
When asked for costings for the development of additional features, Specify Software noted in our communication with them that:
co-development, with programmers sitting in offices in Australia, is something they are interested in pursuing, as they already do this with various other projects;
every so often, 1–2 weeks of face-to-face time is useful for managing the project;
short-term programming contracts, i.e. less than 1 year, will be more difficult for them to manage.
Given the remaining time-frame and budget of the ALA, it was not possible to continue any further collaboration down this pathway.
Other Australian Developers Gaia Resources was the only group that was interested in the “Train the Developer” courses that were offered earlier in this project. This may be interpreted as a general lack of willingness to undertake development in the Australian community, although it should be noted this call was only put out to those who attended the workshops.
[6] “..support biological collections only with [..] funding [from NSF]” Jim Beach (Specify Software), pers. comm., 17 August, 2011.
[7] “We also receive no explicit support from NSF for supporting non-U.S. collections...” Jim Beach (Specify Software), pers. comm., 17 August, 2011.
Gaia Resources were the successful tendering organisation with the Western Australian Museum to extend Specify for cultural collections, and as part of this project will be developing significant understanding and knowledge of Specify, and would be prepared to also work with other institutions and other developers to build a knowledge base in Australia.
However, developing additional Specify expertise and providing additional services to the community is not something that Gaia Resources would do without a budget and resources to undertake this work.
A Way Forward for the Atlas of Living Australia Given the remaining funding for this project, we have identified two possible ways forward for this project in the current ALA funding cycle, namely:
1. Continued support for the blog, and
2. Potentially a small trial of targeted support.
We have also identified three main ways for this project to continue, should the ALA receive additional funding and be recommissioned at the end of the current funding cycle. These are:
1. Development of missing features,
2. Targeted support to interested parties, and
3. Other potential sources of support.
Details of these are provided in the sections below.
Continued Support for the Blog Given the funding available to this project, one of the steps that is available is to provide continued support for the blog (http://alacollections.wordpress.com/), where the authors of this report can write and solicit articles from the community. Other groups (notably CSIRO’s IM&T group, and Gaia Resources) will continue to move ahead with new implementations of Specify, and the experiences of these groups could provide additional material for the blog, and continue to engage those interested in Specify in Australia. This would be a minor activity associated with the authors’ ongoing work in the Australian collections community.
Targeted Support Trial Should the remaining funding be adequate in this project, then it may be possible to undertake a small targeted support trial for an institution. This would involve finding a small institution that is willing to move to Specify, and to undertake an installation of Specify for them. A proposed project plan for this could include:
Ensuring that the institution is aware of the limitations of Specify, and the bigger picture for the Specify community,
Installing Specify in the organisation,
Undertaking a review of the existing data in preparation for a move to Specify, and working with the organisation to ensure that the data is brought across to Specify,
Importing the existing data to Specify,
Ensuring that the staff at the organisation have been trained in the use of Specify, and have available support for the rest of the ALA project (or as agreed).
As a rough estimate, Gaia Resources indicated that to move a small, denormalised database for the Western Australian Museum from a Microsoft Access database through to Specify took approximately two weeks of 1 FTE. This did not include data cleaning, training (including the need to develop training materials) or other processes, such as those undertaken by the National Herbarium of Victoria, which would add significant time to the project. A rough timeline of three months from start to finish is envisaged for the eventual completion of this project.
A good candidate for this trial would be a small collection based around an Access database that is not already funded separately, and has indicated an interest in moving to Specify.
Development of Missing Features As detailed in the section “Barriers to Entry”, Australian institutions would like a number of features added to Specify before they consider it ready for their collections.
Once it was clear that Specify was of interest to Australian institutions, we sought to cost some of the most popular items listed in our “Summary of Missing and Needed Specify Features” in Appendix 3. The items and an approximate cost in development time are provided below.
Oracle Support
Resources: 6 months for 1 programmer, 3 months for 2. Employing further programmers would probably not result in a quicker outcome.
Batch Editing
Resources: none, Specify Software are working on this now, and it should be delivered by early 2012.
External Taxonomies
Resources: 6 months full-time for an experienced developer; completed in a manner suitable for use by other name services, such as ALA’s National Species Lists (NSL) project.
It was agreed that this feature request would be limited to the automatic synchronisation of the taxon tree in Specify with that of an external application. The determinations assigned to individual specimens would not be changed automatically, as changing the determination of one or more specimens is commonly a human decision, or is an event that forms part of a specimen management work-flow.
An external taxonomy source would need to be implemented as a web service, in the absence of a push messaging architecture, and Specify would interact with this source using a polling methodology. It would check for changes and then provide the user with some interactive controls for accepting or rejecting the changes as entries into their taxon tree.
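The polling approach described above can be sketched as follows. This is a minimal illustration only and not Specify's implementation: the dictionary representation of a taxon tree, the `fetch_external_taxa` callable (standing in for a call to a web service) and the `accept` callback (standing in for the interactive accept/reject controls) are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model of a taxon tree: {taxon_id: (parent_id, name)}
TaxonTree = Dict[str, Tuple[str, str]]
# A proposed change: (kind, taxon_id, new_value)
Change = Tuple[str, str, Tuple[str, str]]


def diff_trees(local: TaxonTree, external: TaxonTree) -> List[Change]:
    """List additions and updates present in the external source."""
    changes: List[Change] = []
    for taxon_id, value in sorted(external.items()):
        if taxon_id not in local:
            changes.append(("add", taxon_id, value))
        elif local[taxon_id] != value:
            changes.append(("update", taxon_id, value))
    return changes


def poll_once(
    local: TaxonTree,
    fetch_external_taxa: Callable[[], TaxonTree],
    accept: Callable[[Change], bool],
) -> List[Change]:
    """One polling cycle: fetch, diff, and apply only the accepted changes.

    Only the taxon tree is modified; specimen determinations are never
    touched, in line with the scope agreed for this feature.
    """
    applied: List[Change] = []
    for change in diff_trees(local, fetch_external_taxa()):
        if accept(change):
            _, taxon_id, value = change
            local[taxon_id] = value
            applied.append(change)
    return applied


# Example with made-up data: the user accepts additions only.
local_tree = {"1": ("", "Eucalyptus")}
external_tree = {"1": ("", "Eucalyptus"), "2": ("1", "Eucalyptus marginata")}
applied = poll_once(local_tree, lambda: external_tree, lambda c: c[0] == "add")
```

In practice `poll_once` would be run on a schedule, and rejected changes would need to be remembered so that they are not offered again on every cycle.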
Record Import Limits
Resources: 3–4 months full-time for an experienced developer.
“OR” Queries
Resources: Possibly 2 months.
It was not possible to properly define the work to be done for this item, given the lack of understanding of Specify in Australia. It is possible in Specify to perform a logical “OR” query within fields now using the “IN” operator in the Query tool. It is likely that institutions were interested in querying between two or more fields. Specify Software feels they might be able to make this work with the “ANY” operator, but a clearer understanding of the feature request is needed from our end before Specify Software can firm up the quote here.
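The distinction between the two kinds of “OR” query can be illustrated in generic SQL. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `specimen` table; it does not use Specify's Query tool or schema, and is only intended to show what “within a field” and “between fields” mean.

```python
import sqlite3

# Hypothetical, simplified table; not the Specify data model.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (id INTEGER, genus TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?, ?)",
    [
        (1, "Acacia", "WA"),
        (2, "Banksia", "VIC"),
        (3, "Eucalyptus", "WA"),
        (4, "Grevillea", "NSW"),
    ],
)

# "OR" within a single field: equivalent to the IN operator.
within_field = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM specimen WHERE genus IN ('Acacia', 'Banksia') ORDER BY id"
    )
]

# "OR" between two different fields: cannot be expressed with IN alone.
between_fields = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM specimen WHERE genus = 'Acacia' OR state = 'WA' ORDER BY id"
    )
]
```

Here `within_field` contains records 1 and 2, while `between_fields` contains records 1 and 3; it is the second form that institutions most likely had in mind.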
As a means of summarising this, a table outlining these development tasks is included below, with an approximate cost for a developer included to provide a financial estimate as well as a time estimate. The included hourly rate of $100 is an attempted balance between senior and junior rates, and is chosen as an approximation of the actual cost; this is not a quoted figure from Specify or any other institution.
Table 2. Estimated cost of development tasks.
Task                    Estimated Timeframe (1 FTE)    Estimated cost ($100/hr)
Oracle Support          6 months                       $96,000
Batch Editing           Already under development      $0
External Taxonomies     6 months                       $96,000
Record Import Limits    4 months                       $64,000
“OR” Queries            2 months                       $32,000
TOTALS                  18 months                      $288,000
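The figures in Table 2 can be reproduced with a short calculation. The sketch assumes 160 working hours per month; this is not stated in the report but is implied by the table ($96,000 at $100 per hour over 6 months).

```python
HOURLY_RATE = 100      # dollars per hour, the balanced rate used in this report
HOURS_PER_MONTH = 160  # assumption implied by the table, not a quoted figure

# Estimated timeframe in months for one full-time developer.
# Batch Editing is already under development by Specify Software, so 0.
tasks = {
    "Oracle Support": 6,
    "Batch Editing": 0,
    "External Taxonomies": 6,
    "Record Import Limits": 4,
    '"OR" Queries': 2,
}

costs = {task: months * HOURS_PER_MONTH * HOURLY_RATE for task, months in tasks.items()}
total_months = sum(tasks.values())
total_cost = sum(costs.values())

for task, months in tasks.items():
    print(f"{task:<22} {months:>2} months  ${costs[task]:>8,}")
print(f"{'TOTALS':<22} {total_months:>2} months  ${total_cost:>8,}")
```

Changing `HOURLY_RATE` or `HOURS_PER_MONTH` shows how sensitive the total is to these two assumptions.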
Targeted ALA Support From the outset, it was the opinion of many of the interested parties that ALA should directly support the implementation of Specify in various institutions. However, it was clear that the funding being made available was not going to make this possible. Also, the size and nature of the suggested collection databases, whether the data in the collection was supported by Specify, the time-line for the ALA project itself, and the availability of developers meant that in many cases direct migration of institutional collection databases would be difficult or impossible. As a direct example, seed bank and cultural collections are not yet supported by Specify, so should not be considered for this funding.
To make it possible to support the community with a limited budget, criteria could be used to rank institutions by their need for ALA support. We do not have the funding to undertake any actions resulting from such a prioritisation, so we recommend that this prioritisation is done as a first step should this be taken forward into a new funded project.
Firstly, it is useful to separate the collections into several broad groups:
Group 1: Medium-large, well-maintained and supported collections. Funding is commonly found within the institution to maintain the content and the structure of the database. More than 5 staff edit the database daily, and its content is available via the web.
Group 2: Small- or medium-sized collections with little or no technical support. An off-the-shelf package such as Microsoft Access or FileMaker Pro is used rather than an enterprise-grade DBMS. The database structure or controlled vocabularies may not have been upgraded in several years. The database administrator, curator and/or technical officer are in some cases the same person.
These groups will have different needs for ALA support:
Group 1: An interest in plug-in development to enhance the collection’s interaction directly with the community, such as Annotations Support.
Group 2: An interest in having someone migrate the entire collection into Specify and be involved in training workshops to help current staff make the most from the new environment.
Criteria that would usefully separate institutions include:
size of collection;
size of databased portion of collection;
years since the database structure and/or controlled vocabulary was last changed;
number of staff directly involved in the maintenance of the collection data;
whether an upgrade to the database software in use is available but has not yet been applied;
whether the collection is a seed bank or is in some part cultural, as Specify is not capable of managing these collections.
The aim of any criteria would be to separate the well-managed collection databases from those in desperate need of support. It may be the case, however, that the more poorly-managed databases need not only to be upgraded, but to become a part of the wider institution’s ICT policy.
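As an illustration only, the criteria listed above could be combined into a simple ranking. Everything in the sketch below other than the criteria themselves is an assumption: the `Collection` record, the equal weighting, the direction of each criterion and the example institutions are hypothetical and would need to be agreed with the community before any real prioritisation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Collection:
    name: str
    specimens: int                  # size of collection
    databased: int                  # size of databased portion
    years_since_schema_change: int  # structure and/or vocabulary last changed
    data_staff: int                 # staff maintaining the collection data
    upgrade_pending: bool           # upgrade available but not yet applied
    seed_bank_or_cultural: bool     # not currently supported by Specify


def need_score(c: Collection) -> Optional[float]:
    """Return an illustrative need score, or None if out of scope."""
    if c.seed_bank_or_cultural:
        return None
    score = 1.0 - c.databased / max(c.specimens, 1)      # undatabased backlog
    score += min(c.years_since_schema_change, 10) / 10   # stale structure
    score += 1.0 / max(c.data_staff, 1)                  # few staff
    score += 1.0 if c.upgrade_pending else 0.0           # unapplied upgrade
    return score


def rank(collections: List[Collection]) -> List[str]:
    """Names of in-scope collections, highest need first."""
    scored = [(need_score(c), c.name) for c in collections]
    in_scope = [(s, n) for s, n in scored if s is not None]
    return [n for s, n in sorted(in_scope, reverse=True)]


# Made-up examples, one per broad group plus an out-of-scope seed bank.
examples = [
    Collection("Large supported museum", 1_000_000, 900_000, 1, 8, False, False),
    Collection("Small Access-based herbarium", 20_000, 5_000, 8, 1, True, False),
    Collection("Seed bank", 3_000, 3_000, 5, 1, False, True),
]
ranking = rank(examples)
```

With these made-up inputs the small Access-based collection ranks first and the seed bank is excluded, which matches the intent of separating well-managed databases from those most in need of support.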
The required funding for this approach would be dependent upon the individual institutions. It would likely also be able to be determined from the “Targeted Support Trial” project outlined above.
With regard to actual implementations, if the ALA were to receive additional funding, and ALA partners deemed additional support for Specify (and/or other CMSs) a priority, the ALA could provide additional support for overall implementation. In general, drawing upon the experience of the National Herbarium of Victoria, any CMS project would include:
Business case establishment;
Project Plan – including time and resources for:
Requirements gathering;
Gap analysis – between requirements and CMS capability;
Data Analysis – including data migration plan;
Migration plan – including training, actual switchover planning;
Maintenance plan;
Initial Pilot;
Conduct final implementation and switchover;
Project review.
Other Potential Methods of Support
Data Schema and Migration Assistance
A new funded project could assist with the migration process without performing the migration directly. This support would only be beneficial to those institutions with an in-house technical resource capable of performing the migration. The project staff might explain how best to map a database structure to that of Specify, for example. We made a start on this in the blog, providing an initial posting on mapping columns; see http://bit.ly/oqosac. Similar work with other standards is being undertaken by other staff within the ALA on tasks such as seed-banking and mobilising data from collection institutions, and this may be a logical extension.
Strategic Roadmap for Australian Research Infrastructure
Senator the Hon Kim Carr, Minister for Innovation, Industry, Science and Research recently announced the release of the 2011 Strategic Roadmap for Australian Research Infrastructure. This roadmap notes that “a Digitisation Infrastructure capability will be implemented by assembling state-of-the-art digitisation technology and expertise to provide high-throughput digitisation services to the Australian research community to achieve priority research outcomes”. This may encourage or provide means to access future funding for digitisation efforts, but is not a source of funding itself. The Roadmap can be found at http://www.innovation.gov.au/Science/ResearchInfrastructure/Pages/default.aspx.
Appendices
Appendix 1. Business Case developed by the National Herbarium of Victoria
C:\Documents and Settings\benr\My Documents\ALA Specify\MEL\BusinessCase.doc 12/6/2011
Futureproofing MELISR: Toward a robust and standards‐compliant herbarium management system
for the National Herbarium of Victoria
Executive summary MELISR is the collections database of the National Herbarium of Victoria. MELISR has not undergone any major development in the last 8+ years, in order not to interrupt data entry for Australia’s Virtual Herbarium (AVH). Consequently, there are now many outstanding issues that need a major redevelopment effort to resolve. These include unreliability of the back‐up system, issues with data capture and data integrity, and issues with data delivery to AVH.
In order for MELISR to optimally fulfill the business needs of the Plant Sciences and Biodiversity Division (PS&B), MELISR also will need to incorporate the loans and exchange database and the MEL herbarium census, as well as efficiently interface with nomenclatural or taxonomic databases such as VicList, APNI and the Australian Plant Census (APC), and be able to communicate with mapping software such as ArcGIS. As not all these objectives can be met under the current database management system, KE Texpress, other database options have been investigated.
It is concluded that the Royal Botanic Gardens (RBG) does not currently have the expertise to develop its own herbarium management system. In any case, this option would take the greatest development effort and lead to the largest loss of productivity during development, while there are pre‐built collections management systems available, some of them at little or no cost.
KE EMu is the successor of Texpress and still uses Texpress as its back‐end database, and therefore would represent a logical upgrade path. However, it is very expensive and has a history of implementation problems at other herbaria. As EMu is designed for all kinds of museum collections, customisation cost also would be maximal.
BRAHMS is especially developed for herbarium management, is free, and would probably require the least customisation to accommodate the MELISR data structure. However, BRAHMS uses a soon to be phased out database management system, and it is unclear how BRAHMS will develop after that. Because of the rather weak back‐end database there are issues with robustness, scalability and extensibility.
Specify is developed for natural history collections, including herbaria and natural history musea. Specify has a highly structured, standards‐compliant data model and was found to be usable and robust. It is also the only system that is completely open source and hence can be adapted to our future needs. Implementation of Specify requires only minimal changes to existing Information Services (IS) infrastructure. It is recommended that the RBG adopts Specify as its new herbarium management system.
Converting to a new collections management system will involve significant commitment of time and resources to retrain existing staff who use MELISR. In order to minimise loss of productivity during the implementation phase it is recommended the new herbarium management system be implemented prior to initiating any further large‐scale data capture projects, such as databasing the foreign collection or undertaking a specimen imaging program. On a broader scale, it is a good time to upgrade our database system as TDWG standards are now at a mature stage and are being adopted globally.
Background
The MELISR database supports the primary business of PS&B. MELISR has been running on the Texpress DBMS since its inception in the early 1990s. The Texpress DBMS (produced by KE software) is simple and efficient and at the time was adopted by most Australian herbaria. However, with increasing requirements for data sharing between herbaria and the emergence of global biodiversity data standards, the limitations of the Texpress DBMS are becoming apparent.
The MELISR database design was largely borrowed from other database systems in existence (namely CANB and NSW) leading to some parts being less than optimal for MEL’s requirements. Despite these shortcomings, MELISR has not had any major development work done for the last 8+ years in order to minimise disruption to the Australia’s Virtual Herbarium (AVH) databasing project. There are many outstanding issues with the database that will need a considerable development investment to resolve. It is therefore prudent to consider whether this effort would not be better directed toward upgrading MELISR to another database platform.
The MELISR database is a specimen database only; its simplistic structure cannot incorporate all the specimen‐related or name‐related information and curatorial tools used in the herbarium. In addition to MELISR, the herbarium maintains the Loans and Exchange database (an MS Access database), the Census database (MS Access), the VicList database (MS Access), a table of scheduled taxa (Texpress and MySQL) and an Authors of taxonomic names database (MS Access). The limited querying capability of Texpress, and the inability of Texpress to directly interface with the software required to present MEL’s specimen data over the internet, have made it necessary to establish a duplicate of MELISR using the open source MySQL database. This is an inefficient solution that requires extra room on the server, resources to keep the two databases in synch and duplication of effort when implementing changes to MELISR.
An important application of MELISR data is the production of distribution maps. The inability of Texpress to interoperate with GIS software makes mapping MELISR data very laborious.
The label printing program within Texpress is primitive, and cannot be tailored to meet our needs without engaging programming assistance from KE Software. As there is often a conflict between the requirement to capture data and the requirement to print data, the inability to customise label printing in MELISR results in certain data (e.g. quarantine messages or acknowledgements of funding support) needing to be edited in to or out of records either before or after printing. A more sophisticated, and easily‐customisable, printing system would allow the requirements for curatorial and specimen‐related label data to be managed more consistently and more efficiently.
The reporting program in Texpress also leaves much to be desired. New reports within MELISR need to be individually programmed, and several curatorial reporting requirements are handled by other programs (such as the MySQL copy of MELISR, the Census database and the Loans and Exchange database) due to the deficiencies of the Texpress system. Loan information is only recorded in MELISR records for the duration of the loan, making it impossible to track the loan history of a specimen. The absence of loan history data for individual specimens means that the MEL collections cannot be accurately audited to meet reporting requirements (such as those of the bi‐annual AVH Board report).
Our past experience with the Texpress system has demonstrated that it lacks robustness, making it susceptible to data loss. In March 2006, a low‐level hardware error resulted in the loss of several records. The cause of the problem was difficult to pinpoint and remedy, and resulted in a three week disruption to data entry and retrieval at a time when ten staff were employed specifically to undertake data entry. As well as resulting in significant loss of productivity, this problem highlighted the inadequacy of the Texpress backup system; not all the lost records could be recovered and restored.
In addition to the technical deficiencies of the system, the current data structure is too simplistic to allow for accurate capture of taxonomic information, determination history and geographical information, which reduces the utility of our specimen database and compromises the integrity of our data. Texpress imposes constraints on the way that data can be structured, with all information for each object (or specimen) recorded in the one database record. This creates single records containing large numbers of fields, with the same information entered numerous times across the different records, resulting in inefficient data entry. As well as improving the way that specimen data is recorded, the database needs to be able to track data validation efforts and allow for better documentation of changes to records. Appendix A lists some of the improvements that need to be made to existing fields in MELISR, as well as suggested new fields that would improve the searchability and standards‐compliance of the database. Many of these suggestions have been requested from staff and external clients. The key areas requiring improvement are expanded upon below.
One critical area that requires improvement is the way that taxonomic information is entered and stored in MELISR. The flat data structure of the Texpress system means that taxonomic names – which should ideally stand alone – cannot be separated from the specimen‐related data (such as determination annotations and hybrid information) associated with individual records. Individual components of names are currently entered by the database operator from look‐up tables. While this partially restricts the content of the taxonomic name fields, the tables are easily edited by any user with data entry privileges, so there is much scope for errors in data entry and inconsistencies in the application of names. The inadequacy of MELISR in dealing with uncertain determinations and cultivar, hybrid and informal names, combined with the inability to adequately restrict the content of the taxonomy fields, reduces the quality and reliability of the name data associated with our database records.
The current approach to recording taxonomic names makes it necessary to maintain a separate list of names that reflects the content of MEL’s collection (the Census database). This represents a considerable duplication of effort, which could be eliminated by using a comprehensive herbarium management system based on an authoritative list of names, rather than a collection of separate databases each with their own name lists. A major benefit of having MELISR linked to such lists is that it would allow searching by synonyms, which is a powerful tool both for specimen curation and for data interrogation.
As well as improving the way that current names are recorded in MELISR, it would be valuable from both a taxonomic and curatorial point of view to record the determination history of a specimen. Currently, there is no facility in MELISR for recording original determinations and subsequent re‐determinations. Although additional fields could be added to the existing system, they would suffer the same shortcomings as the existing taxonomic name fields; determination histories would therefore be better handled by a relational database with an underlying taxonomy table.
Another major weakness of the current MELISR database is that specimen records from the core collection (the specimen component of the State Botanical Collection) cannot be distinguished from records from non‐core collections such as the Victorian Reference Set, the Horticultural Reference Set and the Victorian Conservation Seedbank. It is important that these collections are kept separate on the database so that data‐retrieval for loan enquiries, electronic data requests and specimen retrieval reflects the location and accessibility of the specimens. Database records from the Victorian Conservation Seedbank may not be associated with a vouchered specimen, which undermines the integrity of MELISR as a collections database based on verifiable specimens. While it makes sense to use the same database to record data for these distinct collections, the structure of the Texpress system is too simple to reflect the different purpose, location and accessibility issues associated with these collections.
The shortcomings with the current version of MELISR outlined above could be overcome by employing a comprehensive, relational herbarium management system that encompasses a specimen database, loans and exchange database, scheduled taxa database and herbarium census. The taxonomic tables that would be the basis of the herbarium management system could also encompass the VicList database, thus reducing duplication of effort in taxonomic name management at the RBG and streamlining curatorial practices associated with keeping VicList data up to date.
While many of the desired improvements to MELISR require a more sophisticated database management system, some improvements to the handling of primary collecting data could be implemented in the existing Texpress system. However, given the considerable shortcomings of the existing system and the effort required to replicate any changes in the duplicate version of MELISR, this development effort would be more judiciously applied to incorporating MELISR into a new herbarium management system, rather than adding further workarounds to the existing database.
Options
1. Keep existing Texpress system The Texpress application currently used for MELISR is no longer widely used and is likely to become obsolete. The only technical support available is from KE software and this is unlikely to be available in the future.
The standard data exchange protocols used to link in to global biodiversity initiatives (such as BioCASE and TAPIR) cannot interface directly with Texpress. Keeping the existing Texpress system for our Collections database puts the RBG at risk of not being able to participate in emerging biodiversity information initiatives such as GBIF, ALA and EOL and poses a risk to the organisation’s reputation as a quality data custodian.
The loss of corporate knowledge associated with using a near‐obsolete system is a great risk to the organisation. The administration and maintenance of the Texpress system requires a large amount of specialised skills and experience that cannot be easily replaced. The archaic nature of Texpress means that skills developed in any other modern database system will not be transferable.
There may also be financial ramifications for the RBG if we persist with Texpress, given the potential for licensing and support costs to increase due to the small number of users. Texpress development costs are also very high, as has been experienced in the past, and these costs are likely to rise in the future. The RBG currently pays $6300 per annum for 25 user licenses. Although this cost could be reduced to $4800 for 15 licenses, the cost of purchasing new licenses when needed is much higher than the amount saved this way. Any redevelopment in Texpress will require purchasing the TexAPI package at a cost of $18,000 plus $2315 per annum for licenses.
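The licensing figures quoted above can be compared with a short calculation. The dollar amounts are those given in this section; the per‐licence breakdown is derived here for illustration and is not a quoted price from KE Software.

```python
# Annual licensing options for Texpress, as quoted in this section.
current = {"licences": 25, "annual_cost": 6300}
reduced = {"licences": 15, "annual_cost": 4800}

per_licence_current = current["annual_cost"] / current["licences"]  # 252.0
per_licence_reduced = reduced["annual_cost"] / reduced["licences"]  # 320.0
annual_saving = current["annual_cost"] - reduced["annual_cost"]     # 1500

# First-year cost of the TexAPI package needed for any redevelopment:
# purchase price plus one year of licences.
texapi_first_year = 18_000 + 2_315                                  # 20315

print(f"Per licence: ${per_licence_current:.0f} (25 users) "
      f"vs ${per_licence_reduced:.0f} (15 users)")
print(f"Annual saving from dropping 10 licences: ${annual_saving}")
print(f"TexAPI first-year cost: ${texapi_first_year}")
```

Dropping ten licences saves $1500 per annum but raises the cost per licence, and the saving is small relative to the TexAPI outlay that any redevelopment would require.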
2. Change to a custom‐built collections management system One option is to develop a custom collections management system. The greatest benefit of this option is that it would be specifically tailored to our needs. Along with this benefit come risks associated with using a system that is not widely used, and thus doesn’t have a global community of users and developers. The development of such a system would be expensive in terms of staff time, and would require that staff time is taken away from other duties.
At the implementation stage, there is the risk of exceeding the anticipated development cost and the cost of data migration from Texpress to the new system. Also, the temporary loss of productivity during the changeover period would be greatest with this option. All future support and development would most likely have to come from within the organisation. The RBG do not have the expertise to develop a custom‐built front end in‐house.
3. Change to existing collections management system The third option is to transfer MELISR to an existing collections management system. This is potentially an expensive option, but the cost of different collections management systems varies enormously. The major benefit of this option is that it will allow us to select a proven system that has a global community of users and developers. It will also be less expensive in terms of staff time than developing a custom‐built system.
Risks associated with this option include high development/customisation, migration, and training costs. These costs can be minimised by choosing a system that most closely meets our needs, choosing a database management system we are already familiar with, and choosing a system with a user‐friendly interface. There is also the risk of data loss during migration, which may be reduced by keeping good back‐ups and by rigorous testing of the data model of the new application. We also need to minimise the risk that the new software will not be supported in the future and make sure the database can be modified and extended to meet future business needs of the RBG.
3.1. KE EMu KE EMu is a Windows‐based collections management application, designed for all kinds of musea, that uses Texpress as the back‐end database. While this is a powerful system and represents a logical upgrade path, it has serious limitations. It is very expensive and has a history of implementation problems with regard to herbaria and botanic gardens (e.g. NSW and BM). It is also closed source, meaning that any customisations will have to be performed by KE Software and will be costly.
Table 1. Features of KE EMu
Operating system
Front‐end MS Windows
Back‐end Linux
Open source KE EMu is completely closed source; any customisation will have to be performed by KE Software.
Back‐end database KE EMu uses KE Texpress as its back‐end database. This is the same database MELISR currently uses, but with some extensions.
User‐friendly interface Yes
Customisability KE EMu is fully customisable, but this customisability comes at a cost as it has to be carried out by KE Software. As EMu is designed for use by all kinds of musea, more customisation will be needed than for the other applications considered.
Scalability EMu scales well.
Extensibility EMu is fairly self‐contained; extension is possible, but will have to be custom‐designed by KE.
Interoperability Native interoperability is poor
Support Support is probably good, but expensive. One would expect some support will come with the licenses. However, a large part of the problems that other herbaria have had with EMu is likely to have been caused by communication problems between herbarium people and application developers.
Startup costs $130,000 (25 licenses plus initial customisation)
5
C:\Documents and Settings\benr\My Documents\ALA Specify\MEL\BusinessCase.doc 12/6/2011
Ongoing costs $120,000 pa (25 licenses)
url http://www.kesoftware.com/content/view/512/356/lang,en/
3.2. BRAHMS
BRAHMS is a specialised herbarium collections management system developed by the Oxford University Herbaria and used by many herbaria all over the world, for instance the National Herbarium of The Netherlands (L, WAG). BRAHMS was found to be rich in features, but lacking in usability, robustness and extensibility. BRAHMS has just undergone a major redevelopment, but is still built around the same DBMS, Microsoft Visual FoxPro, which is not an enterprise database. BRAHMS will only run on Microsoft Windows platforms.
Table 2. Features of BRAHMS

| Feature | BRAHMS |
| --- | --- |
| Operating system (front‐end) | MS Windows |
| Operating system (back‐end) | MS Windows |
| Open source | BRAHMS is closed source. |
| Back‐end database | BRAHMS uses Microsoft Visual FoxPro. FoxPro is a legacy DBMS no longer actively developed by Microsoft and will not be supported after 2014. |
| User‐friendly interface | No |
| Customisability | BRAHMS has limited customisability. However, the application is specifically designed for herbaria, so only a little customisation will be necessary to accommodate the MELISR data model. |
| Scalability | BRAHMS scales poorly, mostly due to its rather weak back‐end database. |
| Extensibility | BRAHMS is fairly self‐contained. However, it is very feature‐rich and is specifically designed for herbarium management, so the data model should be sufficient to accommodate all necessary fields at least for the near future. |
| Interoperability | BRAHMS has built‐in interoperability with some other applications, such as ArcView and DIVA‐GIS. The file format in which it saves its data (.DBF) can be read by some Windows applications. An extension for online publishing is available. The National Herbarium of the Netherlands is a member of EDIT and BioCASE and delivers data to GBIF, so dynamic data delivery through a BioCASE provider must be possible. |
| Support | Support for BRAHMS is provided by the BRAHMS Project at the Oxford University Herbaria. As herbarium taxonomists are involved in the project, there should be no communication problems. A support contract costs $US600 pa. |
| Startup costs | – |
| Ongoing costs | $US600 pa for support (optional) |
| URL | http://dps.plants.ox.ac.uk/bol/home/default.aspx |
3.3. Specify
Specify (University of Kansas) is designed for herbaria and zoological museums and was found to be usable and robust. The Specify software is currently being completely rewritten and will be released as open source. The new version, Specify6, is built in Java using Hibernate to abstract the database layer and will therefore run on many database management systems, including MySQL, PostgreSQL, and Microsoft SQL Server, which all already run on RBG servers (MySQL is used for the web interface of all botanical databases and for the AVH interface, as well as advanced querying, of MELISR; MySource Matrix, the Content Management System for the RBG website, uses PostgreSQL; Hummingbird, the records keeping software, uses MS SQL Server). Specify6 is due for release on 27 February 2009. Specify has a world‐wide user community and is currently used by 112 institutes, including 34 herbaria. Development of Specify has been supported by the US National Science Foundation for the last twenty years. Judging from the proceedings of the 2008 TDWG annual conference, Specify is very much at the forefront of collections management systems.
Specify has a strongly structured data model that is DarwinCore compliant (GBIF uses DarwinCore) and therefore most likely also ABCD compliant. Specify6 contains 138 tables and 1658 fields. While some fields in MELISR that are specific to MEL will not be already in the Specify data model, there are several blank fields which can be used for these. Given that the database layer is abstracted from the front end and the database management systems that can serve as back end are very powerful, we expect excellent scalability and extensibility, as well as interoperability with other applications (e.g. GIS, electronic flora, image storage).
Specify optionally comes with a fully customisable (using CSS only) web interface. The Specify front end, which includes all the forms and reports (including labels), is fully customisable, without requiring programming. If in future we want to make changes to the data model, the associated changes in the front end would require programming in Java. However, given the growing user community we expect that changes in the data model necessitated by outside factors would be taken care of in new minor versions of Specify.
Table 3. Features of Specify

| Feature | Specify |
| --- | --- |
| Operating system (front‐end) | MS Windows, Mac OS X or Linux |
| Operating system (back‐end) | MS Windows, Mac OS X or Linux |
| Open source | Specify is open source. Source code for everything except the label printing application can be obtained from the developers. Specify6 is entirely written in Java, which means that in future we will actually be able to make changes to the source code if necessary. |
| Back‐end database | Because the front‐end user interface is separate from the back‐end database, Specify6 can use most of the major database management systems, such as MySQL, PostgreSQL, MS SQL Server or Oracle. Specify6 by default uses MySQL. |
| User‐friendly interface | Yes |
| Customisability | The Specify6 graphical user interface is entirely customisable, with the possibility to choose fields, change the format or type of fields and even change field names (similar to forms in MS Access). |
| Scalability | Because of the very powerful back‐end database systems, Specify6 will scale very well. |
| Extensibility | Specify has good extensibility. While extension of the data model at the back end is easy and only requires knowledge of SQL, the associated changes in the front end require more knowledge of Java than is currently available at the RBG. However, the Specify data model is very rich, with many customisable fields, so should easily be able to accommodate all MELISR fields at least in the near future. |
| Interoperability | Specify comes with a web interface (which we may not use) and a DiGIR provider (which we definitely will not use). MySQL interoperates very well with PHP for dynamic web applications and, through the MySQL ODBC driver, with MS Access and ArcGIS. |
| Support | Specify is free and offers free support to registered users. While priority support is given to US institutes, Specify is happy to provide support to non‐US institutes as resources allow. Part of the support is migration of data into Specify. With Specify5.2 there was a waiting time of 2–3 months between registration and migration. We expect waiting times to be longer once Specify6 is released, as existing Specify users will need to have their data migrated as well. |
| Startup costs | – |
| Ongoing costs | – |
| URL | http://www.specifysoftware.org/Specify; http://specify6.specifysoftware.org/ (temporary Specify6 website) |
Recommendation
Given that the Texpress system is nearing obsolescence and that changing to a custom‐built system is an inefficient and expensive option, we recommend changing to an existing collections management system. Based on our comparison of existing collections management systems, we recommend upgrading to Specify. Finally, we recommend that a new collections database management system be implemented before initiating any further large‐scale data capture projects, such as databasing the foreign collection or imaging herbarium specimens.
Of the three collections management systems considered, KE EMu was discarded as an option early because of its high implementation, customisation and licensing costs and because of the problems other herbaria have experienced with it. BRAHMS was found to be feature‐rich and, because it was designed especially for herbaria, its data model is currently probably the most compatible with the structure of MELISR. However, we are concerned about the back‐end database management system BRAHMS uses and that, because of its limited scalability and extensibility, we will not be able to adapt BRAHMS to meet the RBG’s future needs.
Specify has the ability to employ a very powerful database management system and can therefore make use of the DBMS’ back‐up and security facilities. It has a highly structured data model that can include most MELISR fields as is, and all fields after some modification. Specify comes with a very user‐friendly, fully customisable front end and with extensions such as a web interface and DiGIR provider. The worldwide diverse user and development community guarantees that Specify will adapt to future needs better than the other systems. Also from an infrastructure perspective Specify fits best, as all required infrastructure is already in place at the RBG (nevertheless a more detailed analysis of infrastructure requirements will be part of the project planning).
We would like to emphasise that while upgrading to a new collection management system is urgent in order to safeguard the quality and integrity of our collections data and the RBG’s reputation as a quality data custodian, the process is not going to be painless. The implementation of a new collection management system will require a large time commitment, especially from the Programmer, Information Services and Collections Information Officer, and will affect all MELISR users. In order to ensure data integrity it will not be possible to run the old and new systems concurrently and therefore MELISR will not be accessible for a period of time during the implementation phase. Also the loans and exchange administration system will not be accessible during this period as it will be included in the new collections management system.
The temporary loss of productivity during the implementation period will be more than made up for by improved efficiency and increased productivity once the new herbarium management system has been implemented. On a broader scale, this is a good time to upgrade our collections database system, as international data standards have come of age and only minor changes are expected in the near future.
Appendix B describes an implementation roadmap that aims to ensure the implementation period is as short as possible and to minimise loss of productivity during this period.
Table 5. Summary of features of the different herbarium management systems considered. The current KE Texpress collections management system is included for comparison.

| Feature | KE Texpress | KE EMu | BRAHMS | Specify |
| --- | --- | --- | --- | --- |
| Operating system (front‐end) | Linux shell emulator | Windows | Windows | Windows, Linux or Mac OS X |
| Operating system (back‐end) | Linux | Linux | Windows | Windows, Linux or Unix |
| Open source | No | No | No | Yes |
| Back‐end database | Texpress | Texpress | Microsoft Visual FoxPro | MySQL, PostgreSQL, MS SQL Server, Oracle or any other server‐side DBMS |
| User‐friendly interface | No | Yes | No | Yes |
| Customisability | Poor | Good, but very expensive | Good | Good |
| Scalability | Poor | Good | Poor | Excellent |
| Extensibility | Poor | Good, but very expensive | Poor | Good |
| Interoperability | Poor | Poor, but with ample inbuilt functionality | Good | Good |
| Support | Very limited¹ | Good | Good | Good |
| Startup costs | N/A + $18,000 (TexAPI) | $130,000 | – | – |
| Ongoing costs | $6,300 pa + $2,315 (TexAPI) | $102,000 pa | $US600 pa² | – |

¹ Support for Texpress is very limited as Texpress as a stand‐alone application is being phased out and replaced by EMu.
² $US600 pa is for support; there are no licensing costs.
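The cost rows of Table 5 can be compared over a planning horizon with some simple arithmetic. The following Python sketch is an illustration only and is not part of the original business case. It assumes that a dash in the table means no cost, that the annual figures stay constant, and that staff time and migration effort are excluded. The class and function names are hypothetical. The BRAHMS figure is in US dollars and is deliberately left unconverted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CostProfile:
    name: str
    startup: int    # one-off cost as listed in Table 5 (dash treated as 0: assumption)
    ongoing: int    # cost per annum as listed in Table 5 (assumed constant)
    currency: str   # currency prefix exactly as printed in the table

    def cumulative(self, years: int) -> int:
        """Startup cost plus `years` of ongoing cost."""
        return self.startup + self.ongoing * years


SYSTEMS = [
    CostProfile("KE EMu", 130_000, 102_000, "$"),
    CostProfile("BRAHMS", 0, 600, "$US"),
    CostProfile("Specify", 0, 0, "$"),
]


def cost_table(years: int) -> dict:
    """Return a formatted cumulative cost per system for the given horizon."""
    return {s.name: f"{s.currency}{s.cumulative(years):,}" for s in SYSTEMS}


if __name__ == "__main__":
    for name, cost in cost_table(5).items():
        print(f"{name}: {cost} over 5 years")
```

Under these assumptions, five years of KE EMu comes to $130,000 + 5 × $102,000 = $640,000, against $US3,000 in optional support for BRAHMS and no licence cost for Specify.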
Appendix A — Summary of suggested improvements to MELISR
Field/s Suggested improvements
Taxonomy fields Link specimen names to an authoritative table of taxonomic names. Information related to individual specimens (determination annotations, qualifiers and hybrid information) to be stored with individual records, rather than with names.
Improve handling of higher level taxonomies, particularly for fungi and algae
Determinations Include original determination and subsequent determination history.
Collector/Add. coll. Create separate fields for recording verbatim label data and standardised data, which would link to an authoritative list of collectors.
Collecting date Add a memo date field to record non‐standard collecting dates, e.g. 13‐17 June; late March; Spring; Christmas etc.
Additional collector Add new fields to record collecting numbers of additional collectors.
Geocode Allow for recording of geocode as originally provided (DMS or decimal).
Add new fields for recording AMG references, and enable autoconversion of AMG to geocode.
Add a new field for recording error measure when provided by collector.
Improve handling of geocode source data.
Cultivated data Improve handling of locality data for cultivated records. Currently, provenance data is entered in the Notes field, and cultivating locality details are entered in the locality fields (minus geocode). Need to record both cultivated and provenance locality data in a way that allows them to be queried and mapped (or excluded from queries or mapping) on request.
Add new fields to cater for Plant Occurrence and Status Scheme values.
Unit relationship field Add new fields to record the range of relationships between herbarium sheets. Currently, the only way of recording a relationship between one or more herbarium sheets is to multisheet them. Need to convert the multisheet field to a unit relationship field that allows for other types of relationships to be recorded (e.g. cultivated seedling and wild‐collected parent plant; uncertain links between foreign specimens).
Duplicates/Specimen Received from/Original herbarium (for images)
Apply a restricted vocabulary to these fields to prevent the inclusion of non‐standard entries.
Protologue Separate the publication title and the page and date citation into two distinct fields so that publication title can be linked to an authoritative list of names (e.g. BPH/TL‐2) to avoid incorrect and inconsistent entries.
Precision Check that our precision code values are sensible and add a built‐in guide to help data entry personnel to use them correctly.
Depth Stop decimal places from being automatically appended to values in this field as it gives an inaccurate impression of the precision of the measurement.
Notes Divide this field into two (or more) categories to allow the collector’s notes to be recorded separately from annotations on the specimen and curatorial or explanatory notes added by the database operator.
Managed habitats Improve plant occurrence fields to better document specimens collected from managed habitats, e.g. those that have self‐established in a botanic gardens context and should not be treated either as wild‐collected or as cultivated.
Validation level Add new fields to represent the level of validation of a database record (includes identification, geocode, distribution, predictive distribution).
Images Differentiate between images of the sheet and images of plant in its habitat etc. Also need to be able to record when we have produced a digital image of a specimen to send to another institution.
Add a field to record file paths or URLs for digital images.
Vic. Ref. Set Improve flagging of Vic. Ref. Set specimens. Vic. Ref. Set specimens are currently listed as duplicates, despite the Vic. Ref. Set not being an official, accessible collection. It would be better to flag these records in a different way than duplicate specimens are flagged.
Type status determination
Move type status determination data. This information is currently stored in Extra Info., but would be better stored as part of the determination history, with type status of the determination recorded.
Verbatim label field Add a new field to record verbatim label data for foreign‐language labels.
Allow for unicode characters to be captured so that foreign‐language data can be recorded more accurately (not possible in Texpress).
Original language field Add a new field to record the original language that a label is written in (if non‐English). This will allow ease of searching in the event that we want to query for language (e.g. for batch translation of labels).
Global gazetteer Link to a global gazetteer for ease of geocoding foreign collections, e.g. GEOnet Names Server files.
Library catalogue no. Add fields to enter call numbers for additional information stored in the library, e.g. letters, photos, colour transparencies etc.
Quarantine notes Add a new field to enter quarantine notes that are not printed on any labels or exported, and are only used by curation staff. Currently, these messages must be deleted prior to labels being printed, then re‐entered into the record.
Destructive sampling Add a new field to record when material has been removed for destructive sampling.
Ethnobotanical information
Add a new field to flag the presence of ethnobotanical data associated with a record.
Indigenous name Add a new field to record indigenous plant names when provided by the collector.
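The geocode suggestion above (keep the geocode as originally provided, whether DMS or decimal, alongside a standardised value) can be illustrated with a short conversion routine. This is a minimal Python sketch under stated assumptions, not the MELISR or Specify implementation: the `Geocode` class, the function names and the accepted input formats are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Geocode:
    verbatim: str   # the value exactly as provided on the label
    decimal: float  # signed decimal degrees derived from it


def to_decimal(verbatim: str) -> float:
    """Convert a decimal or DMS coordinate string to signed decimal degrees."""
    text = verbatim.strip()
    try:
        return float(text)  # already decimal degrees, e.g. "-37.83"
    except ValueError:
        pass
    hemisphere = re.search(r"[NSEW]", text.upper())
    parts = [float(p) for p in re.findall(r"\d+(?:\.\d+)?", text)]
    if hemisphere is None or not 1 <= len(parts) <= 3:
        raise ValueError(f"Cannot interpret geocode: {verbatim!r}")
    # Pad missing minutes and seconds with zero
    degrees, minutes, seconds = (parts + [0.0, 0.0])[:3]
    if minutes >= 60 or seconds >= 60:
        raise ValueError(f"Minutes or seconds out of range: {verbatim!r}")
    value = degrees + minutes / 60 + seconds / 3600
    # Southern and western hemispheres are negative by convention
    return -value if hemisphere.group() in "SW" else value


def parse_geocode(verbatim: str) -> Geocode:
    """Keep the original string and the derived decimal value together."""
    return Geocode(verbatim, to_decimal(verbatim))


if __name__ == "__main__":
    print(parse_geocode("37°49'48\"S"))
    print(parse_geocode("144°58'E"))
```

Storing both values means the interpreted coordinate can be recomputed or corrected later without losing what the collector actually wrote.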
Appendix B — Implementation roadmap
The major stages required to implement this project are outlined below:
Project planning
Registration with Specify;
Development of a detailed implementation plan, by September 2009.
Comprehensive needs analysis
Consultation with MELISR users regarding the current MELISR database structure to determine what fields need changing, and what new elements are required;
Mapping of fields in MELISR against the HISPID5 (ABCD) standard to ensure compliance;
Preparation of draft MELISR data entry manual.
Data preparation
Mapping of MELISR fields against Specify data model;
Performance of major quality assurance work on non‐compliant fields in MELISR to make data migration to Specify as smooth as possible.
Implementation and testing
Installation of Specify on MEL server and work stations;
Data migration;
Comprehensive testing of the new system and revision of the MELISR data entry manual.
Training
Training of MELISR users in the use of the new system. This will need to be undertaken incrementally, starting with those staff whose work is most reliant on the database.
Configuration of provider software
Configuration of TAPIR and BioCASE providers for data delivery to AVH, the ALA and GBIF.
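The field-mapping step in the data preparation stage can be sketched as a lookup table plus a completeness check. The following Python example is illustrative only. The MELISR field names are invented, and the target names are placeholders in `table.Field` form that would need to be checked against the actual Specify data model.

```python
# Hypothetical MELISR field names mapped to placeholder Specify targets.
# Neither side is taken from the real schemas; both are assumptions.
FIELD_MAP = {
    "accession_no": "collectionobject.CatalogNumber",
    "collector": "agent.LastName",
    "collecting_date": "collectingevent.StartDate",
    "latitude": "locality.Latitude1",
    "longitude": "locality.Longitude1",
}


def check_mapping(source_fields, field_map):
    """Return (unmapped source fields, targets claimed by more than one field)."""
    unmapped = sorted(set(source_fields) - set(field_map))
    targets = {}
    for src in source_fields:
        if src in field_map:
            targets.setdefault(field_map[src], []).append(src)
    collisions = {t: srcs for t, srcs in targets.items() if len(srcs) > 1}
    return unmapped, collisions


if __name__ == "__main__":
    fields = ["accession_no", "collector", "quarantine_notes"]
    unmapped, collisions = check_mapping(fields, FIELD_MAP)
    print("Fields with no target yet:", unmapped)
    print("Targets used more than once:", collisions)
```

A check of this kind gives an early list of fields that need a new home in the target data model before any data are migrated.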
Appendix C — Glossary
ABCD Access to Biological Collections Data – a comprehensive TDWG standard for access to and exchange of primary biodiversity data
ALA Atlas of Living Australia – a project funded under the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS) to develop a biodiversity data management system for Australia’s biological knowledge
ArcGIS a group of geographic information system (GIS) software product lines produced by ESRI
AVH Australia’s Virtual Herbarium – an online botanical information resource that provides access to data associated with scientific plant specimens in Australia’s major herbaria
BioCASE Biological Collections Access Service for Europe – a transnational network of European biological data providers
BioCASE provider A data exchange protocol developed by BioCASE. The BioCASE provider abstracts data from a database and turns it into standard format, such as ABCD or DarwinCore. MEL and AD (Adelaide) use the BioCASE provider to deliver data to AVH. Other similar protocols are TAPIR and DiGIR.
BM Acronym of The Natural History Museum, London (British Museum)
BRAHMS Botanical Research And Herbarium Management System
CSS Cascading Style Sheets – a stylesheet language used to apply formatting to web pages
customisable able to be customised to our particular needs, without the need for extensive programming
DarwinCore a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections
DBMS database management system – examples of database management systems are: MS Access, MySQL, MS SQL Server
DiGIR Distributed Generic Information Retrieval – a client/server protocol for retrieving information from distributed resources
EMu Electronic Museum – collections management software developed by KE Software
EoL Encyclopedia of Life – a project to create an online reference source and database for the 1.8 million named and known species on earth
extensible able to be extended and adapted: an extensible database allows for expansion of the data structure, i.e. additional tables and fields
future proofing the selection of physical media and data formats that best ensure the continued accessibility of data into the future. This process involves anticipating future developments and ensuring that only well‐documented formats, standards and specifications are used to store and describe data.
GBIF Global Biodiversity Information Facility – an international organisation that focuses on making scientific data on biodiversity available via the Internet using web services. The data are provided by many institutions from around the world; GBIF's information architecture makes these data accessible and searchable through a single portal. Data available through the GBIF portal are primarily distribution data on plants, animals, fungi, and microbes for the world, and scientific names data.
GIS geographic information system – captures, stores, analyses, manages, and presents data that refers to or is linked to location. In a more generic sense, GIS applications are tools that allow users to create interactive queries (user created searches), analyse spatial information, edit data and maps, and present the results of all these operations.
HISPID Herbarium Information Standards and Protocols for the Interchange of Data – a standard format for the interchange of electronic herbarium specimen information, initially developed by the Australian herbaria and later adopted as a TDWG standard. The current version – HISPID5 – is ABCD compliant.
Java an object‐oriented programming language
MEL acronym of The National Herbarium of Victoria
MELISR MEL Information System Register – the National Herbarium of Victoria’s specimen database
Microsoft SQL Server a relational database management system, used as the back‐end database of Specify5
MySource Matrix an open source content management system (CMS) written in PHP, used for the new RBG website
MySQL a powerful, open source, relational database management system
NSW The National Herbarium of New South Wales
PostgreSQL an object‐relational database management system
PS&B Plant Sciences and Biodiversity Division, Royal Botanic Gardens (Melbourne)
RBG Royal Botanic Gardens
roadmap a plan for change execution
robust able to withstand pressures or changes in procedure or circumstance
scalable able to handle growth without having to replace the existing platform or architecture
Specify research software application, database and network interface for biological collections information
TAPIR TDWG Access Protocol for Information Retrieval – a computer protocol designed for the discovery, search and retrieval of distributed data over the internet
TDWG Taxonomic Databases Working Group (also referred to as Biodiversity Information Standards)
Texpress an object‐oriented multi‐user database management system developed by KE Software
usability the efficiency with which a user can perform tasks in a given application
VicList Census of the Vascular Plants of Victoria, an up‐to‐date list of the species and infraspecific taxa of vascular plants occurring in Victoria
Appendix 2. Royal Botanic Gardens MELISR migration, project implementation plan
Royal Botanic Gardens Melbourne
MELISR migration
Project implementation plan Prepared by Alison Vaughan and Niels Klazenga, 4 December 2009
Contents
1. Project definition
2. Objectives
3. Scope
4. Deliverables
5. Stakeholders
6. Roles and responsibilities
7. Timeframes
8. Resources
9. Implementation plan
10. Risk management
11. Stakeholder management strategy
Appendix A. Work breakdown schedule
Appendix B. Project schedule
Appendix C. Test cases
1. Project definition
This implementation plan describes the migration of MELISR into a new collections management system.
MELISR is the collections database of the National Herbarium of Victoria (MEL), and is a crucial component of the collections management and research activities of the Royal Botanic Gardens Melbourne (RBG). MELISR is currently implemented on the KE Texpress database management system (DBMS). There are a number of outstanding issues with the Texpress system, including the unreliability of the back-up system, issues with data capture and data integrity, and inability to interface with other key business systems. Furthermore, Texpress is no longer being developed or adequately supported. The MELISR Business Case recommended that Specify be adopted as the new collections management system for MEL.
2. Objectives
The objectives of the MELISR migration project are:
to improve storage and retrieval of specimen information
to improve robustness and extensibility of the collections database
to reduce duplication of effort arising from maintaining multiple databases relating to the same information
to streamline delivery of specimen data to Australia’s Virtual Herbarium (AVH)
to allow MELISR to interface effectively with other taxonomic and nomenclatural databases.
3. Scope
A key benefit of shifting to Specify is its greater extensibility. Specify can be more easily customised than Texpress, which means that we can add new fields to MELISR to allow for more accurate capture of specimen data. However, while new fields will be added, populating these fields retrospectively will require a greater investment of time and effort than is within the scope of this project. The only new fields that will be populated are those where the data can be easily extracted from the merging or separation of existing fields, or from associated databases. The tasks that are in and out of scope in this project are outlined below.
In scope:
migrating MELISR data from Texpress to Specify
cleaning MELISR data to improve data quality and aid migration
storing taxonomic names, personal names and locality information in separate tables and fields
improving handling of higher level taxonomy
adding determination history fields
improving recording of collectors (verbatim label data and interpreted label data)
adding fields to allow for more accurate recording of spatial data
improving recording of plant occurrence status
allowing for subsets of the collection to be appropriately flagged
applying restricted vocabularies where feasible
improving recording of geocode precision
ensuring fields comply with HISPID standards
adding verification level fields
improving the capacity to record data about images associated with specimens
linking MELISR to global gazetteers
adding ethnobotanical information fields
adding a field to record the indigenous name of a plant
integrating herbarium census and loans and exchange databases with MELISR collections database.
Out of scope:
parsing ‘Notes’ data into ‘Collectors notes’ and ‘Other notes’ fields
parsing habitat, substrate etc. fields
populating the ‘Language’ field
populating all instances of ‘Verbatim collector’s name’
populating determination history fields
populating plant occurrence status for all records
populating the ‘Ethnobotanical information’ field
populating the ‘Indigenous name’ field
customisation of the Specify database model and interface other than options already available in the application.
The Atlas of Living Australia (ALA) is looking into Specify to replace BioNet, which is currently used as the collection management system by many entomological collections. If the ALA decides to support Specify, changes to the Specify data model that would make it more suitable for botanical data may come into scope.
4. Deliverables
The deliverables include:
a fully functioning herbarium management system
comprehensive user manual
staff training.
5. Stakeholders
The MELISR database supports the primary business of the Plant Sciences and Biodiversity Division (PS&B). The implementation of a new collection management system will require a large time commitment, especially from the Programmer, Information Technology and Collections Information Officer, and will affect all MELISR users.
Currently, five Collections Branch staff undertake data entry, data cleaning and other curation tasks in MELISR on a daily basis. A further two Collections staff use MELISR at least twice a week, and one Collections volunteer uses MELISR on a weekly basis. Six Plant Sciences staff regularly perform queries and data entry in MELISR, and several more have read-only accounts for data querying. Outside PS&B, the online interface of MELISR is used by staff in both RBG Melbourne and RBG Cranbourne.
In order to ensure data integrity, it will not be possible to run the old and new systems concurrently. Consequently, MELISR will be unavailable for use during part of the implementation phase. The loans and exchange administration system will likewise be unavailable during this period.
Although the migration of MELISR to Specify will cause disruption to staff in the short term, this temporary loss of productivity will be offset by the increased efficiency and reliability of the new system. The project implementation plan has been carefully structured to minimise disruption to database users.
6. Roles and responsibilities
Project managers
Niels Klazenga (Programmer Information Technology, Biodiversity Information Officer)
Alison Vaughan (Collections Information Officer)
Responsible for project planning and quality assurance.
Project team
Niels Klazenga (Programmer)
Alison Vaughan (Collections Information Officer)
Ed Jarrett (IT Project Officer)
Responsible for project implementation, testing and training.
Reference group
David Cantrill (Chief Botanist and Director, PS&B)
Sabine Glissman-Gough (Manager, IS)
Pina Milne (Manager, Collections)
Catherine Gallagher (Co-ordinator, Curation)
7. Timeframes
This project is due for completion by 30 June 2011. It is critical that any disruption to access to MELISR does not coincide with preparations for the International Botanical Congress (IBC), which will be held in Melbourne in July 2011. If unforeseen delays in the implementation schedule occur, parts of the implementation plan may need to be held over until after the IBC.
8. Resources The migration of MELISR from Texpress to Specify will require a large time commitment from the Programmer, Information Technology and the Collections Information Officer. All infrastructure required to run Specify is already in place at the RBG. The Specify software is free of charge, thus the MELISR migration project will be budget neutral.
9. Implementation plan
The phases of the implementation plan are outlined below. This information is also represented as a work breakdown schedule (WBS, Appendix A) that details the timeframes required for each phase and outlines which tasks in the plan are dependent upon the completion of other tasks. A Gantt chart (Appendix B) provides an overview of the project schedule by month. For an up-to-date project schedule, see S:\PS&B\MELISR\MELISR redevelopment\MELISR project schedule.xls.
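The dependency column of the work breakdown schedule can be checked mechanically. The following is a minimal sketch in Python and is not part of the original plan: it copies the "Dependent on" entries for tasks 1.1, 2.1.1 to 2.1.6 and 3.1.1 from Appendix A and derives an order in which no task starts before the tasks it depends on. The function name schedule_order is an illustrative assumption.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Task -> set of tasks it depends on, copied from the "Dependent on"
# column of Appendix A (testing phase and the first migration task only).
DEPENDS_ON = {
    "1.1": set(),
    "2.1.1": {"1.1"},
    "2.1.2": {"2.1.1"},
    "2.1.3": {"2.1.2"},
    "2.1.4": {"2.1.3"},
    "2.1.5": {"2.1.4"},
    "2.1.6": {"2.1.5"},
    "3.1.1": {"2.1.3", "2.1.5"},
}


def schedule_order(depends_on):
    """Return the tasks in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies contain a loop.
    """
    return list(TopologicalSorter(depends_on).static_order())


order = schedule_order(DEPENDS_ON)
print(order)
```

A circular dependency entered by mistake would raise an error here rather than go unnoticed in the schedule.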
1 Installation and configuration
1.1 Install and configure Specify on RBG network (Niels Klazenga)
Specify will be installed and configured on the RBG network.
2 Testing and customisation
2.1 Testing
Specify will be thoroughly tested prior to data migration. Testing the functioning and capabilities of the new system at this stage will help inform the user needs analysis, customisation requirements and user acceptance testing.
2.1.1 Create test data set of 2000 records (Alison Vaughan)
A set of 2000 records will be extracted from MELISR to be used for the testing and customisation of Specify. All fields in the Texpress implementation of MELISR will be represented in the test data set to ensure that all data storage requirements are accounted for when mapping onto the new data model. Two thousand records is the maximum that can be loaded into Specify using the Workbench.
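One way to make sure every field is represented within a fixed record limit is a greedy selection. The sketch below is an illustration only and is not the procedure used at MEL: the function name pick_test_records and the field names in the usage example are assumptions, and records are taken to be simple field-to-value mappings.

```python
def pick_test_records(records, limit):
    """Greedily choose up to `limit` records so that every populated field appears.

    `records` is a list of dicts (field name -> value). Empty strings and
    None count as "not populated". Returns the chosen records (in their
    original order) and the set of fields that could not be covered.
    """
    def populated(rec):
        return {name for name, value in rec.items() if value not in (None, "")}

    # Every field that holds a value in at least one record.
    uncovered = set().union(*(populated(r) for r in records))
    remaining = list(range(len(records)))
    chosen = []

    # First pass: repeatedly take the record that covers the most missing fields.
    while uncovered and remaining and len(chosen) < limit:
        best = max(remaining, key=lambda i: len(populated(records[i]) & uncovered))
        chosen.append(best)
        remaining.remove(best)
        uncovered -= populated(records[best])

    # Second pass: top up with the earliest unused records.
    for i in remaining:
        if len(chosen) >= limit:
            break
        chosen.append(i)

    return [records[i] for i in sorted(chosen)], uncovered


# Made-up sample rows with hypothetical field names.
sample_rows = [
    {"taxon": "Acacia", "collector": "", "habitat": ""},
    {"taxon": "Banksia", "collector": "Smith", "habitat": ""},
    {"taxon": "", "collector": "", "habitat": "heath"},
    {"taxon": "Eucalyptus", "collector": "", "habitat": ""},
]
test_set, missing_fields = pick_test_records(sample_rows, limit=3)
```

A non-empty second return value signals that the limit is too small to represent every field.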
2.1.2 Clean test data set (Niels Klazenga & Alison Vaughan)
The taxon name, collector, additional collectors, determiner, confirmer, country and state fields in the test data set will be cleaned and normalised to allow this data to be correctly migrated into Specify.
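Cleaning name fields usually starts with finding spellings that differ only in case, punctuation or spacing. The following is a hedged sketch of that first step, not the method used for MELISR: the comparison rule in match_key and both function names are assumptions, and the sample names are made up.

```python
import re
from collections import defaultdict


def match_key(name):
    """Reduce a name to a comparison key: lower case, no punctuation, single spaces."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def find_variants(names):
    """Group raw spellings that share a key; keep only keys with more than one spelling."""
    groups = defaultdict(set)
    for raw in names:
        groups[match_key(raw)].add(raw)
    return {key: sorted(spellings)
            for key, spellings in groups.items()
            if len(spellings) > 1}


# Made-up collector strings.
collectors = ["Smith, J.", "Smith, J", "smith  j", "Jones, A."]
variants = find_variants(collectors)
```

Each group returned is a candidate for merging into a single normalised value; the decision to merge still needs a person who knows the collectors.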
2.1.3 Map test data on Specify data model (Niels Klazenga & Alison Vaughan)
The MELISR fields will be mapped onto the Specify data model, using the test data set.
2.1.4 Upload test data set into Specify (Niels Klazenga & Alison Vaughan)
The test data set will be uploaded into Specify, using the Specify Workbench and the data mapping resulting from 2.1.3.
2.1.5 Test (Niels Klazenga & Alison Vaughan)
Specify will be tested using the test cases listed in Appendix C. Additional cases will be added during the test phase and user acceptance testing (2.5).
2.1.6 Refine mapping (Niels Klazenga & Alison Vaughan)
The mapping provided in 2.1.3 will be evaluated using the test data set, and refined as necessary.
2.2 User needs analysis
The migration of MELISR from Texpress to Specify provides an opportunity to improve the way that specimen data is recorded in MELISR, and to add additional functionality. A comprehensive user needs analysis will be undertaken to ensure that the stakeholder needs are met wherever possible.
2.2.1 Consult MELISR users regarding new and altered fields (Alison Vaughan & PS&B staff)
The project managers will consult MELISR users in PS&B to determine which fields need changing, how to best implement proposed new fields, and what restrictions should be placed on the use of existing and additional fields.
2.2.2 Determine loans and exchange database requirements (Alison Vaughan, Niels Klazenga, Pina Milne & Catherine Gallagher)
The project managers will consult the Manager, Collections and the Co-ordinator, Curation to determine the database requirements of the loans and exchange program, and to ascertain whether all loans and exchange requirements can be met by Specify.
2.3 Customisation
As Specify was developed primarily for use by entomological collections, some customisation will be required to optimise it for use with herbarium collections. For instance, it is likely that Specify will need to be customised to cater for those elements of HISPID that are not represented in ABCD. Specify may also need to be customised to deal with curatorial fields in MELISR specific to MEL.
2.3.1 Customise forms (Niels Klazenga & Alison Vaughan)
The data entry forms in Specify will be customised to meet in-house databasing requirements and to ensure databasing can be undertaken efficiently and intuitively.
2.3.2 Customise labels (Niels Klazenga & Alison Vaughan)
The specimen labels in Specify will be customised to be as consistent with Texpress-generated labels as possible. Annotation labels will also be programmed.
2.3.3 Customise reports (Niels Klazenga & Alison Vaughan)
The standard quality control, auditing and collections management reports used in Texpress will be replicated in Specify or MySQL, and existing Specify reports will be customised to reflect the organisation and composition of MEL’s collection.
2.4 Peripherals
Specify will be configured for use with the RBG’s barcode scanners and printers.
2.4.1 Test barcode readers and resolve any problems (Niels Klazenga, Alison Vaughan & Ed Jarrett)
MEL accession numbers consist of three parts, which are stored as three fields in Texpress, but will be stored in one field in Specify. Because MEL’s barcode readers only read numerical data, there may be issues with the use of barcode readers with Specify that need to be resolved.
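The sketch below illustrates the kind of conversion involved. The plan does not state what the three parts are, so the layout used here (a letter prefix, a number and an optional letter suffix) is purely an assumption, as are the function names and the default prefix.

```python
import re

# Hypothetical layout: letters, digits, optional letters, e.g. "MEL 12345A".
ACCESSION = re.compile(r"([A-Z]+)\s*(\d+)\s*([A-Z]*)")


def join_parts(prefix, number, suffix=""):
    """Combine the three separately stored parts into one field value."""
    return f"{prefix} {int(number)}{suffix}"


def split_accession(text):
    """Split a single-field accession number back into its three parts."""
    match = ACCESSION.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a recognised accession number: {text!r}")
    prefix, digits, suffix = match.groups()
    return prefix, int(digits), suffix


def from_barcode(scanned, prefix="MEL"):
    """Turn a digits-only barcode scan into the single-field form (no suffix)."""
    if not (scanned.isascii() and scanned.isdigit()):
        raise ValueError(f"barcode is not numeric: {scanned!r}")
    return join_parts(prefix, scanned)
```

A numeric-only scan cannot carry a prefix or suffix, which is why from_barcode has to supply them by rule; whether such a rule is acceptable is one of the issues the testing in 2.4.1 would need to settle.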
2.4.2 Configure print settings and test printing (Niels Klazenga & Ed Jarrett)
As Specify runs on the RBG workstation, configuring the printers will likely be straightforward.
2.5 User acceptance testing (UAT)
User acceptance testing (UAT) will be conducted prior to the migration of data to ensure that Specify meets user needs.
2.5.1 Develop UAT plan (Niels Klazenga & Alison Vaughan)
A UAT plan will be developed to ensure that user acceptance testing is comprehensive and that testing outcomes are properly documented.
2.5.2 UAT (Niels Klazenga, Alison Vaughan & PSB staff)
UAT will be carried out by a range of stakeholders from PS&B, including Collections staff who do the bulk of the data entry into MELISR.
2.5.3 Refine according to outcome of UAT (Niels Klazenga & Alison Vaughan)
If necessary, the customisation of Specify will be refined in response to the outcome of the UAT.
3 Data migration
3.1 Preparation
Because the Specify data structure is very different to the current flat structure of the MELISR database, we need to prepare the MELISR data for migration to Specify. This involves parsing some existing fields into new fields and data cleaning of selected fields. This phase of the project is expected to take the most time. In order to minimise disruption to database users, a snapshot of MELISR will be taken at the start, so staff can continue to use MELISR while the data cleaning is progressing.
3.1.1 Map new and existing fields on Specify data model (Niels Klazenga & Alison Vaughan)
The test data set will be mapped onto the Specify data model using the Workbench.
3.1.2 Data cleaning (Niels Klazenga & Alison Vaughan)
Data cleaning will be restricted to what is necessary to make the collections database standards-compliant and will focus on three areas: collector information, taxon names and geographical information. Parsing habitat and notes fields is out of the scope of the project.
3.2 Migration
Migration will be trialled on a copy of the MELISR database (see above). All SQL statements will be saved, so that the actual migration can take place as quickly as possible.
3.2.1 Create MySQL table with cleaned MELISR dataset (Niels Klazenga)
A single table with all cleaned MELISR data will be created. This table will be similar to the current MySQL duplicate of MELISR, but with the field structure resulting from steps 2.1.3 and 2.1.6.
3.2.2 Trace data mapping (Niels Klazenga)
The Specify data model is highly normalised and consists of a large number of different tables. The mapping of each MELISR field needs to be traced through the data model in order to put the right foreign keys in the right tables, so that all tables will be linked correctly.
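Tracing a mapping amounts to deciding, for each flat field, which table holds the value and which foreign key points to it. The following sketch shows the idea on a deliberately tiny schema. It is an assumption-laden illustration: the three tables and their columns are hypothetical and are not the Specify schema, and SQLite (via Python's sqlite3 module) stands in for MySQL so that the example runs without a server. INSERT OR IGNORE is SQLite syntax.

```python
import sqlite3


def migrate(flat_rows):
    """Load (accession, collector, taxon name) rows into a small normalised schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE taxon (taxon_id INTEGER PRIMARY KEY, full_name TEXT UNIQUE);
        CREATE TABLE specimen (
            specimen_id  INTEGER PRIMARY KEY,
            accession    TEXT UNIQUE,
            collector_id INTEGER REFERENCES agent(agent_id),
            taxon_id     INTEGER REFERENCES taxon(taxon_id)
        );
    """)

    def key_for(table, id_column, value_column, value):
        """Insert the value once and return its primary key."""
        db.execute(f"INSERT OR IGNORE INTO {table} ({value_column}) VALUES (?)", (value,))
        row = db.execute(
            f"SELECT {id_column} FROM {table} WHERE {value_column} = ?", (value,)
        ).fetchone()
        return row[0]

    for accession, collector, taxon_name in flat_rows:
        # Each flat field becomes a foreign key into its own table.
        collector_id = key_for("agent", "agent_id", "name", collector)
        taxon_id = key_for("taxon", "taxon_id", "full_name", taxon_name)
        db.execute(
            "INSERT INTO specimen (accession, collector_id, taxon_id) VALUES (?, ?, ?)",
            (accession, collector_id, taxon_id),
        )
    db.commit()
    return db


# Made-up sample rows.
flat = [
    ("A1", "Smith, J.", "Acacia dealbata"),
    ("A2", "Smith, J.", "Banksia marginata"),
    ("A3", "Jones, A.", "Acacia dealbata"),
]
db = migrate(flat)
```

Repeated collectors and taxon names end up as single rows, which is why the name fields have to be cleaned before migration: two spellings of one collector would otherwise become two agents.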
3.2.3 Write and run SQL INSERT statements (Niels Klazenga)
SQL commands will be composed and executed.
3.2.4 Migration of MELISR data to Specify (Niels Klazenga)
Final data migration will take place after the necessary SQL has been written and trialled on a copy of MELISR. As records entered or changed in the Texpress implementation will not be migrated, MELISR will be unavailable during this phase. The herbarium census will likewise be unavailable.
4 Project finalisation
4.1 Installation
During testing and migration the Specify client program will be installed only on the Programmer’s and Collections Information Officer’s workstations. After successful migration Specify will be installed on all MELISR users’ workstations. A backup program will be installed on a different server.
4.1.1 Installation of Specify on workstations (Alison Vaughan, Ed Jarrett, Upul Molligoda & Niels Klazenga)
Specify will be installed on the workstations of all RBG staff and volunteers who require access to the collections database.
4.1.2 Install Specify data backup program on server (Niels Klazenga & Ed Jarrett)
A backup program that will make daily backups of the MELISR database will be installed on a different server, in order to ensure security of the database.
4.2 Training
Training staff in the use of Specify is a key aspect of the project; it is imperative that staff who use MELISR are well-supported during the transition from Texpress to Specify. Training will be provided in stages, starting with those staff whose work is most reliant on the database.
4.2.1 Revise MELISR data entry manual (Alison Vaughan)
The MELISR manual will be refined throughout the customisation and testing phases to ensure that all aspects of database use are covered. Data entry procedures will be updated to reflect changes to the data structure and the addition of controlled vocabularies.
4.2.2 Train Collections staff (Alison Vaughan)
Staff in the Collections Branch are most reliant upon MELISR, and thus will be the first users to be trained in the use of Specify.
4.2.3 Train Plant Sciences staff and Collections volunteers (Alison Vaughan)
Of next priority in the training schedule are Plant Sciences staff and Collections volunteers who use MELISR for data entry, followed by those Plant Sciences staff who use MELISR for data querying only.
4.2.4 Train other MELISR users throughout the organisation (Alison Vaughan)
MELISR users from outside PS&B will be trained last.
4.3 Configure provider software
Currently MEL only delivers data to AVH, using the BioCASE provider software. This provider needs to be modified.
4.3.1 Configure TAPIR and BioCASE providers (Niels Klazenga)
The BioCASE provider will need to be reconfigured. As the underlying data model has changed, this involves a new table structure for the BioCASE table.
4.3.2 Organise update of MEL data in AVH (Niels Klazenga)
Because the new collections database has a different field structure and the data will be cleaned, all records will effectively be updated. An update of the MEL records already in the AVH cache needs to be arranged by means other than the BioCASE provider.
The relationship between the different activities in the project implementation plan is shown as a Project Evaluation and Review Technique (PERT) diagram (Fig. 1).
Figure 1. PERT network depicting the sequence of activities.
10. Risk management The main risks associated with migrating MELISR to Specify are:
Specify not meeting the business needs of the herbarium
loss of data
delay in implementation
changes to scope
conflicting operational priorities
variable stakeholder expectations
lack of stakeholder participation.
The risks have been minimised by selecting a new system that closely meets our needs, using a DBMS that we are already familiar with, choosing a system with a user-friendly interface, and carrying out comprehensive project planning.
Table 1. Risk management strategy
Risk: Specify not meeting the business needs of the herbarium
Impact: The options for improving MELISR would need to be reassessed, which will lead to a delay in implementing MELISR on a new, robust system
Probability: Unlikely
Consequence: Moderate
Mitigation: Undertake rigorous testing prior to mass data cleaning and migration to ensure Specify meets our needs before progressing with the most resource intensive phase of the project

Risk: Loss of data
Impact: Reduction of the quality of the MEL collection data
Probability: Unlikely
Consequence: Major
Mitigation: Reduce the risk of data loss during migration by keeping good back-ups and by rigorous testing of the Specify data model

Risk: Delay in implementation
Impact: Phases of the project might need to be held over until after the IBC; a delay in implementation would increase the risk of conflicting operational priorities
Probability: Likely
Consequence: Moderate
Mitigation: There are four months between the scheduled completion of data migration and the IBC, which will allow for some slippage

Risk: Changes to scope
Impact: Implementation will be delayed
Probability: Likely
Consequence: Moderate
Mitigation: Carefully plan and define scope at the start to minimise scope creep

Risk: Conflicting operational priorities
Impact: The project team would be unable to meet project deadlines, thereby delaying subsequent phases of the project
Probability: Likely
Consequence: Moderate
Mitigation: The prioritisation of the MELISR migration implementation project at organisational level should minimise this risk

Risk: Variable stakeholder expectations
Impact: The new implementation might not meet the expectations of all stakeholders
Probability: Likely
Consequence: Moderate
Mitigation: Consult stakeholders at various stages of the project to minimise the risk of stakeholder dissatisfaction

Risk: Lack of stakeholder participation
Impact: Missed opportunity for Specify to best meet the business needs of the National Herbarium of Victoria
Probability: Unlikely
Consequence: Minor
Mitigation: Encourage stakeholders to contribute to the user needs analysis and user acceptance testing
11. Stakeholder management strategy
The success of the MELISR migration project is dependent upon the involvement and support of the project’s stakeholders. It is imperative that stakeholders’ interests in the project are identified and that they have the opportunity to contribute to the decision making processes relating to those interests. The stakeholder management strategy (Table 2) details the interests of, and input required from, different stakeholder groups. By maintaining good communication with stakeholders throughout the project, we can minimise resistance to change and promote greater investment in, and acceptance of, the project’s outcomes.
Table 2. Stakeholder management strategy

Stakeholder: Curation Officers, Curation Co-ordinator
Interest in project: As the primary users of the database, the curation staff have a high level of interest in any changes to work processes resulting from the migration of MELISR to Specify.
Input required:
  input on changes to database fields and data entry requirements
  input on changes to the loans and exchange database
  feedback on data entry manual and database usability
Potential barriers: resistance to change, and reluctance to learn new databasing procedures, especially if these are more complex than current practices
Engagement strategy:
  consult curation staff on proposed changes and invite feedback throughout the implementation process
  provide clear justification for any changes made
  provide clear instructions and training in the use of the new system

Stakeholder: Botanists with data entry privileges, databasing volunteer
Interest in project: Many of the botanists at MEL database their own collections and, as such, have an interest in changes to databasing procedures.
Input required:
  input on changes to database fields and data entry requirements
  feedback on data entry manual and database usability
Potential barriers: reluctance to learn new databasing procedures, especially if these are more complex than current practices
Engagement strategy:
  consult staff on proposed changes and invite feedback throughout the implementation process
  provide clear justification for any changes made
  provide clear instructions and training in the use of the new system

Stakeholder: Other MELISR users
Interest in project: Several RBG staff have read-only MELISR accounts, and will be affected by changes to querying the database.
Input required: input on possible improvements to query functionality
Potential barriers: reluctance to learn new system
Engagement strategy: provide clear instructions and training in the use of the new system

Stakeholder: MELISR web query users
Interest in project: It is foreseeable that the migration of MELISR from Texpress to Specify will necessitate changes to the MELISR web query. Any such changes will need to be communicated to RBG staff who use the MELISR web query.
Input required: -
Potential barriers: -
Engagement strategy:
  provide clear and timely information on any changes
  provide training if required

Stakeholder: IS
Interest in project: IS will no longer have to maintain a separate database management system to accommodate MELISR.
Input required: provision of programming and technical support
Potential barriers: conflicting operational priorities
Engagement strategy:
  Programmer to liaise with IS
  provide regular project updates

Stakeholder: HISCOM
Interest in project: Many HISCOM members will be interested in the migration procedure because other herbaria might consider implementing Specify themselves.
Input required: input on standards compliance
Potential barriers: -
Engagement strategy:
  provide report at end of migration process
  invite members to view new system
Appendix A. Work breakdown schedule
Task | Resource | Dependent on | Start date | Finish date
1 Installation and configuration
1.1 Install and configure Specify on RBG network | NK | - | 1/07/2009 | 1/07/2009
2 Testing and customisation
2.1 Testing | | | 15/10/2009 | 26/02/2010
2.1.1 Create test data set of 2000 records | AV | 1.1 | 15/10/2009 | 15/10/2009
2.1.2 Clean test data set | NK & AV | 2.1.1 | 12/11/2009 | 3/12/2009
2.1.3 Map test data on Specify data model | NK & AV | 2.1.2 | 4/01/2010 | 22/01/2010
2.1.4 Upload test data set into Specify | NK & AV | 2.1.3 | 27/01/2010 | 29/01/2010
2.1.5 Test | NK & AV | 2.1.4 | 1/02/2010 | 12/02/2010
2.1.6 Refine mapping | NK & AV | 2.1.5 | 15/02/2010 | 26/02/2010
2.2 User needs analysis | | | 1/03/2010 | 28/05/2010
2.2.1 Consult MELISR users regarding new and altered fields | AV & PSB | - | 1/03/2010 | 28/05/2010
2.2.2 Determine loans and exchange database requirements | AV, NK, CG & PM | - | 1/03/2010 | 5/03/2010
2.3 Customisation | | | 1/03/2010 | 2/07/2010
2.3.1 Customise forms | NK & AV | 2.1 | 12/04/2010 | 14/05/2010
2.3.2 Customise labels | NK & AV | 2.1 | 12/04/2010 | 14/05/2010
2.3.3 Customise reports | NK & AV | 2.1 | 12/04/2010 | 14/05/2010
2.4 Peripherals | | | 1/03/2010 | 21/05/2010
2.4.1 Test barcode readers and resolve any problems | NK, AV & EJ | 2.1 | 1/03/2010 | 14/05/2010
2.4.2 Configure print settings and test printing | NK & EJ | 2.1, 2.3.2 | 17/05/2010 | 21/05/2010
2.5 User acceptance testing (UAT) | | | 8/03/2010 | 2/07/2010
2.5.1 Develop UAT plan | NK & AV | 2.2 | 8/03/2010 | 26/03/2010
2.5.2 UAT | NK, AV & PSB | 2.1–2.4, 2.5.1 | 17/05/2010 | 18/06/2010
2.5.3 Refine according to outcome of UAT | NK & AV | 2.5.2 | 21/06/2010 | 2/07/2010
3 Data migration
3.1 Preparation | | | 5/07/2010 | 3/09/2010
3.1.1 Map new and existing fields on Specify data model | NK & AV | 2.1.3, 2.1.5 | 5/07/2010 | 3/09/2010
3.1.2 Data cleaning | NK & AV | - | 5/07/2010 | 3/09/2010
3.2 Migration | | | 6/09/2010 | 28/01/2011
3.2.1 Create MySQL table with cleaned MELISR data | NK | 3.1 | 6/09/2010 | 10/09/2010
3.2.2 Trace data mapping | NK | 3.1 | 6/09/2010 | 24/09/2010
3.2.3 Write and run SQL INSERT statements | NK | 3.2.2 | 27/09/2010 | 31/12/2010
3.2.4 Migration of MELISR data to Specify | NK | 3.2.3 | 20/12/2010 | 28/01/2011
4 Project finalisation
4.1 Installation | | | 5/07/2010 | 18/03/2011
4.1.1 Installation of Specify on workstations | NK, AV & EJ | 1.1 | 7/02/2011 | 18/02/2011
4.1.2 Install Specify data backup program on server | NK & EJ | 1.1 | 7/02/2011 | 25/02/2011
4.2 Training | | | 7/02/2011 | 18/03/2011
4.2.1 Revise MELISR data entry manual | AV | 2 | 5/07/2010 | 4/02/2011
4.2.2 Train Collections staff | AV | 2.5.3, 4.1 | 7/02/2011 | 4/03/2011
4.2.3 Train Plant Sciences staff and Collections volunteers | AV | 2.5.3, 4.1 | 14/02/2011 | 11/03/2011
4.2.4 Train other MELISR users throughout the organisation | AV | 2.5.3, 4.1 | 21/02/2011 | 18/03/2011
4.3 Configure provider software | | | 28/02/2011 | 29/04/2011
4.3.1 Configure TAPIR and BioCASE providers | NK | 3.2, 4.1 | 28/02/2011 | 11/03/2011
4.3.2 Organise update of MEL data in AVH | NK | 3.2 | 14/03/2011 | 29/04/2011
Appendix B. Project schedule
Gantt chart (not reproduced). Rows: PBS tasks 1 to 4.3.2, as listed in Appendix A. Columns: months, 2009 to 2011.
Appendix C. Test cases
taxon name tables can deal with botanical names
  infraspecific rank
  hybrid names and formulae
  cultivated plant names
database must be able to store determination history
data entry forms are navigable by keyboard as well as mouse
data model must be able to accommodate all required fields
  verbatim collector
  verbatim foreign language label information
data structure must be able to store unit relationships
Appendix 3. Summary of Missing and Needed Specify Features
This appendix is a copy of http://bit.ly/otjNme (minus blog comments). Some items, namely Desktop Application, Annotation Data and Annotation Service, have been added or updated.
Oracle Support
Oracle and other corporate database environments are in regular use Australia-wide. In some institutions the Information and Communications Technology (ICT) policy forbids the use of open source database platforms for corporate data; in others, data already resides in a platform such as Oracle and it makes sense to continue using it rather than adopt a new platform solely for Specify. It is thus critical to a number of institutions that a collections management tool support a corporate database environment.
Specify Software Project (Rod Spears):
“Porting to additional DBMS systems was originally part of our Specify 6 plan. In our most recent renewal we were cut down from 3 developers to 1 and are [fortunate] that Tim's funding [was] available from other sources. We no longer have the resources to do any DBMS porting ourselves. If an organization like yourself would like to do the port themselves we would be more than willing to support that effort.”
External Taxonomies
Specify will import a taxonomy of names, but thereafter the taxonomy cannot be updated from that external source. It is critical that Specify be capable of synchronising its taxonomy tree with an external source so that the tool can be implemented alongside mature taxonomy applications already in place in Australian institutions.
Specify Software Project (Rod Spears):
“This has been in the plan from the beginning. We wanted to solve the more generic problem of providing external resources for:
“Taxonomic names
Agent names
Geography
Stratigraphy
“This is [currently] not on our development plan [due] to lack of resources.”
RBG Melbourne (Niels Klazenga):
“There are different forms of synchronisation between taxonomies. I would be happy with a form of synchronisation between the Specify taxon tree and, say, the National Species List where when a taxon name is added to the taxon tree, it can be checked against the NSL and the authorship and protologue can be imported from NSL. I have the feeling Specify can already do that. A form of synchronisation where the taxon tree in Specify is automatically updated when a change is made in an external database I will be happy to implement for our in-house databases (Ausmoss, Interactive Catalogue of Australian Fungi, Census of Vascular Plants of Victoria), but not for anything else.
“This form of synchronisation may be useful for institutes that have collections from a limited geographical area and a small taxonomic spectrum, but not for something like the National Herbarium of Victoria that has collections from all over the world from five different Kingdoms. We would either have to synchronise with many different external nomenclators, many of which do not even exist, or something like the Catalogue of Life (and who would want that?). Our Specify taxon tree also contains only names that are in our collections; if we would include all names in all taxonomic groups of which we hold collections and for all geographic regions, our taxon table would be bigger than our entire collections database is now. Also, there are many “names” on specimens in every institute that you will not find in any nomenclator.
“Synchronisation is pretty easily achieved through the back end and does not really require anything extra from Specify. It is the external nomenclator that needs to make its data available through a web service or something.
“As for agents, the desirability of having a collectors (or people of interest to Australian botany) database that is shared between herbaria is something that has come up in discussions on several of the CHAH-ALA collaboration themes, so we can make a strong case to CHAH that the Australian herbaria want to have something like that. Once there it will be very easy to synchronise with Specify or any other database that has a table with agents.
“Geography. This doesn’t change so fast, so I do not see a need for synchronisation. There is another problem with the geography tree in Specify. The geography tree in Specify comes with continents, ISO countries, states and counties. This ISO system does not make much sense in a natural history collections database. We had initially thrown out this geography tree and replaced it with TDWG-WGS2, but then we found out that the way GEOLocate is implemented in Specify it requires ISO countries to be in a single column in the geography tree. All ISO countries are in WGS2, but they are not all of the same rank. The WGS continents are used for storage at MEL, so now we have a geography tree which is somewhere in between ISO and WGS2. And it is not pretty. It would be much better if we could have a translation table between the geography tree and the GEOLocate plugin that translates any entry in the geography tree to an ISO country.”
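The translation table suggested above could work by walking from any entry in the geography tree towards the root until an entry with a known ISO country is reached. The following is a minimal sketch of that idea only: it is not part of Specify or GEOLocate, the function name is an assumption, and the small tree is a simplified illustration rather than actual WGS2 content. AU and DK are the ISO 3166-1 alpha-2 codes for Australia and Denmark.

```python
def iso_country_for(node, parent_of, iso_code_of):
    """Walk from `node` towards the root until an entry has an ISO country code.

    `parent_of` maps each tree entry to its parent; `iso_code_of` is the
    translation table. Returns None if no ancestor is a country.
    """
    seen = set()
    current = node
    while current is not None and current not in seen:
        if current in iso_code_of:
            return iso_code_of[current]
        seen.add(current)  # guards against accidental loops in the tree
        current = parent_of.get(current)
    return None


# Simplified, hypothetical tree: countries sit at different depths.
PARENT_OF = {
    "Victoria": "Australia",
    "Australia": "Australasia",
    "Denmark": "Northern Europe",
    "Northern Europe": "Europe",
}
ISO_CODE_OF = {"Australia": "AU", "Denmark": "DK"}

print(iso_country_for("Victoria", PARENT_OF, ISO_CODE_OF))
```

Because the lookup does not care at which rank a country sits, the tree itself would not need to keep all ISO countries in a single column.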
Batch Editing
(According to http://specifysoftware.org/content/specify-63-features-and-enhancements, batch editing of any number of records to a new determination is now possible in version 6.3.)
It is currently not possible to edit the same field in the database for more than one record at a time. This is a significant time-saving and occupational health and safety feature that must be implemented. Competing tools, e.g. Texpress, handle this automatically.
The National Herbarium of Victoria works around this by performing batch edits directly on the underlying MySQL database. It views this limitation as a feature, as it better protects the database. A potential side-effect of a well-normalised database structure is a reduced need for batch edits. A single Agent’s details can be changed in one place and the details are reused throughout the application. This is different from some collections databases.
Issue raised by: WA Herbarium.
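The point about normalisation can be illustrated with a small sketch. The two tables below are invented for the example and are far simpler than Specify's actual schema, and SQLite is used in place of MySQL so that the example is self-contained. Because each specimen stores only a reference to the agent, a single update corrects the name wherever it is displayed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE specimen (
        specimen_id  INTEGER PRIMARY KEY,
        collector_id INTEGER NOT NULL REFERENCES agent(agent_id)
    );
    INSERT INTO agent VALUES (1, 'J. Smyth');
    INSERT INTO specimen VALUES (101, 1), (102, 1), (103, 1);
""")

# One UPDATE on the agent table; no specimen row is touched.
conn.execute("UPDATE agent SET name = ? WHERE agent_id = ?", ("J. Smith", 1))

# Every specimen now reports the corrected collector name through the join.
rows = conn.execute("""
    SELECT s.specimen_id, a.name
    FROM specimen s JOIN agent a ON a.agent_id = s.collector_id
    ORDER BY s.specimen_id
""").fetchall()
```

In a flat table that repeats the collector name on every specimen, the same correction would need a batch edit across all three records.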
Specify Software Project (Rod Spears):
“Our approach in Specify 6 will be different than in Specify 5. Specify 5's batch edit was essentially a search and replace for nearly every field in every table. As you can [imagine], this approach is very powerful, but at the same time, very dangerous for the novice user. Additionally, the average user was not very disciplined about making back ups.
“Specify 6's approach will involve the user 'exporting' the data to the WorkBench where it can be edited, or even [exported] once again to Excel for editing. Then the data is re-uploaded via the WorkBench [where] it can be validated. We will also be providing additional tools for specific clean up use cases, for example, duplicate Agent clean up.”
Batch editing is slated for release in March 2011. It includes a “batch re-identify” feature.
RBG Melbourne (Niels Klazenga):
“I like Rod’s approach. I would be very unhappy with Specify 5’s batch edit, which is basically Texpress’s batch edit or with a situation where every user with editing privileges on a table would also be able to batch edit. Uploading workbench datasets is a different privilege from editing a table, so this would work very well for us.
“At the moment more useful than batch editing would be the shortcuts we could have in Texpress, i.e. short combinations of characters that would be immediately translated into much longer text strings, similar to the autocorrect in MS Office.”
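The kind of shortcut expansion described here can be sketched in a few lines. This is not how Texpress or Specify implements it; the shortcut table and the function name are hypothetical, and a real table would be defined by each institution.

```python
import re
from typing import Mapping

# Hypothetical shortcut table mapping short codes to longer text strings.
SHORTCUTS: Mapping[str, str] = {
    "wah": "Western Australian Herbarium",
    "rbgm": "Royal Botanic Gardens Melbourne",
}


def expand_shortcuts(text: str, shortcuts: Mapping[str, str] = SHORTCUTS) -> str:
    """Replace whole-word shortcuts with their long forms; other text is left untouched."""
    if not shortcuts:
        return text
    # \b anchors ensure a shortcut only matches as a complete word.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, shortcuts)) + r")\b")
    return pattern.sub(lambda match: shortcuts[match.group(1)], text)
```

For example, `expand_shortcuts("Loan from wah to rbgm")` would expand both codes, while a word such as "wahoo" is left alone because the shortcut must match a whole word.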
Access to Support A number of people have already mentioned that email to support is not answered in a timely fashion. It does not inspire confidence that Australian institutions will have ready access to experts. There is certainly a need for a few Australian points of contact for expert Specify consulting.
Issue raised by: Peter Doherty (ALA); Piers Higgs (Gaia Resources).
RBG Melbourne (Niels Klazenga):
“I have erased us from the line above, as our issue was caused by their emails back to us bouncing.
“Specify is used by over 200 institutes world wide and Specify Software has only one helpdesk person. Moreover, their NSF funding only allows them to support US institutions.
“We need to create a Specify community in Australia where larger institutes with more resources and more experience support the smaller ones. We are not quite experts at Specify, but Alison and I are both happy to advise and discuss. We will also make our mapping, forms, reports and data entry manual available. We will not be able to make forms for everybody or migrate everybody’s data.
“People like Piers and Peter should contact Andy directly.”
Record Display Limits The maximum number of records displayed in the result of a search is set to 5,000. This limit is apparently configurable, but it is inappropriate for very large datasets: some records may never fall within the first 5,000 and are thus inaccessible unless the user knows enough about their content to devise a special query that returns them.
Issue raised by: Australian National Wildlife Collection (ANWC); Royal Botanic Gardens (RBG), Melbourne; WA Herbarium.
Specify Software Project (Rod Spears):
“[T]his was part of the original plan for Specify 6. Specify 5 did not have a limit. Specify 6 at the moment loads all the results into memory, thus the limit.
“Our plan is to do paging, meaning a set number of records on each page and then move page by page through the results. Or to have a moving 'window' though the results where only the 'window' plus or minus rows on each side [will] reside in memory. My guess is that we will probably use the paging approach.”
“We did not have time to fully complete this before shipping the first version of Specify6. (The selection of 5000 was completely arbitrary).”
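The paging approach described above can be sketched as follows. This is an illustration under stated assumptions, not Specify's implementation: the `specimen` table and the `fetch_page` function are invented for the example, and SQLite stands in for MySQL so that the example is self-contained. Only one page of rows is held in memory at a time, so no overall cap on the result set is needed.

```python
import sqlite3
from typing import List, Tuple


def fetch_page(conn: sqlite3.Connection, page_number: int, page_size: int) -> List[Tuple[int, str]]:
    """Return one page of results in a stable order; page_number starts at 0."""
    return conn.execute(
        "SELECT specimen_id, label FROM specimen "
        "ORDER BY specimen_id LIMIT ? OFFSET ?",
        (page_size, page_number * page_size),
    ).fetchall()


# Illustrative data: 12 records viewed 5 at a time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO specimen VALUES (?, ?)",
    [(i, f"Specimen {i}") for i in range(1, 13)],
)

first_page = fetch_page(conn, 0, 5)   # records 1 to 5
last_page = fetch_page(conn, 2, 5)    # records 11 and 12
```

The moving-window alternative mentioned in the quote would differ mainly in keeping a few rows either side of the visible range in memory rather than discarding each page.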
Record Import Limits The maximum number of records imported at a time is limited to 2,000. This is an inappropriate limit for large collections seeking to migrate to Specify. It also indicates that the underlying software is perhaps not well implemented, as memory management is well handled by competing collections management tools.
Issue raised by: WA Herbarium.
Specify Software Project (Rod Spears):
“There is a preference that can be set to increase the limit beyond 2000. We have uploaded as many [as] 10,000 rows at a time. The uploader was originally developed using Hibernate instead of straight SQL. The end result was portable DBMS code, but Hibernate is very memory intensive. So the limit is really the number of rows multiplied by the number [of] columns, then factor in roughly how much data is in the rows/columns and finally how much memory is on the machine running Specify. We punted and arbitrarily [chose] 2000 rows.
“Also, as a point of history, the WorkBench was designed to [enable] collectors to easily upload field notebook information. The interesting thing is that it has been primarily used for migrating data into Specify.”
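The rule of thumb in this response (rows multiplied by columns, scaled by the amount of data per cell and compared with available memory) can be written as a rough estimate. The function name and all figures below are assumptions chosen for illustration; they are not measurements of Specify or Hibernate.

```python
def max_upload_rows(columns: int, avg_bytes_per_cell: int, memory_budget_bytes: int) -> int:
    """Estimate the largest row count whose footprint (rows x columns x bytes) fits the budget."""
    if columns <= 0 or avg_bytes_per_cell <= 0:
        raise ValueError("columns and avg_bytes_per_cell must be positive")
    return memory_budget_bytes // (columns * avg_bytes_per_cell)


# Illustrative figures only: 50 columns, about 200 bytes per cell including
# object overhead, and 20 MB of memory set aside for the upload.
estimate = max_upload_rows(columns=50, avg_bytes_per_cell=200, memory_budget_bytes=20_000_000)
```

Under these assumed figures the estimate is 2,000 rows; doubling the memory budget or halving the number of columns would double it.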
RBG Melbourne (Niels Klazenga):
“I suggest larger institutes like the WA Herbarium do not use the workbench for migrating data into Specify. Way too scary: you do not know what is happening. Also, not all fields can be imported through the workbench, so you’ll have to use SQL at some point anyway.”
Desktop Application Aside from the basic EZDB version [1], Specify has a clear client-server design. In this design, the desktop application resides on a user’s PC (the “client”) and the database resides in a MySQL instance on a separate machine (the “server”). This is beneficial for multiple users as it avoids the problem of multiple copies of a database in multiple places, and makes it easier to back up the data outside staff business hours.
[1] http://bit.ly/mOAVX9: “Specify EZDB eliminates the need to install and administer a MySQL server, making it well-suited for small collections and single-user databases.” Despite this, we wouldn’t recommend using it in this manner for any institutional collection.
One problem with this approach is that the desktop application and the MySQL database need to be upgraded at the same time, as some versions of Specify make changes in the desktop application that are incompatible with an older database structure. One way to address this is to use desktop virtualisation (e.g. Citrix) and run the client software in a virtual desktop for each staff member able to access the database. This virtual desktop can then be locked for update at the same time as the database, and ICT support staff can install or upgrade the client from their location. This presumes that the staff member is prevented from updating the client software themselves. Alternatively, a web client can be used, if one exists, where the only desktop requirement is a web browser.
RBG Melbourne (Niels Klazenga):
“This should be remedied when the Specify web client comes out later this year. From what I can see from the sneak preview this web client will have all the features of the desktop client.”
Query Across Collections For institutions with multiple collections, it is critical that all collections be included in Specify searches.
Issue raised by: ANWC, RBG Melbourne.
RBG Melbourne (Niels Klazenga):
“We want this too! So far, the absence of this capability has prevented us to have more than one collection. I think this is somewhere on the road map.”
“OR” Queries Specify cannot do “OR” queries via its interface. Such queries are a way to limit the search results to those of interest.
Issue raised by: ANWC, RBG Melbourne.
RBG Melbourne (Niels Klazenga):
“We want this too. Looking at how the query tool in Specify is set up, I have the feeling it may be quite hard to achieve.”
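Until the interface supports it, one possible workaround, assuming read access to the underlying database, is to run the “OR” query in SQL directly. The sketch below is illustrative only: the `specimen` table, its columns and the `build_or_query` helper are invented and do not reflect Specify's schema, and SQLite stands in for MySQL so that the example is self-contained.

```python
import sqlite3
from typing import List, Sequence, Tuple

# Column names cannot be passed as SQL parameters, so they are checked against a whitelist.
ALLOWED_COLUMNS = {"genus", "state"}


def build_or_query(conditions: Sequence[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Build a parameterised query matching records that satisfy ANY of the conditions."""
    if not conditions:
        raise ValueError("at least one condition is required")
    clauses: List[str] = []
    params: List[str] = []
    for column, value in conditions:
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    sql = ("SELECT specimen_id FROM specimen WHERE "
           + " OR ".join(clauses) + " ORDER BY specimen_id")
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE specimen (specimen_id INTEGER PRIMARY KEY, genus TEXT, state TEXT)")
conn.executemany("INSERT INTO specimen VALUES (?, ?, ?)", [
    (1, "Acacia", "WA"), (2, "Eucalyptus", "VIC"),
    (3, "Banksia", "WA"), (4, "Banksia", "NSW"),
])

sql, params = build_or_query([("genus", "Acacia"), ("state", "VIC")])
matching_ids = [row[0] for row in conn.execute(sql, params)]
```

Here the query returns every specimen that is either an Acacia or from Victoria, which cannot be expressed as a single search when all criteria are combined with “AND”.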
Seedbank Support Specify needs to support the data collected by seedbanks, as these institutions often work closely with specimen collections and link their data to specimens. Some seedbanks have poorly performing databases with no link to the biodiversity network.
Issue raised by: Threatened Flora Seed Centre (WA).
RBG Melbourne (Niels Klazenga):
“It would be good to have a seedbank module in Specify that can deal with germination trials and all the other stuff seedbanks do. Some of it may already be possible. Have a look at the treatment events and conservator description and comments tables.”
Annotation Data There is apparently no support for attaching annotations to a specimen in Specify. In this sense, annotations are free-text notes associated with day-to-day curation of a collection, not a service associated with this data (see Annotation Service). An annotation can commonly be associated with a Collection Object, but also with other parts of the database, such as an Agent or a Taxon.
Annotation Service Once a collection is available to the public, it is common to receive email from collectors and others noting data errors. Within an institution, these notes are often appended to the database as an annotation (see Annotation Data). One source of this in future is ALA itself, as it currently makes an annotation feature available to site users.
A powerful addition to Specify would be a plug-in that was capable of receiving and processing annotations and presenting these to the database custodian within Specify. It would close the loop on corrections made from the cloud and enhance the efficiency of data cleaning and annotating tasks.
Rod Spears has noted that “Specify can easily interact with a set of Web Services to 'pull' annotations. We are currently working with the 'Filtered Push' grant in the States which will communicate annotations. Our Specify 7 grant proposal had such a feature, but we were not funded in such a way to implement the new features in the grant.”
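The “pull” model could look something like the following sketch, in which annotations already retrieved from a web service are grouped by specimen for the custodian to review. Everything here is hypothetical: the JSON payload format, the field names and the `queue_annotations` function are assumptions for illustration and do not describe the ALA annotation feature, Filtered Push or any Specify API. The network call itself is omitted so that the example is self-contained.

```python
import json
from collections import defaultdict
from typing import Dict, List


def queue_annotations(payload: str) -> Dict[str, List[str]]:
    """Group incoming annotation comments by the catalogue number they refer to."""
    queue: Dict[str, List[str]] = defaultdict(list)
    for item in json.loads(payload):
        queue[item["catalogue_number"]].append(item["comment"])
    return dict(queue)


# Hypothetical payload, as it might be returned by an annotation web service.
SAMPLE_PAYLOAD = json.dumps([
    {"catalogue_number": "A001", "comment": "Collector name misspelt"},
    {"catalogue_number": "B002", "comment": "Latitude has wrong sign"},
    {"catalogue_number": "A001", "comment": "Collection date should be 1987"},
])

pending = queue_annotations(SAMPLE_PAYLOAD)
```

A plug-in built along these lines would present each group to the custodian, who could accept or reject the suggested correction before anything is written to the database.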