
Technical Annexe for Project 3:

Developing online verification of biodiversity records

Final version 2-4 at 3rd February 2015

Services to develop and pilot on-line validation and verification of biodiversity records (Project 3)

A project for Natural England with funding from Defra. Project led by the Biological Records Centre at the Centre for Ecology & Hydrology.

Project partners: Botanical Society of the British Isles, Butterfly Conservation, Marine Biological Association, Thames Valley Environmental Records Centre, Worcestershire Biological Records Centre.

Report by Martin C. Harvey and David B. Roy, Biological Records Centre


Summary

The objective of this project was to explore the practical issues around providing local and national data and experts to validate and verify records in a single community warehouse, in order to inform the decisions that need to be made if a transition is to be made from the current dispersed database and pioneering on-line systems, to a simplified dataflow system and a more unified approach to recording, verification and use.

We worked with five main project partners (Botanical Society of the British Isles, Butterfly Conservation, Marine Biological Association, Thames Valley Environmental Records Centre, Worcestershire Biological Records Centre) to explore the issues arising from using an online, central data warehouse (CDW) model for verifying biological records. We also consulted more widely via an online questionnaire with over 100 representatives from local records centres (LRCs), national recording schemes and societies (NSS) and wildlife recording projects.

The work was done in two phases, the first in February–March 2013, the second in April 2013–March 2014. With the help of project partners, we imported new datasets into the CDW, recruited verifiers to make use of the online verification tools, provided an introduction to the project and training in how to use the online tools via webinars and documentation, and consulted with partners (phase 1) and with both partners and the wider biological recording community (phase 2) via questionnaires and phone interviews, in order to gain the views of stakeholders across the community.

About 100,000 new records were imported into the CDW specifically for this project, combining with other incoming data from a variety of sources. At February 2014 the CDW held approx. 840,000 species records, of which 17% (over 140,000) had been formally verified.

Results from analysis of the data in the CDW, and from responses to the consultation process, are summarised and presented under eight headings:
o 3.1 Analysis of progress with online data verification
o 3.2 Analysis of benefits to users
o 3.3 Analysis of verification and validation efficiency
o 3.4 Analysis of use of NBN verification tools
o 3.5 Analysis of communications between verifiers and recorders
o 3.6 Analysis of data sharing efficiency
o 3.7 Analysis of views on sharing unverified data
o 3.8 Analysis of verifiers' support requirements

Further detail from the consultation, and data for the supporting metrics, is provided in appendix 1 and appendix 2 respectively.

The responses highlighted the huge amount of variety within the schemes, centres and individuals that make up the ‘biological recording community’. Consequently there are large variations in the availability of time, expertise and software resources, and in the degree of priority given to exchanging information with others outside the scheme itself. Moving towards a more centralised way of capturing, verifying and sharing biological records can help to simplify and streamline the process and make more data available to more people more quickly, especially at national level. But such a move also has the potential to disrupt existing systems and relationships that in many areas are working well in the context for which they were developed.

Online systems for biological recording are here to stay, and given trends in the use of online systems and mobile applications in many parts of life it seems very likely that they will have an increasingly important role to play in the gathering of biodiversity data.

Abbreviations used

BC: Butterfly Conservation
BRC: the Biological Records Centre, based at Wallingford, Oxfordshire
BSBI: Botanical Society of Britain and Ireland
CDW: central data warehouse (the concept of a single data store that can receive records from multiple sources and make them available to verifiers)
LRC: Local environmental records centre
MBA: Marine Biological Association
NBN: National Biodiversity Network
NBNG: National Biodiversity Network Gateway
NMRS: National Moth Recording Scheme
NSS: National recording schemes and societies
TVERC: Thames Valley Environmental Records Centre
WBRC: Worcestershire Biological Records Centre

1. Introduction and background

The United Kingdom has the most intensively studied biota of any region in the world, as illustrated by more than 90 million records shared via the National Biodiversity Network (NBN) Gateway. Verification of data is undertaken using a variety of approaches, many of which remain undocumented.

There is a healthy and possibly growing interest in wildlife recording in this country, supported by a network of enthusiastic and expert recording schemes and records centres. Recent developments such as an increased interest in citizen science, and public events such as bioblitzes, have helped increase the supply of wildlife records. And although online recording systems currently play a relatively minor role in the collection and verification of data for most species groups, this source is predicted to grow rapidly and has proved to be highly effective (e.g. as demonstrated by BirdTrack and online recording of ladybirds) when actively supported and promoted.

However, these developments have also added to the challenge of ensuring that the data can be verified. The increase in records has not always been matched by an increase in verifiers, so that verifiers may be a recording community's scarcest resource. Current approaches seek to use verifiers as carefully and effectively as possible, but there can still be duplication of effort and/or gaps in local coverage, leading to data quality issues or delays in data flowing through schemes and centres and becoming more widely accessible.

Improvements have been made to address the problems of data quality with the development of the NBN Record Cleaner and similar tools, but human interaction is typically required to make final verification decisions, and to maintain communications with the recorders who provide the records. In recent years the recording community have responded to the challenge by developing a range of online recording systems with built-in verification tools to allow experts to verify records remotely. These systems offer considerable advantages and can build strong networks between recorders and verifiers linked to location (in the case of LRCs) or taxonomic interest (in the case of NSS). However, despite these advances, verification procedures exist on different systems and do not offer standardised means of verification or of effectively sharing the decisions that are reached. A verifier involved with both an NSS and an LRC could potentially be asked to look at two systems to verify the same record. Alternatively, a record entered locally in a system without a verifier may remain unverified, and either be passed to the NBN unverified or stay 'trapped' on the local system.

1.1 A central data warehouse approach

An alternative model would allow national and local experts to have access to the same tools in the same place to verify the same data. Once verified, this data could then be more quickly and conveniently shared for use. This process would avoid the inefficiencies of verifiers having to access multiple systems. It also has the potential to reduce the overheads of local data management and speed up data flows.
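As a purely illustrative sketch of the concept (not the project's Indicia/iRecord implementation, and with all names hypothetical), the central data warehouse idea amounts to several front-end systems submitting records into one shared store where verifiers can later review them:

```python
# Minimal sketch of the CDW idea: multiple front-end recording systems feed
# records into one shared store, where they await verification. Illustrative only.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Record:
    source: str          # the front-end system that supplied the record
    taxon: str
    grid_ref: str
    date: str
    recorder: str
    status: str = "pending"   # pending / accepted / queried / rejected

@dataclass
class Warehouse:
    records: List[Record] = field(default_factory=list)

    def ingest(self, record: Record) -> None:
        """Accept a record from any front-end system."""
        self.records.append(record)

    def awaiting_verification(self) -> List[Record]:
        return [r for r in self.records if r.status == "pending"]

cdw = Warehouse()
cdw.ingest(Record("iRecord", "Bombylius major", "SU7373", "2013-04-21", "A. Recorder"))
cdw.ingest(Record("NatureSpot", "Pyronia tithonus", "SK5804", "2013-07-02", "B. Recorder"))
print(len(cdw.awaiting_verification()))  # both records now sit in one place for verifiers
```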

A variety of effective online systems already exist for gathering biological records. Rather than try to impose a top-down solution on this variety, it may be preferable to look to a model that allows numerous online 'front end' systems to feed into a single verification warehouse. The use of a more centralised approach has a number of potential benefits for the verification process, including:

1. More consistent use of automated checking rules (e.g. NBN Record Cleaner rules) to aid verification
2. More efficient use of resources for developing systems, i.e. the community of verifiers all benefit from enhancement to verification tools
3. The potential for distributing the verification role to a wider set of experts (e.g. covering subsets of taxonomic groups and defined geographic areas)
4. Greater transparency in verification decisions applied to biological records
5. More efficient use of experts' time by providing access to multiple sources of data through a single verification system
6. Ease of communication between verifiers and recorders
7. Ease of identification of gaps and bottlenecks in the verification process

However, a move to online systems and a centralised approach to verification also brings challenges, such as:

1. The potential to over-burden verifiers with data of low quality
2. Recorders having an unrealistic expectation of receiving a rapid response to their submissions
3. Software tools being inefficient or not user-friendly
4. Lack of clarity on data ownership and responsibilities

1.2 Project aims

The stated objective of this project was to explore the practical issues around providing local and national data and experts to validate and verify records in a single community warehouse. The information gathered will inform the decisions that need to be made if a transition is to be made from the current dispersed database and pioneering on-line systems, to a simplified dataflow system and a more unified approach to recording, verification and use.

The specified project aims were to:
- Test the single community warehouse approach using the existing BRC Indicia warehouse, and the verification tools as implemented through the iRecord website (http://www.brc.ac.uk/irecord)
- Demonstrate the best use of existing automated verification rules to reduce manual verification time
- Demonstrate how expert verifiers can work effectively and efficiently in tandem with automated verification technology
- Demonstrate how to build verifier networks between local and national bodies in a single warehouse
- Determine the advantages to local and national recorders of sharing data for verification and use
- Provide manual verification models that optimise the use of the skills and capacity of local and national organisations to verify data
- Identify common bottlenecks for unverified data and solutions to speed up data flow
- Demonstrate working local/national verification models and highlight changes required to existing systems to embed the model

A key part of the proposed approach was the intention to ensure better data validation and verification within a single system in order to increase confidence in biodiversity records.

During the course of the project the methods were refined in consultation with a steering group brought together for the project by Natural England, and representing: ALERC (Tom Hunt), JNCC (Deborah Procter, Steve Wilkinson), NBN (Geoff Johnson), and Natural England (Jon Curson, Oliver Grafton, Keith Porter, Ian Taylor).

2. What we did

The approach adopted in this project was twofold: 1. to work with some of the key organisations involved in recording and verifying biological recording data (our main partners consisted of three national recording schemes (NSS) and two local environmental records centres (LRCs)); and 2. to use an online questionnaire to gather responses from a wider range of individuals and organisations involved with verification of biological records.

The formal partners in the project were: Botanical Society of Britain and Ireland, Butterfly Conservation, Marine Biological Association, Thames Valley Environmental Records Centre, Worcestershire Biological Records Centre.

2.1 Defining the task

For the purpose of this trial the NBN's definitions (James 2011) of verification and validation have been followed:
- Data verification: ensuring the accuracy of the identification of the things being recorded.
- Data validation: carrying out standardised, often automated checks on the "completeness", accuracy of transmission and validity of the content of a record.

The aims and guidance specified for this project focused mostly on verification, and this is what most of our consultation focused on. Validation is in many ways a more straightforward issue, and many aspects of validation (such as checking that dates and grid references are correctly formatted) are automated within existing online and offline biological recording systems. However, during the course of the project it became clear that there is a range of opinion as to what constitutes a “verified” record, and also some of the responses received took in elements of both verification and validation.
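Validation checks of this kind are straightforward to automate. As a hedged illustration (not the code used by iRecord or the NBN Record Cleaner), formatting checks on dates and Ordnance Survey grid references might look like this:

```python
# Illustrative validation checks: confirm a record's date and OS grid reference
# are well formed before the record is passed on for expert verification.
import re
from datetime import date

GRID_REF = re.compile(r"^[A-Z]{2}\d{2}(\d{2}(\d{2}(\d{2})?)?)?$")  # e.g. SU73, SU7373, SU737737

def valid_date(value: str) -> bool:
    """True if value is a real calendar date in ISO format and not in the future."""
    try:
        return date.fromisoformat(value) <= date.today()
    except ValueError:
        return False          # catches impossible dates such as 2013-02-31

def valid_grid_ref(value: str) -> bool:
    """True if value looks like a British grid reference with an even number of digits."""
    return bool(GRID_REF.match(value.upper().replace(" ", "")))

print(valid_date("2013-02-31"))   # False: the '31st February' example from the text
print(valid_grid_ref("TL123"))    # False: odd number of digits, mis-formed reference
print(valid_grid_ref("TL1234"))   # True
```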

2.2 Data and methods for phase 1

For phase 1 of the project (up to the end of March 2013) the following new datasets were loaded into iRecord, in addition to the data being captured by iRecord and other ongoing projects linked to the central data warehouse:
- Moths, UK Butterfly Monitoring Scheme – 32,212 records
- Moths, iSpot – 14,511 records
- Plants, iSpot – 29,514 records
- Marine, iSpot – 2,711 records
- Total – 78,948 records

During phase 1 the three project partners representing NSS recruited people to take part in initial discussions and provide feedback via a detailed questionnaire. Thirteen people, each of whom plays a role in data verification for their respective schemes or organisations, agreed to take part. Each registered on iRecord and was set up as a verifier for the appropriate combination of taxon group and geographical area.


Eleven of these people were able to join in one of three webinar sessions that were set up to introduce the project. Each session was aimed at one of the three organisational groups: Butterfly Conservation (county moth recorders); Botanical Society of the British Isles (vice-county recorders); Marine Biological Association (data verifiers). Two participants (one each from BC and BSBI) who were unable to join the webinar sessions were provided with a screencast of the verification tools as implemented in iRecord, and were able to complete the questionnaire via that route. A fourteenth response was provided by one (Thames Valley Environmental Records Centre, TVERC) of the LRCs who also participated in phase 2 of the project.

Responses to the phase 1 questionnaire were summarised and reported to Natural England at the end of phase 1 (Developing online verification of biodiversity records – summary of main points from Phase 1 consultation; BRC report to Natural England, April 2013).

2.3 Data and methods for phase 2

During the second phase, April to December 2013, a similar process was undertaken with verifiers recruited through the two LRC partners, and additional datasets were uploaded to iRecord from the two LRCs involved (see section 3.1.1). Telephone interviews were conducted with data managers from the main partners. Participants during phase 2 were:

Person | Organisation | Involvement
Les Hill | BC | Data managers telephone interview
Bob Ellis | BSBI | Data managers telephone interview
Tom Humphrey | BSBI | Data managers telephone interview
Stuart Roberts | BWARS | Data managers telephone interview
Becky Seeley | MBA | Data managers telephone interview
Chris Wood | MBA | Email correspondence
Esther Hughes | MBA | Data managers telephone interview
John Bishop | MBA | Email correspondence
Jon Parr | MBA | Data managers telephone interview
Camilla Burrow | TVERC | Webinar
Graham Hawker | TVERC | Webinar - Data managers telephone interview
Sarah Muddell | TVERC: Berks and South Bucks Bat Group | Email correspondence, online questionnaire
Matt Smith | TVERC: Berks ARG | Webinar - Data managers telephone interview
Mike Turton | TVERC: Berks BDS | Webinar - telephone interview
Geoff Trevis | WBRC | Webinar - telephone interview
Mick Blythe | WBRC | Webinar
Simon Wood | WBRC | Webinar - Data managers telephone interview

In addition, an online questionnaire was set up to collect responses from a wider community of people involved with verifying wildlife records from a range of backgrounds. This questionnaire was promoted via the LRCs, NSS and NBN, as well as via social media such as Twitter, and collected 104 responses (see Appendix 1 for a detailed summary of the responses).

In conjunction with the project steering group a series of metrics was developed to provide baseline data on the current amount and speed of online record verification and dataflow through to NBN. Data was collected by analysis of the records within the CDW used by iRecord, and by analysis of data arriving at the NBN Gateway, in order to produce baseline performance metrics for the CDW system (see Appendix 2 for a detailed summary of the data for these metrics).

3. Results

This project produced a large amount of feedback from the project partners and wider consultees. Results are presented here, grouped under subject headings to draw out the main issues. As well as the main project partners listed above, the online questionnaire received responses from 104 people:


Category | Number of responses | Percentage
LRC (staff or volunteer) | 25 | 24%
Local Natural History Society | 2 | 2%
NSS – national representative | 23 | 22%
NSS – local representative | 51 | 49%
Other | 2 | 2%
Recording project | 1 | 1%

Of these, 23% described themselves as data managers, 25% as verifiers, and 52% as both verifiers and data managers. Just under half (47) of the responders said that they currently carried out a verification role via an online system:

Online system | Number acting as verifiers
iRecord or other Indicia-based system | 22
iSpot | 12
BSBI Distribution Database | 10
Other | 5
RODIS | 5
Living Record | 4
CATE2 (Association of British Fungus Groups) | 2
BirdTrack | 1

Points to be borne in mind for the online questionnaire in particular, and more generally throughout this project:
- The people who took part were either selected by the partner organisations, or were self-selected for the online questionnaire, and are not a statistically representative sample of the wider pool of verifiers and data managers who engage with biological recording on a professional or voluntary basis.
- Publicity for this project was largely targeted at NSS and LRCs; independent local recording groups and projects were not targeted, although a few did contribute to the online questionnaire.
- Not all participants provided answers to all questions.

3.1 Analysis of progress with online data verification

3.1.1 Verification within the existing iRecord/CDW model

The iRecord website and its associated verification tools first became available for use in summer 2012. At February 2014 around 840,000 records had been added to the underlying CDW; these records have come from people adding data direct to iRecord itself, or to one of the other front-end websites and apps that use the CDW, or have been uploaded to the CDW in bulk from other databases, in order to allow the data to be verified. Records have been contributed by 43 separate recording projects (in addition to direct data entry to iRecord itself; see full list in Appendix 2). Projects contributing at least 10,000 records were:

Source | Total no. of species | Total no. of occurrences
iRecord | 12368 | 273659
UK Butterfly Monitoring Scheme | 420 | 266053
Plantlife Wildflower Count | 1207 | 59109
National Moth Night | 1186 | 45073
NatureSpot | 4180 | 40343
Axis2Poll (pollinator monitoring as part of the Welsh Glastir Monitoring and Evaluation Programme) | 216 | 21018
Yorkshire Naturalists Union | 3322 | 16594
Mammal Society | 69 | 12793

Since summer 2012 verifiers for a range of recording schemes have been involved with verifying records via the iRecord system. Coverage is by no means complete for all taxon groups and geographical areas, but (by February 2014) 156 “verification roles” had been set up within iRecord (the number of people verifying will be slightly lower than this, as some people have more than one verification role). Of the 156 roles in place:


- 110 are for a taxon group in a local region (usually a county, often for county recorders who are contributing to NSS or LRC verification)
- 32 are for a taxon group nationally (including verifiers for participating NSS)
- 12 are project-specific (e.g. for PlantTracker, NatureSpot, RISC).
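To make the idea of a "verification role" concrete, the sketch below (a hypothetical structure, not iRecord's actual data model) shows how a role can be scoped by taxon group and, optionally, by region or project, and how an incoming record could be routed to the roles entitled to verify it:

```python
# Illustrative model of verification roles scoped by taxon group plus an optional
# region or project, and of matching incoming records to the relevant roles.
from dataclasses import dataclass
from typing import Optional, List

@dataclass
class VerificationRole:
    verifier: str
    taxon_group: str
    region: Optional[str] = None    # None means national coverage
    project: Optional[str] = None   # set only for project-specific roles

@dataclass
class Record:
    taxon_group: str
    region: str
    project: Optional[str] = None

def roles_for(record: Record, roles: List[VerificationRole]) -> List[VerificationRole]:
    """Return every role whose scope covers the record."""
    matches = []
    for role in roles:
        if role.taxon_group != record.taxon_group:
            continue
        if role.region is not None and role.region != record.region:
            continue
        if role.project is not None and role.project != record.project:
            continue
        matches.append(role)
    return matches

roles = [
    VerificationRole("County moth recorder", "moths", region="Berkshire"),
    VerificationRole("National scheme organiser", "soldierflies"),
]
rec = Record(taxon_group="moths", region="Berkshire")
print([r.verifier for r in roles_for(rec, roles)])  # ['County moth recorder']
```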

Of the records in the CDW at February 2014, 17% (over 140,000) had been formally verified. The proportion of verified records varies widely for different taxon groups and geographical areas, depending on the availability of verifiers. For two sets of data, 80% of the relevant records have been verified already (this is for 12,467 records submitted for the National Mammal Atlas Project, and for 6,446 records via the range of apps developed by the Nature Locator project), while at the other extreme only 1% of 60,816 records for the Wild Flower Counts project had been verified at that time.

For the current trial, additional datasets were uploaded during 2013, from the two LRCs who are partners in the trial, and from The Open University’s iSpot project. Progress with verification at February 2014 was as follows:

Dataset source | No. of records in warehouse | Verified | Queried | Rejected | Pending
TVERC | 3304 (mixed taxa) | 1714 (52%) | 0 (0%) | 2 (<1%) | 1588 (48%)
WBRC | 48398 (insects) | 15149 (31%) | 15 (<1%) | 1 (<1%) | 33233 (69%)
iSpot | 46736 (moths, plants, mixed marine taxa) | 1927 (4.1%) | 81 (0.1%) | 66 (0.1%) | 44662 (96%)

The two LRC datasets were uploaded in autumn 2013, and the relatively high proportion of records that have been verified in a short time is encouraging. The relatively low proportion of verifications for the iSpot data (which was uploaded in spring 2013) is likely to be the result of three factors: there are relatively few county moth and plant recorders actively verifying on iSpot so far (and those that have been involved were not specifically asked to complete verification of any datasets for this trial); the iSpot data may be seen by some as difficult to work with, because the full names of the original recorders are not always available (many iSpot users give online alias names); and iSpot is intended to support relatively inexperienced recorders in learning how to identify wildlife rather than being a recording scheme as such, so the ad hoc data that results may be perceived as of lower value by some recording schemes.

3.1.2 The central data warehouse as a concept

We asked "Do you agree that in principle it would be beneficial to have as many 'front-end' wildlife recording websites and apps as possible feeding their data in to a central data warehouse for verification in one place?" An opinion was expressed by 93 responders:

Response | All responders (n=93) | LRC responders (n=23) | NSS responders (n=65)
Yes | 35% | 26% | 38%
Yes with reservations | 25% | 22% | 26%
No | 31% | 48% | 26%
Unsure | 9% | 4% | 9%

A majority thought that the central data warehouse model was a good idea in principle: "In principle a central data warehouse is worth striving for, and has the potential to gather a greater proportion of new records into one place where they can be dealt with more quickly. Ease of contact with original recorder is an important part of the process."

However, many reservations were expressed. There was less support for this model among the LRCs that responded than among NSS. Among the comments for this question (see Appendix 1) there were some very strong opinions expressed that the CDW model has major shortcomings and would cause more problems than it would solve:


“We do not believe in a central national data warehouse as this undermines all the work we are trying to do locally and our relationships with our local recorders. Also it is not acceptable to us to have our data store on somebody else's server. Tools such as iRecord are undermining local record centres initiatives.”

3.2 Analysis of benefits to users

3.2.1 New data made available

At February 2014 the CDW contained c. 840,000 records that were available to the recording schemes and centres that wished to engage with them. The records represent 135 separate taxon groups (these are "informal taxon groups" within the NHM's UK Species Inventory – some taxon groups are duplicated under different names within specific surveys within the CDW, so the true total is a little lower). At that date, 17% of the total records had been marked as verified, 0.4% had been rejected and 0.1% queried.

The majority of records in the CDW are ‘new’ records that have been entered via one of the front-end recording forms that feed in to the CDW, and had not previously been contributed to recording schemes.

3.2.2 Case study: iRecord data for the Soldierflies and Allies Recording Scheme

Soldierflies and allies consist of 154 species in 11 related families of Diptera, formerly known as the "Larger Brachycera" (now sometimes called the Lower Brachycera). The scheme has been running for many years, but has not been promoted actively in the last few years, and the number of incoming records has fallen from a peak of almost 4,000 in 2005. Scheme data was uploaded to the NBN in the 1990s but has not been updated since.

Since summer 2012 the scheme has been making use of iRecord to collect relevant records. iRecord is not yet widely known among the wider community of entomologists, but even so over 2,500 records for the scheme had accumulated by February 2014, covering 86 species. The largest element of these came via a dataset uploaded to iRecord from the Worcestershire Biological Records Centre (as part of the Defra/NE verification trial), but there are also substantial contributions from individual recorders via iRecord itself, from the Yorkshire Naturalists' Union online recording, and from the Garden Bioblitz project.

Among the data arriving via iRecord there are records for 15 Nationally Scarce and 9 Red Data Book species, and there are 16 new vice-county records and 434 new 10km-square records.

Of the 2,540 records on iRecord, 1,639 (65%) were flagged by the 'record cleaner' automated checks. On inspection, only 46 (1.8%) of records were judged to need querying or rejecting. The high proportion of acceptable records 'failing' the record cleaner checks results from a mix of the under-recording of this group of flies (hence lots of scope for finding common species in new places) and the high level of sensitivity of the iRecord system to the ID difficulty grades – everything except Grade 1, the easiest to identify, gets flagged as needing checking.

As a cross-check, data from iRecord has been ranked in order of the species most frequently recorded, and compared with the ordering in the main scheme database. There is a lot of similarity between the rankings, suggesting that the contributors to iRecord are not adding any new biases to the range of species being recorded.

3.2.3 Simplifying recording

People recording wildlife may be asked to supply their records to multiple data collators, including an NSS, an LRC and maybe the local site manager or other interested parties as well. In principle one of the major benefits of a CDW model would be to streamline things so that a recorder only has to submit data in one place for it to become available to all who need it, and this was highlighted by some responders as a very positive thing. However, people submit data for a wide variety of reasons, and they may wish to have direct contact with the organisations who are collating or using their records, e.g. to receive feedback, newsletters, and join in social activity. Within a CDW model the need for direct communication routes and clear recognition of contributing NSS and LRCs (and other organisations) needs to be recognised (see also section 3.5).

3.3 Analysis of verification and validation efficiency

LRCs and NSS all have systems in place, and there is a lot of common ground in the types of check that they carry out, especially for record validation. There is probably a good deal of similarity in the type of verification checks carried out also, but this is not well-documented and the actual procedures used (in terms of software and workflows) vary widely. In general, the larger schemes (i.e. those covering the 'popular' groups where lots of records are contributed, or those that cover a large number of taxa) make more use of automated checks via software, while for smaller schemes with less incoming data there is less to be gained from applying automated checks, and it may be entirely feasible for an expert to check incoming records by eye. And of course the expertise required for verification is much more widely available for 'popular' taxa than for the more specialist groups.

Another split is apparent between local schemes and national schemes. Although both emphasise the need for both verification and validation, there is a tendency for NSS to focus more strongly on ensuring that the verification of identifications is carried out (especially for the more specialist groups), while LRCs may focus more strongly on ensuring that records are valid for their purposes and at a more precise spatial scale, e.g. in some cases checking and editing site names and grid references to match local usage and ensure that data is as informative as possible within a local planning context.

3.3.1 Time taken to verify records (from submission to verification) online

For data held in the iRecord CDW, information is available on the time period between records arriving in the CDW and subsequently getting verified. There is wide variation in how quickly this happens, and of course it critically depends on the recruitment and availability of volunteer verifiers. However, all datasets included a high proportion of records that had been verified within 0–3 days.

Records in iRecord warehouse* | Number verified to date | % verified | Range for verification: days (min–max) | Days taken for verification: mean | Days taken for verification: median | Days taken for verification: mode
496204 | 111929 | 23% | 0–1346 | 82 | 8 | 0

* This figure is for records entered or imported directly into iRecord, and is thus smaller than the 840,000 total given above for all data in the CDW.
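The summary statistics in this table are straightforward to derive from stored submission and verification dates. A minimal sketch (using made-up example dates, not project data) is:

```python
# Illustrative calculation of the verification-delay metrics reported above:
# days between a record arriving in the warehouse and being verified.
from datetime import date
from statistics import mean, median, mode

submitted_and_verified = [
    (date(2013, 9, 1), date(2013, 9, 1)),    # verified on the day of submission
    (date(2013, 9, 3), date(2013, 9, 3)),
    (date(2013, 9, 2), date(2013, 9, 10)),
    (date(2013, 6, 15), date(2014, 2, 1)),
]

delays = [(verified - submitted).days for submitted, verified in submitted_and_verified]
print(f"mean {mean(delays):.1f}, median {median(delays)}, mode {mode(delays)}, "
      f"range {min(delays)}-{max(delays)} days")
```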

We were unable to find data on how long it typically takes for a record to be verified once submitted offline to an LRC or NSS, not least because the answer to this varies enormously between different taxa and different systems.

3.3.2 Keeping track of verification decisions

An important part of increasing verification efficiency is the ability to store verification decisions once they have been made, so that data users can see the verification status of any data they require, and verifiers are not asked to repeatedly review the same data within different systems. The majority (64%) of consultation responders stated that their database systems did enable them to explicitly store verification decisions for each record, e.g. by flagging the record as verified/unverified/not yet examined. However, among the NSS there is a sizeable minority that is not able to do this (due to software limitations in at least some cases). A few responders stated that although they did not store verification decisions explicitly, it was implicit that all the data in their dataset had been verified, as records would not have been added to their database if unverified. The danger with this approach is that if data is disseminated via the NBN Gateway or shared elsewhere the verification status may not be apparent to other data users.

See also the tracking of unique identifiers for biological records in section 3.6.5.
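A hedged sketch of what "explicitly storing the verification decision" can mean at the data level (a hypothetical structure, not any partner's actual schema):

```python
# Illustrative record structure in which the verification decision is stored
# explicitly, so downstream users can always see a record's status.
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class VerificationStatus(Enum):
    NOT_YET_EXAMINED = "not yet examined"
    VERIFIED = "verified"
    QUERIED = "queried"
    REJECTED = "rejected"

@dataclass
class RecordVerification:
    record_id: str
    status: VerificationStatus = VerificationStatus.NOT_YET_EXAMINED
    verifier: Optional[str] = None
    decision_date: Optional[str] = None

rec = RecordVerification("BRC-0001")
rec.status, rec.verifier, rec.decision_date = VerificationStatus.VERIFIED, "County recorder", "2014-02-10"
# When the record is shared onwards (e.g. via the NBN Gateway) the status travels
# with it, rather than being implicit in the record's presence in a dataset.
print(rec.status.value)
```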

3.3.3 Sharing verification among multiple verifiers

In principle the online approach could result in multiple verifiers taking an interest in the same set of data, and in some cases the people concerned are likely to be unknown to each other, especially for the more popular groups where verification expertise is relatively widespread. This is happening to some extent on iRecord (e.g. there has been some sharing of verification for some of the Hymenoptera and Diptera records), and is likely to happen more frequently as more people engage with iRecord.


Just over half (52%) of all verifiers responding to this project were part of a team, but unsurprisingly this is slightly more prevalent among LRCs (who cover multiple taxon groups) than among NSS. The nature of shared verification is variable; the most common pattern for NSS is for two people to work jointly on a particular scheme, but in some cases the answers referred to dividing up responsibility by taxon groups within the recording scheme or centre as a whole; or by working with taxon experts on particularly difficult taxa; or by consulting others on an ad hoc basis as the need arises.

When asked how their role could be made easier, only two responders requested more people to share the verification role. In an online CDW model it is likely that there would be an increase in role-sharing, particularly where local expertise and national scheme organisers are likely to take an interest in the same records from different perspectives. Such a model is well-established for the larger NSS that cover the more ‘popular’ taxon groups, where it is common for a network of county recorders to divide up verification on a geographical basis while referring especially difficult taxa to one or more national authorities. Marine recording also has well-established procedures for sharing verification, most often dividing data up on a taxonomic basis. For smaller, more specialist schemes sharing of verification is less common, for the obvious reason that taxonomic skills in these groups are less widespread and the amount of data that needs verifying is smaller.

Where two or more verifiers disagree over the species determination for a record we asked how this should be dealt with; most responders agreed that in such a case both verifications should be stored and one shown as the 'accepted verification'. A range of suggestions for how to reach the 'accepted verification' were given, with no agreement on whether national-level verifiers should automatically have precedence over local-level verifiers. But there was agreement that communication between verifiers to achieve consensus is the preferred outcome, and if conflicting verification decisions have been recorded then the record should either be treated as unverified, or should be treated as a verified record at a higher agreed taxonomic level (e.g. genus or family): "If two verifiers disagree and neither is willing to withdraw their decision and both have examined all available evidence, the record should be changed to whatever higher taxonomic level they do both agree on. The audit trail of the two possible determinations should be kept as part of the record as this could help target future recording effort, or in case further evidence becomes available in future. But for mapping and reporting purposes, the record should be at the higher taxonomic level."
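One way the approach described in that quote could be implemented is sketched below (an assumption about how it might work, not a documented iRecord feature): keep both determinations in an audit trail and report the record at the lowest taxonomic level on which the two verifiers agree.

```python
# Illustrative resolution of conflicting verifications: keep both determinations
# and report the record at the lowest shared level of the two taxonomic lineages.
def agreed_level(lineage_a, lineage_b):
    """Lineages run from broad to fine, e.g. family -> genus -> species."""
    agreed = None
    for a, b in zip(lineage_a, lineage_b):
        if a != b:
            break
        agreed = a
    return agreed

det_a = ["Syrphidae", "Eristalis", "Eristalis pertinax"]   # verifier 1
det_b = ["Syrphidae", "Eristalis", "Eristalis tenax"]      # verifier 2
audit_trail = [("verifier 1", det_a[-1]), ("verifier 2", det_b[-1])]   # both determinations kept
print(agreed_level(det_a, det_b))   # 'Eristalis': the record is reported at genus level
```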

3.3.4 Is verification a bottleneck in the processing of biological records?

From the responses received there was little suggestion that verification itself (i.e. a verifier checking a particular record) was a constraint on the speed or efficiency of processing records; constraints that were mentioned included the time it takes to deal with data arriving in multiple varied formats from different sources, and the problem of trying to verify records that were not well-evidenced.

Online systems do have the potential to reduce workload for verifiers, if there is widespread adoption of compatible systems. At the moment iRecord and the other online recording systems are adding to the number and complexity of places where verifiers are being asked to contribute. However, many responders made the point that such complexity has long been a problem, with data coming from many different sources in many different formats. The CDW concept does offer a way forward in simplifying this situation, if it can be made to work for as many data sources as possible.

When asked whether their scheme/centre received records that they do not have the resources or expertise to verify fully, 74% of LRC responders said yes, as a result of dealing with multiple taxon groups for which local verifiers might not always be available. In this case there may be a bottleneck effect if records that can't be verified at local level cannot be passed on to national level.

A smaller proportion of NSS verifiers said that lack of resources or expertise was a constraint, and those that did often referred to the impossibility of verifying records in the absence of sufficient evidence (photos, specimens) rather than the absence of resources or expertise as such. Verification is not seen as a bottleneck by active NSS, but there are of course taxon groups for which there is no active NSS, and consequently no-one taking responsibility for verification in these groups. In such cases there is another form of bottleneck where data remains unverified and in many cases un-collated and unavailable; a CDW model could play a key role here in collating data for taxa that have no recording scheme, even if verification may not be immediately possible.

Other constraints to online verification via the CDW that were highlighted by consultees were:
- Reliable and fast online access is not yet a reality, especially in rural areas.
- Knowledge of the local area is important for verification, and validation of location names is important in some contexts (especially for LRC use); this is a tricky area, in that location naming conventions may differ between LRCs and NSS, leading to differences in the approach to validating and editing location names and grid references.
- LRCs may wish to edit data for locational consistency and accuracy, but only taxonomic verifiers are able to do this at the moment – there needs to be a route for validation changes as well.

From phone interview with Graham Hawker (TVERC): The CDW could decrease efficiency because records get stuck if no verifier processes them – a lot of data can be self-verified by experienced recorders and does not need to wait for further verification within the CDW.

From phone interview with Matt Smith (Berks Reptile and Amphibian Group, and BWARS, via TVERC): There are more efficiency gains to be made for popular groups where identification expertise is widespread. The bigger potential gain is to have more data arriving in the warehouse and thus made available to the recording scheme in one location. Verification is not a bottleneck for more specialised groups where numbers of records are relatively low.

From phone interview with Simon Wood (WBRC): For WBRC, verification is not a major bottleneck at the moment and doesn't constrain data supply, although there are concerns that it may be difficult to replace the current generation of verifiers. Lack of active field recorders is more of an issue.

A central data warehouse approach does have the potential to increase efficiency by bringing together datasets from new projects into one place, rather than fragmenting them across multiple websites and apps.

3.4 Analysis of use of NBN verification tools

3.4.1 The NBN Record Cleaner rulesets

The NBN's Record Cleaner utility can be applied to data held in a variety of database and spreadsheet formats. It carries out validation checks against a set of built-in rules. This includes spotting bad dates (e.g. 31st February) or mis-formed spatial references (e.g. TL123), and checking the spelling of items like species and vice-county names.

In recent years the NBN has worked with a range of NSS to develop rulesets to help verify biological records (http://www.nbn.org.uk/Tools-Resources/Recording-Resources/NBN-Record-Cleaner.aspx). The rulesets are a set of automated checks that highlight records that are unusual in some way, and may need further investigation. These checks can show whether a record of a species is outside its currently known range, or occurring at a time of the year when it is not expected, and also categorise species by how easy or otherwise they are to identify, allowing the harder-to-identify species to be filtered out for more detailed scrutiny.

iRecord's data entry process disallows incorrect taxon names, bad dates, mis-formed grid references and the like, so the Record Cleaner validation checks don't need to be run as a separate operation. The verification rulesets are built in to the iRecord verification interface (for those taxa that currently have rulesets), allowing recorders to see if their records flag up any of the rules, and verifiers to filter records by whether or not they have been flagged by the rules.
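As a hedged illustration of what such rule-based checks do (this is not the actual NBN Record Cleaner rule format), a verification ruleset can be thought of as per-species parameters for known range, expected season and identification difficulty, applied to each incoming record:

```python
# Illustrative rule-based checks in the spirit of the NBN Record Cleaner rulesets:
# flag records outside a species' known range or season, or of difficult species.
rules = {
    "Bombylius discolor": {
        "known_10km_squares": {"SU77", "SU86", "ST57"},   # hypothetical known range
        "flight_months": range(3, 7),                     # March to June
        "id_difficulty": 2,                               # 1 = easiest ... 5 = hardest
    },
}

def check_record(taxon: str, square_10km: str, month: int) -> list:
    rule = rules.get(taxon)
    if rule is None:
        return ["no ruleset available for this taxon"]
    flags = []
    if square_10km not in rule["known_10km_squares"]:
        flags.append("outside previously known range")
    if month not in rule["flight_months"]:
        flags.append("outside expected time of year")
    if rule["id_difficulty"] > 1:
        flags.append("identification difficulty - check supporting evidence")
    return flags

print(check_record("Bombylius discolor", "TQ28", 9))
# ['outside previously known range', 'outside expected time of year',
#  'identification difficulty - check supporting evidence']
```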


3.4.2 Use of the rulesets by verifiers

About 25% of responders to our online consultation did make use of the Record Cleaner software (with the proportion higher for LRCs and lower for NSS). Some responders used other software and procedures for carrying out automated checks, most often for validation of dates and grid references.

We asked “Do online verification tools such as those implemented in iRecord enable you to carry out any checks that are difficult, or not possible, within the systems you use already?” Only 14 people provided responses that addressed this question, of which nine said that the online tools did allow them to carry out checks more easily than their current system and five said they did not.

Some criticisms were made regarding the limitations of using national rulesets, which by their nature cannot take into account local variation in phenology, distribution etc.; and with the fact that species distributions can change rapidly, reflecting either real changes or increased recorder effort, so that the rulesets can go out of date quite rapidly.

Having the verification rules built in to iRecord does make them more accessible to the verifiers using that system. To what extent it speeds up the verification process is harder to assess, and varies between different taxon groups. For verifiers who have to deal with large amounts of data from the more 'popular' groups such as birds and butterflies, the verification rules do help to pick out unlikely records from large datasets. For more specialist taxa where fewer records are contributed the rules can still be helpful but are less likely to lead to increased speed in verification, as verifiers would not normally find it hard to pick out unusual records from the smaller datasets that they deal with.

The iRecord implementation of the CDW model also exposes the rulesets to the people doing the recording, and there is potential for gains to be made here. One of the desired improvements that verifiers requested in the online consultation (see section 3.8.1) was the need for inexperienced recorders to be more aware of the evidence that they should ideally provide if records of unusual or difficult species are to be accepted. By flagging up at the point of data entry which species are regarded as 'difficult' or unusual, iRecord lets recorders know when they have added a record that might need further scrutiny (something that novice recorders are often unaware of) and gives them the chance to add supporting evidence at the time, with the potential to improve the collection of evidence for 'difficult' or rare species.
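A minimal sketch of how such a point-of-entry prompt could work (an illustration of the idea rather than iRecord's actual behaviour; the species grades shown are invented):

```python
# Illustrative point-of-entry prompt: if the species is graded as harder to
# identify, ask the recorder for supporting evidence before submission.
ID_DIFFICULTY = {"Episyrphus balteatus": 1, "Cheilosia albitarsis": 4}   # 1 easy ... 5 hard

def entry_prompt(taxon: str, has_photo: bool) -> str:
    grade = ID_DIFFICULTY.get(taxon, 3)   # assume mid difficulty if the species is unknown
    if grade > 1 and not has_photo:
        return (f"{taxon} is graded {grade}/5 for identification difficulty: "
                "please attach a photo or describe how it was identified.")
    return "Thank you, record submitted."

print(entry_prompt("Cheilosia albitarsis", has_photo=False))
```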

3.4.3 Application of the rulesets within the iRecord verification system

Of 802,459 records in the CDW, to date 625,918 (78%) are from taxon groups for which NBN Record Cleaner rules are available, indicating that the existing suite of rulesets does cover the bulk of the species that have been recorded via the CDW so far.

Data from iRecord has been verified by the national Soldierflies and Allies Recording Scheme (see section 3.2.2), which found that the record cleaner ruleset for this group flagged up 65% of 2,540 contributed records. On investigation, only 1.8% of the records were judged to need querying or rejecting. This apparent over-sensitivity of the ruleset in flagging up records is judged to be due to a combination of:
- under-recording of a relatively specialist group of insects, so that many records are flagged as being outside the previously known range;
- iRecord's flagging of all species that are scored between 2 and 5 for identification difficulty, within the 1–5 range that is used, so that the majority of species are flagged as needing checking on the basis of identification difficulty. This can be helpful when dealing with records from novice recorders, but flags up many records from experienced recorders that can be accepted without further query.

Even so, the rulesets are arguably flagging too many records at the moment. Improvements can be made both to the rulesets themselves and to the degree of control that iRecord allows, e.g. so that verifiers can choose which level of identification difficulty to filter by (at the moment all levels except the easiest are flagged).

Updating the verification rulesets can be time-consuming, especially for species-rich taxa, but automating some updates should be possible in future, e.g. to take into account accumulated verified records to adjust the geographical parameters used to test whether a record is outside the previously known range.
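One such automated update is sketched below, under the assumption that verified records carry a 10km grid square: the "known range" squares for each species are regenerated from the accumulated verified records, so that range rules keep pace with new data. This is an illustration of the idea, not an existing tool.

```python
# Illustrative automated ruleset update: rebuild the known-range squares for each
# species from accumulated verified records (all example data is invented).
from collections import defaultdict

verified_records = [
    ("Chloromyia formosa", "SP50"),
    ("Chloromyia formosa", "SO84"),
    ("Stratiomys potamida", "TL46"),
]

def rebuild_known_range(records):
    known = defaultdict(set)
    for taxon, square_10km in records:
        known[taxon].add(square_10km)
    return known

updated_rules = rebuild_known_range(verified_records)
print(sorted(updated_rules["Chloromyia formosa"]))   # ['SO84', 'SP50']
```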


3.5 Analysis of communications between verifiers and recorders

The iRecord verification tools allow verifiers to communicate with recorders via comments added to the records, and also by direct email communication. At February 2014 the following activity had occurred:

Source | Count
Verifications and comments | 297320
Mammal Society | 6005
NNSS Alert | 855
NNSS Het | 573
NNSS BSBI | 335
NNSS MBA | 258
NNSS Mammal | 188
NNSS ARC | 177
NNSS Auch | 133
NNSS Crayfish | 32
Herts new records | 10
Herts verified records | 9
NNSS Fera | 6
NNSS Conch | 5
NNSS Fera Zero | 3

Verifiers responding to this consultation pointed out that the iRecord system needs to do more to give recognition to the NSS or LRCs that the verifiers represent, in order to both acknowledge the support of those organisations and to make it clear who the verifiers are acting on behalf of, so that recorders contributing records know who it is they are interacting with. This could be achieved with more use of standard ‘signatures’ on verifiers’ emails and other messages, or through assigning ‘badges’ to verifiers, or better use of links to verifiers’ profile pages and/or to the websites of their associated organisations. Related to this, the idea of using standard templates for email contents to help explain to novices why their record is being queried was also suggested.

Such recognition ideally needs to be carried through to the NBN Gateway as well, so that users can see who has verified records on behalf of which organisation.

There is a risk that a move to a more centralised system for collating records could jeopardise the relationships that individual NSS and LRCs have with individual recorders, and also risks diminishing the social aspect of wildlife recording. It should be possible to guard against this in a CDW context through the effective use of communication tools (such as those that already exist within iRecord), and by giving contributing NSS and LRCs a higher profile. However, it also needs to be recognised that not all recorders are comfortable with online communications, and these are not a panacea for all.

Feedback and communication with recorders is usually regarded as critical to the success of a recording scheme – recorders are much more likely to engage if they can see that their contributions are being recognised. It has been argued that the next generation of wildlife recorders will grow up expecting an instantaneous response from recording schemes in a similar way to that allowed by social media websites. As yet the evidence is unclear on this, and indeed for verification purposes it can be argued that speed is less important than accuracy, with a focus on speed actually being detrimental to accuracy. However, there may be benefits from enabling recorders to receive some degree of automated feedback (“thanks for your record, which is the Xth we have for this species” or similar) if recorders wish.
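A small sketch of the kind of opt-in automated acknowledgement mentioned above (illustrative only; the wording and counts are invented):

```python
# Illustrative automated feedback to a recorder, counting existing records of the
# species in the warehouse; opt-in, so only recorders who want feedback receive it.
species_counts = {"Volucella zonaria": 846}   # hypothetical running totals

def acknowledgement(taxon: str, wants_feedback: bool) -> str:
    if not wants_feedback:
        return ""
    count = species_counts.get(taxon, 0) + 1
    species_counts[taxon] = count
    return f"Thanks for your record of {taxon} - this is record number {count} for this species."

print(acknowledgement("Volucella zonaria", wants_feedback=True))
```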

From phone interview with Graham Hawker (TVERC): Recorders and other data users need to know both the name of the verifier and the organisation for whom they are acting as verifier, both to give recognition to them and also to ensure that the recorder and anyone using the data can see who has been involved in the verification process.

From phone interview with MBA data managers: Where records are being delivered direct to the NBN Gateway from the CDW it would be beneficial to ensure that the dataset on NBN acknowledges the input of MBA-linked verifiers.


3.6 Analysis of data sharing efficiency

There is a shared aspiration across the biological recording community for high-quality biodiversity data to be available to those who can make use of it, and the will is there to make this happen. But as has been stressed throughout this report, priorities differ across the schemes, centres and individuals.

A particular challenge that has been highlighted by many (but by no means all) participants is that they are unhappy with some aspects of the National Biodiversity Network, and with moves towards making data flow more directly from recorders to the Gateway. Rightly or wrongly, the NBN is sometimes perceived as being remote to, and having different priorities from, particular schemes, centres and individuals.

From the perspective of individual recorders, some of the responses to this questionnaire (as well as anecdotal evidence from a wider range of recorders) indicate that there is frustration with the current situation where recorders may be asked to supply data (sometimes the same data) to different places (and sometimes in different formats) in order for it to be available to those who can make use of it.

Online systems and a central data warehouse approach offer a way forward here, and there are clearly efficiencies that could be gained from having one place to which all a recorder's data can be sent. But in order for this to be an attractive option recorders need to see that the data is reaching the places they wish it to (i.e. the central data warehouse mustn't be seen as a 'black hole' from which no response emerges).

3.6.1 Speed of data flow to the NBN Gateway

Data from the iRecord CDW currently (February 2014) forms a very small proportion of the total amount of data coming to the NBN Gateway from all sources. However, the relatively larger proportion of records in 2012 and especially 2013 does demonstrate the potential for CDW data to arrive on the Gateway more quickly than data from other sources, where it can take some time for data to be passed from the original recorder to a local or national scheme or centre, and then collated and passed on to NBN. (Routes for data arriving at NBN can be convoluted, with some LRCs sending data via NSS and vice versa.)

The two largest datasets that have been passed from iRecord to the Gateway both contain data that is likely to be of significance to data users (including government agencies) accessing the Gateway, and therefore a priority for efficient dissemination:
- records for the National Mammal Atlas Project (which will include data for a number of section 41 Species of Principal Importance);
- records from the Plant Tracker project to map non-native invasive plants.

Number of records on the Gateway, and proportion originating from the iRecord CDW, per year since 2005 (based on date of record, not date of data entry):

Year   All Gateway records   All iRecord records   iRecord as percentage of Gateway
2013         109453                 3848                    3.52%
2012        1071887                 5617                    0.52%
2011        2713117                 1195                    0.04%
2010        3558270                  472                    0.01%
2009        5801997                  192                    0.00%
2008        3693290                   45                    0.00%
2007        3308829                   33                    0.00%
2006        3804853                   33                    0.00%
2005        3479128                   46                    0.00%

Consultees for this project were asked if their contributing recorders liked to see their records appear on the NBN Gateway as quickly as possible. Many responders expressed an opinion but commented that it was just their opinion, while the "don't know" category got the highest number of responses. Only one scheme, Seasearch, responded with data based on a consultation they had carried out with their recorders (see Lightfoot 2013). This received responses from over 200 contributing recorders, and on the subject of data accessibility and use the responses showed that:
• 50% of Seasearch recorders feel it would be a waste of time filling in Seasearch forms if the data weren't made fully publicly available.
• 46% use the NBN Gateway to help with their Seasearch recording.
• 90% feel that all organisations who make decisions about the marine environment should have full and free access to their data (although when asked if data users should provide funding or other support if they use Seasearch data, 61% said yes for private sector users and 40% yes for public sector users).

MBA made the point that if records are to feed through to NBN more quickly, there needs to be a clear process for handling any subsequent changes required if further verification checks alter decisions previously made.

3.6.2 Constraints to disseminating data via the NBN Gateway

Many recording schemes and centres would legitimately argue that there are no major constraints from their perspective – they are already putting a lot of effort into making data available to meet their priorities and those of their partners. The perception of bottlenecks may come from data users who have differing priorities.

The obstacles to having data flow more quickly to the NBN Gateway vary. For funded schemes and centres there may be competing priorities from funders who do not require data to be available from the Gateway (e.g. where local authorities fund LRCs to make data available via local GIS); for voluntary schemes time is often a scarce resource and other needs may take precedence (e.g. atlas production, feedback to volunteers, maintaining the scheme database).

The relationship of NSS and LRCs to the NBN is outside the scope of this project, but unease over the flow of data from the CDW to the Gateway is clearly a concern for many responders and is affecting their ability and willingness to share data via that route, as well as raising issues that have not yet been resolved in the CDW model. NBN has always worked to bring together partners from across the biological recording community. Building on and extending this partnership is an ongoing task, and there is more to do to persuade all potential partners of the benefits to biological recording and conservation in sharing data more centrally.

Potential constraints

In the online questionnaire we suggested five potential constraints on data flowing to the NBN. These received very mixed responses, with little agreement among responders as to which, if any, were the most constraining:
1. Lack of field recording taking place
2. Lack of time for verification
3. Lack of experience / training among field recorders
4. Taxonomic uncertainties making verification decisions hard to reach
5. Records not yet digitised

A follow-up question on whether verification resources in particular were a constraint to data flowing to the NBN received more disagreement than agreement (see also section 3.3.4).

Views of the community

In the free-text comments associated with these questions we received a lot of comments relating to a wide range of issues to do with data flows in general and the NBN in particular. It is difficult to generalise from such a range of responses, but examples are given in the bullet points below (and a fuller summary is in Appendix 1). From the responses received, there were numerous comments to the effect that verification is not the main bottleneck for data flowing to NBN. Among our responders the bigger issues raised included:

• "The possibility of inappropriate use of NBN downloads and access conditions by consultants and others."
• "Data already displayed on the Gateway is of variable quality, and some verifiers (especially those from some NSS) don't want to see data which they regard as being of higher quality mixed with data from other sources."
• "Time and effort is needed to pass data to NBN and for some schemes this is not a priority."
• "Where LRCs are passing data to the CDW for verification, they would wish to see any subsequent supply of data to NBN to be in accordance with LRC data access restrictions; this may conflict with the aspiration of the CDW to make data more readily available to NBN, as well as with access restrictions that NSS verifiers may wish to see."
• "Supply of data to NBN is currently a performance measure for LRCs, if data is going direct from CDW to NBN this can no longer be measured."

Accuracy and access

The CDW model has the potential to reduce concerns over the accuracy of data on the Gateway, if it is successful in increasing and explicitly recording verification decisions, and if these decisions can be clearly displayed via the Gateway; improved uptake of and responses to the record querying system that is already available for Gateway records will also help. See also section 3.7 regarding unverified records.

The issues over data access controls and agreements are harder to resolve for the CDW, where there may be differences in approach between LRCs and NSS for the same set of records. Further debate and consensus building will be required to address this.

3.6.3 Data held centrally vs data held locally

One of the potential long-term benefits of having a single CDW that all NSS and LRCs can use is the reduction in data duplication that results from all interested parties having access to one copy of the data in one CDW, and this is a strong argument for moving towards a more centralised approach. However, in the shorter term NSS and LRCs have to be able to access all relevant data for their own analysis and reporting purposes, and online systems are not yet at a point where sufficiently flexible querying and reporting can be achieved and linked to locally-held data. There is little incentive for volunteer verifiers to engage with the CDW model if they can't easily access the resulting records within the context of the other data that they deal with.

Consultees were asked which of four options were preferred for passing data to the NBN Gateway:
1. Data you have verified in the CDW becomes immediately available to the NBN
2. Verified data is downloaded from the CDW into the scheme/centre database and passes from there to NBN
3. It would be possible for all data currently in the scheme/centre database to be stored in the CDW
4. It would be possible to link local databases with the CDW so that all data could be queried without having to move data between databases

The preferred current option for all responders, and for LRCs and NSS separately, was option 2, by a large margin. The least preferred option for all responders, and for LRCs separately, was option 3. For NSS the least preferred option was option 4. However, option 3 was also highlighted as an option that could become the favoured route in future.

Challenges around shared data use

The problem of how to make data easily available for use by both NSS and LRCs is an intractable one, and many issues were raised by responders to this consultation:
• How to deal with editing and 'top copy' issues if data is downloaded to more than one place? Some LRCs indicated that they need to be able to control the top copy of the data to ensure that records fit in with their requirements (e.g. for locally defined site definitions, or to ensure a full audit trail if records form evidence for local planning purposes).
• How to encourage explicit tracking of verification decisions and unique IDs so that records can be recognised as being the same even if held in multiple locations.
• Ownership of datasets is unclear where data has been supplied by LRCs, or added via front-end systems embedded in external websites. If data supplied via an LRC front-end website is then verified by an NSS verifier, who is responsible for uploading to NBN? Currently that decision rests with the NSS that have engaged with iRecord, but this leaves LRCs unsure of the route that data from 'their' recorders will take to get to NBN, and potentially in breach of agreements with local data providers.
• NSS and especially LRCs are sometimes judged on how many records they have mobilised – this would be harder to assess if data is held in a shared pool rather than separate datasets.
• Where older datasets are uploaded to the CDW for verification purposes there may be problems over duplication within multiple datasets (e.g. records from LRCs may already be in the NSS database or vice versa, and edits to site names and grid refs etc. may have been made differently in different places).
• A single CDW approach may be the ideal for the future, but the reality is that a number of data warehouses exist or are in development – is there an argument for focusing on interoperability among several warehouses rather than aiming for a single system?

3.6.4 Feedback from project partners on options for dataflow

From phone interview with Simon Wood (WBRC):
Storing a larger proportion of data in a 'cloud-based' central data warehouse could become a viable option in future, but for this to work a number of conditions would need to be in place, including:
• It needs to be easy to make data available at local level, both for WBRC work and for local verifiers to use – reliable and fast access to online data is not yet a reality for all recorders/verifiers, especially in rural areas, and for the foreseeable future there will still be a need to download data for local use.
• There needs to be recognition of the role of LRCs in making data available to NBN – this is currently seen as one of the performance measures for LRCs, but if LRCs are to encourage greater use of online systems and a central data warehouse it will not be possible to track the number of records that are "supplied by" the LRC.
• There needs to be appropriate control over access to sensitive records, e.g. protected species.
• There are concerns that if local authorities see a national central data warehouse as the place to obtain data, they will not support local LRCs, even though local LRCs play a vital role in supporting the network of recorders that are providing the data to the central data warehouse. A strong steer will need to be given to local authorities to shift them from measuring the number of records that they 'own' locally.

Phone interview with Les Hill (data manager, National Moth Recording Scheme):
NMRS has put a lot of time and resources into liaison with the CMR (county moth recorder) network, and it is essential for the continuing success of NMRS that good relations are maintained with this network. Current dataflow for NMRS is geared to all data going through the relevant CMR before being passed to NMRS for national collation. This model is explicitly built in to the existing data policy of NMRS and has been the subject of extensive negotiation in order to achieve agreement among all CMRs. There are no plans at the moment to change the NMRS data policy.

So for the foreseeable future option 2 is the only one that is practicable for NMRS – data arriving in a central data warehouse needs to be downloadable for CMRs to incorporate into their existing systems, and then supply back to NMRS.

In the long-term it may be that CMRs will wish to consider other options as online systems become more established, but there are a number of factors that need to be borne in mind, including the availability of reliable internet access and familiarity with online technology among the network of CMRs, which encompasses a wide variety of skills, experience and enthusiasm for online technologies.

Phone interview with Graham Hawker (TVERC):
Download from the central data warehouse (and other systems) is needed to make data available for TVERC's search/analysis/reporting, but datasets can be excluded from TVERC's subsequent upload to the Gateway if that dataset is being uploaded by another route. However, if LRCs are increasingly allowing BRC or NSS to upload data to the Gateway direct from the central data warehouse, recognition needs to be given that the number of records uploaded by the ERC may decrease, even if the records originate from ERC initiatives such as survey projects or digitisation of data from consultants' reports.

Phone interview with Matt Smith (Berks Reptile and Amphibian Group, and BWARS, via TVERC):
Currently the best route is for data to be collated onto the recording scheme database and uploaded to NBN from there. In future it could be possible for data from the central data warehouse to go straight to NBN, but in order for that to happen NBN would have to be able to show clearly which data had been verified by the recording scheme. Data would still need to be held locally by the scheme for analysis and reporting purposes, unless the point is reached whereby full access to all relevant data can easily be obtained from NBN (e.g. to easily query the Gateway to report all scheme-approved records within any given polygon).

3.6.5 Tracking individual records from multiple sources

If duplication is to be avoided when records are shared, individual records need to be tracked, something that can in principle be done as long as each record has its own unique identifier, and that identifier remains with the record in whichever dataset the record is transferred to.

Although most if not all current biological recording database systems do generate unique identifiers for the records that they contain, these systems vary in their ability to store unique identifiers for records imported from elsewhere. Over half of LRC responders stated that their systems did store external identifiers, but fewer than half the NSS responders were able to do this. Some responses from LRCs that use the Recorder database indicated that they can store external record identifiers, but not easily – Recorder doesn’t provide a data field specifically for this purpose. Development of software tools that facilitate tracking of unique identifiers across different systems would be of great benefit to the CDW model and more widely.
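As an illustration of why retaining external identifiers matters, the sketch below (with hypothetical field names, and not based on any particular database schema) shows how keeping the originating system's unique key with each record allows the same record to be recognised and skipped when it arrives again via a different route.

```python
# Minimal sketch (hypothetical field names, no particular database schema):
# keeping the originating system's unique key with each record so that the
# same record can be recognised when it arrives again via a different route.

def merge_records(existing, incoming):
    """Index records by (source_system, source_key) and skip duplicates."""
    index = {(r["source_system"], r["source_key"]): r for r in existing}
    added = []
    for rec in incoming:
        key = (rec["source_system"], rec["source_key"])
        if key in index:
            continue  # already held: the same record supplied by another route
        index[key] = rec
        added.append(rec)
    return list(index.values()), added

lrc_copy = [{"source_system": "iRecord", "source_key": "123456",
             "taxon": "Pipistrellus pygmaeus", "gridref": "SU7696"}]
nss_copy = [{"source_system": "iRecord", "source_key": "123456",
             "taxon": "Pipistrellus pygmaeus", "gridref": "SU7696"}]
merged, added = merge_records(lrc_copy, nss_copy)
print(len(merged), len(added))  # 1 0 - the shared identifier prevents duplication
```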

3.7 Analysis of views on sharing unverified data

Data may remain unverified in the CDW for a considerable time, for a variety of reasons. Some data users would benefit from being able to see this data quickly, before it is verified; for example, records of invasive species or protected species may allow others to check the record and take appropriate action. Access to unverified records may also allow the possibility for experienced recorders to re-visit the location in question and collect verified data to confirm or refute the original record (although this could be seen as a poor use of expert time if a high proportion of such records turn out to be incorrect).

Although many responders to this project were uncomfortable with the idea of unverified data being widely shared, others could see clear benefits to doing so. All stressed that if this was to happen more widely it would need to be carefully controlled, with consideration given to:
• Ensuring that the difference between verified and unverified records was very clearly flagged.
• Developing and promoting the use of an agreed suite of verification terms, to allow accurate documenting of records and to enable data of known quality to be shared.
• Using the NBN Record Cleaner rulesets as an initial 'triage' level of verification, so that a distinction could be made between those records that were unverified but didn't fail any automated checks, and those records that were unverified but did fail one or more automated checks (and also those records for which automated checks are not available); a sketch of such a triage flag is given after this list.
• Restricting access to unverified data so that it was only available to those with a clear need for it.
• Ensuring that there was an efficient method for updating unverified data if it was subsequently verified, and removing it if it was subsequently rejected.
• Educating data users on verification status and how to interpret this.
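The 'triage' distinction described in the list above could be expressed as a simple status flag. The sketch below is illustrative only: it assumes the results of automated checks (such as NBN Record Cleaner rules) are already available as a list of pass/fail values, or None where no rules exist for the taxon.

```python
# Illustrative sketch of the 'triage' flag described above. check_results is
# assumed to be a list of pass/fail booleans from automated checks (such as
# NBN Record Cleaner rules), or None where no rules exist for the taxon.

def triage_status(verified, check_results):
    """Classify a record for cautious sharing of unverified data."""
    if verified:
        return "verified"
    if check_results is None:
        return "unverified - no automated checks available"
    if all(check_results):
        return "unverified - passed automated checks"
    return "unverified - failed one or more automated checks"

print(triage_status(False, [True, True]))   # unverified - passed automated checks
print(triage_status(False, [True, False]))  # unverified - failed one or more automated checks
print(triage_status(False, None))           # unverified - no automated checks available
```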

Among our responders there was, by a small margin, a majority who thought that there were no benefits to sharing unverified data, although among LRC responders there was a clear majority who could see benefits:

                                                All responders   LRC responders   NSS responders
Data sharing beneficial, or has some benefits         49%              70%              45%
No benefits, or damaging                              51%              30%              55%

The partner organisations with which we carried out more in-depth interviews were similarly cautious about sharing unverified data at first, but agreed that with reliable safeguards in place there could be benefits to it.

However, the lack of consensus on this issue is highlighted by two responses from LRCs:
• "We would not be able to function as a records centre without sharing unverified records, and data availability to planners and other organisations would be severely constrained."
• "We would not consider sharing unverified data. As an LRC we have a responsibility to ensure that data we supply to sources available to the public are at all times of the absolute highest quality."

And a similar diversity of opinions came from NSS responders:
• "It speeds the process up considerably and enables the records to be used in an appropriate manner quickly. This could be especially important for new discoveries."
• "At the level of the NBN, until records are verified they should not be presented or mapped online. Identification errors, particularly those from inexperienced recorders are common and once displayed on "the internet" may be very hard to "remove" as people may continue to refer to them long after they have been verified and discounted. The NBN needs to be seen as an authoritative source of data. To me this means no unverified data is displayed - if there is a recognised recording scheme then all data for the group should pass through them."

Additional points from the telephone interview with BSBI’s data managers: “Making unverified data available: BSBI aims to supply verified data to NBN, and has concerns over reputational issues if it was seen to be supplying unverified data. Volunteer VCRs might also see this as having a negative impact on their own reputation, and maybe an implication that they have been tardy in carrying out verification, leading to them feeling under pressure to do more than they are able to take on. Any change to this position would need to be thought through and debated with VCRs. BSBI would prefer to focus on speeding up the supply of verified data from its datasets to NBN, while recognising that there may be value in allowing unverified data from other sources to be visible on NBN (with suitable safeguards to ensure that the verification status of the data was clearly flagged).”

3.8 Analysis of verifiers' support requirements

There is a lot of variation in the resources, motivations and interests of verifiers in different situations. Most verification is done by volunteers on behalf of recording schemes or records centres. Some are more independent than others; for instance, a network of independent county moth recorders existed before the National Moth Recording Scheme provided an umbrella organisation for this activity. Some verifiers are appointed by national or local societies or committees, others become verifiers by default through starting to do a job that no-one else is doing.

All verifiers have an interest in ensuring that there is a well-maintained and accurate set of data for their particular interest group. Most verifiers would no doubt agree that part of what they are trying to achieve is to make high quality data more widely available, although there is less agreement on how best this can be achieved, with some taking the view that by making data too easily accessible to others it will end up being mis-used or misinterpreted.

Most verifiers would also agree that they are keen to promote more involvement in recording their group. However, there are limits to this, and verifiers often have to balance the time they can spend reaching out to casual recorders who are not yet (and may never be) committed to recording the group against the time spent working with and supporting recorders who have already started to make a more substantial contribution.

3.8.1 Making verification easier

We asked verifiers to list up to five ways in which their role as a verifier could be made easier. The request made most often, by far, was for contributing recorders to receive more training, not just in species identification but also in how to record their observations in a clear way with sufficient supporting evidence, providing photos or specimens as appropriate. Next most popular was to have more information available to both recorders and verifiers on which species were hard to identify or were being reported from unusual locations etc. (we have interpreted this as "Better record cleaning rulesets"; not all responders asked for this by name, but the information they were requesting matches well with what the rulesets try to do; local customisation of rulesets was suggested as important by several.)

In the table below the first two columns present the data from the online questionnaire for the top five requests; the third column is our interpretation of the potential that online systems have to provide some of these improvements.

Desired improvement                                                      No. of requests   Potential for online systems to help
Training for recorders                                                          36                 Some
Better record cleaning rulesets                                                 17                 High
Better data exchange / consistent data fields / all data in one place          15                 High
More / better ID guides                                                         11                 Some
User-friendly software                                                           9                 High

In addition, MBA’s data managers listed as their top priority the reassessment of the set of terms used to flag verification, both to increase the range of options and to ensure that the wording is as positive as possible when providing feedback to recorders.

Additional point from telephone interview with Graham Hawker, TVERC:
• People working with data from the data warehouse need to have a way of communicating validation errors back to someone who can pursue them, or else be able to get editing rights to correct them. E.g. incorrect grid refs or site names may be picked up by LRCs more readily than by NSS verifiers, but it is unlikely that anyone within the LRC will have verification rights for the taxa in question, and there isn't a clear route for reporting such validation concerns.

We also asked “Would you find it helpful to be shown more contextual information about the records and recorders?” Responses came from 90 people, and there was a high degree of support for most of the options suggested.

Suggested option                                                                               Number agreeing
Number of records recorder has previously submitted for the current taxon group                      66
Recorder's biography (including details of which identification guides they use,
whether they keep voucher specimens etc.)                                                             65
Whether species previously recorded from 10km square (or other unit)                                  65
Number of previous records for species in vice-county (or other unit)                                 64
Recorder's self-assessment of their skill level for the current taxon group                           53
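As an illustration of how the contextual information listed above might be assembled for display alongside a record awaiting verification, the sketch below derives each item from a plain list of the records already held for the relevant taxon group. All field names are hypothetical and this is not an existing iRecord or Indicia facility.

```python
# Illustrative sketch only (hypothetical field names; not an existing iRecord
# or Indicia facility): deriving the contextual details listed above from a
# plain list of the records already held for the relevant taxon group.

def record_context(record, held_records, biographies, skill_levels):
    """Summarise context for a new record prior to verification."""
    same_taxon = [r for r in held_records if r["taxon"] == record["taxon"]]
    return {
        # records this recorder has previously submitted for the taxon group
        "recorder_previous_records_in_group": sum(
            1 for r in held_records if r["recorder"] == record["recorder"]),
        # free-text biography (ID guides used, voucher specimens kept, etc.)
        "recorder_biography": biographies.get(record["recorder"], ""),
        # has the species been recorded from this 10km square before?
        "species_known_from_10km_square": any(
            r["square_10km"] == record["square_10km"] for r in same_taxon),
        # previous records of the species in the same vice-county
        "previous_records_in_vice_county": sum(
            1 for r in same_taxon if r["vice_county"] == record["vice_county"]),
        # recorder's self-assessment of their skill for the taxon group
        "recorder_self_assessed_skill": skill_levels.get(record["recorder"]),
    }
```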

3.8.2 Providing recognition for verifiers

At the moment iRecord displays the registered user name of the verifiers that are active on the website, but does not give any means of linking verifiers to the NSS or LRCs that they are associated with. Providing such recognition was seen as important by responders, both within iRecord itself and when the data arrives on the NBN Gateway. For more on this see section 3.5.

4. Conclusions

As is apparent in the responses reported here, there is a huge amount of variety within the schemes, centres and individuals that make up the 'biological recording community'. Local Records Centres are working towards agreed common standards via ALERC, but their individual circumstances and priorities differ depending on local circumstances and partners, and the local recording groups that they work with (and often support) add another layer to the picture. National Schemes and Societies, and local natural history groups, are much more varied, from large organisations with staff dedicated to data management (who may work with a network of volunteer county recorders or taxonomic specialists), to small recording schemes that are entirely run by one or a few volunteers. And the various taxonomic groups have their own characteristics – a recording scheme for a popular and relatively familiar group of species requires a different approach to a scheme focusing on more specialist groups that the general public would not normally interact with.

Consequently there are large variations in the availability of time, expertise and software resources, and in the degree of priority given to exchanging information with others outside the scheme itself.

Moving towards a more centralised way of capturing, verifying and sharing biological records can help to simplify and streamline the process and make more data available to more people more quickly, especially at national level. But such a move also has the potential to disrupt existing systems and relationships that in many areas are working well in the context for which they were developed.

Online systems for biological recording are here to stay, and given trends in the use of online systems and mobile applications in many parts of life it seems very likely that they will have an increasingly important role to play in the gathering of biodiversity data. The issues arising from this project are grouped into themes below.

4.1 Online data verification

4.1.1: The benefits from having a central data warehouse

The central data warehouse approach is working well for a growing number of records centres, recording schemes and projects, and would benefit from being developed and improved as a way of collating new records from a variety of sources (including multiple front-end websites and apps) and making that data available in one place for verifiers. The tools provided for verification within the CDW will also benefit from further development, and feedback from participants in this project has already been used to implement a range of improvements. Ongoing consultation with verifiers using the system can steer further improvements.

4.1.2: How can the central data warehouse concept be developed?

A CDW model has the potential to act as the single warehouse that sits behind many front-end systems, including other online systems that have already been developed (e.g. BSBI Distribution Database, RODIS, Living Record). This may produce cost savings and benefits from shared development. Alternatively it could be decided that it is more effective to establish better links and data sharing between several large data warehouses. Continued debate and consensus-building will be required to make progress.

4.2 Benefits to users

4.2.1: Are we making the most of best practice?

The existing CDW is working effectively and good use has been made of it by a growing number of NSS and LRCs. As experience accumulates, case studies and examples of good practice will help existing and potential users benefit from this knowledge.

4.3 Verification and validation efficiency

4.3.1: Are verifiers working in a consistent way?

As more verifiers engage with online recording it will be helpful to have documented guidance on the issues that are likely to be encountered, such as how to develop shared verification (where appropriate), how to use the online verification tools, which work-flows are most effective for dealing with large amounts of data, and how best to communicate verification queries and decisions to recorders.

4.3.2: Can we create a single version of a record?

LRCs and NSS sometimes have requirements to edit site names and grid references as part of their validation procedures, but their reasons and priorities for doing this can differ. Work is needed to develop a consensus approach to this, and to resolve issues around designating a 'top copy' of a record (see also issue 4.6.2). One technical solution might be to allow alternative site interpretations to be added to the record, rather than editing the raw data.
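One way the 'alternative site interpretations' idea might be represented is sketched below: the recorder's raw site name and grid reference are never overwritten, and each interested organisation appends its own interpretation as an annotation. The structure is purely illustrative and the field names are hypothetical.

```python
# Illustrative sketch only (hypothetical field names): the recorder's raw site
# name and grid reference are kept unchanged, and alternative interpretations
# are appended as annotations rather than edits to the record itself.

record = {
    "id": "CDW-000123",
    "taxon": "Anguis fragilis",
    "raw_site_name": "field behind church",
    "raw_gridref": "SU649959",
    "site_interpretations": [],  # added by LRCs/NSS; never overwrites raw data
}

def add_site_interpretation(rec, organisation, site_name, gridref, reason):
    """Append an alternative site interpretation, preserving the original."""
    rec["site_interpretations"].append({
        "organisation": organisation,
        "site_name": site_name,
        "gridref": gridref,
        "reason": reason,
    })

add_site_interpretation(record, "an LRC", "locally defined site name", "SU6495",
                        "matched to locally defined site boundary")
```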

4.3.3: Do we need common standards for verification?

The software currently in use by NSS and LRCs varies in its ability to store verification decisions and unique record identifiers. In the long-term a CDW approach can reduce the need for data to be transferred between systems, but in the shorter term it is likely that this will continue, leading to ongoing problems with data duplication and unnecessary re-verification of records unless progress can be made with sharing verification flags and unique identifiers as an integral part of each record.

4.3.4: Are there enough verifiers?

NSS and LRCs can help to recruit more verifiers to engage with and help develop online recording where appropriate. For some purposes it can be helpful to share the verification role between several people (e.g. within those species groups where identification skills are widespread, or by division along geographical lines as for the county recorder model); for the more specialised taxonomic groups where taxonomic skills are at a premium shared verification is less likely to be appropriate.

The inclusion of those existing verifiers who are able and willing to engage with online systems will be important to ensure that their skills and knowledge are available. Where existing expert verifiers are unable or unwilling to engage with online systems it may be possible for them to work with or mentor 'assistant verifiers', who can work with online records while benefiting from the experience of their mentor.

The recruitment and development of new verifiers is also desirable to ensure that tasks can be shared where appropriate, that a continuity of verifiers is maintained into the future, and that the verification of biological records continues to be done to a high standard. Identifying gaps in verification (e.g. for taxon groups that do not currently benefit from a recording scheme – see issue 4.6.4) would help target areas for recruitment.

4.4 Use of NBN verification tools

4.4.1: Are the current rule sets fit for purpose?

Feedback suggests that there are four areas that would benefit from further development:
• Development of rulesets for taxa that don't yet have them.
• Updating of rulesets (including investigation of how these might be safely updated based on automated analysis of existing data) for taxa that do have them.
• Customisation of rulesets for local/regional use (again it may be possible to automate this to some degree; a sketch of one possible approach follows this list).
• Promotion of the use of rulesets among verifiers and especially among recorders, so that recorders are aware of when they need to provide further evidence to ensure their records can be verified.
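As a simple illustration of how local customisation might be partially automated, the sketch below builds a basic 'known distribution' rule from existing verified data and uses it to flag incoming records that fall outside that distribution. This is not the NBN Record Cleaner rule format; it only illustrates the general idea, and the field names are hypothetical.

```python
# Illustrative sketch only - not the NBN Record Cleaner rule format. It builds
# a simple local 'known distribution' rule from existing verified data and
# flags incoming records that fall outside it. Field names are hypothetical.

def build_known_distribution(verified_records):
    """Map each taxon to the set of vice-counties it has verified records from."""
    known = {}
    for r in verified_records:
        known.setdefault(r["taxon"], set()).add(r["vice_county"])
    return known

def outside_known_range(record, known):
    """True if the taxon has no verified records from this vice-county."""
    return record["vice_county"] not in known.get(record["taxon"], set())

verified = [{"taxon": "Volucella zonaria", "vice_county": "VC22"}]
known = build_known_distribution(verified)
new_record = {"taxon": "Volucella zonaria", "vice_county": "VC57"}
if outside_known_range(new_record, known):
    print("Flag for verifier attention: outside known local range")
```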

4.5 Communications between verifiers and recorders

4.5.1: Is enough credit given to the contributors?

Online systems need to do more to give recognition to the NSS or LRCs that engage with them, in order to both acknowledge the support of those organisations and to make it clear who the verifiers are acting on behalf of, so that recorders contributing records know who it is they are interacting with. This could be achieved with more use of standard 'signatures' on verifiers' emails and other messages, or through assigning 'badges' to verifiers, or better use of links to verifiers' profile pages and/or to the websites of their associated organisations.

4.6 Data sharing efficiency

4.6.1: Is the value of verified data recognised by users?

NBN has always worked to bring together partners from across the biological recording community. Building on and extending this partnership is an ongoing task, and there is more to do to persuade all potential partners of the benefits to biological recording and conservation in sharing data more centrally.

There seems to be very little evidence available to judge what importance individual recorders attach to their records being made widely available for conservation, research and other uses; gathering more evidence on this would help support the debate.

4.6.2: What are the blockages to better data flows?

The issues over data access controls and agreements are harder to resolve for the CDW, where there may be differences in approach between LRCs and NSS for the same set of records. Further debate and consensus building will be required to address this.

4.6.3: Can users query the data and provide feedback on data accuracy?

Much progress is being made in this area already, but take-up of online technologies for data storage and sharing will increase if data suppliers and users are able to analyse data effectively via online systems (both within the CDW and via NBN). The ability to comment on records, and pursue corrections to them if doubts or errors are found, is also important, so that corrections or adjustments to the verification status can be made swiftly.

4.6.4: A risk-based approach to poorly supported data

Despite the very well-developed biological recording community in this country, not all taxa are covered by a specialist recording scheme. This means that for some groups records are accumulating but are not being verified or actively collated. It would be possible to make use of this data in its unverified state to provide some insight into these groups, and this may attract new people to take on the challenge of developing recording schemes for them.

4.7 Sharing unverified data

4.7.1: Reducing risk in data use

There was acceptance among a reasonable proportion of responders that there is an argument for making use of unverified data, at least in some circumstances, but that in order to do so there must be adequate controls and guidance in place. Effort would also be required to ensure that data users understood the implications of accessing unverified data so that they did not use it inappropriately. A possibility worth exploring would be to provide controlled access to specified data within the CDW to those users who have a need to work with unverified data, rather than allowing unverified data to be disseminated more widely via the NBN Gateway where it could more easily be misinterpreted.

4.8 Supporting verifiers

4.8.1: Responding to need

Online technology has the potential to support at least some of the top five improvements listed in section 3.8.1, and priority should be given to this in order to support verifiers who are engaging with the CDW:

Desired improvement                                                      No. of requests   Potential for online systems to help
Training for recorders                                                          36                 Some
Better record cleaning rulesets                                                 17                 High
Better data exchange / consistent data fields / all data in one place          15                 High
More / better ID guides                                                         11                 Some
User-friendly software                                                           9                 High

4.8.2: Getting the language right

At the moment there are only four categories available within the iRecord implementation of verification tools: Verified, This record is assumed to be correct, Queried, Rejected. An alternative set of terms is in use by MBA: Certain, probable, uncertain, possible, dubious, insufficient information, definitely not. Another set is given in the NBN Standards for integrated online recording and verification (Anon. undated), where the recommended framework for defining the verification status of records is: Correct, Considered correct, Requires confirmation, Considered incorrect, Incorrect, Unchecked. Further consultation and consensus is urgently required in order to facilitate greater consistency in applying and understanding these terms, and effective transfer of verification flags when disseminating data.
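A consistent set of terms would also make it possible to translate verification flags mechanically when data moves between systems. The sketch below shows what such a translation table could look like; the actual correspondence between the iRecord terms and the NBN framework has not been agreed, so the mapping shown is an assumption for illustration only.

```python
# Illustration only: the correspondence between these vocabularies has not been
# agreed, so the mapping below is an assumed example rather than a standard.
# It shows how a translation table could carry verification flags between
# systems once consensus on the terms is reached.

IRECORD_TO_NBN = {
    "Verified": "Correct",
    "This record is assumed to be correct": "Considered correct",
    "Queried": "Requires confirmation",
    "Rejected": "Considered incorrect",
}

def translate_flag(irecord_flag):
    """Map an iRecord verification flag to the NBN framework, if possible."""
    return IRECORD_TO_NBN.get(irecord_flag, "Unchecked")

print(translate_flag("Queried"))  # Requires confirmation
```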

5. References and acknowledgements

Anon. (undated). NBN Standards for integrated online recording and verification. http://www.nbn.org.uk/nbn_wide/media/Documents/Publications/NBN-Standards-for-Online-Recording.pdf

James, T. (compiler) 2011. Improving Wildlife Data Quality - Guidance on data verification, validation and their application in biological recording. National Biodiversity Network.

Lightfoot, P. 2013. Dive into conservation? Integrated marine recording and recorder motivations (presentation at NFBR conference 2013). http://www.nfbr.org.uk/wiki/images/d/dc/Pl-dive.pdf

Thanks to all who provided responses to this project, both from the main project partners and the wider online questionnaire. Thanks to Graham French of the NBN Trust for providing information on data uploads to the NBN Gateway, and to the project steering group convened by Natural England for their valuable advice throughout.

Appendix 1: detailed results from Phase 2

The results presented here are summarised from the responses to the online questionnaire, with additional text taken from telephone interviews with staff and volunteers from the main project partners. These detailed results were further summarised in "Results from Phase 2" in the main report, above.

The views given are those of the individuals concerned, and do not necessarily represent the policies of the recording schemes or records centres with which they are associated. Comments have been anonymised for presentation in this report.

Section 1: responders' details

Section 1 of the online questionnaire asked responders for details of name, email address, the relevant records centre and/or recording schemes for which they carry out verification, what their roles were (verification and/or data management), how many records they deal with each year, and whether they already had a verification role for an online system.

The online questionnaire collected 104 responses, from the following categories of verifier:

Verifier category               Number of responses   Percentage
LRC (staff or volunteer)                 25                24%
Local Natural History Society             2                 2%
NSS – national representative            23                22%
NSS – local representative               51                49%
Other                                     2                 2%
Recording project                         1                 1%

Of these, 23% described themselves as data managers, 25% as verifiers, and 52% as both verifiers and data managers. Of the people who carried out verification, seven verified 100 or fewer records per year, and three verified over 100,000 per year, but most were in the 1,000–10,000 range per year. Of those who reported the number of records they dealt with as data managers, there was a wider range from 100s through to 4 million per year, with most in the 1,000–100,000 range. (However, it seems likely that some of the higher figures are for total data holdings rather than the annual total that the question asked for.)

Just under half (47) of the responders said that they currently carried out verification via an online system:

Online system                                    Number acting as verifiers
iRecord                                                     16
iSpot                                                       12
BSBI Distribution Database                                  10
Indicia-based system                                         6
Other                                                        5
RODIS                                                        5
Living Record                                                4
CATE2 (Association of British Fungus Groups)                 2
BTO                                                          1

Section 2: current verification procedures

In answer to the question "Do you verify data before it gets on to your main database, or after, or both?", most responders answered "both", with a smaller number "before" and a smaller number again "after". Although there is much common ground in the type of checks that verifiers see as necessary, the responses show a great deal of variety in the workflows used to achieve this. Verification checks may be carried out before or after importing into a main database, or both before and after. Some of this variation is the result of differences in the capabilities of the software used (e.g. data synced in to a MapMate database can only be viewed after it has been imported into the database). Some systems use flags in their database to indicate whether a record has been fully verified or not, but again this is often dependent on the software used.

We asked “Do you use the NBN Record Cleaner rules to help with verification?”. Of 101 responders, the majority did not use the NBN Record Cleaner tools on their dataset:

Record Cleaner used?                         No. of responses (n=101)
No                                                     74
Yes                                                    21
Used for parts of dataset                               3
Non-NBN ruleset used                                    2
Trialling Record Cleaner, not yet in use                1

Of those responding for LRCs, 35% made use of the NBN Record Cleaner tools; for NSS, 24% made use of these tools.

We asked “Do you run any other automated verification checks?” Of 93 responders, 52 said they did not carry out any other automated verification checks on their data (i.e. other than the NBN record cleaner) – checks were made by eye or by simple sorting of data in spreadsheets. The other 41 said that they did carry out automated checks, but their answers indicated a very wide range of procedures in place, using a variety of software, and focusing mostly on validation rather than verification checks (the question asked about verification, but the answers clearly took this to include validation as well).

Thirty-seven responders indicated that they used neither the NBN Record Cleaner nor any other automated checks. Of these, 7 were associated with LRCs and 27 were associated with NSS. However, it is not clear whether some of these responders were being more precise in limiting their response to verification checks (as asked in the question) rather than validation checks; also, some responders were part of a team of verifiers and expected automated checks to be made by others in their team.

We asked “Does your current data system allow you to explicitly store any verification decisions that have been reached (e.g. flagging records as verified / rejected / assumed correct / not examined)?” The majority (64%) of responders stated that their database systems did enable them to explicitly store verification decisions for each record, e.g. by flagging the record as verified/unverified/not yet examined. However, among the NSS there is a sizeable minority that is not able to do this (due to software limitations in at least some cases).

Category   Verification decision explicitly stored?   No. of responses (n=97)
LRC        No ¹                                                 3
LRC        Yes                                                 22
NSS        No ¹                                                29
NSS        Yes                                                 38
other      No ¹                                                 3
other      Yes                                                  2

1: one LRC, 3 NSS and 1 other project stated that although they did not store verification decisions explicitly, it was implicit that all records were verified as they would not have been added to their database if unverified.

We asked “Does your current system allow you to store the unique serial numbers or codes of records imported from other databases?” The ability to store the unique record identifiers (= unique keys) of records that have been supplied from external databases is more limited. Over half of LRC responders stated that their systems did store external identifiers, but fewer than half the NSS responders were able to do this.

Category   Ability to store external unique record keys   No. of responses (n=93)
LRC        yes                                                     14
LRC        no                                                       8
LRC        uncertain                                                2
NSS        yes                                                     24
NSS        no                                                      35
NSS        uncertain                                                5
other      yes                                                      2
other      no                                                       3

We asked “On average, roughly how long does it usually take for a record sent to you to be verified?” We were hoping to get an idea of how long it might take between someone submitting a record, and that record becoming incorporated in the scheme/centre database as a verified record, but the question was too imprecise and most chose to interpret it as how long it took them to verify a record once they started looking at it. Many responded that the question was impossible to answer in any case, as it depended on what the record was and could vary from seconds to years.

We asked “Do you think your contributing recorders like to see their records appear on the NBN Gateway as quickly as possible?” The clearest pattern to emerge here is that the answer to this question is not well-documented. Many verifiers expressed an opinion but commented that it was just their opinion, while the “don’t know” category got the highest number of responses.

Category   Do your recorders like to see records appearing quickly on the NBN Gateway?   Responses (n=101)
LRC        yes                                                                                    4
LRC        some yes, some no                                                                      5
LRC        no                                                                                     4
LRC        don't know                                                                             9
LRC        not engaged with NBN at present                                                        2
NSS        yes                                                                                   20
NSS        some yes, some no                                                                      2
NSS        no                                                                                    16
NSS        don't know                                                                            32
NSS        not engaged with NBN at present                                                        2
other      no                                                                                     5

The comments associated with this question were also mixed, e.g.:
• Yes. Most do not use the NBN [Gateway itself] but do use maps provided by the NBN to other websites such as Moths Count and UKMoths.
• Some do; others are not concerned; a few are against this altogether.

Only one response indicated that they had sought quantitative data on this question from their contributing recorders:
• 50% said they felt it would be a waste of time filling in our project's recording forms if their data weren't made fully publicly available, but only 46% said they used the Gateway themselves. Getting records used for conservation is a very strong motivating factor for our volunteers, so in order to make this possible the records need to be available as quickly as possible.

We asked "Are you the sole verifier for your particular recording scheme or project or is verification shared between a group of people?" There is a fairly even split across all responders between those that are the sole verifiers for their scheme and those that are part of a team sharing verification roles. Just over half (52%) of all responders are part of a team, but unsurprisingly this is slightly more prevalent among LRCs than among NSS. The nature of shared verification is variable; the most common pattern is for two people to work jointly on a particular scheme, but in some cases the answers referred to dividing up responsibility by taxon groups within the recording scheme or centre as a whole; or by working with taxon experts on particularly difficult taxa; or by consulting others on an ad hoc basis as the need arises. (A few of the "verification shared" answers were from people who seemed to be sole verifiers for their county but part of the bigger team of verifiers for the country as a whole.)

Category   Verification sharing    Responses (n=91)
LRC        sole verifier                   5
LRC        verification shared            11
NSS        sole verifier                  36
NSS        verification shared            35
Other      sole verifier                   3
Other      verification shared             1

We asked “List up to five ways in which your role as a verifier could be made easier.” By far the most often-requested was for recorders to receive more training, not just in species identification but in how to record their observations in a clear way with sufficient supporting evidence, providing photos or specimens as appropriate. Next most popular was to have more information available to both recorders and verifiers on which species were hard to identify or were being reported from unusual locations etc. (Not all responders asked for “record cleaning rulesets” by name, but the information they were requesting matches well with what the rulesets try to do; local customisation of rulesets was raised by several.)

In the table below the first two columns present the data from the online questionnaire; the third column is our interpretation of the potential that online systems have to provide some of these improvements.

Desired improvement                                                      No. of requests   Potential for online systems to help
Training for recorders                                                          36                 Some
Better record cleaning rulesets                                                 17                 High
Better data exchange / consistent data fields / all data in one place          15                 High
More / better ID guides                                                         11                 Some
User-friendly software                                                           9                 High
Training for verifier                                                            5                 Some
No taxon name changes / better handling of synonyms                             4                 Moderate
More time for verification                                                       3                 Moderate
More consistent provision of photos linked to records                           2                 High
More verifiers sharing work                                                      2                 High
Ease of contacting recorder / recorder names provided with data                 2                 Moderate
Checklist updates                                                                2                 Moderate
Recorder accreditation                                                           2                 Some
Better taxonomic knowledge for group                                             1                 -
More consistent use of site names                                               1                 Some
Fewer records on paper                                                           1                 High
Keep track of record unique IDs                                                  1                 High
Closer links with local experts                                                  1                 Some

From phone interview with MBA data managers:
• Top priority: reassessing the set of terms used to flag verification, both to increase the range of options and to ensure that the wording is as positive as possible when providing feedback to recorders. [List of terms in use at MBA is "Certain, probable, uncertain, possible, dubious, insufficient information, definitely not"]

We asked “Does your scheme/centre receive records for which you don’t have the resources or expertise to verify fully?” Overall about half of responders said yes. A higher proportion of LRC responders said yes, as a result of dealing with multiple taxon groups for which local verifiers might not always be available. A smaller proportion of NSS verifiers said yes, and those that did often referred to the impossibility of verifying records in the absence of sufficient evidence (photos, specimens) rather than the absence of resources or expertise.

Category   Do you receive records that you don't have the resources or expertise to verify fully?   Responses (n=70)
LRC        Yes                                                                                              17
LRC        No                                                                                                6
NSS        Yes                                                                                              18
NSS        No                                                                                               26
other      Yes                                                                                               1
other      No                                                                                                2

Section 3: providing data to the NBN Gateway

We asked "Once you have verified a set of records, how do they reach the NBN Gateway?" The situation regarding whether and how data gets from verifiers to the NBN Gateway is complex. Direct supply to the Gateway is the most frequent route, but some LRCs supply data via NSS and vice-versa, and other routes are also used, e.g. via BRC. Of the NSS not supplying data to NBN, the most frequently cited reason was a current disagreement over the recently revised data access rules.

Category   Gateway route                   Responders (n=101)
LRC        direct                                  11
LRC        direct (partial data)                    3
LRC        via NSS                                  7
LRC        not supplying to NBN                     2
NSS        direct                                  51
NSS        direct (partial data)                    1
NSS        via BRC                                  4
NSS        via LRC                                  5
NSS        don't know                               2
NSS        not supplying to NBN                    10
Other      via LRC                                  1
Other      via NSS                                  2
Other      via NSS (partial data)                   1
Other      not supplying to NBN                     1

From phone interview with MBA data managers:
• MBA has a well-established system with DASSH acting as the archive for verified data, and DASSH data being disseminated via the NBN Gateway. The ability to feed verified records from the CDW directly (and immediately) to NBN is seen as a very good thing, enabling recorders to see that their data is being added to the national dataset as quickly as possible. However, where records are feeding through to NBN more quickly, there needs to be a clear process for dealing with any subsequent changes that may be required if further verification checks require it.
• Where records are being delivered direct to the NBN Gateway from the CDW it would be beneficial to ensure that the dataset on NBN acknowledges the input of MBA-linked verifiers.

We asked “Once a record or set of records has been verified, roughly how long does it take to appear on the NBN Gateway?” Responses to this were very varied. Most LRC responders said that they are supplying data every six months, following the procedures outlined in their agreements with Natural England. Among the NSS there were a variety of responses, with time taken ranging from 6 to 18 months for many of those that gave a time, but also lots of comments to the effect that the timing was not known (due to the data being passed to NBN via a national base rather than direct from the verifier), and some comments that the verifier was uninterested in whether or not the data appeared on the Gateway.

We asked "What are the greatest constraints on your scheme's ability to supply recent (21st century) verified data to the NBN?" and asked responders to give a score from 1 ("most constraining") to 5 ("least constraining") for five potential constraints:
1. Lack of field recording taking place
2. Lack of time for verification
3. Lack of experience / training among field recorders
4. Taxonomic uncertainties making verification decisions hard to reach
5. Records not yet digitised

A very mixed picture emerged from this:
• The constraint ranked as most constraining on average (i.e. with the lowest mean score) for all responses, and for NSS responses only, was “Lack of experience / training among field recorders”, but no individual responder ranked this as “most constraining”.
  o For LRC responses only, the constraint ranked as most constraining on average was “Lack of time for verification”.
• The constraint ranked as least constraining on average for all responses, and for NSS responses only, was “Records not yet digitised”, but 11 responders ranked this as “most constraining”.
  o For LRC responses only, the constraint ranked as least constraining on average was “Taxonomic uncertainties making verification decisions hard to reach”.
• Only two constraints were ranked as “most constraining” by at least one responder: “Lack of field recording taking place” (22 responders) and “Records not yet digitised” (11 responders).
• Only two constraints were ranked as “least constraining” by at least one responder, and these were the same two constraints: “Lack of field recording taking place” (25 responders) and “Records not yet digitised” (37 responders).

All responses (n=88):

                                       Lack of field   Lack of   Lack of recorder   Taxonomic     Records not
                                       recording       time      experience         uncertainty   digitised
Average of all responses               2.94            3.10      2.91               3.13          3.74
Number scored 1 (most constraining)    22              0         0                  0             11
Number scored 2                        20              13        22                 13            6
Number scored 3                        10              19        28                 22            15
Number scored 4                        11              18        16                 20            18
Number scored 5 (least constraining)   24              0         0                  0             37

LRC responses (n=20):

                                       Lack of field   Lack of   Lack of recorder   Taxonomic     Records not
                                       recording       time      experience         uncertainty   digitised
Average of all responses               3.11            2.79      2.83               3.33          3.05
Number scored 1 (most constraining)    4               0         0                  0             5
Number scored 2                        4               5         4                  2             1
Number scored 3                        2               7         6                  4             6
Number scored 4                        4               2         2                  6             2
Number scored 5 (least constraining)   5               0         0                  0             5


NSS responses (n=66):

                                       Lack of field   Lack of   Lack of recorder   Taxonomic     Records not
                                       recording       time      experience         uncertainty   digitised
Average of all responses               2.89            3.22      2.96               3.05          3.94
Number scored 1 (most constraining)    17              0         0                  0             6
Number scored 2                        16              8         16                 11            5
Number scored 3                        8               12        22                 17            8
Number scored 4                        7               16        14                 13            15
Number scored 5 (least constraining)   18              0         0                  0             32

We went on to ask “And more specifically, to what extent are verification resources a constraint on the amount of data that your scheme is able to collate and pass on?”, with responders asked to indicate their strength of agreement or disagreement with three statements:
1. My scheme / centre could make more data available if verification could be done more quickly
2. My scheme / centre could make more data available if more verifiers were available to help
3. Verification would happen more quickly and effectively if the role could be shared with other experienced verifiers

On average there was more disagreement than agreement with all three statements. If LRC responses and NSS responses are compared, NSS responders had a higher level of disagreement with all three statements. For statement 3 there was on average some agreement from LRC responders.

All responses (n=96):

                                               Quicker verification   More verifiers   Shared verification
Average of all responses                       3.46                   3.44             3.16
  (all three averages fall between “neither agree nor disagree” and “disagree”)
Number scored as: strongly agree               6                      3                4
Number scored as: agree                        15                     20               30
Number scored as: neither agree nor disagree   28                     24               23
Number scored as: disagree                     23                     30               25
Number scored as: strongly disagree            24                     19               14

LRC responses (n=22):

                                               Quicker verification   More verifiers   Shared verification
Average of all responses                       3.18                   3.09             2.68
  (the first two averages fall between “neither agree nor disagree” and “disagree”; the third is between “agree” and “neither agree nor disagree”)
Number scored as: strongly agree               3                      2                2
Number scored as: agree                        3                      6                9
Number scored as: neither agree nor disagree   8                      6                6
Number scored as: disagree                     3                      4                4
Number scored as: strongly disagree            5                      4                1


NSS responses (n=70):

                                               Quicker verification   More verifiers   Shared verification
Average of all responses                       3.56                   3.56             3.30
  (all three averages fall between “neither agree nor disagree” and “disagree”)
Number scored as: strongly agree               2                      1                2
Number scored as: agree                        12                     12               19
Number scored as: neither agree nor disagree   19                     18               17
Number scored as: disagree                     19                     25               20
Number scored as: strongly disagree            18                     14               12

We also asked for further comments on the issues raised in the previous two questions, and comments were provided by 13 LRCs and 34 NSS. From the comments received there is strong feedback that verification itself is not necessarily a constraint on data arriving on the NBN Gateway. Lack of experienced recorders, difficulties of collating data in one place, and concerns over the Gateway itself are all seen as bigger issues.

Selected LRC comments (anonymised):
• None of these (in question 3.3) are constraints for us. They may be constraints to recording in general but not for the provision of data to the NBN Gateway.
• More data could be made available if we had access to local expertise in the relevant species groups. It is important to foster local expertise and build our capacity. We will further hamper our ability to record and build expertise/understanding if we pile all the verification work onto a few national experts.
• The speed in which we verify data makes no difference to the quantity of data we make available. Sharing the role will inevitably cause issues through disagreements, and managing a relationship like this will be fraught with issues. Splitting species groups down to families and even genus groups would work, but many verifiers would probably only ever see their own data. However this links to a wider issue of expertise across the board; we need more recorders with more expertise in a wider field.
• We provide unverified data to the NBN (in clearly marked datasets) and so verification would not increase the amount of data provided. The main issue for us is in getting a large amount of historical data verified in the first place where there is no current expert (either engaged with the LRC or existing in our area).
• This is not relevant to LRCs as we don't verify the data ourselves. The greatest restriction is whether the data is already being uploaded via a different route such as National Schemes.
• Our main limitation to supplying data is getting the agreement of recorders for their records to be openly accessible. Another major issue is that bad records are not corrected quickly on the NBN, so recorders insist that everything is checked over and over again so that no bad record will ever get to the NBN. I would prefer rapid database-to-database exchange so that faults could be corrected quickly.
• The main constraints are the volume of data (and a significant amount still arrives in paper form or in unstructured documents) and verification of location names against grid references, which is very important to some of the users of the processed data.
• This questionnaire does not address our concerns about the NBN Gateway. Records are being downloaded by consultants for commercial purposes, which means that they are not contacting the LRCs for data searches. This is depriving the LRCs of an important source of income.
• Verification generally isn't the problem, it's the collation of records from a variety of sources and formats into standardized spreadsheets/database that takes the time. For only a few taxonomic groups there is no local expertise for verification and the records will not reach the Gateway or NSS.
• Verification happening quickly should not be the priority. Good quality, accurate verification by the appropriate taxonomic experts with knowledge of the local area should be the priority. Accurate digitisation and validation of records takes time. Even if records are digitised then it takes time to convert them into the appropriate format for import into the database.
• We have what I consider to be good schemes in place already for data collection, collation and validation. Our biggest constraint is (and I am sure always will be) the reluctance of recording groups to have their data made available on the Gateway and open to misuse. This concern has only increased with the new download options, and we as a Records Centre have experienced an increase in consultants requesting higher resolution download access to our data, which firstly shows that they have gone to the Gateway with the express intention of skirting round the LRC to download the data for free (otherwise they wouldn't have been there to know that the data resolution has been reduced in the first place). It also justifies the concern of the recording groups who asked us to reduce the resolution of their Gateway data down to 10km resolution when the new download facilities were first unveiled. I personally am very concerned about the number of requests that we and other LRCs have had in the last couple of months from Gateway users who are clearly mis-using the system and in some cases fabricating their reasons for wanting access to the data just to avoid LRC charges.
• We take a pragmatic approach, so after internal checks data is made available immediately to ecologists and partners, with verification being an on-going process from the time the record enters the LRC's system until decades later in some cases where specimens are involved. We do delay *public* viewing of data on the NBNG until after the initial annual verification assessment; however, the data is available to Defra, national schemes etc. in the meantime.

Selected NSS comments (anonymised):
• There are concerns over putting too much emphasis on immediate verification of records; sometimes verification benefits from greater time for consideration. However, in principle quick verification is a desirable goal, but it needs to be accompanied by an equally fast and effective way of highlighting and correcting any errors that slip through.
• One of the greatest constraints on our ability to supply verified data is a lack of information regarding the method used to identify the species.
• As a volunteer, passing records to the NBN Gateway is not my decision to make.
• At present there are relatively few recorders with high id skills who could be verifiers, and they have their hands full on other time commitments. The way forward is to have county or regional verifiers.
• Our data are not supplied to the NBN Gateway due to over-riding quality concerns. Our records have been meticulously checked and cleaned by a competent team. Placing the data on the Gateway would open the way to merging with data that have not been cleaned, where records are effectively consigned to a data 'drop box'. This would negate much of our high level of investment in cleaning and correcting data. Our database is of sufficiently advanced technical quality that any analysis can be performed online by registered users.
• Setting up online recording will help us make more data available more quickly (and deliver lots of other benefits to our recorders) but it's not true to say that it will achieve this by doing verification more quickly and involving more verifiers. Verification can already be done instantly, there are plenty of verifiers available to help and the role is shared between them, so Indicia won't change this way of working, but will integrate it into the data management process.
• I don't feel that verification is the key factor in dissemination of the data. As long as there is a process by which changes in the verification status are logged, unverified records could be made available. Making them available to the NBN Gateway is then more down to the time and effort in preparing the dataset for Gateway upload, and if this is not suitably resourced there is little incentive to do so.
• I have all I need to check verifications. The NBN system is far too difficult. Trying to update the system takes so long I/we only try to do this once a year, or every other year.
• I have currently requested that our data are not passed to the NBN as their procedures for holding back on sensitive data recently changed. I do not consider the NBN's new procedures are suitably robust enough to ensure that unscrupulous individuals would be stopped from accessing the full data.
• Main constraint at present is encouraging mammal recording amongst volunteer recorders whilst also encouraging consultant ecologists to submit their records to the Mammal Group or LRC to make them available for verification and import. If all consultants and other organisations made their data available, then yes, we'd struggle to verify them all and struggle to get them all digitised!!
• The biggest challenge is not verifying records but collating, and where appropriate digitizing records, and entering them into the database; a great deal of time is spent formatting records and importing them into the database. Verifying them is far less time consuming!
• The verifier does not limit the number of records; only the speed by which they are made available.
• Verification could be quicker if more of the verification reference material were available, e.g. genitalia diagrams for dissections. Whilst most of the species are covered somewhere, the references are not at all easily obtained and prohibitively expensive should one ever try to obtain everything.
• Verification isn't currently a bottleneck - the problem is simply getting records into the database and cleaned up. We are on top of this at the moment and we don't expect a huge increase in records of taxa in our scheme. If we do get more records from novice members of the public then this could be a drain though, and for these we need as much information as possible - e.g. photos mandatory for all novice recorders.
• You are assuming that the problem is verification. The reality is that we can do a great deal but in the end we have a million and one jobs to do - running training courses, writing new material, engaging with recorders, assembling data, running field meetings .... For myself, my position is that I run a recording scheme that will contribute to the NBN but that the NBN is not the most important part of the equation. Recorders generally see their records on our own website within 3 months but it has been a long while since we updated the NBN dataset.


We asked “Which of these options are preferred for passing data to the NBN Gateway?” This sought opinions on the best way for data held in a central data warehouse (CDW) to be brought into a dataflow and ultimately passed to NBN. Options offered were:
1. Data you have verified in the CDW becomes immediately available to the NBN (scored 1 for analysis)
2. Verified data is downloaded from the CDW into the scheme/centre database and passes from there to NBN (scored 2)
3. It would be possible for all data currently in the scheme/centre database to be stored in the CDW (scored 3)
4. It would be possible to link local databases with the CDW so that all data could be queried without having to move data between databases (scored 4)
(A minimal illustrative sketch of option 2 follows this list.)
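In practical terms, option 2 is a filter-and-transfer step: download from the CDW, keep only the records marked as verified, and import them into the local database before onward supply to the Gateway. The sketch below is illustrative only, assuming the CDW can supply a CSV export; the file names and column names (e.g. "verification_status") are hypothetical and do not represent the actual CDW export schema or the NBN exchange format.

    # Sketch only: filter a CDW CSV download to verified records before
    # importing into a local scheme/centre database (option 2 above).
    import csv

    def extract_verified(cdw_export_path, output_path,
                         accepted_statuses=("accepted", "verified")):
        """Copy only verified records from a (hypothetical) CDW CSV export."""
        with open(cdw_export_path, newline="", encoding="utf-8") as src, \
             open(output_path, "w", newline="", encoding="utf-8") as dst:
            reader = csv.DictReader(src)
            writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
            writer.writeheader()
            kept = 0
            for row in reader:
                # "verification_status" is an assumed column name
                if row.get("verification_status", "").strip().lower() in accepted_statuses:
                    writer.writerow(row)
                    kept += 1
        return kept

    # Example usage (file names are placeholders):
    # n = extract_verified("cdw_download.csv", "for_local_database.csv")
    # print(n, "verified records ready to import into the scheme database")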

There were a number of comments from responders who were not clear what the central data warehouse concept was, and how it might apply to them (see below).

The preferred option for all responders, and for LRCs and NSS separately, was option 2, by a large margin. The least preferred option for all responders, and for LRCs separately, was option 3. For NSS the least preferred option was option 4.

All options were chosen as “Not workable” by at least some responders, although option 2 received fewest “Not workable” scores by a large margin.

There were 25 of the 73 responses where the responder did not score any of the options as “Preferred option now”. Of those 25, 13 did score an option as “Could be preferred in future”, while 8 scored all options as not preferred or not workable.

All responses (n=73):
(Option 1 = verify in CDW, direct to NBNG; Option 2 = verify in CDW, download to local DB then upload to NBNG; Option 3 = all data stored in CDW; Option 4 = local DB linked to CDW)

                                                       Option 1   Option 2   Option 3   Option 4
Average score                                          2.60       1.82       2.83       2.77
Number choosing “Preferred option now” (score 1)       14         31         8          6
Number choosing “Could be preferred in future” (2)     15         13         16         24
Number choosing “Workable but not preferred” (3)       23         12         13         4
Number choosing “Not workable” (score 4)               16         4          22         23

LRC responses (n=19), options as numbered above:

                                                       Option 1   Option 2   Option 3   Option 4
Average score                                          2.67       2.00       2.89       2.42
Number choosing “Preferred option now” (score 1)       4          8          3          3
Number choosing “Could be preferred in future” (2)     5          2          4          10
Number choosing “Workable but not preferred” (3)       2          4          3          1
Number choosing “Not workable” (score 4)               7          2          8          5


NSS responses (n=51), options as numbered above:

                                                       Option 1   Option 2   Option 3   Option 4
Average score                                          2.52       1.79       2.78       2.92
Number choosing “Preferred option now” (score 1)       10         21         5          3
Number choosing “Could be preferred in future” (2)     10         11         12         14
Number choosing “Workable but not preferred” (3)       21         8          10         3
Number choosing “Not workable” (score 4)               7          2          13         17

The comments associated with this question raised a large number of issues, not all of them directly related to the central data warehouse concept. There is clearly a very wide range of views on the desirability of sharing data via the warehouse, and indeed on sharing data in general.

Selected LRC comments (anonymised):
• Data you have verified in the CDW becomes immediately available to the NBN: this is preferable on the assumption that the data is only available in accordance with our specified restrictions. It would be possible for all data currently in the scheme/centre database to be stored in the CDW: there are concerns that there would be a loss of control over the data, especially if the CDW becomes corrupt or inaccessible, for example; we need access to our data on a daily basis, and if we are unable to access the information this would impact our ability to provide LRC services.
• Our LRC does not think that a central data warehouse is a viable option. Directly linking the NBN with local databases is a long way off in the future for a variety of reasons.
• No automated or online verification system can ever replace the input of an experienced local expert, and whilst I appreciate that not all areas have the same systems, we have worked very hard to build up excellent working relationships and friendships with local experts who appreciate the personal communication that LRC staff have with them. To change the systems we have in place may result in a time saving (possibly a significant time saving in some cases) but it's not all about time and speed of verification. There is a lot to be said for having those strong relationships with our local recording community and we benefit in far more ways than just data verification support. We co-ordinate public benefit events together, have social meetings and gatherings to encourage newcomers of all ages to recording and wildlife appreciation, and can call on them to help out when we are asked to give talks to local groups and schools etc.
• Of over 50 LRC partners, only a couple are concerned about data going on the Gateway and they're funding partners not data contributing partners, which puts us in a difficult position. The local experts are keen that their data are informing local decisions via the LRC and aren't so worried about data going onto the Gateway.
• Pooling all our data on the CDW is not sustainable; issues over reliability and restrictive t&c's make uptake unworkable. It would be useful to have some form of automatic sync mechanism, but this is not as yet a priority for us.
• The current CDW (i.e. the main iRecord warehouse) is a black hole as far as most data for our county is concerned, because we are the local coordinator for so many taxa groups and act as the single verification hub for them on behalf of national schemes. Our most active data flows are (with the usual complications): records go to us (via spreadsheet, email, paper, Rodis online recording, local referee) -> final collation by LRC -> verified by local referee -> NBNG and NSS. The CDW stands outside this dataflow and I cannot work out how to incorporate it within our current resources.
• The problem is who owns the records on iRecord. I might verify records within the LRC area but that doesn't mean they belong to us and that we can dictate what happens to them. The problem with all data being stored in the CDW, and even on the NBN, is obtaining permissions to do this.
• We would never surrender control of our data by moving it to the CDW. The top copy of our data must remain at the LRC. Linking databases is a technological challenge that would require all nodes to be maintained by trained personnel, and is therefore probably impractical to implement.
• Where is the option “not workable and not preferred”? The tools within Recorder 6 need to be improved to speed up export to the NBN.


From phone interview with WBRC:
• Storing a larger proportion of data in a ‘cloud-based’ central data warehouse could become a viable option in future, but for this to work a number of conditions would need to be in place, including:
  o Needs to be easy to make data available at local level, both for WBRC work and for local verifiers to use – reliable and fast access to online data is not yet a reality for all recorders/verifiers, especially in rural areas, and for the foreseeable future there will still be a need to download data for local use.
  o There needs to be recognition of the role of LRCs in making data available to NBN – this is currently seen as one of the performance measures for LRCs, but if LRCs are to encourage greater use of online systems and a central data warehouse it will not be possible to track the number of records that are “supplied by” the LRC.
  o Needs to be appropriate control over access to sensitive records, e.g. protected species.
  o There are concerns that if local authorities see a national central data warehouse as the place to obtain data, they will not support local LRCs, even though local LRCs play a vital role in supporting the network of recorders that are providing the data to the central data warehouse. A strong steer will need to be given to local authorities to shift them from measuring the number of records that they ‘own’ locally.

Selected NSS comments (anonymised):
• A big problem is the lack of trust in NBN from recorders and verifiers. Its code of conduct is being abused by some consultants and organisations in order to save money, but because there is no way of policing it they can get away with it. Recorders feel their goodwill and volunteering is being taken advantage of. I sometimes slow down my submission of data so that NBN is never as up-to-date as local sources. I am seriously considering supplying my records with a "not to be passed on to NBN" rider.
• A major problem here is that LRC data are (with about 6 exceptions) riddled with errors. This is because they do not have an expert who can check data for this particular taxon group satisfactorily. I have checked and incorporated data from about 10 LRCs but then ran out of steam. I think it is desirable that recorders should be directly challenged about unlikely records and encouraged to send doubtful specimens to experts. Field data from quadrats or from most ecological consultants are mostly not well enough identified to be used for national recording.
• Data needs to be radically simplified between the local and national levels.
• Data that I verify I would want to hold a copy of in my own database. I would want to submit these data to my central scheme database and for them to submit to NBN (with appropriate controls on spatial detail). Having the CDW data also going to NBN has the advantage of speed onto the NBN possibly, but the duplicate data may be problematic.
• I am of the opinion that it is vital that all data from a CDW be sent to the recording scheme for incorporation into a single dataset. This would make for easier management of the verified data and allow tasks such as phenology studies to be carried out more easily with a single dataset.
• I don’t like the fact that the NBN gives out full data to people and agree with my national scheme not supplying data to the NBN at present.
• I have grave reservations about the free availability of data, because there are too many ecological consultancies who disregard the rules governing the use of NBN data. Such actions endanger the financial viability of record centres, which are already vulnerable to closure and need to be supported by the consultancy industry.
• I haven't a clue what a central data warehouse is. The vital thing is all records MUST come through me as part of the flow chart. If they don't, the records are useless, unverified and don't mean a thing.
• My overall preferred route is observer > county recorder (i.e. verifier) > wherever. I'm not really sure where the "central data warehouse" fits in at all.
• Need a single stop shop for verification and queries that is within the NBN structure and that merely tips its contents once released, directly into the mainstream NBN once or even twice a year.
• One of my bugbears is the number of other organisations (e.g. Wildlife Trust) & national recording schemes that I have to send data to separately. I wish I could upload batches of records once to a CDW & then individual national recording schemes could download records that they are interested in.
• Our scheme generally does not depend upon a central data warehouse. I extract iSpot data every day and check every record because there is the potential for a lot of duff records. As far as I know we have never taken records from iRecord. We get our data direct from a network of recorders that we spend a lot of time engaging with - contact with individuals by e-mail, contact via websites and via Facebook. As far as I can see we rarely get data from any CDW - although we do access data from Records Centres on an ad-hoc basis when we make a call for records.
• Some of our recorders and verifiers do not want unverified records to be publicly available via the NBN Gateway or via any other portals driven by web services from the central NBN database - even if the records are flagged as unverified. Some feel this could undermine the scheme’s reputation, and want to withhold unverified records even from statutory agencies, only releasing them when they have been verified. Like many national recording schemes, we cover the UK and Ireland, so a central database needs to store and disseminate our entire dataset.
• The flow from national scheme and thence to the NBN is OK as far as I (a local recorder) am concerned, so long as records are made available before the following season. The two issues I have are (a) with receiving data from many different sources in different formats and (b) the local database and national database being separate, meaning that I cannot see what has been logged on the national database for my county from other schemes - which means that I cannot verify these records or take them into account when analysing county data.
• The issue is ownership. My data are BSBI data not NBN.
• The main problem in recording of a difficult group is always the presence of invalid data in places that they shouldn't be. Once a record is confirmed, there is generally no reason for that record not to enter the NBN immediately, in my opinion. The larger problem is confirming the data *already* in the NBN, and making sure that records don't enter the system as 'confirmed' from unreliable routes (many local record centres, for example, do not check groups in which they have no local expertise, and these records should therefore not be considered 'confirmed').
• Centralising the data is only one part of the issue - holding good data locally is also very important and the data should flow in both directions.

Phone interview with data manager, National Moth Recording Scheme:
• NMRS has put a lot of time and resources into liaison with the CMR network, and it is essential for the continuing success of NMRS that good relations are maintained with this network. Current dataflow for NMRS is geared to all data going through the relevant CMR before being passed to NMRS for national collation. This model is explicitly built in to the existing data policy of NMRS and has been the subject of extensive negotiation in order to achieve agreement among all CMRs. There are no plans at the moment to change the NMRS data policy.
• So for the foreseeable future option 2 is the only one that is practicable for NMRS – data arriving in a central data warehouse needs to be downloadable for CMRs to incorporate into their existing systems, and then supply back to NMRS.
• In the long term it may be that CMRs will wish to consider other options as online systems become more established, but there are a number of factors that need to be borne in mind, including the availability of reliable internet access and familiarity with online technology among the network of CMRs, which encompasses a wide variety of skills, experience and enthusiasm for online technologies.

TVERC phone interview:
• Download from the central data warehouse (and other systems) is needed to make data available for TVERC’s search/analysis/reporting, but datasets can be excluded from TVERC’s subsequent upload to the Gateway if that dataset is being uploaded by another route.
• However, if LRCs are increasingly allowing BRC or NSS to upload data to the Gateway direct from the central data warehouse, recognition needs to be given that the number of records uploaded by the ERC may decrease, even if the records originate from ERC initiatives such as survey projects or digitisation of data from consultants’ reports.

WBRC phone interview:
• Currently the best route is for data to be collated onto the recording scheme database and uploaded to NBN from there.
• In future it could be possible for data from the central data warehouse to go straight to NBN, but in order for that to happen NBN would have to be able to show clearly which data had been verified by the recording scheme. Data would still need to be held locally by the scheme for analysis and reporting purposes, unless the point is reached whereby full access to all relevant data can easily be obtained from NBN (e.g. to easily query the Gateway to report all scheme-approved records within any given polygon).


Section 4 – sharing unverified data

This three-part question sought responders’ views on the utility of sharing unverified data and, if it were undertaken, how best it could be achieved.

We asked “What benefits do you see from sharing unverified data?” This question was intended to pick out the benefits of sharing unverified records, with any concerns being addressed in question 4.2, but strong feelings against the sharing of unverified data were also expressed in the answers.

Degree of benefit from sharing unverified data, as indicated by comments from responders in three categories (All: n=89; LRC: n=20; NSS: n=65):

                 All responders   LRC responders   NSS responders
Beneficial       29%              45%              25%
Some benefits    20%              25%              20%
No benefits      44%              25%              48%
Damaging         7%               5%               8%

Same data with responses summarised further, as simply beneficial or not beneficial:

                 All responders   LRC responders   NSS responders
Beneficial       49%              70%              45%
No benefits      51%              30%              55%

Selected (anonymised) comments from people who saw sharing of unverified data as beneficial:
• We would not be able to function as a records centre without sharing unverified records, and data availability to planners and other organisations would be severely constrained. [LRC – compare with the first comment, below, of those who do not see sharing of unverified data as useful]
• It is not necessary to verify every record for common species and it is a waste of verifiers’ time to ask them to. Provided there is an agreed list of species that require verification there is nothing wrong with sharing unverified data. [LRC]
• [Beneficial for] national users (e.g. Natural England) who don't use our normal enquiry service can access the data and we could grant to others if we chose. [LRC]
• Benefits: the ability to verify it; the possibility to use my own filters to choose what I use; early warning systems; rapid feedback to recorders making faults; I'm sure there are many more. [LRC]
• It speeds the process up considerably and enables the records to be used in an appropriate manner quickly. This could be especially important for new discoveries. [NSS]
• Much the same benefits that sharing verified data has. We must accept that not all the 'verified' records on NBN are correct. Better knowledge of species distribution, monitoring change etc. can still be provided by using this 'lower grade' data - we just have to take it with a bigger pinch of salt. The critical aspect of using the unverified records is where the data needs to get onto the system quickly - if we can continue to edit it once it has reached the CDW then I don't see it as a problem. [NSS]
• Depends if you mean totally unverified, or if records have gone through automated checks. If the latter, generally useful for most (if not necessarily all) purposes. If totally unverified, again it entirely depends on the use to which it is proposed the data are to be put. If I want to know where a key estuarine roost of waders is, and a multi-million pound development is riding on it, then I need to be sure about the data. However, if I'm producing a broad pattern of Swallow migration from 100s of thousands of records, then I'm less concerned about the veracity of every single record. Just depends. [NSS]

Selected (anonymised) comments from people who saw sharing of unverified data as not beneficial:
• We would not consider sharing unverified data. As an LRC we have a responsibility to ensure that data we supply to sources available to the public are at all times of the absolute highest quality. [LRC – compare with first comment in previous section]
• Disastrous. Undermines the credibility of the whole system. [NSS]
• None. Making potential rubbish available is no way forward. [NSS]
• None. I am very much against unverified data getting beyond the initial local verification. Our scheme has suffered horribly from erroneous data being included in the database. Sometimes this relates to old, and now unverifiable, records, but some sources which are supplying records are riddled with errors and the checking and verification of these data wastes much valuable time. [NSS]
• Unverified records SHOULD NEVER be released. [LRC]


• I am against unverified data being made public on NBN Gateway. However there needs to be a way of passing this data to local schemes without it appearing in the public domain. [NSS]
• None. At the level of the NBN, until records are verified they should not be presented or mapped online. Identification errors, particularly those from inexperienced recorders, are common and once displayed on "the internet" may be very hard to "remove" as people may continue to refer to them long after they have been verified and discounted. The NBN needs to be seen as an authoritative source of data. To me this means no unverified data is displayed - if there is a recognised recording scheme then all data for the group should pass through them. [NSS]
• None. Unverified records are not data, they are rumours! [NSS]

We then asked “What concerns do you have over sharing unverified data?” The results highlighted a similar range of issues to those already reported for question 4.1; a few additional comments are reported here.

Selected comments from the 96 responses to question 4.2:
• Caveats regarding data quality may not be stated explicitly enough or ignored. [NSS]
• Depends what is meant by 'sharing' - if they are just visible to be seen but they are clearly marked as unverified then it could be ok. But if they are mixed with verified data then they will pollute it. [NSS]
• Erroneous records leading to lack of trust in scheme. [NSS]
• Errors appearing and being difficult to remove. People believing that anything published is correct and totally believable. People repeating erroneous data and using it in places / publications where it cannot be corrected or removed later, so the error becomes fact by default. [NSS]
• Errors arising in planning decisions. Loss of credibility of recording scheme. [NSS]
• I am concerned that people don't check dataset name/metadata and so are not aware data has not been verified. I am also concerned that it adds to the arguments by (some) NSS that LRC data should never be used!
• I am uncomfortable at sharing data sets that are not robust. In particular, when this is being used by consultants or by national policy decision makers.
• I feel a sense of pride that the data I provide through the recording scheme has known quality and I would be concerned that the message conveyed by sharing unverified data would be damaging. I also wonder whether it would slow down verification and feedback to recorders because there wouldn't be such a clear end goal. [NSS]
• It is problematic, professional consultants tend to use the data to a high level and so unverified data can cause undue survey work and concerns over the presence of a possible protected species when not necessary. [LRC]
• May cause errors in scientific research. Dilute integrity of other bona fide records where unverified records are incorrect/false. Red herring records may lead to a waste of conservation organisations' time.
• Records become accepted as OK and become relied upon. Data users include many careless and/or untrained people. [NSS]
• That errors will give cause to further LRC-bashing by the national schemes and societies. As very few of the data [managed] belong to the LRC, concerns are to do with reputation. [LRC]
• The reliability of the data is a big concern, and could cause problems but any data is better than none. Main concern is that data partners do not heed the information we supply over data quality, and that this could lead to valid data being questioned or discounted by end users. [LRC]
• Unverified data is completely useless. The ability of many people to identify unusual species is dreadful. [NSS]
• We have seen on numerous occasions that consultants are using data from the Gateway for commercial purposes. Whether they should be or not is irrelevant, it is a very dangerous situation to have unverified data available in a format where it has the potential to be used in a way. To mis-inform people by having an inaccurate record or records on the Gateway has the potential to result in very costly mistakes for the end users. For example, a student may download a dataset as part of a PhD study and form conclusions based purely on unverified data which later transpires to be wrong. If this data is then linked back to the LRC it portrays us in a very bad light as, regardless of the rights and wrongs and clauses of data use, this is just the sort of situation that could destroy our hard earned reputation for quality and accuracy overnight. [LRC]

Although this question specifically asked about ‘concerns’, a few responders gave reasons for the benefits, e.g.:
• [No concerns] at all, as long as the verification status is clear. In reality, there are plenty of "verified" records that are wrong. [LRC]
• None - provided the records are clearly marked as unverified and clear information is given as to what it means if a record is 'verified' or 'unverified'. [NSS]


We asked “What mechanisms would need to be in place to ensure that unverified records could be shared safely?” As for the previous questions, a substantial proportion of the 90 responders used this question to repeat that unverified records should not be shared, but among those who suggested mechanisms there was a fair degree of agreement (a minimal sketch of combining status flagging with restricted access follows this list):
• Easy method for verifiers to review and make judgments on unverified records
• Clear flagging of the status of unverified records (but many said that this would be too easy to ignore if the records were shown alongside verified records, even if using a different symbol)
• Restricting unverified record access to certain users
• Default position should be that only verified data is visible on the Gateway – unverified data would not be visible unless specifically asked for
• Unverified records to be stored in a completely separate dataset
• Educate data users on verification status and how to use data appropriately
• Provide clear information to users as to what standards and procedures were used to verify records
• Ensure that there was a permanent identifier propagated from the source database that enables the original to be found and corrected
• Gateway users would have to apply for enhanced access asking for access to unverified data. The dataset administrators could then decide who could see unverified data. The data could then be subject to terms and conditions of its use, including clearly stating in any products or documents that the information is unverified. All other users would only see verified data.
• There could be some kind of prompt in place when data was downloaded that stated unverified data has been included.
• If you want to put unverified data on the Gateway then you must ensure that it is locked down only to those capable of making an informed conclusion as to the likely accuracy of that data, e.g. County Recorders and appointed recording group members.
• Unverified records could be displayed at higher taxon levels, e.g. “unidentified bat”, rather than at species level.
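To make two of the suggested mechanisms concrete (clear status flagging plus restricted access to unverified records), the sketch below uses a deliberately simplified record structure. The status values, user roles and field names are assumptions for illustration only, not the actual CDW or NBN Gateway implementation.

    # Sketch only: unverified records are flagged and only returned to users
    # who have explicitly been granted access to them.
    from dataclasses import dataclass

    @dataclass
    class Record:
        taxon: str
        grid_ref: str
        date: str
        status: str  # assumed values: "verified", "unverified", "rejected"

    def records_visible_to(records, can_see_unverified=False):
        """Return the records a user may see; rejected records are never shown.

        Unverified records are only included when the user has been granted
        (and has requested) access, and each record keeps its status flag so
        any output can state clearly that unverified data is included.
        """
        visible = []
        for rec in records:
            if rec.status == "verified":
                visible.append(rec)
            elif rec.status == "unverified" and can_see_unverified:
                visible.append(rec)
        return visible

    # Example: a default Gateway user sees only the verified record, while a
    # user with enhanced access (e.g. a statutory agency) sees both.
    sample = [Record("Harmonia axyridis", "SP5106", "2013-06-01", "verified"),
              Record("Vespa velutina", "SU4211", "2013-07-15", "unverified")]
    print(len(records_visible_to(sample)))                           # 1
    print(len(records_visible_to(sample, can_see_unverified=True)))  # 2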

WBRC phone interview:
• A record is not a record until verified. Danger that allowing unverified records through immediately could flood distribution maps, especially for more popular groups.
• May be a role for allowing unverified data through to particular users who ‘need to know’, e.g. govt agencies and LRCs, but would need good access controls and clear flagging that the records were unverified.

From phone interview with MBA data managers:
• On the issue of feeding unverified records to NBN there were concerns. There would be a danger that people would just look at dots on maps and not discriminate between verified and unverified records, even if verification flags were stored in the underlying data (“if it’s shown on a map people believe it”). However, it is recognised by MBA that certain users (e.g. the statutory agencies) would benefit from immediate access to unverified data (especially in the case of invasive species, for example), so one option would be to build in to the Gateway access controls the ability to set who could see unverified data, and/or the ability to allow Gateway users to only see unverified data if they specifically request it (in order to highlight to the user the fact that the data is unverified, and reduce the risk of users assuming that if they can see a dot it is a confirmed record).

From phone interview with BSBI data managers:
• Making unverified data available: BSBI aims to supply verified data to NBN, and has concerns over reputational issues if it was seen to be supplying unverified data. Volunteer VCRs might also see this as having a negative impact on their own reputation, and maybe an implication that they have been tardy in carrying out verification, leading to them feeling under pressure to do more than they are able to take on. Any change to this position would need to be thought through and debated with VCRs. BSBI would prefer to focus on speeding up the supply of verified data from its datasets to NBN, while recognising that there may be value in allowing unverified data from other sources to be visible on NBN (with suitable safeguards to ensure that the verification status of the data was clearly flagged).


Section 5 – online verification tools

We asked “If you have made use of the NBN’s "rulesets" as part of a verification process (either within the NBN Record Cleaner or via iRecord), have you found that the rules work well, to flag up those records that might need further attention?”

Responses were received from 34 people who had some experience of using the rulesets. In summary, their response to whether the rules worked well was:
• Yes = 12 (35%)
• Partially = 13 (38%)
• No = 9 (26%)

Many of the criticisms were to do with the limitations of using national rulesets, which by their nature cannot take into account local variation in phenology, distribution etc.; and with the fact that species distributions can change rapidly, reflecting either real changes or increased recorder effort, so that the rulesets can go out of date quite rapidly.
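To make the nature of these criticisms concrete, the sketch below illustrates the kind of phenology and known-distribution checks the rulesets apply. It is not the actual NBN Record Cleaner rule format, and the example flight period and 10km squares are invented for illustration; it also makes explicit the point raised in the comments below that "no rule available" should be reported differently from "passed the rules".

    # Sketch only: flag records for attention using simple phenology and
    # known-distribution rules (rule values below are invented examples).
    from datetime import date

    FLIGHT_PERIOD = {  # hypothetical: (first month, last month) adults expected
        "Anax imperator": (5, 9),
    }
    KNOWN_HECTADS = {  # hypothetical: 10km squares with previous verified records
        "Anax imperator": {"SP50", "SP51", "SU49"},
    }

    def check_record(taxon, record_date: date, hectad):
        """Return a list of warnings; an empty list means nothing was flagged."""
        warnings = []
        period = FLIGHT_PERIOD.get(taxon)
        if period is None:
            # distinct from "passed": there was simply no rule to apply
            warnings.append("no phenology rule available for this taxon")
        elif not (period[0] <= record_date.month <= period[1]):
            warnings.append("outside expected flight period - check date/identification")
        known = KNOWN_HECTADS.get(taxon)
        if known is not None and hectad not in known:
            warnings.append("no previous verified record in this 10km square")
        return warnings

    print(check_record("Anax imperator", date(2013, 11, 2), "NT27"))

A national rule table like the one above is exactly what respondents found too coarse: flight periods and known squares vary regionally and change over time, which is why locally maintained or data-driven rules were suggested.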

Selected comments:
• A weak rule as far as Odonata are concerned is the one on flight periods. This is too restrictive and doesn’t take account of the wide variation in flight periods across the country and in different years.
• Some rules need to change as the data improves, e.g. in Scotland emergence dates can be very different from other parts of the UK, or better recording has shown that some species are much more widespread than originally thought; changes in distribution due to climate change etc. can change local rarity dramatically.
• Note that [for our scheme] the rules are currently under local control by county recorders who can change them at any time they see fit. In future, we will be moving to a mixed model of this, combined with some data-driven rule generation.
• Ideally they will need regular updating to keep up with changing species distributions.
• At present if there is no rule for a given species the iRecord verification system implies that the record is fine, which is not necessarily the case. A green tick should be used when a record has passed the verification rule-set, and another symbol used when there was no rule-set to apply.
• I would like to see a finer level of gradation in the difficult-to-identify aspect.
• I do not have faith in [the rules] getting it right and end up looking through all the data anyway. So to use the Record Cleaner is just another job.
• The species flight time checking may be appropriate for the southern portion of the UK but only adds to the verification process in the north where we have fewer generations of some species and differing flight periods due to climate differences. I have stopped using it as it takes longer to do that than checking records as they come in.
• Too many false positives, particularly from common species.
• If you used a fine scale, interpolated probability map on which to validate each species you would get far fewer false positives.
• Problems with species dictionaries (e.g. in MapMate) not matching the names used in the NBN Record Cleaner.

From phone interview with BSBI data managers:
• The development and updating of rulesets for species-rich groups such as vascular plants is time-consuming and may require funding to make progress. Local rulesets can be much more precise in terms of checking against known distribution, but alternatively the use of checks against existing verified datasets may enable distributional checking at a variety of scales, automatically updated as new data becomes available.
• Verifiers still need to be alert to the possibility of locational errors in records though, whether as a result of imprecise use of online maps, or errors in GPS locations for device-driven locations. Locational information such as very precise details of location (‘by the third fence-post from the oak tree’ etc.) and micro-habitat data is hard to automate, and recorders will still benefit from guidance from experienced recorders regardless of the technology used.

We asked “Do online verification tools such as those implemented in iRecord enable you to carry out any checks that are difficult, or not possible, within the systems you use already?” Only 14 people provided responses that addressed this question, of which nine said that the online tools did allow them to carry out checks more easily than their current system and five said they did not.

We asked “Are there verification tools that you currently use or procedures that you carry out that are missing from the tools within iRecord?” Of the 23 people who answered this question, 10 responded that they could not think of any tools that were currently missing. Suggestions for additions to the online tools were:
• A grid reference search/finding and input/correction capability, so that if a recorder has supplied the wrong grid reference but the place location is known the verifier can quickly find the correct grid reference and input this. This saves returning the record back to the recorder if it is not necessary. (A simple grid-reference checking sketch follows this list.)
• Supplementary information (enabled in iRecord but rarely used) such as habitat, host plant, ID work used.
• I only have access to records that fall within my geographical area on iRecord, so I have no way of knowing if there are records that have been given the wrong grid reference etc. [The general issue here being that if someone sends a batch of records to a county verifier and says they are for that county, then the verifier can quickly tell if a grid ref falls outside their county, whereas if the data is entered online a wrong grid ref will simply result in the data being filtered to another county, where it may not be at all apparent that it has ended up in the wrong place.]
• I would like to know if a record is a first 10km or county record even if within range - so I can provide feedback to recorders.
• Our current spreadsheet reports use our database to say if records are new to the county/tetrad/hectad. To do this on iRecord would require accessing our data, which could be done via NBNG web services but would miss out some data we do not make publicly viewable for sensitivity reasons etc. (About half our data is public on NBN and half not.) We also add designations - notable species etc. - on request, but I assume you can do that in iRecord too.
• The problem with iRecord is not the tools but the information provided for verification and the poor/frustrating interface/navigation, which makes checking records harder than it should be.
• We currently rely fully on local experts to vet all records; without all of these individuals using iRecord (which is unlikely) any full switch to a new system may be counterproductive for us, with this bypassing of the local resource.
• iRecord needs to be able to set up verifiers by ID difficulty of the species, i.e. some people only have access to verify levels 1-2 (easy), others can do 1-3 (bit harder) and a few can do 1-5 (very difficult).
• Yes - checking altitudes and vice-counties.
• Yes - we use a standard set of codes to record the method used to identify the species. This is critical information when assessing the reliability of the species identification. We hope to incorporate these codes into our own iRecord form and would like to encourage recorders to adopt these codes as widely as possible, as they have been developed to cover all eventualities and to provide the information needed to assess the reliability of the species identification, and should allow verification to be automated to a large extent. If NBN, iRecord or the BRC can help encourage the use of these codes that would be a huge benefit for bat recording.
• Yes - having written a set of rules, I was annoyed to find that it was not possible to have separate sets for adults and larvae.
• You can't beat knowing the recorders and their abilities. This is something that most LRC managers acquire over time; they regularly make judgements based on their knowledge of individual recorders in a way that cannot easily be automated without publicly scoring people's abilities. This was tried at one time in Recorder but has not been continued, perhaps because it is a sensitive issue. An automated system also has problems adjusting to the way people's abilities change over time (both improving and declining). It should be possible for recorders to post a brief biography that would help verifiers judge if they could id some taxa.
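A minimal sketch of the grid-reference checking mentioned above: parse an OS (OSGB) grid reference to an easting and northing and flag references that fall outside an expected area. The letter-pair conversion follows the standard OSGB convention; the bounding box is a made-up example, not a real county or vice-county boundary.

    # Sketch only: convert an OS grid reference to metres and flag refs
    # outside an expected area.
    def osgb_to_easting_northing(gridref):
        """Convert e.g. 'SP510061' to metre-precision (easting, northing)."""
        gr = gridref.replace(" ", "").upper()
        letters, digits = gr[:2], gr[2:]
        if len(digits) % 2 != 0 or not digits.isdigit():
            raise ValueError("grid reference digits must be an even-length number")

        def index(letter):  # A-Z with 'I' omitted, as in the OS letter grid
            i = ord(letter) - ord("A")
            return i - 1 if letter > "I" else i

        l1, l2 = index(letters[0]), index(letters[1])
        e100km = ((l1 - 2) % 5) * 5 + (l2 % 5)     # 100km-square easting index
        n100km = (19 - (l1 // 5) * 5) - (l2 // 5)  # 100km-square northing index
        half = len(digits) // 2
        # pad to metres so SP51 and SP510061 are comparable
        easting = e100km * 100000 + int(digits[:half].ljust(5, "0"))
        northing = n100km * 100000 + int(digits[half:].ljust(5, "0"))
        return easting, northing

    # Hypothetical bounding box for a verifier's area (not a real boundary)
    AREA = {"min_e": 440000, "max_e": 480000, "min_n": 190000, "max_n": 240000}

    def outside_area(gridref, area=AREA):
        e, n = osgb_to_easting_northing(gridref)
        return not (area["min_e"] <= e <= area["max_e"] and
                    area["min_n"] <= n <= area["max_n"])

    print(outside_area("SP510061"))  # False: inside the example box
    print(outside_area("NT2570"))    # True: well outside it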

We asked “Would you find it helpful to be shown more contextual information about the records and recorders?” Responses came from 90 people, and there was a high degree of support for most of the options suggested.

Suggested option                                                                         Number agreeing
Number of records recorder has previously submitted for the current taxon group         66
Recorder's biography (including details of which identification guides they use,
  whether they keep voucher specimens etc.)                                              65
Whether species previously recorded from 10km square (or other unit)                    65
Number of previous records for species in vice-county (or other unit)                   64
Recorder's self-assessment of their skill level for the current taxon group             53
Recorder's attendance at relevant training courses                                      29

Additional suggestions from those wanting more information on recorders:
• [Recorders’] experience of that geographic area
• ID qualifications of recorders
• Number of records previously rejected for that recorder, for this and other recording schemes
• Opinions of [recorders from] other "experts"
• Protected species licence holder
• Verification of ability by someone such as county recorder, or number of previous records already verified elsewhere
• Would data protection be an issue if details of individuals are made more public?

Two people raised concerns that knowing too much could actually be unhelpful:

I would be wary of providing information which could bias a verifier one way or another. Method of species ID would be essential, as noted previously.

The most important thing is for the system not to validate [automatically correct?] their records for date/range etc. This is really the only way that I can assess the capability of the recorder (by how many errors they make).

From phone interview with MBA data managers:
The major issue discussed here was the need to extend the range of verification flags that can be used, and to ensure that all verifiers are using them consistently. For example, where a record is accompanied by a photo it is often possible to fully verify it; if no photo is provided, there needs to be an option to mark the record as “no reason to doubt” or similar, in particular for records from people without an established track record in making correct identifications.
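As an illustration of how an extended flag vocabulary might be represented, the sketch below (in Python) uses hypothetical status names and a simple rule of thumb for choosing between them; it is not the flag set currently implemented in iRecord.

    from enum import Enum

    class VerificationStatus(Enum):
        # Hypothetical extended flag set, not the current iRecord statuses.
        ACCEPTED_CORRECT = "accepted - correct"                # evidence (photo/specimen) checked
        ACCEPTED_PLAUSIBLE = "accepted - no reason to doubt"   # no evidence, but nothing to question
        QUERIED = "queried"
        REJECTED = "rejected"
        PENDING = "pending"

    def suggest_status(has_evidence: bool, known_reliable_recorder: bool) -> VerificationStatus:
        """Suggest a flag for a record that raises no other concerns (illustrative only)."""
        if has_evidence:
            return VerificationStatus.ACCEPTED_CORRECT
        if known_reliable_recorder:
            return VerificationStatus.ACCEPTED_PLAUSIBLE
        return VerificationStatus.PENDING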

From TVERC phone interview:
People working with data from the data warehouse need to have a way of communicating validation errors back to someone who can pursue them, or else be able to get editing rights to correct them. E.g. incorrect grid refs or site names may be picked up by LRCs more readily than by NSS verifiers, but it is unlikely that anyone within the LRC will have verification rights for the taxa in question, and there isn’t a clear route for reporting such validation concerns.

From phone interview with volunteer verifier for WBRC:
Making decisions on records from recorders you don’t know is a “percentage game” – how likely does a record have to be before it can be verified? Would benefit from consensus/guidance among recorders on how to make consistent decisions.

Section 6 – multiple sources and verifiers

We asked “Do you agree that in principle it would be beneficial to have as many ‘front-end’ wildlife recording websites and apps as possible feeding their data into a central data warehouse for verification in one place?” An opinion was expressed by 93 responders:

                        All responders (n=93)   LRC responders (n=23)   NSS responders (n=65)
Yes                     35%                     26%                     38%
Yes with reservations   25%                     22%                     26%
No                      31%                     48%                     26%
Unsure                  9%                      4%                      9%

A majority thought that the central data warehouse model was a good idea in principle, but the margin is not large, and many reservations were expressed. There was less support for this model among LRCs than among NSS.

Selected comments from responders who supported the principle:

In principle - yes. At present I get the impression that "everybody" has an online or smartphone based recording system available. In some instances I know a certain project or recording effort is generating records for my target groups but getting hold of the data is near enough impossible - the records are made and some eventually appear on the NBN, but they do not get sent by default to the appropriate National or County recording scheme. [NSS]

Reducing the number of places that I need to go for verification would be beneficial. Ability to make contact with the recorders is essential though. [NSS]

Very much so - as a verifier this would enable me to use one place. [NSS]


Yes definitely. I currently look after the records database for a local group and this is shared with the local records centre which in turn shares this data with other conservation bodies. Data is also shared with Natural England and the national scheme which in turn potentially goes to the NBN and therefore if all data was simply shared within a central database then the records could be accessed by the appropriate people without the need for bulk files being sent annually and having to filter records which is time-consuming. [NSS]

Yes. Or perhaps, slightly broader - I agree in principle that a verifier should ideally be able to go to a single portal to undertake their verification role. Not entirely necessary that all verifiers go to the same portal. [NSS]

In principle a central data warehouse is worth striving for, and has the potential to gather a greater proportion of new records into one place where they can be dealt with more quickly. Ease of contact with original recorder is an important part of the process. [NSS]

Selected comments from responders who supported the principle but with reservations:

I would prefer that as much as possible control of the data remains with the national schemes and is verified in their systems before being passed to the NBN or equivalent. Web based recording systems where recorders input their records and where they are verified online are the way forward [I take this to mean that the web systems should be specific to each national scheme.] [NSS]

In principle but this should NOT ever replace the local expertise available in LRCs. LRCs are the best and most relevant unit for records collection and collation and verification (usually supported by local experts and volunteers). Most recorders are driven and motivated by recording at a local level; removing this local motivation will drastically reduce the volume of data collected by volunteers. [NSS]

In principle it is a logical system but in reality while there are multiple systems (iRecord and Living Record for instance) and those who prefer not to use internet based resources it's unlikely to happen. [LRC]

In theory holding data in a centralised database would increase efficiency, and it is always better to be able to access a larger amount of data. However, in practice our organisation receives a significant revenue stream from the collection and use of data generated by its national monitoring schemes. Given the value of this data to us, decisions about where the data is stored and who would have access to it are not straightforward. It is not clear how the central data warehouse concept would affect our ability to attract funding, ensure the correct interpretation of our data, and whether either of these factors could lessen our influence as a voice for conservation. However we very much support data sharing and would be open minded to any developments. [NSS]

In theory this would be beneficial, however the following concerns would need to be addressed: Loss of control of the data once in the CDW, for example whether this data is then made freely available with no restrictions; Other mechanisms for verification may already be in place elsewhere and work efficiently; Not all county recorders are willing/able to access on-line tools; Funding for maintenance of this CDW is no longer available - what happens to all the data? [ERC]

Perhaps in an ideal world, the next best thing of several hubs, which exchange data frequently is quite feasible. [NSS]

Within reason. The more systems the more possible problems. I think CDW will create many problems in the sense that contact is lost with recorders and one has to take a great deal on face value. It is important to get a feel for the level of caution exercised by recorders (and in many cases that is lacking). [NSS]

Yes, provided that the warehouse was sufficiently well structured and organised as to make it practically manageable. This warehouse should be regarded as a temporary destination, though - confirmed records should be removed and exported to the relevant databases, or it will lead to confusion and the potential for people to start using that warehouse as a database in its own right. [NSS]

From phone interview with BSBI data managers:
For data that is already ‘in the system’ and reaching VCRs the online tools probably don’t make much difference – all VCRs have systems in place for record verification. However, a move to online recording/verification may help bring records to the VCRs that they would not otherwise have had the opportunity to see, and prevent so many records bypassing BSBI verification before arriving on NBN. If online tools can be made to work efficiently this may allow existing verifiers to verify more records more quickly, but this will depend on a number of things, including the quality of the data that arrives via online systems, and the amount of communication and feedback that will be required across a wider range of recorders. Providing feedback to recorders is seen as important by BSBI, and helps develop their skills and interests to carry on recording, but is time-consuming and not very rewarding if the recorders are only reporting casual records rather than continuing to develop their skills over time. Another area that can be more time-consuming when dealing online with recorders that are not directly in touch with the VCR is the checking of location accuracy – even records generated from GPS data (e.g. via mobile phones) can be mapped in the wrong place (e.g. depending on GPS accuracy).

BSBI recognise that although from their point of view it is desirable to have all the data that is ‘owned’ or ‘approved’ by BSBI available in the DDb, this does need to exist in parallel with other systems, e.g. where LRCs also have a need to maintain ‘ownership’ of datasets. In principle a single centralised data warehouse that is shared through a number of ‘conceptual subsets’ could meet the needs of a variety of data users, but in practice there are political and practical considerations that make this difficult, e.g. maintaining direct communications with recorders who ‘belong’ to the BSBI recording community, ensuring that data exchange agreements are appropriate for the uses that BSBI recorders expect, and ensuring that data seen as associated with BSBI is of an appropriate data quality and maintains BSBI’s reputation as an authoritative source.

Given these difficulties it is likely that in the short to medium term at least, BSBI will need to maintain its own database, but there is much that can be done to ease the process of sharing data with others, e.g. developing mechanisms for sharing records IDs and verification flags between different systems, agreement over standards such as verification terminology, and using web services to make data available remotely. A single central data warehouse for all data remains an aspiration for the future.
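As a rough illustration of the kind of system-to-system exchange described above, the sketch below shows how one system might push verification decisions to another over a web service. The endpoint URL, field names and API-key header are invented for illustration and do not describe an existing BSBI, iRecord or NBN interface.

    import json
    import urllib.request

    def push_verification_decisions(decisions, endpoint_url, api_key):
        """POST a batch of verification decisions to another system's web service.

        Each decision is keyed by a record ID shared between the two systems, e.g.
        {"record_id": "irecord:123456", "status": "verified",
         "verifier": "J. Smith (BSBI VCR)", "date": "2014-02-01"}.
        """
        payload = json.dumps({"verifications": decisions}).encode("utf-8")
        request = urllib.request.Request(
            endpoint_url,
            data=payload,
            headers={"Content-Type": "application/json", "X-Api-Key": api_key},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read().decode("utf-8"))

The essential ingredients are a record ID shared between the two systems and an agreed vocabulary of verification statuses, so that a decision made in one place can be attached unambiguously to the same record held elsewhere.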

Selected comments from responders who did not support the principle:

I am doubtful about the benefits of this model because established recorders already have their own systems in place and these produce a product which is of satisfactory quality. [NSS]

Depends whether you mean nationally or locally. We do not believe in a central national data warehouse as this undermines all the work we are trying to do locally and our relationships with our local recorders. Also it is not acceptable to us to have our data stored on somebody else's server. Tools such as iRecord are undermining local record centres' initiatives. Locally I agree with the statement and this is what we are trying to do. [LRC]

Why can't existing recording schemes send records to vc recorders? [NSS]

In an ideal world I think a central data warehouse concept is a great idea. However the reality is quite different; data providers want someone who has ultimate control over the system. The person controlling the system needs to be in a position to remove and replace data, fix and make significant improvements to the system almost constantly. Even at an LRC level with the relatively few recorders, recording groups and initiatives we work alongside we are kept fully busy. If this was expanded to a national level it would result in demand outstripping resource very quickly indeed. So no, I just don't feel this is the right way forward. [LRC]

I think it dilutes data down and leaves it spread across so many different systems that the likelihood of mass duplication of data as people add it to as many systems as possible simply to avoid the risk of things being missed. [LRC]

This takes the control of the records away from the recorders and the LRCs. How can you be sure that you can always access the records? What happens if the funding runs out for the central data warehouse? Who would be in charge of it? [LRC]

What if funding runs out for CDW? What if CDW is offline/kaput? Lack of control over who gets the data, and takes control away from LRCs who know best. What if recorder wants to remove data from CDW because they're not happy how it's used? What if unreliable data is used locally in planning decisions, etc.? LRCs know their area, their fauna and flora and the local verifiers (who they can and cannot trust) and have built up personal relationships. This is a threat to the concept of LRCs.

No. Individuals should keep their own records and submit them to scheme organisers in a suitable format. I've seen online recording in action (or inaction) and it's just awful. A scheme organiser's knowledge of individual recorders is just as important as their knowledge of species or geographical areas and I don't believe that online recording allows for this. I would not accept records without direct contact with a recorder. [NSS]

No. The objective should instead be to have as many verified records reach the NBN Gateway as possible. (i.e. it should be the 'CDW'). Apps and websites can route the data by whatever means they wish as long as they eventually reach the Gateway as verified data. The CDW is only one option to achieve this. [LRC]

No. As an LRC manager the issue of sustainability is critical. Our funders - local authorities and agencies particularly - do not give us grants. They pay for a service, and their procurement means that if the data is available apparently from a centralised source they will not continue to fund the local records centre. It would be next to impossible to explain the need for an LRC if the records were all going to a centralised system and funding would be near impossible to come by, even if this meant the data available to national schemes and other national users as a result was of a lower quality and there was less of it (in terms of taxonomic and geographic coverage not just number of records). As the local node of the NBN, our LRC currently is the one place where verification happens for many groups. At most it is one of two places (e.g. members’ data goes to the BSBI vice-county recorder whereas citizen science and professional survey data coming to the LRC and our plant referee). We actively contact national experts when needed, but they don’t need to look at every record that comes in. At the LRC we are happy to use other organisations’ verified data “as is” and it doesn’t matter what system it had been collected/shared by, as long as we are confident that the organisation is managing the data well including verifying it. We cannot have this confidence in a central data warehouse. It needs to be clear on any recording app or website who the recorder is giving custodianship to. This is not clear with a central data warehouse. Tasks belonging to everyone belong to no one, and that is as true for verification as it is for any other aspect of biodiversity data management. Whose warehouse is it?

We asked “How should a central data warehouse deal with the potential for multiple verifications being added to a given record?” and asked responders to rank three options:
1. only one verification is kept for each record, national verifier takes precedence over local verifier
2. both verifications are recorded and others can decide which to rely on
3. both verifications are recorded and one is shown as the ‘accepted verification’

Responses were received from 93 people:

                                                  Option 1: national verifier   Option 2: both recorded,         Option 3: both recorded, one
                                                  takes precedence              others decide which to rely on   shown as ‘accepted verification’
Number as "best available solution" (score 1)     14                            26                               45
Number as "could work but not ideal" (score 2)    27                            29                               30
Number as "undesirable" (score 3)                 47                            23                               16
Average score                                     2.36                          2.03                             1.64

There was little variation between LRC and NSS responses.
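The most favoured option, in which every verification event is retained but one is flagged as the ‘accepted verification’, maps naturally onto a simple data structure. The sketch below is illustrative only and does not describe how iRecord or any partner database actually stores determinations.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Verification:
        verifier: str      # e.g. "A. Example (county recorder, VC37)"
        scheme: str        # scheme or records centre the verifier acts for
        status: str        # e.g. "verified", "queried", "rejected"
        date: str          # date of the decision
        comment: str = ""  # reasoning, especially for trickier species

    @dataclass
    class RecordVerifications:
        record_id: int
        verifications: List[Verification] = field(default_factory=list)
        accepted_index: Optional[int] = None  # which event is the 'accepted verification'

        def add(self, event: Verification, accept: bool = False) -> None:
            """Append a verification event; earlier decisions are kept as an audit trail."""
            self.verifications.append(event)
            if accept or self.accepted_index is None:
                self.accepted_index = len(self.verifications) - 1

        @property
        def accepted(self) -> Optional[Verification]:
            return None if self.accepted_index is None else self.verifications[self.accepted_index]

Keeping every event, much as multiple determination labels are kept on a specimen, preserves the audit trail that several responders asked for, while the single ‘accepted’ pointer gives downstream users one decision to rely on.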

We asked “Do you have other suggestions for tracking, and choosing between, multiple verifications?”

A range of suggestions were given, with several emphasising that if conflicting verification decisions have been recorded then the record should either be treated as unverified, or should be treated as a verified record at the lowest agreed taxonomic level (e.g. genus or family).

If conflicting identifications are given by verifiers, the record should be regarded as contentious, and not confirmed. Interaction between the verifiers should be enabled to allow discussion; if agreement is reached, then the identification can be counted as verified. [NSS]

I think any system which allows conflicting verifications to be included would be unhelpful. If there is a conflict of opinions then surely this becomes an 'unverified' record? [NSS]

At the end of the day, one "entity" should be responsible for deciding the validity of a record - ie a National or County level Recording Scheme. How this entity manages that internally need not concern the NBN. The CDW should only be used as a hub to manage the flow of incoming data from multiple sources and to facilitate multiple verifiers working on the data. It should not directly display data on the NBN. [NSS]

Firstly, one verifier needs to be able to withdraw their decision. If I verified a record and then an expert disagreed with me I would withdraw my decision, unless I had some additional evidence to provide. It's not an issue of 'national' vs 'local' it's a question of who's got more experience and expertise. [NSS]

If two verifiers disagree and neither is willing to withdraw their decision and both have examined all available evidence, the record should be changed to whatever higher taxonomic level they do both agree on. The audit trail of the two possible determinations should be kept as part of the record as this could help target future recording effort, or in case further evidence becomes available in future. But for mapping and reporting purposes, the record should be at the higher taxonomic level. [NSS]

For our taxon group and county we are the verifiers and no one else. The accuracy of the data and publication from it is our responsibility. I don’t want some "other" verifier doing it. [NSS]

I do agree with the first statement in that there should be an order of precedence, but equally it is very important that a 'paper trail' exists and previous verifications are not overwritten. [NSS]

I suggest that all verification 'incidents' are valuable and should be retained (much like multiple det labels on a specimen) but that in cases of disagreement, one national verifier should take precedence and determine the name that is immediately viewable by users. [NSS]


It is essential for multiple verifications that both the person's name and the status (scheme in which [they act] as verifier, sole or joint verifier, date of verification) are specified for each verification. [NSS]

It is important that records which can be assessed at a local level are assessed at that level, with assistance from national experts as requested. [LRC]

It's a nightmare if there is more than one verifier. A record is either correct or incorrect. If the recorder doesn't like the decision, I use our records committee or get help from other experts. [NSS]

It's a tricky question, and one I'm wrestling with a little at present. Two broad versions of the same issue: 1) multiple people have "rights" to verify a record for a given taxon group for a given geographical area at a particular point in time. 2) For the national vs local, it is generally accepted that national has precedence in bird recording - but this tends to be about vagrants, not rare residents. [NSS]

Local verification is preferred as this can usually better take into account local knowledge about the species, the site from which the species was recorded and the recorder's ability. I think a national recorder should have the ability to "over-rule" or qualify a locally verified record (though in most cases I am sure that most county recorders/local verifiers would defer to the judgement of a national verifier). [NSS]

Local verifier probably has more local knowledge than national verifier about a particular site and what species do/could possibly occur there. If national verifier has a query over a record they should contact local verifier in other ways. [NSS]

Metadata (date, reason, verifier) should be maintained for each verification (and subsequent re-verifications). [NSS]

Need to add comments for verification of trickier species. These species should not be verified by ticking a box only. A verification with a clarification comment is likely to take precedence over one without. At present there are too many ticks and not enough comments on iRecord. [LRC]

The final say is by a verifier and there is only one for any individual record. The job would be split into defined domains (eg a taxonomic group, a county or other area of land). Others expressing an opinion are given a different name and their views being called 'opinions' or similar. [NSS]

The local expert should normally be given precedence over the national one as local experts know the local habitats and likelihood of correct ID better than someone based 100's of miles away. Of course there are exceptions to this for example where a specimen needs study under a microscope by someone who is an expert in a certain group/species. This is for me one of the key issues with trying to speed up and "automate" certain processes. It takes away the level of human interaction that is fundamentally required to make the best decisions as to who should or shouldn't be validating and verifying certain records as this is often something that needs input from local people and the Local Records Centre. [LRC]

This needs proper in-depth research by a team of IT specialists and biologists. [NSS]

Verification standards need to be developed in order for all verifiers to provide a consistent approach across the board. There is a need for more robust data that stands up to scrutiny, and the same will be true for the verification of such data and the methods used to reach a decision. If verification follows set methods then there will be little need for more than one level of verification because they will reach the same conclusion. [ERC]

Where a species group has a national recording scheme with approved local representatives, those local representatives should be the primary verifier for that species group in their own area / county. Wherever possible, the local representative for a national scheme should also seek to be the relevant expert for the nearest Local Record Centre as well. [NSS]

From phone interview with verifier via TVERC:
There needs to be control over granting verifier status, in conjunction with NSS, to ‘approve’ verifiers; it would be useful to allow admin rights to NSS national verifiers so that they could allocate verification roles to others, e.g. at county level.

From phone interview with Simon Wood (WBRC):
Main requirement where disagreements arise is for verifiers to be able to communicate and hopefully reach consensus. Important to recognise that while national scheme verifiers are taxonomic experts, they are unlikely to have the detailed knowledge of local sites and local species patterns that county-based verifiers can demonstrate.

From phone interview with MBA data managers:
Qualified MBA staff verify most records and refer to national experts if required. There are usually no more than 2-3 specialists for each taxon group. There is no formal process but verifiers are recruited due to known expertise; if a wider range of verifiers are involved it may be necessary to assign them to categories relating to the species ID difficulty score, i.e. some verifiers may only be able to verify up to a certain level. May be useful to see a verifier profile if interacting with verifiers that are not already known to MBA. Some marine ID skills relate to habitat more than taxon group.

Multiple verifications have not been a problem so far for MBA; however, this issue could arise, as could the situation where a particular verifier is making decisions that seem to be wrong or not up to the standard expected. It would be advisable to develop a procedure for reporting and dealing with such conflicts, and perhaps being more explicit about what ‘qualifications’ are needed to be registered as a verifier on iRecord. Might we need an accreditation system for verifiers? Could some sort of identification test or quiz be used to provide metrics on verifier ability? Or could tracking disagreements between verifiers produce such metrics in the longer term?
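Two of the ideas raised here, restricting verification rights by identification difficulty and deriving simple metrics from verifier agreement, could be prototyped along the lines sketched below. This is illustrative only: the 1-5 difficulty scale follows the NBN Record Cleaner convention, but iRecord does not currently assign verifiers a maximum difficulty level, and the agreement metric is an invented example.

    def can_verify(verifier_max_difficulty: int, species_difficulty: int) -> bool:
        """Allow verification only up to the verifier's maximum ID difficulty (1 easy ... 5 very difficult)."""
        return species_difficulty <= verifier_max_difficulty

    def agreement_rate(decisions: dict, reference: dict):
        """Proportion of a verifier's decisions that match a reference determination
        on the same record (e.g. a later expert review). Both arguments map
        record_id -> status; returns None if there are no shared records."""
        shared = set(decisions) & set(reference)
        if not shared:
            return None
        agreed = sum(1 for record_id in shared if decisions[record_id] == reference[record_id])
        return agreed / len(shared)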

From phone interview with BSBI data managers:
It would be useful to be able to distinguish the verifier type (i.e. VCR/referee/BSBI central) for future tracking; for verifiers external to BSBI it would be useful to have information on the verifier and which scheme/centre they are associated with.

Our final question was “Options for how you as a verifier could interact with the data arriving in the central data warehouse (CDW): please indicate your preferred option/s”. Many people acting as verifiers for records need to have access to those records in order to analyse them, provide reports (e.g. feedback to recorders, reports for planning applications), and ensure that they can access as complete a dataset as possible for their taxonomic and geographic interests. In the longer-term this could in principle be achieved entirely online, with verifiers being able to draw on the data they need from a single central warehouse, but in the shorter-term verifiers are likely to need to download data into their existing systems for analysis purposes. This question aimed to gather information on the preferred option for handling such a download.

The four options presented were:
1. Data arriving via CDW is verified online, records (with verification decisions attached) could subsequently be downloaded to verifier’s system
2. Data arriving via CDW is downloaded to verifier’s own system, verification takes place offline, but verifier returns to CDW to add ‘verified’, ‘dubious’ or ‘rejected’ flags to CDW records
3. Data arriving via CDW is downloaded to verifier’s own system, verification takes place offline, but system allows verification decisions to be automatically uploaded back to CDW
4. Data arriving via CDW is downloaded to verifier’s own system, verification takes place offline, verification not passed back to CDW

Very few responders saw option 4 as a good solution, preferring to pass verification decisions back to the CDW. There was less agreement about the best way to do this. Overall option 3 was the most popular, with offline verification taking place and then the decisions being uploaded back to the CDW. But LRC responders preferred option 1, verifying online direct into the CDW and then downloading the data with the verification decisions attached (which is closest to the model currently adopted by iRecord). Option 1 was the second choice for NSS responders.


All responses (n=84):

                                          Verify online -   Download - verify offline -   Download - verify offline -   Download - verify -
                                          download          add flags online              auto-update online            no flags online
Number as "best solution" (score 1)       31%               21%                           31%                           5%
Number as "workable solution" (score 2)   29%               38%                           43%                           4%
Number as "not helpful" (score 3)         30%               30%                           18%                           75%
Average score                             1.99              2.09                          1.86                          2.84

LRC responses (n=20):

                                          Verify online -   Download - verify offline -   Download - verify offline -   Download - verify -
                                          download          add flags online              auto-update online            no flags online
Number as "best solution" (score 1)       35%               30%                           5%                            10%
Number as "workable solution" (score 2)   25%               25%                           70%                           5%
Number as "not helpful" (score 3)         25%               35%                           15%                           65%
Average score                             1.88              2.06                          2.11                          2.69

NSS responses (n=60):

                                          Verify online -   Download - verify offline -   Download - verify offline -   Download - verify -
                                          download          add flags online              auto-update online            no flags online
Number as "best solution" (score 1)       30%               18%                           37%                           3%
Number as "workable solution" (score 2)   30%               43%                           37%                           3%
Number as "not helpful" (score 3)         32%               28%                           18%                           78%
Average score                             2.02              2.11                          1.80                          2.88

The questionnaire did not provide a separate comments box for this question, but three responders added comments via the comments box to the previous question.

There is no comments box for question 6.4. What is this 'downloading to my own system' stuff about? I don't want this! This is what we have now. We're trying to move forwards not backwards!!! I like the first option, but without the '...could subsequently be downloaded to verifier's own system' bit at the end. I want to keep the top copy of the data in the central NBN database, but I want a web services driven front end with all the reporting functionality of Marine Recorder that I and other marine specialists can use, while those with a more general interest can access the data via the NBN Gateway or other portals driven by web services, e.g. a Wildlife Trust website, Marine Conservation Society website, Local Records Centre website etc. [NSS]

I haven't a clue what 6.4 is all about. It's far too complicated. Basically records should be sent to me, I verify them, and they are sent onwards rather than back to the "CDW". I'm a bottleneck through which all records MUST pass in order to be of any use. If any sort of decisions are based on unverified records on the gateway then it's a very sad affair. [NSS]

With regard to 6.4, verification would have to be very time efficient. Currently all records are sent to me in the correct spreadsheet format for MapMate import. I add a column to the far left "verified" and quickly copy yes down all records which I verify and leaving a blank for any queries. Unverified records are cut out before import. I cannot imagine verifying on line in such a quick and efficient way. The thought of having to make an entry 36,000 times (1 for each record) a year is unappealing and quite frankly probably impossible for me to undertake. [NSS]
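The offline round trip preferred by many responders, including the bulk spreadsheet workflow described in the last comment, could in principle be supported by a simple export and re-import cycle such as the sketch below. The column names and CSV format are assumptions made for illustration rather than an existing iRecord export; the decisions collected at the end would then be passed back to the CDW, for example via a web-service call of the kind sketched earlier.

    import csv

    CSV_FIELDS = ["record_id", "taxon", "date", "grid_ref", "recorder", "decision"]

    def export_pending(records, path):
        """Write pending records (an iterable of dicts keyed by CSV_FIELDS names)
        to a spreadsheet for the verifier to work through offline."""
        with open(path, "w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=CSV_FIELDS)
            writer.writeheader()
            for rec in records:
                row = {name: rec.get(name, "") for name in CSV_FIELDS}
                row["decision"] = ""  # verifier fills in e.g. 'verified' or 'rejected', or leaves blank
                writer.writerow(row)

    def read_decisions(path):
        """Read the completed spreadsheet back, collecting decisions for bulk upload to the CDW."""
        with open(path, newline="", encoding="utf-8") as f:
            return [{"record_id": row["record_id"], "status": row["decision"].strip()}
                    for row in csv.DictReader(f) if row["decision"].strip()]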


Potential for increased efficiencies

From phone interview with TVERC:
Central data warehouse could decrease efficiency because records get stuck – a lot of data can be self-verified by experienced recorders.
Scope for increased efficiency of data sharing between LRCs and NSS.

From phone interview with Berks Reptile and Amphibian Group, and BWARS, via TVERC:
More efficiency gains to be made for popular groups where identification expertise is widespread. The bigger potential gain is to have more data arriving in the warehouse and thus made available to the recording scheme in one location. Verification is not a bottleneck for more specialised groups where numbers of records are relatively low.

From phone interview with WBRC:
For WBRC, verification is not a major bottleneck at the moment and doesn’t constrain data supply, although there are concerns that it may be difficult to replace the current generation of verifiers. Lack of active field recorders is more of an issue.

A central data warehouse approach does have the potential to increase efficiency by bringing together datasets from new projects into one place, rather than fragmenting them across multiple websites and apps.

Data gathered by consultants is one area where efficiency gains could be made in future – currently such data ends up in reports from which it can be difficult to extract the biological records. If consultants were encouraged to deposit data to a central data warehouse this could bring new datasets into play much more quickly.

Giving LRCs and NSS a profile within the CDW

TVERC phone interview:
Recorders and other data users need to know both the name of the verifier and the organisation for whom they are acting as verifier, both to give recognition to them and also to ensure that the recorder and anyone using the data can see who has been involved in the verification process.

WBRC: Badge system would work well, giving link to scheme website.


Appendix 2: data supporting metrics for performance of the iRecord data warehouse and verification system

1. LRC data verification

TVERC dataset: 3304 records imported; 1716 (53%) verified or rejected: 1714 (52%) verified, 2 (<1%) rejected.

taxon_group                          record_status   count
acarine (Acari)                      Pending         8
amphibian                            Verified        3
bird                                 Pending         52
centipede                            Pending         35
Centipedes & Millipedes              Pending         6
crustacean                           Pending         88
flowering plant                      Pending         2
fungus                               Pending         3
harvestman (Opiliones)               Pending         10
insect - beetle (Coleoptera)         Pending         700
insect - butterfly                   Pending         256
insect - dragonfly (Odonata)         Pending         188
insect - dragonfly (Odonata)         Verified        2
insect - earwig (Dermaptera)         Verified        12
insect - hymenopteran                Verified        500
insect - lacewing (Neuroptera)       Pending         6
insect - moth                        Pending         61
insect - orthopteran                 Verified        146
insect - scorpion fly (Mecoptera)    Pending         1
insect - true bug (Hemiptera)        Pending         34
insect - true bug (Hemiptera)        Verified        68
insect - true fly (Diptera)          Pending         68
insect - true fly (Diptera)          Verified        757
millipede                            Pending         41
mollusc                              Pending         1
mollusc                              Rejected        2
mollusc                              Verified        198
reptile                              Verified        6
spider (Araneae)                     Pending         28
terrestrial mammal                   Verified        22

WBRC dataset: 48398 records imported. 15149 (31%) verified. 1 rejected (<1%). 15 dubious (<1%).

taxon_group                          record_status   count
insect - beetle (Coleoptera)         Pending         4804
insect - cockroach (Dictyoptera)     Verified        8
insect - earwig (Dermaptera)         Verified        479
insect - hymenopteran                Queried         3
insect - hymenopteran                Verified        11326
insect - orthopteran                 Verified        3334
insect - true fly (Diptera)          Pending         28429
insect - true fly (Diptera)          Queried         12
insect - true fly (Diptera)          Rejected        1
insect - true fly (Diptera)          Verified        2


2. iSpot data verification

46736 records imported; 2074 (4.4%) verified, queried or rejected: 1927 (4.1%) verified, 81 (0.1%) queried, 66 (0.1%) rejected.

taxon_group                            record_status   count
alga                                   Pending         68
alga                                   Queried         6
alga                                   Rejected        2
alga                                   Verified        16
annelid                                Pending         102
bony fish (Actinopterygii)             Pending         234
cartilaginous fish (Chondrichthyes)    Pending         51
chromist                               Pending         124
chromist                               Queried         2
chromist                               Rejected        3
chromist                               Verified        45
clubmoss                               Pending         32
coelenterate (=cnidarian)              Pending         196
coelenterate (=cnidarian)              Queried         1
coelenterate (=cnidarian)              Verified        1
conifer                                Pending         311
conifer                                Queried         1
conifer                                Verified        10
crustacean                             Pending         379
crustacean                             Verified        9
echinoderm                             Pending         105
echinoderm                             Verified        1
fern                                   Pending         444
fern                                   Verified        4
Ferns & Horsetails                     Pending         57
flowering plant                        Pending         28565
flowering plant                        Queried         28
flowering plant                        Rejected        30
flowering plant                        Verified        891
ginkgo                                 Pending         7
horsetail                              Pending         93
insect - moth                          Pending         13565
insect - moth                          Queried         43
insect - moth                          Rejected        29
insect - moth                          Verified        875
marine mammal                          Pending         129
marine mammal                          Rejected        1
marine mammal                          Verified        64
mollusc                                Pending         149
mollusc                                Rejected        1
mollusc                                Verified        11
sponge (Porifera)                      Pending         27
tunicate (Urochordata)                 Pending         24

3. Verification efficiency

website_id   Website               N        N (verified)   % verified   Mean days   Range (min - max)
23           iRecord               270823   43523          16%          78          (0 - 1346)
8            NatureSpot            40849    32209          79%          9           (0 - 421)
14           NMAP                  12467    9974           80%          96          (0 - 938)
29           CeDaR                 9269     5829           63%          45          (0 - 946)
17           Nature Locator        6446     5181           80%          2           (0 - 255)
16           National Moth Night   45291    4781           11%          129         (0 - 625)
34           YNU                   24863    2555           10%          35          (1 - 340)
31           ERCCIS                2372     1756           74%          49          (0 - 843)
24           BWARS                 2016     1600           79%          5           (1 - 574)
3            NNSS                  1770     1218           69%          435         (0 - 151)
41           SEWBReCord            4841     934            19%          11          (0 - 585)
10           ORS                   1205     860            71%          16          (0 - 381)
11           BDS                   7146     636            9%           255         (1 - 89)
13           Wild Flower Counts    60816    334            1%           114         (0 - 339)
7            NBIS                  5428     251            5%           196         (0 - 324)
25           Herts flora           294      230            78%          51          (0 - 186)
5            Ephemeroptera         132      52             39%          101         (0 - 124)
43           York University       176      6              3%           18          (3 - 56)

Overall the mean number of days between record submission and verification is 82 days, but the median time is 8 days and the mode is 0 days.
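For reference, the calculation behind these figures is straightforward; the sketch below uses invented example dates (not project data) to compute the mean, median and modal delay between submission and verification.

    from datetime import date
    from statistics import mean, median, mode

    def verification_delays(records):
        """Days between submission and verification for each verified record.
        Records are assumed to carry 'submitted_on' and 'verified_on' dates."""
        return [(r["verified_on"] - r["submitted_on"]).days
                for r in records if r.get("verified_on")]

    # Invented example data:
    sample = [
        {"submitted_on": date(2013, 6, 1), "verified_on": date(2013, 6, 1)},
        {"submitted_on": date(2013, 6, 2), "verified_on": date(2013, 6, 2)},
        {"submitted_on": date(2013, 6, 1), "verified_on": date(2013, 6, 9)},
        {"submitted_on": date(2013, 6, 1), "verified_on": None},
    ]
    delays = verification_delays(sample)
    print(mean(delays), median(delays), mode(delays))  # mean approx. 2.67, median 0, mode 0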

4. Benefits to users

Taxon_group                            Pending   Verified   Queried   Rejected   Total
insect - butterfly                     265326    1305       3         5          266639
flowering plant                        128281    8927       107       625        147638
insect - moth                          98563     21290      183       125        120161
insect - true fly (Diptera)            40364     2170       31        62         42627
bird                                   27352     2505       5         30         29892
insect - hymenopteran                  7783      18723      250       528        27284
insect - beetle (Coleoptera)           17962     7862       194       854        26872
terrestrial mammal                     1045      15330      45        228        16648
Birds                                  1559      9873       4         40         11476
Moths                                  0         8873       0         42         8915
Sciuridia - Squirrels                  5997      2397       0         0          8394
Dragonflies & Damselflies              5582      647        7         32         6268
insect - true bug (Hemiptera)          1929      3780       28        45         5782
Wildflowers                            0         5159       0         45         5204
mollusc                                3048      1737       21        25         4831
insect - orthopteran                   358       4427       0         10         4795
insect - dragonfly (Odonata)           2981      529        6         41         3557
crustacean                             2844      80         1         1          2926
fungus                                 2428      155        2         0          2585
lichen                                 2024      91         0         0          2115
Plants                                 957       977        16        128        2078
Beetles                                0         2033       0         39         2072
spider (Araneae)                       2009      45         2         0          2056
unassigned                             1825      0          0         0          1825
fern                                   1563      62         1         0          1626
Ferns & Horsetails                     1105      168        0         1          1553
amphibian                              1436      61         1         2          1500
moss                                   1343      107        0         2          1452
Butterflies                            2         1267       0         1          1270
Bugs                                   0         1247       0         10         1257
Fungi                                  0         1234       0         19         1253
conifer                                986       37         3         0          1101
Flies - other                          0         1057       2         17         1076
Dragonflies                            0         961        0         3          964
Spiders, Harvestmen & Mites            0         878        1         27         906
Habitat                                902       0          0         0          902
Trees, Shrubs & Climbers               19        871        0         8          898
annelid                                861       7          0         0          868
Bees, Wasps, Ants                      1         851        1         12         865
acarine (Acari)                        836       1          0         0          837
insect - earwig (Dermaptera)           153       649        0         1          803
Hoverflies                             2         705        2         8          717
Mammals                                86        589        0         1          676
Insects                                595       65         0         6          666
Animals                                636       0          4         3          643
bony fish (Actinopterygii)             556       60         0         11         627
reptile                                531       37         1         0          569
alga                                   336       221        8         2          567
Orthoptera                             78        477        0         2          557
Cetaceans                              555       0          0         0          555
Slugs & Snails                         2         548        0         3          553
chromist                               291       240        2         7          540
Grasses, Rushes & Sedges               0         512        0         8          520
coelenterate (=cnidarian)              415       29         1         1          446


Taxon_group                            Pending   Verified   Queried   Rejected   Total
marine mammal                          249       190        4         2          445
millipede                              427       9          0         0          436
centipede                              404       4          0         0          408
Lichens                                0         394        0         0          394
Caddisflies                            0         384        0         7          391
Crustaceans                            144       202        2         6          354
liverwort                              309       19         0         0          328
Mosses & Liverworts                    55        269        0         0          324
insect - caddis fly (Trichoptera)      15        278        0         2          295
harvestman (Opiliones)                 260       23         0         0          283
Sawflies                               0         239        0         6          245
horsetail                              221       8          0         0          229
echinoderm                             213       10         0         0          223
springtail (Collembola)                213       2          1         0          216
slime mould                            198       2          0         0          200
Amphibians                             8         183        0         3          194
Unknown                                11        33         16        116        176
insect - mayfly (Ephemeroptera)        22        144        0         6          172
insect - booklouse (Psocoptera)        157       0          0         0          157
Lacewings & Scorpionflies              0         141        0         4          145
Centipedes & Millipedes                27        110        0         6          143
Grasshoppers & Crickets                0         138        0         3          141
insect - lacewing (Neuroptera)         128       0          0         0          128
insect - stonefly (Plecoptera)         18        104        0         0          122
Woodlice, Crustaceans                  3         115        0         0          118
insect - scorpion fly (Mecoptera)      102       0          0         1          103
Springtails & Bristletails             0         83         1         0          84
tunicate (Urochordata)                 59        20         1         1          81
sponge (Porifera)                      73        1          0         0          74
Reptiles                               5         62         0         0          67
bryozoan                               59        1          0         0          60
cartilaginous fish (Chondrichthyes)    58        1          0         0          59
flatworm (Turbellaria)                 42        4          0         0          46
Fish                                   0         43         0         1          44
clubmoss                               43        0          0         0          43
Worms                                  2         32         0         0          34
Earwigs                                1         30         0         2          33
insect - alderfly (Megaloptera)        31        0          0         0          31
insect - cockroach (Dictyoptera)       3         27         0         0          30
Barklice & Booklice                    0         29         0         0          29
insect - bristletail (Archaeognatha)   20        0          0         0          20
Mayflies                               0         18         0         0          18
Dermaptera                             0         17         0         0          17
insect - silverfish (Thysanura)        17        0          0         0          17
Slime Moulds                           0         17         0         0          17
Algae                                  0         17         0         0          17
comb jelly (Ctenophora)                13        1          0         0          14
insect - flea (Siphonaptera)           14        0          0         0          14
ginkgo                                 12        1          0         0          13
ribbon worm (Nemertinea)               10        0          0         0          10
insect - thrips (Thysanoptera)         10        0          0         0          10
Molluscs                               2         7          0         0          9
cartilagenous fish (Chondrichthyes)    6         3          0         0          9
stonewort                              4         2          0         1          7
Dictyoptera                            1         5          0         0          6
Stoneflies                             0         5          0         0          5
Ferns                                  1         3          0         1          5
bacterium                              5         0          0         0          5
scorpion                               5         0          0         0          5
false scorpion (Pseudoscorpiones)      5         0          0         0          5
pauropod                               4         0          0         0          4
rotifer                                4         0          0         0          4
roundworm (Nematoda)                   4         0          0         0          4
sea spider (Pycnogonida)               3         1          0         0          4
Silverfish                             0         4          0         0          4
insect - stick insect (Phasmida)       0         4          0         0          4


Taxon_group                            Pending   Verified   Queried   Rejected   Total
waterbear (Tardigrada)                 3         0          0         0          3
jawless fish (Agnatha)                 2         1          0         0          3
protozoan                              1         2          0         0          3
foraminiferan                          2         0          0         0          2
insect - stylops (Strepsiptera)        2         0          0         0          2
hornwort                               2         0          0         0          2
two-tailed bristletail (Diplura)       1         0          0         0          1
Flies, Gnats and Midges                0         1          0         0          1
diatom                                 1         0          0         0          1
Phasmida                               1         0          0         0          1
Fleas                                  0         1          0         0          1
insect - louse (Phthiraptera)          1         0          0         0          1
symphylan                              1         0          0         0          1
Thrips                                 0         1          0         0          1
Bees, Wasps. Ants                      0         1          0         0          1

5. Sources contributing records to the iRecord CDW

Source                                      Total no. of species   Total no. of occurrences
iRecord                                     12368                  273659
UKBMS                                       420                    266053
Plantlife Wildflower Count                  1207                   59109
Moth Night                                  1186                   45073
NatureSpot                                  4180                   40343
Axis2Poll                                   216                    21018
YNU                                         3322                   16594
Mammal Society                              69                     12793
CEDaR Recording                             2092                   9510
SEWBReCord                                  1724                   7352
British Dragonfly Society                   53                     7138
Open Farm Sunday                            20                     7132
Nature Locator                              15                     6422
Black Squirrel                              3                      5989
Norfolk Biodiversity Information Service    1628                   4292
PondNet                                     342                    3439
UK Ladybird Survey                          37                     3198
ERCCIS                                      595                    2649
Brampton Parish                             1040                   2533
BWARS                                       10                     2098
Corfe Mullen Nature Watch                   839                    2004
NNSS                                        26                     1765
Heathland Surveillance Network              268                    1676
Plant Surveillance Scheme                   367                    1476
Lincolnshire Environmental Records Centre   33                     1403
ORS                                         43                     1160
Sussex Moth Group                           175                    629
North East Cetacean Project                 7                      570
BRC outreach surveys                        6                      518
Hertfordshire Flora Group                   213                    294
York University                             69                     213
ecosurveydata.net                           121                    167
Ephemeroptera Recording Scheme              22                     131
Crayfish of the british isle                8                      122


Source                                      Total no. of species   Total no. of occurrences
Track a Tree                                14                     117
British Phycological Society                49                     89
SPLASH                                      12                     30
Pollinators and Predators                   5                      26
B.I.S.                                      15                     17
ICP Vegetation                              1                      13
GIGL                                        8                      10
BBCT                                        6                      10
BRERC                                       3                      3
Essex Wildlife Trust                        3                      3

6. Quality of data

Of 802,459 records in the CDW, to date 625,918 (78%) are from taxon groups for which NBN record cleaner rules are available, and are therefore checked against known distribution and categorised by difficulty of identification.

Data from iRecord has been verified by the national Soldierflies and Allies Recording Scheme, which found that the record cleaner ruleset for this group flagged up 65% of 2,540 contributed records. On investigation, only 1.8% of the records were judged to need querying or rejecting. For this recording scheme, among the data arriving via iRecord there were records for 15 Nationally Scarce and 9 Red Data Book species, with 16 new vice-county records and 434 new 10km-square records.
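As an indication of what these automated checks involve, the sketch below shows a much simplified, Record Cleaner-style test: flag a record if its 10km square falls outside the species' known range, or if the species has a high identification difficulty rating. The species, grid squares and ratings in the example are invented, and the real rulesets are considerably richer (covering, for example, recording period as well as distribution and difficulty).

    def check_record(record, known_squares, difficulty):
        """Simplified, Record Cleaner-style check (illustrative only).

        known_squares maps species name -> set of 10km squares with previous records;
        difficulty maps species name -> identification difficulty from 1 (easy) to 5.
        Returns a list of reasons why the record deserves closer attention.
        """
        flags = []
        if record["grid_10km"] not in known_squares.get(record["species"], set()):
            flags.append("outside known 10km-square range")
        if difficulty.get(record["species"], 1) >= 4:
            flags.append("high identification difficulty (%d)" % difficulty[record["species"]])
        return flags

    # Invented example values:
    print(check_record({"species": "Stratiomys potamida", "grid_10km": "SU79"},
                       {"Stratiomys potamida": {"SU78", "SU88"}},
                       {"Stratiomys potamida": 3}))
    # -> ['outside known 10km-square range']

As the soldierfly figures above illustrate, a flagged record is not necessarily wrong; the flags direct verifier attention to the records most worth checking.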

For further discussion see section 3.2.2 and section 3.4.3.

7. Efficiencies for data managers

The project steering group asked for a metric on how much time is currently spent managing the process of passing data to verifiers and integrating the comments back, and to what extent the online approach might reduce this amount of time. It has not been possible to compile data on this due to the wide variety of approaches currently taken to managing records for verification. Some NSS and LRCs only receive data into the scheme from their verifiers, so do not spend any additional time on transferring data. Others do spend time on transferring data, but for the purposes of this trial spent at least as much time on the process of getting volunteer verifiers registered on iRecord and familiarising them with the online system. And once data has been verified via iRecord, LRCs and NSS still have to tackle the issues highlighted in this report about how to manage online dataflows alongside their other dataflows.

Efficiencies should be achievable in future, as online systems become more familiar and more widely used, and assuming that an overall move towards online systems is matched by a reduction in or simplification of other data flows.

8. Efficiencies for verifiers

The project steering group asked for a metric on how much time is currently spent by verifiers on verifying data, and on how frequently records are verified (turnover time). Data on the current verification speed within iRecord is available, but we were not able to find comparable data for offline systems. From the responses received it is clear that the great variety of verification procedures in use across the multiple taxon groups results in many different approaches to this. Some verifiers verify records very quickly as soon as they are received; others find it more effective to do one annual verification check (e.g. for those groups where most fieldwork is done in summer, verifiers may prefer to do all verification in one go in winter). See discussion in section 3.3.1 and appendix 1 section 2.
