NERC Environmental ‘Omics Strategy (NEOMICS)

Date post: 04-Feb-2022
NERC Environmental ‘Omics Strategy

Final Report and Recommendations

October 2010

Report prepared by the NEOMICS Team following competitive tender to NERC

NEOMICS Team

Peter Kille (Lead author), Cardiff University
Dawn Field, CEH Wallingford
Mark Bailey, CEH Wallingford
Mark Blaxter, University of Edinburgh
Norman Morrison, University of Manchester
Jason Snape, AstraZeneca
Sarah Turner, CEH Wallingford
Mark Viant, University of Birmingham

In consultation with the NEOMICS Expert Working Group and Research Council Observers

Expert Working Group

Thomas R. Meagher (Chair), University of St Andrews
Ewan Birney, EBI
Terry Brown, University of Manchester
Roger Butlin, University of Sheffield
Melody Clark, British Antarctic Survey
Guy Cochrane, EBI
Tim Gant, MRC, University of Leicester
Jack Gilbert, Plymouth Marine Laboratory
Simon Hiscock, University of Bristol
Steven Paterson, University of Liverpool
James Prosser, University of Aberdeen
Jane Thomas-Oates, University of York
Nico van Straalen, VU University, Amsterdam
Charles Tyler, University of Exeter

Research Council Observers

Bill Eason, NERC
Sarah Collinge, NERC
Amanda Collis, BBSRC

NEOMICS

Contents P a g e | i

1 EXECUTIVE SUMMARY

It is recognised that there is an overriding need to develop a NERC ‘omics strategy that is both scalable and responsive to changes in technology and user demands to deliver world class environmental science with impact. The strategy should:

support and develop core facilities, services and competencies in ‘omics to provide a sustainable long-term National Capability (Recommendation 1);

recognise the on-going need to integrate, coordinate and prioritise the complex pattern of investments as well as manage the interface with other funders, in a cost-effective way that maintains the key link between environmental scientists from across the NERC remit and access to technologies (Recommendation 2);

support strategic Research Programme investment in ‘omics to address emerging challenges that deliver against NERC’s Strategy (Recommendation 3).

Priorities for implementation:

secure support for continuing function of NBAF and restore NEBC support to previous levels (Recommendation 1A & B);

establish a virtual Environmental ‘Omics Synthesis Centre (EOS) in order to coordinate, integrate and prioritise future investment (Recommendation 2A-C);

identify priorities for strategic research and for additional investment in core facilities, services and competencies (Recommendation 1C).

The recent development of ‘omics technologies, including genomic, transcriptomic, proteomic and metabolomic approaches, has provided us with the ability to characterise on a molecular level critical details of how organisms and communities respond to and interact with the environment. Such understanding is essential to meet the challenges of environmental change, reducing uncertainty and safeguarding sustainability, roles central to the NERC Strategy “Next Generation Science for Planet Earth” [1]. ‘Omics approaches are therefore a critical set of enabling technologies and skills for the delivery of NERC science.

Previous NERC investments through the Environmental Genomics [2] (EG) and Post Genomics and Proteomics [3] (PGP) programmes have developed capacity and resulted in an informed, internationally competitive environmental ‘omics community. NERC has ensured community access to a growing array of ‘omics technologies by investment in facilities and key skills through the NERC Environmental Bioinformatics Centre [4] (NEBC) and NERC Biomolecular Analysis Facility [5] (NBAF). These investments should be capitalised on (Recommendation 3A).

Strategic thematic research investment resulted in an initial significant increase in ‘omics-based Responsive Mode (RM) funded science. Following the impetus created by this targeted investment, there is some evidence of a recent levelling-off in the uptake of ‘omics approaches in RM. In direct contrast, evidence from the over-subscription of NBAF pilot projects, saturation of established and new NBAF nodes, together with community feedback, has identified both untapped application areas and novel opportunities for ‘omics to deliver against NERC’s remit. Notably, some areas of NERC science which would benefit most from ‘omics approaches include those where UK scientists are recognised international leaders, such as polar and climate science. Future strategic research investment should support ‘omics approaches in these areas (Recommendations 1D, 2A, 3B, 3C & 3E).

In order to maximise the opportunities offered by ‘omics approaches, a coherent strategy is required that is continually responsive to changes in user demand and emergent technologies, exploiting fully the unique strength of NERC, namely the breadth of its science portfolio together with the skill base, quality and commitment of the science community it supports (Recommendation 2). NERC needs to fully integrate ‘omics approaches at both the laboratory and PI levels (Recommendation 1D). The foundation for continued success in exploiting ‘omics is the sustained development of key National Capability (Recommendation 1) as well as strategic Research Programmes that focus advances onto NERC Strategy (Recommendation 3).

1 http://www.nerc.ac.uk/about/strategy/ngscience.asp

2 http://www.nerc.ac.uk/research/programmes/genomics/

3 http://www.nerc.ac.uk/research/programmes/proteomics/

4 http://nebc.nerc.ac.uk/

5 http://www.nerc.ac.uk/research/sites/facilities/details/mgf.asp

The delivery of National Capability through the distributed structure of NBAF, with infrastructure embedded within domain-specific research groups, has been particularly effective in improving the uptake of ‘omics approaches by NERC scientists, incorporating new technical advances and responding to community demands. Its success is reflected in the steady rise of Responsive Mode projects supported, most notably during the periods of EG/PGP investment, the continued demand for access associated with graduate training and the high level of community support received in the Service Review Group assessment (2010). The continued provision of leading edge ‘omics facilities to meet NERC’s specific needs is essential to maintain the effectiveness of UK environmental science (Recommendations 1A & 1C).

The growing and critical role played by informatics in processing, interpretation and integration of ‘omics data was recognised in the RCUK Large Facilities Roadmap, which highlighted the need for substantive informatics infrastructure to support environmental ‘omics science, proposing a £30M investment to establish an “Environmental Omics Bioinformatics Facility”. Although this level of investment is needed to support the environmental ‘omics community, it was concluded that a more cost effective and efficient model of delivery could be achieved through embedding core skills within the community, supported by a number of informatic knowledge hubs. In this context, particular concern was raised as to the current temporary nature of NEBC support, and provision for long term support should be addressed as a matter of priority (Recommendations 1B & 1C).

Cost effective delivery of NERC science may also, in part, be achieved by strategically tensioning its investments with other funders at a national and international level. NERC should, where appropriate, align support for ‘omics to benefit from the significant investments made by others (Recommendations 2C & 3D). For example, the two NBAF Next Generation Sequencing (NGS) nodes are co-located with the recently established regional MRC NGS nodes, with both organisations benefiting from synergies in tools and expertise. Other significant investments, such as BBSRC’s Genome Analysis Centre, are focused on a discrete suite of model organisms and, as with the expanding commercial sector, do not provide the diversity of domain-specific expertise required by the NERC community, highlighting the need for NERC to maintain and develop capability and capacity to meet its own specific requirements.

Similarly, NERC must seek to leverage its investments in ‘omics with international initiatives such as ELIXIR (Recommendation 2C). This EU programme is an emerging bioinformatics infrastructure comprising a trans-European “hub and node” network structure that identifies the environment as one of its core areas. However, ELIXIR provides no direct funding to support environment-based informatics. Instead, it relies on the establishment of independently funded national “nodes” that represent specific communities. This highlights the need for NERC to maintain its own ‘omics community, providing a strong, coordinated voice to enable it to play a leading role in such international initiatives.

The complex array of functions required to support ‘omics science requires overarching strategic coordination and integration to ensure effective delivery of NERC science (Recommendation 2). The speed of technical advance and the multifaceted nature of ‘omics have limited the level of detail this review has been able to deliver in some areas. The pace of change means that any ‘omics strategy must constantly evolve and refresh in response to user demands and rapid technological development. To ensure that the UK community can provide leadership, there needs to be a vehicle for proactive horizon scanning, exploiting teams with the appropriate interdisciplinary expertise to identify emerging strategic research opportunities and coordinate the development of novel applications where ‘omics has the potential to contribute but is presently under-utilised. Thus, NERC needs to develop mechanisms to apply our current collective knowledge to embed and coordinate ‘omics strategy within the wider NERC community. This should also provide the necessary leadership and directed scientific input to emerging business sectors, including the emerging industry in environmental technologies and services.

In making its recommendations, this review has recognised that there is an overriding need to develop a strategy that is cost effective, scalable, and responsive to changes in technology and user demands. The recommendations have therefore focussed on the immediate key strategic requirements, setting the overall direction of travel. Where further work is required (e.g. in identifying specific priorities for research) this is indicated.

2 SUMMARY OF CONCLUSIONS

Conclusion 1. ‘Omics is a highly dynamic field with new dimensions being incorporated to reflect new scientific understanding, increased technical capability and development of novel specialisms. This dynamism must be reflected in the NERC ‘omics strategy to ensure international leadership in environmental science.

Conclusion 2. The application of ‘omics technologies offers significant potential for wealth creation within a range of environmental science sectors.

Conclusion 3. Previous NERC Environmental Genomics and Post-Genomics and Proteomics thematic programmes established UK capacity and capability in selected areas of ‘omics technologies and during their tenure established NERC international leadership in this area.

Conclusion 4. Aspects of ‘omics science have been identified as core components of strategic challenge areas for both the NERC Technologies and Biodiversity Science Themes.

Conclusion 5. Analysis of Responsive Mode funding shows an increase in ‘omics awards during the previous NERC EG/PGP thematic programmes.

Conclusion 6. The value of research projects employing ‘omics (both funded and applied for) has recently (05-09) levelled off, but this does not incorporate the imminent impact of Next Generation Sequencing or projects utilizing new NBAF nodes. A portent of the potential increase that may arise is documented in Section 5.1.4.2.

Conclusion 7. There is widespread community recognition of the role played by NEBC in supporting the development of bioinformatics required to underpin NERC ‘omics.

Conclusion 8. The NEBC functions of data management, standards or long-term stewardship of collective data outputs are currently supported by an interim funding arrangement and, in order to sustain these operations long term, NEBC support should be restored to previous levels.

Conclusion 9. There is widespread community support for NBAF, both specifically in relation to service delivery and in the provision of essential infrastructure. The success of NBAF can also be significantly attributed to the distributed multi-nodal model and its provision of crucial domain specific expertise embedded within the NERC community.

Conclusion 10. The small NBAF next generation sequencing pilot project scheme has demonstrated a large untapped potential both of novel technical application as well as NERC science areas not currently exploiting ‘omics approaches.

Conclusion 11. A routine mechanism that is transparent to research grant applicants is required to support research lying at the interface between Research Council remits that is founded on the formally established RCUK cross council agreement.

Conclusion 12. The BBSRC investment in TGAC, focused around a single centre, contrasts with the distributed NERC NBAF model. TGAC has an inherent focus on a discrete suite of test systems and, as with the expanding commercial sector, does not provide the diversity of domain specific expertise required by the NERC community. There may be discrete opportunities to exploit this resource in areas where BBSRC and NERC share joint strategic objectives.

Conclusion 13. Application of ‘omics in specific sectors of the environmental portfolio provides fertile ground for industrial, legislator and end-user engagement, maximising the impact of NERC environmental science.

Conclusion 14. The EU ELIXIR represents the largest developing global bioinformatics initiative, comprising a trans-European network based on a node and hub structure. However, it provides no direct funding to support environment-based informatics; therefore, NERC support is required for a UK-based environmental node. ELIXIR will provide generic support for all ‘omics science, coordinating standards, providing a long term data repository and supporting generic tools development.

Conclusion 15. Commercial and governmental investment follows international academic excellence. Failure to identify and support visionary but “risky” research results in loss of future high profile academics, the resulting internationally leading research and any associated commercial benefit.

Conclusion 16. There is substantive precedent for significant added value being achieved through the establishment of coordinating “synthesis centres”, especially in multi-disciplinary areas that incorporate different experimental approaches.

Conclusion 17. ‘Omics can deliver in a spectrum of specific environmental areas, many mapping directly onto NERC priority areas.

Conclusion 18. Meta-‘omics, the analysis of communities / ecosystems, is identified as one of the applications with the greatest future potential for significant impact on NERC science.

Conclusion 19. Long term technical development for adaptation of the current ‘omics platforms for “field” deployment with remote telemetry data provision would address the environmental specific need for spatial and temporal monitoring, providing new insights for environmental science.

Conclusion 20. There is a need for an ‘omics user community that is competent and informed in the use of appropriate informatics and technological approaches, and who are directly associated with delivery of research projects. This should be supported by a pool of centralised tools developers with a remit to evaluate, develop and optimise novel informatics methodologies. These informatic knowledge hubs could be based at NBAF / NEBC or other national or international centres, i.e. ELIXIR or TGAC, in order to create and maintain critical mass.

Conclusion 21. The unprecedented developments in genomics will continue for the foreseeable future, delivering platforms with increased capacity, novel capabilities and lower cost. A second major aspect is the “democratization” of NGS platforms, with “bench top” versions likely to appear in the majority of research departments over the medium term (2-5 years).

Conclusion 22. Metabolomic platforms are becoming more sophisticated with increased throughput. Significant work is needed to identify and characterise the full spectrum of unique metabolites found in the natural world. Specialist areas of metabolomics are maturing as discrete research areas; these include lipidomics and glycomics, each with specific relevance to NERC science.

Conclusion 23. Developments in proteomics have increased its utility within the environment arena; however, funding to allow access to the technology is required, and use of the technology for environmental research is required to drive the refinement of methodologies and approaches for such sample sets, in order to see this approach fulfil its potential.

Conclusion 24. The current distributed model of ‘omics facilities and expertise embedded in different “nodes” or research groups in NBAF and NEBC is seen to be working well by the community (see also Conclusion 7). This provides a potentially more scalable and responsive model for delivery of ‘omics services than a new centre set out in the LFCF RCUK proposal.

Conclusion 25. A suite of functions can be defined that are required to ensure the effective delivery of NERC research through ‘omics approaches.

Conclusion 26. Coordination and integration of the various activities and functions of a NERC ‘omics strategy is needed to maximise delivery and ensure international leadership.

3 RECOMMENDATIONS

This report identifies key areas the strategy should support and the priorities for implementation.

3.1.1 Recommendation 1: National Capability: Core facilities, services and competencies

Ongoing provision of core infrastructure is the minimum requirement to ensure ‘omics approaches are accessible for grass roots science delivery. This requires support and development of core facilities, services and competencies in ‘omics in order to respond to changes in user demand and emergent technologies. In particular, this review recommends the following.

A. The existing functions and services currently provided to a very high standard by the NBAF model should be maintained, with services distributed and embedded within key user communities. This “distributed model” provides a cost-effective mechanism for the delivery of key facilities, services and skills to the NERC community, capable of responding to changes in user demand and technology development. It is further recommended that a more detailed evaluation of the need for NBAF to incorporate new ‘omics platforms (including in proteomics and lipidomics) be carried out. This could be part of the role of the proposed Environmental ‘Omics Synthesis centre (Recommendation 2).

B. The current transitional support for NEBC (provided to maintain function in bioinformatics and data management, archival and standards) should be placed on a longer term, sustainable basis, initially at levels comparable to those provided under the previous EG/PGP programmes. This would enable NEBC’s development in response to future challenges, including the development of new tools and increased data volumes. As with NBAF, the proposed Environmental ‘Omics Synthesis centre could play a key role in identifying future requirements for bioinformatics and data management.

C. The review of future requirements for core facilities and services should be an on-going process, responsive to changes in demand and technology. It should also consider how best to support such capabilities in the long-term, ensuring a funding model that addresses issues of both sustainability and cost-effectiveness.

D. Investment in training, in both laboratory skills and informatics, should be increased to assist researchers at all levels of career development, including support for PhD access to ‘omics facilities and the delivery of awareness and training courses to promote the understanding and uptake of ‘omics approaches, respectively, across the environmental science community.

The continued support of all NBAF nodes at their current level of funding over the next five years would require £6.5M; this is required to satisfy current research demand and is unanimously supported by the community. Additional resource to support the development of environmental proteomics and lipidomics, over a 5 year period, is estimated at £2.6M. Additional resource for increased bioinformatics and biostatistics support and maintenance and enhancement of other national data management activities at NEBC (to equivalent levels previously supported under EG/PGP) is estimated to cost a total of £5M over the same period. Support is also needed for PhD student access to ‘omics facilities, coupled with directed specialist training courses, estimated at £1.5M over 5 years. Additionally, ~£2.5M should be made available to support ongoing technical development of the facilities. Total costs over a 5 year period: £18M (£3.6M/yr).
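
The five-year total quoted above can be checked by summing the component estimates. The short Python sketch below does so; the dictionary keys are illustrative labels rather than terms from the report, and the figures are the rounded estimates given in the paragraph.

```python
# Component estimates for Recommendation 1, in £M over five years,
# as quoted in the paragraph above. Key names are illustrative labels only.
costs_m = {
    "nbaf_nodes": 6.5,
    "proteomics_lipidomics": 2.6,
    "nebc_bioinformatics": 5.0,
    "phd_access_training": 1.5,
    "technical_development": 2.5,
}

YEARS = 5

total_m = sum(costs_m.values())   # five-year total in £M
per_year_m = total_m / YEARS      # average annual cost in £M

print(f"Total: £{total_m:.1f}M over {YEARS} years (£{per_year_m:.2f}M/yr)")
```

The components sum to about £18.1M, which the report rounds to £18M (£3.6M/yr).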

3.1.2 Recommendation 2: Integration and prioritisation: Coordination, delivery and management

Given the rapidly changing and evolving technological background, there is a critical need to ensure that the scientific capabilities are fully realised. In other words, development of scientific priorities needs to keep pace with evolving technologies in terms of novel approaches and new problems that can be addressed. Moreover, there is an on-going need to integrate, coordinate and prioritise the complex pattern of investments in ‘omics as well as to manage the interface with other funders, in a cost-effective way that maintains the key link between environmental scientists from across the NERC remit and access to technologies. It is proposed that integration activities should provide overall community coordination, prioritisation and integration of ‘omics investments in a way that can respond to changes in user demand and technology development. It is recommended that there should be:

A. a strong coherent voice for environmental ‘omics, promoting uptake of these approaches across the NERC remit;

B. a coordination of existing and future activities, including horizon scanning research and technology drivers, identification of skills and training needs, and prioritisation of future needs, including on-going advice to NERC on investment in research and facilities that is flexible, scalable and responds to changes in user needs and emerging technologies;

C. development of a unified interface and strategy to engage and influence key national and international partners and stakeholders including Research Councils, Charitable Trusts, industry (e.g. Sanger, BGI), EU (e.g. ELIXIR) to help maximise the impact of NERC investment in ‘omics.

It is recommended that these functions are best delivered through the establishment of a virtual Environmental ‘Omics Synthesis (EOS) centre whose primary role is to coordinate and integrate existing investment, provide advice to NERC on how ‘omics could deliver against strategic science challenges, act as an interface with national and international partners and promote stakeholder engagement. Examples of where EOS may contribute immediately would be to evaluate the requirement for new NBAF nodes or to provide a liaison point for dialogue with ELIXIR. A modest investment in an EOS has the potential to leverage substantial funding from NERC Research Programmes. The estimated costs of an EOS would total £2M over five years.

3.1.3 Recommendation 3: Research Programmes and Responsive Mode: Science delivery

There is a continuing need for NERC to support research investments in ‘omics that deliver NERC’s Strategy (RP) and:

A. build on previous strategic investment to maintain and develop core capability and capacity in ‘omics within the NERC research community (RP);

B. provide targeted support to foster ‘omics development in areas where NERC science has demonstrated international leadership and which may yield particular benefit from exploitation of ‘omics approaches (RP);

C. encourage diversification in the application of ‘omics together with support for novel technical innovations (RP/NC);

D. promote, where appropriate, partnerships with other national and international funders in joint programmes (RP);

E. further embed ‘omics across the full NERC remit ensuring future sustainability (RM).

Where appropriate, the proposed virtual Environmental ‘Omics Synthesis (EOS) centre (Section 3.1.2) should provide advice to NERC on priorities for investment. However, establishment of EOS should not preclude or delay early investment in key areas identified in this report.

To deliver these recommendations, NERC could invest, over a 5 year period, in three medium-sized coordinated strategic Research Programmes at ~£5M each, which should also encourage collaboration with key external partners, and in an annual pilot project scheme, modelled on the current NBAF NGS small project initiative but applied to all NBAF nodes, at £0.4M pa to encourage novel technical innovation in ‘omics approaches. Total cost over 5 year period: £17M.
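
As a quick check on the total above, the sketch below recomputes it from the stated components. The variable names are illustrative, and the ~£5M programme cost is treated as exactly £5M for the purpose of the calculation.

```python
# Recommendation 3 cost components, as quoted in the paragraph above.
# Variable names are illustrative; "~£5M" is taken as exactly 5.0 here.
YEARS = 5
programme_count = 3
programme_cost_m = 5.0           # approximate cost per Research Programme (£M)
pilot_scheme_per_year_m = 0.4    # annual pilot project scheme (£M per year)

research_programmes_m = programme_count * programme_cost_m
pilot_scheme_m = pilot_scheme_per_year_m * YEARS
total_m = research_programmes_m + pilot_scheme_m

print(f"Research Programmes: £{research_programmes_m:.1f}M")
print(f"Pilot scheme:        £{pilot_scheme_m:.1f}M")
print(f"Total over {YEARS} years: £{total_m:.1f}M")
```

Under these assumptions the Research Programmes account for £15M and the pilot scheme for £2M, which matches the £17M total stated in the report.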

3.2 Priorities for implementation:

First, NERC needs to maintain National Capability investment in facilities support to NBAF and restore NEBC to previous levels funded through EG/PGP. Secondly, in order to realise the full potential of integration of ‘omics technologies into environmental science, investment in novel mechanisms for scientific synthesis and integration at a community level should be developed. Finally, NERC should explore opportunities for strategic investment in ‘omics research.

4 CONTENTS

1 EXECUTIVE SUMMARY

2 SUMMARY OF CONCLUSIONS

3 RECOMMENDATIONS

3.1.1 Recommendation 1: National Capability: Core facilities, services and competencies

3.1.2 Recommendation 2: Integration and prioritisation: Coordination, delivery and management

3.1.3 Recommendation 3: Research Programmes and Responsive Mode: Science delivery

3.2 Priorities for implementation

4 CONTENTS

5 INTRODUCTION

5.1 What is environmental ‘omics?

5.2 Socio-economic impact of environmental ‘omics: From cosmology to commerce

5.2.1 Mining the undiscovered diversity: Bio-prospecting

5.2.2 ‘Omics tools in environmental monitoring and chemical risk assessment

5.2.3 Managing long term environmental change

6 BACKGROUND AND PURPOSE OF THE REVIEW

7 METHOD

7.1 Scope and boundary of review

7.2 Terms of reference for review including the ‘OMICS Expert Working Group

7.3 Expert Working Group (EWG) Membership

7.4 Process and key activities

7.5 Structure of report findings

8 A REVIEW OF ‘OMICS: INVESTMENT AND INFRASTRUCTURE

8.1 NERC research investment

8.1.1 NERC strategic Research Programmes [RP]: Environmental Genomics (ended 2005) / Post Genomics and Proteomics (ended 2009) (RP)

8.1.2 NERC Science Themes (RP)

8.1.3 The growth of ‘omics within NERC Responsive Mode (RM) science delivery

8.1.4 NERC National Capability (NC)

8.2 Interface with non-NERC ‘omics activities

8.2.1 Cross Research Council

8.2.2 Charities

8.3 Industry and end-user engagement

8.4 Major International Centres

8.5 Synthesis Centres: A successful model of intellectual integration and community cooperation

9 FUTURE REQUIREMENTS OF NERC ‗OMICS .................................................................... 31

9.1 Fertile areas for future ‗omics delivery in NERC science (RP & RM) ............................... 31

9.1.1 Research requirements: potential areas for future ‗omics research (RP and RM) .... 31

9.2 Requirements for Core facilities, services and competencies (NC)

9.2.1 Genomics

9.2.2 Metabolomics

9.2.3 Proteomics

9.2.4 Data management and bioinformatics: Large Facilities Roadmap Environmental ‘Omics Bioinformatics Facility (NC)

9.3 Key Functions underlying a future strategy

9.4 Integration, coordination and prioritisation of ‘omics science

10 ANNEXES

10.1 Minutes of the First Expert Working Group

10.2 On line Survey Results

10.3 Town Meeting and Workshop Report

10.4 Minutes of Second Expert Working Group

10.5 Minutes of Third Expert Working Group

10.6 Environmental Genomics Final Report

10.7 “Extract” from large facilities

5 INTRODUCTION

The constantly changing interface between the biosphere and the geosphere continues to reorganise the environment in which we live. The anthroposphere (humans and their collective enterprise) acts as a significant third partner in this relationship, creating a precarious interdependency that shapes, both globally and locally, the present and future environments of our planet. NERC science plays a critical role in our fundamental understanding of these interrelationships. The recent development of ‘omics technologies, including genomic, transcriptomic, proteomic and metabolomic approaches, has provided us with the ability to characterise at a molecular level critical details of how organisms respond to and interact with the environment. This is allowing researchers to unlock past events and be informed by Earth’s history. ‘Omics is also providing tools for understanding the ecological significance of natural variation, with the potential to reveal the genotypic and phenotypic basis of fitness that determines responses to environmental challenges at the individual, population and community levels, and indeed at the level of the entire biosphere. Such understanding is essential to meet the challenges of environmental change, reducing uncertainty and safeguarding sustainability, roles central to the NERC Strategy “Next Generation Science for Planet Earth”⁶.

5.1 What is environmental ‘omics?

Understanding the function and sustainability of complex current and past ecosystems and environments relies on a detailed knowledge of the constituent communities and populations. Their behaviour, in turn, depends on the form and function of the individuals of which they are composed, characteristics that are driven by different layers of information exchange between organisms and their environment, from information encoded in the genome, through macro-molecular processes and networks, to physical traits manifested at the whole-organism level. These components can be viewed as “biological information objects” that exist at different levels of biological organisation (DNA, proteins, metabolites, up to whole organisms). Of particular relevance to NERC, these different layers of biological information objects ultimately mediate the interaction between organisms and the environment.

The term “environmental ‘omics” describes a broad set of disciplines designed to study the structure and function of sets of these “biological information objects” with the explicit aim of illuminating the past and present relationships between organisms, the biome, and their biotic and abiotic environments. The term “‘omics” is applied to fields of study whose names carry an “-omics” suffix, indicating the intention to capture a comprehensive suite of data. These fields are often set within a hierarchical framework which attempts to relate macro-molecular ‘omics data to the phenotype of an individual together with the structure and composition of populations and communities. The core disciplines include genomics, epigenomics, transcriptomics, proteomics, metabolomics and phenomics. All are being applied to individuals as well as to communities, which has led to the incorporation of a “meta” prefix, as in metagenomics (Fig. 1 below).

6 http://www.nerc.ac.uk/about/strategy/ngscience.asp

[Figure 1: diagram not reproduced. Recoverable labels: DNA, RNA, Ribosome, Protein, Substrates, Products, Co-factors, Individual, Population/Community.]

Figure 1 | Hierarchy of biological organisation under ‘omics investigations

More recently these central domains have diversified to reflect:

a) new scientific understanding, such as epigenomics, which addresses the critical role of inherited phenotypic changes not derived from alterations in the underlying DNA sequence (DNA methylation, miRNAs and histone modification, to name a few);

b) increased technical capability, such as metagenomics, which recognizes our ability to characterize the total genetic material from an environmental sample, allowing us to explore organism assemblages, communities and even whole ecosystems; and

c) novel specialisms, such as the emerging sub-disciplines of metabolomics, including lipidomics and glycomics, which aim to provide a quantitative description of the global lipid and carbohydrate compositions of biological systems.

These approaches are applied to a diverse range of environmentally relevant systems of differing complexity, from targeted laboratory studies to investigations of ‘wild’ populations, across both time and space. For example, laboratory characterization of the interactions of an individual organism with specific biological, chemical or physical challenges can be used to understand and predict the impact of parasitic infection, pollution or global warming/ocean acidification on target ecosystems, whilst complementary studies have employed these global techniques to evaluate ‘wild’ populations or communities directly. These investigations strive to link global biodiversity measures to the sustainability of ecosystem services whilst also using the data to inform conservation policy, challenges that are central to environmental research. The integration of such approaches into environmental science, anchored to organismal and ecological frameworks, has the potential to significantly enhance the predictive tools used to understand organism and ecosystem responses to natural and anthropogenic threats. In general, ‘omics approaches are used to provide:

a) a comprehensive description of the biomolecules or “information objects” within defined organisms, ecosystems or environments, at a particular time or set of time points,

b) a definition of the interaction between various levels of biological organization, and

c) analysis and description of the control networks to provide the mechanistic understanding necessary to allow the development of tools that predict the outcome of organism-environment interactions, relating ‘omics to function.

The critical challenge for ‘omics science is converting raw data to information and ultimately to knowledge. To achieve this, bioinformatic and biostatistic approaches, novel or established, need to be harnessed for data processing, integration and reduction, processes that require the development of, and compliance with, agreed data standards. Unlocking the knowledge contained within the ‘omics data (processing) requires us to derive a mechanistic understanding of how observations made at various levels of biological organisation are related (integration) and in turn to derive the relationship between either an integrated view of the data or its individual components (reduction) and the biotic/abiotic factors being investigated.

‘Omics approaches provide a unique set of tools, adding to the sophisticated armoury of techniques at researchers’ disposal. ‘Omics has the ability to significantly enhance the detail in which we perceive the biological component of our environment and to contribute substantively to a predictive understanding of the interactions among the biosphere, the geosphere and the anthroposphere over time.

Such predictive understanding opens up an array of practical (and potentially commercially relevant) applications, supporting excellence in fundamental science whilst directly delivering outputs with impact for society (Section 5.2).

Conclusion 1. ‘Omics is a highly dynamic field with new dimensions being incorporated to reflect: new scientific understanding, increased technical capability and development of novel specialisms. This dynamism must be reflected in the NERC ‘omics strategy to ensure international leadership in environmental science.

5.2 Socio-economic impact of environmental ‘omics: From cosmology to commerce

It is well established that scientific breakthroughs that deliver profound wider benefits were often not designed with their final application in mind, whilst others are the consequence of years of small iterative discoveries, each contributing to the delivery of a final product or service now seen as essential to society. However, if we consider the potential of ‘omics research to deliver within the short to medium term, we find a range of possible applications and benefits that these approaches may supply. Some examples include:

5.2.1 Mining the undiscovered diversity: Bio-prospecting

These applications encompass exploration of life in extreme environments, whether in space, in solid rock (the deep biosphere) or at the ocean depths, where ‘omics approaches provide an unparalleled tool for identifying and unravelling these unique communities of organisms. Cursory exploration of microbial communities in geothermal hot springs, without the benefit of ‘omics approaches, led to the discovery of Taq polymerase, the enzyme that is the basis for PCR, a technique that has transformed molecular genetics over the last two decades, earning ~$2 billion in royalties⁷, and has had profound impacts on many applications, e.g. environmental diagnostics, forensic science and medicine. ‘Omics data generated through the Global Ocean Sampling⁸ (GOS) expedition have revealed that seawater has an extremely high level of as yet unexplored bacterial diversity with considerable potential commercial relevance. Most microbial diversity was overlooked in the past because traditional methods of microbial culture only allow study of the very restricted set of organisms that can be readily cultured; the sensitivity of ‘omics technology opens up a scientific window on a whole new world of microbial diversity. These examples illustrate both the extent of the hidden diversity and its potential to deliver outputs of economic importance, making ‘omics exploration an essential tool for future bio-prospecting. It should be noted that the bio-prospecting endeavour may be an informatic exercise disconnected from the original exploration in both objective and personnel.

7 Fore et al., (2006) The effects of business practices, licensing, and intellectual property on development and dissemination of the polymerase chain reaction: case study. J. Biomed. Discov. Collab. 1: 7.

8 http://www.ploscollections.org/article/browseIssue.action?issue=info:doi/10.1371/issue.pcol.v06.i02

5.2.2 ‘Omics tools in environmental monitoring and chemical risk assessment

‘Omics makes a major and direct contribution to environmental monitoring and chemical risk assessment. The application of community-based ‘omics approaches (e.g. metagenomics, metatranscriptomics or metametabolomics) to diversity profiling of either complex taxa (diatoms, phytoplankton or Collembola) or whole bacterial or mesofaunal communities will provide future objective and automatable tools for assessing ecosystem health (for a detailed review see Kille et al., 2010⁹). These analyses would be complemented by the identification of characteristic molecular fingerprints (transcriptomic, proteomic or metabolomic profiles) within key sentinel organisms, which reveal predictive markers indicative of the presence of pollutants in general or of specific classes of toxicants¹⁰,¹¹. These markers may be composed of holistic profiles or, more likely, of selected groups of functionally linked molecules for the early detection of pollutants. Ultimately this could lead to a fundamental shift from chemical-based monitoring to a more cost-effective, information-rich, biological effects-based environmental assessment strategy for the UK.

‘Omics tools as applied to microbial systems are important for risk assessment of chemicals prior to environmental release (e.g. the EU Registration, Evaluation, Authorisation and Restriction of Chemical substances legislation, REACH) and may aid the reduction, refinement and replacement of current vertebrate-based testing regimes. The role of invertebrates and fish species (and cell lines derived therefrom) remains a fertile area for investigation and investment through programmes such as the NC3Rs initiative¹². The economic importance of this sector is often under-estimated because it is mistakenly seen as being associated purely with governmental legislators. The commercial footprint ranges from the multi-national chemical industries, through those enterprises that perform monitoring for large organisations (e.g. the water industry), to the myriad of small companies involved in the development and production of targeted environmental diagnostic assays based on ‘omics research.

‘Omics technology brings with it the potential to evaluate the true natural variation within populations and communities and therefore to assess more accurately the impacts of global or local environmental change. This ability can be used in proactive environmental management to attempt to mitigate future impacts associated with global warming, such as flooding and drought.

5.2.3 Managing long term environmental change

‘Omics also plays a vital role in climate change research as well as in improving our understanding of how organisms (at the individual, population, community and ecosystem levels) respond to long term environmental change in general. It provides novel tools to aid us in our attempts to address questions such as: What are the limits on the ability of species and ecosystems to respond to environmental change? What are the levels of capacity and/or resilience? What are the key biological feedbacks and how are they regulated? What is the potential for Bio-Geoengineering? At what point does environmental change lead to a decline in ecosystem services and other aspects of ecosystem health? We are beginning to answer all of these questions, aided by ‘omics approaches. This is particularly the case in the microbial domain, in metagenomic analyses of soils and aquatic environments, but significant progress is now being made at higher levels of biological organisation¹³ (e.g. meiofauna, megafauna and the macrobenthos).

9 Kille et al., (2010) A review of molecular techniques for ecological monitoring. Environmental Agency Report

10 Ankley and Miracle (2006) Genomics in Regulatory Ecotoxicology. SETAC press.

11 Van Aggelen et al., (2010) Integrating Omic Technologies into Aquatic Ecological Risk Assessment and Environmental Monitoring: Hurdles, Achievements, and Future Outlook. Environmental Health Perspectives 118; 1-5.

12 The National Centre for the Replacement, Refinement and Reduction of Animals in Research (NC3Rs) - http://www.nc3rs.org.uk/

Cellular descriptors of "stress" are more accurate (and earlier) indicators of responses to change and can also provide an understanding of the trade-offs that occur in adapting to changed environmental conditions. This clearly has implications for the prediction of ecosystem change and can therefore feed into models of the effects of climate change, as well as into conservation and commercial exploitation of the environment. In recognition of this, NERC already funds a knowledge exchange grant on genetics in conservation, led by the Royal Botanic Garden Edinburgh¹⁴. The Science Theme Report for Biodiversity (NERC strategy 2007-2012)¹⁵ identified ‘omics as a central science driver, providing the key link to other themes such as ‘Assessments of pollutant and pathogen exposure risk to humans’.

Conclusion 2. The application of ‘omics technologies offers significant potential for wealth creation within a range of environmental science sectors.

13 Creer et al., (2010) Ultrasequencing of the meiofaunal biosphere: practice, pitfalls and promises. Molecular Ecology 19; 4-20.

14 Hollingsworth, P. (2010) Integrating genetics into conservation (NE/H001913/1). NERC Knowledge Exchange Network.

15 http://www.nerc.ac.uk/about/strategy/documents/theme-report-biodiversity.pdf

6 BACKGROUND AND PURPOSE OF THE REVIEW

As set out in the NERC Invitation to Tender¹⁶ for this review, the requirement for NERC to develop this strategy was highlighted in the 2008 Biodiversity Theme Action Plan¹⁷ (Biodiversity being one of NERC’s seven strategic Science Themes¹⁸), which recommended that NERC should develop and implement a strategy for ‘omics data management, bioinformatics support and access to whole genome sequencing. It is understood that a similar recommendation is (or will be) included in the NERC National Capability Action Plan. In addition, NERC’s Asset Management Strategy includes a proposal for a major new investment in ‘omics facilities.

The requirement set by NERC is that the strategy must address a range of aspects, including an assessment of:

• NERC requirements for ‘omics data management, bioinformatics and infrastructure to deliver the NERC Strategy, following the winding down of support through the Post-Genomic and Proteomics Programme, including whether these can adequately be met by ongoing Responsive Mode funding, NERC Services and Facilities and designated Data Centres.

• NERC long-term needs, in response to emerging technologies and changing demand, for research, training and facilities.

• How NERC should seek to co-ordinate with the ‘omics activities of other UK Research Councils, charitable trusts and international initiatives such as ELIXIR¹⁹.

A key aspect of the activity required by NERC is appropriate consultation with the NERC and wider science and stakeholder communities, both to gather inputs and to validate recommendations.

The Invitation to Tender document set out the background leading up to the review. Over the last 10 years NERC has invested £28M in ‘omics research via the Environmental Genomics (EG) and Post Genomics and Proteomics (PGP) Programmes. Following the end of this investment in 2010, it is now appropriate to develop a strategy for taking NERC ‘omics forward in the context of new technologies and rapidly increasing data volumes. There is a strong need to continue to develop high-throughput sequencing technologies to address environmental science questions, and this may require further investment in research, training and provision of facilities.

NERC currently supports the NERC Biomolecular Analysis Facility (NBAF)²⁰ and has developed the NERC Environmental Bioinformatics Centre (NEBC)²¹ to support activities in the EG and PGP programmes. An ‘Omics Facility for Environmental Research has been included in the emerging section of the RCUK Large Facilities Roadmap 2010 consultation²².

Any NERC ‘Omics Strategy must be developed in the context of the wider landscape of ‘omics activities, both in the UK and internationally. BBSRC and MRC have invested heavily in this area, recent examples being the BBSRC Genome Analysis Centre (TGAC)²³ and the MRC High-Throughput Sequencing Hubs²⁴. The European Bioinformatics Institute (EBI)²⁵ represents a key international facility based in the UK, and European-scale coordination of biological information infrastructure is being considered in the ELIXIR ESFRI preparatory phase project¹⁹.

16 http://www.nerc.ac.uk/research/themes/biodiversity/events/itt-omics.asp

17 http://www.nerc.ac.uk/research/themes/tap/documents/tap-biodiversity.pdf

18 http://www.nerc.ac.uk/research/themes/

19 http://www.elixir-europe.org/page.php

20 http://www.nerc.ac.uk/research/sites/facilities/details/mgf.asp

21 http://nebc.nerc.ac.uk/

22 http://www.rcuk.ac.uk/cmsweb/downloads/rcuk/research/RCUKLargeFacilitiesRoadmap2010.pdf

23 http://www.tgac.bbsrc.ac.uk/

24 http://www.mrc.ac.uk/Newspublications/News/MRC006188

25 http://www.ebi.ac.uk/

The remit of the existing NERC-led review of Taxonomy and Systematics²⁶ is significantly broader than that of the NERC ‘omics review:

• To review the current status of and trends in UK taxonomy and systematics, including the nature of current funding and the size of the workforce.

• To assess the current and anticipated future needs for the outputs of taxonomy and systematics by the full range of its user communities.

• To produce recommendations for a future UK Taxonomy & Systematics Strategy.

At the time of writing, this review has yet to report formally. There was some cross-membership between the Expert Working Groups for the two reviews. The remit of the Taxonomy and Systematics review is distinct from the work presented here, which focuses on NERC’s specific requirements.

26 http://www.nerc.ac.uk/research/programmes/taxonomy/events/review.asp

7 METHOD

7.1 Scope and boundary of review

Following an open invitation to tender, the NEOMICS²⁷ team, led by the Centre for Ecology & Hydrology (CEH) and the Cardiff School of Biosciences, was commissioned to develop a NERC ‘omics strategy. The outline approach to the review, including agreement on the Terms of Reference (Section 7.2), was finalised through dialogue with NERC representatives, Drs Bill Eason and Sarah Collinge, from the NERC Swindon Office (SO). It was agreed that the review would include: i) a transparent and clearly documented evidence-based assessment, ii) community consultation throughout the process, iii) oversight and advice through the appointment of an independent chairman and Expert Working Group (EWG), and iv) ongoing engagement with SO.

Professor Thomas R. Meagher (University of St Andrews) was invited to provide independent oversight of the strategy development, in recognition of his research track record as well as his role in promoting ‘omics through service on the NERC Environmental Genomics Steering Committee. Professor Meagher’s primary role was to oversee the appointment of, and to chair, the Expert Working Group (EWG; Section 7.3), which represents the principal vehicle for community consultation. A description of the subsequent process undertaken by the NEOMICS team is given in Section 7.4, together with detail of the suite of activities undertaken to inform the strategy development (Section 7.5).

7.2 Terms of reference for review including the ‘OMICS Expert Working Group

As abstracted from the original documentation, the terms of reference for the EWG stated:

The purpose of the expert working group (EWG) is to work with the NEOMICS project and NERC, and with inputs from the research community, to produce a coherent NERC ‘Omics Strategy for the next 5-10 years.

Specifically the EWG should provide an independent oversight of and advice to the NEOMICS project which has the following objectives:

a) assess the role ‘omics plays in delivery of the wider NERC Strategy Next Generation Science for Planet Earth and the support required to achieve this in the context of existing investments

b) assess the particular NERC needs for infrastructure, bioinformatics and data management

c) identify technological, bioinformatics or data issues unique to environmental ‘omics that may require strategic investment

d) assess the needs for training and skills development to support the use of ‘omics in environmental science

e) assess how NERC can remain flexible and take advantage of emerging technologies and new applications of ‘omics techniques

f) assess how NERC can work most effectively with other UK ‘omics funders (including research councils, charitable trusts, and industry) in supporting research, training and provision of facilities

g) assess how NERC should seek to interact with key international activities such as ELIXIR and the EBI

h) produce a costed roadmap to deliver the short-term and long-term requirements identified via the activities above

27 http://www.nerc.ac.uk/research/themes/biodiversity/events/omics-strategy.asp

The EWG will work with the NEOMICS project to explore a range of scenarios or “strawman” models to meet these objectives. The NEOMICS project will be responsible for delivery on the key objectives, in part, informed by the advice from EWG in an iterative process.

The EWG should provide advice on key stages of the NEOMICS project including the final report. Although the report will be prepared by the NEOMICS project it is expected that the EWG would be able to endorse the final report.

Summary EWG advice to the NEOMICS project should be published as an Annex to the final report.

7.3 Expert Working Group (EWG) Membership

The composition of the EWG was structured to contain an independent Chair (Professor Thomas R. Meagher - University of St Andrews) and relevant expert membership from across all sectors of the NERC community, including Research Centres and HEIs. Membership was developed in response to an open nominations process, and augmented with invited members. The EWG has worked closely with the NEOMICS project team members and with representatives of key UK ‘omics funders including Research Councils, charitable trusts and industry. The final composition of the EWG is given below:

Expert Working Group

Thomas R. Meagher (Chair), University of St Andrews

Ewan Birney, EBI

Terry Brown, University of Manchester

Roger Butlin, University of Sheffield

Melody Clark, British Antarctic Survey

Guy Cochrane, EBI

Tim Gant, MRC, University of Leicester

Jack Gilbert, Plymouth Marine Laboratory

Simon Hiscock, University of Bristol

Steven Paterson, University of Liverpool

James Prosser, University of Aberdeen

Jane Thomas-Oates, University of York

Nico van Straalen, VU University, Amsterdam

Charles Tyler, University of Exeter

Research Council Observers

Bill Eason, NERC

Sarah Collinge, NERC

Amanda Collis, BBSRC

7.4 Process and key activities

The development of the strategy has taken ~9 months and incorporated four phases. In the first phase the detailed implementation plan and specific terms of reference (Section 7.2) were refined in consultation with the NERC SO.

The second phase was one of open consultation with the community. The primary vehicle was an Expert Working Group (EWG), chaired by Professor Thomas R. Meagher (University of St Andrews). EWG membership was drawn from a wide spectrum of the research community through a transparent, community-wide online nomination process. Observers from other Research Councils and other major ‘omics funders were invited to attend EWG meetings to ensure that a cross-Council perspective was achieved. The report has benefited from the input of a number of international advisors through direct contributions to the EWG, through visionary science-based case studies and through informed comment on the final report. The initial meeting of the EWG encompassed a wide-ranging discussion, summarised in the minutes of the meeting (Annex 2).

During the third phase of the process the collated information was distilled and used to phrase a number of key questions, which were released as an online survey aimed at building consensus on community-centric (bottom-up) strategic options (survey results are summarised in Annex 3). This survey accompanied an open invitation to a Town Meeting, which included a facilitated workshop attended by a spectrum of environmental researchers (Figure 1), technologists, stakeholders and the wider NERC community. Most areas of the NERC remit were represented; where this was not possible, every attempt was made to seek additional perspectives through other mechanisms. The meeting was structured to provide a series of talks describing the technological landscape of ‘omics science whilst providing national and international highlights of applications of ‘omics that had yielded true impact with excellence. The workshop was a professionally facilitated interactive brainstorming event delivered by DialogueMatters, an external organization specializing in consensus-building events relating to environmental issues. The comprehensive contributions provided by this meeting were transcribed and independently clustered without bias to reveal the community input (the details of the Town Meeting and workshop processes and outcomes are given in Annex 4).

Figure 1 | Community Representation at Town Meeting. [Chart not reproduced. Legible category labels: Polar / Antarctic; Archaeology; System Biology; Data Integration, Data Standards, Models; Fungi, Protists...; Other areas of expertise (organic/isotope geochemist, population genetics).]

The final phase was to analyse the community feedback in order to inform the development of an ‘omics strategy that would harness ‘omics for the delivery of NERC science now and in the future. A second meeting of the EWG was convened at CEH-Wallingford and benefited greatly from input from Professor Nico van Straalen (Head of Animal Ecology, Institute of Ecological Science, Amsterdam), who has been involved in developing the Environmental Genomics strategy in The Netherlands. This meeting was presented with the completed evidence trail generated by community consultation and used this information to assess a detailed options appraisal (minutes from this meeting can be found in Annex 5). A conceptual summary of how to attain greater community integration and more effective use of resources and delivery mechanisms was presented to and debated by the EWG at this meeting. The report was then drafted by the NEOMICS team and, through dynamic dialogue with SO and the EWG, was expanded to include relevant supporting information, and the structure and focus of the recommendations were refined.

The EWG Chair, key members of the NEOMICS team and SO representatives also met with Professor Ally Lewis in August 2010 to discuss the emerging outcomes of the review. This helped to fit the review to the wider NERC Technologies Science Theme, and Professor Lewis provided an additional perspective from his background in atmospheric science.

A penultimate draft report was then presented to the EWG and a third EWG meeting convened at the Royal Botanic Gardens, Kew in September 2010. In addition to the available members of the EWG, this meeting was attended by Professor Terry Burke (Sheffield University; Director NBAF), who was asked to present an overview of NBAF’s activities and its view of the future ‘omics landscape. This final meeting of the EWG aided the NEOMICS team in subtly refining details of the final recommendations and the form of the final report.

7.5 Structure of report findings

A key part of the review process has been to examine NERC’s existing investment in ‘omics and the impact that this has had on NERC science, a necessary step to inform future requirements and the report recommendations. This review of existing investment includes that made by other organisations both nationally and internationally and is covered in Section 8.

This analysis informs the discussion on future requirements in Section 9, which includes an analysis of the key functions needed within a NERC ‘omics strategy. This in turn informs the key recommendations set out later in the report.


8 A REVIEW OF ‘OMICS: INVESTMENT AND INFRASTRUCTURE

NERC has invested significantly in ‘omics science, beginning eight years ago through the Environmental Genomics thematic programme and more recently through the Post-Genomics and Proteomics programme. This investment has provided significant strategic support to leverage the full potential of these approaches to deliver against the wider NERC Strategy ‘Next Generation Science for Planet Earth’. The objective of this section of the report is to review these activities in the context of the delivery of NERC science but also to contextualise science delivery against related national and international capability and research activities. Interposed throughout this review are “Impact Boxes” that provide summaries of specific research projects illustrative of science delivery. Furthermore, emerging headline ‘conclusions’ are also highlighted together with associated evidence.

Primary impetus for ‘omics within NERC science was supplied through targeted strategic research programmes which developed a research community that rapidly matured, showing significant impact within the Responsive Mode research grant portfolio. This growing community was complemented and nurtured through the establishment of core capacity and capability which has further stimulated the uptake and exploitation of ‘omics approaches in novel areas of NERC science. Complementary developments within the UK landscape (including Research Councils, Charitable Trusts, and Industry) in supporting research, training and provision of facilities are examined together with the major initiatives within the European and Global context.

Throughout the narrative the complex nature of the relationship between science and rapidly developing technology, both the underlying hardware and related in silico data interpretation, becomes increasingly evident. This being the case we have also considered the vehicles that may underpin more effective integration and exploitation of these resources across the breadth of NERC science.

8.1 NERC research investment

This report will distinguish the various ‘omics investments along the lines of the three major funding mechanisms used by NERC to support research: Research Programmes[28] (RP), Responsive Mode (RM) and National Capability (NC). These modes of funding incorporate, respectively, investments made to support areas of NERC science of strategic importance, support for curiosity-driven or blue-skies science, and the development of facilities and centres that provide the underlying capacity and capability on which science delivery is based.

8.1.1 NERC strategic Research Programmes: Environmental Genomics (ended 2005) / Post-Genomics and Proteomics (ended 2009) (RP)

The strategic importance and potential of ‘omics research was recognized by the UK government at the turn of the millennium, leading to ‘omics-targeted cross-Research Council investments. Within NERC these top-down investments were channelled through the Environmental Genomics[29] (EG - £16M) and Post-Genomics and Proteomics[30] (PGP - £12M) directed programmes with the objective of building community, capability and capacity. These programmes have established the UK as one of the world leaders in environmental ‘omics. Research was supported in both programmes with a composition reflective of the diversity of techniques and approaches required by environmental researchers. The EG programme supported a spectrum of projects (29 in total) focused on applying this new discipline across the broad remit of NERC science. In contrast, the PGP programme invested 85% of its project funding into 5 large consortium-based grants strategically targeted at specific areas perceived to provide high impact, whilst the balance of funding was used to support 12 projects, the majority of which provided speculative pump priming to demonstrate the exploitation of the technology in novel areas.

[29] http://www.nerc.ac.uk/research/programmes/genomics/

[30] http://www.nerc.ac.uk/research/programmes/proteomics/

The EG Steering Committee immediately identified that there were unique challenges implicit in the type and volume of data generated by ‘omics science and that, as such, resource needed to be assigned to nurture a facility for managing the data together with providing for the appropriate training of the research community. As a consequence the EG data subcommittee established the NERC Environmental Bioinformatics Centre (NEBC) (for more details see Section 8.1.4.1). The major achievements of the EG and PGP programmes include:

a) The Environmental Genomics (EG) Programme played a very important role in capacity building and provided good training and skills development to the NERC scientific community

The programme provided an important training platform for many postgraduate researchers, postdoctoral researchers and technicians. Several postdoctoral researchers secured permanent academic posts as a result of their participation in the initiative. The training focus across all levels of scientific staff, from technicians and students to postdoctoral scientists, was a key strength of the programme and has helped provide a platform for the subsequent PGP programme and Responsive Mode grants. The PGP Programme has also supported a co-funded fellowship with the Environment Agency and Knowledge Exchange grants with Industry to foster skills exchange and education.

b) The EG programme developed effective approaches to data management, initiated the development of environmental genomic data standards and established public resources and encouraged data sharing

The programme met its objective of providing accessible genomic data management resources to the wider NERC and scientific research community. It did this through establishing the NERC Environmental Bioinformatics Centre (NEBC) to coordinate the EG programme data management and bioinformatics training. EG and NEBC helped to foster a culture of openness and data sharing among its environmental scientists. It developed the MIAME/Env standard for environmental genomic data to improve quality assurance and enable data comparisons between research groups. The value and success of NEBC ensured its continued funding through the PGP programme and it is now a core component of NBAF.

c) The EG Programme helped NERC focus the provision and delivery of essential underpinning resources for key areas of research within the NERC remit

The initiative met its objectives of providing increased access to ‘genomics’ tools and technologies. Scientists from EG worked closely with the NERC Service and Facilities team to provide community-wide access to DNA sequencing and microarraying. Scientists within PGP have helped extend this community based service to include metabolomics. The NERC programmes had a substantial impact for a relatively modest investment and delivered good value for money for NERC. They have underpinned future research across the NERC remit (see data showing an increase in responsive mode awards).

d) An increased level of networking and collaboration was an important outcome of both the NERC Programmes

Grantholders within individual consortia developed very strong collaborative partnership links with one another as a result of both programmes, many of which are still in place. In addition, the programmes fostered networking and collaboration within the wider research community, helping new research communities establish themselves and strengthening existing communities. This was an important outcome of the initiative and it has had a lasting impact. Both EG and PGP funded International Opportunities awards to develop a common approach to fish toxicogenomics and genomic data standards, to promote the work of NERC in a global context and to foster collaboration with the wider academic and stakeholder communities.

e) The balance and coverage of the Programmes was appropriate

The EG programme had a wide coverage of NERC science to help build capacity across the whole of the NERC remit. PGP focused on more defined themes to build on existing strengths to help keep NERC science at the international forefront.


f) Steering Committees made a valuable contribution to the initiative

The EG and PGP Steering Committees contributed to the programmes’ success. They helped to coordinate and prioritise research activities, promoted interactions between grantholders, and provided support for smaller, less established teams (e.g. the Wilson Grant funded in EG).

Impact Box 1: Post-genomics of host-parasite evolution and ecology (NE/D000602/1 - Dr S Paterson, University of Liverpool, School of Biological Sciences)

All animals play host to parasitic infections that range from tiny viruses to large nematode worms. These infections are particularly important for animals living in the natural environment, because they have to fight off parasites while also trying to compete for limited supplies of food or compete for a mate. Understanding the ecology and evolution of host-parasite interactions within the natural environment was the focus of this project. Using genomic methods provided a unique insight into this problem and has allowed us to discover and investigate host genes involved in resisting parasites and, conversely, parasite genes involved in infecting hosts. In addition to advancing our understanding of the ecology and evolution of host-parasite interactions, this work also informs understanding of host-parasite dynamics in systems of strong economic importance, such as crop systems, animal health, and human health. The wider implications of these findings have cross-Council significance, demonstrating a strong interface with the remits of both BBSRC and MRC. The specific findings of this work include:

Macho males and immunity

Red grouse are a charismatic game bird found on UK moorland and an important source of revenue to many rural communities. An unusual feature of their ecology is a regular cycle between years of either high or low numbers of grouse on moors. As grouse become more crowded, the rate at which nematode infections transmit between them also increases and the males also become more aggressive to each other and have higher levels of testosterone. One hypothesis is that, because testosterone can suppress immunity, male grouse may regularly have to fight off parasites at the same time as they're fighting each other. Moreover, they have to fight on both fronts at a time when they are least equipped to do either.

This hypothesis has been tested by catching grouse from moorland and either clearing their parasites with a drug or giving them a testosterone implant. Transcriptomic studies were then used to discover genes switched on in birds that had high numbers of parasites and this analysis also revealed that the same genes were suppressed in birds given extra testosterone. These data imply that the most 'macho' males may suffer more from parasitic infection. This work illustrates the potential of environmental ‘omics to examine the continual process of adaptation within the natural environment, where dynamic interactions between species are strong and ubiquitous.

Catching the Red Queen

What drives evolution? One enduring theory is that the majority of evolutionary change is driven by an arms race between pairs of species, such as hosts and parasites, where an adaptation in one species leads to a counter-adaptation in the other, and so on ad infinitum. This hypothesis is often known as the Red Queen Hypothesis, since, as the Red Queen explains to Alice in Lewis Carroll’s Through the Looking Glass, “here, it takes all the running you can do, to keep in the same place.”

By using fast-evolving viruses we were able to observe hundreds of generations of evolution in action. Left to their own devices, the bacteria evolve to resist the virus, which in turn evolves to infect these resistant bacteria, and so on, exactly as the Red Queen Hypothesis predicts. But here we were also able to capture the Red Queen by holding the bacteria constant and providing a fixed target for the viruses.

We showed that the genome sequences of viruses running with the Red Queen evolved twice as quickly as those that evolved towards a fixed target. Furthermore, while evolution towards a fixed target led to a set of similar looking virus sequences, Red Queen evolution led to increasingly divergent virus sequences. These findings suggest that interactions between species drive the majority of evolution and that by causing rapid divergence they could potentially lead to speciation itself.

The concept of the Red Queen is centred on the fact that environments, both biotic and abiotic, are constantly changing, and the Red Queen is all about competing rates of environmental change and biological change (evolution) to meet the challenge of new environmental conditions. Investigations such as this are of particular relevance in the current global awareness of medium-term climate change and other environmental changes due to human impacts that have been accelerating.


g) The EG and PGP programmes have underpinned research from across the NERC remit which is likely to deliver economic and societal impacts in future

The initiative contributed to the substantial momentum in UK ‘genomics’ research at the time, and it enabled large sections of the NERC research community to incorporate ‘genomics’ technologies and approaches into their research programmes. To date, few direct economic and societal impacts have arisen from the research conducted within the initiative. This is to be expected for a programme focused on the development of capacity, tools, resources, facilities and datasets for the wider community. The major impacts will be delivered by other research activities which were underpinned by these programmes. It is likely that the investment in the EG and PGP programmes has compressed the amount of time between the introduction of new ‘genomics’ technologies and their eventual delivery of impacts.

h) The programmes helped to maintain the international competitiveness of UK bioscience research

The investment in ‘genomics’ enabled UK researchers to conduct high-quality research and it raised the profile of UK environmental research. It enabled the UK to contribute to a wider set of ‘genomics’ tools and resources being developed by other countries, such as the USA, Japan, Canada and the European Union, and it positioned UK environmental researchers so that they were able to contribute to and influence the development of other multi-national ‘genomics’ research programmes.

Conclusion 3. Previous NERC Environmental Genomics and Post-Genomics and Proteomics thematic programmes established UK capacity and capability in selected areas of ‘omics technologies and during their tenure established NERC international leadership in this area.

8.1.2 NERC Science Themes (RP)

NERC supports seven strategic Science Themes[31] in support of its overall strategy. Established in 2007/8 following the publication of the NERC Strategy, these include the Biodiversity Theme, which highlighted the need for a review of NERC ‘omics (Section 3). Other Themes include Climate System, Sustainable Use of Natural Resources, Earth System Science, Natural Hazards, Environmental Pollution and Human Health, and Technologies. ‘Omics has a potential role to play in improving understanding across many of these sectors (Section 6).

The Technologies and Biodiversity Themes, however, have a particular relevance to this review.

The grand challenge of NERC’s Biodiversity Theme[32] is “To understand the role of biodiversity in key ecosystem processes”. The theme primarily concerns understanding the functional role that biodiversity plays in underpinning benefits to people. It is about addressing the question, “What levels of biodiversity must we have in order to provide the ecosystem services that society needs?” This focus is reflected in the Biodiversity Theme’s five major challenges:

Improve understanding of biodiversity’s role in ecosystems: processes, resilience and environmental change.

Develop new tools and techniques to describe biodiversity and its function.

Improve approaches for measuring abundance and distribution of biodiversity and its functions.

Enable society to predict and mitigate effects of biodiversity change on processes that sustain life.

Develop integrated tools for assessing the benefits of biodiversity.

The Technologies Theme has four challenge areas, where the evolution of new technologies is considered central to delivering NERC strategy goals. The overarching purpose of the theme is to ensure that the UK research community can work with the best and most innovative tools for environmental research that are possible. There are in addition theme challenges associated with ensuring a broad portfolio of blue-sky activities and the retention of an appropriate skills-base for undertaking basic technologies research.

[31] http://www.nerc.ac.uk/research/themes/

[32] http://www.nerc.ac.uk/research/themes/tap/documents/tap-biodiversity-2009.pdf

The strategic technology challenge areas are:

Remote sensing and earth observation,

Intelligent field sensors and networks of sensors,

Novel laboratory instrumentation,

Informatics, models and data.

The most recent Technologies Theme report (http://www.nerc.ac.uk/research/themes/tap/documents/tap-technologies-2009.pdf) has recognized the importance of NERC investment in ‘omics:

“High throughput ‘omics technologies allow the monitoring of functions and processes in the environment, and characterization of biodiversity. High-throughput sequence generation offers enormous potential to determine an ecosystem’s diversity, and this may be greatly accelerated by adapting new technology for massively parallel genetic sequencing for environmental science. A targeted future action to better exploit such technological developments is highly likely. This will include technological enhancements to current bottle-neck areas such as sample preparation, and to the archival and informatics challenges associated with massive datasets. Any action in this area will be informed by the current work on the development of the NERC ‘omics strategy.”

There has already been a targeted investment in the Informatics, Models and Data challenge area. A Technology Cluster, “Informatec”[33], has been established in order to bring together key researchers and other stakeholders and provide a forum for technology networking and knowledge exchange (http://www.nerc.ac.uk/research/programmes/clusters/), but this network only includes a minor dimension relating to bioinformatics analysis of ‘omics data.

Conclusion 4. Aspects of ‘omics science have been identified as core components of strategic challenge areas for both the NERC Technologies and Biodiversity Science Themes.

8.1.3 The growth of ‘omics within NERC Responsive Mode (RM) science delivery

Interrogation of the NERC grants database for projects where the applicants have categorised the research explicitly as including environmental genomics approaches (as a representative of all ‘omics research) reveals a rapid rise in the cumulative value of projects exploiting EG from 2002-2005, followed by a levelling off in the year-on-year value of applications at ~£30 million (Fig. 2A). EG awards follow the same pattern, displaying a ~20% success rate apart from 2005, when the success rate increased to 27% before falling back to 13% and 11% in the following two years. These observations may be explained by the fact that the first phase of EG thematic projects was finishing in 2005.

Interestingly, the proportion of the total award value assigned against genomics remained constant at 31 (±1.4)% of all awards. A more informative view of the data emerges when considering the average value of each project using genomics (Fig. 2C). The average project cost has increased year-on-year above the rate of inflation, which can in part be explained by the multi-disciplinary nature of the projects. An analysis of the number of cross-institutional projects (Fig. 2D) does not reveal a convincing trend, but intriguingly 2005, 2008 and 2009, the years in which the number of successful EG-exploiting projects was high, all had a high proportion of cross-institutional awards.

The success of large cross-institutional projects from the PGP may be illustrative of the success of strategic investment into specific areas of NERC science. However, the requirement for a large consortium approach (in PGP) may have also restricted the diversity of applications across the full remit of NERC (Fig. 3). This may also explain the levelling off in applications since 2005 even though there are areas of NERC science under-represented within the award profile. It may be concluded that a range of disciplines/remit areas are still not benefiting through the Responsive Mode funding mechanism.

[33] http://www.bgs.ac.uk/informatec/home.html


[Figure 2 plots not reproduced. Recoverable axis labels: panels A and B, value of funding request (£ million) by year, 2002-2009; panel C, average value of EG awards (£K) by year, 2002-2009; panel D, number of cross-institutional awards by year, 2002-2009.]

Conclusion 5. Analysis of Responsive Mode funding shows an increase in ‘omics awards during the previous NERC EG/PGP thematic programmes.

Conclusion 6. The value of research projects employing ‘omics (both funded and applied for) has recently (2005-09) levelled off but this does not incorporate the imminent impact of Next Generation Sequencing or projects utilizing new NBAF nodes. A portent of the potential increase that may arise is documented in Section 8.1.4.2.

Figure 2 | Analysis of the annual Responsive Mode Funding to Genomics based projects. The cumulative value of awards involving environmental genomics, both applied for (A) and awarded (B), is given, showing the proportion assigned specifically against EG activity (solid section of bar). The average value of each award either applied for (solid line) or awarded (dashed line) is provided in Panel C. The number of cross-institutional awards is provided in Panel D.

Figure 3 | Science Area Analysis of Responsive Mode Funding to Genomics based projects. The proportion allocated to each science area was assessed for all Responsive Mode projects exploiting “Environmental Genomics” or “Population genetics and evolution” from 2000-2010 inclusive. Atmospheric 1%; Earth 3%; Freshwater 14%; Marine 21%; Terrestrial 61%.

Impact Box 2 | Metagenomics contributing to Earth Observation at the Western Channel Observatory. (NE/F00138X/1 - PI: Dr ST Ali, Plymouth Marine Laboratory)

The Western English Channel sampling site, L4, represents one of the longest and most complete marine observatories in the world. It comprises approximately 100 years of contextual environmental monitoring data and now maintains a 22 year program of extensive biological monitoring. Recently we have used this extensive resource as a bedrock against which to leverage a number of ‘omic technologies to investigate the microbial communities at this site. Consequently, L4 now represents one of the most sequenced sampling locations on earth with more than 30 million sequences and nearly 10 billion base pairs of genetic information. This comprises the longest recorded dataset for taxonomic classification of a microbial community in the world, with 6 years of 16S rDNA bacterial and archaeal amplicon sequences, which represents the most comprehensive microbial time series on earth. We have used Illumina and 454-pyrosequencing to generate more than 22 million 16S rDNA sequences from 96 time points over the 6 year period. Using this information we have determined that bacterial and archaeal diversity patterns show an extraordinarily robust seasonal cycle, which follows day-length with almost military precision. Strikingly, the duration of this study also enabled us to pick up unique biological events that only happened once in the 6 years of observation, such as coupled diatom-bacterial blooms of otherwise extremely rare organisms. Additionally, we have sequenced more than 6 billion base pairs of metagenomic and meta-transcriptomic information from multiple time points, over diurnal cycles, seasonal cycles and different locations around L4 and through depth, comprising one of the most important temporal and biogeographic studies applied to a single location. These data show that despite considerable changes in the taxonomy of the microbial community over time, the functional potential of the community is quite similar, while still showing seasonal trends. However, the meta-transcriptomic trends for these data show a different story, with temporally and biogeographically localized events having a more significant impact, such as day/night changes and the importance of light penetration on the water column. Following on from our functional characterization of the community, we are interested in linking the functional potential and actuality of the ecosystem to the potential relative flux of the metabolites. To this end we have started modeling the change in the potential of the community to use or produce metabolites, so-called relative metabolic flux.

Impact Box 3 | Palaeogenomics tracks the evolution of cotton following domestication (NE/F000391/1 PI: Dr Robin Allaby, University of Warwick, Warwick HRI)

Next generation sequencing technologies are proving ideal for analysis of the small amounts of ‘ancient’ DNA that are present in some preserved specimens. Palaeogenomics is a new area of research that exploits these methods to obtain genomic information from the archaeological and palaeontological records. Palaeogenomics enables us to follow genome evolution in plants and animals as they adapt to the pressures placed on them by environmental change and, in the case of domesticated species, human selection. Cotton was independently domesticated in the Old and New Worlds some 7500 years ago, and in both regions its natural range was expanded by human intervention, forcing the crop to adapt to new climatic conditions. Comparisons between wild and cultivated cotton species have shown that since domestication the genome of cultivated cotton has expanded due to proliferation of transposable elements. To study this expansion in real time, genomic data was obtained from four archaeological cotton specimens from 750 to 3750 years ago. The results showed that the expansion did not occur gradually, but instead involved one or more bursts of rapid reorganization of the genome. In particular, one such rapid phase of evolution appears to have occurred in Old World cotton during the last 1600 years. The genomic data also revealed details of the adaptive evolution of cotton to the new environments to which it has been moved by humans. Specimens from Qasr Ibrim in the Nubian desert contained a mutation in a gene involved in the response to dehydration stress, giving an insight into the adaptation of this water-demanding crop to the arid conditions in which it was being grown. As well as its relevance to adaptation to climate change and to science-based archaeology, this work has importance for our basic understanding of genome evolution and the punctuated equilibrium model of organismal evolution, which cuts across research councils and is particularly important within the BBSRC remit. The work also has applied relevance with regard to the design of knowledge-based crop breeding programmes.

8.1.4 NERC National Capability (NC)

Providing cutting-edge research facilities and services with the capability and capacity to deliver world-leading environmental research is a key role of the NERC National Capability funding. The activities supported include environmental survey and monitoring, shared services and facilities, skills and expertise, research infrastructure, and training and knowledge exchange. The delivery mechanisms include: Collaborative Centres (e.g. Plymouth Marine Laboratory (PML)), Research Centres (e.g. Centre for Ecology and Hydrology (CEH), British Antarctic Survey (BAS), British Geological Survey and the National Oceanography Centre (NOC)), Research Facilities (e.g. NERC Biomolecular Analysis Facility (NBAF)) and Data Centres (e.g. the Environmental Information Data Centre[34], which incorporates NEBC). Provision of National Capability is directly relevant to ‘omics science due to the unique demands of the data generated together with the high capital cost of the platforms used. However, the overriding requirement is for the provision of expertise that will deliver domain-specific skills in the application of ‘omics science and data interpretation into the various areas of environmental research. The current investments in these areas are documented in the following section.

8.1.4.1 Strategic Research Programme investment in National Capability: NERC Environmental Bioinformatics Centre (NEBC) (RP funded but providing NC)

The EG Steering Committee recognised the inherently unique challenges posed by ‘omics data and established a data sub-committee to address how best to support the NERC-funded research community in complying with the NERC data policy through world-class data management. The committee top-sliced a significant proportion of the budget (15%) and, through a competitive tendering process, established the NERC Environmental Bioinformatics Centre (NEBC) at CEH Oxford, directed by Dr D. Field. NEBC was able to work at the cutting edge of bioinformatics and data management provision to the community because it was built from research groups at CEH (resource delivery indicated by ¶ below), University of Manchester (see # below) and University of Edinburgh (see $ below). The PGP programme recognised the success of NEBC against its remit and assigned NEBC a similar proportion of its overall budget to continue NEBC services. NEBC has delivered the components of its remit as follows:

34 http://www.ceh.ac.uk/sci_programmes/env_info.htm

i) Compliance with NERC data policy: EnvBASE35 was developed by NEBC to track all ‘omics data holdings and deliverables from EG projects. Additional limited support (2005-2007) was provided by Mark Thorley (NERC Data Co-ordinator, SO), since EnvBASE was instrumental in delivering NERC data policy in relation to ‘omics. Central to ‘omics research is the ability of large multi-faceted projects to handle, track and associate complex sets of related samples, which are sometimes processed differentially to liberate a wide spectrum of data. To ensure that both data holdings and physical resources could be managed, NEBC developed HANDLEBAR¶36, a highly configurable bar-code based archival system for managing ‘omics resources within these complex projects.

ii) Establish data standards and nomenclature for EG: NEBC has engaged widely with the international community to comply with, and to develop, standards and nomenclature that support environmental ‘omics data. Its success is exemplified by the international adoption of MIAME-Env#,¶37. The NEBC team has been instrumental in establishing the international team that is working to deliver an Environmental Ontology (EnvO38) and related Gazetteer (Gaz39) to better facilitate electronic mining of environmental data. NEBC continues to drive activities within international standards forums, including the Genomic Standards Consortium, where Norman Morrison (CEH) chairs the Biodiversity working group, and has helped to launch the MIBBI consortium40 and the BioSharing Consortium41. Most recently, Dr D. Field, in collaboration with Dr Susanna Sansone (Oxford University), has received a major joint NERC and BBSRC award to further develop an infrastructure to support integrated experiment data capture (see ISA Infrastructure in Impact Box 4 below).

iii) Community training: Coordination and delivery of a range of training courses in specific areas of bioinformatics, including Perl, R and Linux.

iv) Provision of informatic support: NEBC established a robust help desk allowing EG and, subsequently, PGP scientists to obtain both routine and bespoke bioinformatics support. The centre delivered cost-effective access to commercial software (e.g. Genespring42, the leading transcriptomic analysis environment) as well as bespoke software for ‘omics data storage, submission to data repositories and dynamic data interrogation and mining (including MaxD#43, Omixed#44, Trace-2-dbest and Partigene$45). NEBC is also responsible for the now internationally recognised Bio-Linux¶46 platform, which gives researchers access to key informatics tools within a centrally managed and validated Linux environment; Linux is rarely used by environmental scientists but is core to many informatics tools. This system is now used by thousands of students, postdocs and researchers around the world and is increasingly being adopted to teach bioinformatics courses at a variety of levels.

The end of funding from EG and PGP poses a challenge to the ongoing stewardship of ‘omics data. These science programmes have succeeded in establishing a community that is actively pursuing ‘omics approaches but has no long-term plan for the maintenance of the National Capability needed to support the data being generated. Part of the loss of support for bioinformatics services has been addressed by transforming some NEBC analysis capacity (2 posts) into a node of the NBAF, but this does not cover data management, standards or long-term stewardship of collective data outputs. To overcome this remaining shortfall until the outcome of the NEOMICS consultation is known, NERC have provided continued minimum funding for NEBC in the form of a Service Level Agreement for 2 years (in the first instance) covering core projects and 2 posts. NEBC is now in the process of being integrated into the Environmental Information Data Centre47 (EIDC) at CEH, which is a positive development, but without sustained funding CEH will not be able to continue to provide NEBC functions to the community. These stop-gap provisions provide only a limited and temporary solution for supporting a critical element of the required ‘omics national capability that underpins the complete spectrum of science delivery in this domain.

35 http://nebc.nerc.ac.uk/nebc/data/envbase
36 http://nebc.nerc.ac.uk/tools/handlebar
37 http://nebc.nerc.ac.uk/data/standards-and-ontologies
38 http://www.environmentontology.org/
39 http://gensc.org/gc_wiki/index.php/GAZ_Project
40 http://mibbi.org/index.php/Main_Page
41 http://www.biosharing.org/
42 http://www.chem.agilent.com/en-US/products/software/lifesciencesinformatics/genespringgx/pages/default.aspx
43 http://nebc.nerc.ac.uk/tools/other-tools/maxdinfo
44 http://nebc.nerc.ac.uk/tools/omixed
45 http://nebc.nerc.ac.uk/tools/other-tools/est
46 http://nebc.nerc.ac.uk/tools/bio-linux

Conclusion 7. There is widespread community recognition of the role played by NEBC in supporting the development of bioinformatics required to underpin NERC ‘omics.

Conclusion 8. The NEBC functions of data management, standards and long-term stewardship of collective data outputs are currently supported by an interim funding arrangement; in order to sustain these operations in the long term, NEBC support should be restored to previous levels.

Impact Box 4 | ISA Infrastructure (BB/1000917/1 - PIs: Dr Dawn Field (CEH) & Dr Susanna Sansone). The Investigation/Study/Assay (ISA) infrastructure project received joint BBSRC and NERC funding through the BBSRC Biological and Bioinformatic Resources Fund (BBR). This effort supports the development of a freely available desktop software suite, designed for both curators and experimentalists, that assists in the reporting and local management of experimental metadata (i.e. sample characteristics, technology and measurement types, sample-to-data relationships). The system has been designed to be highly scalable, supporting projects employing one or a combination of technologies. Embedded in the approach is the concept of user empowerment in relation to the uptake of community-defined minimum information checklists and ontologies. Furthermore, the ISA Team is engaging with, and providing connectivity to, a growing number of international public repositories that endorse the tools; these currently include ENA (genomics), PRIDE (proteomics) and ArrayExpress (transcriptomics). The team has also launched the BioSharing community (http://biosharing.org).

8.1.4.2 NERC Bio-molecular Analysis Facility (NBAF) (NC)

In order to build upon the strategic investment provided by EG and PGP and to support the growing community demand ensuring that the rapidly evolving ‘omics technologies contributed to ongoing and future NERC science, further investment was used to expand the established NERC Molecular Genetics Facility originally based at Sheffield. This initial expansion phase supplemented the population genetics and conservation expertise based at Sheffield (now NBAF-S) with transcriptomic expertise at Liverpool (now NBAF-L) and genomics expertise at Edinburgh (now NBAF-E). This multi-nodal facility provides a critical infrastructure across part of the ‘omics technologies to act as a foundation for environmental ‘omics. There has been strong support and demand for this facility from within the research community, and it has gained substantial national and international recognition. This view was further reinforced by the feedback from the community as part of this review.

Soon after the expansion of the NERC Molecular Genetics facility, advances in genomics technologies and expansion of community requirements provided a substantial challenge. These issues were comprehensively addressed by ensuring that the current facilities could access prompt investment to support the ground-breaking technological advances, together with pump-priming funding to give the community early access to develop the applications required to address a range of science questions. The Liverpool and Edinburgh nodes have been exceptionally proactive in adopting new developments in genomics (see 9.2.1) and now have state-of-the-art genomics capability, allowing them to deliver a suite of services including de novo sequencing and annotation of small and large animal and plant genomes (see Impact Box 5 and 7). The flexibility afforded by the multi-nodal structure also allowed a recent expansion of the facilities, in 2009, to establish new centres encompassing novel technologies and services demanded by the community. These included a Metabolomics Facility at Birmingham (NBAF-B) and a Bioinformatics Facility at CEH Oxford (NBAF-O, now based at Wallingford). At this time the facilities were renamed as the NERC Biomolecular Analysis Facility (NBAF) and now form one of NERC’s core Services and Facilities48.

47 http://www.ceh.ac.uk/sci_programmes/env_info.html

Impact Box 5 | Interrogating an honest witness: worm’s eye view of systems toxicology (NER/T/S/2002/00021 & NE/F001185/1 PI: Dr P Kille, Cardiff University, School of Biosciences)

Clinical diagnostics entails the analysis of deviations from normal in the composition and functions of human cells, tissues and organs. Increasingly, the discriminatory power of ‘omic techniques is being harnessed to facilitate and individualize diagnosis. These principles have been applied to the field of environmental diagnostics, where site-specific damage caused by chemical pollutants is a major concern in terms of ‘fit-for-use’ decision making, agricultural sustainability, wealth creation and, ultimately, human health. The concept is appealing but the practice was until recently constrained by our sparse knowledge of the fundamental biology and, particularly, genetics of non-laboratory animals. Advances in ‘omics technologies are fast addressing these difficulties, with NERC-funded scientists pioneering detailed studies into sentinel species that serve as reporters of pollution intensity, much as canaries indicated the presence of explosive gas to coal miners.

Recent NERC funding, which consolidated a network of scientists from leading academic laboratories and research institutes working in partnership with both legislators and industry, has illustrated the potential of ‘omics tools for environmental diagnostics. This project took a true soil sentinel, the earthworm, and compared ‘omic fingerprints measured at all levels of biological organization, from gene to phenotype, to provide a detailed insight into how poisonous chemicals interfere with the intricate molecular machinery of living organisms. This work also illuminates how genes, proteins and metabolites work not as independent entities but as cooperating players in exquisitely complex functional networks. The project also highlighted the colossal and rapid advances in genomics techniques and the importance of remaining state-of-the-art: the initial generation of a suite of more than 19,000 genes for the earthworm took over three years with significant dedicated manpower, whereas a complete draft genome has recently been completed at only a fraction of the cost, demanding only the (not insignificant) skill of a single bioinformatician. All of this information, together with a burgeoning wealth of genetic data on other keystone terrestrial and marine annelid worms, is deposited in a public-access database called LumbriBASE at www.earthworms.org.

We now have a burgeoning insight into the pathways underlying the earthworm’s unique physiology, together with the biological macromolecules involved in its response to changes in its soil habitat, whether these are caused by the underlying biogeochemistry or by anthropogenic imbalances stemming from industrial processes, such as mining, or agrochemical activities. Together with this transformation in our fundamental knowledge comes the ability to develop novel tools for assessing current environmental health and improving our predictive power when using this diminutive organism to evaluate environmental and human health impacts of specific chemical compounds.

An evaluation of the breadth of community support afforded by the current NBAF nodes can be derived by considering a number of key high-level metrics. Firstly, the NBAF user base includes the majority of research-intensive HEIs in the UK. In the last 3 years, 28 of the 40 highest-ranking (RAE2008) Biological Sciences units have made at least one, and often several, successful applications to NBAF; this proportion rises to 33/40 if we include projects that began in the previous 3-year period. The equivalent proportions for the top 20 departments are 15/20 and 17/20 respectively. The minority of departments that are not users include several that have relatively little NERC funding. NBAF also provides substantial support to the NERC Research and Collaborative Centres, having supported projects involving CEH, PML and BAS, for example. The current resourcing profile for the five nodes is given in Table 1. It represents direct funding only; much of the work performed at NBAF-L and NBAF-E exploits the pay-as-you-go (PAYG) services, where costs are recovered directly from project-based funding awarded to PIs external to the NBAF node (Responsive Mode or other).

Table 1 | Present NBAF resourcing.

Node     No. staff (FTE), excluding Directors   Annual cost (2009-10 in £k)
NBAF-B   2.4                                    134.1
NBAF-E   2.5                                    347.6
NBAF-L   2.5                                    363.3
NBAF-O   2.1                                    136.4
NBAF-S   3.8*                                   305.4
TOTAL    13.3                                   1286.9

*includes 0.8 admin

48 http://www.nerc.ac.uk/research/sites/facilities/list.asp

The NBAF user base showed a significant upward trend between 2005 and 2007, mostly resulting from increased exploitation of ‘omics approaches within research projects (PhD and research grant alike). This growth has been accelerated by an annual scheme to support pilot pump-priming projects aimed at: a) extending the exploitation of ‘omics into areas of NERC science which have not yet exploited these approaches; b) stimulating novel technical developments; and c) providing preliminary data to complement ongoing excellent research and lead to future responsive mode grant applications. The scheme provides only for the costs associated with the NBAF node activity (capped at £8K in the 2010 round), but it has seen an exceptional level of interest and provides a unique opportunity for research groups to exploit the extensive ‘omics infrastructure represented by NBAF.

Figure 4 | NBAF users and outputs. The number of project applications (shown in blue) together with the users (shown in red) for the NBAF nodes are given in Panel A. The user number exceeds the application number since the majority of the applications have multiple applicants (CoIs and PIs). Outputs, given as the number of ISI-recognised citations, are provided for each node together with the total number of theses supported as a cumulative value for all nodes. Note that the absence of outputs for NBAF-B and NBAF-O (now NBAF-W) reflects that these nodes have not as yet been active for a whole year and therefore have not submitted output returns.

A recent evaluation of the three established NBAF nodes (NBAF-S, -L and -E) by the NERC Services Review Group (SRG) found that management of the facilities, both by NERC administrative staff and locally by the PIs, had been exemplary and allowed NERC research to continue to compete at the highest international level. The SRG scored the three nodes reviewed (Sheffield, Edinburgh and Liverpool) 5/5 for delivering against the community’s needs, 4.5/5 for the uniqueness of the services supplied, 5/5 for the quality of science and quality of service, and 5/5 for training. The success of NBAF is further reflected in strong growth in demand, which is currently stretching its collective resource base. For example, demand for the metabolomics service provided by NBAF-B is nearing full capacity in only the second year of its operation. Moreover, the rapid rate of technological advance constantly challenges NBAF to be highly reactive and quick to anticipate community needs, providing not only access to novel technologies but also the support and expertise required to make the most effective use of them within the environmental context. It should be noted that the array of technologies and expertise available through NBAF is restricted, with notable gaps in coverage including approaches such as proteomics and lipidomics (Section 9.2.2).

Conclusion 9. There is widespread community support for NBAF, both specifically in relation to service delivery and in the provision of essential infrastructure. The success of NBAF can also be significantly attributed to the distributed multi-nodal model and its provision of crucial domain-specific expertise embedded within the NERC community.

Conclusion 10. The small NBAF next generation sequencing pilot project scheme has demonstrated a large untapped potential, both for novel technical applications and in NERC science areas not currently exploiting ‘omics approaches.


8.2 Interface with non-NERC ‘omics activities

The exploitation of common ‘omics platforms and approaches provides a natural interface with other funding organisations that support work utilising these methodologies. However, these technological overlaps often mask the potential for ‘omics to provide a bridge built on the quest for shared knowledge within strategic science areas, thus blurring established boundaries. This can be exemplified by research in a number of crucial areas. In “Environmental and human health”, ‘omics provides the potential for deriving a mechanistic link between the impact of synthetic substances (e.g. drugs or agrochemicals) on sentinel species and on exposed human populations. Equally, the application of ‘omics to characterise natural variation in the wild populations that form the basis of our agriculture, and its relationship to disease resistance (amongst other characteristics), delivers under the BBSRC’s core remit of sustainable agriculture and food security whilst also informing on the sustainability of the natural environment. These are not unique examples but, together with the common technologies, they form a wider basis demonstrating that the impact of NERC research may be increased through improved collaboration with other Councils, industry and the international community. We have attempted to identify areas of common scientific endeavour as well as technology overlap. This cannot be a comprehensive exposition of these interactions or the relevant institutions; instead it attempts to provide the exemplars needed to contextualise the forward strategy.

8.2.1 Cross Research Council

NERC has attempted to cultivate productive synergies by building experience and lines of communication to engender cross-Council working. Mechanisms are in place to ensure that Councils work together so that there are no gaps in funding49. The 2006 Cross Council Funding Agreement sets out how RM applications that extend beyond the remit of one Council will be assessed. Additionally, NERC is increasingly looking to work with other Councils to co-fund calls (e.g. Impact Box 4 | ISA Infrastructure). The strategic Science Themes RP investment also offers significant potential for cross-Council cooperation.

The synergies between Councils are often supported fortuitously through complementary funding of various aspects of the work (see Impact Box 6). However, the apparatus for cross-Council work is not always transparent to the researcher seeking support and, in the absence of appropriate proactive interaction with NERC on the part of the researcher, excellent science can end up being viewed as “falling between stools” and may fail to get the support it requires.

Impact Box 6 | Central role for epigenetics in natural variation in seed vernalisation (NE/C507629/1 – PI: Professor C. Dean, John Innes Centre, Cell and Developmental Biology). Researchers have identified the central role played by an epigenetic factor, a long antisense transcript, in controlling the cold-sensing mechanism that accelerates the flowering transition in plants exposed to prolonged cold (vernalisation). This observation has specific consequences for our understanding of the natural variation of this process observed across geographic climate zones, whilst illustrating the importance of the epigenome in influencing natural phenotypic variation. This process is also a critical aspect of crop development, and results from this study could potentially inform crop development to meet novel and changing climatic conditions, findings which complement related research supported by BBSRC.

8.2.1.1 Cross council initiatives and areas of potential science interface

Environmental Exposure and Health Initiative (EEHI)50: This initiative was launched in 2009 and offers a model for smart cross-Council working. It not only partners NERC with other Research Councils including the Medical Research Council (MRC) and Economic and Social Research Council (ESRC), but also with the Department of Health (DH) and Department for Environment, Food and Rural Affairs (Defra) with additional support from a myriad of ancillary groups including notably a suite of agencies involved in environmental protection. This array of stakeholders provides a fertile area to bridge the interface between research, policy and practice, and ensure the translation of research results for improved environmental management policies. The scheme falls under the auspices of the “Living With Environmental Change”51 (LWEC) initiative contributing to our knowledge of the relationship between environmental quality and human/ecosystem health.

49 http://www.nerc.ac.uk/funding/available/researchgrants/councils.asp
50 http://www.nerc.ac.uk/research/programmes/eehi/
51 http://www.lwec.org.uk/

The call specifically identified the role ‘omics technologies will play in delivering the goals of this scheme, both yielding biomarkers and addressing questions of gene-environment interaction in the context of the real-world multi-sensor environment. An array of projects funded under the first round of this initiative has employed these global approaches to address the potential impact of a myriad of potential toxicants; for example, see Impact Box 7.

Impact Box 7 | From Airborne Exposures to Biological Effects (FABLE): the impact of nanoparticles on health. (NE/I008314/1 – PI: Prof. J. Ayres, University of Birmingham).

In this project, we will measure ambient concentrations of NPs and constituent metals (Cerium [Ce], Zinc [Zn], and Vanadium [V]) in a range of urban and rural locations, in indoor settings and through modelled personal exposures. We will define how CeO2, V and Zn NPs, in different physico-chemical forms and at different doses, penetrate into different human epithelia. We will establish the biological effects of these NPs in a range of biological systems both in vivo (rats) and in vitro. This will comprise conventional biochemical/histological analysis of abnormalities and cellular toxicity, as well as unbiased transcriptomic and metabolomic analyses and linkages to biological function via gene ontology and KEGG pathways. We will also address how NPs with different characteristics enter cells, whether they cross the blood brain barrier, their subsequent biology and how their disposition relates to toxic responses both locally and for the whole organism. We will interact with policy makers throughout the work programme to use the acquired data to assess likely policy needs using a range of approaches.

NC3Rs52: As we attempt to reduce, refine and replace (3Rs) animal testing, one option is to use invertebrate replacements, but this relies on establishing a common mechanism and therefore on the application of ‘omics to environmental sentinels. NC3Rs is supported directly by government and funds research programmes that often exploit NERC-based research in this very directed application area. For example, the use of a genomics-based approach to analyse the responses of fish cells or early-stage fish embryos to potential environmental contaminants may provide an opportunity to reduce animal testing. However, it may be argued that the data are equally important in delivering the NERC obligation in relation to research on environmental pollution and its potential impact on human health. These overlaps provide a core theme to many of the NC3Rs projects, which support research ranging from chemical effects on wildlife to the use of invertebrates in screening.

Conclusion 11. A routine mechanism that is transparent to research grant applicants is required to support research lying at the interface between Research Council remits that is founded on the formally established RCUK cross council agreement.

8.2.1.2 MRC and BBSRC investments in Genomics

The increased opportunities provided by ‘omics technological advances (see Section 9.2.1 for details) have led to substantial demand for access to these new transforming technologies within all science areas. The early adoption of the technology within the NBAF facilities in 2006 closely coupled the technological resource with the expertise in applying these new genomics tools to the environmental arena, while providing the service framework and governance required to ensure community-wide access and support. BBSRC and MRC also recognised that providing their research communities with cost-effective access was essential to ensure that funded research remained internationally competitive.

In order to address this requirement BBSRC, in partnership with local regional partners, established The Genome Analysis Centre53 (TGAC) in 2009 (http://www.tgac.bbsrc.ac.uk), a centralised genome analysis centre representing an initial capital infrastructure investment of £13.5M, with the Council also underwriting running costs. The centre is based at the Norwich Research Park, proximal to the John Innes Centre, a major international centre of excellence in plant science and microbiology sponsored by the BBSRC, and has a remit to “addresses problems in agriculture, sustainable energy, food and nutrition, through novel approaches in genomics and specialising in genomics technology, high throughput data analysis, advanced bioinformatics and innovation”. The ethos of TGAC is to closely couple raw data generation with the production of value-added information through detailed informatics analysis. Therefore the significant investment in data generation platforms has been matched by investment in the hardware, software and expertise (people power) required to provide the appropriate level of bioinformatics support. Access to the facilities requires pre-consultation with the centre and an explicit costing associated with the project application. Sequencing capacity can be sourced by commercial organisations but, at present, it is unclear how (or if), and under what costing framework, non-BBSRC researchers may engage with TGAC. It is of note that, prior to establishing TGAC, BBSRC funded various genome-level projects which exploited a range of public-private partnerships, as was the case with the sequencing of the Chicken transcriptome54, a project where the sequencing was outsourced to Incyte Genomics55. More recent projects have exploited the burgeoning next generation sequencing capacity embedded within the UK HEIs, which has been sufficient to tackle some of the largest eukaryotic genomes (see Impact Box 8).

52 http://www.nc3rs.org.uk/
53 http://www.tgac.bbsrc.ac.uk/

Impact Box 8 | Mining the allohexaploid wheat genome for useful sequence polymorphisms (BBG0130041 – PI: Professor N. Hall, Liverpool University). In August 2010 a previously funded BBSRC project (press announcement) completed the sequencing and assembly of the wheat genome (17 Gbp), the largest genome to be sequenced to date, primarily using the capacity and sequencing resources based at Liverpool University. This undertaking took a year to complete and, together with the detailed annotation and derivation of the genetic variation also to be done at Liverpool, demonstrates the availability of the infrastructure to undertake exceptionally challenging and extensive genomics projects. The sequencing and informatics resources used for this project are available to NERC researchers through the NBAF-L node.

The approach to delivery of core ‘omics services adopted by BBSRC contrasts with the distributed nodal approach developed by NERC with NBAF. MRC, however, have also adopted a distributed approach, and tackled the issue of community access to high throughput sequencing through a competitive tendering process which resulted in the establishment, in late 2009, of four regional hubs at Oxford, Edinburgh, Liverpool and Cambridge (http://www.mrc.ac.uk/Newspublications/News/MRC006188). This represents an investment of >£9M. It is of note that the centres at Liverpool and Edinburgh are coincident with the established NBAF centres, providing external recognition of these centres and adding to the critical mass associated with these facilities.

Conclusion 12. The BBSRC investment in TGAC, focused around a single centre, contrasts with the distributed NERC NBAF model. TGAC has an inherent focus on a discrete suite of test systems and, as with the expanding commercial sector, does not provide the diversity of domain-specific expertise required by the NERC community. There may be discrete opportunities to exploit this resource in areas where BBSRC and NERC share joint strategic objectives.

8.2.1.3 Metabolomics / Lipidomics

The first Research Council-funded metabolomics facility in the UK was established in 2005. The National Centre for Plant and Microbial Metabolomics56 based at the Plant Science Department at the BBSRC-sponsored Rothamsted Research57 (RRes) is a leading UK facility for research and service activity in plant and microbial metabolomics. It was originally established as the first phase of a two-phase initiative by the BBSRC, totalling £6.5M, in plant and microbial metabolomics. The primary objectives of this centre of excellence are to provide a metabolomics service for the BBSRC microbial and plant science communities, to establish a training network in metabolomic technologies, to increase research activity in metabolomics in the UK, and to further promote technology development. The facilities available in the Centre consist of a suite of nuclear magnetic resonance (NMR) and mass spectrometry (MS) laboratories, staffed by experienced analysts, with access to extensive resources in analytical chemistry and controlled environment plant growth facilities at RRes. The Centre undertakes research in metabolomics and provides services for the comprehensive chemical and bioinformatic analysis of plant and microbial metabolites.

As discussed above, NERC subsequently funded the NBAF-B metabolomics facility at the University of Birmingham in April 2009 (~£134k pa). The primary objectives of NBAF-B are to facilitate the continued development and application of metabolomics to address scientific questions in the NERC remit, to provide cost-effective access to state-of-the-art analytical instrumentation including NMR and MS, to promote the highest quality environmental metabolomics studies by providing extensive training and advice to environmental scientists, and to help maintain the UK as the world leader in environmental metabolomics.

54 Boardman et al. (2002) A comprehensive collection of chicken cDNAs. Current Biology 12(22), 1965-1969.

55 http://www.incyte.com/

56 http://www.metabolomics.bbsrc.ac.uk/index.htm

57 http://www.rothamsted.bbsrc.ac.uk/Research/Centres/home.php

Most recently, in July 2009, the MRC funded the Cambridge Lipidomics Biomarker Research Initiative58 (CLBRI) at the MRC Collaborative Centre for Human Nutrition Research, totalling £2M. The CLBRI has been established to conduct lipidomics research (analysis and characterisation of lipids, and their roles within the body) focusing on the integration of biochemical, nutritional and clinical research. The vision was to create an MRC-led, internationally distinctive laboratory with the capacity to discover and use biomarker techniques in the study of lipids and their roles in health and disease. As such, the primary objectives of this centre are to bring together scientific and clinical expertise, to be at the forefront of advances in mass spectrometry, to create a bioinformatics facility, to develop a new lipidomics database, and to provide a hub for emerging technologies in the field of biomarker research. The facilities available in the Centre consist of a suite of four state-of-the-art mass spectrometers based on Orbitrap technology: an LTQ Orbitrap Velos biomarker discovery platform, a MALDI LTQ Orbitrap XL instrument, and two Exactive benchtop liquid chromatography-mass spectrometry (LC-MS) systems.

8.2.1.4 Proteomics

At present there is no centrally supported Research Council proteomics facility. However, there are a number of locally funded facilities, supported either through regional development agencies or by the hosting universities themselves. Researchers at host institutions can gain access at fEC costs and, depending on the area of science of the sponsoring academic, may receive some generic assistance. There are a handful of research groups with expertise in applying proteomics to various domains with environmental relevance, but groups wishing to engage this expertise need to liaise on a personal level with these academics. Evidence was presented that, as a result, researchers with excellent environmental questions, a high reward potential and a good justification for using a proteomics approach may fail to receive funding because of technical flaws that originate from applying to an allelically heterogeneous wild population a proteomics approach that has been optimised for clonal laboratory strains. Access to this type of domain specific knowledge may be needed to stimulate the exploitation of proteomics more widely within the NERC community.

8.2.2 Charities

Prior to the turn of the millennium the genomics capability of the UK was defined by the human genome project, an activity focused at the Sanger Institute59 sequencing centre. The Sanger initially diversified by applying its resources to characterise key model organisms (small genomes and short life cycles) with medical relevance (e.g. mouse, nematode, frog, fugu and zebrafish), together with deriving the genome sequences for many pathogens with medical relevance. It extended its remit to characterise organisms which lie at key evolutionary nexus points, such as the sea-squirts, which are primitive urochordates, and also collaborates on a range of genome annotation projects from Alpaca to Platypus through the Ensembl interface (www.ensembl.org). Although it has provided sequencing capacity within a semi-commercial framework, including an EG project focused on sequencing the Emiliania huxleyi virus (see Impact Box 9), these are viewed as collaborative ventures where there is cross-fertilisation with core human health interests, in the E. huxleyi example viral evolution. Currently the demands associated with medically focused projects such as the UK Biobank60 and the 1000 Genomes Project61, targeted at linking genetic variation and human health, are saturating much of the present capacity. The Sanger centre has benefited significantly from its close physical and intellectual relationship with the European Bioinformatics Institute (EBI), which relocated to Hinxton Hall in 1994 and now shares the Genome Campus with the Sanger. This close working relationship has provided the critical mass that has powered the development of many tools, illustrating the benefits of close coupling of data generation and tool development together with the advantages of critical mass. Furthermore, this dynamic environment yielded key technological developments, including one of the leading next generation sequencing technologies, Solexa, now embedded within the Illumina Genome Analyser platforms (Section 9.2.1).

58 http://www.mrc.ac.uk/Newspublications/News/MRC006220

59 http://www.sanger.ac.uk/

60 http://www.ukbiobank.ac.uk/

61 http://www.sanger.ac.uk/about/press/2008/080122.html

Impact Box 9 | Genomics reveals link between marine viruses and climate (NER/T/S/2002/00019 - PI: Dr WH Wilson, Marine Biological Association, The Laboratory). The oceans have long been known to have a dramatic influence on climate. One way they achieve this is by absorbing large quantities of the greenhouse gas carbon dioxide. This process is mainly conducted by small, floating marine algae – phytoplankton. Among the various factors that influence the abundance of these important ‘carbon sinks’ is disease, including the effects of marine viruses. NERC-funded scientists studied a giant virus that infects chalk-covered algae (Emiliania huxleyi), determining its complete genome sequence in collaboration with the Sanger centre. These algae form oceanic blooms and soak up billions of tonnes of carbon dioxide. Many of the genes found in this study had never been seen before, and some may slow down the ageing process of the infected cell by keeping it healthy for an extended period. Ceramide is produced, which can control the timing of cell death and may have applications in anti-cancer therapies and in anti-ageing creams in the cosmetic industry. Comparing the genes encoded by the virus with genes of known function from other organisms has given a better understanding of how the virus controls the growth of these ecologically important and highly abundant marine algae. This research also underpins practical applications ranging from carbon sequestration to cosmetics production.

8.3 Industry and end-user engagement

There are topics where direct industrial and end-user engagement offers mutual benefit. One of these is chemical risk assessment, where various aspects of research are of potential commercial importance. This includes industrial and end-user engagement (providing modest funding and in-kind support) with ecotoxicogenomics projects (i.e. ecotoxicology projects exploiting genomics approaches), such as the EG funded project Ecoworm (see Impact Box 5). This research provided specific data on the mode of action of chemicals of a specific class and also developed novel generic approaches that may underpin both screening and future environmental monitoring procedures. This dual interest led to more direct co-funding in projects such as the NERC Knowledge Exchange Connect B62 partnership funding scheme project “The population and molecular stress responses of an ecotoxicology indicator species” (PI: Professor R. Sibly, Reading University, supported by Syngenta), which exploited phenotypically anchored transcriptomic studies of the water flea, Daphnia magna, for environmental chemical assessment.

The maintenance of a continuing dialogue with regulators that may wish to exploit the results generated is essential. This was recognised by the PGP thematic programme, which supported Fellowship positions in key areas to act as the interface with these organisations (e.g. Unravelling the Molecular Mechanisms of Disruption of Sexual Differentiation in Fish Exposed to Oestrogenic and Androgenic Pollutants; NE/C002369/1; Fellow: Dr R. van Aerle, Exeter University).

This area has been fruitfully exploited within an international arena. At the second Expert Working Group meeting Professor Nico van Straalen provided a brief overview of the BE-Basic63 consortium from the Netherlands, a public-private initiative with an R&D budget of €120M focused on the development of clean, robust and competitive bio-based chemicals, materials and energy industries, including responsible monitoring and control of healthy soil and water environments, on the basis of advanced genomics technologies and bioprocess engineering. In excess of 20% of this budget will be used to support ‘omics based investigations in support of the chemical industry.

Conclusion 13. Application of ‘omics in specific sectors of the environmental portfolio provides fertile ground for industrial, legislator and end-user engagement maximising the impact of NERC environmental science.

8.4 Major International Centres

A series of major genome centres were established during the push to complete the human genome. Many of these centres have subsequently diversified their focus, providing capacity to sequence a wider range of species. Both species selection and funding regimes vary considerably but fall into two discrete models: direct funding for core capacity, which is then prioritised to specific projects through a competitive process; and funding to support sequencing associated directly with a specific research project. Notable examples of these funding models are the Joint Genome Institute64 (JGI), which is core funded by the US Department of Energy Office of Science, and the J. Craig Venter Institute65, which is supported through a combination of US National Institutes of Health support for specific programmes, philanthropic donations and commercial project support. Of significance is the close coupling with, and significant emphasis placed on, informatics provision within these organisations. Furthermore, and intriguingly, both organisations are multi-nodal, with geographically separated centres contributing subject specific excellence to the central organisation.

62 http://www.nerc.ac.uk/using/schemes/partnershipgrants.asp

63 http://www.be-basic.org/

Two major recent international developments, ELIXIR and the Beijing Genomics Institute (BGI), are of note due to their European and global impacts. The European Life sciences Infrastructure for Biological Information (ELIXIR) is a European Commission programme to support 'Shared platforms for Data Resources in the Life Sciences'. Specifically, it will deliver a major upgrade of EBI facilities, data resources and services, and an expansion of the current EBI so that it acts as the hub of a trans-European network of informatics centres delivering the capacity and capability needed to support the increased data demands of the life sciences, including the challenges posed by next generation sequencing data. Its objective is to provide the informatics support to address four European Grand Challenges: healthcare for an ageing population; a sustainable food supply; competitive pharma and biotech industries; and protection of the environment. The last of these maps directly onto NERC’s remit; more specifically, ELIXIR states that it will help environmental scientists to monitor life in the oceans, understand the effects of climate change on species diversity and develop new methods to tackle pollution and waste. In March 2010 ELIXIR invited expressions of interest from organisations interested in becoming nodes within its network. Such a position would require national financial support, but would provide a conduit to influence the priorities within ELIXIR to deliver environmentally relevant resource development. The direct delivery of tools pertinent to the stated grand challenge of environmental protection will require continued engagement and dialogue between ELIXIR and the environmental community across Europe, including NERC funded researchers.

Conclusion 14. The EU ELIXIR represents the largest developing global bioinformatics initiative, comprising a trans-European network based on a node and hub structure. However, it provides no direct funding to support environment-based informatics; therefore, NERC support is required for a UK-based environmental node. ELIXIR will provide generic support for all ‘omics science, coordinating standards, providing a long term data repository and supporting generic tools development.

The Beijing Genomics Institute was established in 1999 and contributed significantly to the human genome project and subsequently to the International Human HapMap Project66, as well as carrying out research to combat SARS, being a key player in the Sino-British Chicken Genome Project67, and completely sequencing the rice and silkworm genomes. However, in 2007 BGI went through a major step change in capacity and capability, in accordance with its goal of developing projects and platforms at the cutting edge of research and technology; the organisation’s headquarters was relocated to Shenzhen, a move that coincided with a major instrumentation refresh which embraced next generation technology. This signalled an exponential increase in productivity, illustrated by the 2008 announcement by BGI-Shenzhen of the Genome Map of the Giant Panda, rapidly followed by the publication of the First Asian Diploid Genome Project, “The diploid genome sequence of an Asian individual”68. This was followed by the announcement in 2009 of the Three Extreme-Environment Animal Genomes Project and the Ten Thousand Microbial Genomes Project. In 2010 BGI yet again embraced the increased capacity provided by developments in NGS technology and now represents the largest global sequencing centre. BGI provides sequencing capability to support projects funded by the Chinese government but also acts as a significant partner to research projects from throughout the world, providing sequencing capacity coupled with appropriate informatics support at extraordinarily competitive prices. At present, projects are costed individually and reflect the degree of collaboration with BGI. On May 17th 2010 officials from China and Denmark announced the establishment of BGI-Europe in Copenhagen (Denmark), which will work closely with the Centre for GeoGenetics69, a €7M facility established to exploit ‘omics approaches (genomes, transcriptomes and proteomes) to characterise samples isolated over a large temporal range. This centre is currently the world’s leading exponent of the exploitation of genomics to characterise ancient DNA, and its collaboration with BGI has already led to the first ancient human genome70. (It should be noted that three of the core staff at the Centre for GeoGenetics, including its director, held positions within UK HEIs but left having failed to secure long-term funding.) BGI plans to invest $10M in BGI-Denmark in the first year, recruiting 20 to 50 staff, to establish a high-throughput sequencing platform within two years and to set up a production base in Europe. BGI-Europe will provide all European partners with opportunities in technology development, production design and project development, in fields including scientific research, agriculture and bio-energy, and personal healthcare. The vision is to make BGI-Europe one of the largest centres of sequencing and bioinformatics analysis in Europe. The present capacity of BGI-Shenzhen and the vision for BGI-Europe will have major implications for global research involving sequence generation.

64 http://www.jgi.doe.gov/

65 http://www.jcvi.org/

66 Li et al. (2009) Building the sequence map of the human pan-genome. Nature Biotechnology 28, 57-63.

67 Muir et al. (2008) Genome-wide assessment of worldwide chicken SNP genetic diversity indicates significant absence of rare alleles in commercial breeds. PNAS 105(45), 17312-17317.

68 Wang et al. (2008) The diploid genome sequence of an Asian individual. Nature 456, 60-65.

Conclusion 15. Commercial and governmental investment follows international academic excellence. Failure to identify and support visionary but “risky” research results in loss of future high profile academics, the resulting internationally leading research and any associated commercial benefit.

Highlighting the growth of this field, several metabolomics facilities have recently been established across the globe. Examples include the Netherlands Metabolomics Centre (NMC), which is focused upon creating a world class metabolomics knowledge infrastructure to improve personal health and quality of life. This Centre integrates research institutes and industry in the medical, plant science and microbial fields, with a common strategy for future technology development and application. It primarily aims to develop standardised and validated technological tools and instruments that can be applied in new strategies for Dutch industry, academia and clinical centres. NMC has a budget of €54M from the Netherlands Genomics Initiative for the period 2008-2013. “Metabolomics Australia”, a National Collaborative Research Infrastructure Strategy (NCRIS) Centre, is headquartered at the University of Melbourne and supported by $9.5M. The Australian Government has committed $50M through NCRIS to support the establishment of infrastructure in evolving biomolecular platforms, e.g. for genomics, proteomics, metabolomics and bioinformatics. Within the US, several metabolomics facilities exist, including units with a specific focus on the environment. These include the US Environmental Protection Agency’s Metabolomics Facility at Athens, Georgia, and the National Institute of Standards and Technology’s NMR Environmental Metabolomics Facility at the Hollings Marine Laboratory, Charleston, South Carolina.

8.5 Synthesis Centres: A successful model of intellectual integration and community cooperation.

The concept of synthesis centres has been developed by the US National Science Foundation71 (NSF). The first NSF-sponsored centre, the National Center for Ecological Analysis and Synthesis72 (NCEAS), was established in 1995. NCEAS is based at the University of California, Santa Barbara, and its primary functions include sponsorship of focussed workshops to identify novel and integrative research directions, a focused forum for identifying and promoting real-world impacts of ecological research in application and policy development, science outreach, support for postdoctoral researchers and sabbatical placements, and sponsorship of large-scale research projects, usually funded by sources other than NCEAS itself.

This model has been considered a strong success by the NSF, which has since established several synthesis centres in different research areas following the NCEAS approach. For example, the National Evolutionary Synthesis Center73 (NESCent) is entering its second cycle of funding, and there has been a recent call for proposals for a new Environment Synthesis Center74.

69 http://snm.ku.dk/english/centres/geogenetics/

70 Rasmussen et al. (2010) Ancient human genome sequence of an extinct Palaeo-Eskimo. Nature 463, 757-762.

71 http://www.nsf.gov/

72 http://www.nceas.ucsb.edu/

The function of these synthesis centres is not to provide a funding base per se, but rather to enhance the ability of the scientific community to identify and tackle research problems beyond the scope of individual research programmes and that are novel, represent significant scientific advance, and have strong societal relevance.

Conclusion 16. There is substantive precedent for significant added value being achieved through the establishment of coordinating “synthesis centres”, especially in multi-disciplinary areas that incorporate different experimental approaches.

73 http://www.nescent.org/

74 http://www.nsf.gov/pubs/2010/nsf10521/nsf10521.htm


9 FUTURE REQUIREMENTS OF NERC ‘OMICS

Having established the current ‘omics landscape, this section of the report considers how these foundations, together with future developments, may be mobilised to deliver a NERC ‘omics strategy. In acknowledgment of the various NERC funding streams (RP, RM, and NC), this report uses the results from the various consultation exercises conducted during this review, in particular information from the EWG and the Town Meeting workshop, to evaluate research requirements (both RP and RM) and the needs for core facilities, services and competencies (NC) to deliver the strategy.

This section also includes a technical evaluation of advances in the major platforms used in ‘omics sciences, identifying the critical challenges that need to be addressed to support continued exploitation. This will follow up on issues identified in the previous section to help identify key functions needed to deliver the strategy.

Finally, possible scenarios for delivering increased productivity through increased coordination and integration of activities across all funding streams (as well as NERC interactions with other organisations) are considered, aimed at ensuring that ‘omics continues to deliver across the breadth of the NERC remit and contributes to the delivery of the NERC Strategy.

9.1 Fertile areas for future ‘omics delivery in NERC science (RP & RM)

The EG and PGP programmes have developed an active ‘omics community, as is evident from the increase in RM applications from 2002 to 2005 (see Section 8.1.3). However, these programmes were aimed at building capacity in the community and did not overtly target the current strategic Science Themes. Therefore, the starting point of this prospective analysis was to identify in which areas of NERC’s seven Science Themes ‘omics science could potentially play a central role. Although development in Responsive Mode (RM) or blue skies research is, by its very nature, difficult to predict, we have also attempted to consider those areas of NERC science where there is the greatest potential for an expansion of ‘omics exploitation.

Since ‘omics is inherently an “enabling technology”, this analysis has also included an investigation of its particular relevance to the Technologies Theme, ensuring this encompassed physical platforms as well as data orientated issues.

9.1.1 Research requirements: potential areas for future ‘omics research (RP and RM)

To develop an objective evaluation of the strategic science areas where ‘omics may deliver, we used the outcomes of the facilitated workshop held as part of our Town Meeting community consultation process. The workshop was conducted by an external professional organisation, DialogueMatters75, which specialises in facilitating stakeholder participation in ecological and environmental discussions. This workshop, entitled “Small Molecule: Big Impact”, was a multi-faceted, comprehensive exploration of the applications, critical contributions and future potential of ‘omics sciences within the environmental arena. The detailed outputs are provided in Annex 10.3, which includes the full Report and Community consultation document. One critical exercise of the workshop was designed to stimulate independent predictions from the participants of the critical areas where ‘omics could deliver in the future. An emergent clustering approach was then applied to the individual contributions to consolidate them into key scientific areas (see Impact Box 10 for detail).

75 http://www.dialoguematters.co.uk/background.htm


Interestingly, many of these areas map onto the NERC Science Theme areas, including Environment and Human Health (a, f), Biodiversity (a-c, e) and Climate System (d), reflecting the current and expanding utilisation of ‘omics approaches.

Many of these areas are still in their infancy, with the potential for significant imminent expansion. For example, the genomics-based monitoring of ocean microbial biodiversity within the Global Ocean Sampling76 project (GOS) has revealed an extensive hidden biodiversity, and it is thought that this environment contains only a fraction of the potential biodiversity present within terrestrial and sediment systems, which will be addressed through projects such as the Terragenome project77.

Furthermore, recent advances allow community assemblages of eukaryotic micro- and mesofauna to be explored, providing a similar expansion of our understanding of hidden biodiversity. To harness the data provided by these approaches there needs to be novel “systems level” thinking, which will require new algorithms and biostatistical approaches to be developed. Genome wide high density SNP maps to profile eukaryotic populations, and even the re-sequencing of genomes across phenotypically diverse populations, are presently being adopted as tools to explore fitness and species adaptation. Here again the large volumes of data generated threaten to overwhelm current analytical resources, including the fundamental informatics, the computational infrastructure and, most critically, the people with the appropriate skills and training to link the data to the science questions being addressed. The major challenge is to link the ‘omics level measurements to community structure and function and in turn to environmental health and sustainability.

Impact Box 10 | Potential future applications of ‘omics: Scientific areas identified through an emergent clustering approach from a community workshop (see Annex 8.3 for full details).

a) Understanding fitness and species adaptation:

Understanding individual and community fitness and adaptation at the level of the genome, including determining the ‘omic basis of adaptation and species formation examined in both space and time.

b) Ecosystems – systems thinking – understanding and limits:

Understanding the complex relationship between genes and ecosystems and the implications for delivery of ecosystem services and potential future uses.

c) Global ‘hidden’ biodiversity:

The application of ‘omics to provide a fast high-quality mechanism to determine biodiversity and to identify species (expanding phylogenetics towards phylogenomics). Exploring the relationship between organisms and the landscape (phylogeography) and the comprehensive description of communities of organisms, from meso- to macrofauna (metagenomics). Using museum and fossil material to explore the change in biodiversity over time.

d) Climate change effects:

Compare past and present responses of organisms to global change to understand the impact of species sensitivity, plasticity and adaptation on the outcomes of climate change. Developing new tools to evaluate the effects of climate change on whole communities and ecosystems. Observing cryptic turnover in species over time.

e) Evolution in action:

Modelling and mapping allelic variation over time (e.g. origin of lactase persistence in humans).

f) Environment and health:

Exploiting ‘omics technologies to assess causal relationships between the anthroposphere and health.

The analysis of the workshop outcomes also reveals a number of key exciting areas which have not as yet, in the UK context, exploited ‘omics to its full potential. A key example is “Understanding Past Ecosystems – for Future Choices”, where emerging ‘omics tools now allow us to interrogate the impacts of past events, including global climatic change, such as ice ages; local geological incidents which have created or destroyed habitats (volcanic island formation or ingress of deserts); disease episodes; previous extinction episodes; or the consequences of the ingress of alien species (including the destruction of diversity associated with human habitation, which incorporates those events leading to the extinction of other members of the Homo genus). The technology is now capable of providing information enabling whole ecosystems to be re-constructed, either from metagenomic analysis of trace DNA or from proteomic analyses of remains in stratified sediments, or enabling detailed characterisation of the genetic make-up of individual organisms from mammoth to man. One sparsely exploited area is the application of these approaches to characterise bio-molecules, both DNA and amino acids, preserved under ice to provide information about paleo-environments78,79. The UK has an immense investment in both Arctic and Antarctic ice cores which could be harnessed to yield novel insight into past climatic events and their effects on the associated biota, although it should be noted that only the basal section of the ice contains sufficient material for detailed ecosystem reconstruction.

76 http://www.jcvi.org/cms/research/projects/gos/overview/

77 http://www.terragenome.org/

As yet, the full potential for ‘omics has not been harnessed to inform earth systems models. These approaches should provide additional layers of detail, enhancing our understanding of the potential for living systems to contribute to C and N cycles and informing modelling programmes such as SOLAS80, whose aim is “To achieve quantitative understanding of the key biogeochemical-physical interactions and feedbacks between the ocean and atmosphere, and of how this coupled system affects and is affected by climate and environmental change”. From revealing the essential contribution provided by microbial activity in the deep biosphere (rocks) to quantifying the potential for earthworms to contribute to global carbon fixation, ‘omics has an important contribution to make to our understanding of these global processes. There are many further examples, some of which were eloquently identified in the Environmental Genomics end of programme report (see Annex 10.6).

There is an on-going need to provide funding, through a small pilot project scheme, to support access to ‘omics technology platforms that operate through pay-as-you-go mechanisms. This mode of support stimulates both novel technical developments and proof of concept work in new application areas. NBAF has successfully pioneered such a scheme for the last 3 years (see Section 8.1.4.2), with the current round allocating a maximum of £8k of facilities time and being approximately ten times oversubscribed for the ten projects on offer. Optimally these schemes need to be maintained with the appropriate oversight to ensure they are targeted at supporting novel developments and new application areas.

Conclusion 17. ‘Omics can deliver in a spectrum of specific environmental areas, many mapping directly onto NERC priority areas.

Conclusion 18. Meta-‘omics, the analysis of communities / ecosystems, is identified as one of the applications with the greatest future potential for significant impact on NERC science.

9.2 Requirements for Core facilities, services and competencies (NC)

Environmental ‘omics has benefited greatly from commercial developments of analytical platforms, mainly developed in response to the human health market place and the promise of personalised medicine. As these technologies mature, the development of versions that can be exploited in the field, or even integrated into remote sensing platforms capable of automated operation with transfer of data through remote telemetry, provides exciting possibilities for environmental science.

Modifying the present platforms along these lines may require future NERC investment. Far from being science fiction, however, the logistic specification for this mode of operation is currently being developed within the international space programmes, driven by the need to perform ‘omic analysis off planet. There may be significant requirements related to the laboratory elements of the technology as well as in the customisation of techniques and methodologies to address environmental questions.

Conclusion 19. Long term technical development for adaptation of the current ‘omics platforms for “field” deployment with remote telemetry data provision would address the environmental specific need for spatial and temporal monitoring, providing new insights for environmental science.

78 Willerslev et al. (1999) Diversity of Holocene life forms in fossil glacier ice. Proc. Natl. Acad. Sci. USA 96, 8017–8021.

79 Willerslev et al. (2007) Ancient Biomolecules from Deep Ice Cores Reveal a Forested Southern Greenland. Science 317, 111-114.

80 http://solas-int.org/


There are a number of ways in which issues can be identified and tackled. For example, community-based actions such as technical workshops or sandpits can successfully stimulate these advances (see Impact Box 11).

Other issues relate to establishing the robust implementation of these technologies within an environmental sphere, ensuring that aspects such as inter-laboratory and cross-platform variation are assessed so that they can be removed as contributing factors from derived global analyses. It was informative that, when the community was asked to list their ‘top’ papers, the most frequently cited papers related to standardisation of metabolomics approaches, reflecting the importance the community places on these activities.

Impact Box 11 | RAD Sequencing (PI: Professor Mark Blaxter, NBAF-E, Edinburgh University).

An example of how ‘omics applications can be developed through community engagement can be seen in a recent advance in whole genome SNP analysis using next generation sequencing, which provides a ground-breaking tool for associating genotype with phenotype and is exceptionally valuable for determining the relationship between genomic variation and underlying geography (phylogeography). Novel techniques using RAD (Restriction site Associated DNA) Sequencing81,82 have exceptional promise to deliver these crucial environmental insights. The NERC community has benefitted from two technical workshops hosted by NBAF-E, designed around technology development through community engagement and building a critical mass of applications exploiting this technology. This initiative has benefitted enormously from the altruistic contributions of the originators of the technology, based at the University of Oregon.

The EWG also identified the critical requirement for developments in the area of “Informatics, models and data”, a strategic Technology Theme challenge area. These developments are needed to address significant increases in the complexity and volume of data being generated using the present NBAF ‘omics resources.

There is a need to further develop a critical mass of informatics expertise, initially embedded in centres with established informatics / data processing expertise but affiliated with a spectrum of discrete science projects from the wider NERC community, to provide grounding and specific application for novel informatics developments. This activity needs to develop novel approaches in informatics, biostatistics and data integration, and should include significant community training (sandpits, workshops and expert in silico laboratory training) to ensure that current and future personnel can exploit genomics data fully. Further resourcing should be assigned against primary data generation to ensure that specific science-based questions can be addressed. These activities should be designed to develop a user community which is competent and informed in the use of appropriate informatics approaches and which is directly associated with the delivery of research projects. There should also be a pool of centralised tools developers with a remit to evaluate, develop and optimise novel informatics methodologies, which would then be distributed to the community through appropriate training and centralised resource provision.

It is clear that the rate of technology advance and the increase in community demands within the ‘omics sector have been extraordinarily rapid over the last 5 years. NERC has been exceptionally fortunate that, through the good management of the Service and Facilities team and the commitment of key academics in the community, NBAF has expanded and engaged with new technology developments (see 8.1.4.2). It is therefore important to consider how technology needs may change in the future. The technical evaluation of the key platforms (Sections 9.2.1-9.2.3 below) is needed to inform the continuing degree of support required for these core facilities and capabilities in ‘omics technologies.

Moreover, multiple threads of evidence provided directly by members of the EWG, together with responses from the community consultation, identified Data Management and Informatics as core to the delivery of ‘omics for NERC. Intriguingly, this very area had previously been identified in the joint research councils Large Facilities Roadmap (Section 9.2.4). It is therefore explored fully below, and illustrates the minimal requirement for continued support of the functions currently served by NBAF-O/NEBC in bioinformatics and data management.

Conclusion 20. There is a need for an ‘omics user community that is competent and informed in the use of appropriate informatics and technological approaches, and who are directly associated with delivery of research projects. This should be supported by a pool of centralised tools developers with a remit to evaluate, develop and optimise novel informatics methodologies. These informatic knowledge hubs could be based at NBAF / NEBC or other national or international centres, e.g. ELIXIR or TGAC, in order to create and maintain critical mass.

81 Baird et al. (2008) Rapid SNP Discovery and Genetic Mapping Using Sequenced RAD Markers. PLoS ONE 3(10): e3376.

82 Emerson et al. (2010) Resolving postglacial phylogeography using high-throughput sequencing. PNAS 107(37): 16196-16200.

9.2.1 Genomics

The rate of sequence generation has increased, and the cost per base has fallen, gradually since the development of Sanger sequencing in the 1970s. The incorporation of fluorescent dye terminators and the use of capillaries allowed partial automation in the 1990s and underpinned the sequencing of the human genome, representing a major acceleration in the ability to determine DNA sequences. However, in 2005-7 a number of systems that generate sequence using massively parallel arrays, termed “Next (or second) Generation Sequencing technologies (NGS)” (see below for a technology brief), provided a step change in sequencing capability (Fig. 5). Further advances in 2009-2010 have heralded the era of single molecule sequencing, the so-called third generation technologies, which have the potential to yet again impact significantly on genomics research. Technological refinements in NGS have continued to increase the rate of sequence generation whilst reducing cost. At the beginning of 2010 two of the leading NGS platform manufacturers announced new models (HiSeq / SOLiD 4 HQ) with a 6-10 fold increase in capacity and a significant reduction in running costs, but requiring a capital outlay of £450-650K. These machines generate 400-600 Gb of sequence data in approximately 1 week. Complementary developments, nearly overlooked due to the extraordinary headline statistics of these high throughput platforms, were associated with the release of “bench top” NGS platforms, which provide NGS capability for ~£100K with reduced running costs but only modest capacity (GS Flx Junior and SOLiD PI system). These developments place NGS technology firmly within the financial reach of most university departments and will drive the democratisation of the technology. They will also place ever increasing demands on the upstream techniques that prepare material to feed the platforms and on the informatics processes needed to turn the huge data volumes generated into scientific understanding and advance. The tools and correctly qualified people required to deal with this ever increasing mass of data are, at present, not available.
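To put the quoted run output on the same footing as the per-hour scale used in Figure 5, the weekly figure can be converted with a short calculation. The sketch below is illustrative only: it assumes that “approximately 1 week” means exactly 7 × 24 hours and that 1 Gb means 10^9 bases, neither of which is stated in the report, and the function name is hypothetical.

```python
# Back-of-envelope conversion of the quoted run output into bases per hour.
# Assumptions (not from the report): one run lasts exactly 7 x 24 hours,
# and 1 Gb is taken as 1e9 bases.

HOURS_PER_WEEK = 7 * 24
BASES_PER_GB = 1_000_000_000


def bases_per_hour(gigabases_per_run: float, run_hours: float = HOURS_PER_WEEK) -> float:
    """Average sequencing rate over one run, in bases per hour."""
    return gigabases_per_run * BASES_PER_GB / run_hours


low = bases_per_hour(400)   # lower end of the quoted 400-600 Gb range
high = bases_per_hour(600)  # upper end of the quoted range
print(f"{low:.2e} to {high:.2e} bases per hour")
```

Under these assumptions the platforms average roughly 2.4 to 3.6 billion bases per hour, which gives a sense of the data volumes that downstream informatics has to absorb.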

The impact of the capacity of these technologies is two-fold: first, the ability to describe DNA content (i.e. the sequences of bases together with epigenetic modifications) more rapidly and to a far greater depth than ever before, revealing underlying genetic variation; and secondly, the ability to provide a quantitative measure of the relative abundances of specific DNA sequences. This latter dimension had found only a few applications (e.g. SAGE) prior to the development of NGS technology, but these technologies now allow relative abundance through counting to be used in a wide range of applications, from transcript quantification to relative organism abundance by bar code counting. Some of the transformative impacts for environmental sciences are summarised below:

1. Democratisation of genome sequencing at the level of the organism. It is now possible to determine the genome sequence of any organism relatively rapidly and economically. The consequence is that research that was prohibitive due to the requirement for detailed genetic information, or where a “genome” model had to be used as a surrogate for a specific environmental target organism, can now be progressed relatively straightforwardly using the optimal target.

Figure 5 | An illustration of the increase in rate and decrease in costs of sequence generation, plotting the major platform developments which have driven these changes. [Figure not reproduced. Axes: base pairs per hour (bp) and base pairs per £, both on logarithmic scales. Technology eras shown: Radionucleotides, Fluorescence, Capillary, MPS and Single Molecule, across the periods 1980-1990, 1990-2000, 2000-2006 and 2006-2010.]


2. Characterisation of natural variation. The ability to determine the variation in sequence within and between populations, relating this to the spectrum of phenotypes observed. Ultimately, this will allow the specific variation or set of variations underlying a specific phenotypic trait to be determined, and will also allow us to predict the plasticity within a population in response to environmental change.

3. Derive gene-environment interactions. Determine the impact of anthropogenic and natural factors on how the organism exploits its genome.

4. Description and quantification of complex communities. The ability of sequencing technology to describe complex communities, either through the use of species-specific bar codes or through global sequencing of the DNA content of an environmental isolate.

5. Analysis of ancient DNA. Second generation sequencing approaches have been successfully harnessed within a range of ancient DNA research. At present single molecule approaches have been found to be less appropriate for degraded (environmental) samples as the sequencing strategies are confounded by blocking lesions, which has proved problematic when this approach has been applied to buried samples (e.g. US Armed Forces application of Helicos technology to MIAs from the Korean War). Recent research with the Helicos platform (Leiden University) on direct sequencing of archaeological remains shows promise, but there is a considerable lack of understanding of the factors that influence the persistence of DNA.
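The “relative abundance through counting” idea that underlies bar code based community description can be sketched in a few lines. This is a minimal illustration, assuming each read has already been assigned to a taxon by an upstream bar code matching step; the taxon labels and the function name are hypothetical, and real analyses would also need to correct for amplification and copy-number biases.

```python
from collections import Counter


def relative_abundance(assigned_reads):
    """Turn a list of per-read taxon labels into fractions of the total read count."""
    counts = Counter(assigned_reads)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: n / total for label, n in counts.items()}


# Hypothetical read assignments: one label per sequenced bar code read.
reads = ["taxon_A", "taxon_B", "taxon_A", "taxon_C", "taxon_A", "taxon_B"]
abundance = relative_abundance(reads)
for label in sorted(abundance):
    print(f"{label}: {abundance[label]:.2f}")
```

The same counting logic applies to transcript quantification, with transcript identifiers in place of taxon labels.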

Genomics technology brief: Next generation sequencing technologies are automated, high-throughput, massively parallel DNA sequencing platforms that can produce thousands or millions of DNA sequences without the need for individual cloning of the target DNAs. These technologies are intended to lower the cost of DNA sequencing beyond what is possible with standard “dye-terminator” methods (Sanger sequencing), on which the first generation of automated sequencing technologies relied. There have been a range of technical approaches, including microelectrophoretic methods and sequencing by hybridization, which have, so far, generated limited amounts of hard data83. However, a number of platforms exploiting “cyclic-array sequencing” are now considered mature technologies and have become widely available, with their application yielding significant high-profile research. This has resulted in these implementations being referred to as “second-generation” sequencers84. Primary data has now been published exploiting methods based on “real-time observation of single molecules”. This approach has additional advantages in that it not only removes the need for target amplification but also increases read length and fidelity and reduces the time needed for sequence generation, leading to the designation of these systems as “third generation” platforms.

Second generation: Although “second generation” platforms employ cyclic-array sequencing as their central approach, the specific implementations and associated methods are distinct, resulting in equipment with very different specifications. Furthermore, each implementation is based on discrete proprietary methodology, which has led the community to refer to platforms by their original developers rather than the present models or manufacturers. As a result, the community now uses company names as accepted synonyms for particular implementations, even though the platforms have been refined and even licensed to third parties. The outcome of refinement in hardware, software and supporting methodologies (both enzymic and chemical) is that the specification of each system is changing rapidly. Defining each platform’s “real” specification is further complicated because it is challenging to distinguish the performance claimed in company publicity and in academic research “developments” from that achieved under routine delivery. The implementations differ at two crucial steps: i) generation of template, and ii) sequencing chemistry. Three approaches are employed to generate template: emulsion PCR (“454” by 454 Life Science, now “GS Flx” by Roche, and “SOLiD” by Applied Biosystems), bridge PCR (“Solexa” by Solexa, now “Genome Analyzer” (GA2) by Illumina) or single-molecule templates requiring no amplification (“Helicos” by Helicos). Sequence derivation is then achieved through pyrosequencing (“454”/“GS Flx”), reversible dye termination (“Solexa”/“GA2”), ligation (“SOLiD”) or reversible dye incorporation (“Helicos”). For a comprehensive review of these platforms with detailed methodology see Metzker (2010)85.

The Helicos “True Single Molecule Sequencing” (tSMS) lies on the boundary between second and third generation sequencing: it enables sequencing of single DNA molecules using a direct sequencing-by-synthesis approach, but relies on step-wise incorporation of each base, which is not a real-time observation of strand synthesis, and provides only short reads of 25-30 bp. An emerging technology harnesses semiconductor technology to sense directly the release of H+ ions during DNA extension. This couples the generation of sequence data directly to the digital input, thereby reducing the processing required (no image acquisition) and linking sequence generation to chip technologies. These developments promise a suite of inexpensive machines with low consumable costs, allowing true bench top sequencing projects to be delivered (http://www.iontorrent.com/).

83 Shendure et al. (2004) Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 5, 335–344.

84 Shendure and Ji (2008) Next-generation DNA sequencing. Nat Biotechnol. 26(10), 1135-45.

85 Metzker, M.L. (2010) Sequencing technologies - the next generation. Nat Rev Genet. 11(1), 31-46.

Third generation technologies: These approaches visualise the activity of DNA polymerase in real time and display significant advantages over other techniques: provided that the “observation” can be performed without disrupting nucleotide coupling, the wild-type speed, fidelity and read length of the DNA polymerase enzyme can be harnessed. Various approaches have been attempted, including using labeled DNA polymerase; reading the sequence as a DNA strand transits through nanopores; and microscopy-based techniques, such as Atomic Force Microscopy or Electron Microscopy. However, at present the leading approach is the application of nanostructure arrays creating a nanophotonic visualization chamber, with an anchored DNA polymerase unit placed at the bottom of the chamber allowing the real-time visualization of 5’ labeled dNTPs (Pacific Biosciences86).

Comparative evaluation criteria for sequencing platforms should include: number of contiguous bases generated from each target (read length), number of molecules sequenced in parallel, fidelity, run time per base, cost per base, suitability for environmental and historical DNA extracts, and suitability for bar-coding/parallel sequencing (to increase throughput and lower costs).

Conclusion 21. The unprecedented developments in genomics will continue for the foreseeable future, delivering platforms with increased capacity, novel capabilities and lower cost. A second major aspect is the “democratization” of NGS platforms, with “bench top” versions likely to appear in the majority of research departments over the medium term (2-5 years).

9.2.2 Metabolomics

From the inception of metabolomics in ca. 2000 until 2005, the dominant analytical technology in biological, biomedical and environmental studies was proton NMR spectroscopy (largely driven by Professor Jeremy Nicholson, Imperial College London). This approach offers many benefits for metabolite analysis, including high reproducibility and the ability to identify and quantify analytes. Due to its relatively low sensitivity, however, it detects only ca. 50-75 metabolites in a biological specimen, accounting for only a few percent of all metabolites in the metabolome. Consequently, the research community has expanded its toolkit to include higher sensitivity technologies, in particular mass spectrometry (MS), in order to measure a much wider range of important metabolite classes. Whereas this expansion in technology usage began in the mid-2000s for non-environmental studies, it has occurred more recently in environmental metabolomics. Mass spectrometry is not (currently) without limitations, for example the difficulty of spectrum-wide quantification of metabolites, and therefore NMR remains a valuable tool for particular applications not requiring high sensitivity. It should be noted that technological developments continue for both approaches, with instrument manufacturers clearly focused on capitalising upon the large market in metabolomics. During the next 5 years, it is likely there will be a considerable and increasing demand for high resolution mass spectrometry in (environmental) metabolomics (including platforms such as the Orbitrap, FT-ICR and QTOF; see below). This statement is supported by the new MRC-funded Centre at Cambridge (see Section 8.2.1.3), which is based on four Orbitrap mass spectrometers.

There are multiple variants of “mass spectrometry” used in metabolomics, with a range of different techniques for (1) sample pre-separation; (2) introduction and ionisation of the sample; and (3) the design of the mass spectrometer for measuring the ions. Technology development within mass spectrometry is vibrant, with instrument manufacturers continually announcing major new hardware and associated applications. For example, the first applications of ion chromatography and also of ion mobility spectrometry (IMS) prior to detection of analytes by MS have recently been announced. Such developments impact directly on the best technology options for environmental metabolomics, and hence for this field to remain world-leading in the UK it will be essential to maintain state-of-the-art facilities for the NERC community. As described above, specialisms of metabolomics are now emerging as sub-disciplines, including lipidomics and glycomics, which aim to provide quantitative descriptions of global lipid and carbohydrate compositions of biological systems. It is anticipated that quantitative lipidomics will become a standard tool for clinical diagnostics87. While related fields such as environmental lipidomics are also anticipated to grow rapidly within the next few years, such sub-disciplines utilise largely the same technologies as for the analysis of other metabolite classes. This is obviously an advantage in terms of technology provision to the NERC community, although it further highlights the anticipated increased demand for high resolution MS as well as expertise in the handling, analysis and interpretation of the data from these sample classes, some of which are very challenging to work with.

86 Eid et al. (2009) Real-Time DNA Sequencing from Single Polymerase Molecules. Science 323(5910), 133–138.

87 Shevchenko A. & Simons K. (2010) Lipidomics: coming to grips with lipid diversity. Nature Reviews Molecular Cell Biology 11: 593-598.

While measurement technologies have advanced considerably in the last few years, the bioinformatics tools and resources for environmental metabolomics have remained poor. In particular, there are no resources available to assist with annotation of the metabolomes of organisms of interest to the NERC community, with the exception of published papers. To be fair to this community, not even the human metabolome has been fully characterised, with estimates of the total number of unique metabolites in the human body amounting to many thousands. That said, well-funded resources such as the Human Metabolome Database (HMDB) have existed (and grown) over the last several years. Establishing databases that document the occurrence of metabolites in a range of sentinel organisms, for example by text-mining the existing literature, would be of immediate value to metabolomics studies by the NERC community. Furthermore, new tools (coupling analytical methods with informatics) are required for the identification of large numbers of metabolites (and lipids), which is a challenge for the whole metabolomics community. The scientific, societal and financial importance of the discovery and identification of metabolites is readily evident from the well-established field of Natural Products Chemistry, which has repeatedly discovered novel bioactive metabolites in our natural environment. Only after metabolite identification will we be able to understand phenomena such as metabolic responses to environmental stress, intra- and inter-species chemical communication, metabolic diversity etc. As such, international efforts to annotate the metabolomes of keystone terrestrial and aquatic organisms, such as Lumbricus spp. and Daphnia spp., should be discussed and implemented.

Ultimately, to increase our understanding of the natural world at the metabolic level, both the identification and absolute quantification of metabolites are essential. Following the ca. 5-year period of “shotgun” or untargeted metabolomics (with particular challenges in quantification), the research community is now realising the next important (and complementary) step to maximise the value and future potential of metabolic measurements. Metabolomics is undeniably an excellent approach for discovering novel biology and therefore will continue to play a central role in the delivery of NERC science. However, for a given phenomenon, once the critical molecular pathways are known (i.e. those that are diagnostic of a particular environmental stress), we can progress to analytical methodologies that target just those metabolites in a more specific and quantitative way than the MS approaches used for the initial discovery phase. This next step in the growth and development of metabolomics has started. For example, Wei et al. (July 2010)88 employed a triple-quadrupole MS in multiple reaction monitoring (MRM) data acquisition mode to measure 205 pre-selected metabolites in plasma in only 10 minutes. New generations of triple quadrupole mass spectrometers now allow absolute quantification of this large number of metabolites in a single analysis. The potential of this targeted LC-MS approach is that it provides a cost-effective tool for assessing the health of our environment, with an analytical capacity in excess of a hundred analyses per day.
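The capacity figure quoted above follows from simple arithmetic on the 10-minute analysis time. The sketch below assumes continuous 24-hour instrument operation; the per-sample overhead parameter (for injection, column re-equilibration and so on) is a hypothetical value introduced for illustration and is not taken from Wei et al. or from this report.

```python
# Rough daily capacity of a targeted LC-MS/MRM method.
# Assumption (not from the report): the instrument runs continuously for 24 hours.

MINUTES_PER_DAY = 24 * 60


def analyses_per_day(minutes_per_analysis: float, overhead_minutes: float = 0.0) -> int:
    """Whole number of analyses that fit in one day of continuous operation."""
    return int(MINUTES_PER_DAY // (minutes_per_analysis + overhead_minutes))


ideal = analyses_per_day(10)               # 10-minute method, no overhead
with_overhead = analyses_per_day(10, 4.0)  # hypothetical 4-minute overhead per sample
print(ideal, with_overhead)
```

With no overhead this gives 144 analyses per day, and even with the hypothetical 4-minute overhead it gives 102, consistent with a capacity in excess of a hundred analyses per day.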

Conclusion 22. Metabolomic platforms are becoming more sophisticated with increased through-put. Significant work is needed to identify and characterise the full spectrum of unique metabolites found in the natural world. Specialist areas of metabolomics are maturing as discrete research areas; these include lipidomics and glycomics, each with specific relevance to NERC science.

9.2.3 Proteomics

Proteomics involves the identification of proteins in a system, most commonly by analysis of their proteolytic peptides; the central technology is typically mass spectrometry. Proteins can be separated first, digested, and then the released peptides analysed. Alternatively, in a so-called shotgun proteomic approach, the unfractionated protein mixture is digested and the entire resulting mixture of proteolytic peptides is separated and analysed. Although perhaps counter-intuitive, because peptide separation and analysis technologies are so well developed, this approach is very powerful and is the most commonly adopted technology in current use. Peptide separations using robust, rapid and reproducible chromatographic approaches are well advanced, and such chromatographic separations interfaced directly with mass spectrometry (LC-MS) are now well established and widely available.

88 Wei, R., Li, G. and Seymour, A.B. (2010) High-Throughput and Multiplexed LC/MS/MRM Method for Targeted Metabolomics. Anal. Chem. 82 (13): 5527–5533.

Mass spectrometry is used to mass measure the peptides (MS) and then to fragment them and mass measure the fragments generated (MS/MS), allowing the peptides to be characterised. The mass spectrometric data are often not interpreted de novo, but are instead used to match the experimentally-observed peptides with theoretical peptides generated in silico by prediction of the proteolytic digestion products of all the proteins in protein sequence databases. The subset of theoretical peptides with masses that match those of the experimentally observed peptides is then subjected to theoretical mass spectrometric fragmentation in silico, and software compares the experimental and theoretical spectra to identify the best matches. Peptide matching allows the identification of the proteins from which the peptides come. Modern mass spectrometers are robust, offer high sensitivity, and the newest generation (Orbitrap, Fourier transform-ion cyclotron resonance instruments, and very high resolution TOF instruments such as the maXis) offer low- or sub-ppm mass accuracies, which makes protein identifications much more plentiful and secure.
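The first stage of the database matching described above (in silico digestion followed by mass matching) can be sketched as follows. This is a simplified illustration rather than the algorithm of any particular search engine: it assumes trypsin as the protease with no missed cleavages, unmodified peptides and neutral monoisotopic masses, and it omits the subsequent MS/MS fragment comparison. The protein sequences, names and the 5 ppm tolerance are hypothetical.

```python
# Monoisotopic residue masses in Da for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_masses(observed_masses, database, tolerance_ppm=5.0):
    """Return (observed mass, protein name, peptide) for every match within tolerance."""
    matches = []
    for name, sequence in database.items():
        for pep in tryptic_digest(sequence):
            theo = peptide_mass(pep)
            for obs in observed_masses:
                if abs(ppm_error(obs, theo)) <= tolerance_ppm:
                    matches.append((obs, name, pep))
    return matches


# Hypothetical two-protein database and one simulated measurement with a 2 ppm error.
database = {"protein_1": "MAGKLLSPRPEK", "protein_2": "GASVKTTR"}
observed = [peptide_mass("GASVK") * (1 + 2e-6)]
matches = match_masses(observed, database)
print(matches)
```

Tightening `tolerance_ppm` reduces the number of candidate peptides per observed mass, which is why the low- or sub-ppm accuracy of the newest instruments makes identifications more secure.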

While LC-MS instruments are reasonably widely available, the use of such technology for proteomics applications is certainly not widespread. This has several causes, including: the expertise needed especially in experimental design and in sample preparation; the high cost of instrumentation, its maintenance and operation; the sample throughput needed for a proteomic experiment is high, which makes the real costs of instrument access for a complete proteomic experiment high; post-translational modification of proteins means that protein sequence predicted from the genome may not allow modified peptides to be matched, demanding de novo sequencing from the MS/MS data, a specialist and laborious process.

While protein identification is now reasonably easy and routine in labs with the appropriate expertise, measurement of changes in protein levels remains a technical challenge, whether using stable isotope labelling approaches such as iTRAQ and SILAC or label-free approaches that are becoming the norm; measurement of absolute protein amounts is rarely attempted, such is the difficulty and cost of the experiment.

The UK punches well above its weight in the international arena of proteomics, perhaps because of the very strong history and record in mass spectrometry in this country. In the NERC community, however, the challenges that beset proteomics seem to have meant that uptake of proteomic approaches to studying the environment has been limited. This is a particular loss to this community, given that it may well be easier to retrieve protein than DNA or RNA from an environmental sample. Perhaps ironically, the BBSRC and MRC communities have adopted proteomic approaches well ahead of the NERC community, in spite of their samples being more amenable to a broader range of approaches than are ours. NERC research that could benefit enormously from the power of the proteomic approach frequently does not consider making use of this technology, especially since there are no proteomics facilities for the community.

The application of proteomics to environmental and ancient samples is thus still in its infancy, although it is considered by some to be a paradigm shift in the ability to monitor ecosystem processes89 because of the potential to simultaneously recover information about genome type and activity90. Studies have explored species response to pollution (protein expression signatures) and this approach has been expanded to ecosystem level exposure91,92. In older samples (protein persists for at least 5 Ma, and perhaps much longer93) proteomics can also be used to explore sequence evolution and both functional and disease-linked post-translational modifications. Today’s high throughput, high sensitivity instrumentation means that data really must be mined by machine, but the immaturity of automated de novo sequencing means that advances in proteomics are tightly coupled to the availability of genomes/exomes for peptide matching.

89 Keller and Hettich (2009) Environmental Proteomics: a paradigm shift in characterizing microbial activities at the molecular level. Microbiol Mol Biol R 73 (1), 62-70.

90 Schulze, W. (2005) Protein analysis in dissolved organic matter: What proteins from organic debris, soil leachate and surface water can tell us–a perspective. Biogeosciences 2 (1), 75-86.

91 Lo et al. (2007) Strain-resolved community proteomics reveals recombining genomes of acidophilic bacteria. Nature 446 (7135), 537-541.

92 Wilkins et al. (2009) Proteogenomic Monitoring of Geobacter Physiology during Stimulated Uranium Bioremediation. Appl Environ Microb 75 (20), 6591-6599.

Conclusion 23. Developments in proteomics have increased its utility within the environment arena; however, funding to allow access to the technology is required, and use of the technology for environmental research is required to drive the refinement of methodologies and approaches for such sample sets, in order to see this approach fulfil its potential.

9.2.4 Data management and bioinformatics: Large Facilities Roadmap Environmental ‘Omics Bioinformatics Facility (NC)

Independent evidence provided by the EWG and the community consultation identified data management and informatics as the critical path that must be satisfied to support the continued and expanding delivery of ‘omics for NERC. Interestingly, this very area had previously been identified in the joint research councils Large Facilities Roadmap, and it is therefore explored below. This illustrates the important requirement for continued support of the functions currently served by NBAF-O/NEBC in bioinformatics and data management.

The joint research council Large Facilities Roadmap94, published in 2010, provides an overview of the research facilities that are under construction or planned by RCUK and of other facilities, in the UK or overseas, that are under current consideration and of high priority to UK research. The Roadmap identified the emerging need for an “Environmental Omics Bioinformatics Facility” (page 46 of the “Road Map” is extracted and included as Annex 10.7). This proposal recognized the plethora of applications of ‘omics within environmental sciences and identified the emerging bottleneck associated with the full spectrum of functions associated with data management and informatics. It identifies the need for new capability to support the analysis and interpretation of bio-molecular data as it pertains to the environment and proposes an operational model that coalesces the various functions associated with data (aggregation, interoperation and analysis) around a centralized “hub”. The burgeoning capability of ‘omics sciences to describe entire communities for any given environment, together with the ability to describe the underlying genetic variation within populations, requires novel advances in process understanding to allow us to exploit the data to inform predictive modeling, in relation to either local sustainability or global-level change (e.g. climate or ocean acidification). The pivotal role identified for the proposed RCUK facility would be both as an outward-looking international network and in fostering internal national capability. It is of note that the potential cost of this centre was estimated at £30M, with an operational date of 2016. As will become clear in the balance of this report, there may be more optimal models for delivery of the same functions in the near term; moreover, the pace of technological development may accelerate the need for this investment.

It is important to note that, as part of the community consultation, survey respondents were asked specifically to consider the potential benefits of various delivery options for an ‘omics strategy, including the establishment of an environmental ‘omics “centre”. There was a reluctance to endorse a further physical centralization of core facilities and services, including in bioinformatics and data management, as set out in the RCUK proposal (see Annex 10.2). Several issues underlie this response: first, access to domain-specific expertise, which has been such a critical facet of the NBAF success; second, the need for community embedding of these functions; and finally, the potential relative inflexibility of a physical centre in responding to changes in community demand.

Support for informatics, in all its dimensions, is seen as crucial to realizing the potential of ‘omics and essential to underpin any future ‘omics developments. It is clear from the review of the current landscape (Section 5.1.4.1) that NBAF and NEBC funding at present permits only a minimal bioinformatics service following the end of the EG/PGP programmes. The interim arrangements that are in place will not be able to address existing and future levels of demand. Restoring informatics provision beyond the levels operating under EG/PGP has been identified as a key priority by this review.

93 Schweitzer et al. (2009) Biomolecular characterization and protein sequences of the Campanian hadrosaur B. canadensis. Science 324 (5927), 626-31.

94 http://www.rcuk.ac.uk/cmsweb/downloads/rcuk/research/RCUKLargeFacilitiesRoadmap2010.pdf

Conclusion 24. The current distributed model of ‘omics facilities and expertise embedded in different “nodes” or research groups in NBAF and NEBC is seen to be working well by the community (see also Conclusion 7). This provides a potentially more scalable and responsive model for delivery of ‘omics services than a new centre as set out in the LFCF RCUK proposal.

9.3 Key Functions underlying a future strategy

The evidence clearly indicates the potential for ‘omics to deliver significantly across the breadth of NERC science, providing a transformative technology from which excellent science with significant social and economic impact can be delivered. To ensure the potential of ‘omics science is realised within the environmental arena and for NERC to maintain its internationally competitive science portfolio, continued investment in ‘omics is required to deliver the following functions. These functions are then mapped onto the delivery mechanisms currently supported by NERC (Table 2) and an evaluation is provided as to the success of this delivery. Note that this also includes non-NERC providers.

a) Science delivery (Research Programmes; Responsive Mode)

Strategic Research Programme priorities

‘Omics approaches represent a powerful suite of tools that may be harnessed to address the strategic Research Programme priorities inherent in the critical issue of the 21st century: the sustainability of life on Earth. A conduit is required so that ‘omics experts can engage with the Science Theme Leaders empowered to deliver NERC’s strategic mission. By engaging with these identified science challenges, the contribution of ‘omics can be identified and incorporated to support these long-term activities.

Underpin “blue-skies” Responsive Mode research

Continued support for Responsive Mode research is critical to tap the visionary potential of the UK research community to explore novel areas of scientific application that will contribute to stewardship of our environment. ‘Omics approaches enable a global overview of biological systems and therefore provide the comprehensive information needed to better inform simulations and predictive models.

Developing novel areas

This report has identified a number of NERC science areas with untapped potential for the exploitation of ‘omics science. This can of course be addressed through both RP and RM funding routes. In addition, however, a rapid and responsive route for supporting novel application areas is needed to demonstrate the capability of ‘omics to provide novel insight. This might incorporate pump-priming access to ‘omics infrastructure (as embodied by the current NBAF pilot scheme in support of Next Generation Sequencing), opportunities for cross-disciplinary training, and workshops where expertise from different specialities can be brought together.

b) Core facilities, services and competencies

Training

Training is required at two levels: that of the established investigator and that of the junior researcher. New researchers, together with researchers from complementary disciplines, need access to an introduction to ‘omics technologies with appropriate examples of their deployment, further applications and limitations. Investigators need to be educated, through frontiers updates, as to the potential of new technologies to deliver new insights into their specific research area, together with best practice for experimental design. Researchers need to receive appropriate training from experts in the field, through technical master classes, as to the theoretical background of the techniques, the experimental delivery and the appropriate data interpretation / informatics required to analyse the results. The latter issue, however, is the most pertinent. The rapid advances in sequencing capacity have generated a gap in terms of the informatics infrastructure needed to deal with the massive quantities of data being generated. It is essential that we “tool-up” the community with informatics skills so that they have basic competence in data management and informatics; implicit in this is an appropriate understanding of bio-statistical approaches. There is considerable scope for the development of the new specialised area of environmental informatics.

Data generation infrastructure: Facilities, services and technology

It is essential to maintain a system whereby all NERC researchers can gain access to the most up-to-date ‘omics technologies (including genomics, transcriptomics, metabolomics, as well as the development of lipidomics and proteomics), supported by domain-specific expertise that is highly versed in the underlying technology requirements and knowledgeable in exploiting these approaches within an environmental context. This infrastructure must be able to respond dynamically to provide the capability and capacity required by the NERC community and to reflect novel technical developments as they emerge. Appropriate investment must be provided to the people supporting these facilities to recognise the contribution they provide to the wider community.

Data management and informatics

The NERC data policy must be delivered, and therefore there needs to be support for data archival (including interface with appropriate international repositories), management (ensuring data integrity and interoperability) and adherence to / development of standards (recognition and adoption of international standards / development of new standards where required). These activities are essential to underpin data sharing, cross-experimental data mining and future analysis.

A critical mass of expertise is required to underpin the complex process from data processing through experimental interpretation (bioinformatics & biostatistics) to integration and meta-analyses. This will require specialist expertise associated with different data types and a specific understanding of how the ‘omics techniques have been deployed and of the specific experimental context. A wide spectrum of informatics provision is required, ranging from signal (data) processing, which may be closely coupled to raw data generation, to the integration of ‘omics data into environmental models (such as modelling impacts of climate change), which requires specialist knowledge of the application area. Although it is essential to establish a community of “experts” who provide these skills, it is also important to have a trained and informed biological community who are sufficiently informed to interpret the significance of the outputs of these analyses (see training above).

c) Integration and prioritisation

Community coordination, delivery and management

There are significant benefits of shared knowledge and community action in relation to technical development / deployment, together with the considerable value-added insights gained from multifaceted / cross-disciplinary approaches focused on specific science questions. The rapid developments in the technology underpinning ‘omics, together with the changing demands, signal a compelling need for ongoing horizon scanning and prioritisation exercises in which the opportunities posed by new developments can be evaluated and an appropriate road-map to exploitation developed. Similarly, workshops need to be staged that focus on specific science challenges, which may be in areas with no history of ‘omics exploitation. These “sandpits” may well bring together researchers from diverse disciplines to focus on how ‘omics may be employed to address these questions.

Engagement with other partners


An increase in engagement with other funders, both nationally through a cross-Council interface and internationally with aligned funders, will allow NERC to address the big science questions and reassert its pioneering role in environmental ‘omics.

The science needs to include the ability to undertake substantial sequencing projects and to contribute to international multi-genome projects. There is no contribution or influence in these from the UK at present, but we need to have the capacity to get organisms of UK and NERC interest included, and to ensure that UK environmental scientists are properly involved in these programmes.

Maximising impact

Many areas of ‘omics application have potential commercial opportunities, whether they are direct, as with bio-prospecting, or indirect, associated with the leverage of commercial speculation linked to academic excellence/investment (as projected for TGAC and already seen in the case of BGI-Europe and the Centre for GeoGenetics in Denmark). Commercial investment can be attracted by establishing Knowledge Exchange networks in which groups of researchers involved in a similar applied area of science provide a unified interface for large industries and a fertile environment from which to support interactions with SMEs.

To deliver research outcomes that are fit for purpose, the community needs to be actively involved in stakeholder engagement. However, engagement at the level of the individual researcher is often inefficient for both the researcher and the end-user; activities that provide integrated forums bringing researchers and stakeholders together will therefore provide a more fruitful conversion of research into impact.

The communication of the outcomes of ‘omics research to the public and the exploitation of aspects with potential industrial or governmental application are essential to ensure that the benefits accruing from these techniques are fully realised.

Conclusion 25. A suite of functions can be defined that are required to ensure the effective delivery of NERC research through ‘omics approaches.


Table 2 | Functions required to foster future ‘omics delivery mapped onto current NERC delivery mechanisms.

Delivery mechanisms (table columns, left to right): Research Programmes; Response Mode; NERC Centres; Service and Facilities (NBAF); Data Centres (NEBC); Studentships & Fellowships; Knowledge Exchange; Other Corporate Functions; Non NERC Facilities.

Ratings are listed for each function in left-to-right column order; blank cells are not shown.

Science delivery: Research investment
Strategic Research Programme priorities: ***
Underpin “blue-skies” Response Mode research: *** **
Developing novel areas: * * ** *

Core facilities, services and competencies: Training
Introductions to ‘omics technologies: *
Frontiers update: *
Technical master classes: *
Data management and informatics: ** ** *

Core facilities, services and competencies: Data generation infrastructure (facilities and technology)
Genomics: *** **
Transcriptomics: *** **
Metabolomics: *** *
Lipidomics: *
Proteomics: *
Delivery of domain specific expertise: ***

Core facilities, services and competencies: Data management and informatics
Data archival / management / standards: **
Data (signal) processing: **
Interpretation (bioinformatics / biostatistics): *
Data integration: *
Data meta-analyses: **

Integration and prioritisation: Community coordination, delivery and management
Horizon scanning: * *
Application of ‘omics to novel areas (Sandpits): *

Integration and prioritisation: Engagement with other partners
Cross Research Council interface: ** *
International engagement: * *

Integration and prioritisation: Maximising impact
Attracting commercial investment: * **
Stakeholder engagement: ** * **
Public engagement: * * * *

Key:
*** Most ‘omics specific requirements currently met
** Not all ‘omics specific requirements currently met
* Little or no ‘omics specific requirements currently met
(no star) Not relevant for this function or function is recorded in another category (e.g. for NEBC within CEH)

9.4 Integration, coordination and prioritisation of ‘omics science

Building successfully on previous strategic investment whilst also addressing key emerging research challenges, so that ‘omics science is further embedded across the full NERC remit, requires the integration, coordination and prioritisation of a complex and dynamic suite of activities or key functions within NERC.

It is evident that there are also potentially highly fruitful areas for interaction with other national and international funders, such as joint RC initiatives, where partnerships may provide highly efficient mechanisms for optimal delivery of big science questions. Furthermore, if the social and economic impact of these developments is to be realised, increased interaction with industry, stakeholders and the public is critical.

In the absence of integration and coordination, this intricate series of activities, both within NERC and elsewhere, will maintain independent trajectories. However, if they can be effectively managed, each function (science delivery; core facilities, services and competencies; and integration and prioritisation) will enhance the ability of ‘omics sciences to deliver against NERC’s Strategy.

The delivery model proposed by this review for this integration is the establishment of a cost-effective, virtual “Environmental ‘Omics Synthesis” (EOS) centre. The concept is loosely based on the successful synthesis centre model developed by the US NSF (see Section 8.5), but with features that address the unique needs of the UK NERC environmental science community. Its purpose is to facilitate the development of environmental ‘omics within the UK community and to capitalise on the scientific lead established by earlier NERC funding in this area. To be clear, the proposal is for a virtual centre, not a new physical centre: a scalable, flexible and cost-effective coordinating body, capable of responding quickly to changes in technology and user demand.

The key functions of NERC EOS should therefore be to help NERC to provide overall community coordination, prioritisation and integration of ‘omics investments, including:

provision of a strong, coherent voice for environmental ‘omics, promoting uptake of these approaches across the NERC remit;

coordination of existing and future activities, including horizon scanning of scientific and technology drivers, identification of skills and training needs, and prioritisation of future needs, including on-going advice to NERC on investment in research and facilities that is flexible, scalable and responsive to changes in user needs and emerging technology;

development of a unified interface and strategy to engage key national and international partners and stakeholders, including Research Councils, industry (e.g. Sanger, BGI) and the EU (e.g. ELIXIR), to help maximise the impact of NERC investment in ‘omics.

EOS delivery mechanisms could include the establishment of a community-led steering group with membership drawn from across the NERC remit and including key external partners, among them sister Research Councils. It is envisaged that activities would also include a series of workshop-based forums for NERC community-level identification of on-going science priorities and the facilities, services and skills needed to support them. EOS would also be expected to:

provide an effective interface with other stakeholders and investors in ‘omics, including the Research Councils, government, industry and international organizations;

advise NERC on training priorities to build a cross-disciplinary skills capacity, where appropriate providing a coordinating role on new initiatives;

provide a unified portal to enhance impact delivery to end users and society;

grow a critical mass of bioinformaticians and biostatisticians integrated, embedded and distributed within the scientific community;

work with current and future funding initiatives to ensure cost-effective and coordinated development of the ‘omics research area.

EOS would therefore be expected to have primarily a coordinating role: a major function would be to coordinate efficient links between existing investments and functions that are currently dispersed (e.g. NBAF), but in most cases not to manage these directly. Many of the EOS functions would also need to interface with existing NERC Corporate activities (e.g. Knowledge Exchange; training; developing effective interfaces with external partners), providing advice to NERC and leadership on ‘omics-related issues.

Conclusion 26. Coordination and integration of the various activities and functions of a NERC ‘omics strategy are needed to maximise delivery and ensure international leadership.


Figure 6 | A schematic model for the coordinating role of the proposed Environmental ‘Omics Synthesis centre. This illustration shows the network of ‘omics functions within NERC. It is proposed that EOS would provide an overarching coordination role but in most cases would not directly manage these separate functions.

[Figure 6 labels: Environmental ‘Omics Synthesis centre (community coordination, prioritisation and integration); Science delivery; Core facilities, services and competencies; Integration and prioritisation.]


10 ANNEXES

10.1 Minutes of the First Expert Working Group

10.2 Online Survey Results

10.3 Town Meeting and Workshop Report

10.4 Minutes of Second Expert Working Group

10.5 Minutes of Third Expert Working Group

10.6 Environmental Genomics Final Report

10.7 “Extract” from large facilities

