The ProteomeXchange Consortium: 2017 update

Date post: 21-Jan-2018
The ProteomeXchange Consortium: 2017 update. Dr. Juan Antonio Vizcaíno (on behalf of all ProteomeXchange partners), EMBL-European Bioinformatics Institute, Hinxton, Cambridge, UK
Transcript
Page 1: The ProteomeXchange Consortium: 2017 update

The ProteomeXchange Consortium: 2017 update

Dr. Juan Antonio Vizcaíno

(on behalf of all ProteomeXchange partners)

EMBL-European Bioinformatics Institute

Hinxton, Cambridge, UK

Page 2: The ProteomeXchange Consortium: 2017 update

Juan A. Vizcaíno ([email protected])

HUPO 2017 World Conference, Dublin, 20 September 2017

Overview

• Introduction

• Some usage statistics

• New prospective member: iProX

• Handling of reprocessed datasets

Page 3: The ProteomeXchange Consortium: 2017 update

ProteomeXchange: A global, distributed proteomics database

• PASSEL (SRM data)
• PRIDE (MS/MS data)
• MassIVE (MS/MS data)
• jPOST (MS/MS data)

Diagram labels: Raw, ID/Q, Meta

Mandatory raw data deposition since July 2015

• Goal: Development of a framework to allow standard data submission and dissemination pipelines between the main existing proteomics repositories.

http://www.proteomexchange.org

Vizcaíno et al., Nat Biotechnol, 2014

Deutsch et al., NAR, 2017

Page 4: The ProteomeXchange Consortium: 2017 update

ProteomeCentral: Centralised portal for all PX datasets

http://proteomecentral.proteomexchange.org/cgi/GetDataset
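
As a minimal sketch of how a single dataset page might be addressed through the portal URL above: the base URL comes from the slide, but the `ID` and `outputMode` query parameters, and the helper name `dataset_url`, are assumptions made for illustration and should be checked against the ProteomeCentral documentation before use.

```python
from urllib.parse import urlencode

# Base URL exactly as shown on the slide.
BASE_URL = "http://proteomecentral.proteomexchange.org/cgi/GetDataset"


def dataset_url(accession, output_mode=None):
    """Build a ProteomeCentral URL for one dataset accession.

    ASSUMPTION: the 'ID' and 'outputMode' query parameters are not
    given on the slide; they are used here for illustration only.
    """
    params = {"ID": accession}
    if output_mode is not None:
        params["outputMode"] = output_mode
    return BASE_URL + "?" + urlencode(params)


# Example accession used purely as a placeholder.
print(dataset_url("PXD000001"))
print(dataset_url("PXD000001", output_mode="XML"))
```

The helper only builds the URL string; it performs no network request, so it can be tried without access to the portal.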

Page 5: The ProteomeXchange Consortium: 2017 update

Public datasets from different omics: OmicsDI

http://www.omicsdi.org/

• Aims to integrate ‘omics’ datasets (proteomics, transcriptomics, metabolomics and genomics at present).

PRIDE

MassIVE

jPOST

PASSEL

GPMDB

ArrayExpress

Expression Atlas

MetaboLights

Metabolomics Workbench

GNPS

EGA

…and others

Perez-Riverol et al., Nat Biotechnol, 2017

Page 6: The ProteomeXchange Consortium: 2017 update

OmicsDI: Portal for omics datasets

Page 7: The ProteomeXchange Consortium: 2017 update

Overview

• Introduction

• Some usage statistics

• New prospective member: iProX

• Handling of reprocessed datasets

Page 8: The ProteomeXchange Consortium: 2017 update

Origin:

1229 USA

902 Germany

618 China

583 United Kingdom

319 France

250 Netherlands

213 Canada

208 Switzerland

200 Australia

179 Spain

172 Austria

168 Denmark

138 Sweden

133 India

115 Japan

115 Belgium

98 Norway

75 Italy

69 Taiwan

57 Brazil

51 Israel

51 Singapore

43 Finland

44 Ireland…

ProteomeXchange: 7,475 datasets up until September 1st 2017

Type:

4805 PRIDE partial

1552 PRIDE complete

649 MassIVE

117 PeptideAtlas/PASSEL complete

109 jPOST

243 reprocessed datasets

Publicly Accessible:

4051 datasets, 54% of all

89% PRIDE

6% MassIVE

3% PASSEL

2% jPOST

Top species studied by at least 50 datasets:

2,787 Homo sapiens

958 Mus musculus

236 Saccharomyces cerevisiae

229 Arabidopsis thaliana

190 Rattus norvegicus

157 Escherichia coli

68 Bos taurus

62 Drosophila melanogaster

~ 1,100 species in total
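
The figures on this slide can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts listed under "Type" and "Publicly Accessible"; the variable names are illustrative, and the check assumes the six per-type counts are meant to add up to the headline total.

```python
# Dataset counts by type, copied from the slide.
by_type = {
    "PRIDE partial": 4805,
    "PRIDE complete": 1552,
    "MassIVE": 649,
    "PeptideAtlas/PASSEL complete": 117,
    "jPOST": 109,
    "reprocessed": 243,
}

# The per-type counts should add up to the headline total (7,475).
total = sum(by_type.values())

# Publicly accessible datasets, also from the slide.
public = 4051
public_share = round(100 * public / total)

print(f"total datasets: {total}")
print(f"publicly accessible: {public} ({public_share}% of all)")
for name, count in by_type.items():
    print(f"{name:30s} {count:5d} {100 * count / total:5.1f}%")
```

Note that the 89% / 6% / 3% / 2% split on the slide refers to the publicly accessible datasets, not to the per-type shares of all datasets printed by the loop.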

Page 9: The ProteomeXchange Consortium: 2017 update

PRIDE: Submissions and downloads keep increasing

Data download volume for PRIDE Archive in 2016: 243 TB

[Chart: Downloads in TBs per year, 2013 to 2016; axis from 0 to 300]

Top months: 224 and 234 datasets submitted in July and August, respectively

> 400 TBs of data

Page 10: The ProteomeXchange Consortium: 2017 update

Public proteomics datasets are being increasingly reused…

Martens & Vizcaíno, Trends Biochem Sci, 2017

Page 11: The ProteomeXchange Consortium: 2017 update

PRIDE has become an ELIXIR core data resource

• ELIXIR coordinates, integrates and sustains bioinformatics resources across Europe and enables users in academia and industry to access services that are vital for their research.

• First list of core resources announced in July 2017.

• PRIDE included in the initial list.

https://www.elixir-europe.org/platforms/data/core-data-resources

Page 12: The ProteomeXchange Consortium: 2017 update

On-going

On-going

Page 13: The ProteomeXchange Consortium: 2017 update

Overview

• Introduction

• Some usage statistics

• New prospective member: iProX

• Handling of reprocessed datasets

Page 14: The ProteomeXchange Consortium: 2017 update

iProX: the integrated proteome resources in China

www.iprox.org

Cloud platform architecture with High Availability

• VIP
• Load balance server 1: nginx, keepalived, CentOS
• Load balance server 2: nginx, keepalived, CentOS
• Application server 1: SpringMVC, MyBatis, tomcat, java, CentOS
• Application server 2: SpringMVC, MyBatis, tomcat, java, CentOS
• Database server (Master): MySql, CentOS
• Database server (slave): MySql, CentOS
• Data storage server 1: nginx, keepalived, aspera, CentOS
• Data storage server 2: nginx, CentOS
• Data storage server 3: nginx, keepalived, aspera, CentOS

iProX Team

• Team Leader: Prof. Yunping Zhu
• Curator: Chunyuan Yang (PhD, Medical Genetics), Xue Wang (MSc, Bioinformatics)
• Bioinformatician: Jie Ma (PhD, Biochem & Molecular Biology), Cheng Chang (PhD, Biochem & Molecular Biology)
• Software Development: Tao Chen (PhD, Computer Science & Tech), Mansheng Li (PhD, Bioinformatics)
• System Admin.: Dongsheng Li (Bachelor, Computer Tech)

• User-friendly web-based system
• Standardized metadata collection
• Complete and partial data submission
• Different access levels for datasets
• Aspera-based data upload/download
• XML file for data sharing
• RESTful Web Service
• Cloud platform architecture and multiple sites deployment

Page 15: The ProteomeXchange Consortium: 2017 update

Infrastructure of iProX

[Map labels: Beijing, Shanghai, Hunan]

• BPRC & NCPSB (Beijing): Main location of deployment and the only submission site
• Three offsite data backups:
  • CNIC (being deployed; Beijing, north China)
  • SCBIT (Shanghai, east China)
  • NSCC (Hunan, south China)
• All four sites will provide download service at the same time, coordinated by the load balancer.
• By the end of August 2017, 308 datasets had been submitted, totalling 47.68 TB.
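
The slide says a load balancer coordinates downloads across the four sites but does not state the balancing policy. Purely as an illustration, the sketch below assumes simple round-robin rotation that skips any site currently marked unavailable. The site labels are taken from the slide; `make_picker` and `is_up` are hypothetical names, not part of iProX.

```python
from itertools import cycle

# Site labels from the slide; the order is arbitrary.
SITES = [
    "BPRC & NCPSB (Beijing)",
    "CNIC (Beijing)",
    "SCBIT (Shanghai)",
    "NSCC (Hunan)",
]


def make_picker(sites, is_up):
    """Return a function that picks the next available site.

    ASSUMPTION: round-robin with skipping of unavailable sites;
    the real iProX policy is not described on the slide.
    """
    ring = cycle(sites)

    def pick():
        # Try each site at most once per call before giving up.
        for _ in range(len(sites)):
            site = next(ring)
            if is_up(site):
                return site
        raise RuntimeError("no download site available")

    return pick


# Example: treat the site still being deployed as unavailable.
unavailable = {"CNIC (Beijing)"}
pick = make_picker(SITES, lambda site: site not in unavailable)
for _ in range(4):
    print(pick())
```

Passing availability in as a callable keeps the rotation logic separate from however site health would actually be checked.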

Page 16: The ProteomeXchange Consortium: 2017 update

An observer of the ProteomeXchange Consortium: iProX

• Proteome data sharing platform in China
• Focusing on:
  • Collection and sharing of proteome experiment raw data
  • Standardized metadata of proteome experiments
  • Visualization of proteome datasets
• Providing:
  • A user-friendly data submission pipeline
  • Structured management of datasets
  • An effective user authority system
  • Standardized metadata collection
  • Powerful computing, storage, and network resources to support the pipeline
  • Remote data backup and synchronous update

www.iprox.org

Page 17: The ProteomeXchange Consortium: 2017 update

Overview

• Introduction

• Some usage statistics

• New prospective member: iProX

• Handling of reprocessed datasets

Page 18: The ProteomeXchange Consortium: 2017 update

Ongoing work

• Reuse of public proteomics data is increasing.

• We are currently working on guidelines for handling reprocessed datasets (they get an RPXD identifier).

• Initial pilot implementation in MassIVE.
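
To make the identifier distinction concrete, here is a small sketch that separates original from reprocessed accessions by prefix. It assumes that identifiers consist of "PXD" or "RPXD" followed by six digits; the slide only mentions the RPXD prefix, so the digit count and the function name `classify` are assumptions.

```python
import re

# ASSUMPTION: accession = "PXD" or "RPXD" followed by exactly six digits.
PX_ID = re.compile(r"^(R?PXD)(\d{6})$")


def classify(accession):
    """Label an accession as 'original', 'reprocessed', or 'unknown'."""
    match = PX_ID.match(accession)
    if match is None:
        return "unknown"
    return "reprocessed" if match.group(1) == "RPXD" else "original"


# Placeholder accessions used only to exercise the function.
for acc in ["PXD000001", "RPXD000001", "MSV000079843"]:
    print(acc, "->", classify(acc))
```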

Page 19: The ProteomeXchange Consortium: 2017 update

Datasets evolve with reanalysis

http://massive.ucsd.edu

Page 20: The ProteomeXchange Consortium: 2017 update

Reanalysis identifiers

• Online browsing
• Provenance records
• Own identifiers
• Own metadata
• Citable

http://massive.ucsd.edu

Page 21: The ProteomeXchange Consortium: 2017 update

Searching large-scale reanalyses

http://massive.ucsd.edu

Available now

321M PSMs

6.1M peptides

10.7M variants

14k searches

>31TB human data

Page 22: The ProteomeXchange Consortium: 2017 update

Acknowledgements: People

Attila Csordas

Tobias Ternent

Gerhard Mayer (de.NBI)

Yasset Perez-Riverol

Manuel Bernal-Llinares

Andrew Jarnuczak

Mathias Walzer

Former team members, especially:

Rui Wang

Florian Reisinger

Jose A. Dianes

Henning Hermjakob

Acknowledgements: All ProteomeXchange partners

All data submitters !!!

Eric Deutsch

Zhi Sun

David Campbell

Nuno Bandeira

Mingxun Wang

Jeremy Carver

Yasushi Ishihama

Shujiro Okuda

Shin Kawano

Follow new datasets @proteomexchange

