Worldwide Protein Data Bank

www.wwpdb.org

wwPDB Common D&A Project

November 24, 2009

Steering Committee

Project Update

Common D&A Project January 2010 Deliverables

Update report

D&A Team Charge for end of January 2010: Deliver production functionality that will provide a significant impact on the annotation workflow.

Agenda:

1. Deliverables

2. Accomplishments

3. What’s keeping us/you up at night

4. Timeline overview

Functional Deliverables

Implement resolution of discrepancies between the chemical sequence and the model coordinate sequence, with integration using the Master Format.

Provide an annotator graphical interface to resolve discrepancies.

Implement the capability to repeat an incremental process step (GO BACK) under conditions such as
– Replacement coordinates packaged in mmCIF or PDB formats
– Replacement coordinates with updated sequence
– Replacement chemical sequence

Integrate these new functionalities into the existing workflows.
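
One way to picture the GO BACK capability is a pipeline that stores the output of each step, so that replacement data invalidates only the step being repeated and the steps after it. The sketch below is illustrative only: the `Pipeline` class, the step names and the data keys are hypothetical and are not taken from the D&A design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[dict], dict]]


@dataclass
class Pipeline:
    """Hypothetical incremental pipeline that can repeat from a given step."""
    steps: List[Step]
    results: Dict[str, dict] = field(default_factory=dict)  # step name -> output

    def run(self, data: dict, start: int = 0) -> dict:
        # Run the steps from index `start` onward, storing each step's output.
        for name, func in self.steps[start:]:
            data = func(dict(data))
            self.results[name] = data
        return data

    def go_back(self, step_name: str, replacement: dict) -> dict:
        # Discard the output of the repeated step and of every later step.
        names = [name for name, _ in self.steps]
        index = names.index(step_name)
        for name in names[index:]:
            self.results.pop(name, None)
        # The repeated step starts from the previous step's output (if any),
        # updated with the replacement data (new coordinates or sequence).
        base = dict(self.results[names[index - 1]]) if index > 0 else {}
        base.update(replacement)
        return self.run(base, start=index)


def load_sequence(d: dict) -> dict:
    d["length"] = len(d["sequence"])
    return d


def align(d: dict) -> dict:
    d["aligned"] = d["sequence"] == d["coordinate_sequence"]
    return d


pipeline = Pipeline(steps=[("load", load_sequence), ("align", align)])
first = pipeline.run({"sequence": "MKT", "coordinate_sequence": "MKV"})
# A replacement chemical sequence arrives: repeat from the "load" step.
second = pipeline.go_back("load", {"sequence": "MKV", "coordinate_sequence": "MKV"})
```

In this toy run the first pass reports a mismatch and the repeated pass, fed the replacement sequence, reports agreement.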

Deliverable Details

Finalization of Physical Data Exchange
Annotator graphical interface for sequence functionality
Master Format
Extended API
Tracking DB support
Extended Work Flow Engine (WFE)
Work Flow Manager (WFM)
Work Flow Manager User Interface (WFM UI)
Integration of this “module” of new functionalities into the existing workflows

Physical Data Exchange

All sites have acquired NetApp hardware for this project.

The version of NetApp software compatible with all sites has been determined.

A simplified secure protocol for NetApp communication has been found which avoids the need for extra networking hardware.

When the release candidate for the NetApp operating system is finalized as general release, in December, all sites will be on the same page for data exchange.

Process Overview

With GO BACK functionality

Deliverable: Annotator Interface

A graphical interface for resolution of structural features
Requirements for display and editing by Annotation staff, including 3D visualization
Resource allocation: RCSB
Technical design: JavaScript/AJAX + CSS
User prototype review
Stress tested prototype with very large sequences
User testing functional prototype (begins Dec 15)
Integration with current systems using Master Format (Jan 15)
In use by annotators by Jan 28
Integrate with new system (WFE, WFM, API) March

Design Convergence – Master Format, API, WFM, WFE, UI

Distributed development on a complex project is challenging, but we are managing

Reached consensus on critical project technologies –
– Master format & workflow schema
– Project identifiers
– Python implementation
– Division of effort among programming layers
– Passing communication and control between computational and interactive workflows
– Requirements and technology platform for sequence editor + 3D viewer

Deliverable update: MASTER FORMAT

A single data dictionary for the project based on the PDB Exchange Dictionary (PDBx). (John)
– PDBx extended for Common D&A project (deposition data set identifier, WF class ID, WF instance ID, Site ID, Version ID)

PDBx (mmCIF syntax) data file format will be used as a working format for PDB annotation. (Zukang)
– Translation between RCSB and PDBx tested with Maxit
– Conversion tool for PDB to PDBx completed
– PDBx mapping CIF to PDB within Maxit – ready for testing
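
As a rough illustration of how the five project identifiers could travel with a working file, the sketch below holds them in a small record and writes them out as mmCIF key-value pairs. The category name `example_da_identifiers`, the item names and the sample values are placeholders invented for this example; the slides do not give the actual PDBx extension names.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProjectIdentifiers:
    """The five identifiers listed on the slide; field names are illustrative."""
    dep_data_set_id: str
    wf_class_id: str
    wf_instance_id: str
    site_id: str
    version_id: str

    def to_mmcif(self, category: str = "example_da_identifiers") -> str:
        # mmCIF key-value form: one "_category.item value" pair per line.
        # The category name is a placeholder, not a real PDBx category.
        lines = ["data_example", "#"]
        for f in fields(self):
            lines.append(f"_{category}.{f.name} {getattr(self, f.name)}")
        lines.append("#")
        return "\n".join(lines)


# Sample values are made up and contain no whitespace, so no quoting is needed.
ids = ProjectIdentifiers("D_0000000001", "annotate", "W_001", "RCSB", "1")
text = ids.to_mmcif()
```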

Data and Application API Design

Unified Python language implementation

Provides all access to data and applications for the workflow manager and workflow engine

Subcomponents of the API provide access to:
– Data objects and data values
– Applications and tools
– Tracking and status information
– Site level configuration information
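
The four subcomponents listed above could be grouped behind a single entry point, so that the workflow manager and workflow engine never touch data or tools directly. The following is a minimal sketch under that assumption; every class and method name here is hypothetical and does not come from the project's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class DataAccess:
    """Data objects and data values."""
    values: Dict[str, str] = field(default_factory=dict)

    def get(self, key: str) -> str:
        return self.values[key]

    def set(self, key: str, value: str) -> None:
        self.values[key] = value


@dataclass
class ApplicationAccess:
    """Applications and tools, registered by name."""
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, name: str, argument: str) -> str:
        return self.tools[name](argument)


@dataclass
class TrackingAccess:
    """Tracking and status information, keyed by data set identifier."""
    status: Dict[str, str] = field(default_factory=dict)

    def update(self, dataset_id: str, state: str) -> None:
        self.status[dataset_id] = state

    def current(self, dataset_id: str) -> str:
        return self.status[dataset_id]


@dataclass
class SiteConfig:
    """Site level configuration information."""
    settings: Dict[str, str] = field(default_factory=dict)


@dataclass
class Api:
    """Single entry point used by the workflow layers in this sketch."""
    data: DataAccess = field(default_factory=DataAccess)
    applications: ApplicationAccess = field(default_factory=ApplicationAccess)
    tracking: TrackingAccess = field(default_factory=TrackingAccess)
    site: SiteConfig = field(default_factory=SiteConfig)


api = Api(site=SiteConfig({"site_id": "RCSB"}))
api.data.set("sequence", "mkt")
api.applications.tools["upper"] = str.upper  # stand-in for a real tool
result = api.applications.run("upper", api.data.get("sequence"))
api.tracking.update("D_1", "sequence-checked")
```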

Deliverable update: Extended API

Site Configuration

API Configuration: Division of processing responsibility between the workflow engine and the API decided.
– Workflow Engine/Manager (12/15, Luana)
– Add sequence data methods (11/25, Vladimir, John)

Solution for identifying and finding things
– Archival data files
– Transient files required by workflows for data processing
– Versioning of data files and key data values within files
– Progress and tracking workflows

MySQL support of tracking (12/4, Li)

Application integration with API and WFE (12/4, Vladimir)
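
To make the idea of identifying, finding and versioning data files concrete, here is a small sketch built on a file naming convention. The convention `<dataset>_<content>_V<n>.cif` is an assumption made for this example only; the slides do not describe the scheme the project adopted.

```python
import re
from typing import Iterable


def file_name(dataset_id: str, content: str, version: int) -> str:
    # Hypothetical convention: data set, content type, then a version counter.
    return f"{dataset_id}_{content}_V{version}.cif"


def latest_version(names: Iterable[str], dataset_id: str, content: str) -> int:
    # Return the highest version found for this data set and content type,
    # or 0 if no matching file exists yet.
    pattern = re.compile(
        rf"^{re.escape(dataset_id)}_{re.escape(content)}_V(\d+)\.cif$"
    )
    highest = 0
    for name in names:
        match = pattern.match(name)
        if match:
            highest = max(highest, int(match.group(1)))
    return highest


def next_file_name(names: Iterable[str], dataset_id: str, content: str) -> str:
    # A new version never overwrites an older one; it gets the next number.
    return file_name(dataset_id, content, latest_version(names, dataset_id, content) + 1)


existing = ["D_1_model_V1.cif", "D_1_model_V2.cif", "D_1_seq_V1.cif"]
new_name = next_file_name(existing, "D_1", "model")
```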

Deliverable update: WFE

Final design – core API communication protocols
Internal object representation
Final design – XML schema created (description of WF)
WFE can process revised WF definitions
Test suites
Engine development (12/23, Tom)
Integration with API, data model, WFM (12/23, Tom)
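
The slides do not show the XML schema itself, so the following sketch only illustrates the general idea of an engine reading a workflow description from XML and deriving the order of tasks. The element and attribute names (`workflow`, `task`, `name`, `next`) are invented for this example.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical workflow description; not the project's actual schema.
WORKFLOW_XML = """
<workflow id="sequence-check">
  <task name="load" next="align"/>
  <task name="align" next="report"/>
  <task name="report"/>
</workflow>
""".strip()


def task_order(xml_text: str, start: str) -> List[str]:
    # Build a name -> next-task map, then follow it from the start task.
    root = ET.fromstring(xml_text)
    next_of = {task.get("name"): task.get("next") for task in root.findall("task")}
    order: List[str] = []
    current = start
    while current is not None:
        if current in order:
            raise ValueError(f"cycle detected at task {current!r}")
        if current not in next_of:
            raise KeyError(f"unknown task {current!r}")
        order.append(current)
        current = next_of[current]  # None when the task has no successor
    return order


order = task_order(WORKFLOW_XML, "load")
```

Because the engine reads the task graph from the file rather than from code, a revised definition changes the order without changing the engine.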

Deliverable update: WFM Design Functional Architectural design

Will present progress and tracking information
Will start/stop and restart the workflow engine in executing data processing tasks
Will work in a fully distributed web-based mode
Will provide a launch point for tasks requiring interactive or graphical interactions.

Two modes defined –
• Immediate mode – all processing occurs in a single session (simple case).
• Deferred mode – requests for input are registered with the workflow manager for later processing by annotator
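
The difference between the two modes can be sketched as follows: in immediate mode the request for input is answered within the same session, while in deferred mode it is queued with the manager until an annotator handles it. The `WorkflowManager` and `InputRequest` classes below are hypothetical and only illustrate that distinction.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional


@dataclass
class InputRequest:
    dataset_id: str
    question: str


@dataclass
class WorkflowManager:
    """Hypothetical manager holding requests that await an annotator."""
    pending: Deque[InputRequest] = field(default_factory=deque)

    def request_input(
        self,
        request: InputRequest,
        answer_now: Optional[Callable[[InputRequest], str]] = None,
    ) -> Optional[str]:
        if answer_now is not None:
            # Immediate mode: the answer is obtained in the current session.
            return answer_now(request)
        # Deferred mode: register the request for later processing.
        self.pending.append(request)
        return None

    def process_pending(self, answer: Callable[[InputRequest], str]) -> Dict[str, str]:
        # Called later, when an annotator works through the queue.
        answers: Dict[str, str] = {}
        while self.pending:
            request = self.pending.popleft()
            answers[request.dataset_id] = answer(request)
        return answers


manager = WorkflowManager()
immediate = manager.request_input(
    InputRequest("D_1", "Accept alignment?"), answer_now=lambda r: "yes"
)
deferred = manager.request_input(InputRequest("D_2", "Accept alignment?"))
later = manager.process_pending(lambda r: "no")
```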

Deliverable update: WFM UI

WFM – Annotator UI (Luana)
– Requirements (12/3, annotator team)
– Design (12/10)
– Development (1/15)

WFM Development (1/21, Luana)

Integration with WFE, API (2/4, Vladimir, Luana)

User Testing (2/28, all)

Deliverable: GO BACK FUNCTIONALITY

Master Format
Workflow execution environment (WFE, WFM)
Session management and tracking infrastructure

Things that have kept us up at night

These are cornerstone deliverables requiring intense study and design consideration, beyond the proof of concept.
– Organization of data, communication protocols, etc.
– Clear consensus on design features has required an evolution of understanding, which in turn required hands-on work

Ramp up of skill sets: Python, mmCIF (PDBe)

EBI External services: web-service set up

Site-specific integration challenges

Resource issues

Good News (from your local PM)

Team is VERY FUNCTIONAL
– A lot has been accomplished despite distributed team members and multi-tasking resources

Consensus on difficult issues, starting from considerable philosophical distances, has been achieved!
– No bloodshed to date – all limbs intact

Team is still highly motivated to succeed with this project!

Timeline Summary

Functional Interface – integrated in existing systems January 15, 2010
– In use by annotators by January 28, 2010

Full Integration of WFM UI with WFE, WFM and API February 4, 2010

Testing completed by February 28, 2010

PDBe integration

There are significant changes to the PDBe annotation
– PDBe data model -> D & A data model – import
– Load D & A data model with status and domain data
– Start web services/connect to web resources

External services at EBI
– Run workflows

Implement programs at PDBe
– Export data from D & A data model to PDBe data model
– Requires Glen, who will be away for December, to integrate path

