Date post: 26-Mar-2015

www.pazar.info/

ACCESSING AND EXTRACTING CHIP-SEQ AND TF-GENE INTERACTIONS FROM PAZAR

Jonathan Lim

With Introduction by Wyeth Wasserman


Welcome

• If you encounter any technical difficulties during the webinar, type a report using the chat option

• Slide presentation (~25 min)

• Questions will be compiled as they are submitted and answered during the final Q&A/discussion period

• During the discussion session, audience members will be able to speak


Topics

• PAZAR Overview

• Data Retrieval Through Web Interface

• Data Files and Formats

• PAZAR Application Programming Interface (API)

• Q&A


Topics will increase in complexity as the webinar progresses

Data file formats will be presented in order of complexity, beginning with the simplest

The PAZAR API is the most technical topic presented today and is geared toward those with programming knowledge


www.pazar.info

• Software framework for the construction and maintenance of regulatory sequence annotation data

• Allows multiple boutique databases to function independently within a larger system

• Public repository for regulatory data

• Each group manages the deposit and distribution of its own data

• Envisioned as a tool for capturing deep experimental annotation

• Species, cell line, treatment


Browsing Data on PAZAR

Link to project details

Project Information


Gene View

Link to sequence details

Sequence Information


TF View


Data File Formats


Data Files available for Download


TF – Target Gene Format

• Provides a listing of TFs and the genes that they putatively regulate

• In some cases, the listed gene is simply the gene closest to the TF binding site; this is especially true for ChIP-Seq regulatory sequences

• PubMed ID and analysis method are provided as interaction evidence when available

• Files automatically exported for all public projects

• Updated weekly


TF – Target Gene File Example

TF0001078 E2F4_HUMAN GS00121862 ENSG00000187634 1 860260 879955 Homo sapiens E2F4_Lee 21247883 PROTEIN BINDING ASSAY::CHROMATIN IMMUNOPRECIPITATION (CHIP)

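Fields, from left to right:

1. PAZAR TF ID (TF0001078)
2. TF Name (E2F4_HUMAN)
3. PAZAR Gene ID (GS00121862)
4. Ensembl Gene Accession (ENSG00000187634)
5. Chromosome (1)
6. Gene Start Coordinate (860260)
7. Gene End Coordinate (879955)
8. Species (Homo sapiens)
9. Project Name (E2F4_Lee)
10. Evidence PMID (21247883)
11. Analysis Method (PROTEIN BINDING ASSAY::CHROMATIN IMMUNOPRECIPITATION (CHIP))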
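Because each TF-target record sits on a single line, the file can be processed with a few lines of Perl. The sketch below is a minimal example under stated assumptions: columns are tab-separated (Species and Analysis Method contain spaces, so splitting on whitespace would not work) and appear in the order listed above. The column keys and the subroutines parse_tf_target_line and targets_by_tf are hypothetical names, not part of PAZAR. It groups Ensembl target genes by TF name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: tab-separated columns in the order shown on the slide.
# These key names are hypothetical labels chosen for this sketch.
my @TF_TARGET_FIELDS = qw(
    tf_id tf_name gene_id ensembl_gene chromosome
    gene_start gene_end species project pmid analysis_method
);

# Parse one line of a TF-target gene file into a hash reference.
sub parse_tf_target_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;    # -1 keeps trailing empty fields
    die "expected " . scalar(@TF_TARGET_FIELDS) . " columns, got " . scalar(@values) . "\n"
        unless @values == @TF_TARGET_FIELDS;
    my %record;
    @record{@TF_TARGET_FIELDS} = @values;
    return \%record;
}

# Read records from a filehandle and group Ensembl gene accessions by TF name.
sub targets_by_tf {
    my ($fh) = @_;
    my %targets;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|\z)/;    # skip blank lines and '#' comments (assumption)
        my $rec = parse_tf_target_line($line);
        push @{ $targets{ $rec->{tf_name} } }, $rec->{ensembl_gene};
    }
    return \%targets;
}

# Usage: perl tf_targets.pl downloaded_tf_target_file.txt
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    my $targets = targets_by_tf($fh);
    close $fh;
    for my $tf (sort keys %$targets) {
        printf "%s\t%d target gene(s)\n", $tf, scalar @{ $targets->{$tf} };
    }
}
```

Returning hash references keeps the column names in one place, so adapting the sketch to a different column order only means editing @TF_TARGET_FIELDS.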


ChIP-Seq Peak Format

• For users who are only interested in ChIP-Seq peak data

• Provides peak information in a simple delimited format that is easy to work with

• Files will be exported for public projects containing ChIP-Seq data and updated weekly


ChIP-Seq Peak File Example

chr1 915920 916350 916127 195.45 MAXHEIGHT ENSG00000187961 E2F4 Homo sapiens 21258399 Human Lymphoblastoid cells

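Fields, from left to right:

1. Chromosome (chr1)
2. Peak start coordinate (915920)
3. Peak end coordinate (916350)
4. Peak max coordinate (916127)
5. Score (195.45)
6. Score type (MAXHEIGHT)
7. TF ID (ENSG00000187961)
8. TF Name (E2F4)
9. Species (Homo sapiens)
10. PMID (21258399)
11. Cell or Tissue (Human Lymphoblastoid cells)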
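A similar sketch works for ChIP-Seq peak files. It assumes tab-separated columns in the order listed above; parse_peak_line, filter_peaks and the summit_offset value (peak max coordinate minus peak start coordinate) are hypothetical additions for illustration. It keeps only peaks at or above a chosen score, which is probably only meaningful when the peaks being compared share a score type such as MAXHEIGHT.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: tab-separated columns in the order shown on the slide.
# These key names are hypothetical labels chosen for this sketch.
my @PEAK_FIELDS = qw(
    chrom peak_start peak_end peak_max score score_type
    tf_id tf_name species pmid cell_or_tissue
);

# Parse one peak line into a hash reference.
sub parse_peak_line {
    my ($line) = @_;
    chomp $line;
    my @values = split /\t/, $line, -1;
    die "expected " . scalar(@PEAK_FIELDS) . " columns, got " . scalar(@values) . "\n"
        unless @values == @PEAK_FIELDS;
    my %peak;
    @peak{@PEAK_FIELDS} = @values;
    return \%peak;
}

# Keep peaks whose score is at least $min_score and record where the
# summit (peak max coordinate) lies relative to the peak start.
sub filter_peaks {
    my ($fh, $min_score) = @_;
    my @kept;
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|\z)/;    # skip blank lines and '#' comments (assumption)
        my $peak = parse_peak_line($line);
        next unless $peak->{score} >= $min_score;
        $peak->{summit_offset} = $peak->{peak_max} - $peak->{peak_start};
        push @kept, $peak;
    }
    return \@kept;
}

# Usage: perl peaks.pl downloaded_peak_file.txt 100
if (@ARGV == 2) {
    my ($file, $min_score) = @ARGV;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    for my $p (@{ filter_peaks($fh, $min_score) }) {
        print join("\t", @{$p}{qw(chrom peak_start peak_end tf_name score)}), "\n";
    }
    close $fh;
}
```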


PAZAR GFF Format

• The General Feature Format (GFF) describes genes and other features associated with DNA, RNA and protein sequences

• The PAZAR GFF format is intended to represent simple annotations

• One annotation record per line; each record annotates one sequence

• Less comprehensive than the XML files: it represents a subset of the total data, but may be easier for some people to work with

• Projects containing only artificial sequences (e.g. jaspar_core) follow a slightly different format. Refer to the GFF format documentation for details.

• Files automatically exported for all public projects

• Updated weekly


PAZAR GFF File Example

chr12 E2F4_Lee RS0293021 82752225 82752317 . + . sequence="CA…AT";db_seqinfo="ENSEMBL:60_37E";db_geneinfo="ENSEMBL:ENSG00000127720:C12ORF26 ";species="HOMO SAPIENS";

db_tfinfo="EnsEMBL_transcript:ENST00000394351:E2F4_HUMAN";analysis_name="ANALYSIS 1";analysis_comment="0";cell_type="HUMAN LYMPHOBLASTOID (GM06990) CELLS :HOMO SAPIENS";pmid="21258399";method="PROTEIN BINDING ASSAY::CHROMATIN IMMUNOPRECIPITATION (CHIP)";evidence="CURATED"

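Fields, from left to right:

1. Chromosome (chr12)
2. Project Name (E2F4_Lee)
3. PAZAR Feature ID (RS0293021)
4. Sequence start coordinate (82752225)
5. Sequence end coordinate (82752317)
6. Score (.)
7. Strand (+)
8. Frame (.)
9. Mandatory and optional attributes (key="value" pairs separated by semicolons)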
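For the GFF files, the first eight columns can be split directly, and the ninth column can be broken into its key="value" pairs. This Perl sketch assumes the columns are tab-separated, as is conventional for GFF, and that attribute values never contain a double quote; parse_pazar_gff_line is a hypothetical helper name, not part of PAZAR.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one PAZAR GFF record into a hash reference.
# Assumptions: nine tab-separated columns; column 9 holds key="value"; pairs
# and no value contains a double quote character.
sub parse_pazar_gff_line {
    my ($line) = @_;
    chomp $line;
    my ($seqid, $project, $feature_id, $start, $end, $score, $strand, $frame, $attr_text)
        = split /\t/, $line, 9;    # limit 9 keeps the attribute column intact
    die "expected 9 columns\n" unless defined $attr_text;

    my %attr;
    while ($attr_text =~ /(\w+)="([^"]*)"/g) {
        $attr{$1} = $2;
    }
    return {
        seqid      => $seqid,
        project    => $project,
        feature_id => $feature_id,
        start      => $start,
        end        => $end,
        score      => $score,     # "." when no score is given
        strand     => $strand,
        frame      => $frame,     # "." when no frame is given
        attributes => \%attr,
    };
}

# Usage: perl gff.pl downloaded_project_file.gff
if (@ARGV) {
    open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
    while (my $line = <$fh>) {
        next if $line =~ /^\s*(?:#|\z)/;    # skip blank lines and '#' header/comment lines
        my $rec   = parse_pazar_gff_line($line);
        my $attrs = $rec->{attributes};
        print join("\t", $rec->{seqid}, $rec->{start}, $rec->{end},
                   $attrs->{db_tfinfo} // '', $attrs->{pmid} // ''), "\n";
    }
    close $fh;
}
```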


PAZAR XML Format

• Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents

• A PAZAR XML schema is defined for capturing data and their relationships

• Comprehensive and flexible enough to capture many types of data

• Files automatically exported for all public projects

• Updated weekly


Sample PAZAR XML File

<reg_seq
    pazar_id="rs_0022"
    quality="TESTED"
    sequence="CGGGCTCTCCGACCCACGGGTCACTTTTGACAGCTGGCCTGAGTCCTGCCTGGTGGAAACCCCTCCTGGGAGGCTGGAGCCAGCACCAGGGCCCACGTGTGCTTCACCTTGAAGCCTGAGGACACAGACTCTCCGGCAATCACATAGCCCATGTTGAGGACGCTGCCTTCAATGGAGCACGTGATCATGGACGCCACGCCAGTGCCCATGAGGGTGAGGGTGAGCGTGCCTCTCTTGGTGATGATGTCCAG"
    tfbs_name="">
  ….
  <peak
      maxcoord="1857289"/>
</reg_seq>
<funct_tf
    funct_tf_name="E2F4_HUMAN"
    pazar_id="fu_001">
  <tf_unit
      pazar_id="tu_0001"
      tf_id="tf_001"/>
</funct_tf>
<interaction
    pazar_id="in_00063"
    quantitative="299.77"
    scale="MAXHEIGHT"/>


Extracting ChIP-Seq data from the XML requires an understanding of the schema and some parsing work
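As a rough illustration of that parsing work, the Perl sketch below pulls the attributes of every start tag with a given name (for example peak or interaction) using regular expressions. It is deliberately naive: it is not an XML parser, it ignores how elements relate to one another in the PAZAR schema, and it assumes double-quoted attribute values that contain no '>' characters. attrs_for_tag is a hypothetical name; for real work a CPAN parser such as XML::LibXML or XML::Twig is a safer choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Quick-and-dirty extraction of attributes from every <$tag ...> start tag.
# NOT a real XML parser: assumes double-quoted values without '>' inside
# and does not follow the parent/child relationships of the PAZAR schema.
sub attrs_for_tag {
    my ($xml, $tag) = @_;
    my @found;
    while ($xml =~ /<\Q$tag\E\b([^>]*)>/g) {    # \b stops <peak> matching <peaks>
        my $attr_text = $1;
        my %attr;
        while ($attr_text =~ /(\w+)\s*=\s*"([^"]*)"/g) {
            $attr{$1} = $2;
        }
        push @found, \%attr;
    }
    return @found;
}

# Usage: perl xml_peek.pl downloaded_project_file.xml
if (@ARGV) {
    my $xml = do {
        local $/;    # slurp the whole file
        open my $fh, '<', $ARGV[0] or die "Cannot open $ARGV[0]: $!\n";
        <$fh>;
    };
    for my $i (attrs_for_tag($xml, 'interaction')) {
        printf "%s\t%s\t%s\n",
            $i->{pazar_id} // '', $i->{quantitative} // '', $i->{scale} // '';
    }
}
```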


PAZAR API Overview

• The Application Programming Interface (API) facilitates programmatic retrieval of data contained in 'Published' or 'Open' projects, as well as in the user's own restricted projects.

• Provides a mechanism for automating bulk data retrieval in a customized fashion

• Provides methods that make it easier to work with data once it has been retrieved

• Object-oriented: data types within the system are mirrored as objects in code

• Uses the Perl programming language

• Uses the SOAP communication protocol for transferring data


SOAP


• Simple Object Access Protocol (SOAP) is a protocol for exchanging structured information between networked computers. It relies on Extensible Markup Language (XML) for its message format.

• Communication uses HTTP as the transport layer, so it can be used on any network that permits web browsing

• The client computer sends requests to the server, which performs functions to retrieve data from the database and returns the data to the client

• The code that performs these functions resides on the server, so the client only needs to send requests in order to receive data
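To make the XML message format concrete, the Perl sketch below builds a minimal SOAP 1.1 request envelope as a string. It is illustrative only: it is not byte-for-byte what SOAP::Lite sends, and the method name get_by_name and the argument element names arg0, arg1, … are hypothetical. In a real call the client would POST a document like this over HTTP to the server's proxy URL and receive an XML response envelope in return.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Escape the five XML special characters in text content ('&' must go first).
sub xml_escape {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    $s =~ s/'/&apos;/g;
    return $s;
}

# Build a minimal SOAP 1.1 envelope calling $method in namespace $ns.
# Hypothetical: positional arguments are wrapped in <arg0>, <arg1>, ...
sub soap_request {
    my ($ns, $method, @args) = @_;
    my $i = 0;
    my $params = join '', map { my $n = 'arg' . $i++; "<$n>" . xml_escape($_) . "</$n>" } @args;
    return <<"XML";
<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <m:$method xmlns:m="$ns">$params</m:$method>
  </soap:Body>
</soap:Envelope>
XML
}

# Print an example request (method name and argument are illustrative only).
print soap_request('http://www.pazar.info/pazar', 'get_by_name', 'Demo');
```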


Benefits of using SOAP

• Users do not have to worry about installing the API code on their computer

• Updating to newer API releases involves minimal effort

• Can be used across firewalls where only web browsing is permitted

• Transparent – users don't have to learn new syntax or change the way they code

• Language-independent, but not yet developed and tested with programming languages other than Perl in mind

• Data privacy can be managed by the PAZAR team since authentication is done on the server side


Data Privacy

• Access to data through the API is the same as through the website: a PAZAR username and password must be supplied to retrieve data from personal restricted projects.

• Authentication is performed on PAZAR server


Request parameters, with the resulting access to restricted and public data:

• Correct username/password, and user is a member of the specified project
  Restricted data: results from the specific restricted project being queried
  Public data: results from all public projects

• Incorrect username/password combination, invalid project name, or user not a member of the specified project
  Restricted data: none
  Public data: results from all public projects only

• Project status is open or published
  Public data: results from the specific public project being queried and all other public projects (authentication not required)
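The access rules above can be summarized as a small decision function. This Perl sketch is purely illustrative: the real checks run on the PAZAR server, and the subroutine visible_projects, its named arguments, and the 'restricted' status label are hypothetical. Each project is modeled as a hash with a name, a status and a list of member usernames.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical model of the PAZAR access rules; not the server's code.
# Named arguments: user, authenticated (password check passed),
# project (name being queried), projects (arrayref of project hashes).
sub visible_projects {
    my (%req) = @_;
    my @projects = @{ $req{projects} };

    # Open and published projects are always returned; no authentication needed.
    my @visible = grep { $_->{status} eq 'open' || $_->{status} eq 'published' } @projects;

    # The queried restricted project is added only if the password check passed
    # and the user is a member of that project.
    my ($requested) = grep { $_->{name} eq ($req{project} // '') } @projects;
    if (   $requested
        && $requested->{status} eq 'restricted'
        && $req{authenticated}
        && grep { $_ eq ($req{user} // '') } @{ $requested->{members} })
    {
        unshift @visible, $requested;
    }
    return map { $_->{name} } @visible;
}

my @projects = (
    { name => 'Demo',   status => 'published',  members => [] },
    { name => 'MyProj', status => 'restricted', members => ['alice'] },
);
print join(', ', visible_projects(user => 'alice', authenticated => 1,
                                  project => 'MyProj', projects => \@projects)), "\n";
```

Falling back to public results rather than failing on bad credentials mirrors the second case in the list above, where an incorrect login still returns all public projects.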


PAZAR API Classes

pazar class - handles authentication and contains general methods for retrieving data and creating instance objects of other classes

- A PAZAR object must always be created first; it is supplied to all methods in other classes.

pazar::project - handles project information

pazar::dbsource - handles information source data

pazar::gene - handles gene information

pazar::reg_seq - handles regulatory sequence information

pazar::tf - Transcription Factor meta information and general methods for retrieving TF-related information

pazar::tf::tfcomplex - handles Transcription Factor complex information

pazar::tf::subunit - handles Transcription Factor subunit information

pazar::tf::target - handles Transcription Factor target (regulatory sequence, artificial sequence or binding site matrix) information

pazar::transcript - handles transcript information

pazar::tsr - handles transcription start region information


www.pazar.info/apidocs

PAZAR API Documentation


API Setup

1. Install the Perl library SOAP::Lite v0.60a by Paul Kulchenko.

• Later versions of SOAP::Lite are maintained by a different author and are not compatible with the PAZAR API.

• It can be downloaded from the link in the PAZAR API user guide at http://www.pazar.info/apidocs/userguide.html

• It is also available for download from CPAN at http://search.cpan.org/~byrne/SOAP-Lite-0.60a

• SOAP::Lite installation should follow standard procedures

2. Include the following at the top of your script, before any code that makes use of the API:

use SOAP::Lite +autodispatch =>
    uri   => 'http://www.pazar.info/pazar',
    proxy => 'http://www.pazar.info/cgi-bin/API0.01/pazarserv.cgi';

• Any code that follows will automatically make use of the PAZAR API modules via SOAP; no additional modules need to be installed on the client side.

• To use a newer API release when one is available, replace API0.01 with the newer release number (e.g. API0.02)

• Older API releases will continue to be in service after newer releases have been made available


Sample Perl Code Using PAZAR API

#!/usr/bin/perl
use strict;
use warnings;

use SOAP::Lite +autodispatch =>
    uri   => 'http://www.pazar.info/pazar',
    proxy => 'http://www.pazar.info/cgi-bin/API0.01/pazarserv.cgi';

# change your_email@example.com and yourpass to the values for your own PAZAR account
my $pazar = pazar->new(-pazar_user => 'your_email@example.com', -pazar_pass => 'yourpass');

# retrieve the 'Demo' project; the PAZAR object is passed to methods of other classes
my $proj = pazar::project::get_by_name('Demo', $pazar);

# print basic project information
print $proj->status . "\n";
print $proj->id . "\n";
print $proj->project_name . "\n";
print $proj->description . "\n";

my $project_name = $proj->project_name;
my $project_num  = $proj->id;

# count the TF complexes in the project
my @funct_tfs = $pazar->get_all_complex_ids($project_num);
print "num tf complexes: " . scalar(@funct_tfs) . "\n";


Future PAZAR API Development

• API testing with other programming languages such as Java and Python

• Expansion of variety of classes and methods offered

• Further support for ChIP-Seq data handling

• Update and import of data


Recap

Browsing through current data online

• Web interface

Data Files Available for download

• TF-target gene list

• ChIP-Seq peak files

• GFF

• XML (all data)

Bulk retrieval of most current data in customized way through programmatic approach

• PAZAR API


Q&A

Please take a moment to type PAZAR-related questions/comments into the Chat box.

The questions will be answered shortly.
