
Kulakova sbb2014

Date post: 16-Jul-2015
Category:
Upload: ekkul
COMPUTER DATA ANALYSIS OF GENOME SEQUENCING BY TECHNOLOGY ChIP-seq AND Hi-C

Adviser: Yuri Orlov, ICG SB RAS

Author: Ekaterina Kulakova, bachelor's student

Relevance

Automated systems make it possible to decode DNA sequences up to whole genomes. Complete genome sequencing leads to avalanche-like growth in sequence information (megabytes to gigabytes of data).

Methods based on chromatin immunoprecipitation (ChIP-seq, ChIA-PET) provide qualitatively new data.

New tasks arise in computational genomics, such as the analysis of spatial, non-linear chromosome structures.

*ChIP-seq = Chromatin ImmunoPrecipitation sequencing

ChIA-PET = Chromatin Interaction Analysis by Paired-End Tag sequencing

Aim and scientific novelty

The aim of this work is to study chromosomal contacts in the cell nucleus using computer programs, statistical data on genes and chromosomal domains, and analysis of experimental ChIP-seq and Hi-C data.

Integration of modern genome-wide ChIP-seq and Hi-C data, which became available only in the last two or three years.

Use of a positional accuracy parameter on the chromosome when analyzing the data.

Compilation of a list of genes located at the boundaries of topological domains on the chromosome.

Hi-C and ChIA-PET methods*

*Hi-C = high-throughput chromosome conformation capture

Comprehensive Mapping of Long-Range Interactions Reveals Folding Principles of the Human Genome. Science, 2009

Figure: arrangement of chromosomes in the cell nucleus (reconstruction from Hi-C data)

Figure: scheme of local chromosomal domains ("tangle" contacts), with separate loops and a "tangle" labeled (Dixon et al., 2012)

Topological arrangement of chromosomal domains and its mapping onto the genome

Figure: scheme of the arrangement of genes on a chromosome

Genomic data: genes, ChIP-seq peaks, ChIA-PET contact regions

Figure panels: genes; plot of ChIA-PET chromosomal contacts; chromosomal domain; peaks of ChIP-seq profiles

File formats and their presentation

BED file example:

track name=ER_E2 description=ER_E2
chr1 557112 558114
chr1 559459 560286
chr1 998864 999397
chr1 999399 999604
chr1 1004343 1005146
chr1 1070346 1071080
chr1 1305474 1306502
chr1 1358287 1358744
chr1 1776987 1777750
chr1 1820476 1821168
chr1 1922754 1923628
chr1 2131962 2132747
chr1 2325805 2326447
chr1 2368996 2369977
chr1 3119829 3120541
chr1 3244610 3245121

The size of one file with a genomic profile ranges from 100 MB to 2-3 GB.
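A BED example like the one on this slide can be read with a few lines of code. The following is a minimal Python sketch, not the author's Java program. The names `Interval` and `parse_bed` are hypothetical, and the sketch assumes whitespace-separated fields with the track header on its own line.

```python
from typing import Iterable, List, NamedTuple


class Interval(NamedTuple):
    """One BED record: chromosome name plus start and end coordinates."""
    chrom: str
    start: int
    end: int


def parse_bed(lines: Iterable[str]) -> List[Interval]:
    """Collect (chrom, start, end) from BED-style lines.

    Assumption: blank lines, comments, and track/browser header lines
    carry no interval data and can be skipped.
    """
    intervals = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", ">track", "browser")):
            continue
        fields = line.split()
        intervals.append(Interval(fields[0], int(fields[1]), int(fields[2])))
    return intervals


example = """track name=ER_E2 description=ER_E2
chr1 557112 558114
chr1 559459 560286
chr1 998864 999397
"""
peaks = parse_bed(example.splitlines())
print(len(peaks), peaks[0])
```

Because `parse_bed` accepts any iterable of lines, an open file handle can be passed directly, so a file of several gigabytes is read line by line rather than loaded as one string.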

RefSeq annotation was taken from the UCSC Genome Browser: http://genome.ucsc.edu/cgi-bin/hgTables

Data on domains in mouse cells were obtained in the laboratory of O.L. Serov (ICG SB RAS) (Fib_domains, Sp_domains).

Calculation of the position of genes and domain boundaries

A1 - left coordinate of the gene, B1 - right coordinate of the gene.

A2 - left coordinate of the domain, B2 - right coordinate of the domain.

E - accuracy, user-defined.

If (|A1 - A2| <= E) and (B1 < A2 + (B2 - A2)/2), we assume that the gene lies close to the left boundary of the domain. A similar condition applies to the right boundary.

Figure: diagram showing the accuracy E, the domain (A2, B2), and the gene (A1, B1)
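The boundary condition on this slide can be written as a short function. This is a Python sketch under stated assumptions, not the author's Java code. The function names are hypothetical, and the right-boundary rule is an assumed mirror image of the left one, since the slide says only that the conditions are similar.

```python
def near_left_boundary(a1, b1, a2, b2, e):
    """Left-boundary test from the slide.

    a1, b1: left and right coordinates of the gene
    a2, b2: left and right coordinates of the domain
    e:      user-defined accuracy
    """
    midpoint = a2 + (b2 - a2) / 2
    return abs(a1 - a2) <= e and b1 < midpoint


def near_right_boundary(a1, b1, a2, b2, e):
    """Right-boundary test.

    Assumption: symmetric to the left-boundary rule; the source does not
    spell this condition out.
    """
    midpoint = a2 + (b2 - a2) / 2
    return abs(b1 - b2) <= e and a1 > midpoint


# Illustrative coordinates only (not taken from the experimental data).
print(near_left_boundary(1050, 1500, 1000, 5000, 100))   # gene starts 50 bp from the domain start
print(near_right_boundary(4500, 4950, 1000, 5000, 100))  # gene ends 50 bp from the domain end
print(near_left_boundary(2500, 3000, 1000, 5000, 100))   # gene in the middle of the domain
```

The midpoint term keeps a gene from being counted at the left boundary when it extends past the center of the domain.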

Figure: example of the location of chromosomal domains and genes on mouse chromosome 10

Figure: the linear arrangement of genes in a domain

Table of gene location types in chromosomal domains

Other - other genes

Inside - genes that lie within the domains

onBorder - genes lying on the domain boundaries

Analysis of the location of a gene set relative to the domains in different cell types

The user specifies a list of genes. It is also possible to analyze all the genes in the genome (20,000 genes).

Cell types: mouse embryonic stem cells (fibroblasts, Fib) and sperm (Sp). Hi-C experiment, ICG SB RAS.

Sp (densely packed structure)

92.5% of genes within domains

1.4% on border

6.1% other

Fib (open chromatin)

72.6% of genes within domains

3.2% on border

24% other
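Percentages like these can be obtained by assigning each gene to one of the three location types and counting. The Python sketch below is an illustration under assumptions, not the author's program. `classify_gene` and `location_percentages` are hypothetical names. "Inside" is assumed to mean the gene lies fully within a domain, and the right-boundary rule is assumed to mirror the left-boundary rule.

```python
from collections import Counter


def classify_gene(gene, domains, e):
    """Return 'onBorder', 'Inside', or 'Other' for one gene.

    gene and each domain are (left, right) coordinate pairs on the same
    chromosome; e is the user-defined accuracy.
    """
    a1, b1 = gene
    inside = False
    for a2, b2 in domains:
        mid = a2 + (b2 - a2) / 2
        near_left = abs(a1 - a2) <= e and b1 < mid
        # Assumption: mirror image of the left-boundary rule.
        near_right = abs(b1 - b2) <= e and a1 > mid
        if near_left or near_right:
            return "onBorder"
        # Assumption: "Inside" means the gene is fully contained.
        if a2 <= a1 and b1 <= b2:
            inside = True
    return "Inside" if inside else "Other"


def location_percentages(genes, domains, e):
    """Percentage of genes in each location type."""
    labels = ("Inside", "onBorder", "Other")
    if not genes:
        return {label: 0.0 for label in labels}
    counts = Counter(classify_gene(g, domains, e) for g in genes)
    return {label: 100.0 * counts[label] / len(genes) for label in labels}


# Illustrative coordinates only (not the Fib or Sp data).
domains = [(1000, 5000), (6000, 9000)]
genes = [(1050, 1500), (2500, 3000), (5200, 5800), (8500, 8990)]
result = location_percentages(genes, domains, e=100)
print(result)
```

This version scans every domain for every gene, which is simple but quadratic. For a whole genome, sorting the domains per chromosome and using binary search would be a natural refinement.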

Experimental data. Gene Ontology categories

Genes lying on the domain boundaries were taken for analysis.

The results were sorted by the number of genes sharing a common biological process category.

Online resource used: DAVID, http://david.abcc.ncifcrf.gov/

Analysis of the co-expression of genes lying on the borders of spatial domains

Genes located on the domain boundaries were taken for analysis.

Online resource used: STRING, http://string-db.org/

The main result: graphs of gene networks with varying degrees of connectivity for the two cell types

Fib

698 - total number of genes on the domain boundaries

88 - genes involved in connections

160 connected pairs

12% of the total genes

Sp

314 - total number of genes on the domain boundaries

13 - genes involved in connections

10 connected pairs

4% of the total genes
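The summary counts on this slide (genes involved in connections, number of pairs, share of all boundary genes) can be derived from a list of interacting gene pairs. The Python sketch below assumes the pairs have already been exported from STRING as two-column records. The function name and the gene labels are hypothetical.

```python
def connection_summary(pairs, total_genes):
    """Summarize a gene network given as an iterable of (gene_a, gene_b).

    Assumption: pairs are undirected, so (a, b) and (b, a) count once.
    """
    unique_pairs = set()
    connected = set()
    for a, b in pairs:
        unique_pairs.add(frozenset((a, b)))
        connected.update((a, b))
    return {
        "connected_genes": len(connected),
        "pairs": len(unique_pairs),
        "percent": 100.0 * len(connected) / total_genes,
    }


# Toy network with placeholder gene labels (not real data).
toy_pairs = [("G1", "G2"), ("G2", "G3"), ("G2", "G1")]
summary = connection_summary(toy_pairs, total_genes=10)
print(summary)

# The same ratio applied to the counts reported on the slide.
print(round(100 * 88 / 698, 1), round(100 * 13 / 314, 1))
```

Applied to the reported counts, the ratio gives about 12.6% for Fib and about 4.1% for Sp, in line with the 12% and 4% shown.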

Conclusion

A Java program was implemented.

The program was applied to experimental data (ICG SB RAS and databases on chromosome contacts).

The location of a gene set in chromosomal domains was analyzed (with a computer simulation as control).

Next Steps

Define domains that include pluripotency genes in the mouse genome (Dixon et al., 2012).

Make the developed project compatible with other programs designed at ICG SB RAS for microarray data, written in Java and C/C++.

Integrate the program with gene expression data from the BioGPS microarray database for the human genome.

Thank you for your attention!

Publications (abstracts)

Safronova N.S., Kulakova E.V., Orlov Yu.L. (2013) Applications of text complexity measures to genome sequences analysis. // Proceedings of GIW-2013, National University of Singapore, 16-18 Dec 2013. P. 42.

Medvedeva I.V., Vishnevsky O.V., Kulakova E.V., Spitsina A.M., Afonnikov D.A., Kochetov A.V., Orlov Yu.L. (2014) Genomic organization and context characteristics of genes with elevated expression in brain cells // XVI All-Russian Scientific and Technical Conference "Neuroinformatics-2014": Collection of Scientific Papers. Moscow: NRNU MEPhI. Part 2, pp. 32-42.

Kulakova E.V., Bryzgalov L.O., Orlov Y.L., Li G., Ruan Y. Computer analysis of chromosome contacts revealed by sequencing // BGRS\SB-2014 Conference (Bioinformatics of Genome Regulation and Structure\System Biology).

Kulakova E.V., Podkolodnaya O.A., Serov O.L., Orlov Y.L. Computer data analysis of genome sequencing by technology ChIP-seq and Hi-C. // BGRS\SB-2014 Conference (Bioinformatics of Genome Regulation and Structure\System Biology). P. 90.

Kulakova E.V. Computer data analysis of genome sequencing by technology ChIP-seq and Hi-C. // MNSK-2014 Conference (International Scientific Student Conference). P. 207.

Spitsina A., Kulakova E.V., Safronova N., Orlova N.G. Statistical analysis of gene expression data by rank correlation coefficients. // BGRS\SB-2014 Conference (Bioinformatics of Genome Regulation and Structure\System Biology). P. 91.

