DKFZ PhD Retreat Poster 20190703 - EMBL Blogs

Post on 27-Jul-2020


Single-cell transcriptome and chromatin accessibility data integration reveals cell-specific signatures

1 Division of Neuroblastoma Genomics, German Cancer Research Center, Heidelberg, Germany.

2 Health Data Science Unit, Medical Faculty Heidelberg and BioQuant.

* Correspondence: carl.herrmann@uni-heidelberg.de

Andrés Quintero1,2,°, Anne-Claire Kröger2 and Carl Herrmann2,*

The ability to integrate multiple layers of omics data plays an essential role in understanding the complex interplay of different molecular mechanisms that give rise to cellular diversity.

To address this challenge we implemented Integrative Iterative Non-negative Matrix Factorization (i2NMF), a computational method to dissect genomic signatures from multi-omics data sets.

i2NMF was implemented as an extension of the R package Bratwurst, available on GitHub: https://github.com/wurst-theke/bratwurst

We applied i2NMF to:

1. recover cell-specific signatures between different species

2. identify rare cell populations

i2NMF workflow:

Stage 1: signatures are shared across data sets.

Stage 2: variable number of signatures for every data set (for every initial matrix Xn).

The shared effect is recovered in the Hs matrix, and the exposures of the features explaining this effect are contained in the Ws matrices (Yang and Michailidis, 2016).

The explained variance of the decomposed model can be estimated for both stages by:

[Equation not recovered.]

This is useful to compare the performance between stages and the overall decomposition.
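A minimal sketch of how such an estimate could be computed, assuming the common definition of explained variance for a matrix factorization: one minus the ratio of the residual sum of squares to the total sum of squares of X. This definition and the function name `explained_variance` are assumptions for illustration; they are not taken from the poster or from the Bratwurst package.

```python
import numpy as np

def explained_variance(X, W, H):
    """Fraction of the squared signal in X that is reproduced by W @ H.

    Assumed definition (not taken from the poster):
        1 - ||X - W H||_F^2 / ||X||_F^2
    """
    X = np.asarray(X, dtype=float)
    residual = X - np.asarray(W, dtype=float) @ np.asarray(H, dtype=float)
    return 1.0 - np.sum(residual ** 2) / np.sum(X ** 2)

# Toy example: an exact rank-1 factorization explains all of X,
# while an all-zero W explains none of it.
W = np.array([[1.0], [2.0]])
H = np.array([[3.0, 4.0]])
X = W @ H
ev_exact = explained_variance(X, W, H)
ev_zero = explained_variance(X, np.zeros_like(W), H)
```

Under this definition the same function can be applied to the stage 1 factors, to the stage 2 factors on the residual matrix, or to the sum of both reconstructions, which makes the stages directly comparable.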

Common features should be shared across columns, e.g. gene or sample IDs.

i2NMF advantages

● The feature exposure matrices Ws and Wr are different, recovering unique signatures between stage 1 and 2.

● The number of inferred signatures in stage 2 can vary across matrices, allowing a better resolution of specific effects.

● All solvers were implemented in TensorFlow, allowing scalability between platforms.

1. Starting from two or more non-negative matrices $X_1, X_2, \dots, X_n$, i2NMF initially decomposes the shared effect across them using integrative NMF (iNMF), approximating every matrix by its own exposure matrix $W^{s}_{n}$ (the Ws matrix of data set n) and one shared matrix $H^{s}$ (the Hs matrix):

$$X_n \approx W^{s}_{n} \times H^{s}$$

2. On a second iteration, i2NMF decomposes the residual effect, which was not explained by the shared decomposition. The residual input matrix Xr is defined by:

$$X^{r}_{n} = X_n - W^{s}_{n} \times H^{s}$$

and decomposed into:

$$X^{r}_{n} \approx W^{r}_{n} \times H^{r}_{n}$$
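The two stages can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the Bratwurst implementation: it uses plain multiplicative-update NMF on the row-stacked matrices for stage 1 (which yields one shared H but omits any iNMF-specific penalty terms), it sets negative residual entries to zero before stage 2 (the poster does not say how they are handled), and it runs on NumPy rather than TensorFlow. The names `nmf` and `two_stage_sketch` are hypothetical.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates

def nmf(X, rank, n_iter=200, seed=0):
    """Plain NMF, X ≈ W @ H, with Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    W = rng.random((X.shape[0], rank)) + EPS
    H = rng.random((rank, X.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + EPS)
        W *= (X @ H.T) / (W @ H @ H.T + EPS)
    return W, H

def two_stage_sketch(Xs, shared_rank, residual_ranks, n_iter=200, seed=0):
    """Two-stage decomposition of matrices that share their columns.

    Stage 1: X_n ≈ Ws_n @ Hs, with a single Hs shared by all matrices.
    Stage 2: Xr_n = X_n - Ws_n @ Hs, then Xr_n ≈ Wr_n @ Hr_n per matrix.
    """
    # Stage 1: stacking the rows forces one shared H for all data sets.
    row_counts = [X.shape[0] for X in Xs]
    W_stack, Hs = nmf(np.vstack(Xs), shared_rank, n_iter, seed)
    Ws = np.split(W_stack, np.cumsum(row_counts)[:-1], axis=0)

    # Stage 2: factorize each residual with its own rank.
    Wr, Hr = [], []
    for X, W, rank in zip(Xs, Ws, residual_ranks):
        # Assumption: negative residual entries are clipped to zero
        # so that the input of the second NMF stays non-negative.
        Xr = np.clip(X - W @ Hs, 0.0, None)
        W_res, H_res = nmf(Xr, rank, n_iter, seed + 1)
        Wr.append(W_res)
        Hr.append(H_res)
    return Ws, Hs, Wr, Hr

# Toy data: two non-negative matrices with 12 shared columns.
rng = np.random.default_rng(1)
X1 = rng.random((30, 12))
X2 = rng.random((20, 12))
Ws, Hs, Wr, Hr = two_stage_sketch([X1, X2], shared_rank=3, residual_ranks=[2, 4])
```

Note that `residual_ranks` takes one value per matrix, mirroring the point that the number of stage 2 signatures can differ between data sets.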

Human & Mouse substantia nigra (SN) scRNA-seq data

Data: 40,453 cells (Welch et al., 2019) and 51,912 cells (Saunders et al., 2018), with 5,127 genes in common.

a. The human and mouse SN data sets were integrated over the set of shared genes using i2NMF.

[Figure: explained variance by i2NMF stage (1st, 2nd) for the human and mouse data.]

b. The shared signatures identified in the first integrative step were able to combine human and mouse cells (left) and resolve groups of the most relevant cell types in the SN.

[Figure: UMAP1 vs. UMAP2 embeddings colored by species (human, mouse) and by cell type (astrocyte, endothelial, ependymal, microglia, mural, neuron, oligodendrocyte, OPCs, polydendrocyte).]

c. Cell type and gene set enrichment analysis revealed that each shared signature corresponds to specific cell types.

[Figure: cell type enrichment (enrichment Z-score) of shared signatures Sig. 1 to Sig. 7 across the cell types listed above, and gene set enrichment (-log10(p-value)) for the gene sets Lein astrocyte markers, Lein oligodendrocyte markers, Lein neuron markers, GO endothelium development, GO regulation of immune response and GO regulation of neuron differentiation.]

d. The second stage of i2NMF recovered species-specific signatures that helped to resolve cellular sub-types (top) and were defined by marker genes (bottom).

[Figure: mouse UMAP1 vs. UMAP2 embedding colored by polydendrocyte sub-type (Tnr Bmp4, Tnr Cspg5, Tnr Cspg5 Dad1, Tnr Opalin, Tnr Tmem2); heatmap of mouse residual signatures (exposure Hr mouse, low to high) with the marker genes CSPG5, OPALIN, BMP4 and TNR.]
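Integrating two data sets over their shared genes requires both matrices to have the same genes in the same column order. A minimal sketch of that alignment step, assuming cells-by-genes matrices with unique gene identifiers that were already mapped to a common namespace (for example via orthologs); the function name `align_on_shared_genes` and the toy values are hypothetical and do not come from the poster's data.

```python
import numpy as np

def align_on_shared_genes(X_a, genes_a, X_b, genes_b):
    """Subset two cells-by-genes matrices to their common genes.

    Returns both matrices with identical column order, plus the shared
    gene identifiers (sorted, as returned by np.intersect1d).
    """
    shared, idx_a, idx_b = np.intersect1d(genes_a, genes_b, return_indices=True)
    return X_a[:, idx_a], X_b[:, idx_b], shared

# Toy data: 2 "human" cells x 4 genes and 3 "mouse" cells x 3 genes.
genes_human = np.array(["TNR", "BMP4", "OPALIN", "GFAP"])
genes_mouse = np.array(["OPALIN", "CSPG5", "TNR"])
X_human = np.arange(8, dtype=float).reshape(2, 4)
X_mouse = np.arange(9, dtype=float).reshape(3, 3)

A, B, shared = align_on_shared_genes(X_human, genes_human, X_mouse, genes_mouse)
```

After this step both matrices share their columns and can be passed to a joint factorization.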

Human embryos: morula and blastocyst scCAT-seq data

Data: 72 cells, with gene expression and chromatin accessibility for every cell (Liu et al., 2019); 16,501 expressed genes (RNA-seq) and 42,713 identified peaks (ATAC-seq).

a. The human embryo scCAT-seq data set was integrated over all 72 cells. For the gene expression data, the majority of the explained variance was captured in the first stage of i2NMF. Interestingly, for the chromatin accessibility data the second stage also recovered a considerable fraction of the variance.

[Figure: explained variance by i2NMF stage (1st, 2nd) for ATAC-seq and RNA-seq.]

b. The shared H matrix was able to recover two cell-specific signatures. On the second iteration for the ATAC-seq data, a defined signature was decomposed for two cells.

[Figure: heatmaps of exposure H shared (Shared Sig. 1, Shared Sig. 2) and exposure H ATAC-seq (ATAC-seq Sig. 3), low to high, annotated by cell type (morula, blastocyst).]

c. The decomposed shared signatures were stable across a range of factorization ranks, showing a clear separation between morula and blastocyst cells.

[Figure: signatures across factorization ranks K2 to K6 for ATAC-seq and RNA-seq, with cells labeled morula or blastocyst.]

d. The set of accessible chromatin regions associated with ATAC-seq Sig. 3 and its target genes showed a specific pattern for two blastocyst cells. These cells also showed higher expression of marker genes for cells of the inner cell mass (ICM), allowing the identification of this rare cell type.

[Figure: regulatory relationships (present, not present) per cell, annotated by cell type (morula, blastocyst, ICM); relative expression (0% to 100%) of KLF17, NANOG and OCT4 in morula, blastocyst and ICM.]
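Stability of signatures across factorization ranks (panel c) can be quantified in several ways. One simple option, shown here as a sketch and not necessarily the procedure behind the poster's figure, is to match every signature obtained at one rank to its most similar signature at the next rank using cosine similarity between rows of the H matrices. The function name `best_matches` and the toy matrices are hypothetical.

```python
import numpy as np

def best_matches(H_low, H_high):
    """Match each row of H_low to its most similar row of H_high.

    Similarity is the cosine between exposure profiles (rows).
    Returns the index of the best match and its similarity per row.
    """
    def unit_rows(H):
        H = np.asarray(H, dtype=float)
        norms = np.linalg.norm(H, axis=1, keepdims=True)
        # Leave all-zero rows unchanged instead of dividing by zero.
        return H / np.where(norms == 0, 1.0, norms)

    similarity = unit_rows(H_low) @ unit_rows(H_high).T
    return similarity.argmax(axis=1), similarity.max(axis=1)

# Toy exposures over 4 cells at rank 2 and rank 3.
H_k2 = np.array([[1.0, 1.0, 0.0, 0.0],
                 [0.0, 0.0, 1.0, 1.0]])
H_k3 = np.array([[0.0, 0.0, 2.0, 2.0],
                 [3.0, 3.0, 0.0, 0.0],
                 [0.0, 1.0, 0.0, 0.0]])

match_index, match_score = best_matches(H_k2, H_k3)
```

Signatures whose best match stays close to 1 from rank to rank would be considered stable under this criterion.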