Library of Integrated Network-based Cellular Signatures
(LINCS)
September 20, 2013
LINCS concept
• perturbations scalable to genome
• high information content read-outs (e.g. gene expression)
• inexpensive
• mechanism to query database
[Figure: axes labeled cell types, phenotypic assays, perturbations]
Look-up table of cellular activity
• perturbations: genome scale; genetic, pharmacologic
• cell types: 10’s; complex
• read out: moderate complexity
• database: community queries; platform-independent
The LINCS Network (NIH)
Data Production/Analysis Centers
• Broad Institute
• Harvard Medical School
Computational and Technology Development Centers
• Arizona State
• Broad Institute (Jake Jaffe)
• Columbia
• U. Cincinnati
• Miami School of Medicine
• Wake Forest
• Yale
External Collaborations
• Snyder Lab, Sanford-Burnham Medical Research Institute
• FDA
• GTEx
• ENCODE/Epigenomics
• Rao Lab, NIH CRM
• Scadden Lab, Massachusetts General Hospital
• McCray Lab, University of Iowa
• Loring Lab, Scripps Research Institute
• Edenberg Lab, Indiana University
• Spira Lab, Boston University
• Pandolfi Lab, BIDMC
• Chen Lab, NHLBI
• Kotton Lab, Boston University
diseases genes drugs
mRNA Expression Database
453 Affymetrix profiles, 164 drugs
> 16,000 users, 916 citations
Lamb et al, Science (2006)
Connectivity Map
CMAP/LINCS is an approach to functional annotation
perturbagens cell types
CMap is limited by profiling cost; a low-cost, high-throughput method would enable…
• primary screening libraries: drug-like, non-drug-like, natural products
• genomic perturbagens: shRNA, ORF, variants (natural + synthetic)
• cellular contexts: tissues, types, culture conditions, genetics
• treatment parameters: concentrations, durations, combinations
• re-think: gene content × labeling × detection
[Figure: expression matrix, genes × samples]
observation: gene expression is correlated
computational inference model
reduced representation transcriptome (‘landmarks’)
genome-wide expression profile
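The inference idea above can be sketched in a few lines. This is a minimal illustration, assuming a plain least-squares linear model fitted on reference profiles; the slides do not specify the model, and the names `fit_inference_model` and `infer_targets` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_inference_model(landmarks, targets):
    """Fit least-squares weights that map landmark expression to other genes.

    landmarks: (n_samples, n_landmarks), targets: (n_samples, n_targets).
    Returns weights of shape (n_landmarks + 1, n_targets); last row is the intercept.
    """
    design = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights

def infer_targets(landmarks, weights):
    """Predict non-measured genes from measured landmark genes."""
    design = np.hstack([landmarks, np.ones((landmarks.shape[0], 1))])
    return design @ weights

# Synthetic reference data (assumption): targets are noisy linear mixes of landmarks.
rng = np.random.default_rng(0)
n_train, n_landmarks, n_targets = 200, 10, 30
true_w = rng.normal(size=(n_landmarks, n_targets))
train_lm = rng.normal(size=(n_train, n_landmarks))
train_tg = train_lm @ true_w + 0.1 * rng.normal(size=(n_train, n_targets))

weights = fit_inference_model(train_lm, train_tg)

# New samples where only the landmarks were measured.
new_lm = rng.normal(size=(5, n_landmarks))
inferred = infer_targets(new_lm, weights)
print(inferred.shape)
```

The same shape of computation would apply with 1,000 landmarks and the remaining transcripts as targets, given a large enough reference set of full profiles.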
Reduced Representation of Transcriptome
~ 100,000 profiles
[Figure: simulation; x-axis: number of landmarks measured; y-axis: % connections; marked point: 80% at 1000 landmarks]
1000-plex Luminex bead profiling
[Figure: assay schematic; steps: RT, ligation, PCR, hybridization]
Luminex Beads (500 colors, 2 genes/color)
Reagent cost: $5/sample
“L1000” expression profiling
                      GeneChip                                L1000
content               transcripts 1 to 22,000 measured        transcripts 1 to 1,000 measured, remainder of 22,000 inferred
technology            microarray                              Luminex beads
throughput            3 × 96 / week                           200 × 384 / week
unit cost (reagent)   $500                                    $5
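The throughput and cost figures can be compared with a short calculation. This sketch assumes that "3 × 96 / week" and "200 × 384 / week" mean plates per week times wells per plate; the dictionary keys are illustrative names, not part of the original.

```python
# Figures taken from the comparison table; the plates-times-wells reading is an assumption.
platforms = {
    "GeneChip": {"plates_per_week": 3, "wells_per_plate": 96, "reagent_cost_usd": 500},
    "L1000": {"plates_per_week": 200, "wells_per_plate": 384, "reagent_cost_usd": 5},
}

def samples_per_week(platform):
    """Weekly sample capacity under the plates-times-wells reading."""
    return platform["plates_per_week"] * platform["wells_per_plate"]

genechip_samples = samples_per_week(platforms["GeneChip"])
l1000_samples = samples_per_week(platforms["L1000"])

# How many times more samples per week, and how many times cheaper per sample.
throughput_ratio = l1000_samples / genechip_samples
cost_ratio = (platforms["GeneChip"]["reagent_cost_usd"]
              / platforms["L1000"]["reagent_cost_usd"])

print(genechip_samples, l1000_samples)
print(round(throughput_ratio, 1), cost_ratio)
```

Under that reading, the per-sample reagent cost drops 100-fold while weekly capacity rises by more than two orders of magnitude.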
LINCS Dataset
Current LINCS Dataset
small-molecules: 5,178 compounds
• 1,300 off-patent FDA-approved drugs
• 700 bioactive tool compounds
• 2,000+ screening hits (MLPCN + others)
genomic perturbagens: 3,712 genes (shRNA + cDNA)
• targets/pathways of FDA-approved drugs (n=900)
• candidate disease genes (n=600)
• community nominations (n=500+)
15 cell types
• Banked primary cell types
• Cancer cell lines
• Primary hTERT-immortalized
• Patient-derived iPS cells
• 5 community nominated
1,000 landmark genes
21,000 inferred genes
1,209,824 profiles
Coming soon (in beta)
U54 Grant: Progress on Data Access
level 1: Raw data
  format: Plate folders with …
  availability: 3,812 folders
  common use cases: new computational approaches to data pre-processing and normalization
level 2: Normalized dataset
  format: Matrix: GCTX
  availability: 1.2M+ profiles
  common use cases: deriving signatures; other kinds of analysis
level 3: Signatures (differentially expressed genes)
  format: 1. mongo DB; 2. Matrix: GCTX
  availability: 383,788 signatures (beta release)
  common use cases: high-level integration with analytics and websites, e.g. genes that are modulated by TP53; genes most correlated to the Akt1 pathway
level 4: Queries
  format: JSON objects
  availability: Q1 2014
  common use cases: genes connected to an external query signature
findings
1) Large-scale gene-expression analysis
2) Analysis of L1000 shRNA signatures
Data quality: correlation between biological replicates
[Figure: y-axis: # of profiles]
[Figure: cumulative score vs. genes (thousands) for up-regulated and down-regulated genes; panels: connected, not connected]
matching cell states
1) define a ‘query’: the set of genes up- and down-regulated in a cellular state of interest
2) assess strength of the query in the profile of all perturbagens in DB
3) rank order perturbagens by connectivity strength
rank   perturbagen   conn score
1      drug Y         1         positive connectivity
2      drug e         0.993
3      gene S         0.791
…      …
.      gene n         .
.      drug I         0.000     no connectivity
.      drug L         .
…      …
997    drug N        -0.877
998    gene E        -0.945
999    drug G        -1         negative connectivity
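The three query steps can be sketched as follows. This is a simplified stand-in, not the scoring statistic used by CMap or LINCS: it assumes each perturbagen profile is a vector of per-gene z-scores and scores a query as mean(up genes) minus mean(down genes), scaled to the range -1 to 1. Function names and toy data are hypothetical.

```python
import numpy as np

def connectivity_scores(profiles, up_idx, down_idx):
    """Score every perturbagen profile against a query (simplified stand-in).

    profiles: (n_perturbagens, n_genes) z-scores.
    up_idx / down_idx: indices of genes up- and down-regulated in the query.
    """
    raw = profiles[:, up_idx].mean(axis=1) - profiles[:, down_idx].mean(axis=1)
    scale = np.abs(raw).max()
    return raw / scale if scale > 0 else raw

def rank_perturbagens(names, scores):
    """Rank order perturbagens from most positive to most negative score."""
    order = np.argsort(-scores)
    return [(rank + 1, names[i], float(scores[i])) for rank, i in enumerate(order)]

# 1) define a query: genes 0 and 1 are up, genes 2 and 3 are down.
up_idx, down_idx = [0, 1], [2, 3]

# Toy database of four perturbagen profiles (assumption).
names = ["drug A", "drug B", "gene C", "drug D"]
profiles = np.array([
    [ 2.0,  1.5, -1.0, -2.0],   # mimics the query
    [ 0.1, -0.1,  0.1, -0.1],   # unrelated to the query
    [-2.0, -1.5,  1.0,  2.0],   # reverses the query
    [ 1.0,  0.5, -0.5,  0.0],   # weakly mimics the query
])

# 2) assess strength of the query in every profile, 3) rank order.
scores = connectivity_scores(profiles, up_idx, down_idx)
ranked = rank_perturbagens(names, scores)
for row in ranked:
    print(row)
```

Perturbagens at the top of the list mimic the queried state, and those at the bottom reverse it, which is the reading used for the drug-resistance example.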
reversing drug resistance
hypothesis: sirolimus induces glucocorticoid sensitivity
signature: glucocorticoid resistant acute lymphoblastic leukemia (David Twomey and Scott Armstrong)
50 ‘sensitive’ and 50 ‘resistant’ markers
[Figure: marker heat maps with columns labeled resistant / sensitive; ranked instance list with sirolimus instances highlighted; column headers: rank, perturbagen, cell, score]
perturbagen: 35-sirolimus, 42-sirolimus, 26-sirolimus
cell: HL60, ssMCF7, MCF7
score: 0.804, 0.789, 0.544
rank labels: 56, 27, 464, 1
The 1% challenge: the “tail” of current data is > ENTIRE previous dataset
query: histone deacetylase inhibitors (Glaser et al 2003)
Rank Compound ID Compound Description Connectivity Score
1 BRD-K69840642 ISOX 0.995
2 BRD-K52522949 NCH-51 0.994
3 BRD-K12867552 THM-I-94 0.993
4 BRD-K64606589 apicidin 0.992
5 BRD-K56957086 dacinostat 0.99
6 BRD-A19037878 trichostatin-a 0.989
7 BRD-A94377914 merck-ketone 0.987
8 BRD-K17743125 belinostat 0.987
9 BRD-K75081836 BRD-K75081836 0.986
10 BRD-K81418486 vorinostat 0.986
11 BRD-K68202742 trichostatin-a 0.986
12 BRD-K22503835 scriptaid 0.986
13 BRD-K02130563 panobinostat 0.985
14 BRD-A39646320 HC-toxin 0.983
15 BRD-K13810148 givinostat 0.98
16 BRD-K85493820 KM-00927 0.977
17 BRD-K11663430 pyroxamide 0.977
18 BRD-K74761218 WT-171 0.975
19 BRD-K74733595 APHA-compound-8 0.97
20 BRD-A19248578 latrunculin-b 0.965
21 BRD-K49010888 BRD-K49010888 0.962
22 BRD-K53308430 SA-1017940 0.951
23 BRD-K64890080 BI-2536 0.95
24 BRD-K00627859 tubastatin-a 0.947
25 BRD-K31542390 mycophenolic-acid 0.946
0.5% Page 1 / 200
query: compound identified to induce the lysosomal apoptosis pathway (D’Arcy et al Nature Medicine 2012)
Rank Compound ID Compound Description Connectivity Score
1 BRD-K78659596 MLN2238 0.998
2 BRD-K60230970 MG-132 0.998
3 BRD-K88510285 bortezomib 0.996
4 BRD-A55484088 BNTX 0.993
5 BRD-A18725729 BRD-A18725729 0.993
6 BRD-K74402642 NSC-632839 0.992
7 BRD-K50234570 EMF-bca1-16 0.992
8 BRD-A58924247 BRD-A58924247 0.992
9 BRD-A39093044 K784-3187 0.992
10 BRD-A72180425 K784-3188 0.992
11 BRD-K50691590 bortezomib 0.992
12 BRD-K19499941 BRD-K19499941 0.99
13 BRD-K09854848 MD-II-008-P 0.988
14 BRD-A76490030 K784-3131 0.988
15 BRD-A36275421 MW-RAS12 0.987
16 BRD-K28366633 BRD-K28366633 0.987
17 BRD-A11007541 BCI-hydrochloride 0.987
18 BRD-K37392901 NSC-632839 0.987
19 BRD-K66884694 BRD-K66884694 0.987
20 BRD-A83124583 EMF-sumo1-39 0.986
21 BRD-K10882151 BO2-inhibits-RAD51 0.986
22 BRD-K44366801 BRD-K44366801 0.985
23 BRD-K61033289 15-delta-prostaglandin-j2 0.985
24 BRD-K07303502 arachidonyl-trifluoro-methane 0.984
25 BRD-K02822062 CT-200783 0.984
query: HUVEC cells treated with pitavastatin (cell line not in panel)
Rank Compound ID Compound Description Connectivity Score
1 BRD-A81772229 simvastatin 0.996
2 BRD-A70155556 lovastatin 0.994
3 BRD-U88459701 atorvastatin 0.991
4 BRD-A18763547 BAX-channel-blocker 0.988
5 BRD-K22134346 simvastatin 0.985
6 BRD-K12994359 valdecoxib 0.983
7 BRD-K09416995 lovastatin 0.981
8 BRD-K34581968 BMS-536924 0.979
9 BRD-K94176593 TWS-119 0.975
10 BRD-K20285085 fostamatinib 0.973
11 BRD-K94441233 mevastatin 0.972
12 BRD-K95785537 PP-2 0.971
13 BRD-K53414658 tivozanib 0.97
14 BRD-K83213911 PF-750 0.968
15 BRD-K85606544 neratinib 0.968
16 BRD-A19248578 latrunculin-b 0.967
17 BRD-K68588778 BRD-K68588778 0.966
18 BRD-K06750613 GSK-1059615 0.966
19 BRD-A11678676 wortmannin 0.964
20 BRD-K05653692 DL-PDMP 0.963
21 BRD-K72420232 WZ-4002 0.961
22 BRD-K19796430 erismodegib 0.961
23 BRD-K78513633 lonidamine 0.961
24 BRD-K03618428 PP-110 0.961
25 BRD-K37940862 BRD-K37940862 0.961
query: imatinib-resistant chronic myeloid leukemia (Frank et al Leukemia 2006)
Rank Compound ID Compound Description Connectivity Score
1 BRD-K12502280 TG-101348 0.992
2 BRD-K94176593 TWS-119 0.987
3 BRD-K20285085 fostamatinib 0.975
4 BRD-K49328571 dasatinib 0.969
5 BRD-K12867552 THM-I-94 0.969
6 BRD-K85493820 KM-00927 0.969
7 BRD-A02180903 betamethasone 0.969
8 BRD-K91701654 U-0126 0.966
9 BRD-K95785537 PP-2 0.965
10 BRD-K53414658 tivozanib 0.964
11 BRD-A50454580 PD-0325901 0.96
12 BRD-K73789395 ZM-336372 0.96
13 BRD-K17743125 belinostat 0.952
14 BRD-K46419649 U0126 0.95
15 BRD-K09499853 KU-0060648 0.949
16 BRD-K64890080 BI-2536 0.947
17 BRD-K70914287 BIBX-1382 0.947
18 BRD-K50168500 canertinib 0.946
19 BRD-U43867373 WH-4025 0.946
20 BRD-U25771771 WZ-4-145 0.945
21 BRD-K34581968 BMS-536924 0.943
22 BRD-K18787491 U-0126 0.942
23 BRD-K56343971 vemurafenib 0.941
24 BRD-K01877528 TL-HRAS-61 0.937
25 BRD-K66175015 afatinib 0.933
findings
1) Large-scale gene-expression analysis
2) Analysis of L1000 shRNA signatures
Current CMap Dataset
1. Connections between genes and drugs
2. GWAS gene lists to pathways
3. Causal mutation to therapeutic leads
4. Discovering new cancer pathways
5. MoA of novel small-molecules
6. Biological novelty biasing
biological goal
LINCS as a starting point for functional follow-up
Core Signature DB
263 Components explain 80% of the variance
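A statement like "N components explain 80% of the variance" can be checked with a principal component analysis. The sketch below assumes PCA via singular value decomposition on a signatures-by-features matrix; the slides do not say which decomposition was used, and the function name and synthetic data are hypothetical.

```python
import numpy as np

def components_for_variance(matrix, target=0.80):
    """Smallest number of principal components whose cumulative
    explained variance reaches `target`.

    matrix: (n_signatures, n_features).
    """
    centered = matrix - matrix.mean(axis=0)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    cumulative = np.cumsum(explained)
    # searchsorted returns the first index whose cumulative value >= target.
    n_components = int(np.searchsorted(cumulative, target) + 1)
    return n_components, cumulative

# Synthetic signatures driven by 3 latent factors plus small noise (assumption).
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 3))
loadings = rng.normal(size=(3, 20))
signatures = latent @ loadings + 0.05 * rng.normal(size=(100, 20))

k, cumulative = components_for_variance(signatures)
print(k, round(float(cumulative[k - 1]), 3))
```

On the real matrix the rows would be the core gene signatures and the columns the expression features.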
Core Gene signatures from KD (n=1387)
[Figure: signature matrix, 22268 Features × 1387 signatures]
Signature Diversity
Similarity Metric
Mining the Similarity Matrix
• Unsupervised
  • Global Patterns
• Supervised
  • Gene->[Gene,Pathway,Compound]
[Figure: similarity matrix, Genes (n=1387) × Genes (n=1387)]
Global Views of Connections
49% of genes have at least 1 connection > 0.4
[Figure: connections per gene, PC3 cell line; most connected genes]
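The "at least 1 connection > 0.4" summary can be computed from a gene-by-gene similarity matrix. This sketch assumes Pearson correlation as the similarity metric, which the slides do not specify; the function name and toy signatures are hypothetical.

```python
import numpy as np

def fraction_connected(signatures, threshold=0.4):
    """Fraction of genes with at least one other gene above `threshold`.

    signatures: (n_genes, n_features), one knockdown signature per gene.
    """
    similarity = np.corrcoef(signatures)       # gene x gene similarity matrix
    np.fill_diagonal(similarity, -np.inf)      # ignore self-similarity
    connections_per_gene = (similarity > threshold).sum(axis=1)
    fraction = float((connections_per_gene >= 1).mean())
    return fraction, connections_per_gene

# Toy signatures: genes 0 and 1 are correlated, genes 2 and 3 are unrelated.
signatures = np.array([
    [1.0,  0.0, -1.0,  0.0],
    [2.0,  0.0, -2.0,  0.0],
    [0.0,  1.0,  0.0, -1.0],
    [1.0, -1.0,  1.0, -1.0],
])

fraction, connections_per_gene = fraction_connected(signatures)
print(fraction, connections_per_gene.tolist())
```

Sorting `connections_per_gene` in descending order would give a "most connected genes" view of the same matrix.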
• JAK2 knockdown connects to STAT1 signature
• FOS knockdown connects to JUN signature
• Cell cycle genes connected (CCND1, CDK2, CDK4, CDK6, CCNE1, E2F1)
• ER knockdown connected to ER antagonists & inversely connected to ER agonists
• JAK2 over-expression signature inversely connected to JAK2 inhibitor (lestaurtinib)
• HDAC knock-downs connected to HDAC inhibitors (vorinostat, others)
• NRF2 over-expression signature inversely connected to curcumin
• WNT1 gene connections: TCF7L1, GSK3B, CSNK2A2, PRKACA, SMAD3…
querying LINCS for connections
AKT3, FOXO1, PDPK1, PHLPP1, PIK3CB
Top 10 small-molecule connections
genes connections
Integrating queries across members of a pathway
AKT1
39 genes associated with T2D
allele classification• genes implicated by GWAS
– can be many hundreds, most unannotated
• create profiles of ablation (shRNA) in suitable cells by L1000
– universal functional bioassay
• cluster into “complementation groups”
– assign genes to groups, groups to pathways, pathways to disease
S. Jacobs & D. Altshuler
Target ID
Query: Drug signature in MCF7
Compared against: All MCF7 CGS, ranked by wtcs score (Similar to Dissimilar)
Result: Molecular target of Drug A
An Example where integrating across many shRNAs improves Connections
Each dot is a dose / timepoint of rapamycin
[Figure: rows: MTOR shRNA 1 through MTOR shRNA 13, then MTOR Consensus Gene Signature; x-axis: Connectivity Rank of Small Molecules (5000, 4000, 3000, 2000, 1000, 1)]
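One way to see why integrating across shRNAs helps is to build a consensus from noisy hairpin signatures. The sketch below is an assumption-laden illustration, not the method behind the consensus gene signature: it weights each hairpin by its average correlation with the others and takes a weighted mean. The function name, the weighting rule, and the synthetic data are hypothetical.

```python
import numpy as np

def consensus_signature(shrna_signatures, min_weight=0.01):
    """Correlation-weighted average of hairpin signatures for one gene.

    shrna_signatures: (n_hairpins, n_genes).
    Hairpins that agree with the others get more weight.
    """
    corr = np.corrcoef(shrna_signatures)
    np.fill_diagonal(corr, 0.0)
    n_hairpins = shrna_signatures.shape[0]
    weights = np.clip(corr.sum(axis=1) / (n_hairpins - 1), min_weight, None)
    weights = weights / weights.sum()
    return weights @ shrna_signatures, weights

# Synthetic data (assumption): 13 hairpins share one on-target effect,
# each with its own independent off-target noise.
rng = np.random.default_rng(1)
on_target = rng.normal(size=200)
hairpins = np.array([on_target + rng.normal(scale=1.5, size=200) for _ in range(13)])

consensus, weights = consensus_signature(hairpins)

single_corr = [float(np.corrcoef(h, on_target)[0, 1]) for h in hairpins]
consensus_corr = float(np.corrcoef(consensus, on_target)[0, 1])
print(round(max(single_corr), 2), round(consensus_corr, 2))
```

In this toy setting the consensus tracks the shared on-target effect more closely than any single hairpin, because independent off-target noise averages out.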
Query with vemurafenib, highlight BRAF shRNAs
Each dot is an individual shRNA targeting BRAF
[Figure: y-axis: Cell line; x-axis: Rank of shRNA (%), from Negative Correlation to Positive Correlation]
MTOR connects to BEZ235
Rank CGS ID Gene Symbol Connectivity Score
1 CGS001-2475 MTOR 0.999
2 CGS001-4609 MYC 0.99
3 CGS001-57521 RPTOR 0.976
4 CGS001-2623 GATA1 0.972
5 CGS001-5245 PHB 0.969
6 CGS001-2581 GALC 0.967
7 CGS001-9184 BUB3 0.965
8 CGS001-360023 ZBTB41 0.965
9 CGS001-4860 PNP 0.965
10 CGS001-11164 NUDT5 0.964
11 CGS001-89849 ATG16L2 0.964
12 CGS001-527 ATP6V0C 0.964
13 CGS001-2065 ERBB3 0.961
14 CGS001-3845 KRAS 0.954
15 CGS001-4486 MST1R 0.954
16 CGS001-3479 IGF1 0.951
17 CGS001-207 AKT1 0.95
18 CGS001-8607 RUVBL1 0.948
19 CGS001-54106 TLR9 0.948
20 CGS001-5045 FURIN 0.947
25 CGS001-9533 POLR1C 0.944

Rank Compound ID Compound Description Connectivity Score
1 BRD-K12184916 NVP-BEZ235 1
2 BRD-K69932463 AZD-8055 1
3 BRD-K67566344 KU-0063794 1
4 BRD-K67868012 PI-103 0.999
5 BRD-K77008974 WYE-354 0.998
6 BRD-K94294671 OSI-027 0.998
7 BRD-A45498368 WYE-125132 0.998
8 BRD-K13049116 BMS-754807 0.997
9 BRD-K87343924 wortmannin 0.996
10 BRD-K67075780 TGX-115 0.996
BEZ235: a dual ATP-competitive PI3K and mTOR inhibitor
Dose dependent connectivity
PIK3CA connects to BEZ235
Current list of significant drug-CGS connectivities spans multiple MoAs
Compound                  CGS (gene)
losartan                  AGTR1
MK-2206                   AKT1
10-DEBC                   AKT1
MK-2206                   AKT2
MK-2206                   AKT3
10-DEBC                   AKT3
brefeldin A               ARF1
gossypol                  BCL2
YM-155                    BIRC5
ZM336372                  BRAF
LFM-A13                   BTK
N9-isopropylolomoucine    CDK1
BML-259                   CDK2
fumonisin B1              CERS4
etomoxir                  CPT1A
PNU-74654                 CTNNB1
cyanoquinoline 11         EGFR
neratinib                 EGFR
tyrphostin AG-1478        EGFR
tamoxifen                 ESR1
PF-3845                   FAAH
Merck60                   HDAC1
ISOX                      HDAC6
2-bromopyruvate           HK1
lovastatin acid           HMGCR
linsitinib                IGF1R
selumetinib               MAP2K1
Compound 11e              MAPK1
sirolimus                 MTOR
BEZ235                    MTOR
PIK-90                    MTOR
PP-30                     MTOR
parthenolide              NFKB1
triptolide                NFKB2
dexamethasone             NR3C1
olaparib                  PARP1
olaparib                  PARP2
veliparib                 PARP2
GSK-2334470               PDK1
BX-795                    PDK1
AZD-7545                  PDK2
TGX-115                   PIK3C2A
BEZ235                    PIK3CA
PIK-90                    PIK3CA
Compound 110              PIK3CA
GW-843682X                PLK1
LFM-A13                   PLK1
HA-1004                   PRKACB
KU 0060648                PRKDC
AM-580                    RARA
gemcitabine               RRM1
fatostatin                SREBF2
RITA                      TP53
nutlin-3                  TP53
pifithrin-alpha           TP53
SJ-172550                 TP53
gemcitabine               TYMS
MK 1775                   WEE1
Goal: Given a chemical library:
• identify the bioactive subset of a library
• identify unique bioactivity
Gene-expression as a universal measure of bioactivity
If we see no robust gene expression consequence whatsoever across a diverse panel of cell types, then it's likely that the compound has no bioactivity.
L1000 as a sensor of bioactivity
[Figure: S-C plot; x-axis: signature robustness across replicates (C); y-axis: signature strength (S); annotations: active analogs (high S-C), inactive analogs (low S-C), dose titration]
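The two axes of an S-C plot can be illustrated with simple stand-in definitions. The slides describe S and C only as signature strength and robustness across replicates, so the formulas here are assumptions: S is the number of genes whose replicate-averaged z-score has magnitude of at least 2, and C is the median pairwise Pearson correlation between replicates. Names and data are hypothetical.

```python
import numpy as np
from itertools import combinations

def signature_strength(replicates, z_cutoff=2.0):
    """S (assumed definition): genes with |mean z-score| >= z_cutoff."""
    mean_z = replicates.mean(axis=0)
    return int((np.abs(mean_z) >= z_cutoff).sum())

def replicate_correlation(replicates):
    """C (assumed definition): median pairwise correlation between replicates."""
    pairs = combinations(range(replicates.shape[0]), 2)
    values = [np.corrcoef(replicates[i], replicates[j])[0, 1] for i, j in pairs]
    return float(np.median(values))

# Synthetic z-score profiles, 3 replicates x 100 genes (assumption).
rng = np.random.default_rng(2)
effect = np.zeros(100)
effect[:10] = 4.0                                   # active compound moves 10 genes
active = effect + rng.normal(scale=0.5, size=(3, 100))
inactive = rng.normal(scale=0.5, size=(3, 100))     # no reproducible effect

s_active, c_active = signature_strength(active), replicate_correlation(active)
s_inactive, c_inactive = signature_strength(inactive), replicate_correlation(inactive)
print(s_active, round(c_active, 2))
print(s_inactive, round(c_inactive, 2))
```

An active compound lands in the high-S, high-C corner, while an inactive analog stays near the origin.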
biological novelty biasing of chemical libraries
[Figure: signal strength vs. reproducibility; x-axis ticks: -1, 0, 1]
• global bioactivity detection using L1000 profiles
– number and magnitude of expression changes, and robustness
• calibrate with 350 known bioactives across 47 cell lines
– median sensitivity of individual cell lines is 42% (90% specificity)
– rationally-designed panel of 7 cell lines achieves 95% sensitivity
• qualification, de-duplication, and novelty biasing
– consolidate and subset libraries based on function
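A small panel can reach higher sensitivity than any single cell line when the lines detect different compounds. The sketch below shows one possible way to pick such a panel, a greedy coverage heuristic; the slides do not say how the 7-line panel was designed, so the algorithm, the function name, and the toy detection matrix are all assumptions.

```python
import numpy as np

def greedy_panel(detected, panel_size):
    """Pick cell lines one at a time, each time adding the line that
    detects the most compounds not yet detected by the panel.

    detected: boolean (n_compounds, n_cell_lines), True where a known
    bioactive is called active in that cell line.
    Returns (panel indices, panel sensitivity).
    """
    covered = np.zeros(detected.shape[0], dtype=bool)
    panel = []
    for _ in range(panel_size):
        gains = (detected & ~covered[:, None]).sum(axis=0)
        for chosen in panel:
            gains[chosen] = -1          # never pick the same line twice
        best = int(np.argmax(gains))
        panel.append(best)
        covered |= detected[:, best]
    return panel, float(covered.mean())

# Toy calibration set: 5 known bioactives x 3 cell lines (assumption).
detected = np.array([
    [1, 0, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 1, 0],
], dtype=bool)

single_line_sensitivity = detected.mean(axis=0)
panel, panel_sensitivity = greedy_panel(detected, panel_size=2)
print(single_line_sensitivity.tolist(), panel, panel_sensitivity)
```

In the toy data the best single line detects 3 of 5 compounds, while a two-line panel detects 4 of 5.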
chemical library: n = 9,875
active: n = 487 (5%)
known MoA: n = 435 (4.5%)
novel: n = 52 (0.5%)
de-duplicated: n = 30 (0.3%)
1. Data Generation: 1.2M+ profiles released to LINCS
2. Data Access: Multiple levels of data matrices, cloud-compute beta released
3. Biologist-friendly web user interfaces
4. Emerging scientific findings
   1. Causal mutation to therapeutic leads
   2. GWAS gene lists to pathways
   3. Discovering new cancer pathways
   4. Connecting small-molecules to biology
   5. Biological novelty biasing of chemical libraries
Broad LINCS U54
CMap Analytical: Rajiv Narayan, Joshua Gould, Corey Flynn, Ted Natoli, David Wadden, Ian Smith, Roger Hu, Larson Hogstrom, Peyton Greenside
CMap Data Generation: David Peck, John Davis, Roger Cornell, Xiaohua Wu, Xiaodong Lu, Melanie Donahue
Todd Golub
Broad Scientists: Jesse Boehm, Bang Wong, Federica Piccioni, John Doench, David Root, Suzanne Jacobs, Paul Clemons, Stuart Schreiber, Aly Shamji
Broad Platforms: RNAi platform, Chemical Biology, TD/TS