Advanced Use of Properties and Scripts in TIBCO Spotfire
C.R.E.A.Te Community of Research Excellence & Advanced Technology
Janssen Research & Development
Herwig Van Marck
15.02.11
Overview
• Examples of the use of properties
– Trellis Simulation
– Using $map and $csearch
• Examples of the use of scripts
– Using comma separated tag columns
– Expand marking
– Dynamic list box content
Trellis Simulation
[Figure: bar chart of At Bats per player in three trellis panels (Boston, Chi Cubs, Chi Sox). X-axis: Team, Player Name; Y-axis: At Bats (200 to 600); color by Team.]
Trellis Simulation
[Figure: single bar chart of At Bats per Player per Team showing Boston, Chi Cubs and Chi Sox. X-axis: Team, Player Name; Y-axis: At Bats (200 to 600); legend: Team. Controls: “Number of Teams/page” set to 3, “Select page” set to 2.]
Trellis Simulation
• Create document properties
– “teamsPrPageProp” (integer)
– “selectedPage” (integer)
– “valueProp” (column name)
• Create ‘Calculated column’ “Page”:
Integer((DenseRank([Team])-1)/${teamsPrPageProp}+1)
• Custom expression on X-axis:
<if([Page]=${selectedPage},[Team],"") as [Team] NEST if([Page]=${selectedPage},[Player Name],"") as [Player Name]>
• Custom expression on Y-axis:
if([Page]=${selectedPage},$esc(${valueProp}),null) as $esc(${valueProp})
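The “Page” column amounts to dividing a zero-based dense rank by the number of teams per page and truncating. The following is a minimal Python sketch of that arithmetic outside Spotfire, assuming DenseRank orders the team names alphabetically and Integer() truncates. The function name `page_numbers` and the sample team list are illustrative, not part of the original analysis.

```python
def page_numbers(teams, teams_per_page):
    """Map each team to a 1-based page, mimicking
    Integer((DenseRank([Team])-1)/${teamsPrPageProp}+1)."""
    # Dense rank: 1 for the smallest distinct value, no gaps for ties.
    rank = {team: i + 1 for i, team in enumerate(sorted(set(teams)))}
    # int() truncates, which is what this sketch assumes Integer() does.
    return {team: int((rank[team] - 1) / teams_per_page + 1) for team in rank}


teams = ["Boston", "Arizona", "Chi Sox", "Atlanta", "Chi Cubs", "Baltimore", "Boston"]
pages = page_numbers(teams, teams_per_page=3)

# Only rows on the selected page keep their values in the axis expressions;
# rows on other pages are mapped to "" (X-axis) or null (Y-axis).
selected_page = 2
visible = sorted(team for team, page in pages.items() if page == selected_page)
print(visible)
```

With three teams per page and page 2 selected, the sketch keeps Boston, Chi Cubs and Chi Sox, which matches the teams shown on the slide.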
[Text area: “Number of Teams/page” set to 3, “Select page” set to 2, value column At Bats (“per Player per Team”)]
Using $map and $csearch
[Figure: bar chart with X-axis Team (Arizona to Washington) and Y-axis Runs, Home Runs, Runs Created (0 to 1600); color by Runs, Home Runs, Runs Created. Text area: “Y-Axis property search:” input field containing *run*.]
Using $map and $csearch
• Create ‘Input Field’ in text area
• Create associated document property “SearchProp”
[Text area: “Y-Axis property search:” input field containing *run*]
Using $map and $csearch
• Use custom expression on Y-axis
$map("sum($esc($csearch([Baseball],"${SearchProp}"))) as $esc($csearch([Baseball],"${SearchProp}"))",", ")
– Search columns: $csearch([Baseball],"${SearchProp}")
– Escape columns: $esc(…)
– Summarization + renaming of the axis label: sum(…) as …
– Handle multiple columns: $map("…",", ")
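To make the expansion concrete, here is a rough Python simulation of what the Y-axis expression evaluates to. It is a sketch under stated assumptions, not Spotfire's implementation: `csearch` approximates the column search with case-insensitive wildcard matching on the whole column name, `esc` is assumed to bracket a name and double any closing bracket, and `map_template` stands in for $map. The helper names and the column list are illustrative.

```python
from fnmatch import fnmatchcase


def csearch(columns, pattern):
    """Return the column names matching a wildcard pattern (case-insensitive)."""
    return [c for c in columns if fnmatchcase(c.lower(), pattern.lower())]


def esc(name):
    """Bracket a column name, doubling any closing bracket inside it."""
    return "[" + name.replace("]", "]]") + "]"


def map_template(template, separator, names):
    """Instantiate the template once per name and join the results."""
    return separator.join(template.format(col=esc(n)) for n in names)


columns = ["Team", "Runs", "Home Runs", "Runs Created", "At Bats"]
matches = csearch(columns, "*run*")
expression = map_template("sum({col}) as {col}", ", ", matches)
print(expression)
```

For the search string *run* this yields one `sum(…) as …` term per matching column, separated by ", ", which is the form the Y-axis expects for multiple columns.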
Comma separated tags column
• Data from a literature search (on next generation sequencing)
• Comma separated tags column (“Tags”)
Tags
De novo Assembly
Genome Annotation, Derivative technology, Multiple Analysis Steps
Alignment
Alignment
Genome Annotation
QC Analysis
…
Comma separated tags column
[Figure: dashboard built on the literature data, with these panels:
– Table of papers with columns Authors, Title, Source, Tags
– “Tag count” bar chart: one bar per tag value, Y-axis 0 to 14
– Text area “Tag processing”: “Tags” drop-down list, buttons “Get tag values” and “Mark selected row”
– “Detail on single paper”: Title, Authors, Source and Abstract of the marked paper
– “Tag count” cross table drawing each count as a row of “|” characters followed by the count, e.g. ||||||||||||||(14)]
Comma separated tags column
• Text area for parameters and scripts
– Drop-down list for column selection
  • Document property “TagColumn”
– Create document property “TagsList”
  • Via temporary “List box (multiple select)”
• Fill with values with IronPython script
[Text area “Tag processing”: “Tags” drop-down list, buttons “Get tag values” and “Mark selected row”]
Comma separated tags column
• GetTags script:
from System import Array
from Spotfire.Dxp.Data import IndexSet
from Spotfire.Dxp.Data import DataValueCursor

rowCount = Document.ActiveDataTableReference.RowCount
rowsToInclude = IndexSet(rowCount, True)

# Create a cursor to the column we wish to get the values from
cursor1 = DataValueCursor.CreateFormatted(Document.ActiveDataTableReference.Columns[ColumnName])

keys = dict()

# Get unique tag values: loop through all rows, split the column value
# on ', ' and store every tag as a dictionary key
for row in Document.ActiveDataTableReference.GetRows(rowsToInclude, cursor1):
    value1 = cursor1.CurrentValue
    for tag in value1.split(', '):
        keys[tag] = 1

# Copy the unique tags into a string array
strArray = Array.CreateInstance(str, len(keys))

idx = 0
for key in keys:
    strArray[idx] = key
    idx = idx + 1

# Put the array in the "TagsList" property
Document.Properties.Item[ListProperty] = strArray

Script parameters
Name          Type    Value
ColumnName    String  “${TagColumn}”
ListProperty  String  TagsList
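The core of the GetTags script, splitting each cell on ", " and keeping the distinct tags, can be tried outside Spotfire. This is a minimal plain-Python sketch of that step only. The function name `unique_tags` is illustrative and the sample values are the ones shown for the “Tags” column earlier. The first-seen ordering relies on Python 3.7+ dictionaries and is not something the IronPython script guarantees.

```python
def unique_tags(values, separator=", "):
    """Collect the distinct tags from comma-separated strings, in first-seen order."""
    seen = {}
    for value in values:
        for tag in value.split(separator):
            seen[tag] = 1  # dictionary keys deduplicate, as in the script
    return list(seen)


tags_column = [
    "De novo Assembly",
    "Genome Annotation, Derivative technology, Multiple Analysis Steps",
    "Alignment",
    "Alignment",
    "Genome Annotation",
    "QC Analysis",
]
tags = unique_tags(tags_column)
print(tags)
```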
Comma separated tags column
• Custom Expression on Y-axis:
$map("Integer(Sum(if(find("", ${TagsList}, "","", ""&[Tags]&"", "")>0,1,null))) as $esc(${TagsList})", ",")
– Search for tag (caveat: quoting the quotes):
if(find("", ${TagsList}, "","", ""&[Tags]&"", "")>0,1,null)
– Summarization + renaming of the axis label:
Integer(Sum(…)) as $esc(${TagsList})
– Handle multiple tags:
$map("…", ",")
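The search term wraps both the tag and the [Tags] value in ", " so that only whole tags match: a plain substring search for Alignment would also hit Alignment/Assembly Viewers. Below is a small Python sketch of that matching rule and the per-tag count. The names `has_tag` and `tag_counts` and the three sample rows are illustrative assumptions.

```python
def has_tag(tags_value, tag, separator=", "):
    """Mimic find(", <tag>, ", ", " & [Tags] & ", ") > 0."""
    return (separator + tag + separator) in (separator + tags_value + separator)


def tag_counts(tags_column, tags_list):
    """Count the rows carrying each tag, one result per entry of tags_list."""
    return {
        tag: sum(1 for value in tags_column if has_tag(value, tag))
        for tag in tags_list
    }


tags_column = ["Alignment", "Alignment, Data volumes", "Alignment/Assembly Viewers"]
counts = tag_counts(tags_column, ["Alignment", "Data volumes"])
print(counts)
```

In this sample, Alignment is counted for the first two rows only, because the third row holds a different tag that merely starts with the same word.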
[Figure: bar chart with one bar per tag value (X-axis) and the tag count on the Y-axis (0 to 14); the highest bar is Data volumes (14).]
Comma separated tags column
• Selection problem
– Marking a bar cannot single out the matching records, because every bar is derived from all records
• Solution:
– “Horizontal bar chart” using cross table
  • Repeat(""|"",Integer(Sum(…))) …
– Use column sorting as selector and IronPython script
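The cross table cells are text, so each “bar” is just a repeated character. A one-function Python sketch of the cell text follows. The trailing "(count)" part is inferred from the cells displayed on the slides, since the original expression is elided after Repeat(…), and the name `text_bar` is hypothetical.

```python
def text_bar(count, symbol="|"):
    """Repeat the symbol `count` times and append the count in parentheses."""
    return symbol * count + "(" + str(count) + ")"


for count in (14, 4, 1):
    print(text_bar(count))
```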
[Figure: “Tag count” cross table used as a horizontal bar chart]
Tag count
Data volumes                     ||||||||||||||(14)
Alignment/Assembly Viewers       ||||||||||||(12)
Multiple Analysis Steps          ||||||||||||(12)
Genome Annotation                |||||||||||(11)
SNP/DIP Detection                |||||||||(9)
Derivative technology            |||||||||(9)
De novo Assembly                 ||||||||(8)
Data storage                     ||||||||(8)
Alignment                        |||||||(7)
Targeted Resequencing            ||||||(6)
Data integration                 ||||||(6)
Data representation              ||||(4)
Error Correction                 ||||(4)
QC Analysis                      |||(3)
Raw Data Analysis                ||(2)
Structural Variants Detection    ||(2)
Copy Number Variation            |(1)
Genotype Calling                 |(1)
[Text area: “Tags” drop-down list, buttons “Get tag values” and “Mark selected row”]
Comma separated tags column
• MarkSelectedRow script:
from Spotfire.Dxp.Application.Visuals import VisualContent
from Spotfire.Dxp.Data import IndexSet
from Spotfire.Dxp.Data import RowSelection
from Spotfire.Dxp.Data import DataValueCursor

vc = vis.As[VisualContent]()

dataTable = vc.Data.DataTableReference
marking = vc.Data.MarkingReference

# Start from an empty selection
selectRows = IndexSet(vc.Data.DataTableReference.RowCount, False)

# Get the "selected" tag: the column the cross table is sorted by
if (vc.SortColumnsCategory):
    selectedTag = vc.SortColumnsCategory.ToString()
    rowCount = dataTable.RowCount
    rowsToInclude = IndexSet(rowCount, True)

    # Create a cursor to the column we wish to get the values from
    cursor1 = DataValueCursor.CreateFormatted(dataTable.Columns[TagsColumn])

    # Find records with the "selected" tag: loop through all rows and check for the tag
    idx = 0
    for row in dataTable.GetRows(rowsToInclude, cursor1):
        value1 = cursor1.CurrentValue
        found = False
        for tag in value1.split(', '):
            if (tag == selectedTag):
                found = True
                break
        if found:
            selectRows[idx] = True
        idx = idx + 1

# Set marking
marking.SetSelection(RowSelection(selectRows), dataTable)

Script parameters
Name        Type           Value
vis         Visualization  Tagging>HBar
TagsColumn  String         “${TagColumn}”
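The row test inside MarkSelectedRow is an exact match against the split tag list. The following is a plain-Python sketch of that test, without the Spotfire cursor and marking objects; `rows_with_tag` and the sample rows are illustrative.

```python
def rows_with_tag(tags_column, selected_tag, separator=", "):
    """Return one boolean per row: True when the row carries selected_tag."""
    return [selected_tag in value.split(separator) for value in tags_column]


tags_column = [
    "Alignment",
    "Alignment, Data volumes",
    "Alignment/Assembly Viewers",
    "Genome Annotation",
]
selected = rows_with_tag(tags_column, "Alignment")
print(selected)
```

The resulting list plays the role of the IndexSet that the script passes to the marking: one flag per row of the data table.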
Expand marking
[Figure: scatter plot of At Bats (150 to 650) versus Games Played (90 to 160), trellised by team (Arizona, Atlanta, Baltimore, Boston), color by Position (1B, 2B, 3B, C, CF, LF, RF, SS); a single marked point is labelled SS. Below it, a table on “Position Data Table” limited by marking shows Position SS, next to the buttons “Select the same Position” and “Unmark all”.]
Expand marking
[Figure: the same scatter plot and table with several marked points labelled SS across the Arizona, Atlanta, Baltimore and Boston panels; the table still shows Position SS, next to the buttons “Select the same Position” and “Unmark all”.]
Expand Marking
• Expand marking to all other records with the same value for a chosen column (e.g. Position)
• Solution:
– Add a second data table containing the unique values
– Define relation between the 2 tables on the chosen column
– Marking something in data table 1 changes the marking in data table 2 due to the relation between the data tables
– Re-applying the marking in data table 2 changes the marking in data table 1 (effectively expanding it)
– IronPython script to re-apply a marking
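The two-step propagation can be modelled in a few lines of plain Python. This is a toy model of the behaviour described in the bullets above, not of Spotfire's internals: `propagate` stands in for what the relation does to a marking, and the two small tables are made-up sample data.

```python
def propagate(marked_rows, source, target, key):
    """Mark every target row whose key value occurs among the marked source rows."""
    values = {source[i][key] for i in marked_rows}
    return {j for j, row in enumerate(target) if row[key] in values}


# Data table 1: players. Data table 2: unique values of the chosen column.
players = [
    {"Team": "Arizona", "Position": "SS"},
    {"Team": "Arizona", "Position": "1B"},
    {"Team": "Atlanta", "Position": "SS"},
    {"Team": "Boston", "Position": "SS"},
]
positions = [{"Position": "1B"}, {"Position": "SS"}]

# Step 1: marking one player marks the related row in data table 2.
marked_players = {0}
marked_positions = propagate(marked_players, players, positions, "Position")

# Step 2: re-applying the marking in data table 2 marks all related players.
expanded = propagate(marked_positions, positions, players, "Position")
print(sorted(expanded))
```

Starting from one marked shortstop, the second step marks every player with Position SS, which is the expansion the button on the slide triggers.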
Expand Marking
• Script to re-apply a marking:
from Spotfire.Dxp.Application.Visuals import VisualContent

vc = vis.As[VisualContent]()

# Get data table and marking from visualization
dataTable = vc.Data.DataTableReference
marking = vc.Data.MarkingReference

# Re-apply marking
marking.SetSelection(marking.GetSelection(dataTable), dataTable)

Script parameters
Name  Type           Value
vis   Visualization  Page>Table
Dynamic list box content
• Multi select list box in a text area
• How do you dynamically change the contents?
– Subset of unique values from a column, selected by marking
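[Figure: text area “Select symbols for list” with a list of approved gene symbols (A1BG, A1CF, A2LD1, A2M, A2ML1, A4GALT, A4GNT, AAAS, AACS, AADAC, AADACL2, AADACL3, AADACL4, AADAT, AAGAB, AAK1), a “Load marked symbols” button, and a multi-select list box holding the loaded subset (A2LD1, A2M, A4GALT, AAAS, AADAC, AADACL3, AADACL4, AADAT, AAK1). Cross table: Selected Gene Symbols (AAAS, AADACL3, (Empty)) versus Chromosome (12q13, 1p36.21).]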
Dynamic list box content
• Define a tag column (e.g. ‘SelectedGenes’)
– Create a tag ‘Selected’
• Create a calculated column (e.g. ‘Selected Gene Symbols’)
if([SelectedGenes]="Selected",[Approved Symbol],null)
• Fill list box with unique values from [Selected Gene Symbols]
• Use IronPython script to modify tag column
Dynamic list box content
• Script to set ‘tagValue’ in ‘tagColumn’ for marked records
from Spotfire.Dxp.Data import DataManager, TagsColumn, IndexSet, RowSelection

selection = Application.GetService[DataManager]().Markings[markingName].GetSelection(dataTable)
col = dataTable.Columns[tagColumn].As[TagsColumn]()

# Remove all tags in tagColumn for all rows in dataTable
idx = IndexSet(selection.TotalRowCount, True)
col.Tag("", RowSelection(idx))

# Set the tagValue tag in tagColumn for the records marked in markingName
col.Tag(tagValue, selection)

Script parameters
Name         Type       Value
dataTable    DataTable  Data Table
markingName  String     Select Marking
tagColumn    String     SelectedGenes
tagValue     String     Selected
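The whole round trip (clear the tags, tag the marked rows, then derive the list box values from the calculated column) can be followed in a plain-Python sketch. This models the data flow only and makes several assumptions: rows are dictionaries, an empty string stands for “no tag”, and the helper names `retag` and `list_box_values` are hypothetical. The gene symbols are taken from the slide.

```python
def retag(rows, marked, tag_column, tag_value):
    """Clear the tag on every row, then tag only the marked rows."""
    for row in rows:
        row[tag_column] = ""  # stands for removing the tag on all rows
    for i in marked:
        rows[i][tag_column] = tag_value


def list_box_values(rows, tag_column, tag_value, symbol_column):
    """Unique values of if([tag]=tag_value, [symbol], null), ignoring nulls."""
    calculated = [
        row[symbol_column] if row[tag_column] == tag_value else None for row in rows
    ]
    return sorted({value for value in calculated if value is not None})


rows = [
    {"Approved Symbol": "A2M", "SelectedGenes": "Selected"},
    {"Approved Symbol": "AAAS", "SelectedGenes": ""},
    {"Approved Symbol": "AADAC", "SelectedGenes": ""},
    {"Approved Symbol": "AAK1", "SelectedGenes": ""},
]

# Marking rows 1 and 3 and running the script replaces the previous selection.
retag(rows, marked={1, 3}, tag_column="SelectedGenes", tag_value="Selected")
values = list_box_values(rows, "SelectedGenes", "Selected", "Approved Symbol")
print(values)
```

Because the tags are cleared first, the list box always reflects the current marking rather than accumulating earlier selections.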