Fluidigm Singular Analysis

SINGuLAR™ Analysis Toolset

User Guide

PN 100-5066 B1

Copyright 2013 Fluidigm Corporation. All rights reserved.

Limited License for SINGuLAR Analysis Toolset

The SINGuLAR Analysis Toolset is a shared-source, proprietary data analysis resource for Fluidigm customers interested in analyzing or developing software for single-cell gene expression data generated on Fluidigm technology. It is comprised of unsupported software development resources, including R-scripts, documentation and reference data. Registered users of the SINGuLAR Analysis Toolset may use the code contained in this file in accordance with the terms set forth in sections 1 through 8 below. You may register to use the toolset at the following address: http://www.fluidigm.com/singular-sc-analysis-toolkit-request.html. Unregistered users or users whose registration has not been confirmed with a receipt at the aforementioned website have no rights or permission to use this code.

1. Use of the code in source and binary forms, with or without modification is permitted solely in accordance with section 3 below.

2. Redistribution of the code in source and binary forms, with or without modification is permitted only to employees and agents of entities named as registered users of the SINGuLAR Analysis Toolset. Redistribution, whether in source or binary form must include this license statement.

3. Any use must be in conjunction with a Fluidigm product. Any use with a Fluidigm product may also be in conjunction with data from any source, including products from other vendors. In any case, the code may not be used in conjunction with any product similar to the Fluidigm BioMark Real-Time PCR System that is made by another entity.

4. Any redistribution and use shall be in accordance with the laws and export regulations of the United States of America. Under no circumstances shall code be distributed to or used by persons listed on the Denied Persons List maintained by the United States Department of Commerce, or be distributed to or used or executed in a country listed on the Export Control List, List of Extensively Embargoed Countries, or List of Targeted Sanctions Countries and Territories maintained by the United States Department of Commerce; appropriate measures shall be taken to ensure that recipients will also refrain from distribution to such parties.

5. Fluidigm will not provide, and is not responsible for providing any end-user support.

6. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

7. This license, and all matters relating to use of the SINGuLAR Analysis Toolset shall be governed by and interpreted in accordance with the law of the State of California except for its choice of law rules. For any disputes arising out of this Agreement, the parties consent to the personal and exclusive jurisdiction of, and venue in, the state and federal courts within San Mateo County, California.

8. This license constitutes the entire agreement between you and Fluidigm Corporation. This license may only be amended or supplemented by a writing that refers explicitly to this Agreement and that is signed by duly authorized representatives of both parties.

Information in this manual is subject to change without notice. Fluidigm assumes no responsibility for any errors or omissions. In no event shall Fluidigm be liable for any damages in connection with or arising from the use of this manual.

Fluidigm, the Fluidigm logo, BioMark, C1, DELTAgene, Dynamic Array, FC1, and SINGuLAR are trademarks or registered trademarks of Fluidigm Corporation in the U.S. and/or other countries.

Contacting Fluidigm

By phone:
In the United States: 1.866.FLUIDLINE (1.866.358.4354)
Outside the United States: +1.650.266.6100

On the Internet: www.fluidigm.com/support; [email protected]

Fluidigm Corporation 7000 Shoreline Court, Suite 100 South San Francisco, CA 94080


Table of Contents

Section 1: Single-Cell Data Analysis
Purpose of this Document
The Nature of Single-Cell Transcription
Transcriptional Bursting in Single Cells
Replicates
Identification and Use of Limit of Detection (LoD) and Log2Ex
Limit of Detection
Detection Limit of the qPCR Reaction
Qualification of Assays Prior to Single-Cell Experiments
Elimination of Cells or Genes from Subsequent Analysis
Normalization
Secondary Analysis

Section 2: The SINGuLAR Workflow
Installing R and SINGuLAR
Installing R
Installing SINGuLAR
Creating the SINGuLAR Directory for Data Analysis
Preparing BioMark System Results
Estimating the Limit of Detection (LoD) Ct Value
Option 1: Experimental Determination of LoD
Option 2: Iterative Determination of LoD
Removing Failed Data Points and Low Expression Cells
Loading and Analyzing Data for Single-Cell Experiment Results with SINGuLAR
Single-Cell Data Analysis Performed Using Fluidigm SINGuLAR
Violin Plots
Hierarchical Clustering
Principal Component Analysis (PCA)
Loading and Individually Analyzing Data for Single-Cell Experiments
To Calculate Log2Ex
To Generate a Violin Plot
To Generate a Hierarchical Cluster Heat map
To Perform PCA
Analyzing Multiple Chip Runs with SINGuLAR

Section 3: Appendices
Appendix 1: Protocol for the Qualification of Assays
Appendix 2: Removing Data Failed by Fluidigm Real-Time PCR Analysis Software
Appendix 3: Eliminating Low-Expressing Cells from Subsequent Analysis
Appendix 4: Normalizing Using Median Log2Ex
Appendix 5: A Note on the Optimal Number of Cycles Needed for Preamplification
References

Table of Figures

Figure 1: The single-cell workflow
Figure 2: ActB expression data; Fluidigm study
Figure 3: Data from Fluidigm experiment showing large fold-differences
Figure 4: Single-cell standard deviations
Figure 5: PCA showing subpopulations
Figure 6: Calculating LoD and Log2Ex
Figure 7: Comparison of IER3 transcripts
Figure 8: Compare Log2Ex levels of 10 genes in 75 single cells
Figure 9: Poisson distribution at average of 5 targets/chamber
Figure 10: Cutoff Ct three standard deviations below mean
Figure 11: Example where normalization does not greatly affect data analysis
Figure 12: The Analysis Workflow
Figure 13: The SINGuLAR Workflow
Figure 14: Spreadsheet exported as .csv file
Figure 15: Violin plots generated in R
Figure 16: Sample heat map
Figure 17: Scree and scatter plots
Figure 18: Selecting the Tm range


Section 1

Single-Cell Data Analysis


Purpose of this Document

Single-cell researchers use the Fluidigm BioMark System to measure gene expression levels for up to hundreds of genes in hundreds to thousands of samples. This document is a practical guide to the minimum steps required to obtain single-cell gene expression data with the BioMark System. Starting with background material on the nature of single-cell transcription, it takes the reader through a tutorial on data collection, preparation, and analysis. The fundamental steps in the single-cell workflow are shown in Figure 1.

Figure 1: The single-cell workflow

This document follows one particular path through the latter half of the single-cell workflow: qPCR detection, primary data processing, and secondary data analysis. The other choices available at each step lie beyond the scope of this document and will be covered in subsequent documentation.

The Nature of Single-Cell Transcription

Bengtsson et al. (2005) were among the first to use qPCR to quantify transcripts in single cells. They measured expression levels of five genes in individual cells from mouse pancreatic islets and found that the transcript levels of the different genes were lognormally distributed. Because a lognormal distribution is characterized by its geometric mean rather than its arithmetic mean, this finding has profound implications for comparing single-cell data to population data. In a lognormal distribution, the average expression level (arithmetic mean) observed for a population of cells is strongly biased by a few cells with a very high number of transcripts. The average expression level therefore does not reflect the expression level in a typical cell. The paper concluded: "Accordingly, it may not be valid to extrapolate results of gene expression measurements on cell populations to the single-cell level."
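To make the arithmetic-versus-geometric-mean point concrete, the Python sketch below simulates lognormally distributed transcript counts. The parameters (mu = 3.0, sigma = 1.5 on the natural-log scale) and the sample size are arbitrary assumptions chosen for illustration, not values from Bengtsson et al. (2005). The point is only that the arithmetic mean sits well above the geometric mean and the median, so most simulated cells fall below the "population average".

```python
import random
import statistics

# Illustrative assumption: 5,000 "cells" whose transcript counts are lognormal
# with mu = 3.0 and sigma = 1.5 on the natural-log scale (not published values).
random.seed(1)
counts = [random.lognormvariate(3.0, 1.5) for _ in range(5000)]

arith_mean = statistics.fmean(counts)         # what a pooled (bulk) measurement reflects
geo_mean = statistics.geometric_mean(counts)  # the "typical" cell on a log scale
median = statistics.median(counts)

# Fraction of cells expressing less than the arithmetic (population) mean.
frac_below_mean = sum(c < arith_mean for c in counts) / len(counts)

print(f"arithmetic mean: {arith_mean:.1f}")
print(f"geometric mean:  {geo_mean:.1f}")
print(f"median:          {median:.1f}")
print(f"cells below the arithmetic mean: {frac_below_mean:.0%}")
```

With these assumed parameters, roughly three quarters of the simulated cells express less than the arithmetic mean, which is why a population average can misrepresent a typical cell.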


Because transcript levels are lognormally distributed, data from single eukaryotic cells show cell-to-cell variation in mRNA amounts ranging from 10-fold to 1,000-fold, depending on the gene and the cell type. In the Bengtsson et al. study, ActB transcript levels varied approximately 1,000-fold among the single cells analyzed. A Fluidigm replication of the study is shown in Figure 2.

Figure 2: ActB expression data; Fluidigm study

Fluidigm also ran a single-cell experiment on a 96.96 Dynamic Array integrated fluidic circuit (IFC) that analyzed a much larger number of genes. Figure 3 presents data for 77 genes in 87 single human K562 cells, showing large fold differences between individual cells.


Figure 3: Data from Fluidigm experiment showing large fold-differences

The Fluidigm experiment determined the number of genes exhibiting differential expression between individual cells, depicted as fold change (upper x-axis labels) and the equivalent differences in Ct (lower x-axis labels). These results indicate that 10- to >500-fold variation in transcript levels should be expected when comparing individual cells.
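Fold changes and Ct differences are two views of the same quantity. Assuming 100% amplification efficiency (each cycle exactly doubles the product, an idealization), a fold change F corresponds to log2(F) cycles. The Python sketch below converts the fold changes quoted in this section into cycle differences.

```python
import math

def fold_to_delta_ct(fold_change: float) -> float:
    """Cycle difference equivalent to a fold change, assuming perfect doubling per cycle."""
    return math.log2(fold_change)

def delta_ct_to_fold(delta_ct: float) -> float:
    """Fold change equivalent to a cycle difference, under the same assumption."""
    return 2.0 ** delta_ct

for fold in (10, 500, 1000):
    print(f"{fold:>5}-fold ~ {fold_to_delta_ct(fold):.1f} cycles")
# 10-fold ~ 3.3 cycles, 500-fold ~ 9.0 cycles, 1000-fold ~ 10.0 cycles
```

Under this assumption, the 10- to >500-fold range spans roughly 3.3 to 9 cycles.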

Transcriptional Bursting in Single Cells

Data such as these, collected by several researchers, have led to the model that eukaryotic transcripts are produced in short but intense bursts interspersed with intervals of inactivity, during which transcript levels decay. Raj et al. (2006) directly observed intrinsically random bursting of mRNA for two genes in CHO cells. Chubb et al. (2006) also observed this burst-and-decay behavior for the dscA gene in living Dictyostelium cells. For this gene, they measured a mean burst duration of 5.2 minutes and a mean interval of inactivity (presumably mRNA decay) of 5.8 minutes, with a great deal of stochastic variation around each of these averages.

This inherent noise in single-cell gene expression challenges conventional methods for obtaining and analyzing qPCR data. Replicates, data display, limits of detection, normalization, and the choice between univariate and multivariate analysis all need to be re-evaluated. Although one might think that this noise precludes getting useful information from single cells, the reality is quite the opposite. By acknowledging and addressing the intrinsic noise with appropriate statistical analysis methods, single-cell gene expression profiling can provide biological insights that are simply not visible when expression levels are averaged over hundreds or thousands of cells.

Replicates

Another way to assess the variation observed in single cells is to look at the standard deviation of transcript levels in a population of single cells. Figure 4 uses data from the Fluidigm K562 experiment described earlier to depict the standard deviations observed for 77 genes in a population of 87 cells.

Figure 4: Single-cell standard deviations

Only two of the 77 genes show a standard deviation of less than one cycle between single cells. By contrast, for experiments run using bulk RNA on the BioMark System, the standard deviation observed for qPCR technical replicates is typically 0.16-0.25 cycle or less. Biological noise therefore exceeds technical noise by a large margin, so it is better to focus on biological replicates than on technical replicates. Experimental bandwidth is better used by running more single-cell samples and interrogating more genes than by running technical replicates of the single-cell samples or assays.
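The per-gene standard deviation behind Figure 4 is straightforward to compute. The Python sketch below uses a small hypothetical Ct table (gene names and values are invented for illustration) and compares each gene's cell-to-cell standard deviation with the 0.25-cycle upper end of the technical-replicate range quoted above.

```python
import statistics

# Hypothetical Ct values: each gene maps to its Ct in five single cells (not Fluidigm data).
ct_by_gene = {
    "GeneA": [14.1, 14.6, 13.9, 14.4, 14.2],
    "GeneB": [18.0, 21.5, 16.2, 19.8, 23.1],
    "GeneC": [20.3, 17.9, 24.6, 22.0, 19.1],
}

TECHNICAL_SD = 0.25  # upper end of the bulk-RNA technical replicate SD quoted in the text

# Sample standard deviation of Ct across cells, per gene.
sd_by_gene = {gene: statistics.stdev(cts) for gene, cts in ct_by_gene.items()}
low_variation = [gene for gene, sd in sd_by_gene.items() if sd < 1.0]

for gene, sd in sd_by_gene.items():
    print(f"{gene}: SD = {sd:.2f} cycles ({sd / TECHNICAL_SD:.0f}x the technical SD)")
print("genes with SD < 1 cycle:", low_variation)
```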


Another way to state the need for biological replicates is that data must be collected from a statistically significant number of single cells to obtain reliable results. What is a statistically significant number of single cells? This is difficult to answer in absolute terms. Statistical significance depends not only on the number of cells but also on other factors, including the degree of variation within the population analyzed, the number of genes assayed, and the ability of those assays to resolve the variation within the population.

Basic statistics would indicate that, for any single gene, a homogeneous population can be characterized on the basis of 30 samples. Thus, if every subpopulation within a sample of single cells were represented by at least 30 cells, one would have reasonable confidence that the experiment would robustly identify all subpopulations. This means that to reliably identify a subpopulation that makes up 10% of the total population, 300 cells would need to be examined (30 / 0.10 = 300).
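The 300-cell figure is simple division: a subpopulation that makes up 10% of the sample contributes 30 cells on average when 300 cells are examined. The Python sketch below reproduces that arithmetic and, under the added assumption that cells are sampled at random (a binomial model that is not part of the original text), also computes the probability that at least 30 subpopulation cells are actually captured.

```python
import math

def cells_needed(min_cells: int, fraction: float) -> int:
    """Total cells so that a subpopulation of frequency `fraction`
    contributes `min_cells` cells on average."""
    return math.ceil(min_cells / fraction)

def prob_at_least(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p): the chance that at least k of n randomly
    sampled cells come from a subpopulation of frequency p (assumed model)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n = cells_needed(30, 0.10)
print(n)  # 300
print(f"P(at least 30 subpopulation cells among {n}): {prob_at_least(30, n, 0.10):.2f}")
```

Under this sampling model the probability is only slightly above one half, so examining somewhat more than 300 cells leaves more margin if 30 cells per subpopulation is treated as a hard requirement.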

In practice, subpopulations can be identified with fewer than 30 cells, depending on the cells and genes being analyzed. Guo et al. (2010) analyzed 159 single cells from 64-cell stage mouse embryos, assaying 48 genes in each cell. A principal component analysis (PCA) from the study is shown in Figure 5.

Figure 5: PCA showing subpopulations


From Guo et al. (2010). Image reprinted with permission from Developmental Cell.

Guo et al. were able to clearly identify the epiblast (EPI) subpopulation even though it contained only 17 cells. This was possible because of the type of cells analyzed, the use of 48 genes, and the fact that those 48 genes revealed very distinct signatures for EPI, primitive endoderm (PE), and trophectoderm (TE) cells.

Identification and Use of Limit of Detection (LoD) and Log2Ex

When qPCR experiments are run on bulk RNA samples, the results are typically displayed as fold-change differences between samples for each individual gene and known controls. Because of the extensive normal variation in a given gene at the single-cell level, fold changes between individual cells may not be very informative. A better approach may be to first assess the population behavior of each gene. Determining which genes display a lognormal distribution within the cell population under investigation is a first-pass analysis that can provide significant initial insight into the unique biology of the population and guide further, more directed analyses. This is best done with histograms that bin expression levels and display the number of cells in each bin. To generate such histograms, the expression of each gene must be comparable across single-cell samples. One starts by calculating the limit of detection (LoD) and then computing Log2Ex values.

Because of the lognormal distribution described by Bengtsson et al. (2005) and others, it is useful to view single-cell data as expression level above the detection limit on a log scale. For qPCR data, it is convenient and appropriate to do this in log base 2 by defining the term Log2Ex:

$$\text{Log2Ex} = \text{LoD Ct} - \text{Ct}[\text{Gene}]$$

If the value is negative, Log2Ex = 0.

Log2Ex represents the transcript level above background, expressed in log base 2. To convert from the log scale to a linear scale, calculate 2^Log2Ex, which gives the fold change relative to the detection limit. These equations are shown graphically in Figure 6: the Ct value of each sample is subtracted from the LoD Ct. In this example, LoD Ct = 22, so Ct values higher than 22 are assigned a Log2Ex value of 0.


Figure 6: Calculating LoD and Log2Ex
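The Log2Ex definition translates directly into code. The Python sketch below uses the LoD Ct of 22 from the Figure 6 example and assumes that a reaction with no amplification is represented as None; how failed or missing calls are actually encoded in exported BioMark results is not covered here.

```python
from typing import Optional

def log2ex(ct: Optional[float], lod_ct: float = 22.0) -> float:
    """Log2Ex = LoD Ct - Ct, set to 0 when negative.

    A missing Ct (None, an assumed encoding for "no amplification") is also
    assigned Log2Ex = 0, i.e. treated as at or below the detection limit.
    """
    if ct is None:
        return 0.0
    return max(lod_ct - ct, 0.0)

cts = [14.5, 19.0, 22.0, 24.3, None]
values = [log2ex(ct) for ct in cts]
print(values)  # [7.5, 3.0, 0.0, 0.0, 0.0]

# Back to a linear scale: fold change relative to the detection limit.
print([2 ** v for v in values])
```

Note that a Log2Ex of 0 maps to a linear value of 1, so on the linear scale "at the detection limit" and "not detected" are indistinguishable.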

Using Log2Ex also makes it possible to plot the number of cells in which the transcript level is at or below the detection limit. Figure 7 compares IER3 transcripts in 87 human K562 cells.

Figure 7: Comparison of IER3 transcripts

IER3 transcripts from 87 human K562 cells were plotted on a log (left) and linear (right) scale. No IER3 transcript was detected in 10 cells.

To compare histograms for multiple genes, it is convenient to use violin plots, which are essentially histograms turned on their side and mirrored. The violin plots from Guo et al. (2010) in Figure 8 compare 10 genes in 75 single cells derived from 16-cell stage mouse embryos.


Figure 8: Compare Log2Ex levels of 10 genes in 75 single cells

Violin plots from Guo et al. (2010); Image reprinted with permission from Developmental Cell.

The violin plots reveal that seven genes have unimodal distributions and three (Id2, Nanog, Sox2) have bimodal distributions. The unimodal distributions indicate no detectable variation other than intrinsic noise. The bimodal distributions indicate that these three genes are differentially expressed in at least two subpopulations within these 75 cells.

The vertical position of each histogram indicates the relative expression level. For example, ActB has the highest expression level among these 10 genes.

The plots also show that transcripts can have distributions of varying widths, with the width indicating the degree of variation. For example, Pou5f1 has a much narrower distribution (less variation on the Log2Ex axis) than Cdx2. This is because each gene has a characteristic transcriptional burst size, frequency, and decay rate.

If the histogram indicates two or more subpopulations, it becomes possible to obtain meaningful average fold-change values between them. For the Id2 gene in the violin plot, the median Log2Ex value is roughly 7.5 for the higher-expressing subpopulation and roughly 1.8 for the lower-expressing subpopulation. The difference in Log2Ex between these two subpopulations is therefore about 7.5 - 1.8 = 5.7, which corresponds to an average fold difference in expression of 2^5.7, or approximately 50.
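The Id2 calculation above can be written in a few lines of Python. The Log2Ex values below are hypothetical and were chosen so that the subpopulation medians match the 7.5 and 1.8 quoted in the text; the split into two subpopulations is assumed to have been made already (for example, by reading the two modes off a violin plot).

```python
import statistics

# Hypothetical Log2Ex values for one gene, already split into two subpopulations.
high_group = [7.1, 7.5, 7.9, 7.4, 7.6]
low_group = [1.5, 1.8, 2.2, 1.6, 1.9]

# Difference of medians on the log2 scale, then back to a linear fold difference.
delta = statistics.median(high_group) - statistics.median(low_group)
fold = 2 ** delta
print(f"Log2Ex difference = {delta:.1f}, fold difference ~ {fold:.0f}")
```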

Limit of Detection

The Log2Ex calculation requires defining a limit of detection (LoD) Ct value, which raises the question of how to define the detection limit of qPCR. In fact, there are two separate questions:

1. What is the detection limit of the qPCR reaction by itself?

2. What is the detection limit of the overall process (single-cell RNA → cDNA → preamplified cDNA → qPCR reaction)?

Detection Limit of the qPCR Reaction

Based on digital PCR results using well-performing assays, it is clear that a single target DNA molecule in a reaction chamber will generate a positive amplification plot. The theoretical detection limit of PCR is therefore one molecule. A more stringent definition of the detection limit, however, would incorporate some measure of the confidence with which a target is detected.

If a number of identical PCR reactions are performed at an average concentration of one target DNA molecule per reaction chamber, then 37% of the reactions will not contain a single target molecule, so the chance of detection is only 63%. These percentages follow from the Poisson distribution: at an average of one molecule per chamber, there is a 37% probability (e^-1 ≈ 0.37) that a given chamber receives no target molecule and therefore shows no positive amplification plot.

For stringent detection, at what concentration is there at least a 99% chance of generating a positive amplification plot? As the Poisson distribution in Figure 9 shows, this occurs at an average concentration of five target molecules per reaction chamber.

Figure 9: Poisson distribution at average of 5 targets/chamber
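The Poisson figures in this section can be checked directly. The Python sketch below assumes, as in the digital PCR argument above, that every chamber containing at least one target molecule amplifies, so the chance of detection is simply the chance that a chamber is not empty.

```python
import math

def p_no_target(mean_per_chamber: float) -> float:
    """Poisson probability that a chamber receives zero target molecules."""
    return math.exp(-mean_per_chamber)

def p_detect(mean_per_chamber: float) -> float:
    """Probability that a chamber holds at least one molecule
    (assumes any chamber with >= 1 molecule gives a positive plot)."""
    return 1.0 - p_no_target(mean_per_chamber)

print(f"{p_no_target(1):.3f}")  # 0.368
print(f"{p_detect(1):.3f}")     # 0.632
print(f"{p_detect(5):.4f}")     # 0.9933

# Smallest mean concentration giving >= 99% detection: solve 1 - e^(-m) = 0.99.
print(f"{-math.log(0.01):.2f}")  # 4.61
```

The last line shows that about 4.6 molecules per chamber is the exact 99% point, so five per chamber is a convenient round number that clears the threshold.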


Thus, a stringent definition of LoD would be the value that corresponds to five targets per reaction chamber, which in turn corresponds to a >99% chance of detection with one single-cell replicate. This stringent definition minimizes the number of false positives; however, it may exclude true positives. In other words, one can be very confident that a positive really is a positive, but some data may be excluded.

To explore the effect of sensitivity on results, data can be analyzed using different LoD values, ranging from stringent to relaxed. For example, the data used in the workflow section of this document indicate that 22 cycles is a stringent LoD Ct value. Log2Ex values could therefore be calculated using LoD Ct = 22, 23, 24, or 25, and each data set then analyzed to see whether altering the stringency changes the conclusions.
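A stringency sweep of this kind is easy to script. The Python sketch below recomputes Log2Ex for one gene at LoD Ct = 22 through 25 and counts how many cells are called detected at each setting. The Ct values are hypothetical, and None is an assumed encoding for a reaction with no amplification.

```python
# Hypothetical Ct values for one gene across eight cells; None = no amplification.
cts = [17.2, 21.4, 22.6, 23.8, 24.5, 26.0, None, 19.9]

def log2ex(ct, lod_ct):
    """Log2Ex = LoD Ct - Ct, floored at 0; a missing Ct counts as not detected."""
    return 0.0 if ct is None else max(lod_ct - ct, 0.0)

detected_by_lod = {}
for lod_ct in (22, 23, 24, 25):  # from stringent to relaxed
    values = [log2ex(ct, lod_ct) for ct in cts]
    detected_by_lod[lod_ct] = sum(v > 0 for v in values)
    print(f"LoD Ct = {lod_ct}: {detected_by_lod[lod_ct]}/{len(cts)} cells detected")
```

Each relaxation of one cycle can move additional low-level signals above the detection limit; whether downstream conclusions hold across this range is the question the text suggests asking.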

In the single-cell gene expression workflow, qPCR reactions are preceded by preamplification of cDNA. Statistically, 18-20 cycles of preamplification will result in an average of five target copies per chamber from a single copy of cDNA. Preamplification can have efficiencies close to 100%, as reported by Devonshire et al. (2011). Preamplification and its effect on target concentration are discussed in more detail in Section 3 (Appendix 5).

The foregoing discussion indicates that the single-cell protocol should be fairly robust even if reverse transcription of the mRNA from a single cell generates only a single cDNA molecule. Of course, the overall limit of detection depends critically on the efficiency of the reverse transcriptase, and this efficiency probably varies with the transcript and with the location of the assay amplicon within the transcript. Although reverse transcriptase efficiency deserves closer scrutiny, it is not explored here. The overall availability of RNA after cell lysis also affects the limit of detection for single-cell gene expression.

Qualification of Assays Prior to Single-Cell Experiments

There are two reasons to test assays on cDNA prepared from bulk RNA before analyzing single cells. First, when using DNA binding dye assays, such as DELTAgene™ Assays, the data are used to determine the correct Tm range for the amplicon generated by each assay. For this purpose, it is best to use bulk RNA from the same or similar cells as the single cells to be studied, so that the splice variants will be the same as in the single cells. If bulk samples are not available, appropriate tissue-specific or universal RNA or cDNA can be purchased from various vendors. Second, the data are used to estimate an LoD Ct value for use in data analysis.

These two properties, Tm and LoD Ct, are characteristics of the qPCR assay, not of the reverse transcription or preamplification steps. This qualification test is therefore performed using dilutions of preamplified cDNA in order to focus on the qPCR assays.

To estimate an LoD Ct value empirically, six replicates of each dilution are run. For each assay, a preliminary LoD Ct is determined by taking the average Ct of the most dilute sample that has positive amplification plots for all six replicates. Because this LoD Ct value is approximate, it is reasonable to use it for any additional primer pairs that are added to the experiment.
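The preliminary LoD Ct rule described above can be sketched in Python as follows. The dilution series is hypothetical, and the dictionary layout (dilution factor mapped to six replicate Ct values, with None marking a replicate without a positive amplification plot) is an assumption made for illustration, not the format produced by the BioMark software.

```python
import statistics
from typing import Dict, List, Optional

# Hypothetical qualification data for one assay: dilution factor -> Ct values
# of six replicates (None = no positive amplification plot).
dilution_series: Dict[int, List[Optional[float]]] = {
    1:     [12.1, 12.0, 12.2, 12.1, 12.0, 12.1],
    10:    [15.5, 15.4, 15.6, 15.5, 15.7, 15.4],
    100:   [18.9, 19.0, 18.8, 19.1, 18.9, 19.0],
    1000:  [22.3, 22.1, 22.6, 22.0, 22.4, 22.2],
    10000: [25.8, None, 26.3, None, 25.5, None],
}

def estimate_lod_ct(series: Dict[int, List[Optional[float]]]) -> Optional[float]:
    """Average Ct of the most dilute sample whose replicates all amplified."""
    for dilution in sorted(series, reverse=True):  # most dilute sample first
        cts = series[dilution]
        if all(ct is not None for ct in cts):
            return statistics.fmean(cts)
    return None  # no dilution had every replicate positive

print(f"preliminary LoD Ct: {estimate_lod_ct(dilution_series):.1f}")
```

Here the 1:10,000 dilution fails three replicates, so the 1:1,000 dilution sets the preliminary LoD Ct.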

The LoD Ct value is most strongly affected by the platform. For any particular platform, however, the exact LoD Ct value is somewhat arbitrary and probably will not drastically change the interpretation of a single-cell experiment. As discussed above, this can be tested by first using the stringent LoD Ct value, then increasing it in one-cycle increments and observing how the results change.

Elimination of Cells or Genes from Subsequent Analysis

It can be difficult to decide which cells to eliminate from analysis because of abnormally low expression. Low (or no) expression of a single control gene is not a reliable metric for excluding cells from the data set, because the expression level of any single gene (including housekeeping genes) can vary widely between single cells. Using multiple control genes in single-cell experiments allows greater confidence in eliminating samples, as cells with low expression across several genes are likely to be abnormal.

We suggest including three highly expressed, monophasic control genes in the set of assays used to interrogate the cells. For each control gene, the mean and standard deviation across cells can be calculated, along with a cutoff Ct that is three standard deviations below the mean, as shown in Figure 10. Cells whose expression is below the cutoff for at least two of the three control genes can be eliminated.


Figure 10: Cutoff Ct three standard deviations below mean
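The control-gene rule can be sketched in Python. This sketch works in Log2Ex rather than raw Ct (an assumption), so lower values mean lower expression and the cutoff is the mean minus three standard deviations; on the raw Ct scale, where low expression means a high Ct, the equivalent cutoff would lie three standard deviations above the mean Ct. Gene names and values are hypothetical.

```python
import statistics
from typing import Dict, List

def cells_to_eliminate(log2ex_by_gene: Dict[str, List[float]], min_genes_low: int = 2) -> List[int]:
    """Indices of cells below (mean - 3 SD) for at least `min_genes_low` control genes.

    Each list holds one control gene's Log2Ex values, one entry per cell.
    """
    n_cells = len(next(iter(log2ex_by_gene.values())))
    low_counts = [0] * n_cells
    for values in log2ex_by_gene.values():
        cutoff = statistics.fmean(values) - 3 * statistics.stdev(values)
        for cell, value in enumerate(values):
            if value < cutoff:
                low_counts[cell] += 1
    return [cell for cell, count in enumerate(low_counts) if count >= min_genes_low]

# Hypothetical Log2Ex values for three control genes in 20 cells;
# cell 19 is low for Ctrl1 and Ctrl2 only.
typical = [10.0 + 0.2 * ((i % 5) - 2) for i in range(19)]
controls = {
    "Ctrl1": typical + [2.0],
    "Ctrl2": typical + [2.0],
    "Ctrl3": typical + [10.0],
}
print(cells_to_eliminate(controls))  # [19]
```

Because the low cell also inflates each gene's standard deviation, the three-SD rule needs a reasonable number of cells to flag anything; with very few cells no value can fall three sample standard deviations below the mean.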

Normalization

The ΔΔCt method (Livak and Schmittgen, 2001) may not be the best approach for identifying differences among the single cells being analyzed. Normalization should be treated as a variable that can be tried to see whether it has any significant effect on the analysis of the expression data. Normalizing to a single reference gene whose expression varies 10- to 1,000-fold at the single-cell level is generally not useful. Guo et al. (2010) normalized using the average of ActB and Gapdh Log2Ex values. Normalization might be beneficial by reducing variation due to differences in cell size, but it is not necessary to normalize Log2Ex data on a per-cell basis; in fact, many single-cell publications have not used any cell-based normalization. Vandesompele et al. (2002) describe the geNorm method, a robust way to use multiple reference genes to determine a normalization factor.

Figure 11 depicts an example where normalization does not seem to have much effect on data analysis. Guo et al. (2010) performed PCA on expression data from 159 single cells derived from 64-cell stage mouse embryos. Prior to the analysis, they normalized their data using the average of ActB and Gapdh Log2Ex values. Here, the PCAs have been repeated using unnormalized data and median-Log2Ex-normalized data.


Figure 11: Example where normalization does not greatly affect data analysis

From Guo et al. (2010)

The distributions of single cells in these three plots do not appear to differ significantly, indicating that normalization would have little effect on data interpretation in this particular case.

We suggest normalizing so that each cell has the same median Log2Ex value across all genes detected in that cell. This ensures that the normalization factor draws on data from all detected genes in the study rather than on one or two reference genes.
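One way to implement the suggested median normalization is sketched below in Python (Appendix 4 covers normalizing with median Log2Ex in this toolset). The choices made here are assumptions, not a description of SINGuLAR's implementation: normalization is an additive shift on the Log2Ex scale, only detected genes (Log2Ex > 0) contribute to each cell's median and are shifted, and the common target median is the mean of the per-cell medians.

```python
import statistics
from typing import Dict, List

def median_normalize(cells: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Shift each cell's detected Log2Ex values so all cells share one median.

    Assumptions: only values > 0 (detected genes) enter the median and are
    shifted; undetected genes stay at 0; the target median is the mean of the
    per-cell medians. A cell with no detected genes raises StatisticsError.
    """
    medians = {
        cell: statistics.median([v for v in values if v > 0])
        for cell, values in cells.items()
    }
    target = statistics.fmean(medians.values())
    return {
        cell: [v + (target - medians[cell]) if v > 0 else 0.0 for v in values]
        for cell, values in cells.items()
    }

# Hypothetical Log2Ex values for four genes in two cells.
raw = {"cell1": [8.0, 6.0, 0.0, 4.0], "cell2": [10.0, 7.0, 5.0, 0.0]}
normalized = median_normalize(raw)
print(normalized)  # {'cell1': [8.5, 6.5, 0.0, 4.5], 'cell2': [9.5, 6.5, 4.5, 0.0]}
```

After the shift, both cells have a median detected Log2Ex of 6.5. This sketch does not re-floor shifted values, so a strongly negative shift could push a weakly detected gene to or below zero.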

Secondary Analysis

Even if normalization issues are addressed by using data from multiple genes, as recommended earlier, the ΔΔCt method still considers genes one at a time. With the expression of each gene varying 10- to 1,000-fold, it may be difficult to discern reliable patterns in the data from any single gene. For lower-expressed genes, analysis is further complicated by the fact that a transcript may go undetected in a particular cell purely because of stochastic noise rather than lack of expression. Instead, some form of multivariate analysis, such as hierarchical clustering or principal component analysis, will be more fruitful in identifying subpopulations with similar gene expression signatures.
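As a minimal illustration of one of the multivariate methods named above, the Python sketch below computes first principal component (PC1) scores for a tiny hypothetical Log2Ex matrix using power iteration on the gene covariance matrix. It is a teaching sketch only; in practice a library routine (for example, R's prcomp) would be used, and nothing here describes how SINGuLAR performs PCA. The sign of a principal component is arbitrary, so only the grouping of the scores is meaningful.

```python
import math
from typing import List

def first_principal_component(data: List[List[float]], iterations: int = 200) -> List[float]:
    """Return PC1 scores for each row (cell) of a cells x genes matrix.

    Mean-centers each gene, builds the gene covariance matrix, and finds its
    leading eigenvector by power iteration.
    """
    n_cells, n_genes = len(data), len(data[0])
    means = [sum(row[g] for row in data) / n_cells for g in range(n_genes)]
    centered = [[row[g] - means[g] for g in range(n_genes)] for row in data]
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n_cells - 1) for j in range(n_genes)]
        for i in range(n_genes)
    ]
    # Start from the first centered cell (assumed not to coincide with the mean).
    v = centered[0][:]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(n_genes)) for i in range(n_genes)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered cell onto the leading eigenvector.
    return [sum(r[g] * v[g] for g in range(n_genes)) for r in centered]

# Hypothetical Log2Ex matrix: two cells per putative subpopulation, three genes.
cells = [
    [8.0, 1.0, 5.0],
    [7.5, 1.2, 5.1],
    [1.0, 8.0, 5.0],
    [1.2, 7.6, 4.9],
]
scores = first_principal_component(cells)
print([round(s, 2) for s in scores])
```

The two hypothetical subpopulations land on opposite sides of zero along PC1, driven by the first two genes; the third gene, which barely varies, contributes almost nothing.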

The purpose of this section is to focus on the minimum steps required to process single-cell data and make it ready for secondary analysis, rather than to explore all available methods of secondary analysis. For additional guidance, Table 1 lists published research that used the BioMark System to obtain single-cell gene expression data, along with the secondary analysis methods used in each study. These studies shed additional light on ways to analyze single-cell data for biological insight.


Methods compared: Violin Plots, Plus/Minus, Pairwise Correlation, HC, PCA, LDA, DTA, JSD

Study (Field)

Buganim et al. 2012 (Stem cells)

Guo et al. 2010 (Developmental Biology)

Flatz et al. 2011 (Immunology)

Dalerba et al. 2011 (Cancer)

Pang et al. 2011 (Neuroscience)

Vincent et al. 2011 (Developmental Biology)

Aguilo et al. 2011 (Stem cells)

Table 1: Comparison of secondary analysis methods in published research using BioMark for single-cell gene expression (HC = Hierarchical Clustering; PCA = Principal Component Analysis; LDA = Linear Discriminant Analysis; DTA = Decision Tree Analysis; JSD = Jensen-Shannon Divergence)


Section 2

The SINGuLAR Workflow


Key steps in single-cell gene expression analysis are depicted in Figure 12 below. Two tools, the Fluidigm Real-Time PCR Analysis Software and the SINGuLAR™ package, are used in combination to process the data and perform the analysis.

Figure 12: The Analysis Workflow

The Fluidigm SINGuLAR Package

SINGuLAR leverages R's statistical computing capability to streamline data preparation and analysis. Among other things, SINGuLAR's data processing capabilities enable users to:

1. Estimate Limit of Detection (LoD) Ct values

2. Generate Log2Ex values

For data analysis and representation, SINGuLAR permits users to:

1. Create violin plots

2. Perform multivariate analyses such as hierarchical clustering and principal component analysis (PCA)

Installing R

NOTE: If you have already installed R and SINGuLAR, you can skip this section and proceed directly to creating the SINGuLAR directory for data analysis.


Figure 13: The SINGuLAR Workflow

Installing R

1. Download the latest version of R for Windows. To do this, go to http://www.r-project.org/ and download from the Berkeley CRAN mirror located at http://cran.cnr.Berkeley.edu.


2. Run the downloaded .exe file. A setup wizard will walk you through installation. Choose to install the base version only.

Installing SINGuLAR

1. Download fluidigmSC_.zip by logging in to the Fluidigm single-cell analysis tools web page.

2. Open R. You will be taken to the R-GUI.

3. From the menu bar, select Packages > Install package(s) from local zip files and select the file named fluidigmSC_.zip.

4. At the R command prompt, type

library(fluidigmSC)

5. Hit Enter and type

fluidigmSC.firstrun()

6. When prompted, select the nearest mirror to install additional packages and press Enter. You need to set the CRAN mirror for the session; choosing the nearest mirror reduces network load.

7. To download from Berkeley, select the USA(CA1) mirror. Selecting a mirror is required for downloading to continue, and it ensures that you receive R updates and have access to online help.


8. The R GUI will display a series of messages. You can now proceed to create the SINGuLAR directory for data analysis.

Creating the SINGuLAR Directory for Data Analysis

1. To load SINGuLAR, at the R command line, type

library(fluidigmSC)

2. Navigate to File > Change dir to set the working directory for this session.

NOTE: Data files calculated by SINGuLAR are saved automatically to this directory. You may find it convenient to use the folder that contains the single-cell data exported from the Fluidigm Real-Time PCR Analysis software.
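If you prefer the R command line to the menu, base R's setwd() and getwd() do the same job as File > Change dir. The folder below is a placeholder created under tempdir() so the snippet runs anywhere. Substitute the folder that holds your exported heat map files.

```r
# Equivalent of File > Change dir using base R. The folder name is a placeholder.
analysis_dir <- file.path(tempdir(), "singular_analysis")
dir.create(analysis_dir, showWarnings = FALSE, recursive = TRUE)
setwd(analysis_dir)

# Confirm the working directory; SINGuLAR output files are written here.
getwd()
list.files()  # check that your exported .csv files are present
```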


Preparing BioMark System Results

SINGuLAR supports both 48.48 and 96.96 Dynamic Array IFCs. The examples in this document primarily use the 96.96 IFCs.

1. Process data using the Fluidigm Real-Time PCR Analysis software.

2. Export the data as heat map results (.csv files) as described earlier in this document.

Estimating the Limit of Detection (LoD) Ct Value

Background information on LoD is available in Section 1 of this document.

For a more accurate LoD Ct value, you can determine the LoD experimentally by performing a qPCR experiment on cDNA prepared from bulk RNA.

NOTE: Appendix 1 provides a detailed protocol for assay qualification. Please follow the setup carefully to ensure that assay data is formatted correctly for subsequent analysis.

Option 1: Experimental Determination of LoD

To estimate LoD, type the following command at the R command line:

fluidigmSC.LoD(number of replicates, number of samples, number of assays)

For example, as described in Appendix 1, for a run with six replicates of each dilution using 96 samples and 96 assays, your command would look like this:

fluidigmSC.LoD(6, 96, 96)

A file selection window will open. Select the .csv file that contains your assay qualification experiment.

SINGuLAR will return the estimated LoD Ct value.

Option 2: Iterative Determination of LoD

If an assay qualification run has not been performed for all assays, we suggest using the conservative LoD Ct value of 22 for the initial run. Because the exact LoD Ct value is somewhat arbitrary and probably will not have a drastic impact on the overall interpretation of a single-cell experiment, you can start with this stringent LoD Ct value and then go back and increase the value in one-cycle steps to see how this affects the results. To decrease stringency, the LoD Ct value can be increased to 23, 24, 25, and so on, and the single-cell experiments reanalyzed to see whether changing stringency has any effect on the conclusions.

Removing Failed Data Points and Low Expression Cells

Genes that are not detected in any of the single cells in the study can be eliminated. Optionally, genes expressed in fewer than 5% to 10% of the single cells can also be eliminated. Because sample and assay numbers and experimental layouts are unique to each experiment, the decision to remove failed data points and low-expression cells must be made for each specific experiment. Appendices 2 and 3 cover these procedures in detail.
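The gene-level filter can be sketched in base R. This sketch assumes a cells x genes Log2Ex matrix in which 0 means "not detected", as produced by SINGuLAR's Log2Ex rule. The function and its min_fraction argument are hypothetical; a min_fraction of 0.05 or 0.10 corresponds to the 5% to 10% threshold mentioned above.

```r
# Hypothetical helper: drop genes detected in too few cells.
# log2ex: cells x genes matrix, 0 = not detected.
# min_fraction = 0 removes only genes never detected in any cell.
filter_genes <- function(log2ex, min_fraction = 0) {
  detected_fraction <- colMeans(log2ex > 0)   # share of cells with Log2Ex > 0
  keep <- detected_fraction > 0 & detected_fraction >= min_fraction
  log2ex[, keep, drop = FALSE]
}

m <- matrix(c(5, 0, 0,
              6, 0, 2,
              7, 0, 0,
              4, 0, 0), nrow = 4, byrow = TRUE,
            dimnames = list(paste0("cell", 1:4), c("geneA", "geneB", "geneC")))
colnames(filter_genes(m))                      # "geneA" "geneC"
colnames(filter_genes(m, min_fraction = 0.5))  # "geneA"
```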

Loading and Analyzing Data for Single-Cell Experiment Results with SINGuLAR

1. Navigate to the R command line.

2. Enter an R command in the following format:

fluidigmSC.analysis(number of assays, number of samples, LoD=22, violin=TRUE, HC=TRUE, PCA=number of principal components)

NOTE: Starting with two principal components is highly recommended.

If you are using 96.96 Dynamic Array IFCs, then you will enter:

fluidigmSC.analysis(96, 96, LoD =22, violin=TRUE, HC=TRUE, PCA=2)

3. A file selection window will open. Select the heatmap.csv file, exported from the Fluidigm Real-Time PCR Analysis software, that contains your single-cell experimental data.

NOTE: To analyze data from multiple Dynamic Array runs, please refer to the section on Analyzing Multiple Chip Runs with SINGuLAR.

Single-Cell Data Analysis Performed Using Fluidigm SINGuLAR

SINGuLAR generates graphics displaying violin plots, a hierarchical clustering map, a scree plot ranking the importance of each principal component axis, and a principal component plot. The resulting single-cell qPCR data are expressed as log base 2 (Log2Ex) values, calculated as Log2Ex = LoD - Ct. If the Log2Ex is negative, it is replaced with zero. The calculated Log2Ex values are exported to a .csv file named Log2Ex_data.csv and saved in the working directory that you set for this SINGuLAR session. Gene names appear in Row 1 and sample names in Column A, in the order they were entered in the Fluidigm Real-Time PCR Analysis software.

Figure 14: Spreadsheet exported as .csv file
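The Log2Ex rule is simple enough to reproduce by hand, for example to spot-check a value in Log2Ex_data.csv. The sketch below assumes a numeric Ct matrix in which 999 marks reactions with no amplification. Because 999 is far above any LoD, those reactions come out as 0 under the same rule. The function name is illustrative and is not part of SINGuLAR.

```r
# Illustrative re-implementation of the Log2Ex rule: Log2Ex = LoD - Ct,
# with negative values (Ct later than the LoD) replaced by zero.
ct_to_log2ex <- function(ct, lod = 22) {
  log2ex <- lod - ct
  log2ex[log2ex < 0] <- 0
  log2ex
}

ct <- matrix(c(15,   22, 999,
               20.5, 30,  18), nrow = 2, byrow = TRUE)
ct_to_log2ex(ct)
# Row 1: 7, 0, 0   (Ct 15 -> 7; Ct 22 equals the LoD -> 0; Ct 999 -> 0)
# Row 2: 1.5, 0, 4
```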

Violin Plots

Violin plots display the distribution and frequency of Log2Ex values. Genes (assays) in the plot are arranged in decreasing order of the standard deviation of their Log2Ex values.


Figure 15: Violin plots generated in R

To save the violin plot or to copy it to another location, right-click on the plot within the R window.
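The gene ordering used in the violin plots can be reproduced with base R, which is useful when comparing a plot against Log2Ex_data.csv. This sketch assumes a cells x genes matrix. The assumption that SINGuLAR uses the sample standard deviation, as computed by sd(), is not confirmed by this guide.

```r
# Hypothetical helper: reorder genes (columns) by decreasing standard deviation.
order_genes_by_sd <- function(log2ex) {
  gene_sd <- apply(log2ex, 2, sd)
  log2ex[, order(gene_sd, decreasing = TRUE), drop = FALSE]
}

m <- cbind(flat = c(5, 5, 5, 5), wide = c(0, 10, 0, 10), mid = c(2, 4, 2, 4))
colnames(order_genes_by_sd(m))  # "wide" "mid" "flat"
```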

Hierarchical Clustering

SINGuLAR performs unbiased hierarchical clustering (HC) on your data and presents the result as a heat map. The reordered data are exported to a .csv file named Hierarchical_clustering_sorted_data.csv and saved in the working directory that you set for this SINGuLAR session.

Figure 16: Sample heat map


To save the HC heat map or to copy it to another location, right-click on it within the R window.
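The base-R sketch below shows what unbiased hierarchical clustering involves. It clusters cells and genes and then reorders the Log2Ex matrix in dendrogram order, which is similar in kind to the reordering stored in Hierarchical_clustering_sorted_data.csv. Euclidean distance and complete linkage (the hclust() default) are assumptions, because this guide does not state the distance metric or linkage that SINGuLAR uses, so results may differ.

```r
# Sketch: cluster rows (cells) and columns (genes), then reorder the matrix.
# Distance and linkage choices are assumptions, not SINGuLAR's documented settings.
cluster_and_sort <- function(log2ex) {
  cell_tree <- hclust(dist(log2ex))      # Euclidean distance, complete linkage
  gene_tree <- hclust(dist(t(log2ex)))   # same, applied to genes
  log2ex[cell_tree$order, gene_tree$order, drop = FALSE]
}

set.seed(1)
m <- rbind(matrix(rnorm(20, mean = 10), nrow = 5),   # five "high" cells
           matrix(rnorm(20, mean = 2),  nrow = 5))   # five "low" cells
rownames(m) <- paste0("cell", 1:10)
colnames(m) <- paste0("gene", 1:4)
sorted <- cluster_and_sort(m)
# heatmap(sorted, Rowv = NA, Colv = NA, scale = "none")  # draw without re-clustering
```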

Principal Component Analysis (PCA)

The PCA algorithm reduces the dimensionality of a data set by transforming it into a new set of uncorrelated variables, called principal components, with decreasing degrees of variability. The first principal component explains the most variation in the data set, that is, the largest share of the variability among the samples. Each succeeding component, in turn, explains the next highest variance under the constraint that it is uncorrelated with (orthogonal to) the preceding components.

SINGuLAR produces two plots about the principal components: a PCA scree plot and a scatter plot.

The scree plot displays the first ten principal components, with the height of each bar indicating how much of the variation in the data that component captures. This provides a quick way to determine the number of principal components to use. For example, the scree plot in Figure 17 shows a large height difference between the second and third bars, indicating that the first two principal components can be used and that they contain most of the original data variance. Once the number of principal components has been identified from the scree plot, the command can be repeated using that number.

The scatter plot graphs each principal component score on a separate axis. To find the label for any axis within the plot, trace that axis outward until you reach the PC score label.

Figure 17: Scree and scatter plots


The PC scores of all samples on the first 10 principal components are exported to a file named PCA_rotated_data.txt and saved in the working directory that you set for this SINGuLAR session. This file can be opened in Microsoft Excel. To save the scatter plot or to copy it to another location, right-click on it within the R window.
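The quantities behind the scree and scatter plots can be sketched with base R's prcomp(). Centering without scaling is an assumption, because this guide does not state SINGuLAR's exact PCA settings. The simulated matrix stands in for a cells x genes Log2Ex table. In this sketch, the scree values are the proportion of variance explained by each component, and pca$x holds per-cell PC scores similar in kind to those in PCA_rotated_data.txt.

```r
# Sketch: PCA of a cells x genes matrix with base R. Simulated data stand in
# for Log2Ex values: two groups of 10 cells measured on 4 genes.
set.seed(2)
m <- rbind(matrix(rnorm(40, mean = 8), nrow = 10),
           matrix(rnorm(40, mean = 3), nrow = 10))
pca <- prcomp(m, center = TRUE, scale. = FALSE)

# Proportion of variance explained by each component (what a scree plot ranks).
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(head(var_explained, 10), 3)

# PC scores: one row per cell, one column per component.
scores <- pca$x
# barplot(head(var_explained, 10))   # scree plot
# pairs(scores[, 1:2])               # scatter plot of the first two PCs
```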

Loading and Individually Analyzing Data for Single-Cell Experiments

SINGuLAR enables you to perform several data analyses with a single command, but it also lets you run the same analyses individually.

To Calculate Log2Ex

1. To express your single-cell data in log base 2, type the following command at the R command line:

fluidigmSC.analysis(number of assays, number of samples, LoD=22)

2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will return your data in log base 2.

To Generate a Violin Plot

1. To plot your gene expression data as a violin plot, type the following command at the R command line:

fluidigmSC.analysis(number of assays, number of samples, LoD=22, violin=TRUE)

2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will generate a violin plot of your data.

To Generate a Hierarchical Cluster Heat Map

1. To perform hierarchical clustering on your gene expression data, type the following command at the R command line:

fluidigmSC.analysis(number of assays, number of samples, LoD=22, HC=TRUE)

2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will generate a hierarchical cluster heat map.


To Perform PCA

1. To perform PCA on your gene expression data, type the following command at the R command line:

fluidigmSC.analysis(number of assays, number of samples, LoD=22, PCA=number of principal components to plot)

2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will generate scree and scatter plots for your data.

Analyzing Multiple Chip Runs with SINGuLAR

Samples from different Dynamic Array IFC runs can be analyzed together. SINGuLAR will discard assays that differ between experiments and will analyze only the assays common to all the experiments.

NOTE: Please ensure that every sample name is unique: no two names should match, even if they are the same sample from different runs. For example, if you have three runs of sample A, label them sampleA-1, sampleA-2, and sampleA-3. It is also helpful to name .csv files so that their filenames indicate the number of samples and the number of assays in the export.

Duplicate sample names will cause an error in the R scripts.

1. To perform the single-cell experiment analysis on combined data from multiple Dynamic Array IFC runs, type the following command at the R command line:

fluidigmSC.analysis(number of assays, number of samples, LoD=22, expt=number of data sets, violin=TRUE, HC=TRUE, PCA=number of principal components)

2. A file selection window will open. Select all the .csv files that contain your single-cell experiment data. For example, suppose you have the following setup:

Run 1: 96 assays, 96 samples

Run 2: 96 assays, 90 samples

Run 3: 96 assays, 72 samples


Then you would type the following R command:

fluidigmSC.analysis(c(96,96,96), c(96,90,72), LoD =22, expt=3, violin=TRUE, HC=TRUE, PCA=2)
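Duplicate sample names break the analysis, and only shared assays are kept, so it can help to check a combined data set before loading it. The sketch below is a hypothetical pre-check in base R, not SINGuLAR code. It assumes each run has already been read into a matrix with sample names as row names and assay names as column names, and it mirrors the behavior described above.

```r
# Hypothetical pre-check: refuse duplicate sample names, keep only shared assays.
combine_runs <- function(runs) {
  all_samples <- unlist(lapply(runs, rownames))
  dups <- unique(all_samples[duplicated(all_samples)])
  if (length(dups) > 0) {
    stop("Duplicate sample names across runs: ", paste(dups, collapse = ", "))
  }
  common <- Reduce(intersect, lapply(runs, colnames))  # assays present in every run
  do.call(rbind, lapply(runs, function(r) r[, common, drop = FALSE]))
}

run1 <- matrix(1:4, nrow = 2,
               dimnames = list(c("sampleA-1", "sampleB-1"), c("Gapdh", "Sox2")))
run2 <- matrix(5:10, nrow = 2,
               dimnames = list(c("sampleA-2", "sampleB-2"), c("Gapdh", "Nanog", "Sox2")))
combined <- combine_runs(list(run1, run2))
dim(combined)  # 4 samples x 2 shared assays (Gapdh, Sox2)
```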

Identifying Points in the PCA Graph

1. Specify two PCA components that you are interested in and type a locate command in the R console. If, for example, you are interested in components 1 and 2, you will type:

locate


Section 3

Appendices


Appendix 1: Protocol for Qualification of Assays

A detailed protocol is available in Appendix B of the Fluidigm Real-Time PCR Analysis Software User Guide (PN 68000088).

Determining Limit of Detection Threshold Cycle (LoD Ct) Value Using All Assays

To estimate an LoD Ct value, six replicates of each dilution sample are run.

For each assay, a preliminary LoD Ct is determined by taking the average Ct for the most dilute sample concentration that has positive amplification plots for all six replicates.

A stringent LoD Ct value would be the Ct corresponding to five target molecules per reaction chamber. At this low concentration, there is considerable stochastic noise due to the Poisson distribution, which affects both detection and the actual Ct value (see Figure 9 and its accompanying explanation). The goal, therefore, is to estimate a reasonable LoD Ct value from six replicates without precisely determining the Ct corresponding to five target molecules per reaction chamber.

Concentration (average target molecules per reaction chamber) and probability that all six replicates have a positive amplification plot:

1: 0.064

2: 0.418

3: 0.736

4: 0.895

5: 0.960

6: 0.985

7: 0.995

8: 0.998

Table 2: Probability that all six replicates have positive amplification plots
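Table 2 follows from Poisson statistics. If a dilution contains an average of λ target molecules per reaction chamber, the probability that a given chamber receives at least one molecule is 1 − e^(−λ). The probability that all six independent replicates do so is that value raised to the sixth power. The R snippet below reproduces the table under the simplifying assumption that any chamber with at least one molecule gives a positive amplification plot.

```r
# Probability that all six replicates amplify, assuming Poisson loading and
# that a single molecule is always enough for a positive amplification plot.
lambda <- 1:8                               # average molecules per chamber
p_all_six <- (1 - dpois(0, lambda))^6       # dpois(0, lambda) = exp(-lambda)
round(p_all_six, 3)
# 0.064 0.418 0.736 0.895 0.960 0.985 0.995 0.998
```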

Thus, the preliminary LoD Ct values determined for each assay probably correspond to concentrations ranging from two to 10 target molecules per reaction chamber. The overall LoD Ct is then selected as the median of all the preliminary LoD Ct values, rounded up to the next whole cycle. Because this LoD Ct value is approximate, it may also be used for assays that are added later, even if they were not run in this experiment.
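The two-stage calculation computes a preliminary LoD Ct per assay and then takes the median across assays, rounded up. It can be sketched in R as follows. The long-format data frame, its column names (assay, conc, Ct), and the function names are hypothetical. A Ct of 999 is treated as no amplification, matching how the Fluidigm software reports such reactions.

```r
# Preliminary LoD Ct for one assay: mean Ct of the most dilute concentration
# at which all n_rep replicates amplified (Ct of 999 = no amplification).
preliminary_lod <- function(ct, conc, n_rep = 6) {
  for (level in sort(unique(conc))) {          # most dilute (lowest) first
    x <- ct[conc == level]
    if (length(x) == n_rep && all(x != 999)) return(mean(x))
  }
  NA_real_                                      # no dilution passed
}

# Overall LoD Ct: median of the per-assay values, rounded up to a whole cycle.
overall_lod <- function(qual, n_rep = 6) {
  per_assay <- sapply(split(qual, qual$assay),
                      function(d) preliminary_lod(d$Ct, d$conc, n_rep))
  list(per_assay = per_assay, lod = ceiling(median(per_assay, na.rm = TRUE)))
}

# Toy data: two assays, two dilutions (relative concentration 1 and 2), six replicates each.
qual <- data.frame(
  assay = rep(c("geneA", "geneB"), each = 12),
  conc  = rep(rep(c(1, 2), each = 6), times = 2),
  Ct    = c(999, 20, 20, 20, 20, 20,   # geneA, conc 1: one replicate failed
            18, 18, 18, 18, 18, 18,    # geneA, conc 2: all positive
            21, 21, 21, 21, 21, 21,    # geneB, conc 1: all positive
            19, 19, 19, 19, 19, 19)    # geneB, conc 2
)
res <- overall_lod(qual)
res$per_assay   # geneA 18, geneB 21
res$lod         # ceiling(median(c(18, 21))) = 20
```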

Finally, the exact LoD Ct value is somewhat arbitrary and probably will not have a drastic effect on the interpretation of a single-cell experiment. As discussed above, this can be tested by first using the stringent LoD Ct value described here and then going back and increasing the value in one-cycle increments to see how this affects the Log2Ex results.

Preamplification

1. Prepare the following mixture:

8 µL 2.5 ng/µL Biochain Human Universal cDNA (PN C4234565-R) or appropriate cDNA standard

2 µL 500 nM each PreAmp Primers (pool of all assays)

10 µL 2X AB TaqMan PreAmp Master Mix (PN 4391128)

2. Transfer the mix to the thermal cycler and run the following protocol (times in minutes:seconds):

Cycle 1 (1X): Step 1, 95 °C, 10:00

Cycle 2 (14X): Step 1, 95 °C, 00:15; Step 2, 60 °C, 04:00

Cycle 3 (1X): Step 1, 4 °C, hold

3. Prepare the following mixture:

2 µL 20 units/µL Exonuclease I (New England BioLabs, PN M0293L)

1 µL 10X Exonuclease I Reaction Buffer

7 µL H2O

4. Add 8 µL of this mixture to the preamplified sample.

5. Transfer to the thermal cycler and run the following protocol (times in minutes:seconds):

Cycle 1 (1X): Step 1, 37 °C, 30:00

Cycle 2 (1X): Step 1, 80 °C, 15:00

Cycle 3 (1X): Step 1, 4 °C, hold

6. Add 72 µL TE (10 mM Tris, pH 8.0, 1.0 mM EDTA) (TEKnova, PN T0224).

7. Store at -20 °C.

Preparation of 1:2 Dilutions

1. Prepare a mixture of 1560 µL TE + 40 µL 10% Tween-20.

2. Prepare the following dilutions in 1.5 mL tubes, vortexing and centrifuging after each dilution.

Table 3: Dilution Table

3. Transfer the samples to 96-well plates for ease of loading into IFCs.

4. Store at -20 °C.

qPCR Detection

1. Prime the chip.

2. Prepare the following mixture:

420 µL 2X SsoFast™ EvaGreen Supermix with Low ROX

42 µL 20X DNA Binding Dye Sample Loading Reagent

18 µL H2O

3. Add 20 µL of this mixture to each of 16 wells (the first two columns of the 96-well plate).

4. Add 15 µL of diluted sample to each well.

5. Vortex gently and centrifuge.

6. In a 96-well plate, mix 0.3 µL of each assay (100 µM each, combined F+R primers) with 2.7 µL DNA Suspension Buffer (TEKnova, PN) and 3 µL Assay Loading Reagent.

7. Dispense 5 µL of DELTAgene Assays into the detector inlets of the 96.96 IFC.

8. Dispense 6 × 5 µL of each dilution sample + SsoFast master mix into the sample inlets of the 96.96 IFC.

9. Load the chip.

10. Run the GE Fast 96x96 PCR+Melt v2.pcl protocol:


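Thermal protocol (temperatures in °C, durations in seconds, BioMark HD ramp rates in °C/s):

Segment 1, Thermal Mix: 70 °C for 2400 s, then 60 °C for 30 s (ramp rate 5.5 °C/s)

Segment 2, Hot Start: 95 °C for 60 s (ramp rate 5.5 °C/s)

Segment 3, PCR (30 cycles): 96 °C for 5 s, then 60 °C for 20 s (ramp rate 5.5 °C/s)

Segment 4, Melting Curve: 60-95 °C at 1 °C per 3 seconds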

Fluidigm DELTAgene Assay Qualification

NOTE: DELTAgene assays are DNA binding dye-based detection assays. If you are using TaqMan assays, please follow the assay qualification procedure at http://tinyurl.com/ctuavdx. For probe-based assays, use the Auto (Detector) Ct Threshold Method.

Run the First Chip

1. Use the protocol for assay qualification described earlier.

2. Annotate samples and detectors in the Sample Setup and Detector Setup windows, respectively.

3. Analyze the data using the Linear (Derivative) Baseline Correction Method and the Auto (Global) Ct Threshold Method.

Set the Tm Range for Each Assay

The BioMark HD system allows users to identify and eliminate data from non-specific amplification, thereby improving specificity and sensitivity. This is done by adjusting the Tm window in the data analysis software.

NOTE: The Fluidigm Real-Time PCR Analysis Software User Guide, downloadable from http://www.fluidigm.com/product-documents.html, provides a detailed procedure for Tm range selection. Be sure to select the Linear (Derivative) baseline correction and Auto (Global) Ct threshold options.


Figure 18: Selecting the Tm range

Export Detector.plt

1. From the Detector Setup window, export the Detector.plt file. Use a filename appropriate for the set of assays being analyzed. The Tm range information is retained as part of the Detector.plt file.

2. Later, when you use the same set of assays to analyze single cells, use the Import button to import the Detector.plt file. This ensures that assay information is added to the chip run and that the Tm ranges selected in the qualification run are automatically applied to the single-cell data.

Export Heat Map Results

Save the chip run file and navigate to File > Export to export the Ct data. Heat map results are exported to Microsoft Excel in comma-delimited (.csv) format.

Run Second Chip with Single-Cell Samples

1. Flow-sort and process cells, or use the C1 system, and run single cells on a 96.96 Dynamic Array IFC following the guidelines in Appendix A of the Fluidigm Real-Time PCR Analysis Software User Guide (PN 68000088).

NOTE: This analysis uses 96 single-cell samples as an example. It is often useful to include some control samples on the chip, but that topic will be discussed in subsequent documentation.


2. In Sample Setup, annotate the sample information.

3. In Detector Setup, import the Detector.plt file generated in the assay qualification run. This brings in the Tm range for each assay.

4. Analyze the file once again to incorporate the sample and assay information into the chip run file. To do this, make sure the Baseline Correction Method is still set to Linear (Derivative) and the Ct Threshold Method is still set to Auto (Global), and then click Analyze.

5. Save the chip run file.

6. Navigate to File > Export to export the Ct data to Microsoft Excel as Heat Map Results. The file will be in .csv format.


Appendix 2: Removing Data Failed by Fluidigm Real-Time PCR Analysis Software

Although the Fluidigm Real-Time PCR Analysis software fails reactions with an improper Tm, it does not change the Ct value determined from the amplification plot.

To eliminate Ct values for reactions failed due to Tm:

1. Open the Heat Map Results file in Excel.

2. Copy the sample information in cells A113:B208 to A213:B308.

3. Enter the formula =IF(C113="Pass",C13,999) in cell C213.

NOTE: The formula uses 999 because the Fluidigm Real-Time PCR Analysis software reports a Ct value of 999 for any reaction in which no positive amplification plot is detected.


4. Copy the formula in cell C213 to fill the matrix C213:CT308.

5. Save the file in .xls or .xlsx format.
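The same Pass/999 rule can be applied in R instead of Excel. This sketch assumes that you have already extracted two matrices of the same shape from the heat map export: the Ct values and the corresponding calls, which are the blocks that the spreadsheet steps reference by row. The function name and inputs are hypothetical.

```r
# Hypothetical helper: keep a Ct only where the call is "Pass", otherwise 999,
# mirroring =IF(call="Pass", Ct, 999) applied cell by cell.
mask_failed <- function(ct, calls) {
  stopifnot(identical(dim(ct), dim(calls)))
  out <- ct
  out[calls != "Pass"] <- 999
  out
}

ct    <- matrix(c(12.1, 25.3, 999, 18.0), nrow = 2)
calls <- matrix(c("Pass", "Fail", "Fail", "Pass"), nrow = 2)
mask_failed(ct, calls)  # 12.1 and 18.0 are kept; the failed reactions become 999
```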


Appendix 3: Eliminating Low-Expressing Cells from Subsequent Analysis

One way to eliminate low-expressing cells from subsequent analysis is to include at least two highly expressed control genes in the set of assays used to interrogate the cells. To do this:

1. Select at least two control genes that are highly expressed and are not expected to be differentially expressed in the cells being studied.

2. Calculate the Log2Ex values for all genes and examine the expression histograms of the control genes to confirm that their transcript distribution is monophasic.

3. Calculate the median and standard deviation of the Log2Ex values of each control gene across all the single cells.

4. For each control gene, determine a cutoff value by subtracting three times the standard deviation from the median.

5. If a cell's Log2Ex values are below the cutoff for at least two control genes, eliminate that cell from further analysis.

6. Remove these samples from the heat map in the .csv file exported from the BioMark System so that the filtered file can be loaded into the SINGuLAR package.

7. Save the file.
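The following R sketch implements this filter under the assumption that the cutoff is applied on the Log2Ex scale from step 2. On that scale, low-expressing cells fall below the median minus three standard deviations. The function, its arguments, and the toy data are hypothetical. The function returns the names of the cells to remove.

```r
# Hypothetical helper: names of cells below (median - n_sd * SD) of Log2Ex
# for at least min_genes of the control genes.
low_expressing_cells <- function(log2ex, controls, n_sd = 3, min_genes = 2) {
  ctrl <- log2ex[, controls, drop = FALSE]
  cutoff <- apply(ctrl, 2, median) - n_sd * apply(ctrl, 2, sd)
  below <- sweep(ctrl, 2, cutoff, FUN = "<")   # TRUE where a cell is under the cutoff
  rownames(log2ex)[rowSums(below) >= min_genes]
}

vals <- c(rep(c(10, 11), length.out = 19), 0)  # cell20 barely expresses the controls
m <- cbind(ActB = vals, Gapdh = vals, Sox2 = seq_along(vals))
rownames(m) <- paste0("cell", 1:20)
low_expressing_cells(m, c("ActB", "Gapdh"))  # "cell20"
```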


Appendix 4: Normalizing Using Median Log2Ex

NOTE: Sample and assay numbers and experimental layouts are unique to each experiment.

1. Find the median of all Log2Ex values for each sample. To do this, use the Excel formula:

=IFERROR(MEDIAN(All Log2Ex values for a single sample), )

2. Compute the average of all sample median Log2Ex values.

3. For each sample, calculate the difference between the sample's median and the average of medians, and subtract that difference (whether positive or negative) from every Log2Ex value for that sample. The formula is:

=IFERROR(Individual Log2Ex value for a sample - (Sample Median Log2Ex Value - Avg of Medians),0)

These median-normalized Log2Ex values can then be copied and pasted into the .csv file exported from the BioMark System and saved, so that they can be loaded into the SINGuLAR package.
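The spreadsheet steps above can be sketched as a single R function. With detected_only = FALSE (the default here), each cell's median is taken over all of its values, as in the spreadsheet formula. Setting detected_only = TRUE restricts the median to detected genes (Log2Ex > 0), matching the earlier recommendation to use the genes detected in each cell. The function name and matrix layout (cells in rows, genes in columns) are assumptions.

```r
# Hypothetical helper: shift each cell so its median Log2Ex equals the average
# of all cell medians. As in the spreadsheet formula, every value (including
# zeros) is shifted by the same amount.
median_normalize <- function(log2ex, detected_only = FALSE) {
  cell_median <- apply(log2ex, 1, function(x) {
    if (detected_only) x <- x[x > 0]
    if (length(x) == 0) NA_real_ else median(x)
  })
  shift <- cell_median - mean(cell_median, na.rm = TRUE)
  shift[is.na(shift)] <- 0   # leave cells with no detected genes unchanged
  sweep(log2ex, 1, shift, FUN = "-")
}

m <- rbind(cell1 = c(4, 6, 8), cell2 = c(6, 8, 10))
out <- median_normalize(m)
out  # both cells become 5, 7, 9 (medians 6 and 8, average of medians 7)
```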


Appendix 5: A Note on the Optimal Number of Cycles Needed for Preamplification

In the single-cell gene expression workflow, the qPCR reactions are preceded by preamplification of cDNA. Statistically, 18-20 cycles of preamplification will result in an average of five copies of target per chamber from a single copy of cDNA. Preamplification can have efficiencies close to 100%, as reported by Devonshire et al. (2011).

Preamplification affects the limit of detection. For Dynamic Array IFCs, five target molecules per reaction chamber correspond to 625 molecules/µL in the 48.48 IFC and 730 molecules/µL in the 96.96 IFC.
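The two concentrations above are five molecules divided by the reaction-chamber volume. Working backwards from the stated figures gives the chamber volumes that they imply. These volumes are derived from the numbers in this appendix and are not quoted from an IFC specification. Here, c is concentration, N = 5 molecules, and V is the chamber volume.

```latex
\begin{aligned}
c &= \frac{N}{V} \quad\Longrightarrow\quad V = \frac{N}{c} \\
V_{48.48} &= \frac{5}{625\ \mu\mathrm{L}^{-1}} = 0.008\ \mu\mathrm{L} = 8\ \mathrm{nL} \\
V_{96.96} &= \frac{5}{730\ \mu\mathrm{L}^{-1}} \approx 0.00685\ \mu\mathrm{L} \approx 6.85\ \mathrm{nL}
\end{aligned}
```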


References

Aguilo, F., S. Avagyan, A. Labar, A. Sevilla, D. F. Lee, P. Kumar, I. R. Lemischka, B. Y. Zhou, and H. W. Snoeck (2011) Prdm16 is a physiologic regulator of hematopoietic stem cells, Blood 117:5057-5066.

Bengtsson, M., A. Ståhlberg, P. Rorsman, and M. Kubista (2005) Gene expression profiling in single cells from the pancreatic islets of Langerhans reveals lognormal distribution of mRNA levels, Genome Research 15:1388-1392.

Chubb, J. R., T. Trcek, S. M. Shenoy, and R. H. Singer (2006) Transcriptional pulsing of a developmental gene, Current Biology 16:1018-1025.

Dalerba, P. et al. (2011) Single-cell dissection of transcriptional heterogeneity in human colon tumors, Nat Biotechnol 29:1120-1127.

Devonshire, A. S., R. Elaswarapu, and C. A. Foy (2011) Applicability of RNA standards for evaluating RT-qPCR assays and platforms, BMC Genomics 12:118-127.

Diehn, M. et al. (2009) Association of reactive oxygen species levels and radioresistance in cancer stem cells, Nature 458:780-783.

Flatz, L. et al. (2011) Single-cell gene-expression profiling reveals qualitatively distinct CD8 T cells elicited by different gene-based vaccines, Proc Natl Acad Sci USA 108:5724-5729.

Guo, G., M. Huss, G. Q. Tong, C. Wang, L. L. Sun, N. E. Clarke, and P. Robson (2010) Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst, Developmental Cell 18:675-685.

Livak, K. J. and T. D. Schmittgen (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^(−ΔΔCT) method, Methods 25:402-408.

Pang, Z. P. et al. (2011) Induction of human neuronal cells by defined transcription factors, Nature 476:220-223.

Raj, A., C. S. Peskin, D. Tranchina, D. Y. Vargas, and S. Tyagi (2006) Stochastic mRNA synthesis in mammalian cells, PLoS Biol 4:e309.

Vandesompele, J., K. De Preter, F. Pattyn, B. Poppe, N. Van Roy, A. De Paepe, and F. Speleman (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes, Genome Biology 3:research0034.1-research0034.11.

Vincent, J. J. et al. (2011) Single cell analysis facilitates staging of Blimp1-dependent primordial germ cells derived from mouse embryonic stem cells, PLoS ONE 6:e28960.

Acknowledgements

Fluidigm gratefully acknowledges the pioneering contributions of Dr. Paul Robson to single cell data analysis; the use of violin plots, principal component analysis, and unsupervised clustering was adopted from his work. We would also like to thank Dr. Robson and the Genome Institute of Singapore for providing the single cell gene expression data used in the SINGuLAR Practice Sets.


World Headquarters
7000 Shoreline Court, Suite 100
South San Francisco, CA 94080 USA
Tel: 650-266-6000
Fax: 650-871-7152

Fluidigm Europe, BV
Parnassustoren
Locatellikade 1, 1076 AZ Amsterdam
Netherlands
Tel: +33 (1) 60 92 42 40
Fax: +31 (0) 20 203 1111

Fluidigm Japan KK
Level 5, Ginza TK Building
1-1-7 Shintomi Chuo-ku, Tokyo 104-0041
Japan
Office: +81335552351
Fax: +8133552353

Fluidigm Singapore PTE Ltd
Block 1026 Tai Seng Avenue #07-3532
Singapore 534413
Office: +6568587316
Fax: +6562825531

Technical Support Email: [email protected]

Phone in United States: 1.866.FLUIDLINE (1.866.358.4354)

Outside the United States: 650.266.6100 On the Internet: www.fluidigm.com/support

Visit our website at www.fluidigm.com

PN 100-5066, Rev. B1
