  • SINGuLAR™ Analysis Toolset

    User Guide

    PN 100-5066 B1

  • Copyright 2013 Fluidigm Corporation. All rights reserved.

    Limited License for SINGuLAR Analysis Toolset

    The SINGuLAR Analysis Toolset is a shared-source, proprietary data analysis resource for Fluidigm customers interested in analyzing or developing software for single-cell gene expression data generated on Fluidigm technology. It is comprised of unsupported software development resources, including R-scripts, documentation and reference data. Registered users of the SINGuLAR Analysis Toolset may use the code contained in this file in accordance with the terms set forth in sections 1 through 8 below. You may register to use the toolset at the following address: http://www.fluidigm.com/singular-sc-analysis-toolkit-request.html. Unregistered users or users whose registration has not been confirmed with a receipt at the aforementioned website have no rights or permission to use this code.

    1. Use of the code in source and binary forms, with or without modification is permitted solely in accordance with section 3 below.

    2. Redistribution of the code in source and binary forms, with or without modification is permitted only to employees and agents of entities named as registered users of the SINGuLAR Analysis Toolset. Redistribution, whether in source or binary form must include this license statement.

    3. Any use must be in conjunction with a Fluidigm product. Any use with a Fluidigm product may also be in conjunction with data from any source, including products from other vendors. In any case, the code may not be used in conjunction with any product similar to the Fluidigm BioMark Real-Time PCR System that is made by another entity.

    4. Any redistribution and use shall be in accordance with the laws and export regulations of the United States of America. Under no circumstances shall code be distributed to or used by persons listed on the Denied Persons List maintained by the United States Department of Commerce, or be distributed to or used or executed in a country listed on the Export Control List, List of Extensively Embargoed Countries, or List of Targeted Sanctions Countries and Territories maintained by the United States Department of Commerce; appropriate measures shall be taken to ensure that recipients will also refrain from distribution to such parties.

    5. Fluidigm will not provide, and is not responsible for providing any end-user support.

    6. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    7. This license, and all matters relating to use of the SINGuLAR Analysis Toolset shall be governed by and interpreted in accordance with the law of the State of California except for its choice of law rules. For any disputes arising out of this Agreement, the parties consent to the personal and exclusive jurisdiction of, and venue in, the state and federal courts within San Mateo County, California.

    8. This license constitutes the entire agreement between you and Fluidigm Corporation. This license may only be amended or supplemented by a writing that refers explicitly to this Agreement and that is signed by duly authorized representatives of both parties.

    Information in this manual is subject to change without notice. Fluidigm assumes no responsibility for any errors or omissions. In no event shall Fluidigm be liable for any damages in connection with or arising from the use of this manual. Fluidigm, the Fluidigm logo, BioMark, C1, DELTAgene, Dynamic Array, FC1, and SINGuLAR are trademarks or registered trademarks of Fluidigm Corporation in the U.S. and/or other countries.

    Contacting Fluidigm

    By phone: In the United States: 1.866.FLUIDLINE (1.866.358.4354). Outside the United States: +1.650.266.6100

    On the Internet: www.fluidigm.com/support; [email protected]

    Fluidigm Corporation 7000 Shoreline Court, Suite 100 South San Francisco, CA 94080


    Table of Contents

    Section 1: Single-Cell Data Analysis
    Purpose of this Document
    The Nature of Single-Cell Transcription
    Transcriptional Bursting in Single Cells
    Replicates
    Identification and Use of Limit of Detection (LoD) and Log2Ex
    Limit of Detection
    Detection Limit of the qPCR Reaction
    Qualification of Assays Prior to Single-Cell Experiments
    Elimination of Cells or Genes from Subsequent Analysis
    Normalization
    Secondary Analysis

    Section 2: The SINGuLAR Workflow
    Installing R and SINGuLAR
    Installing R
    Installing SINGuLAR
    Creating the SINGuLAR Directory for Data Analysis
    Preparing BioMark System Results
    Estimating the Limit of Detection (LoD) Ct Value
    Option 1: Experimental Determination of LoD
    Option 2: Iterative Determination of LoD
    Removing Failed Data Points and Low Expression Cells
    Loading and Analyzing Data for Single-Cell Experiment Results with SINGuLAR
    Single-Cell Data Analysis Performed Using Fluidigm SINGuLAR
    Violin Plots
    Hierarchical Clustering
    Principal Component Analysis (PCA)
    Loading and Individually Analyzing Data for Single-Cell Experiments
    To Calculate Log2Ex
    To Generate a Violin Plot
    To Generate a Hierarchical Cluster Heat map
    To Perform PCA
    Analyzing Multiple Chip Runs with SINGuLAR

    Section 3: Appendices
    Appendix 1: Protocol for the Qualification of Assays
    Appendix 2: Removing Data Failed by Fluidigm Real-Time PCR Analysis Software
    Appendix 3: Eliminating Low-Expressing Cells from Subsequent Analysis
    Appendix 4: Normalizing Using Median Log2Ex
    Appendix 5: A Note on the Optimal Number of Cycles Needed for Preamplification
    References

    Table of Figures

    Figure 1: The single-cell workflow
    Figure 2: ActB expression data; Fluidigm study
    Figure 3: Data from Fluidigm experiment showing large fold-differences
    Figure 4: Single-cell standard deviations
    Figure 5: PCA showing subpopulations
    Figure 6: Calculating LoD and Log2Ex
    Figure 7: Comparison of IER3 transcripts
    Figure 8: Compare Log2Ex levels of 10 genes in 75 single cells
    Figure 9: Poisson distribution at average of 5 targets/chamber
    Figure 10: Cutoff Ct three standard deviations below mean
    Figure 11: Example where normalization does not greatly affect data analysis
    Figure 12: The Analysis workflow
    Figure 13: The SINGuLAR workflow
    Figure 14: Spreadsheet exported as .csv file
    Figure 15: Violin plots generated in R
    Figure 16: Sample heat map
    Figure 17: Scree and scatter plots
    Figure 18: Selecting the Tm range


    Section 1

    Single-Cell Data Analysis


    Purpose of this Document

    Single-cell researchers use the Fluidigm BioMark System to measure gene expression levels for up to hundreds of genes in hundreds to thousands of samples. This document is a practical guide to the minimum steps needed to obtain single-cell gene expression data with the BioMark System. Starting with background material on the nature of single-cell transcription, it takes the reader through a tutorial of data collection, preparation, and analysis. Fundamental steps in the single-cell workflow are:

    Figure 1: The single-cell workflow

    This document takes users through one particular path of the latter half of the single-cell workflow: qPCR

    detection, primary data processing, and secondary data analysis. The choices available at each step lie

    beyond the scope of this document but will provide topics for subsequent documentation.

    The Nature of Single-Cell Transcription

    Bengtsson et al. (2005) were among the first to use qPCR to quantify transcripts in single cells. They measured gene expression levels of five genes in individual cells from mouse pancreatic islets and found that the transcript levels of the different genes were lognormally distributed. Since a lognormal distribution is characterized by its geometric mean rather than its arithmetic mean, there are profound implications for the comparison of single-cell data to population data. In a lognormal distribution, the average expression level (arithmetic mean) observed for a population of cells is strongly biased by a few cells with a very high number of transcripts. Therefore, the average expression level does not reflect the expression level in a typical cell. The paper concluded, "Accordingly, it may not be valid to extrapolate results of gene expression measurements on cell populations to the single-cell level."
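The gap between the arithmetic mean and the typical cell can be illustrated with simulated data. The following Python sketch is not part of the SINGuLAR toolset; the function name, the lognormal parameters, and the cell count are all assumptions chosen only for illustration.

```python
import math
import random
import statistics


def summarize_lognormal(n_cells=10_000, mu=5.0, sigma=1.5, seed=0):
    """Draw hypothetical per-cell transcript counts from a lognormal
    distribution and return (arithmetic mean, geometric mean, median)."""
    rng = random.Random(seed)
    counts = [rng.lognormvariate(mu, sigma) for _ in range(n_cells)]
    arithmetic = statistics.fmean(counts)
    # The geometric mean is the exponential of the mean of the logs.
    geometric = math.exp(statistics.fmean([math.log(c) for c in counts]))
    median = statistics.median(counts)
    return arithmetic, geometric, median


arithmetic, geometric, median = summarize_lognormal()
print(f"arithmetic mean: {arithmetic:.1f}")
print(f"geometric mean:  {geometric:.1f}")
print(f"median cell:     {median:.1f}")
```

With these assumed parameters the arithmetic mean lands well above the median cell, while the geometric mean stays close to it, which is the bias described above.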


    The lognormal distribution means that data from single eukaryotic cells show cell-to-cell variation in

    mRNA amounts that ranges from 10-fold to 1,000-fold depending on the gene and type of cell. In the study

    cited, the levels of ActB transcript varied approximately 1,000-fold among the single cells analyzed. A

    Fluidigm replication of the study is shown in Figure 2 below.

    Figure 2: ActB expression data; Fluidigm study

    Fluidigm also ran a single-cell experiment on a 96.96 Dynamic Array integrated fluidic circuit (IFC), but

    analyzed a much larger number of genes. Data for 77 genes in 87 single human K562 cells showing large

    fold-differences between individual cells are presented in Figure 3 below.


    Figure 3: Data from Fluidigm experiment showing large fold-differences

    The Fluidigm experiment determined the number of genes exhibiting differential expression between

    individual cells, depicted here as fold-change (upper X-axis labels) and equivalent Ct values (lower X-axis

    labels). These results indicate that 10- to >500-fold variation in transcript levels should be expected when

    comparing individual cells.
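Fold-change and Ct differences are two views of the same quantity. The Python sketch below converts between them, assuming ideal amplification in which the target doubles every cycle; the function names are hypothetical and do not come from SINGuLAR.

```python
import math


def fold_to_cycles(fold_change):
    """Ct difference equivalent to a fold-change, assuming perfect doubling per cycle."""
    return math.log2(fold_change)


def cycles_to_fold(delta_ct):
    """Fold-change equivalent to a Ct difference, under the same assumption."""
    return 2.0 ** delta_ct


for fold in (10, 500):
    print(f"{fold}-fold is about {fold_to_cycles(fold):.2f} cycles")
```

Under this assumption, the 10- to 500-fold range quoted above spans roughly 3.3 to 9 cycles.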

    Transcriptional Bursting in Single Cells

    Data such as these, collected by several researchers, have led to the model that eukaryotic transcripts are

    produced in short but intense bursts interspersed with intervals of inactivity during which transcript levels

    decay. Raj et al. (2006) directly observed intrinsically random bursting of mRNA for two genes in CHO

    cells. Chubb et al. (2006) also observed this burst-and-decay behavior for the dscA gene in living

    Dictyostelium cells. For this gene, they measured a mean burst duration of 5.2 minutes and a mean interval

    of inactivity (presumably mRNA decay) of 5.8 minutes, but there was a great deal of stochastic variation

    in each of these averages.

    This noise inherent in single-cell gene expression challenges conventional methods for obtaining and analyzing qPCR data. Factors such as replicates, data display, limits of detection, normalization, and univariate versus multivariate analysis need to be re-evaluated. Although one may think that this noise precludes the ability to get useful information from single cells, the reality is quite the opposite. By acknowledging and addressing the intrinsic noise (using appropriate statistical analysis methods), single-cell gene expression profiling can provide biological insights that are simply not visible when one is averaging expression levels from hundreds or thousands of cells.

    Replicates

    Another way to assess the variation observed in single cells is to look at the standard deviation of various

    transcript levels in a population of single cells. Figure 4 uses data from the Fluidigm experiment described

    earlier (using K562 cells) to depict the standard deviations observed for 77 genes in a population of 87

    cells.

    Figure 4: Single-cell standard deviations

    Only two genes show a standard deviation of less than one cycle between single cells. For

    experiments run using bulk RNA on the BioMark System, the standard deviation observed for qPCR

    technical replicates is typically 0.16-0.25 cycle or less. Biological noise is thus greater than technical

    noise by a large amount. It is therefore better to focus on biological replicates rather than on

    technical replicates. Experimental bandwidth is thus better utilized by running more single-cell

    samples and by interrogating more genes than by running technical replicates of the single-cell

    samples or assays.
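The comparison of biological and technical noise can be made for any data set by computing the per-gene standard deviation of Ct across cells. This Python sketch uses invented Ct values and hypothetical gene names; only the 0.25-cycle technical figure comes from the text above.

```python
import statistics

TECHNICAL_SD = 0.25  # upper end of the 0.16-0.25 cycle range quoted above


def biological_sd_by_gene(ct_by_gene):
    """Sample standard deviation of Ct across single cells, per gene."""
    return {gene: statistics.stdev(cts) for gene, cts in ct_by_gene.items()}


# Hypothetical Ct values for two genes measured in five single cells.
ct_by_gene = {
    "GeneA": [14.2, 16.8, 15.1, 18.9, 13.5],
    "GeneB": [20.1, 20.9, 19.6, 20.4, 20.0],
}

sd_by_gene = biological_sd_by_gene(ct_by_gene)
for gene, sd in sd_by_gene.items():
    ratio = sd / TECHNICAL_SD
    print(f"{gene}: SD = {sd:.2f} cycles ({ratio:.1f}x the technical SD)")
```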


    One way to restate the need for biological replicates is to say that data need to be collected from a

    statistically significant number of single cells in order to obtain reliable results. What is a

    statistically significant number of single cells? This is difficult to answer in absolute terms. Statistical

    significance depends not only on the number of cells, but also on other factors including the degree of

    variation within the population analyzed, the number of genes assayed, and the ability of those

    assays to differentiate the population variation.

    Basic statistics would indicate that for any single gene, a homogeneous population can be

    characterized on the basis of 30 samples. Thus, if every subpopulation within a sample of single cells

    were represented by at least 30 cells, one would have reasonable confidence that the experiment

    would robustly identify all subpopulations. This would mean that if one wanted to reliably identify a

    subpopulation that was 10% of the total population, 300 cells would need to be examined.
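The arithmetic behind the 300-cell figure generalizes to any subpopulation frequency. The Python sketch below is illustrative only: the function name is hypothetical, and the result is the number of cells at which the subpopulation is expected to contribute 30 cells on average, not a guarantee for any particular sample.

```python
import math
from fractions import Fraction


def cells_required(subpop_fraction, min_cells_per_subpop=30):
    """Total cells to examine so that a subpopulation of the given
    frequency contributes, on average, min_cells_per_subpop cells."""
    # Going through str() avoids binary floating-point rounding (0.1 -> 1/10).
    fraction = Fraction(str(subpop_fraction))
    if not 0 < fraction <= 1:
        raise ValueError("subpop_fraction must be in (0, 1]")
    return math.ceil(min_cells_per_subpop / fraction)


print(cells_required(0.10))
print(cells_required(0.05))
```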

    In practice, subpopulations can be identified with fewer than 30 cells depending on the cells and

    genes being analyzed. Guo et al. (2010) analyzed 159 single cells from 64-cell stage mouse embryos,

    assaying 48 genes in each cell. A principal component analysis (PCA) from the study is shown in

    Figure 5.

    Figure 5: PCA showing subpopulations


    From Guo et al. (2010). Image reprinted with permission from Developmental Cell.

    Guo et al. were able to clearly identify the epiblast (EPI) subpopulation, with only 17 cells in that

    subpopulation. They could do this because of the type of cells analyzed, the use of 48 genes, and the

    fact that those 48 genes revealed very distinct signatures between EPI, primitive endoderm (PE), and

    trophectoderm (TE) cells.

    Identification and Use of Limit of Detection (LoD) and Log2Ex

    When qPCR experiments are run on bulk RNA samples, the results are typically displayed as fold-

    change differences between samples for each individual gene and known controls. Because of the

    extensive normal variation in a given gene at the single-cell level, looking at fold changes between

    individual cells is potentially not very informative. A better approach may be to first assess the

    population behavior for each gene. By assessing which genes display a lognormal distribution within

    the cell population under investigation, this type of first-pass analysis can provide the first

    significant insight to the unique biology of the cell population and dictate further, more directed

    analyses. This is best done by looking at histograms that bin expression levels and display the number

    of cells in each bin. To generate such histograms, the expression for each gene must be comparable

    between different single-cell samples. One starts by calculating the limit of detection (LoD) and then

    computing Log2Ex values.

    Because of the lognormal distribution described by Bengtsson et al. (2005) and others, it is useful to

    view single-cell data as expression level above detection limit on a log scale. For qPCR data, it is

    convenient and appropriate to do this in log base 2 by defining the term Log2Ex:

    $$\mathrm{Log2Ex} = \mathrm{LoD\ Ct} - \mathrm{Ct}[\mathrm{Gene}]$$

    If the value is negative, Log2Ex = 0.

    Log2Ex represents transcript level above background expressed in log base 2. Conversion from a log

    scale to a linear scale can be accomplished by calculating 2^Log2Ex, which gives the fold change.

    These equations are expressed graphically in Figure 6. The value of each sample is subtracted from

    the LoD. In this example LoD = 22. Therefore, Ct values higher than 22 are assigned a Log2Ex value

    of 0.


    Figure 6: Calculating LoD and Log2Ex
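The Log2Ex definition can be written as a short function. This is a minimal Python sketch, not the SINGuLAR R implementation; the function names are hypothetical, and the default LoD Ct of 22 is simply the value used in the example above.

```python
def log2ex(ct, lod_ct=22.0):
    """Log2 expression above the limit of detection; negative values become 0."""
    return max(lod_ct - ct, 0.0)


def fold_above_lod(ct, lod_ct=22.0):
    """Linear-scale expression relative to the detection limit (2^Log2Ex)."""
    return 2.0 ** log2ex(ct, lod_ct)


for ct in (18.0, 22.0, 25.0):
    print(f"Ct {ct}: Log2Ex = {log2ex(ct)}, fold above LoD = {fold_above_lod(ct)}")
```

A Ct of 18 lies four cycles below the LoD and so maps to a Log2Ex of 4, while any Ct at or above 22 maps to 0.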

    The use of Log2Ex enables plotting the number of cells where the transcript level is at or below the

    detection limit. Figure 7 compares IER3 transcripts in 87 human K562 cells.

    Figure 7: Comparison of IER3 transcripts

    IER3 transcripts from 87 human K562 cells were plotted on a log (left) and linear (right) scale. No IER3 transcript was detected in 10 cells.

    To compare histograms for multiple genes, it is convenient to use violin plots, which are essentially

    histograms turned on their side and mirrored. Violin plots from Guo et al. (2010), Figure 8 below,

    compare 10 genes in 75 single cells derived from 16-cell stage mouse embryos:


    Figure 8: Compare Log2Ex levels of 10 genes in 75 single cells

    Violin plots from Guo et al. (2010); Image reprinted with permission from Developmental Cell.

    The violin plots reveal that seven genes have unimodal distributions and three (Id2, Nanog, Sox2)

    have bimodal distributions. The unimodal distributions indicate no detectable variation other than

    intrinsic noise. The bimodal distributions indicate that these three genes are differentially expressed

    in at least two subpopulations within these 75 cells.

    The vertical position of each histogram indicates the relative expression level. For example, ActB

    has the highest expression level among these 10 genes.

    It is also possible to see that transcripts can have distributions of varying widths, distribution being

    an indicator of variation. For example, Pou5f1 has a much narrower distribution, or less variation on

    the Log2Ex axis, than Cdx2. This is because each gene has a characteristic transcriptional burst size,

    frequency, and decay rate.

    If the histogram indicates two or more subpopulations, it is now possible to get meaningful average

    fold change values. For the Id2 gene in the violin diagram, the median Log2Ex value is roughly 7.5

    for the higher expressing subpopulation and roughly 1.8 for the lower expressing subpopulation.

    Thus the difference in Log2Ex between these two subpopulations is about 7.5 − 1.8 = 5.7, which corresponds to a
    fold difference of 2^5.7, or approximately 50, in expression levels, on average.
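Written out as a worked equation, using the approximate medians read from the violin plot for Id2 (the symbol ΔLog2Ex is introduced here only as shorthand for the difference):

$$\Delta\mathrm{Log2Ex} = 7.5 - 1.8 = 5.7, \qquad 2^{\Delta\mathrm{Log2Ex}} = 2^{5.7} \approx 52$$

Because the medians are estimated by eye, rounding the result to roughly 50-fold is appropriate.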

    Limit of Detection

    The Log2Ex calculation requires defining a limit of detection (LoD) Ct value. This raises the issue of

    defining the detection limit of qPCR. In fact, there are two separate questions:

    1. What is the detection limit of the qPCR reaction by itself?

    2. What is the detection limit of the overall process (going from single cell → RNA → cDNA → preamplified cDNA → qPCR reaction)?

    Detection Limit of the qPCR Reaction

    Based on digital PCR results using well-performing assays, it is clear that a single target DNA molecule

    in a reaction chamber will generate a positive amplification plot. That is why the theoretical limit of

    PCR is one molecule. A more stringent definition of detection limit, however, would incorporate

    some indication of the confidence of detecting a target.

    If a number of identical PCR reactions are performed at an average concentration of one target DNA

    molecule per reaction chamber, then 37% of the reactions will not contain a single molecule. The

    chance of detection is therefore 63%. This effect can be calculated according to the Poisson

    distribution; there is a 37% likelihood that a molecule will not actually land in the chamber, and

    thus will not show a positive amplification plot.

    For stringent detection, at what concentration is there at least a 99% chance of generating a positive

    amplification plot? This occurs at an average concentration of five target molecules per reaction

    chamber as shown by the Poisson distribution in Figure 9.

    Figure 9: Poisson distribution at average of 5 targets/chamber
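The 63% and 99% figures follow from the Poisson probability of an empty chamber, which is e^(-λ) when the average is λ targets per chamber. The Python sketch below reproduces them; the function names are hypothetical.

```python
import math


def detection_probability(mean_targets):
    """Chance that a chamber holds at least one target molecule (Poisson)."""
    return 1.0 - math.exp(-mean_targets)


def min_mean_targets(confidence):
    """Smallest whole-number average per chamber that reaches the given detection chance."""
    return math.ceil(-math.log(1.0 - confidence))


for mean in (1, 5):
    print(f"mean {mean}/chamber: {detection_probability(mean):.1%} chance of detection")
print("targets needed for 99%:", min_mean_targets(0.99))
```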


    Thus, a stringent definition of LoD would be the value that corresponds to five targets per reaction

    chamber, which in turn corresponds to a >99% chance of detection with one single-cell replicate.

    This stringent definition minimizes the number of false positives; however, it may exclude true

    positives. In other words, one can be very confident that a positive really is a positive, but some

    data may be excluded.

    To explore the effects of sensitivity on results, data can be analyzed using different values for LoD,

    ranging from stringent to relaxed. For example, the data used in the workflow section of this

    document indicates that 22 cycles is a stringent LoD Ct value. Thus, Log2Ex values could be

    calculated using LoD = 22, 23, 24, or 25, and each data set then analyzed to see if altering

    stringency impacts conclusions.
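One simple way to see how stringency changes a data set is to count how many samples register as detected at each candidate LoD. The Python sketch below assumes a sample is detected when its Log2Ex is greater than zero; the function name and the Ct values are invented for illustration.

```python
def detected_counts(ct_values, lod_candidates=(22, 23, 24, 25)):
    """For each candidate LoD Ct, count the samples whose Log2Ex is above zero."""
    return {
        lod: sum(1 for ct in ct_values if lod - ct > 0)
        for lod in lod_candidates
    }


# Hypothetical Ct values for one gene across seven single cells.
ct_values = [17.5, 20.1, 21.9, 22.4, 23.6, 24.8, 26.0]

for lod, count in detected_counts(ct_values).items():
    print(f"LoD = {lod}: {count} of {len(ct_values)} cells detected")
```

If downstream conclusions hold across the whole range, the exact choice of LoD is unlikely to matter for that experiment.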

    In the single-cell gene expression workflow, qPCR reactions are preceded by preamplification of

    cDNA. Statistically, 18-20 cycles of preamplification will result in an average of five copies of target

    per chamber from a single copy of cDNA. Preamplification can have efficiencies close to 100%, as

    reported by Devonshire et al. (2011). More details on preamplification and its effect on target

    concentration are discussed in Section 3 (Appendices).

    The foregoing discussion indicates that the single-cell protocol should be fairly robust even if only a

    single cDNA molecule is generated in the reverse transcription reaction on the mRNA from a single cell.

    Of course, the overall limit of detection is critically dependent on the efficiency of the reverse

    transcriptase. Furthermore, this efficiency probably varies depending on the transcript and the

    location of the assay amplicon within the transcript. Although reverse transcriptase efficiency deserves

    closer scrutiny, it will not be explored here. Also, the overall availability of RNA after cell lysis will

    have an effect on the limit of detection for single-cell gene expression.

    Qualification of Assays Prior to Single-Cell Experiments

    There are two reasons to test assays on cDNA prepared from bulk RNA before embarking on analyzing single cells. First, when using DNA binding dye assays, such as DELTAgene™ Assays, the data are used to determine the correct Tm range for the amplicon generated by each assay. For this purpose, it is best to use bulk RNA from the same or similar cells as the single cells to be studied, so that splice variants will be the same as in the single cells. If bulk samples are not available, then appropriate tissue-specific or universal RNA or cDNA can be purchased from various vendors. Second, the data are used to estimate an LoD Ct value for use in data analysis.

    These two properties, Tm and LoD Ct, are characteristics of the qPCR assay and not of the reverse transcriptase step or preamplification step. Therefore, this qualification test is performed using dilutions of preamplified cDNA in order to focus on the qPCR assays.

    For the purpose of empirically estimating a LoD Ct value, six replicates of each dilution

    concentration are run. For each assay, a preliminary LoD Ct is determined by taking the average Ct

    for the most dilute sample that has positive amplification plots for all six replicates. Because of the

    approximate nature of this LoD Ct value, it is reasonable to use it for any additional primer pairs

    that are added to the experiment.
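The rule for the preliminary LoD Ct can be sketched in Python as follows. The data layout (a list of relative concentrations paired with replicate Ct values, with `None` marking a replicate that did not amplify), the function name, and the numbers are assumptions for illustration, not the format exported by the BioMark software.

```python
import statistics


def preliminary_lod_ct(dilution_series, n_replicates=6):
    """Average Ct of the most dilute sample in which every replicate amplified.

    dilution_series: list of (relative_concentration, [Ct or None, ...]).
    Returns None when no dilution has a full set of positive replicates.
    """
    complete = [
        (conc, cts)
        for conc, cts in dilution_series
        if len(cts) == n_replicates and all(ct is not None for ct in cts)
    ]
    if not complete:
        return None
    _, cts = min(complete, key=lambda item: item[0])
    return statistics.fmean(cts)


# Hypothetical dilution series for one assay.
series = [
    (1.0, [15.1, 15.0, 15.2, 15.1, 14.9, 15.0]),
    (0.1, [18.4, 18.5, 18.3, 18.6, 18.4, 18.5]),
    (0.01, [21.8, 22.1, 22.0, 21.9, 22.3, 21.9]),
    (0.001, [25.2, None, 24.8, None, 25.5, None]),
]
print(preliminary_lod_ct(series))
```

In this invented series the 0.001 dilution has failed replicates, so the estimate comes from the 0.01 dilution.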

    The LoD Ct value is most drastically affected by platform. For any particular platform, however, the

    exact LoD Ct value is somewhat arbitrary and probably will not drastically impact the interpretation

    of a single-cell experiment. As discussed above, this can be tested by first using the stringent LoD Ct

    value, then increasing it in one-cycle increments and seeing how this affects the results.

    Elimination of Cells or Genes from Subsequent Analysis

    It can be difficult to decide which cells can be eliminated from analysis due to abnormally low expression. Using low (or no) expression of a single control gene is not a reliable metric for excluding cells from the data set because the level of expression of any single gene (including housekeeping genes) can vary widely between single cells. Using multiple control genes in single-cell experiments allows greater confidence in eliminating samples, as cells with low expression across several genes are likely to be abnormal.

    We suggest including three highly-expressed, monophasic control genes in the set of assays used to

    interrogate the cells. The standard deviation of the control genes can be calculated, as well as a

    cutoff Ct that is three standard deviations below the mean, as shown in Figure 10. Cells whose

    expression is below the cutoff Ct for at least two of the three control genes can be eliminated.


    Figure 10: Cutoff Ct three standard deviations below mean
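The filtering rule for the three control genes can be sketched in Python. This example works on an expression scale such as Log2Ex, where lower values mean lower expression, and places the cutoff at the mean minus three standard deviations; that reading of "below the cutoff" is an assumption (on raw Ct values the equivalent test would be Ct above the mean plus three standard deviations). The function name and data layout are hypothetical.

```python
import statistics


def low_expression_cells(expr_by_gene, n_sd=3.0, min_genes_below=2):
    """Return cells falling below the cutoff for at least min_genes_below control genes.

    expr_by_gene: {control_gene: {cell_id: expression}}, higher = more expressed.
    The cutoff for each gene is mean - n_sd * sample standard deviation.
    """
    times_below = {}
    for gene, by_cell in expr_by_gene.items():
        values = list(by_cell.values())
        cutoff = statistics.fmean(values) - n_sd * statistics.stdev(values)
        for cell, value in by_cell.items():
            if value < cutoff:
                times_below[cell] = times_below.get(cell, 0) + 1
    return sorted(
        cell for cell, count in times_below.items() if count >= min_genes_below
    )
```

Note that with a sample standard deviation no cell can fall more than three standard deviations from the mean unless at least 11 cells are measured, so the rule is only meaningful for reasonably sized cell populations.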

    Normalization

    The ΔΔCt method (Livak and Schmittgen, 2001) may not be best for identifying differences among the

    single cells being analyzed. Normalization should be considered a variable that can be tried to see if

    it has any significant effect on the analysis of the expression data. Normalizing to a single reference

    gene that is varying 10- to 1,000-fold at the single-cell level is generally not useful. Guo et al. (2010)

    normalized using the average of ActB and Gapdh Log2Ex values. One way that normalization might be

    beneficial is by reducing variation due to differing cell size. It is not necessary to normalize Log2Ex data

    on a per-cell basis. In fact, many single-cell publications have not used any cell-based normalization.

    Vandesompele et al. (2002) describe the geNorm method, a robust way to use multiple reference

    genes to determine a normalization factor.

    Figure 11 depicts an example where normalization does not seem to have much effect on data

    analysis. Guo et al. (2010) performed PCA on expression data from 159 single cells derived from 64-

    cell stage mouse embryos. Prior to the analysis, they normalized their data using the average of ActB

    and Gapdh Log2Ex values. Here, PCAs have been repeated using unnormalized data and median

    Log2Ex normalized data.


    Figure 11: Example where normalization does not greatly affect data analysis

    From Guo et al. (2010)

    The distributions of single cells in these three plots do not seem to be significantly different, indicating

    that normalization would have little effect on data interpretation in this particular case.

    We suggest normalizing such that each cell has the same median Log2Ex value across all genes detected in

    that cell. This ensures that the normalization factor includes data from all genes in the study.

    Secondary Analysis

    Even if normalization issues are addressed by using data from multiple genes, as recommended earlier, the

    ΔΔCt method focuses on genes one at a time. With the expression of each gene varying 10- to 1,000-fold,

    it may be difficult to discern reliable patterns in data from any single gene. For lower expressed genes,

    analysis is complicated by the fact that a transcript may not be detected in a particular cell purely due to

    stochastic noise, not due to lack of expression. Rather, some form of multivariate analysis, such as

    hierarchical clustering or principal component analysis, will be more fruitful in identifying subpopulations

    with similar gene expression signatures.

    The purpose of this section is to focus on the minimum steps required to process single-cell data to make

    it ready for secondary analysis, rather than to explore all available methods of secondary analysis. In

    order to provide additional guidance, below is a tabulated list of published research that used the

    BioMark System to obtain single-cell gene expression data, and the secondary analysis methods that were

    used in each. They shed additional light on ways to analyze single-cell data for biological insight.


    Publication            Field                   Violin Plots   Plus/Minus   Pairwise Correlation   HC   PCA   LDA   DTA   JSD
    Buganim et al. 2012    Stem cells
    Guo et al. 2010        Developmental Biology
    Flatz et al. 2011      Immunology
    Dalerba et al. 2011    Cancer
    Pang et al. 2011       Neuroscience
    Vincent et al. 2011    Developmental Biology
    Aguilo et al. 2011     Stem cells

    Table 1: Comparison of secondary analysis methods in published research using BioMark for single-cell gene expression (HC =
    Hierarchical Clustering; PCA = Principal Component Analysis; LDA = Linear Discriminant Analysis; DTA = Decision Tree Analysis;
    JSD = Jensen-Shannon Divergence)


    Section 2

    The SINGuLAR Workflow


    Key steps in single-cell gene expression analysis are depicted in Figure 12 below. Two powerful tools, the

    Fluidigm Real-Time PCR Analysis Software and the SINGuLAR™ package, are used in combination, either to

    process data or to perform the analysis.

    Figure 12: The Analysis Workflow

    The Fluidigm SINGuLAR Package

    SINGuLAR leverages R's statistical computing capability to streamline data preparation and analysis. Among other things, the data processing ability of SINGuLAR enables users to:

    1. Estimate Limit of Detection (LoD) Ct values

    2. Generate Log2Ex values

    For data analysis and representation, SINGuLAR permits users to:

    1. Create violin plots

    2. Perform multivariate analyses such as hierarchical clustering and principal component analysis (PCA)

    Installing R

    NOTE: If you have already installed R and SINGuLAR, you can skip this section and proceed directly to creating the SINGuLAR directory for data analysis.


    Figure 13: The SINGuLAR Workflow

    Installing R

    1. Download the latest version of R for Windows. To do this, go to http://www.r-project.org/ and

    download from the Berkeley CRAN mirror located at http://cran.cnr.Berkeley.edu.


    2. Run the downloaded .exe file. A setup wizard will walk you through installation. Choose to install the

    base version only.

    Installing SINGuLAR

    1. Download fluidigmSC_.zip by logging in to the Fluidigm single-cell analysis tools

    web page.

    2. Open R. You will be taken to the R-GUI.

    3. From the menu bar select Packages > Install package(s) from local zip files and select the file

    named fluidigmSC_.zip.

    4. At the R command prompt, type

    library(fluidigmSC)

    5. Hit Enter and type

    fluidigmSC.firstrun()

    6. When prompted, set the CRAN mirror for the session and hit Enter. The mirror is used to install

    additional packages; selecting the nearest one reduces network load.

    7. To download from Berkeley, select the USA(CA1) mirror. Selecting a mirror is required to continue

    downloading. It ensures that you receive R updates and have access to online help.


    8. The R GUI will display a series of messages. You can now proceed to create the SINGuLAR directory

    for data analysis.

    Creating the SINGuLAR Directory for Data Analysis

    1. To load SINGuLAR, at the R command line, type

    library(fluidigmSC)

    2. Navigate to File > Change dir to set the working directory for this session.

    NOTE: Data files calculated by SINGuLAR will automatically get saved to this directory. The working directory could match the location of the single-cell data exported from the Fluidigm Real-Time PCR Analysis software.


    Preparing BioMark System Results

    SINGuLAR supports both 48.48 and 96.96 Dynamic Array IFCs. The examples in this document primarily use

    the 96.96 IFCs.

    1. Process data using the Fluidigm Real-Time PCR Analysis software.

    2. Export the data as heat map results (.csv files) as described earlier in this document.

    Estimating the Limit of Detection (LoD) Ct Value

    Background information on LoD is available in Section 1 of this document.

    To experimentally determine the LoD for greater accuracy in estimating the LoD Ct value, one can perform a

    qPCR experiment on cDNA prepared from bulk RNA.

    NOTE: Appendix 1 provides a detailed protocol for assay qualification. Please follow the setup carefully to ensure that assay data is formatted correctly for subsequent analysis.

    Option 1: Experimental Determination of LoD

    To estimate LoD, type in the following command at the R command line.

    fluidigmSC.LoD(number of replicates, number of samples, number of assays)

    For example, as described in Appendix 1, for a run with six replicates of each dilution using 96 samples and

    96 assays, your command would look like this:

    fluidigmSC.LoD(6, 96, 96)

    A file selection window will open. Select the .csv file that contains your assay qualification experiment.

    SINGuLAR will return the estimated LoD Ct value.

    Option 2: Iterative Determination of LoD

    If an assay qualification run has not been performed for all assays, we suggest using the conservative LoD Ct
    value of 22 for the initial run. Because the exact LoD Ct value is somewhat arbitrary and probably will not have a
    drastic impact on the overall interpretation of a single-cell experiment, the user can start with this
    stringent LoD Ct value and then go back and increase the value in one-cycle steps to see how this affects
    results. That is, to decrease stringency, the LoD Ct value can be increased to 23, 24, 25, and so on, and the
    single-cell experiments reanalyzed to see whether changing stringency has any effect on the conclusions.
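The iterative check described above can be sketched outside of SINGuLAR as a simple sweep over candidate LoD Ct values. The snippet below is only an illustration with made-up Ct values; `detected_fraction` is a hypothetical helper, and it treats a reaction as detected when its Ct is below the LoD Ct.

```python
def detected_fraction(ct_values, lod_ct):
    """Fraction of reactions whose Ct is below the LoD Ct."""
    detected = sum(1 for ct in ct_values if ct < lod_ct)
    return detected / len(ct_values)


# Hypothetical Ct values; 999 marks a reaction with no amplification.
ct_values = [14.2, 19.8, 21.5, 22.4, 23.1, 24.7, 26.0, 999.0]

for lod_ct in (22, 23, 24, 25):
    print(lod_ct, detected_fraction(ct_values, lod_ct))
```

If the fraction of detected reactions, and the conclusions drawn from them, barely change between neighboring LoD Ct values, the exact choice is unlikely to matter for that experiment.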

    Removing Failed Data Points and Low Expression Cells

    Genes that are not detected in any of the single cells in the study can be eliminated. Optionally, genes

    expressed in fewer than 5% to 10% of the single cells can be eliminated. Sample and assay numbers and

    experimental layouts are unique for each experiment and the decision to remove failed data points and low

    expression cells must be made for each specific experiment. Appendices 2 and 3 cover these procedures in

    detail.

    Loading and Analyzing Data for Single-Cell Experiment Results with SINGuLAR

    1. Navigate to the R command line.

    2. Enter an R command in the following format:

    fluidigmSC.analysis(number of assays, number of samples, LoD=22, violin=TRUE, HC=TRUE,

    PCA=number of principal components)

    NOTE: Starting with two principal components is highly recommended.

    If you are using 96.96 Dynamic Array IFCs, then you will enter:

    fluidigmSC.analysis(96, 96, LoD =22, violin=TRUE, HC=TRUE, PCA=2)

    3. A file selection window will open. Select the heatmap.csv file, exported from the Fluidigm Real-Time

    PCR Analysis software, containing your single-cell experimental data.

    NOTE: To analyze data from multiple Dynamic Array runs, please refer to the section on Analyzing Multiple Chip Runs with SINGuLAR.

    Single-Cell Data Analysis Performed Using Fluidigm SINGuLAR

    Graphics displaying violin plots, a hierarchical clustering map, a scree plot ranking the importance of each
    principal component axis, and a principal component plot will be generated. The resulting single-cell qPCR
    data will be expressed as log base 2 (Log2Ex) values. Log2Ex values are calculated as Log2Ex = LoD Ct - Ct, so
    that higher values indicate higher expression. If the
    Log2Ex is negative, then it will be replaced with zero. The calculated Log2Ex values are exported to a .csv

    file named Log2Ex_data.csv and saved in the working directory that you set for this SINGuLAR session. Gene

    names will appear in Row 1 and sample names in Column A, in the order they were entered in the Fluidigm

    Real-Time PCR Analysis software.
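For readers who want to check the exported values by hand, the following minimal sketch reproduces the Log2Ex conversion, assuming the convention Log2Ex = LoD Ct - Ct with negative results replaced by zero. The function names and Ct values are hypothetical and are not part of the fluidigmSC package.

```python
def log2ex(ct, lod_ct=22.0):
    """Convert one Ct value to Log2Ex = LoD Ct - Ct, floored at zero."""
    return max(lod_ct - ct, 0.0)


def log2ex_table(ct_rows, lod_ct=22.0):
    """Apply log2ex to every Ct in a list of per-sample rows."""
    return [[log2ex(ct, lod_ct) for ct in row] for row in ct_rows]


# Hypothetical Ct values for two cells and three assays;
# 999 marks a reaction with no positive amplification plot.
ct_rows = [
    [15.0, 21.5, 999.0],
    [18.25, 24.0, 22.0],
]
print(log2ex_table(ct_rows))
```

Because undetected reactions carry a Ct of 999, they fall far above any reasonable LoD Ct and end up with a Log2Ex of zero, the same as any reaction at or beyond the limit of detection.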

    Figure 14: Spreadsheet exported as .csv file

    Violin Plots

    Violin plots display the distribution and frequency of Log2Ex values. Genes and assays in the plot are

    arranged in decreasing order of standard deviation of the Log2Ex values.
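The ordering rule can be illustrated with a few lines of code. This is a sketch only: the gene names and values are invented, and it assumes the sample standard deviation, which may differ from what SINGuLAR computes internally.

```python
from statistics import stdev


def order_by_sd(log2ex_by_gene):
    """Return gene names sorted by decreasing standard deviation of Log2Ex."""
    return sorted(log2ex_by_gene, key=lambda g: stdev(log2ex_by_gene[g]), reverse=True)


# Hypothetical Log2Ex values across four cells.
log2ex_by_gene = {
    "GeneA": [5.0, 5.5, 5.2, 4.9],
    "GeneB": [0.0, 8.0, 0.0, 7.5],
    "GeneC": [3.0, 4.0, 2.0, 5.0],
}
print(order_by_sd(log2ex_by_gene))
```

Genes with an on/off pattern across cells, such as the hypothetical GeneB, sort to the front, which is why the most variable genes appear first in the plot.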


    Figure 15: Violin plots generated in R

    To save the violin plot or to copy it to another location, right-click on the plot within the R window.

    Hierarchical Clustering

    SINGuLAR performs unbiased hierarchical clustering (HC) on your data and presents it as a heat map. The

    reordered data are exported to a .csv file named Hierarchical_clustering_sorted_data.csv and saved in the

    working directory that you set up for this SINGuLAR session.

    Figure 16: Sample heat map


    To save the HC heat map or to copy it to another location, right-click on it within the R window.

    Principal Component Analysis (PCA)

    The PCA algorithm reduces the dimensionality of a data set by transforming it into a new set of uncorrelated
    variables with decreasing degrees of variability. The uncorrelated variables are called principal components.
    The first principal component explains the most variation in the data set, indicating the highest amount of
    variability among the samples. Each succeeding component, in turn, explains the next highest variance in
    the data under the constraint that it is uncorrelated with the preceding components.

    SINGuLAR produces two plots about the principal components: a PCA scree plot and a scatter plot.

    The scree plot displays the first ten PC scores, the height of each bar indicating the PC score. This provides

    a quick way to determine the number of principal components to use. For example, in the scree plot in

    Figure 17, you can see that there is a large height difference between the second and third bars, indicating

    that the first two principal components can be used and they will contain most of the original data variance.

    Once the number of principal components has been identified from the scree plot, the command can be

    repeated using that number.
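Reading the scree plot by eye can be approximated with a simple heuristic: keep the components that come before the largest drop between adjacent bars. The sketch below assumes the bar heights represent the variance attributed to each component; the numbers are invented, and the heuristic is an illustration rather than the rule SINGuLAR applies.

```python
def explained_fraction(variances):
    """Share of the total variance carried by each component."""
    total = sum(variances)
    return [v / total for v in variances]


def components_before_largest_drop(variances):
    """Number of leading components before the largest drop between adjacent bars."""
    drops = [variances[i] - variances[i + 1] for i in range(len(variances) - 1)]
    return drops.index(max(drops)) + 1


# Hypothetical bar heights for the first six components.
variances = [12.0, 9.0, 2.0, 1.5, 1.0, 0.5]
n_components = components_before_largest_drop(variances)
print(n_components, sum(explained_fraction(variances)[:n_components]))
```

In this made-up example the largest drop falls between the second and third bars, so two components are kept, mirroring the situation described for Figure 17.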

    The scatter plot graphs each principal component score on a separate axis. To find the label for any axis

    within the plot, trace that axis outward until the PCA score label is found.

    Figure 17: Scree and scatter plots


    The PC scores for all samples for the first 10 principal components are exported to a file named

    PCA_rotated_data.txt and saved in the working directory that you set for this SINGuLAR session. The file

    should subsequently be opened in Microsoft Excel. To save the scatter plot or to copy it to another location,

    right-click on it within the R window.

    Loading and Individually Analyzing Data for Single-Cell Experiments

    SINGuLAR enables you to perform several data analyses with a single command but also permits the flexibility to run the same analyses individually.

    To Calculate Log2Ex

    1. To express your single-cell data in log base 2, type in the following command at the R command line.

    fluidigmSC.analysis(number of assays, number of samples, LoD=22)

    2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will return your data in log base 2.

    To Generate a Violin Plot

    1. To plot your gene expression data as a violin plot, type in the following command at the R command

    line.

    fluidigmSC.analysis(number of assays, number of samples, LoD=22, violin=TRUE)

    2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will generate a violin plot of your data.

    To Generate a Hierarchical Cluster Heat Map

    1. To perform hierarchical clustering on your gene expression data, type in the following command at

    the R command line.

    fluidigmSC.analysis(number of assays, number of samples, LoD=22, HC=TRUE)

    2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will generate a hierarchical cluster heat map.


    To Perform PCA

    1. To perform PCA on your gene expression data, type in the following command at the R command

    line.

    fluidigmSC.analysis(number of assays, number of samples, LoD=22, PCA=number of principal components to plot)

    2. A file selection window will open. Select the .csv file that contains your single-cell experiment. SINGuLAR will generate scree and scatter plots for your data.

    Analyzing Multiple Chip Runs with SINGuLAR

    Samples from different Dynamic Array IFC runs can be analyzed together. SINGuLAR will discard assays that

    differ between experiments and will analyze only those assays that are common in all the experiments.

    NOTE: Please ensure that every sample name is unique: no two names should match, even if they are the same sample from different runs. For example, if you have three runs of sample A, label them sampleA-1, sampleA-2, and sampleA-3. It is also helpful to name .csv files so that their filenames indicate the number of samples and the number of assays in the export.

    Duplicate sample names will cause an error in the R scripts.
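Before combining runs, it can help to check both conditions described above: which assays are common to every run, and whether any sample name is repeated. The following sketch shows one way to do that; the run dictionaries, assay names, and helper functions are hypothetical and independent of the fluidigmSC package.

```python
def common_assays(runs):
    """Assays present in every run, in the order they appear in the first run."""
    shared = set(runs[0]["assays"])
    for run in runs[1:]:
        shared &= set(run["assays"])
    return [assay for assay in runs[0]["assays"] if assay in shared]


def duplicate_samples(runs):
    """Sample names that occur more than once across all runs."""
    seen, dupes = set(), set()
    for run in runs:
        for name in run["samples"]:
            if name in seen:
                dupes.add(name)
            seen.add(name)
    return sorted(dupes)


# Hypothetical runs; the second run reuses the name "sampleB-1".
runs = [
    {"assays": ["Actb", "Gapdh", "Nanog"], "samples": ["sampleA-1", "sampleB-1"]},
    {"assays": ["Gapdh", "Actb", "Sox2"], "samples": ["sampleA-2", "sampleB-1"]},
]
print(common_assays(runs))
print(duplicate_samples(runs))
```

An empty list from `duplicate_samples` means every sample name is unique across the runs being combined.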

    1. To perform the single-cell experiment analysis on combined data from multiple Dynamic Array IFC

    runs, type the following command at the R command line:

    fluidigmSC.analysis(number of assays, number of samples, LoD =22, expt=number of data sets,

    violin=TRUE, HC=TRUE, PCA=number of principal components)

    2. A file selection window will open. Select all the .csv files that contain your single-cell experiment

    data. For example, if you have the following setup:

             Number of Assays   Number of Samples
    Run 1    96                 96
    Run 2    96                 90
    Run 3    96                 72


    Then you would type the following R command:

    fluidigmSC.analysis(c(96,96,96), c(96,90,72), LoD =22, expt=3, violin=TRUE, HC=TRUE, PCA=2)

    Identifying Points in the PCA Graph

    1. Specify two PCA components that you are interested in and type a locate command in the R console. If, for example, you are interested in components 1 and 2, you will type:

    locate


    Section 3

    Appendices


    Appendix 1: Protocol for Qualification of Assays

    A detailed protocol is available in Appendix B of the Fluidigm Real-Time PCR Analysis Software User Guide (PN 68000088).

    Determining Limit of Detection Threshold Cycle (LoD Ct) Value Using All Assays

    To estimate an LoD Ct value, six replicates are run of each dilution sample.

    For each assay, a preliminary LoD Ct is determined by taking the average Ct for the most dilute

    sample concentration that has positive amplification plots for all six replicates.

    A stringent LoD Ct value would be the Ct corresponding to five target molecules per reaction chamber. At

    this low concentration, there is considerable stochastic noise due to the Poisson distribution that affects

    detection and actual Ct value (see Figure 9 and its accompanying explanation). The goal therefore is to

    estimate a reasonable LoD Ct value using six replicates without precisely determining the Ct corresponding

    to five target molecules per reaction chamber.

    Concentration expressed as average       Probability that all six replicates
    target molecules per reaction chamber    have a positive amplification plot
    1                                        0.064
    2                                        0.418
    3                                        0.736
    4                                        0.895
    5                                        0.960
    6                                        0.985
    7                                        0.995
    8                                        0.998

    Table 2: Probability that all six replicates have positive amplification plots
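The values in Table 2 are consistent with a simple Poisson model, sketched below under the assumption that a replicate gives a positive amplification plot whenever its reaction chamber contains at least one target molecule. With an average of n molecules per chamber, the chance of an empty chamber is e^(-n), so the chance that all six replicates are positive is (1 - e^(-n))^6. The function name is hypothetical.

```python
import math


def p_all_positive(mean_molecules, replicates=6):
    """Probability that every replicate chamber holds at least one molecule."""
    p_positive = 1.0 - math.exp(-mean_molecules)  # Poisson P(count >= 1)
    return p_positive ** replicates


for n in range(1, 9):
    print(n, round(p_all_positive(n), 3))
```

The steep rise between one and four molecules per chamber is what makes the most dilute all-positive concentration a reasonable, if approximate, marker for the limit of detection.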

    Thus, the preliminary LoD Ct values determined for each assay probably correspond to concentrations

    ranging from two to 10 target molecules per reaction chamber. The overall LoD Ct is then selected as the

    median of all the preliminary LoD Ct values rounded up to the next highest whole cycle. Because of the

    approximate nature of this LoD Ct value, it may be used for any subsequent assays that are used even if they

    were not run in this experiment.
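The selection rule described above can be sketched as follows. This is an illustration, not the logic of fluidigmSC.LoD: the data layout, function names, and Ct values are hypothetical, and a Ct of 999 is taken to mean that no positive amplification plot was detected.

```python
import math
from statistics import mean, median


def preliminary_lod_ct(replicates_by_concentration, no_call=999.0):
    """Mean Ct at the most dilute concentration where every replicate amplified.

    replicates_by_concentration maps a relative concentration to its replicate Cts.
    Returns None if no concentration has all replicates positive.
    """
    for conc in sorted(replicates_by_concentration):  # most dilute first
        cts = replicates_by_concentration[conc]
        if all(ct < no_call for ct in cts):
            return mean(cts)
    return None


def overall_lod_ct(preliminary_values):
    """Median of the per-assay values, rounded up to a whole cycle.

    math.ceil leaves a median that is already a whole number unchanged.
    """
    return math.ceil(median(preliminary_values))


# Hypothetical six-replicate Cts for three assays at relative concentrations 1, 2 and 4.
assays = {
    "AssayA": {1: [24.0, 999.0, 24.5, 24.0, 999.0, 24.5], 2: [24.0, 24.5] * 3},
    "AssayB": {1: [25.0, 25.5] * 3},
    "AssayC": {1: [999.0] * 6, 2: [23.0, 999.0, 23.0, 23.0, 23.0, 23.0], 4: [22.5] * 6},
}
preliminary = [preliminary_lod_ct(dilutions) for dilutions in assays.values()]
print(preliminary, overall_lod_ct(preliminary))
```

Using the median rather than the mean keeps a single poorly performing assay from shifting the overall LoD Ct.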

    Finally, the exact LoD Ct value is somewhat arbitrary and probably will not have a drastic effect on the

    interpretation of a single-cell experiment. As discussed above, this can be tested by first using the stringent


    LoD Ct value described here and then going back and increasing the value in one-cycle increments and

    seeing how this affects the Log2Ex results.

    Preamplification

    1. Prepare the following mixture:

    8 µL   2.5 ng/µL Biochain Human Universal cDNA (PN C4234565-R) or appropriate cDNA standard
    2 µL   500 nM each PreAmp Primers (pool of all assays)
    10 µL  2x AB TaqMan PreAmp Master Mix (PN 4391128)

    2. Transfer the mix to the thermal cycler and run the following protocol:

    Cycle     Step     Temperature   Time (minutes:seconds)
    1 (1X)    Step 1   95 °C         10:00
    2 (14X)   Step 1   95 °C         00:15
              Step 2   60 °C         04:00
    3 (1X)    Step 1   4 °C          hold

    3. Prepare the following mixture:

    2 µL   20 units/µL Exonuclease I (New England BioLabs, PN M0293L)
    1 µL   10X Exonuclease I Reaction Buffer
    7 µL   H2O

    4. Add 8 µL of this mixture to the preamplified sample.

    5. Transfer to the thermal cycler and run the following protocol:

    Cycle     Step     Temperature   Time (minutes:seconds)
    1 (1X)    Step 1   37 °C         30:00
    2 (1X)    Step 1   80 °C         15:00
    3 (1X)    Step 1   4 °C          hold

    6. Add 72 µL TE (10 mM Tris, pH 8.0, 1.0 mM EDTA) (TEKnova, PN T0224).

    7. Store at -20 °C.

    Preparation of 1:2 Dilutions

    1. Prepare a mixture of 1560 µL TE + 40 µL 10% Tween-20.

    2. Prepare the following dilutions in 1.5 mL tubes, vortexing and centrifuging after each dilution.

    Table 3: Dilution Table

    3. Transfer the samples to 96-well plates for ease of loading into IFCs.

    4. Store at -20 °C.

    qPCR Detection

    1. Prime the chip.

    2. Prepare the following mixture:

    420 µL 2X SsoFast™ EvaGreen Supermix with Low ROX

    42 µL 20X DNA Binding Dye Sample Loading Reagent 7 µL H2O

    18 µL H2O

    3. Add 20 µL to each of 16 wells (the first two columns of the 96-well plate).

    4. Add 15 µL of diluted sample to each well.

    5. Vortex gently and centrifuge.

    6. Mix 0.3 µL assays (100 µM each combined F+R primers) with 2.7 µL DNA suspension buffer (TEKnova, PN)

    and 3 µL Assay Loading Reagent in a 96-well plate.

    7. Dispense 5 µL of DELTAgene Assays to detector inlets of the 96.96 IFC.

    8. Dispense 6 × 5 µL of each dilution sample + SsoFast MM to sample inlets of the 96.96 array.

    9. Load the chip.

    10. Run GE Fast 96x96 PCR+Melt v2.pcl


    Segment   Type              Temperature (°C)   Duration (seconds)   BioMark HD Ramp Rate (°C/s)
    1         Thermal Mix       70                 2400                 5.5
                                60                 30                   5.5
    2         Hot Start         95                 60                   5.5
    3         PCR (30 Cycles)   96                 5                    5.5
                                60                 20                   5.5
    4         Melting Curve     60-95                                   1 °C / 3 seconds

    Fluidigm DELTAgene Assay Qualification

    NOTE: DELTAgene assays are DNA binding dye-based detection assays. If you are using TaqMan assays, please follow the assay qualification procedure at http://tinyurl.com/ctuavdx. For probe-based assays, use the Auto (Detector) Ct Threshold Method.

    Run the First Chip

    1. Use the protocol for assay qualification described earlier.

    2. Annotate samples and detectors in the Sample Setup and Detector Setup windows, respectively.

    3. Analyze the data using the Linear (Derivative) Baseline Correction Method and the Auto (Global) Ct Threshold Method.

    Set the Tm Range for Each Assay

    The BioMark HD system allows users to identify and eliminate data from non-specific amplification, thereby

    improving specificity and sensitivity. This is done by adjusting the Tm window in the data analysis software.

    NOTE: The Fluidigm Real-Time PCR Analysis Software User Guide, downloadable from http://www.fluidigm.com/product-documents.html, provides a detailed procedure for Tm range selection. Be sure to select the Linear Derivative with Auto Global options.


    Figure 18: Selecting the Tm range

    Export Detector.plt

    1. From the Detector Setup window, export the Detector.plt file. Use a filename appropriate for the set

    of assays being analyzed. The Tm range information gets retained as part of the Detector.plt file.

    2. Later, when you use the same set of assays to analyze single cells, use the Import button to import

    the Detector.plt file. This ensures that assay information gets added to the chip run and that Tm

    ranges selected in the qualification run are automatically applied to the single-cell data.

    Export Heat Map Results

    Save the chip run file and navigate to File > Export to export the Ct data. Heat map results are

    exported to Microsoft Excel in comma-delimited (.csv) format.

    Run Second Chip with Single-Cell Samples

    1. Flow-sort and process cells, or use the C1 system, and run single cells on a 96.96 Dynamic Array IFC

    following the guidelines in Appendix A of the Fluidigm Real-Time PCR User Guide, PN 68000088.

    NOTE: This analysis uses 96 single-cell samples as an example. It is often useful to include some

    control samples on the chip, but that topic will be discussed in subsequent documentation.


    2. In Sample Setup, annotate the sample information.

    3. In Detector Setup, import the detector.plt file generated in the assay qualification run. This will

    bring in the Tm range for each assay.

    4. Analyze the file once again to incorporate the sample and assay information as part of the chip run

    file. To do this:

    Make sure the Baseline Correction Method is still set to Linear (Derivative).

    Make sure the Ct Threshold Method is still set to Auto (Global).

    Click Analyze.

    5. Save the chip run file.

    6. Navigate to File > Export to export the Ct data to Microsoft Excel as Heat Map Results. The file will

    be in .csv format.


    Appendix 2: Removing Data Failed by Fluidigm Real-Time PCR Analysis Software

    Although the Fluidigm Real-Time PCR Analysis software fails reactions with an improper Tm, it does not

    change the Ct value determined from the amplification plot.

    To eliminate Ct values for reactions failed due to Tm:

    1. Open the Heat Map Results file in Excel.

    2. Copy the sample information in cells A113:B208 to A213:B308.

    3. Enter the formula =IF(C113="Pass",C13,999) in cell C213.

    NOTE: The formula uses 999 because the Fluidigm Real-Time PCR Analysis software reports a Ct value of 999 for any reaction in which a positive amplification plot is not detected.

    4. Copy the formula in cell C213 to fill the matrix C213:CT308.

    5. Save the file in .xls or .xlsx format.
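The spreadsheet formula above applies the same rule to every cell of the matrix: keep the Ct where the corresponding call is "Pass", otherwise substitute 999. A minimal sketch of that rule outside of Excel is shown below; the function name and the example matrices are hypothetical.

```python
NO_CALL_CT = 999.0  # value reported when no positive amplification plot is detected


def mask_failed(ct_rows, call_rows):
    """Keep a Ct only where the matching call is "Pass"; otherwise use 999."""
    return [
        [ct if call == "Pass" else NO_CALL_CT for ct, call in zip(ct_row, call_row)]
        for ct_row, call_row in zip(ct_rows, call_rows)
    ]


# Hypothetical 2 x 3 Ct and call matrices (rows are samples, columns are assays).
ct_rows = [[18.2, 21.7, 25.1], [17.9, 999.0, 20.4]]
call_rows = [["Pass", "Fail", "Pass"], ["Pass", "Fail", "Pass"]]
print(mask_failed(ct_rows, call_rows))
```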


    Appendix 3: Eliminating Low-Expressing Cells from Subsequent Analysis

    One way to eliminate cells or genes from subsequent analysis is to include at least two highly-expressed

    control genes in the set of assays used to interrogate the cells. To do this:

    1. Select at least two control genes that are highly expressed and are not expected to be differentially

    expressed in the cells being studied.

    2. Calculate the Log2Ex values for all genes and observe the expression histograms of the control genes

    to confirm that their transcript distribution is monophasic.

    3. Calculate the median and standard deviation for the control genes across all the single cells.

    4. For each control gene, determine a cutoff Ct value by taking the median Ct and

    subtracting three times the standard deviation of the Ct values for that gene.

    5. If the measured Cts are lower than the cutoff Cts for at least two control genes, eliminate that cell

    from further analysis.

    6. Replace the heatmap in the .csv export file from the BioMark with this new heatmap (without these

    samples) so that it can be loaded into the SINGuLAR package.

    7. Save the file.
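The cutoff procedure in steps 3 to 5 can be sketched as follows. The example assumes the values are on a scale where lower numbers mean lower expression (for instance Log2Ex), uses the sample standard deviation, and flags a cell when it falls below the cutoff for at least two control genes. All names and values are hypothetical.

```python
from statistics import median, stdev


def cutoffs(values_by_gene, n_sd=3.0):
    """Per-gene cutoff: median minus n_sd standard deviations."""
    return {g: median(v) - n_sd * stdev(v) for g, v in values_by_gene.items()}


def cells_to_drop(values_by_gene, min_genes=2, n_sd=3.0):
    """Indices of cells below the cutoff for at least min_genes control genes."""
    limits = cutoffs(values_by_gene, n_sd)
    n_cells = len(next(iter(values_by_gene.values())))
    dropped = []
    for i in range(n_cells):
        below = sum(1 for g, v in values_by_gene.items() if v[i] < limits[g])
        if below >= min_genes:
            dropped.append(i)
    return dropped


# Hypothetical values for three control genes across 20 cells.
# Cell 19 is low for two genes; cell 18 is low for only one.
controls = {
    "Control1": [10.0] * 19 + [0.0],
    "Control2": [12.0] * 19 + [1.0],
    "Control3": [8.0] * 18 + [0.0, 8.0],
}
print(cells_to_drop(controls))
```

Requiring two control genes to agree protects against discarding a cell because of a single gene's burst-like variation.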


    Appendix 4: Normalizing Using Median Log2Ex

    NOTE: Sample and assay numbers and experimental layouts are unique to each experiment.

    1. Find the median of all Log2Ex values for each sample. To do this, use the command:

    =IFERROR(MEDIAN(All Log2Ex values for a single sample), )

    2. Compute the average of all sample Log2Ex median values.

    3. Calculate the difference between the average of medians and the median of each sample, and add that value

    (whether positive or negative) to every Log2Ex value for that sample. This command is:

    =IFERROR(Individual Log2Ex value for a sample - (Sample Median Log2Ex Value - Avg of Medians),0)

    These median normalized Log2Ex values can then be copied and pasted into the .csv export file from the

    BioMark and saved to be loaded into the SINGuLAR package.
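The same three steps can be written compactly in code. The sketch below takes the median over all Log2Ex values of each sample, as in the spreadsheet steps above; the function name and the values are hypothetical.

```python
from statistics import mean, median


def median_normalize(log2ex_rows):
    """Shift each sample so that its median equals the average of all sample medians."""
    medians = [median(row) for row in log2ex_rows]
    target = mean(medians)
    return [
        [value - (m - target) for value in row]
        for row, m in zip(log2ex_rows, medians)
    ]


# Hypothetical Log2Ex values: two samples (rows) by three genes (columns).
log2ex_rows = [[2.0, 4.0, 6.0], [5.0, 7.0, 9.0]]
print(median_normalize(log2ex_rows))
```

After normalization every sample has the same median, which here is the average of the original medians (5.5 in this made-up example).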


    Appendix 5: A Note on the Optimal Number of Cycles Needed for Preamplification

    In the single-cell gene expression workflow, the qPCR reactions are preceded by preamplification of

    cDNA. Statistically, 18-20 cycles of preamplification will result in an average of five copies of target per

    chamber from a single copy of cDNA. Preamplification can have efficiencies close to 100% as reported by

    Devonshire et al. (2011).

    Preamplification affects the limit of detection. For Dynamic Array IFCs, five target molecules per

    reaction chamber correspond to 625 molecules/µL in the 48.48 IFC and 730 molecules/µL in the 96.96

    IFC.


    References

    Aguilo, F., S. Avagyan, A. Labar, A. Sevilla, D. F. Lee, P. Kumar, I. R. Lemischka, B. Y. Zhou, and H. W. Snoeck (2011) Prdm16 is a physiologic regulator of hematopoietic stem cells, Blood 117:5057-5066.

    Bengtsson, M., A. Ståhlberg, P. Rorsman, and M. Kubista (2005) Gene expression profiling in single cells from the pancreatic islets of Langerhans reveals lognormal distribution of mRNA levels, Genome Research 15:1388-1392.

    Chubb, J. R., T. Trcek, S. M. Shenoy, and R. H. Singer (2006) Transcriptional pulsing of a developmental gene, Current Biology 16:1018-1025.

    Dalerba, P. et al. (2011) Single-cell dissection of transcriptional heterogeneity in human colon tumors, Nat Biotechnol 29:1120-1127.

    Devonshire, A. S., R. Elaswarapu, and C. A. Foy (2011) Applicability of RNA standards for evaluating RT-qPCR assays and platforms, BMC Genomics 12:118-127.

    Diehn, M. et al. (2009) Association of reactive oxygen species levels and radioresistance in cancer stem cells, Nature 458:780-783.

    Flatz, L. et al. (2011) Single-cell gene-expression profiling reveals qualitatively distinct CD8 T cells elicited by different gene-based vaccines, Proc Natl Acad Sci USA 108:5724-5729.

    Guo, G., M. Huss, G. Q. Tong, C. Wang, L. L. Sun, N. E. Clarke, and P. Robson (2010) Resolution of cell fate decisions revealed by single-cell gene expression analysis from zygote to blastocyst, Developmental Cell 18:675-685.

    Livak, K. J. and T. D. Schmittgen (2001) Analysis of relative gene expression data using real-time quantitative PCR and the 2^(-ΔΔCT) method, Methods 25:402-408.

    Pang, Z. P. et al. (2011) Induction of human neuronal cells by defined transcription factors, Nature 476:220-223.

    Raj, A., C. S. Peskin, D. Tranchina, D. Y. Vargas, and S. Tyagi (2006) Stochastic mRNA synthesis in mammalian cells, PLoS Biol 4:e309.

    Vandesompele, J., K. De Preter, F. Pattyn, B. Poppe, N. Van Roy, A. De Paepe, and F. Speleman (2002) Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes, Genome Biology 3:research0034.1-research0034.11.

    Vincent, J. J. et al. (2011) Single cell analysis facilitates staging of Blimp1-dependent primordial germ cells derived from mouse embryonic stem cells, PLoS ONE 6:e28960.

    Acknowledgements

    Fluidigm gratefully acknowledges the pioneering contributions of Dr. Paul Robson to single cell data analysis; the use of violin plots, principal component analysis, and unsupervised clustering was adopted from his work. We would also like to thank Dr. Robson and the Genome Institute of Singapore for providing the single cell gene expression data used in the SINGuLAR Practice Sets.


    World Headquarters 7000 Shoreline Court, Suite 100 South San Francisco, CA 94080 USA Tel: 650-266-6000 Fax: 650-871-7152 Fluidigm Europe, BV Parnassustoren Locatellikade 1, 1076 AZ Amsterdam Netherlands Tel: +33 (1) 60 92 42 40 Fax: +31 (0) 20 203 1111 Fluidigm Japan KK Level 5, Ginza TK Building 1-1-7 Shintomi Chuo-ku, Tokyo 104-0041 Japan Office: +81335552351 Fax: +8133552353 Fluidigm Singapore PTE Ltd Block 1026 Tai Seng Avenue #07-3532 Singapore 534413 Office: +6568587316 Fax: +6562825531

    Technical Support Email: [email protected]

    Phone in United States: 1.866.FLUIDLINE (1.866.358.4354)

    Outside the United States: 650.266.6100 On the Internet: www.fluidigm.com/support

    Visit our website at www.fluidigm.com

    PN 100-5066, Rev. B1
