© 2010 by University of Pennsylvania School of Medicine

Making Sense out of Flow Cytometry Data Overload

A crash course in R/Bioconductor and flow cytometry fingerprinting

Outline

• Background
  - R
  - Bioconductor
• Motivating examples
• Starting R, entering commands
• How to get help
• R fundamentals
  - Sequences and Repeats
  - Characters and Numbers
  - Vectors and Matrices
  - Data Frames and Lists
  - Importing data from spreadsheets
• flowCore
  - Loading flow cytometry (FCS) data
  - gating
  - compensation
  - transformation
  - visualization
• flowFP
  - Binning
  - Fingerprinting
  - Comparing multivariate distributions
• Writing your own functions
• Installing and running R on your computer
• Suggestions for further reading and reference

Background

• R
  - Is an integrated suite of software facilities for data manipulation, simulation, calculation and graphical display.
  - It handles and analyzes data very effectively and it contains a suite of operators for calculations on arrays and matrices.
  - In addition, it has the graphical capabilities for very sophisticated graphs and data displays.
  - It is an elegant, object-oriented programming language.
  - Started by Robert Gentleman and Ross Ihaka (hence “R”) in 1995 as a free, independent, open-source implementation of the S programming language (S-PLUS, the commercial implementation of S, is now part of TIBCO Spotfire)
  - Currently maintained by the R Core development team, an international group of hard-working volunteer developers

http://www.r-project.org
http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf

Background

• Bioconductor
  - “Is an open source and open development software project to provide tools for the analysis and comprehension of genomic data.”
  - Goals
      To provide widespread access to a broad range of powerful statistical and graphical methods for the analysis of genomic data.
      To provide a common software platform that enables the rapid development and deployment of extensible, scalable, and interoperable software.
      To further scientific understanding by producing high-quality documentation and reproducible research.
      To train researchers on computational and statistical methods for the analysis of genomic data.

http://bioconductor.org/overview

A motivating example

I’ve just collected data from a T cell stimulation experiment in a 96-well plate format. I need to gate the data on CD3/CD4. How consistent are the distributions, so that I can establish one set of gates for the whole plate?

Another motivating example

I’m concerned that drawing gates to analyze my data introduces unintended bias. Additionally, since I have multiple data files, drawing multiple gates is time consuming. Can I use R to compute gates and then apply these same objective gating criteria to multiple data files?

Another motivating example

Autogate lymphocytes and monocytes

Automatically analyze FMO tubes

Back to the basics

• R is a command-line driven program
  - the prompt is: >
  - you type a command (shown in blue), and R executes the command and gives the answer (shown in black)

Simple example: enter a set of measurements

• Use the function c() to combine terms together
• Create a variable named mfi
• Put the result of c() into mfi using the assignment operator <- (you can also use =)
• The [1] at the start of the output line is the index of the first element printed on that line; the result is a vector
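
A minimal sketch of the steps above. The measurement values are invented for illustration and are not the ones shown on the original slide.

```r
# Combine five made-up fluorescence intensities into a vector and
# store it in a variable named mfi (values are illustrative only).
mfi <- c(212.5, 198.0, 305.2, 187.9, 240.4)

# Typing the variable name prints it; the leading [1] is the index
# of the first element shown on that output line.
mfi

# Functions operate on the whole vector at once.
mean(mfi)
length(mfi)
```

Using = instead of <- gives the same result at the top level; <- is the more common convention in R code.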

Help, functions, polymorphism

> help (log)
> ?log
> apropos("log")
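
The slide title mentions polymorphism: the same function name can behave differently depending on the class of its argument. The original slide's example is not preserved in this transcript, so the sketch below uses base R's summary() as a stand-in; the variable names are illustrative.

```r
# log() takes an optional base argument, as documented in ?log.
log(100, base = 10)   # 2
log(exp(1))           # 1 (natural log is the default)

# summary() is a generic function: it dispatches on the class of x.
x_num <- c(1.5, 2.5, 10)
x_fac <- factor(c("CD4", "CD8", "CD4"))

s_num <- summary(x_num)   # minimum, quartiles, mean, maximum
s_fac <- summary(x_fac)   # counts per level
s_num
s_fac
```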

Vignettes – really good help!

Sequences and Repeats
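
The console session from this slide is not preserved here. As a minimal sketch of the usual base R tools for the topic (not the original example), sequences come from the colon operator and seq(), and repeats from rep():

```r
# The colon operator builds integer sequences.
wells <- 1:12                           # 1 2 3 ... 12

# seq() gives control over the step or the number of points.
by_two  <- seq(0, 10, by = 2)           # 0 2 4 6 8 10
five_pt <- seq(0, 1, length.out = 5)    # 0.00 0.25 0.50 0.75 1.00

# rep() repeats a whole vector (times) or each element (each).
rows_a <- rep(c("A", "B"), times = 3)   # "A" "B" "A" "B" "A" "B"
rows_b <- rep(c("A", "B"), each = 3)    # "A" "A" "A" "B" "B" "B"
```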

Characters and Numbers

• Characters and character strings are enclosed in "" or ''
• Special numbers
  - NA – “Not Available”
  - Inf – “Infinity”
  - NaN – “Not a Number”
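
A short sketch of how these values behave in practice; the marker and tube names are invented for illustration.

```r
marker <- "CD4"                            # double quotes
tube   <- 'FMO control'                    # single quotes work too
label  <- paste(marker, tube, sep = ": ")  # "CD4: FMO control"
nchar(marker)                              # 3

# Special values
x <- c(1, NA, 3)
mean(x)                  # NA: missing values propagate
mean(x, na.rm = TRUE)    # 2: drop missing values first
1 / 0                    # Inf
0 / 0                    # NaN
is.na(x)                 # FALSE TRUE FALSE
```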

Vectors and Matrices

• The subset operator for vectors and matrices is [ ]
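
A minimal sketch of [ ] on a vector and on a matrix, with made-up values:

```r
v <- c(10, 20, 30, 40, 50)
v[2]            # 20: a single element (indices start at 1)
v[c(1, 5)]      # 10 50: several elements at once
v[-1]           # 20 30 40 50: everything except the first
v[v > 25]       # 30 40 50: select by a logical condition

m <- matrix(1:6, nrow = 2)   # 2 rows, 3 columns, filled by column
m[1, 2]         # 3: row 1, column 2
m[, 3]          # 5 6: all of column 3
m[2, ]          # 2 4 6: all of row 2
```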

• You can extend the length of a vector via subsetting

… but not a matrix
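
A small sketch of the difference, with made-up values. The try() wrapper is only there so the script keeps running after the error.

```r
v <- c(1, 2, 3)
v[5] <- 10       # v grows to length 5; the gap is filled with NA
v                # 1 2 3 NA 10

m <- matrix(1:4, nrow = 2)
# Assigning outside a matrix's dimensions is an error.
res <- try(m[3, 1] <- 99, silent = TRUE)
inherits(res, "try-error")   # TRUE
dim(m)                       # 2 2: the matrix is unchanged
```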

• However, all’s not lost if you want to extend either the columns …

… or rows
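
The slide's code is not preserved in this transcript. Presumably it used base R's cbind() and rbind(), which is the standard way to add columns or rows; a minimal sketch under that assumption:

```r
m <- matrix(1:4, nrow = 2)

m_wide <- cbind(m, c(5, 6))   # bind a new column: 2 x 3
m_tall <- rbind(m, c(7, 8))   # bind a new row:    3 x 2

dim(m_wide)
dim(m_tall)
```

Both functions return a new matrix; m itself is unchanged unless the result is assigned back to it.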

Data Frames

• A Data Frame is like a matrix, except that the data type in each column need not be the same

  - Often, a Data Frame is created from an Excel spreadsheet: in Excel, Save As… a tab-delimited text file, then read that file with the function read.table()

Data Frames from spreadsheets
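
A self-contained sketch of the import step. To keep it runnable without a spreadsheet, the code first writes a small tab-delimited file to a temporary location; the column names and values are invented for illustration.

```r
# Stand-in for a spreadsheet saved as tab-delimited text.
path <- tempfile(fileext = ".txt")
writeLines(c("well\tstim\tpct_cd4",
             "A01\tnone\t41.2",
             "A02\tPMA\t38.7",
             "A03\tSEB\t44.9"), path)

# header = TRUE takes column names from the first row;
# sep = "\t" says the fields are separated by tabs.
plate <- read.table(path, header = TRUE, sep = "\t",
                    stringsAsFactors = FALSE)

str(plate)                    # each column keeps its own type
plate$pct_cd4                 # one column, as a numeric vector
plate[plate$pct_cd4 > 40, ]   # rows selected with [ ], as for a matrix
```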

Lists
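
The slide's example is not preserved in this transcript. As a minimal sketch (not the original), a list is a container whose elements can differ in type and length; the names and values below are illustrative.

```r
tube <- list(name    = "Tube 1",
             markers = c("CD3", "CD4", "CD8"),
             events  = 25000)

tube$markers          # access an element by name with $
tube[["events"]]      # [[ ]] returns the element itself
tube["events"]        # [ ] returns a list of length 1
tube$gated <- TRUE    # add a new element by assignment
names(tube)
```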

Handling Flow Cytometry Data: flowCore

• flowCore is a base package that supports reading and manipulation of FCS data files

• The fundamental object that encapsulates the data in an FCS file is a flowFrame

• A container object that holds a collection of flowFrames is called a flowSet

• In the next slides we will go over
  - reading an FCS file
  - gating
  - compensation
  - transformation
  - visualization

Check out the example data

Read an FCS file, summarize the flowFrame

Apply the lymphocyte gate with Subset
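
The code from these slides is not preserved in this transcript. The sketch below shows one way the read, summarize and Subset steps might look with flowCore. It is wrapped in a hypothetical helper, gate_example(), that returns NULL when flowCore is not installed. The rectangular gate and its limits are arbitrary stand-ins for the lymphocyte gate used in the talk, and the file is an example FCS file that ships with flowCore.

```r
# Sketch only: runs the flowCore steps if the package is installed and
# returns NULL otherwise. The gate limits are arbitrary illustrative values.
gate_example <- function() {
  if (!requireNamespace("flowCore", quietly = TRUE)) return(NULL)
  suppressPackageStartupMessages(library(flowCore))

  # Example FCS file distributed with flowCore
  fcs_file <- system.file("extdata", "0877408774.B08", package = "flowCore")
  if (!nzchar(fcs_file)) return(NULL)

  ff <- read.FCS(fcs_file)        # the result is a flowFrame
  print(summary(ff))              # per-parameter summary statistics

  # Rectangular FSC/SSC gate standing in for a lymphocyte gate
  lymph_gate <- rectangleGate(filterId = "lymphs",
                              "FSC-H" = c(200, 600),
                              "SSC-H" = c(0, 400))
  lymphs <- Subset(ff, lymph_gate)  # keep only events inside the gate

  list(total = nrow(ff), gated = nrow(lymphs))
}

res <- gate_example()
res
```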

The plot needs to be transformed because it is rendering the linear data in the FCS file

The data hasn’t been compensated!

• Lines require library(fields)

• Percentages are in summary(fres)$p[1:4]

• Percentages are drawn in the graph with text()

Fingerprinting Flow Cytometry Data: flowFP

• flowFP aims to transform flow cytometric data into a form amenable to algorithmic analysis tools
  - Acts as an intermediate step between acquisition of high-throughput FCM data and empirical modeling, machine learning and knowledge discovery
  - Implements ideas from
      Roederer M, Moore W, Treister A, Hardy RR & Herzenberg LA. Probability binning comparison: a metric for quantitating multivariate distribution differences. Cytometry 45:47-55, 2001.
      and
      Rogers WT, Moser AR, Holyst HA, Bantly A, Mohler ER III, Scangas G, and Moore JS. Cytometric Fingerprinting: Quantitative Characterization of Multivariate Distributions. Cytometry 73A:430-441, 2008.

The basic idea

• Subdivide multivariate space into bins
  - Call this a “model” of the space
• For each flowFrame in a flowSet, count the number of events in each bin in the model
• Flatten the collection of counts for a flowFrame into a 1D feature vector
• Combine all of the feature vectors together into an n x m matrix
  - n = number of flowFrames (instances)
  - m = number of bins in the model (features)
• Also, tag each event with its bin membership
  - facilitates visualization, interpretation
  - can be used for gating
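
The steps above can be sketched in base R. This is not flowFP's implementation: the function names make_model and count_bins are hypothetical, the data are simulated, and splitting each bin at the median of its most variable parameter is an assumption made for this sketch.

```r
# Build a "model": recursively split the events at the median of the
# most variable parameter. Nodes are numbered like a binary heap
# (children of node i are 2i and 2i + 1); `levels` rounds give 2^levels bins.
make_model <- function(x, levels) {
  n_nodes   <- 2^levels - 1
  split_dim <- integer(n_nodes)
  split_cut <- numeric(n_nodes)
  node_of   <- rep(1L, nrow(x))          # every event starts at the root
  for (node in seq_len(n_nodes)) {
    idx <- which(node_of == node)
    d   <- which.max(apply(x[idx, , drop = FALSE], 2, var))
    cut <- median(x[idx, d])
    split_dim[node] <- d
    split_cut[node] <- cut
    node_of[idx] <- ifelse(x[idx, d] <= cut, 2L * node, 2L * node + 1L)
  }
  list(levels = levels, dim = split_dim, cut = split_cut)
}

# Apply a model to one sample: walk every event down the tree (this is
# the per-event bin tag) and count the events that land in each bin.
count_bins <- function(model, x) {
  node <- rep(1L, nrow(x))
  for (lev in seq_len(model$levels)) {
    d        <- model$dim[node]
    go_right <- x[cbind(seq_len(nrow(x)), d)] > model$cut[node]
    node     <- 2L * node + as.integer(go_right)
  }
  bin <- node - 2^model$levels + 1       # bin membership, 1 .. 2^levels
  tabulate(bin, nbins = 2^model$levels)  # 1D feature vector of counts
}

# Three simulated two-parameter samples; sample "c" is shifted.
set.seed(1)
n_events <- 1000
samples <- list(
  a = matrix(rnorm(2 * n_events), ncol = 2),
  b = matrix(rnorm(2 * n_events), ncol = 2),
  c = matrix(rnorm(2 * n_events, mean = 0.5), ncol = 2)
)

model <- make_model(samples$a, levels = 3)                 # 8 bins
fp <- t(sapply(samples, function(s) count_bins(model, s)))
dim(fp)   # 3 8: n samples (rows) by m bins (columns)
fp
```

Because the model is built from sample a, its own events fall evenly across the bins; samples drawn from a different distribution, such as c, show uneven counts, which is what makes the rows usable as fingerprints.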

Probability Binning

> plot (mod, fs)

Figure axis label: Bin Number

Class Constructors

• flowFPModel (base class)
  - Consumes a flowFrame or flowSet
  - Produces a model, which is a recipe for subdividing multivariate space
• flowFP
  - Consumes a flowFrame or flowSet, and a flowFPModel
  - Produces a flowFP, which represents the multivariate probability density function as a fingerprint
  - Also tags each event with its bin membership
• flowFPPlex
  - Consumes a collection of flowFPs
  - The flowFPPlex is a container object to facilitate handling large and complex collections of flowFPs

Writing Your Own Functions

    #
    # It’s a good idea to comment your code      (comments)
    #
    myfunc <- function (arg1=10, arg2, ...)      # declaration
    {                                            # start of code block
        # your code goes here
        answer <- log (arg1, base=arg2)          # assignment

        return (answer)                          # return
    }                                            # end of code block
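
A short usage sketch for the function on this slide, showing how the default value for arg1 and named arguments behave. The calls and their inputs are illustrative.

```r
# The function from the slide: log of arg1 in base arg2.
myfunc <- function(arg1 = 10, arg2, ...) {
  answer <- log(arg1, base = arg2)
  return(answer)
}

r1 <- myfunc(arg2 = 10)            # arg1 defaults to 10: log10(10) = 1
r2 <- myfunc(8, 2)                 # positional arguments: log2(8) = 3
r3 <- myfunc(arg2 = 2, arg1 = 8)   # named arguments can come in any order
c(r1, r2, r3)
```

Because arg2 has no default, calling myfunc() without it raises an error when log() tries to use the missing argument.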

Obtaining R and Bioconductor

• R http://cran.r-project.org/

• Bioconductor http://bioconductor.org/GettingStarted

General Reference Material

• A good beginner’s guide to R
  - http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
• A nice one-page reference card
  - http://cran.r-project.org/doc/contrib/Short-refcard.pdf
• Outstanding summary of R/Bioconductor, with many examples
  - http://manuals.bioinformatics.ucr.edu/home/R_BioCondManual#R_favorite
• The definitive reference for writing R extensions (advanced!)
  - http://cran.r-project.org/doc/manuals/R-exts.pdf
• Books
  - William N. Venables and Brian D. Ripley. Modern Applied Statistics with S. Fourth Edition. Springer, New York, 2002. ISBN 0-387-95457-0.
  - John M. Chambers. Programming with Data. Springer, New York, 1998. ISBN 0-387-98503-4 (aka “the Green Book”)

Flow-Specific References

• Vignettes
  - http://bioconductor.org/packages/2.6/bioc/vignettes/flowCore/inst/doc/HowTo-flowCore.pdf
  - http://bioconductor.org/packages/2.6/bioc/vignettes/flowViz/inst/doc/filters.pdf
  - http://bioconductor.org/packages/2.6/bioc/vignettes/flowStats/inst/doc/GettingStartedWithFlowStats.pdf
  - http://bioconductor.org/packages/2.6/bioc/vignettes/flowQ/inst/doc/DataQualityAssessment.pdf
  - http://bioconductor.org/packages/2.6/bioc/vignettes/flowFP/inst/doc/flowFP_HowTo.pdf
• Original Articles
  - flowCore
      Hahne, F., N. LeMeur, et al. (2009). "flowCore: a Bioconductor package for high throughput flow cytometry." BMC Bioinformatics 10: 106.
  - Fingerprinting
      Rogers, W. T., A. R. Moser, et al. (2008). "Cytometric fingerprinting: quantitative characterization of multivariate distributions." Cytometry A 73(5): 430-41.
      Rogers, W. T. and H. A. Holyst (2009). "flowFP: A Bioconductor Package for Fingerprinting Flow Cytometric Data." Advances in Bioinformatics 2009(Article ID 193947): 11.

Contact Me!

Wade [email protected] (o)610-368-5821 (m)

