The ppc Package
October 11, 2004
Title Peak Probability Contrasts
Version 1.01
Author Balasubramanian Narasimhan, R. Tibshirani, T. Hastie
Description Sample classification of protein mass spectra by peak probability contrasts
Maintainer Rob Tibshirani <[email protected]>
License GPL2.0
URL http://www-stat.stanford.edu/~tibs/PPC
R topics documented:
ppc.cv
ppc.fdr
ppc.find.splits
ppc-internal
ppc.make.centroid.list
ppc.make.peaklist
ppc.peak.summary
ppc.peaks
ppc.plot.hist
ppc.plotcv
ppc.plotcvprob
ppc.plotfdr
ppc.predict
ppc.predict.peaks
ppc.predict.peaks1
ppc.predict1
ppc.read.peaks.batch
ppc.read.peaks.nobatch
ppc.read.raw.batch
ppc.read.raw.nobatch
ppc.remove.beforeslash.and.suffix
ppc.remove.suffix
ppc.subset.and.reshape
ppc.subset.and.reshape.peakdata
Index
ppc.cv Cross-validation for PPC analysis
Description
This function does K-fold cross-validation for PPC analysis
Usage
ppc.cv(ppc.fit, data, user.parms)
Arguments
ppc.fit Result of call to ppc.predict
data List containing mass spec data
user.parms List of user defined parameters
Details
Value
err CV error rate for each threshold value
se Standard error of CV error rate
confusion Confusion matrix for each threshold value
threshold Threshold values used
yhat Predicted values from CV
prob CV probabilities
y Training set outcome values
folds Indices defining CV folds
numsites Number of m/z sites surviving shrinkage at each threshold value
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
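As a rough illustration of K-fold cross-validation with class-balanced folds, the base R sketch below assigns samples to folds within each class and computes an error rate per fold. It is not the package's code: make_folds is a hypothetical helper (the package's internal balanced.folds is not documented here), and a majority-class rule stands in for the PPC fit.

```r
# Hypothetical helper (not part of ppc): assign each sample to one of
# nfolds folds, spreading every class as evenly as possible.
make_folds <- function(y, nfolds = 5) {
  fold <- integer(length(y))
  for (cl in unique(y)) {
    idx <- which(y == cl)
    # shuffle within the class, then deal out fold labels 1..nfolds in turn
    idx <- idx[sample.int(length(idx))]
    fold[idx] <- rep_len(seq_len(nfolds), length(idx))
  }
  fold
}

set.seed(1)
y <- rep(c("control", "disease"), times = c(12, 8))
fold <- make_folds(y, nfolds = 4)

# Toy classifier standing in for the PPC fit: predict the majority
# class of the training part of each fold.
err <- sapply(seq_len(4), function(k) {
  train <- y[fold != k]
  test <- y[fold == k]
  yhat <- names(which.max(table(train)))
  mean(yhat != test)
})
cv.err <- mean(err)
```

Balancing the folds by class keeps the class proportions of every training set close to those of the full data, which matters when one class is small.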
ppc.fdr Function to estimate False Discovery rates for peaks in PPC analysis
Description
Estimate False Discovery rates for peaks in PPC analysis, using permutations of the sample labels
Usage
ppc.fdr(data, centroid.fit, peak.fit, split.fit, ppc.fit, user.parms)
Arguments
data List containing mass spec data
centroid.fit Result of call to ppc.make.centroid.list
peak.fit Result of call to ppc.predict.peaks
split.fit Result of call to ppc.find.splits
ppc.fit Result of call to ppc.predict
user.parms List of user defined parameters
Details
Value
results Matrix with columns: threshold used, number of peaks found, FDR
pi0 Estimate of proportion of truly null peaks
threshold Vector of thresholds used
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
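The general idea of estimating a false discovery rate by permuting sample labels can be sketched in base R as follows. This is a generic illustration, not the computation used by ppc.fdr: the function perm_fdr, the per-peak statistic (absolute difference in the proportion of samples showing the peak) and the omission of the pi0 correction are all assumptions made for the example.

```r
# Generic permutation-based FDR sketch (not the ppc implementation).
# ind: logical matrix, peaks in rows, samples in columns (peak present?)
# y:   two-level class labels, one per sample
perm_fdr <- function(ind, y, threshold, nperm = 100) {
  stat <- function(labels) {
    cls <- unique(y)
    p1 <- rowMeans(ind[, labels == cls[1], drop = FALSE])
    p2 <- rowMeans(ind[, labels == cls[2], drop = FALSE])
    abs(p1 - p2)
  }
  n.called <- sum(stat(y) >= threshold)
  # number of peaks passing the threshold under each label permutation
  n.null <- replicate(nperm, sum(stat(sample(y)) >= threshold))
  fdr <- if (n.called > 0) min(1, mean(n.null) / n.called) else NA
  list(n.called = n.called, fdr = fdr)
}

set.seed(2)
y <- rep(c(1, 2), each = 10)
ind <- matrix(runif(50 * 20) < 0.5, nrow = 50)
ind[1:5, y == 2] <- TRUE   # five peaks always present in class 2
ind[1:5, y == 1] <- FALSE  # and always absent in class 1
res <- perm_fdr(ind, y, threshold = 0.8)
```

The estimate is the average number of peaks called under permuted labels divided by the number called with the real labels, capped at 1.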
ppc.find.splits Function to find best discriminating split points for training data in mass spec
Description
Find best discriminating split points for training data (to separate the classes). Takes centroid.fit, the result of a call to ppc.make.centroid.list, and peak.fit, the result of a call to ppc.predict.peaks.
Usage
ppc.find.splits(centroid.fit, peak.fit, data, user.parms)
Arguments
centroid.fit Result of call to ppc.make.centroid.list
peak.fit Result of call to ppc.predict.peaks
data List containing mass spec data
user.parms List of user defined parameters
Value
prhat Proportion of samples beyond optimal cutpoint in each outcome class
pr Proportion of samples beyond cutpoints in each outcome class
n.class Number of samples in each outcome class
cutpoints Cutpoints (split points) tried
cuthat Optimal cut points
prclose Indicators for split points with prob difference within 10 percent of that of the optimal split point
nsplits Number of cutpoints tried
fix.at.one Was the optimal cutpoint fixed at one? (i.e. no peak vs peak)
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
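To make the notion of a best discriminating split point concrete, here is a minimal sketch for a single m/z site and two classes. It is illustrative only: best_split is a hypothetical function, and the criterion (largest absolute difference in the proportion of samples above the cutpoint) is an assumption about what "best discriminating" means, chosen to mirror the pr, prhat and cuthat components listed above.

```r
# Illustrative only (not the ppc implementation): choose the peak-height
# cutpoint that best separates two classes at a single m/z site.
best_split <- function(ht, y, cutpoints) {
  classes <- unique(y)
  # pr[i, j]: proportion of class j samples with height > cutpoints[i]
  pr <- sapply(classes, function(cl) {
    sapply(cutpoints, function(cp) mean(ht[y == cl] > cp))
  })
  pr <- matrix(pr, nrow = length(cutpoints))
  gap <- abs(pr[, 1] - pr[, 2])
  best <- which.max(gap)  # first cutpoint attaining the largest gap
  list(cuthat = cutpoints[best], prhat = pr[best, ], pr = pr)
}

ht <- c(0.1, 0.2, 0.3, 0.2, 1.5, 1.8, 0.2, 2.1)
y <- c("a", "a", "a", "a", "b", "b", "b", "b")
fit <- best_split(ht, y, cutpoints = c(0, 0.5, 1, 2))
fit$cuthat
fit$prhat
```

In this toy data the cutpoints 0.5 and 1 separate the classes equally well; which.max keeps the first of them.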
ppc-internal Internal ppc functions
Description
Internal ppc functions
Usage
ppc.make.centroids(clust.tree, x, user.parms)
which.is.min(x)
medoid(x)
permute.rows(x)
balanced.folds(y, nfolds)
hclust.1d(x, debug=FALSE)
ppc.read.peaks.file(filename)
Details
These are not to be called by the user.
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
ppc.make.centroid.list
Function to make a list of peak centroids, from a set of peaks from different spectra
Description
This function starts with a list of peaks from a collection of spectra, and does a one-dimensional hierarchical clustering. It then cuts off the dendrogram at height user.parms$peak.gap, forming clusters of peaks. The medoids of each cluster form the final list of peak centroids.
Usage
ppc.make.centroid.list(data, user.parms)
Arguments
data List containing mass spec data
user.parms List of user-defined parameters
Value
cent Matrix of centroids, one per row
peaklist Peaklist used
clust.tree Dendrogram from hclust.1d
all.peaks Vector of m/z values of all peaks used
peak.gap Peak width used
recluster Not currently used
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
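The clustering step described above (cluster pooled peak positions in one dimension, cut the dendrogram at peak.gap, take a medoid per cluster) can be sketched with base R. This is an approximation under stated assumptions: stats::hclust with complete linkage stands in for the package's internal hclust.1d, whose linkage is not documented here, and medoid1d and the value of peak.gap are made up for the example.

```r
# Sketch with stats::hclust standing in for the package's internal hclust.1d.
peaks <- c(7.001, 7.002, 7.003, 7.250, 7.252, 7.900)  # log m/z of pooled peaks
peak.gap <- 0.01                                      # assumed value

tree <- hclust(dist(peaks), method = "complete")
cluster <- cutree(tree, h = peak.gap)  # cut the dendrogram at height peak.gap

# Medoid of a 1-D cluster: the member minimising total distance to the others
medoid1d <- function(x) x[which.min(rowSums(abs(outer(x, x, "-"))))]
centroids <- sort(as.vector(tapply(peaks, cluster, medoid1d)))
centroids
```

Using a medoid rather than a mean guarantees that every centroid is an observed peak position.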
ppc.make.peaklist Function to extract peaks from raw mass spec data
Description
This function extracts peaks from raw mass spec data. It uses a very simple peak finder, "peaks", looking for sites where the intensity is higher than it is elsewhere in a user-defined window surrounding that site.
Usage
ppc.make.peaklist(data, user.parms)
Arguments
data List containing raw mass spec data
user.parms List of user parameters
Value
List of peaks, one component per spectrum. Each component is a matrix of log m/z values and peak intensity.
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.peak.summary Produce summary of peaks from PPC analysis
Description
This function produces a summary of peaks from a PPC analysis
Usage
ppc.peak.summary(centroid.fit, peak.fit, data, user.parms, split.fit)
Arguments
centroid.fit Result of call to ppc.make.centroid.list
peak.fit Result of call to ppc.predict.peaks
data List containing mass spec data
user.parms List of user defined parameters
split.fit Result of call to ppc.find.splits
Details
Value
Matrix containing one row per peak. Columns are peak.position, min and max of peak.position, number.of.spectra in which the peak occurs, rank.of.peak in discriminatory power, split point for peak height, proportion of samples in class 1 exceeding split point, proportion of samples in class 2 exceeding split point, etc., pairwise differences in these probabilities, and peak info for each sample.
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.peaks Find local maxima
Description
Finds the local maxima in a vector. This is an R implementation of the Splus function peaks. Note that the span parameter is a proportion between 0 and 1, rather than the number of x values (as in the Splus function). Note also that it only handles a vector as input.
Usage
ppc.peaks(x, span)
Arguments
x A vector. Peaks will find the local maxima in x.
span A peak is defined as an element in a sequence which is greater than all other elements within a window of width length(x)*span centered at that element.
Details
All elements within a halfspan of the end of a sequence or within a halfspan of a missing value are FALSE.
Value
vector of logical values, indicating whether there is a peak at each location
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
x<-rnorm(1000)
a<-ppc.peaks(x, .02)
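The window rule stated in the Arguments and Details above can also be written out directly. The following is an illustrative re-implementation, not the package source; local_peaks is a hypothetical name, and the way the window width length(x)*span is rounded to a whole number of elements is an assumption.

```r
# Illustrative re-implementation of the rule described above; not the
# package source. The rounding of the half-window is an assumption.
local_peaks <- function(x, span) {
  n <- length(x)
  half <- max(1, floor(n * span / 2))  # half-width of the window, in elements
  is.peak <- logical(n)                # elements near the ends stay FALSE
  if (n < 2 * half + 1) return(is.peak)
  for (i in (half + 1):(n - half)) {
    w <- x[(i - half):(i + half)]
    # a missing value in the window leaves FALSE; otherwise x[i] must be
    # strictly greater than every other element of the window
    if (!anyNA(w)) is.peak[i] <- all(x[i] > w[-(half + 1)])
  }
  is.peak
}

x <- c(1, 3, 2, 5, 9, 4, 2, 6, 1, 0)
p <- local_peaks(x, span = 0.3)
which(p)
```

With ten values and span = 0.3 the window covers one neighbour on each side, so the peaks are the strict local maxima away from the two ends.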
ppc.plot.hist Plot peak histograms from PPC analysis
Description
This function plots the histograms of peaks from a PPC analysis. They are laid out in order of discriminatory power, starting at the top left and moving down the leftmost column. At most 25 sites are plotted.
Usage
ppc.plot.hist(peak.fit, ppc.fit, centroid.fit, split.fit, data, first.site = 1, last.site = 25, title.plot = NULL)
Arguments
peak.fit Result of call to ppc.predict.peaks
ppc.fit Result of call to ppc.predict
centroid.fit Result of call to ppc.make.centroid.list
split.fit Result of call to ppc.find.splits
data List containing mass spec data
first.site Integer- first site to plot. Default 1
last.site Integer- last site to plot. Default 25
title.plot Character title for plot. Default NULL
Details
Value
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rscript.rawdata
ppc.plotcv Function to plot CV curves from PPC analysis
Description
Function to plot CV curves from PPC analysis
Usage
ppc.plotcv(fit)
Arguments
fit Result of call to ppc.cv
Details
Value
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.plotcvprob Function to plot CV probabilities from PPC analysis
Description
Function to plot CV probabilities from PPC analysis
Usage
ppc.plotcvprob(fit, data, threshold)
Arguments
fit Result of call to ppc.cv
data List containing mass spec data
threshold Value of threshold to use for predictions
Details
Value
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.plotfdr Function to plot FDR results from PPC analysis
Description
Function to plot FDR results from PPC analysis
Usage
ppc.plotfdr(fdrfit)
Arguments
fdrfit Result of call to ppc.fdr
Details
Value
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.predict Function to do test set prediction for the PPC method
Description
This function does test set prediction for the PPC method. It predicts outcome classes for a list of peaks from test set spectra.
Usage
ppc.predict(centroid.fit, split.fit, logmz, peaklist.te, n.threshold = 30, threshold = NULL, metric = c("binomial", "euclidean", "absolute"), summ = c("mean", "median"))
Arguments
centroid.fit Result of a call to ppc.make.centroid.list
split.fit Result of a call to ppc.find.splits
logmz log of m/z values from training data
peaklist.te List of peaks from test set; each component is a matrix of log m/z values and peak intensities
n.threshold Number of shrinkage thresholds to use
threshold Threshold values to use
metric "binomial", "euclidean", or "absolute"
summ "mean" or "median"
Value
yhat Matrix of predicted classes
threshold Threshold values used.
numsites Number of sites surviving the threshold for each shrinkage value
sites List of sites surviving the threshold for each shrinkage value
ind Indicator matrix of event (peak intensity at site > cutpoint)
ind0 Indicator matrix of event (peak intensity at site > cutpoint)
prob Matrix of estimated class probabilities
ht Matrix of peak intensities
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.predict.peaks A function to find centroid peaks, in a list of individual peaks
Description
Takes centroid.fit (result of a call to ppc.make.centroid.list) and looks for these m peaks in peaklist, a list of length n. Returns ind, an m by n matrix of TRUE/FALSE values, and ht, the matrix of corresponding peak heights. A peak is considered present if it is within user.parms$peak.gap units of the centroid.
Usage
ppc.predict.peaks(centroid.fit, data)
Arguments
centroid.fit Result of call to ppc.make.centroid.list
data List containing the mass spec peaks data
Value
ind Indicator matrix of presence/absence of peak
ht Matrix of heights. Note: peak is not present if ind=0, even if ht is >0
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.predict.peaks1 A function to find centroid peaks, in a list of individual peaks from one spectrum
Description
Takes centroid.fit (result of a call to ppc.make.centroid.list) and looks for these m peaks in a peaklist from one spectrum. Returns ind, an m-vector of TRUE/FALSE values, and ht, the vector of corresponding peak heights. A peak is considered present if it is within user.parms$peak.gap units of the centroid.
Usage
ppc.predict.peaks1(centroid.fit, logmz, peaklist.new)
Arguments
centroid.fit Result of call to ppc.make.centroid.list
logmz Log of m/z values
peaklist.new Matrix of peaks from a single spectrum. Rows are (log m/z value, peak intensity)
Value
ind Indicator vector of presence/absence of peak
ht Vector of heights. Note: peak is not present if ind=0, even if ht is >0
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
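The matching rule for a single spectrum (a centroid counts as present when some peak lies within peak.gap of it) can be sketched as below. This is not the package source: match_peaks is a hypothetical function, taking the height of the nearest peak is an assumed tie-breaking rule, and absent centroids are given height 0 here for simplicity.

```r
# Sketch of the matching rule for one spectrum (not the package source).
# centroids: log m/z of the m centroids
# peaks:     two-column matrix (log m/z, intensity) for one spectrum
match_peaks <- function(centroids, peaks, peak.gap) {
  ind <- logical(length(centroids))
  ht <- numeric(length(centroids))
  for (j in seq_along(centroids)) {
    d <- abs(peaks[, 1] - centroids[j])
    if (length(d) > 0 && min(d) <= peak.gap) {
      ind[j] <- TRUE
      ht[j] <- peaks[which.min(d), 2]  # height of the nearest peak
    }
  }
  list(ind = ind, ht = ht)
}

centroids <- c(7.00, 7.25, 7.90)
peaks <- cbind(logmz = c(7.004, 7.60, 7.895), intensity = c(12, 3, 40))
m <- match_peaks(centroids, peaks, peak.gap = 0.01)
m$ind
m$ht
```

Applying the same function to each spectrum in a peak list and binding the results column by column would give the m by n matrices described for ppc.predict.peaks.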
ppc.predict1 Function to do a single test set prediction for the PPC method
Description
This function does test set prediction for the PPC method. It predicts outcome classes for a list of peaks from a single test set spectrum.
Usage
ppc.predict1(centroid.fit, split.fit, logmz, peaklist.te, threshold, metric = c("binomial", "euclidean", "absolute"), summ = c("mean", "median"))
Arguments
centroid.fit Result of a call to ppc.make.centroid.list
split.fit Result of a call to ppc.find.splits
logmz log of m/z values from training data
peaklist.te List of peaks from test set; each component is a matrix of log m/z values and peak intensities
threshold Threshold values to use
metric "binomial", "euclidean", or "absolute"
summ "mean" or "median"
Value
yhat Matrix of predicted classes
threshold Threshold values used.
numsites Number of sites surviving the threshold for each shrinkage value
sites List of sites surviving the threshold for each shrinkage value
ind Indicator matrix of event (peak intensity at site > cutpoint)
ind0 Indicator matrix of event (peak intensity at site > cutpoint)
prob Matrix of estimated class probabilities
ht Matrix of peak intensities
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.read.peaks.batch Read in mass spec peak data, with batches
Description
A function to read in protein mass spec peaks data, with batches
Usage
ppc.read.peaks.batch(dir, batches)
Arguments
dir Name of directory containing the data
batches Vector of batch names
Details
Value
peaklist List of peaks for each sample. Each component is a matrix, one row per m/z site, consisting of log m/z and peak intensity
filenames List of filenames read in
logmz log m/z values
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.peakdata
ppc.read.peaks.nobatch
Read in mass spec peak data, without batches
Description
A function to read in protein mass spec peaks data
Usage
ppc.read.peaks.nobatch(dir)
Arguments
dir Name of directory containing the data
Details
Value
peaklist List of peaks for each sample. Each component is a matrix, one row per m/z site, consisting of log m/z and peak intensity
filenames List of filenames read in
logmz log m/z values
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.peakdata
ppc.read.raw.batch A function to read in raw protein mass spec data, with batches
Description
This function reads in raw protein mass spec data from a directory. The directory is assumed to have one subdirectory per class (e.g. control or disease). The subdirectories like "control" have further subdirectories, one per batch. So the structure looks like control/batch1/file.csv, control/batch2/file.csv, disease/batch1/file.csv, etc. There is one comma-separated (csv) file in the subdirectory per spectrum, having lines of the form m/z value, intensity (one line per m/z site).
Usage
ppc.read.raw.batch(dir, batches, mz = NULL)
Arguments
dir Name of directory containing the data
batches Vector of character names of batches
mz Optional vector of m/z values. Default NULL. If NULL, m/z values are read in from files. Otherwise the values in mz are used.
Value
xtr Matrix of intensities- one row per m/z site, one col per spectrum
mz m/z values
filenames List of filenames read in
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
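To show the class/batch/file.csv layout concretely, the sketch below builds a small directory tree in a temporary location and reads it back with base R. It does not call ppc.read.raw.batch; the file names, the values, and the assumption that the csv files have no header line are all made up for illustration.

```r
# Sketch of the directory layout described above, built in a temporary
# directory. File names and values are made up for illustration.
root <- file.path(tempdir(), "ppc-demo")
for (cl in c("control", "disease")) {
  for (b in c("batch1", "batch2")) {
    d <- file.path(root, cl, b)
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
    spec <- data.frame(mz = c(1000, 1001, 1002), intensity = c(0.5, 2.1, 0.7))
    # one line per m/z site: "m/z value, intensity" (no header assumed)
    write.table(spec, file.path(d, "patient1.csv"), sep = ",",
                row.names = FALSE, col.names = FALSE)
  }
}

# One file per spectrum, found as class/batch/file.csv
files <- list.files(root, pattern = "\\.csv$", recursive = TRUE)
spectra <- lapply(file.path(root, files), read.csv, header = FALSE,
                  col.names = c("mz", "intensity"))
# Intensities as a matrix: one row per m/z site, one column per spectrum
xtr <- sapply(spectra, function(s) s$intensity)
dim(xtr)
```

The resulting matrix has the same orientation as the xtr component described above: m/z sites in rows, spectra in columns.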
ppc.read.raw.nobatch A function to read in raw protein mass spec data, with no batches
Description
This function reads in raw protein mass spec data from a directory. The directory is assumed to have one subdirectory per class (e.g. control or disease). There is one comma-separated (csv) file in the subdirectory per spectrum, having lines of the form m/z value, intensity (one line per m/z site).
Usage
ppc.read.raw.nobatch(directory, mz = NULL)
Arguments
directory Name of directory containing the data
mz Optional vector of m/z values. Default NULL. If NULL, m/z values are read in from files. Otherwise the values in mz are used.
Value
xtr Matrix of intensities- one row per m/z site, one col per spectrum
mz m/z values
filenames List of names of files that were read in
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.remove.beforeslash.and.suffix
Remove characters before a slash and at end
Description
This function takes a character file name or vector of character file names, and removes the characters before the rightmost slash, and the suffix after the last dot. E.g. "file/foo/junk.csv" becomes "junk".
Usage
ppc.remove.beforeslash.and.suffix(x)
Arguments
x Character string or vector of character strings.
Details
Value
Character string or vector of character strings, modified in the manner described above.
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
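For comparison, the same transformation can be obtained with standard R functions: basename drops everything up to the rightmost slash and tools::file_path_sans_ext drops the final suffix. This shows the behaviour described above using base tools; it is not the package's implementation, and the second file name is made up.

```r
library(tools)  # ships with R; provides file_path_sans_ext

x <- c("file/foo/junk.csv", "control/batch1/patient7.csv")
stripped <- file_path_sans_ext(basename(x))
stripped
```

Both calls are vectorised, so a vector of file names is handled in one step.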
ppc.remove.suffix Function to remove the suffix from a file name
Description
This function just removes a .xxx suffix from the end of a filename or a vector of filenames
Usage
ppc.remove.suffix(x)
Arguments
x Filename or vector of filenames
Value
Filename or list of filenames with suffix removed
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
ppc.subset.and.reshape
Function to subset and reshape raw mass spec data
Description
This function subsets raw mass spec data, using the m/z limits mz.min and mz.max in user.parms. If there are batches, it also reshapes the data, concatenating the spectra in different batches for a given patient. Thus for each patient it produces one long column of values. The m/z values are strung out as well. Using this trick we can essentially ignore the presence of batches for the rest of the analysis.
Usage
ppc.subset.and.reshape(data, user.parms)
Arguments
data List containing the mass spec data
user.parms List of user parameters
Value
List containing the subset and reshaped mass spec data
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.rawdata
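The subsetting and batch-concatenation trick described above can be illustrated on toy matrices. The layout used here (one intensity matrix per batch, patients in columns) and all names and values are assumptions for the sketch, not the data structure ppc uses internally.

```r
# Sketch only: toy data layout, not the structure ppc uses internally.
mz <- c(900, 1000, 1100, 1200, 1300)
mz.min <- 1000; mz.max <- 1200          # would come from user.parms
# intensities: rows = m/z sites, columns = patients, one matrix per batch
batch1 <- matrix(1:10, nrow = 5, dimnames = list(NULL, c("p1", "p2")))
batch2 <- matrix(11:20, nrow = 5, dimnames = list(NULL, c("p1", "p2")))

keep <- mz >= mz.min & mz <= mz.max     # subset to the m/z window
# stack the batches: each patient becomes one long column
x.long <- rbind(batch1[keep, , drop = FALSE], batch2[keep, , drop = FALSE])
mz.long <- rep(mz[keep], times = 2)     # m/z values strung out to match
x.long
```

After stacking, each row of x.long is a (batch, m/z site) pair, so later steps can treat the data as if it came from a single batch.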
ppc.subset.and.reshape.peakdata
Function to subset and reshape mass spec peak data
Description
This function subsets mass spec peak data, using the m/z limits mz.min and mz.max in user.parms. If there are batches, it also reshapes the data, concatenating the spectra in different batches for a given patient. Thus for each patient it produces one long vector of values. The m/z values are strung out as well. Using this trick we can essentially ignore the presence of batches for the rest of the analysis.
Usage
ppc.subset.and.reshape.peakdata(data, user.parms)
Arguments
data List containing the mass spec data
user.parms List of user parameters
Value
List containing the subset and reshaped mass spec data
Author(s)
Balasubramanian Narasimhan and Rob Tibshirani
Examples
## for a complete worked example of this function in a PPC analysis see
## http://www-stat.stanford.edu/~tibs/PPC/Rdist/Rscript.peakdata
Index
∗Topic internal
ppc-internal

balanced.folds (ppc-internal)
hclust.1d (ppc-internal)
medoid (ppc-internal)
permute.rows (ppc-internal)
ppc-internal
ppc.cv
ppc.fdr
ppc.find.splits
ppc.make.centroid.list
ppc.make.centroids (ppc-internal)
ppc.make.peaklist
ppc.peak.summary
ppc.peaks
ppc.plot.hist
ppc.plotcv
ppc.plotcvprob
ppc.plotfdr
ppc.predict
ppc.predict.peaks
ppc.predict.peaks1
ppc.predict1
ppc.read.peaks.batch
ppc.read.peaks.file (ppc-internal)
ppc.read.peaks.nobatch
ppc.read.raw.batch
ppc.read.raw.nobatch
ppc.remove.beforeslash.and.suffix
ppc.remove.suffix
ppc.subset.and.reshape
ppc.subset.and.reshape.peakdata
which.is.min (ppc-internal)