
SPMd Tutorial, 12/22/2005, by Dianne Patterson

Getting Started

"SPMd by Wen-Lin Luo, Hui Zhang & Thomas Nichols is a toolbox for SPM which allows you to establish the validity of inferences in fMRI modeling through diagnosis of linear model assumptions, and to characterize fMRI signal and artifacts through exploratory data analysis. Currently there are two versions, SPMd99, a version for SPM99, which is no longer being developed, SPMd2, a version for SPM2." http://www.sph.umich.edu/ni-stat/SPMd/

The two tutorials mentioned below are very helpful; you should look them over. Before you start, you should have completed an analysis of a single run. For SPM99, there are several restrictions on the analysis choices if you want to be able to run SPMd. For SPM2, you just need to make sure you use a single-run analysis. If you have preprocessed multiple runs together, you can still do a quick analysis of one run by itself. If you have multiple runs, make sure you give SPMd the correct rp (realignment parameters) text file for the run you are analyzing.

In this SPMd tutorial, I focus on SPMd2 and use the face-house block data available here (look in the BlockData directory for the data, and in the Docs directory for a Word file describing how to process it): http://merlin.psych.arizona.edu/cgi-bin/wrap/dpat/Public/Imaging/SPM/FaceHouse/

Tutorial Links:

• A tutorial for SPMd99 that tells you what buttons to press and what to expect: http://www.sph.umich.edu/ni-stat/SPMd/Start.txt

• A tutorial for SPMd2 that shows you roughly what buttons to press and what to expect as you run the program (there have been some minor changes to the interface since the tutorial was posted): http://www.sph.umich.edu/ni-stat/SPMd/spmd_example.html#Introduction

Two Papers:

A 2002 technical paper on SPMd99. This has the benefit of containing detailed appendices not contained in the published paper: http://www.sph.umich.edu/ni-stat/SPMd/SPMd.pdf

A nice published version of the above paper: Luo, W.-L., & Nichols, T. E. (2003). Diagnosis and exploration of massively univariate neuroimaging models. NeuroImage, 19(3), 1014-1032.


Restrictions on SPM analysis for the use of SPMd

• No multiple sessions allowed (SPM99 and SPM2).
• No low-pass filtering allowed (SPM99 only).
• No intrinsic correlation allowed (SPM99 only); i.e., you need to choose “Model intrinsic correlations none” during the model estimation step in SPM99 if you intend to use SPMd.

Differences between the two versions of SPMd

• SPMd2 calculates outlier proportions rather than the outlier count of SPMd99.
• SPMd99 generates a PCT (Percent Change Threshold) image, and SPMd2 does not. No PCT tool appears to be available for SPM2.

SPMd: Diagnosis Strategies

Scan summaries (time plots showing problem time points)
• Identify problems in the timeline: check for unexpected dips or spikes in the signal.
• Check for relationships between plots that might explain the dips and spikes.

Model summaries (brain images showing problem spatial locations)
• Check for violations of linear regression assumptions: explore brain areas with excessive violations for each statistic image.
• Explore brain areas with predicted signal.

Model detail (residual plots of scans for a voxel)
• Check for unmodeled, systematic variation of scans in a residual plot.
• Identify bad scans in problem voxels.
• Note possible problem scans (bad for many or all voxels).

Scan detail (brain images for neighbouring scans)
• Identify problem brain locations associated with bad scans.
• Determine whether a bad scan looks like its neighbours.
• Look for bad slices, or problems with every other slice, corresponding to outliers.

Remediation
• Remove problem scans.
• Modify the model (e.g., turn off global scaling, modify timing).
• Identify problem regions.

Resolution
• Declare significant activation valid, or
• Declare significant activation questionable (because it occurs in an area that violates regression assumptions).
• Describe unmodeled and artifactual variation.

Understanding the Scan Summary

Why: The scan summary displays anomalies in the timeline and gives you an opportunity to identify relationships among those anomalies.

How: Look for spikes or dips in any of the parallel temporal plots, and determine whether these correspond to peaks and dips in the other temporal plots, including the predictors. The sorts of things you might identify include transient spikes or dips caused by something external to the experiment (like the scanner), and periodic spikes or dips that might correlate with the experiment (like outliers caused by experiment-related movement). The x-axis is in TRs and appears only once, but it applies to all of the plots above it. A temporal cursor (vertical dotted red line) is available for the temporal plots.

>>ginput can be used to identify the exact x and y coordinates of any point on a MATLAB-generated graph, in terms of that graph's own x and y coordinates. ginput allows you to select multiple points. Hit <enter> when finished, and the x-y coordinates are printed to the main MATLAB window.

Experimental Predictor

At the top of the Scan Summary, the “Experimental Predictor” plot shows you the model. In this model, we predict 3 face trials and 3 house trials (face house face house face house). Predicted signal amplitude, on the y axis, will be useful to us when we look at the “Residual versus Predictor” plot in the Model Detail.

Global Signal:

“Because neuroimaging experiments often test hypotheses regarding local changes in neuronal activity, variations in signal that are common to the entire brain volume, global signal, have been considered as nuisance effects to be eliminated.” (Luo et al., 2002, p. 10) “Sources of these effects can be external (scanner drift, etc.) or physiological (motion, respiration, etc.)” (Gablab Wiki). “Aguirre et al. (1998) suggested that, to allow for the proper interpretation of neuroimaging results, the degree of correlation of the global signal with the experimental paradigm should be reported in any study to understand the role of global signal. In particular, if the global signal can be explained by experimental predictors, the global is a confound; otherwise it is simply a nuisance variable. To facilitate this assessment, we regress the global on the design matrix X, display the fitted line, and report the significance of this regression”. (Luo et al., 2002, p. 10)

Look at the blue line representing the global signal in the plot above. You'd like it to be pretty steady throughout the entire time course. But you may notice that the first couple of time points (and especially the first time point) of our example are in a deep and wretched trough. We'll see later that this has a pretty devastating and far-reaching effect, despite the fact that most of the signal looks relatively good.

Specific Suggestions: Report the F test and p value for the global signal, as Aguirre recommends. If the F test is significant, then the global signal is significantly correlated with your model. In that case, the global signal is a confound, and removing it may also remove some of your activations: choose “Remove Global Effects=None”.

If the F test is NOT significant, then the global signal is a nuisance variable, and your activations will benefit from its removal: choose “Remove Global Effects=Global Scaling”.
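Luo et al. regress the global signal on the design matrix and report the significance of that regression. The sketch below shows the arithmetic for the simplest case: one experimental predictor plus an intercept, such as a single face-versus-house regressor. The function name `global_vs_design_f` is hypothetical, and SPMd's own implementation may differ. To turn F into a p value, use the upper tail of an F(1, n - 2) distribution from a statistics library (for example `scipy.stats.f.sf`). With several design columns, the same idea becomes a multiple regression.

```python
def global_vs_design_f(global_signal, predictor):
    """F statistic for regressing the global signal on one predictor plus an intercept.

    Returns (F, df_model, df_error). Under the null hypothesis of no relationship,
    F follows an F(df_model, df_error) distribution.
    """
    n = len(global_signal)
    if n != len(predictor) or n < 3:
        raise ValueError("need matching series with at least 3 time points")
    mean_x = sum(predictor) / n
    mean_y = sum(global_signal) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, global_signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted = [intercept + slope * x for x in predictor]
    ss_model = sum((f - mean_y) ** 2 for f in fitted)  # variation explained by the design
    ss_error = sum((y - f) ** 2 for y, f in zip(global_signal, fitted))  # what is left over
    df_model, df_error = 1, n - 2
    f_stat = (ss_model / df_model) / (ss_error / df_error)
    return f_stat, df_model, df_error


# Toy example: a global signal that rises in the "on" blocks.
print(global_vs_design_f([1, 2, 1, 3], [0, 1, 0, 1]))
```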

Outlier Rate

The outlier rate “is the ratio of outlier count to the number of outliers expected under the null hypothesis of normal data and a well-fitting model.” Outlier rate “is calculated for each scan and plotted with the timeline in TRs as the X-axis. For spm2d, an outlier is an observation that is greater than 3 sd from the fit. If the data are normal and the model fits, there should be approximately 0.003*S outliers, where S is the number of brain voxels. So, if C is the outlier count, the plot shows C/(0.003*S) (to be precise, it's not 0.003*S, but rather spm_Ncdf(-3)*S)” (http://www.jiscmail.ac.uk/cgi-bin/wa.exe?A2=ind0509&L=SPM&P=R20404&I=-3). If a scan has a high outlier rate, you may want to remove it from your dataset. People have tried several approaches to this: 1) remove the scan and respecify the model to account for the missing time point; 2) copy an adjacent volume and rename it to replace that scan; 3) average together the scans on either side of the offending scan to create an interpolated approximation of what “ought” to happen there (this will require additional tools or MATLAB knowledge).
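The sketch below illustrates the outlier-rate definition and the third remediation option (interpolating a bad scan from its neighbours). It rests on several assumptions:

• Standardized residuals are already available as a flat list per scan.
• Outliers are counted two-sided (|r| > 3). The expected proportion is then 2·Φ(-3) ≈ 0.0027, which is the "0.003" in the quote. SPMd2's own denominator is quoted as spm_Ncdf(-3)·S, so its values may differ from this sketch by a constant factor.
• Each volume is a flat list of voxel values.

The function names are hypothetical.

```python
from statistics import NormalDist


def outlier_rate(std_residuals, threshold=3.0):
    """Outlier count for one scan divided by the count expected under normality.

    std_residuals: standardized residuals for every brain voxel in the scan.
    Outliers are |residual| > threshold (two-sided), so the expected proportion
    is 2 * Phi(-threshold), about 0.0027 for threshold = 3.
    """
    s = len(std_residuals)
    count = sum(1 for r in std_residuals if abs(r) > threshold)
    expected = 2 * NormalDist().cdf(-threshold) * s
    return count / expected


def interpolate_scan(scans, bad_index):
    """Return a copy of scans with scans[bad_index] replaced by the voxelwise
    mean of its neighbours. At the start or end of the run only one neighbour
    exists, so that neighbour is copied (remediation option 2)."""
    neighbours = [scans[i] for i in (bad_index - 1, bad_index + 1) if 0 <= i < len(scans)]
    replacement = [sum(vals) / len(vals) for vals in zip(*neighbours)]
    fixed = list(scans)  # do not modify the caller's list
    fixed[bad_index] = replacement
    return fixed
```

Scan *000 in our example has no earlier neighbour, which is why the edge case simply copies the single available neighbour.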

Shift Movement and Rotation Movement

These are the graphs produced by realignment, and they correspond to the values in the rp* text file (but with a helpful legend). For “Shift Movement”, the y-axis is in mm. The y-axis for “Rotation Movement” is in degrees (the rp file itself stores the rotations in radians). It may be useful to determine whether values in this graph correspond to values in the global signal or outlier plots. If there is a correspondence, then movement may account for outlying values or changes in global signal. However, incorrect reconstruction of the data can also result in peculiar errors that look like movement, so interpret what you see carefully.

“Several things might make a particular session entirely unusable: several isolated scans with head motion of greater than 10mm (summed across x/y/z); several scans with head motion in a single direction greater than the size of a single voxels; a run of several scans in a row with significant motion and significant intensity change; high correlation of your motion parameters with your task... All subjects should be vetted for these problems before their results are taken seriously...” (Gablab Wiki).
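The rules of thumb quoted above can be checked directly against the rp file. The sketch below makes these assumptions:

• Each rp row holds six columns: x, y, z translations in mm, then three rotations in radians, as written by SPM realignment.
• Displacement is measured relative to the realignment reference scan, which is what the rp values are. The Gablab rules do not say this explicitly; it is my interpretation.
• The voxel size is 3 mm. This is a hypothetical value, so substitute your own.

The function names are hypothetical. The sketch only flags individual scans; you still have to judge whether "several" scans are affected.

```python
import math


def read_rp(text):
    """Parse the contents of an SPM rp_*.txt file into rows of six floats."""
    rows = []
    for line in text.strip().splitlines():
        values = [float(v) for v in line.split()]
        if len(values) != 6:
            raise ValueError(f"expected 6 columns, got {len(values)}")
        rows.append(values)
    return rows


def flag_motion(rows, summed_limit_mm=10.0, voxel_size_mm=3.0):
    """Return (scan index, reason) pairs for scans that break either rule of thumb."""
    flags = []
    for i, row in enumerate(rows):
        x, y, z = row[:3]  # translations in mm
        if abs(x) + abs(y) + abs(z) > summed_limit_mm:
            flags.append((i, "shift summed across x/y/z exceeds limit"))
        if max(abs(x), abs(y), abs(z)) > voxel_size_mm:
            flags.append((i, "single-axis shift exceeds voxel size"))
    return flags


def rotations_in_degrees(rows):
    """Convert the three rotation columns from radians to degrees, as in the plot."""
    return [[math.degrees(r) for r in row[3:]] for row in rows]
```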

Given the likelihood that large intensity shifts will occur in the first few volumes, it seems wise to do realignment to a volume later in the sequence. In our face-house block data, for example, we realigned to scan *000, which we now know to be an outlier. This may be a problem.

Average Periodogram

At the bottom of the scan summary is an average periodogram of the raw residuals. It is NOT on a timeline. This is a power spectrum of the raw residuals (power on the y-axis and frequency on the x-axis). It tells you which frequencies carry a strong signal, and therefore which cyclic signals are unaccounted for in your model.

Here is how it works: one hertz (Hz) is one cycle per second. A 1 Hz signal thus recurs every second and has a 1 second period (period = 1/frequency):
• 0.05 Hz = 20 s period
• 0.1 Hz = 10 s period
• 0.15 Hz = 6.7 s period
• 0.2 Hz = 5 s period
• 0.25 Hz = 4 s period

Using >>ginput, we see that the peaks in the plot above are at 0.18 Hz (5.6 s), 0.151 Hz (6.6 s), and 0.025 Hz (40 s). That is, the residuals contain one component with a period of about 5.6 seconds, another with a period of about 6.6 seconds, and a third with a period of about 40 seconds. Because these peaks remain in our residuals, our model has not accounted for them.
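The sketch below shows how a periodogram links frequency to period. It computes the raw periodogram of one residual time series with a plain discrete Fourier transform, then averages it across voxels. The TR argument is an assumption you must supply (the TR of the face-house data is not stated here). The function names are hypothetical, and SPMd may normalize or smooth its spectrum differently.

```python
import cmath
import math


def periodogram(residuals, tr):
    """Raw periodogram of one residual time series.

    Returns (frequencies in Hz, power) at the Fourier frequencies k / (n * tr),
    k = 1 .. n // 2 (the zero frequency is skipped).
    """
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    freqs, power = [], []
    for k in range(1, n // 2 + 1):
        coef = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centred))
        freqs.append(k / (n * tr))
        power.append(abs(coef) ** 2 / n)
    return freqs, power


def average_periodogram(residual_series, tr):
    """Average the periodograms of several voxels' residual series (all the same length)."""
    spectra = [periodogram(series, tr) for series in residual_series]
    freqs = spectra[0][0]
    avg = [sum(p[i] for _, p in spectra) / len(spectra) for i in range(len(freqs))]
    return freqs, avg


# A residual with an 8 s cycle, sampled every 2 s, peaks at 1/8 = 0.125 Hz.
series = [math.sin(2 * math.pi * 0.125 * t * 2.0) for t in range(64)]
freqs, power = periodogram(series, tr=2.0)
peak = freqs[power.index(max(power))]
print(peak, 1 / peak)  # peak frequency in Hz and the corresponding period in seconds
```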

Understanding the Model Summary

Why: The tests in the model summary are designed to detect data that violate the assumptions of the general linear model. "There are four principal assumptions which justify the use of linear regression models for purposes of prediction” (http://www.duke.edu/~rnau/testing.htm):

1. linearity of the relationship between dependent and independent variables
2. independence of the errors (no serial correlation of scans through time)
3. homoscedasticity (constant variance) of the errors
   o versus time
   o versus the predictors
4. normality of the error distribution (e.g., the distribution is not skewed).

“If any of these assumptions is violated (i.e., if there is nonlinearity, serial correlation, heteroscedasticity, and/or non-normality), then the predictions, confidence intervals, and insights yielded by a regression model may be (at best) inefficient or (at worst) seriously biased or misleading."(http://www.duke.edu/~rnau/testing.htm)

Assumption | Model Summary Visualization checkbox name | Model Summary test | Model Detail plot
linearity | (none) | (none) | Res vs Experimental Predictor
independence | Correlation | Durbin-Watson | Lag-1 plot
independence | Dependence | Cumulative Periodogram | (none)
homoskedasticity | Homo vs Glob | Cook-Weisberg: Homo1? | Res vs Global signal?
homoskedasticity | Homo vs h(Y) | Cook-Weisberg: Homo2 | ??
homoskedasticity | Homo vs X | Cook-Weisberg: Homo3? | Res vs Experimental Predictor?
normality | Normality | Shapiro-Wilk | Res vs Experimental Predictor; Normal Probability Plot of Residuals

How: When you compute a model summary, SPMd writes two images for each statistical test to your directory: a statistic image AND a -log10 p value image. When you view the model summary, you see the -log10 p value images, so the intensity range is defined in -log10 p values. The larger the -log10 p value, the more significant the test: white voxels are highly significant violations of the test assumptions. For more detailed information about each Model Summary test, see the glossary entries. For more information about the model detail tests, see the Model Detail section.

-log10 p values can be easily calculated in MATLAB:

>> -log10(0.01)     % returns 2
>> -log10(0.001)    % returns 3

Right click the scale bar to adjust the range of values, which may help you better view the images. Select 'Mipify' from the right-click menu to see a maximum intensity projection (MIP) image instead of the usual slice views. With the MIP you will see where the maximum value is, since nothing is 'hidden' in this view. Select 'Mipify' again to return to the slice view.

If you have xjview http://people.hnl.bcm.tmc.edu/cuixu/xjView/ (requires spm2 and matlab 6.5 or greater), then you can also view the statistics images (not the –log10 p value images) in xjview (e.g., view SPMd_Corr.img not SPMd_PCorr.img). This allows you to identify the anatomical areas of abnormal statistics with a simple mouse click, and to use a slider to modify the p-value.


When you click on an interesting region in one image (e.g., a region of activation in the t-test image), you can see at a glance whether there are problems with normality, heteroskedasticity or independence in that region. (Luo et al., 2002, p. 12)

If you right click on a voxel in one of the images, the corresponding model detail plots will be displayed.

Img_nltFrac.m and spmdbatch.m

Img_nltFrac.m is not part of SPMd, but was written by Thomas Nichols. Img_nltFrac calculates the proportion of voxels that violate the assumptions of the test at a particular (uncorrected) p-value.

Img_nltFrac.m asks for a -log10 p-value image, and then it reports the percent of all voxels exceeding a particular p-value threshold (0.05 by default), and what percent that is of the expected exceedance rate.

spmdbatch.m simply specifies a p-value (alpha level) to pass to Img_nltFrac.m and runs Img_nltFrac.m for all –log10 p images. This means you can quickly quantify how good a particular model is.

e.g., SPMd_PNorm.img alpha=0.01 :PosRate 3.43% RelPosRate 342.50%

3.43% of voxels had an alpha=0.01 significant non-normality statistic, which is 342.50% of the number expected if Normality was OK everywhere.

Note: Img_nltFrac.m does NOT control for the multiple comparison problem. An uncorrected alpha only gives a qualitative (but probably very sensitive) assessment of the non-compliance with the regression model. Use of a corrected alpha gives a valid (if conservative) assessment of the problem. (I’m indebted to Jennifer Johnson-Cox and Thomas Nichols for both the program and the description above.)
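The calculation behind PosRate and RelPosRate can be sketched as follows. This is not Nichols's Img_nltFrac.m code, only a minimal illustration of the quantities it is described as reporting. It assumes you already have the in-brain voxel values of a -log10 p image as a list, with NaNs marking voxels to skip. The name `nlt_frac` is hypothetical, and, like the original, the sketch uses an uncorrected alpha.

```python
import math


def nlt_frac(neg_log10_p, alpha=0.05):
    """Proportion of voxels whose -log10 p exceeds -log10(alpha), and that
    proportion relative to the rate (alpha) expected if the assumption held.

    Returns (pos_rate, rel_pos_rate) as fractions; multiply by 100 for percent.
    """
    threshold = -math.log10(alpha)  # e.g. alpha = 0.01 -> threshold 2
    values = [v for v in neg_log10_p if not math.isnan(v)]
    pos_rate = sum(1 for v in values if v > threshold) / len(values)
    return pos_rate, pos_rate / alpha


# 343 of 10,000 voxels significant at alpha = 0.01:
# PosRate 3.43%, RelPosRate 343%, i.e. about 3.4 times the expected rate.
print(nlt_frac([3.0] * 343 + [0.5] * 9657, alpha=0.01))
```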

Understanding Scan Detail

Why: Residual scans with lots of hyperintensities (white areas, e.g., *001) or hypointensities (dark areas, e.g., *000) correspond to data that did not fit the model. Failure to fit may correspond to spikes in the outlier plot and to spikes or dips in the global signal. The scan detail “for most outlier-spike scans shows either similar acquisition artifacts confined to a single plane or to every other plane…In general, these dramatic artifacts are not evident by inspection of the raw images;” and are “not attributable to the particular model” (Luo et al. 2002, p. 19-20). You can also determine whether a particular area of the brain, rather than the whole brain, has a lot of unexplained variance, and how much that scan differs from its nearest neighbors.

How: If you click on a time point in the scan summary, then choose “scan detail” from the Visualization menu, you’ll see a temporal series of residual images for the selected scan and several neighboring scans. In the above scan detail example, the first two residual images (000 and 001) show up as outliers on our Scan Summary, with scan 000 being the worst by far and corresponding to a significant dip in global signal.

“When examining the temporal series of residual images, we note the spatiotemporal extent of the problem. An artifact confined to a single slice in a single volume suggests shot noise, while an extended artifact may be due to physiological sources or model deficiencies” (Luo et al., 2002, p. 14).

If you right click on a voxel in the scan detail, you’ll see the model detail for that voxel.

Understanding the Model Detail

In the scan summary, we have a plot of global signal anomalies through time, but we have no way to identify which specific voxels might be particularly problematic. In the model summary, we can identify voxels that are problematic, but we don’t know which time points are problematic. The model detail allows us to determine which time points violate the statistical assumptions at a particular voxel.

“A panel of residual plots shows the standard diagnostic plots corresponding to the diagnostic summary statistics” in the Model Summary viewer (Luo et al., 2002, p. 12). In general, “residuals are homogeneous and unstructured if the model fits” (Luo et al., 2002, p. 3). For example, if the Durbin-Watson and Shapiro-Wilk images detected problems, one would view a lagged residual plot and a normality plot, respectively (Luo et al., 2002, p. 12).

We use the diagnostic residual plots to check the specificity of the significant diagnostic statistics. For example, if a voxel is large in the image of Cook-Weisberg score statistic, we use a residual plot versus predictor variable to verify that systematic heteroscedasticity and not an outlier is responsible (Luo et al., 2002, p. 14). From the time series plots of data with fit and residuals, we can not only assess the goodness-of-fit of the model to the signal, but also identify unmodeled signals. Also, from the time series plot of residuals, we note possible outlier scans (Luo et al., 2002, p. 14).

Right click on a voxel in the model summary or scan detail to see model detail like the above for that voxel. The residual plots display a dot for each time point in the selected voxel. The current time point is shown in red, and the 6 residual plots are “yoked”, so that changing the selected time point in one plot changes it in the others as well. For example, if you select a point that is an outlier in one plot, you can easily see whether it is also an outlier in the other plots. In the above example, we have chosen the time point corresponding to the first scan (*000). A red * is displayed at the location of scan 000 in each of the upper 4 plots. The two time series plots at the bottom of the model detail show the vertical time cursor (dotted red line) at the selected time point. If you have the scan summary open along with the model detail, you’ll see that the vertical time cursor in those plots is yoked as well.

Res vs Experimental Predictor

Each dot represents a scan (Face=Blue; House=Green). On the x-axis, the scale (from 0 to 1.5 in this example) corresponds to the predicted value of each scan.

Linearity: The points should be symmetrically distributed around the horizontal line if the data meet the assumption of linearity. Look carefully for evidence of a "bowed" pattern, indicating that the model makes systematic errors whenever it is making unusually large or small predictions.

Homoskedasticity: To detect heteroskedasticity, be alert for evidence of residuals that are getting larger (i.e., more spread out) as a function of the predicted value. (http://www.duke.edu/~rnau/testing.htm)

Outliers: Individual outliers can also be seen on this plot (e.g., the red star marks the first scan, which we know to be an outlier).

The value on the y-axis of the “Experimental Predictor” plot (in the scan summary) corresponds to the x-axis of our “Residual versus Experimental Predictor” plot (above).


Residual vs Global Signal

In a good model, we assume that variance should not increase or decrease with global signal intensity. The Cook-Weisberg global image in the Model Summary reveals brain areas that violate this assumption. The Residual vs Global Signal plot allows you to determine, for an individual voxel, whether a systematic pattern of heteroskedasticity or a few outliers account for the violation of homoskedasticity. The plot displays the residual of each time point against the global signal (on the x-axis) for an individual voxel. We want the time points to be distributed symmetrically and randomly about the zero line, irrespective of the total global intensity. The plot uses only one color for the time points, rather than color coding time points by condition.

Lag-1 Plot

The Lag-1 plot corresponds to the Durbin-Watson statistic for identifying serial correlation between consecutive scans. Each point plots the residual of one scan against the residual of the preceding scan. Values should be distributed around the intersection of the two zero lines. Optimally, the cloud of points should be symmetrical and roughly circular, with no tilt along a diagonal (a tilt means that consecutive scans are correlated). The plot does not color code the time points by condition. Rather, each data point simply represents a scan volume at a single time. Keep in mind that for fMRI, there is always serial correlation. The tests used in SPMd have been “validated in the context of typical autocorrelated fMRI models” (Luo et al., 2002, p. 2).
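The points of a lag-1 plot, and the correlation that shows up as a tilt in the cloud, can be computed in a few lines. This is a minimal sketch for a single voxel's residual time series, not SPMd's own code, and the function names are hypothetical.

```python
import math


def lag1_pairs(residuals):
    """Points of a lag-1 plot: (residual at t-1, residual at t)."""
    return list(zip(residuals[:-1], residuals[1:]))


def lag1_correlation(residuals):
    """Pearson correlation between consecutive residuals, i.e. the tilt of the lag-1 cloud."""
    pairs = lag1_pairs(residuals)
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


print(lag1_correlation([1, 2, 3, 4, 5]))         # slow drift: strong positive tilt
print(lag1_correlation([1, -1, 1, -1, 1, -1]))   # alternation: strong negative tilt
```

A value near 0 corresponds to the round, untilted cloud you hope to see. Values near +1 or -1 correspond to points lined up along a diagonal.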

Normal Probability Plot Of Residuals

"Normal probability plots give a visual way to determine if a distribution is approximately normal. These plots are produced by doing the following.

1. The data are arranged from smallest to largest.
2. The percentile of each data value is determined.
3. From these percentiles, normal calculations are done to determine their corresponding z-scores.
4. Each z-score is plotted against its corresponding data value.

If the distribution is close to normal, the plotted points will lie close to a line." (http://www.math.hope.edu/swanson/statlabs/normal_pp.html)
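The four steps above can be sketched directly. For step 2, this sketch uses the (i - 0.5)/n plotting position to turn ranks into percentiles. That is an assumption: it is one common choice, and other packages (and SPMd) may use a slightly different formula. The function name is hypothetical.

```python
from statistics import NormalDist


def normal_probability_points(data):
    """(z-score, data value) pairs for a normal probability plot."""
    ordered = sorted(data)                      # step 1: smallest to largest
    n = len(ordered)
    points = []
    for i, value in enumerate(ordered, start=1):
        percentile = (i - 0.5) / n              # step 2: percentile of each value
        z = NormalDist().inv_cdf(percentile)    # step 3: matching z-score
        points.append((z, value))               # step 4: pair z-score with the value
    return points


print(normal_probability_points([3.0, 1.0, 2.0]))
```

Plotting the z-scores against the values and checking how closely they follow a straight line is then the visual test.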

“The best test for normally distributed errors is a normal probability plot of the residuals. …If the distribution is normal, the points on this plot should fall close to the diagonal line. A bow-shaped pattern of deviations from the diagonal indicates that the residuals have excessive skewness (i.e., they are not symmetrically distributed, with too many large errors in the same direction). An S-shaped pattern of deviations indicates that the residuals have excessive kurtosis--i.e., there are either too many or too few large errors in both directions.” (http://www.duke.edu/~rnau/testing.htm)

The plot uses only one color for the data points, because it does not relate the time points to the conditions. Rather, each data point simply represents a scan volume at a single time.

Time Series Plot of Data with Fit and Residuals

This plot can be used to assess goodness of fit of the model to the signal and to identify unmodeled signals (Luo et al., 2002, p. 14).

Time Series Plot of Residuals

Here we can identify possible outlier scans for an individual voxel.

Glossary of Terms

AR(1) or AR(1) + w (or AR(2), AR(3), etc.): Terms used to describe different models of autocorrelation in your fMRI data. See Autocorrelation below for more info. AR stands for autoregression. AR models are used to estimate to what extent the noise at each time point in your data is influenced by the noise at the time point (or points) before it. The amount of autocorrelation of noise is estimated as a model parameter, just like a beta weight. The difference between AR(1), AR(2), AR(1) + w, etc., is in which parameters are estimated. An AR(1) model describes the autocorrelation function in your data by looking only at one time point before each moment. In other words, only the correlation of each time point to the first previous time point is considered. In an AR(2) model, the correlation of each time point to the first and second previous time points is considered; in an AR(3) model, the three time points before each time point are considered as parameters, etc. The "w" in AR(1) + w stands for "white noise." An AR(1) + w model assumes the value of noise isn't solely a function of the previous noise; it also includes a random white noise parameter in the model. AR(1) + w models, which are used in SPM2 and other packages, seem to do a pretty good job of describing the "actual" fMRI noise function. A good model can be used to remove the effects of noise correlation in your data, thus satisfying the assumptions of the general linear model. (From Gablab Wiki: Glossary). See “Durbin-Watson Statistic” and “Lag-1 Residual Plots”.
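To see what the "+ w" does, you can simulate noise from an AR(1) + white-noise model. In this model an AR(1) process a[t] = phi·a[t-1] + e[t] is observed with independent white noise w[t] added on top. The white component dilutes the lag-1 autocorrelation: its theoretical value is phi·var(a)/(var(a) + var(w)), where var(a) = sd(e)²/(1 - phi²). This is an illustration of the model family only, not SPM2's estimation code. The function names and parameter values are hypothetical.

```python
import random


def simulate_ar1_plus_white(n, phi, ar_sd=1.0, white_sd=0.0, seed=0):
    """Noise from an AR(1) + white model: a[t] = phi*a[t-1] + e[t]; noise[t] = a[t] + w[t].
    The series starts at a = 0; with long series the start-up transient is negligible."""
    rng = random.Random(seed)
    a = 0.0
    noise = []
    for _ in range(n):
        a = phi * a + rng.gauss(0.0, ar_sd)
        noise.append(a + rng.gauss(0.0, white_sd))
    return noise


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    return sum(p * q for p, q in zip(d[:-1], d[1:])) / sum(v * v for v in d)


def theoretical_lag1(phi, ar_sd, white_sd):
    """Lag-1 autocorrelation implied by the AR(1) + white model."""
    var_a = ar_sd ** 2 / (1 - phi ** 2)
    return phi * var_a / (var_a + white_sd ** 2)


pure = simulate_ar1_plus_white(20000, 0.9, seed=1)
mixed = simulate_ar1_plus_white(20000, 0.9, white_sd=(1 / (1 - 0.81)) ** 0.5, seed=1)
print(lag1_autocorrelation(pure), lag1_autocorrelation(mixed))  # roughly 0.9 and 0.45
```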

Autocorrelation (function, correction, etc.): "One major problem in the statistical analysis of fMRI data is the shape of fMRI noise. Analysis with the general linear model assumes each timepoint is an independent observation, implying the noise at each timepoint is independent of the noise at the next timepoint. But several empirical studies have shown that in fMRI, that assumption's simply not true. Instead, the amount of noise at each timepoint is heavily correlated with the amount of noise at the timepoints before and after. fMRI noise is heavily "autocorrelated," i.e., correlated with itself. This means that each timepoint isn't an independent observation: the temporal data is essentially heavily smoothed, which means any statistical analysis that assumes temporal independence will give biased results. The way to deal with this problem is pretty well established in other scientific domains. If you can estimate what the autocorrelation function is (in other words, what, exactly, is the degree of correlation of the noise from one timepoint to the next), then you can remove the correlated part of the noise from the signal, and hence render your noise "white," or random (rather than correlated). This strategy is called pre-whitening, and is referred to in some fMRI packages as autocorrelation correction. The models used to do this in fMRI are mostly AR(1) + w models, but sometimes more complicated ones are used." (From Gablab Wiki: Glossary; see also http://cnl.web.arizona.edu/glossary.htm#gablab) See “Durbin-Watson Statistic” and “Lag-1 Residual Plots”.
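Pre-whitening with an AR(1) noise model can be sketched as follows. The steps are:

1. Estimate phi from the lag-1 autocorrelation.
2. Filter each sample as y[t] = x[t] - phi·x[t-1].
3. Rescale the first sample by sqrt(1 - phi²) (a Prais-Winsten-style choice) so it keeps the same variance.

This is a simplified illustration on a single noise series. In a real GLM analysis, the same filter is applied to both the data and the design matrix. SPM2 estimates an AR(1) + w model rather than the pure AR(1) used here. The function names are hypothetical.

```python
import math
import random


def ar1_series(n, phi, seed=0):
    """Simulated AR(1) noise, used here only to demonstrate the filter."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = phi * prev + rng.gauss(0.0, 1.0)
        x.append(prev)
    return x


def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation of a series."""
    m = sum(x) / len(x)
    d = [v - m for v in x]
    return sum(p * q for p, q in zip(d[:-1], d[1:])) / sum(v * v for v in d)


def prewhiten_ar1(x, phi=None):
    """Remove AR(1) correlation: y[t] = x[t] - phi * x[t-1].
    If phi is None it is estimated from the series' lag-1 autocorrelation."""
    if phi is None:
        phi = lag1_autocorrelation(x)
    if not -1 < phi < 1:
        raise ValueError("phi must lie strictly between -1 and 1")
    whitened = [math.sqrt(1 - phi ** 2) * x[0]]
    whitened.extend(x[t] - phi * x[t - 1] for t in range(1, len(x)))
    return whitened


noise = ar1_series(20000, 0.8, seed=3)
print(lag1_autocorrelation(noise), lag1_autocorrelation(prewhiten_ar1(noise)))  # ~0.8, then ~0
```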

Cook-Weisberg Statistic: (Homo vs Glob, Homo vs h(Y), Homo vs X). A statistic used to detect heteroskedasticity. "Violations of homoscedasticity make it difficult to gauge the true standard deviation of the forecast errors, usually resulting in confidence intervals that are too wide or too narrow. In particular, if the variance of the errors is increasing over time, confidence intervals for out-of-sample predictions will tend to be unrealistically narrow. Heteroscedasticity may also have the effect of giving too much weight to a small subset of the data (namely the subset where the error variance was largest) when estimating coefficients." If a voxel is significant in the Cook-Weisberg image, one can use the “residual vs predictor” plot to determine whether the cause is systematic heteroskedasticity or an outlier. SPMd tests for heteroskedasticity relative to three different criteria: 1) the experimental design matrix (Homo1?); 2) the predicted response (Homo2); 3) the global signal (Homo3?). (Note that there is a discrepancy between the checkbox labels in the Model Summary Visualization and the documentation in the m-files with respect to the 1st and 3rd Homo buttons.) See “Homoskedasticity”.
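The score-test idea behind a Cook-Weisberg statistic can be sketched for a single voxel and a single variable z. Depending on which SPMd image you are reproducing, z would be the global signal, the fitted values, or a design column. The procedure is:

1. Scale the squared residuals by their mean.
2. Regress the scaled values on z.
3. Take half the regression sum of squares as the score statistic, which is approximately chi-square with 1 df. The same statistic is often called the Breusch-Pagan test.

This is the textbook form under those assumptions, not SPMd's own code, which may differ in detail. The function name is hypothetical.

```python
import math


def cook_weisberg_score(residuals, z):
    """Score test for residual variance changing with z.

    u[i] = e[i]**2 / sigma2 with sigma2 = mean(e**2); regress u on z (with intercept);
    S = regression sum of squares / 2, approximately chi-square with 1 df.
    Returns (S, p_value).
    """
    n = len(residuals)
    sigma2 = sum(e * e for e in residuals) / n
    u = [e * e / sigma2 for e in residuals]
    mz = sum(z) / n
    mu = sum(u) / n  # equals 1 by construction
    szz = sum((v - mz) ** 2 for v in z)
    szu = sum((v - mz) * (w - mu) for v, w in zip(z, u))
    ss_reg = szu ** 2 / szz
    s = ss_reg / 2.0
    p = math.erfc(math.sqrt(s / 2.0))  # upper tail of chi-square with 1 df
    return s, p


# Residuals whose spread grows with z give a large S and a small p.
e = [(-1) ** i * i for i in range(1, 51)]
print(cook_weisberg_score(e, list(range(1, 51))))
```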

Cumulative Periodogram: “Dependence” A test for assessing the white noise assumption of residuals, i.e., independence of items in a time series. The cumulative periodogram looks for long-term correlation in the data, and detects “general non-white noise” (Luo et al., 2002, p. 7). "Because fMRI noise is quite complicated with high-temporal frequency artifacts arising due to noise or physiological effects, a first order autoregressive model (like the Durbin-Watson statistic) may be too restrictive. We prefer a test with fewer constraints on the correlation structure and the distribution of the data" (Luo et al., 2002, p. 7-8). See also Durbin-Watson statistic and Lag-1 Residual Plot.
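The general idea of a cumulative periodogram test can be sketched as follows. For white noise, power is spread evenly across frequencies, so the normalized running sum of the periodogram climbs along a straight line. The statistic below is the largest distance from that line, a Kolmogorov-Smirnov-style measure. Luo et al. give the exact form of SPMd's test in their appendices; this generic version is a stand-in, not their method. The function name is hypothetical.

```python
import cmath
import math


def cumulative_periodogram_deviation(residuals):
    """Max distance between the normalized cumulative periodogram and the
    straight line expected for white noise (0 = perfectly flat spectrum)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centred = [r - mean for r in residuals]
    m = (n - 1) // 2  # Fourier frequencies strictly between 0 and Nyquist
    power = []
    for k in range(1, m + 1):
        coef = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centred))
        power.append(abs(coef) ** 2)
    total = sum(power)
    running, worst = 0.0, 0.0
    for j, p in enumerate(power, start=1):
        running += p
        worst = max(worst, abs(running / total - j / m))
    return worst


# A slow drift puts all of its power at the lowest frequency, far from white noise.
drift = [math.sin(2 * math.pi * t / 64) for t in range(64)]
print(cumulative_periodogram_deviation(drift))
```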

Durbin-Watson Statistic: “Correlation” A test for autocorrelation of residuals. “A statistic for first order autocorrelation testing. The Durbin-Watson statistic is used to test for the presence of first order autocorrelation in the residuals of a regression equation. The test compares the residual for time period t with the residual for time period t-1 and develops a statistic that measures the significance of the correlation between the successive comparisons”. http://www.csus.edu/indiv/j/jensena/mgmt105/durbin.htm See “Independence”, “AR1” and “Autocorrelation”.

"Violations of independence are also very serious in time series regression models: serial correlation in the residuals means that there is room for improvement in the model, and extreme serial correlation is often a symptom of a badly mis-specified model, as we saw in the auto sales example. Serial correlation is also sometimes a byproduct of a violation of the linearity assumption--as in the case of a simple (i.e., straight) trend line fitted to data which are growing exponentially over time.

How to detect: The best test for residual autocorrelation is to look at an autocorrelation plot of the residuals. (If this is not part of the standard output for your regression procedure, you can save the RESIDUALS and use another procedure to plot the autocorrelations.) Ideally, most of the residual autocorrelations should fall within the 95% confidence bands around zero, which are located at roughly plus-or-minus 2-over-the-square-root-of-n, where n is the sample size. Thus, if the sample size is 50, the autocorrelations should be between +/- 0.3. If the sample size is 100, they should be between +/- 0.2. Pay especially close attention to significant correlations at the first couple of lags and in the vicinity of the seasonal period, because these are probably not due to mere chance and are also fixable. The Durbin-Watson statistic provides a test for significant residual autocorrelation at lag 1: the DW stat is approximately equal to 2(1-a) where a is the lag-1 residual autocorrelation, so ideally it should be close to 2.0--say, between 1.4 and 2.6 for a sample size of 50.

How to fix: Minor cases of positive serial correlation (say, lag-1 residual autocorrelation in the range 0.2 to 0.4, or a Durbin-Watson statistic between 1.2 and 1.6) indicate that there is some room for fine-tuning in the model. Consider adding lags of the dependent variable and/or lags of some of the independent variables.

If there is significant negative correlation in the residuals (lag-1 autocorrelation more negative than -0.3 or DW stat greater than 2.6), watch out for the possibility that you may have overdifferenced some of your variables. Differencing tends to drive autocorrelations in the negative direction, and too much differencing may lead to patterns of negative correlation that lagged variables cannot correct for.

Major cases of serial correlation (a Durbin-Watson statistic well below 1.0, autocorrelations well above 0.5) usually indicate a fundamental structural problem in the model. You may wish to reconsider the transformations (if any) that have been applied to the dependent and independent variables. It may help to stationarize all variables through appropriate combinations of differencing, logging, and/or deflating." http://www.duke.edu/~rnau/testing.htm
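To make the detection rules of thumb concrete, here is a minimal sketch in plain Python (standard library only; it is not SPMd's own code, and the function names are hypothetical). It computes the lag-k autocorrelation of a residual series, the rough 95% band of plus-or-minus 2/sqrt(n), and the Durbin-Watson statistic, so you can compare DW with the approximation 2(1-a).

```python
import math


def autocorrelation(resid, lag):
    """Sample autocorrelation of a residual series at the given lag."""
    n = len(resid)
    mean = sum(resid) / n
    centered = [r - mean for r in resid]
    denom = sum(c * c for c in centered)
    num = sum(centered[t] * centered[t - lag] for t in range(lag, n))
    return num / denom


def confidence_band(n):
    """Rough 95% band around zero for residual autocorrelations: +/- 2/sqrt(n)."""
    return 2.0 / math.sqrt(n)


def durbin_watson(resid):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(r * r for r in resid)
    return num / den


if __name__ == "__main__":
    # Toy residuals that flip sign at every scan (strong negative lag-1 correlation).
    resid = [1.0, -1.0] * 25
    a1 = autocorrelation(resid, 1)
    print(f"lag-1 autocorrelation: {a1:.3f}")
    print(f"95% band: +/- {confidence_band(len(resid)):.3f}")
    print(f"Durbin-Watson: {durbin_watson(resid):.3f}, 2(1-a) = {2 * (1 - a1):.3f}")
```

For this toy series (n = 50) the lag-1 autocorrelation is -0.98, far outside the band of about +/- 0.28, and DW is 3.92 (the approximation 2(1-a) gives 3.96): the kind of strongly negative correlation the quote warns may point to overdifferencing.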

General Linear Model The general linear model is a statistical tool for quantifying the relationship between several independent and several dependent variables. It is an extension of multiple regression, which is itself an extension of simple linear regression. The model assumes that the effects of the different independent variables on a dependent variable are linear and that they add together. The standard GLM equation is Y = XB + E, where Y is the signal, X is your design matrix, B is a vector of beta weights, and E is the error unaccounted for by the model. Most neuroimaging software packages use the GLM as their basic model for fMRI data, and it has been a very effective tool for testing many effects. Other methods of discovering experimental effects exist, notably non-model-based methods like principal components analysis. (From Gablab Wiki: Glossary)
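As a minimal illustration of the GLM written with the design matrix on the left (Y = XB + E, so the dimensions agree), the Python sketch below fits a two-column design, an intercept plus a hypothetical on/off block regressor, to a toy time series by ordinary least squares, solving the 2x2 normal equations directly. The function name and data are made up for illustration; SPM's own estimation adds steps (such as high-pass filtering and serial-correlation modelling) that are not shown here.

```python
def ols_two_regressors(y, x1):
    """Fit y = b0 + b1*x1 + e by ordinary least squares.

    The design matrix X has two columns: a constant (intercept) and x1.
    Returns (b0, b1, residuals).
    """
    n = len(y)
    # Entries of X'X and X'y for X = [1, x1].
    s1 = sum(x1)
    s11 = sum(v * v for v in x1)
    sy = sum(y)
    s1y = sum(v * w for v, w in zip(x1, y))
    det = n * s11 - s1 * s1
    if det == 0:
        raise ValueError("design matrix is rank deficient")
    # Solve the 2x2 normal equations (X'X) B = X'y by Cramer's rule.
    b0 = (s11 * sy - s1 * s1y) / det
    b1 = (n * s1y - s1 * sy) / det
    residuals = [w - (b0 + b1 * v) for v, w in zip(x1, y)]
    return b0, b1, residuals


if __name__ == "__main__":
    # Hypothetical block design: 4 scans off, 4 scans on, repeated twice.
    boxcar = [0, 0, 0, 0, 1, 1, 1, 1] * 2
    noise = [0.1, -0.2, 0.0, 0.1, -0.1, 0.2, 0.0, -0.1] * 2
    signal = [100 + 2 * b + e for b, e in zip(boxcar, noise)]
    b0, b1, resid = ols_two_regressors(signal, boxcar)
    print(f"baseline = {b0:.3f}, block effect = {b1:.3f}")
```

Because the toy noise averages to zero within each condition, the fit recovers a baseline of 100 and a block effect of 2, and the residuals E are exactly the noise that was added.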

Global Signal Effects Any change in your fMRI signal that affects the whole brain (or whole volume) at once. Sources of these effects can be external (scanner drift, etc.) or physiological (motion, respiration, etc.). They are generally taken to be non-neuronal in nature, so you generally want to remove global effects from your signal, since they are extremely unlikely to be caused by actual neuronal firing. (From Gablab Wiki: Glossary)

Global Scaling (to compensate for global signal effects) An analysis step in which the voxel values in every image are divided by the global mean intensity of that image. This makes the global mean identical for every image in the analysis; in other words, it removes any differences in mean global intensity between images. This is different from grand mean scaling! Global scaling (also called proportional scaling) was introduced in PET, where the signal could vary significantly from image to image depending on the total amount of cerebral blood flow, but it generally makes little sense in fMRI. One reason is that if your activations are large, the timecourse of your global means may correlate with your task: if many voxels in the brain go up and down with your task, your global mean may well go up and down with your task as well. If you divide that variation out by scaling, you will lose those activations and possibly introduce weird negative activations (see the Gablab Wiki PhysiologyFaq for some examples). Moreover, moment-to-moment global variations are very small in fMRI compared to PET. They can be quite large session-to-session, though, so grand mean scaling is generally a good idea (see below). (From Gablab Wiki: Glossary)
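The difference between proportional (global) scaling and grand mean scaling can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not SPM's implementation: each "image" is a hypothetical list of voxel values, the global mean is taken as the plain mean over all voxels (SPM uses its own definition of the global), and the target value of 100 is chosen only for illustration.

```python
def global_means(images):
    """Global mean intensity of each image (plain mean over all voxels)."""
    return [sum(img) / len(img) for img in images]


def proportional_scaling(images, target=100.0):
    """Divide every image by its own global mean, then rescale to `target`.

    Afterwards every image has the same global mean, so task-related
    fluctuation in the global signal is removed along with the nuisance.
    """
    return [[v * target / g for v in img]
            for img, g in zip(images, global_means(images))]


def grand_mean_scaling(images, target=100.0):
    """Multiply the whole session by one factor so that the mean of the
    image globals equals `target`; image-to-image differences are kept."""
    gm = global_means(images)
    factor = target / (sum(gm) / len(gm))
    return [[v * factor for v in img] for img in images]


if __name__ == "__main__":
    # Three tiny two-voxel "images"; the second voxel of the third image is "activated".
    session = [[800.0, 1200.0], [800.0, 1200.0], [800.0, 1600.0]]
    print(proportional_scaling(session))
    print(global_means(grand_mean_scaling(session)))
```

After proportional scaling the unactivated first voxel of the third image drops from 80 to about 66.7, a spurious decrease of the kind the entry warns about, whereas grand mean scaling only changes the overall level (globals of 93.75, 93.75 and 112.5, averaging 100).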

Homoscedasticity (constant variance of the errors) An assumption of linear regression: the errors should have constant variance (a) versus time and (b) versus the predictions (or versus any independent variable). See “Cook-Weisberg Statistic”.

"Violations of homoscedasticity make it difficult to gauge the true standard deviation of the forecast errors, usually resulting in confidence intervals that are too wide or too narrow. In particular, if the variance of the errors is increasing over time, confidence intervals for out-of-sample predictions will tend to be unrealistically narrow. Heteroscedasticity (the violation of homoskedasticity) may also have the effect of giving too much weight to small subset of the data (namely the subset where the error variance was largest) when estimating coefficients.

How to detect: look at plots of residuals versus time and residuals versus predicted value, and be alert for evidence of residuals that are getting larger (i.e., more spread-out) either as a function of time or as a function of the predicted value. (To be really thorough, you might also want to plot residuals versus some of the independent variables.)

How to fix: In time series models, heteroscedasticity often arises due to the effects of inflation and/or real compound growth, perhaps magnified by a multiplicative seasonal pattern. Some combination of logging and/or deflating will often stabilize the variance in this case. Stock market data may show periods of increased or decreased volatility over time--this is normal and is often modeled with so-called ARCH (auto-regressive conditional heteroscedasticity) models in which the error variance is fitted by an autoregressive model. Such models are beyond the scope of this course--however, a simple fix would be to work with shorter intervals of data in which volatility is more nearly constant. Heteroscedasticity can also be a byproduct of a significant violation of the linearity and/or independence assumptions, in which case it may also be fixed as a byproduct of fixing those problems." http://www.duke.edu/~rnau/testing.htm
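One way to put a number on the residuals-versus-time check is the score test of Breusch and Pagan, which Cook and Weisberg derived independently and which is often called the Cook-Weisberg test. The Python sketch below is a simplified version for a single variable z (for example, scan number): it regresses the scaled squared residuals on z and compares half the explained sum of squares with a chi-square distribution on 1 degree of freedom. It assumes normally distributed errors, the function name is hypothetical, and it is not necessarily how spmd computes its Cook-Weisberg statistic.

```python
import math


def cook_weisberg_score(resid, z):
    """Score test for an error variance that changes linearly with z.

    resid: residuals from the fitted model; z: one variable (e.g. time).
    Returns (statistic, p_value); under normal, homoscedastic errors the
    statistic is approximately chi-square with 1 degree of freedom.
    """
    n = len(resid)
    sigma2 = sum(r * r for r in resid) / n
    g = [r * r / sigma2 for r in resid]          # scaled squared residuals
    zbar = sum(z) / n
    gbar = sum(g) / n                            # equals 1 by construction
    szz = sum((v - zbar) ** 2 for v in z)
    if szz == 0:
        raise ValueError("z must vary")
    szg = sum((v - zbar) * (w - gbar) for v, w in zip(z, g))
    ess = szg * szg / szz                        # explained SS of g regressed on z
    stat = ess / 2.0
    p_value = math.erfc(math.sqrt(stat / 2.0))   # chi-square(1) upper tail
    return stat, p_value


if __name__ == "__main__":
    time = list(range(40))
    steady = [(-1.0) ** t for t in time]                    # constant spread
    growing = [(-1.0) ** t * (1 + 0.1 * t) for t in time]   # spread grows with time
    print(cook_weisberg_score(steady, time))
    print(cook_weisberg_score(growing, time))
```

Residuals with constant spread give a statistic of 0 (p = 1), while residuals whose spread grows over the run give a large statistic and a small p-value, matching the "residuals getting larger as a function of time" pattern described above.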

Independence of the errors (no serial correlation) An assumption of linear regression. "Violations of independence are also very serious in time series regression models: serial correlation in the residuals means that there is room for improvement in the model, and extreme serial correlation is often a symptom of a badly mis-specified model, as we saw in the auto sales example. Serial correlation is also sometimes a byproduct of a violation of the linearity assumption--as in the case of a simple (i.e., straight) trend line fitted to data which are growing exponentially over time.

How to detect: The best test for residual autocorrelation is to look at an autocorrelation plot of the residuals. (If this is not part of the standard output for your regression procedure, you can save the RESIDUALS and use another procedure to plot the autocorrelations.) Ideally, most of the residual autocorrelations should fall within the 95% confidence bands around zero, which are located at roughly plus-or-minus 2-over-the-square-root-of-n, where n is the sample size. Thus, if the sample size is 50, the autocorrelations should be between +/- 0.3. If the sample size is 100, they should be between +/- 0.2. Pay especially close attention to significant correlations at the first couple of lags and in the vicinity of the seasonal period, because these are probably not due to mere chance and are also fixable. The Durbin-Watson statistic provides a test for significant residual autocorrelation at lag 1: the DW stat is approximately equal to 2(1-a) where a is the lag-1 residual autocorrelation, so ideally it should be close to 2.0--say, between 1.4 and 2.6 for a sample size of 50.

How to fix: Minor cases of positive serial correlation (say, lag-1 residual autocorrelation in the range 0.2 to 0.4, or a Durbin-Watson statistic between 1.2 and 1.6) indicate that there is some room for fine-tuning in the model. Consider adding lags of the dependent variable and/or lags of some of the independent variables. Or, if you have ARIMA options available, try adding an AR=1 or MA=1 term. (An AR=1 term in Statgraphics adds a lag of the dependent variable to the forecasting equation, whereas an MA=1 term adds a lag of the forecast error.) If there is significant correlation at lag 2, then a 2nd-order lag may be appropriate." http://www.duke.edu/~rnau/testing.htm See “Cumulative Periodogram”, “Durbin-Watson Statistic” and “Lag 1 Residual Plot”.
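The serial-correlation rules of thumb quoted in this glossary can be collected into one small, hypothetical Python helper that labels a lag-1 residual autocorrelation a and a Durbin-Watson statistic dw. The cut-offs come from the quoted passages; treating "well below 1.0" and "well above 0.5" as plain cut-offs at 1.0 and 0.5 is an assumption, and the labels are a reading aid, not a formal significance test (the appropriate DW range depends on the sample size).

```python
def triage_serial_correlation(a, dw):
    """Label lag-1 autocorrelation `a` and Durbin-Watson `dw` using the
    rules of thumb quoted from http://www.duke.edu/~rnau/testing.htm.

    Checks run from most to least serious; "well below 1.0" and
    "well above 0.5" are treated as plain cut-offs (an assumption).
    """
    if dw < 1.0 or a > 0.5:
        return "major: possible structural problem; reconsider transformations"
    if a < -0.3 or dw > 2.6:
        return "negative: check whether variables were overdifferenced"
    if 0.2 <= a <= 0.4 or 1.2 <= dw <= 1.6:
        return "minor: room for fine-tuning, e.g. lagged variables"
    return "none of the quoted rules of thumb applies"


if __name__ == "__main__":
    for a, dw in [(0.05, 1.9), (0.3, 1.4), (-0.5, 3.0), (0.7, 0.6)]:
        print(a, dw, triage_serial_correlation(a, dw))
```

The checks are ordered from most to least serious so that a series matching several rules is reported under the worst one.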

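Lag 1 Residual Plot A scatterplot of each residual against the residual at the previous time point; a clear tilt in the cloud of points suggests lag-1 serial correlation. See “Durbin-Watson Statistic” for testing serial autocorrelation.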

Linearity of the relationship between dependent and independent variables. An assumption of linear regression that the dependent variable changes at a constant rate as an independent variable changes (each fixed step up in one is associated with the same fixed change, up or down, in the other, wherever you start).

Normality of the error distribution. An assumption of linear regression that the errors (residuals) will follow a normal bell curve. "Violations of normality compromise the estimation of coefficients and the calculation of confidence intervals. Sometimes the error distribution is "skewed" by the presence of a few large outliers. Since parameter estimation is based on the minimization of squared error, a few extreme observations can exert a disproportionate influence on parameter estimates. Calculation of confidence intervals and various significance tests for coefficients are all based on the assumption of normally distributed errors. If the error distribution is significantly non-normal, confidence intervals may be too wide or too narrow.

Violations of normality often arise either because (a) the distributions of the dependent and/or independent variables are themselves significantly non-normal, and/or (b) the linearity assumption is violated. In such cases, a nonlinear transformation of variables might cure both problems. In some cases, the problem with the residual distribution is mainly due to one or two very large errors. Such values should be scrutinized closely: are they genuine (i.e., not the result of data entry errors), are they explainable, are similar events likely to occur again in the future, and how influential are they in your model-fitting results? (The "influence measures" report is a guide to the relative influence of extreme observations.) If they are merely errors or if they can be explained as unique events not likely to be repeated, then you may have cause to remove them. In some cases, however, it may be that the extreme values in the data provide the most useful information about values of some of the coefficients and/or provide the most realistic guide to the magnitudes of forecast errors." http://www.duke.edu/~rnau/testing.htm See http://www.skymark.com/resources/tools/normal_test_plot.asp for a nice discussion. See “Shapiro-Wilk Statistic” and “Normal Probability Plot of Residuals”.

Normal Probability Plot of Residuals “The best test for normally distributed errors is a normal probability plot of the residuals. This is a plot of the fractiles of the error distribution versus the fractiles of a normal distribution having the same mean and variance. If the distribution is normal, the points on this plot should fall close to the diagonal line. A bow-shaped pattern of deviations from the diagonal indicates that the residuals have excessive skewness (i.e., they are not symmetrically distributed, with too many large errors in the same direction). An S-shaped pattern of deviations indicates that the residuals have excessive kurtosis--i.e., there are either too many or too few large errors in both directions.” http://www.duke.edu/~rnau/testing.htm
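The points of a normal probability plot can be computed with Python's standard library: sort the residuals and pair each with the quantile of a normal distribution having the same mean and standard deviation. This sketch uses Blom's plotting positions (i - 0.375)/(n + 0.25), one common choice among several (an assumption, not necessarily what spmd uses), and adds the correlation of the two coordinates as a rough one-number summary that is close to 1 when the points lie along the diagonal. The function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_probability_points(resid):
    """Return (theoretical, observed) pairs for a normal probability plot."""
    n = len(resid)
    observed = sorted(resid)
    ref = NormalDist(mean(resid), stdev(resid))  # same mean and spread as the data
    theoretical = [ref.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    return list(zip(theoretical, observed))


def plot_correlation(points):
    """Pearson correlation of the plot coordinates (near 1 if the points are straight)."""
    xs, ys = zip(*points)
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


if __name__ == "__main__":
    symmetric = [-1.6, -1.1, -0.7, -0.4, -0.1, 0.1, 0.4, 0.7, 1.1, 1.6]
    skewed = [-0.5, -0.5, -0.4, -0.4, -0.3, -0.3, -0.2, 0.2, 0.9, 1.5]
    print(round(plot_correlation(normal_probability_points(symmetric)), 3))
    print(round(plot_correlation(normal_probability_points(skewed)), 3))
```

The symmetric toy residuals lie almost exactly on the diagonal, while the right-skewed set (a few large positive errors) bends away from it and gives a noticeably lower correlation, the bow-shaped pattern described in the quote.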

Residuals The difference between the observed and predicted response (i.e., what is left over when your model has accounted for everything you can in the data). “The SPMd toolbox creates a 4D dataset of *standardized* residuals (i.e., it does it SPM2 style, as a sequence of 3D analyze files). (Standardized residuals are the residuals divided by their standard deviation.)” (http://www.jiscmail.ac.uk/cgi-bin/wa.exe?A2=ind0511&L=SPM&P=R357&I=-3). Even when the errors are homoscedastic and independent, the residuals themselves are heteroscedastic and dependent. To visualize residuals with homogeneous variance, spmd uses studentized residuals. When dependence of residuals is a problem, BLUS (Best Linear Unbiased residuals with Scalar [diagonal] covariance matrix) residuals are used. (Luo & Nichols, 2003, p. 1016)
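The reason raw residuals are heteroscedastic even when the errors are not is leverage: for ordinary least squares, Var(e_i) = σ²(1 - h_ii), where h_ii is the i-th diagonal element of the hat matrix. The Python sketch below, for a hypothetical straight-line fit y = b0 + b1 x, computes h_ii and the internally studentized residuals e_i / (s sqrt(1 - h_ii)), alongside residuals simply divided by s. It is only an illustration with made-up data; spmd works with the full design matrix, and its BLUS residuals are not shown.

```python
import math


def studentized_residuals(x, y):
    """Fit y = b0 + b1*x by OLS and return (leverages, scaled, studentized).

    scaled:      e_i / s                     (ignores leverage)
    studentized: e_i / (s * sqrt(1 - h_ii))  (internally studentized)
    where s**2 = RSS / (n - 2) and h_ii = 1/n + (x_i - xbar)**2 / Sxx.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((v - xbar) ** 2 for v in x)
    b1 = sum((v - xbar) * (w - ybar) for v, w in zip(x, y)) / sxx
    b0 = ybar - b1 * xbar
    resid = [w - (b0 + b1 * v) for v, w in zip(x, y)]
    s = math.sqrt(sum(e * e for e in resid) / (n - 2))
    lev = [1.0 / n + (v - xbar) ** 2 / sxx for v in x]
    scaled = [e / s for e in resid]
    student = [e / (s * math.sqrt(1.0 - h)) for e, h in zip(resid, lev)]
    return lev, scaled, student


if __name__ == "__main__":
    x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 20]  # the last point has high leverage
    y = [0.1, 1.2, 1.9, 3.2, 3.8, 5.1, 6.0, 6.9, 8.2, 19.5]
    lev, scaled, student = studentized_residuals(x, y)
    print([round(h, 3) for h in lev])
```

The leverages sum to the number of model parameters (2 here), and the outlying x value carries most of it, so its raw residual has the smallest variance; studentizing divides each residual by its own estimated standard deviation, which puts them all on a common scale.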

Shapiro-Wilk Statistic A test of normality: it tests the hypothesis that a sample came from a normally distributed population. Linear regression assumes that the residuals will be normally distributed. The Shapiro-Wilk statistic in spmd is sensitive to statistical violations only within plane (Luo & Nichols, 2002, p. 42). See “Normality of the error distribution”.

References

Luo, W.-L., & Nichols, T. E. (2002). Diagnosis and exploration of massively univariate fMRI models, 1-51.

Luo, W.-L., & Nichols, T. E. (2003). Diagnosis and exploration of massively univariate neuroimaging models. NeuroImage, 19(3), 1014-1032.

http://www.skymark.com/resources/tools/normal_test_plot.asp

Date post:	11-May-2015
Category:	Business
Upload:	zorro29
View:	566 times
Download:	0 times

SPMd_Tutorial.doc

Business