
Supplementary Materials - · Web viewSupplementary Materials Appendix 1 The MEDLINE database...

Date post: 12-Mar-2018
Upload: hoangdan
Supplementary Materials

Appendix 1

The MEDLINE database and the topic of colorectal cancer serum markers serve here as an example to illustrate the search strategy. The key words for this topic are colorectal, cancer, serum and markers. Their “associated words” are CRC (colorectal cancer), colorectal, large intestine, large bowel, colon, colonic, rectal and rectum; cancer, carcinoma, tumor, malignancy and neoplasm; serum, sera, serums, blood and plasma; and marker(s), biomarker(s), signature molecule, molecular marker(s), marker(s) biological, biological marker(s), biologic markers and mark, respectively (tip: a shortcut for finding “associated words” is to enter your key words at http://thesaurus.com/browse/ and on the OvidSP database website, both of which suggest “associated words”). According to the PubMed search syntax, a Boolean search, i.e., (colorectal OR large intestine OR large bowel OR colon OR colonic OR rectal OR rectum) AND (cancer OR carcinoma OR tumor OR neoplasm OR cancers) AND (serum OR sera OR serums OR blood OR plasma) AND (marker OR signature molecule OR molecular marker OR markers OR biomarkers OR biomarker OR mark), can be performed directly in the PubMed search box (see “The search strategy in MEDLINE” below). Duplicates retrieved from different literature databases can be deleted automatically or manually using Reference Manager or EndNote (produced by Thomson Reuters, New York, NY, USA).

The search strategy in MEDLINE

Colorectal cancer {Including Limited Related Terms}

(colorectal[Title/Abstract] OR (large[Title/Abstract] AND intestine[Title/Abstract]) OR (large[Title/Abstract] AND bowel[Title/Abstract]) OR colon[Title/Abstract] OR colonic[Title/Abstract] OR rectal[Title/Abstract] OR rectum[Title/Abstract])
AND (cancer[Title/Abstract] OR carcinoma[Title/Abstract] OR tumor[Title/Abstract] OR neoplasm[Title/Abstract] OR cancers[Title/Abstract])
AND (serum[Title/Abstract] OR sera[Title/Abstract] OR serums[Title/Abstract] OR blood[Title/Abstract] OR plasma[Title/Abstract])
AND (marker[Title/Abstract] OR (signature[Title/Abstract] AND molecule[Title/Abstract]) OR (molecular[Title/Abstract] AND marker[Title/Abstract]) OR markers[Title/Abstract] OR biomarkers[Title/Abstract] OR biomarker[Title/Abstract] OR mark[Title/Abstract])
AND ("humans"[MeSH Terms] AND (Clinical Trial[ptyp] OR Editorial[ptyp] OR Letter[ptyp] OR Case Reports[ptyp] OR Classical Article[ptyp] OR Clinical Conference[ptyp] OR Clinical Trial, Phase I[ptyp] OR Clinical Trial, Phase II[ptyp] OR Clinical Trial, Phase III[ptyp] OR Clinical Trial, Phase IV[ptyp] OR Controlled Clinical Trial[ptyp] OR Corrected and Republished Article[ptyp] OR English Abstract[ptyp] OR Journal Article[ptyp] OR Multicenter Study[ptyp]) AND English[lang] AND (cancer[sb] OR medline[sb]))
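The topic part of this query follows a regular pattern: synonyms are joined with OR inside a group, groups are joined with AND, and a multi-word term becomes an AND of its field-tagged words. A minimal Python sketch of that pattern is given here; the function names are hypothetical helpers for illustration, not part of PubMed or any library, and the limits block (humans, publication types, language, subsets) is deliberately left out.

```python
def field_term(term, field="Title/Abstract"):
    """Tag every word of a term with a field; multi-word terms are ANDed."""
    tagged = [f"{word}[{field}]" for word in term.split()]
    if len(tagged) == 1:
        return tagged[0]
    return "(" + " AND ".join(tagged) + ")"


def or_group(terms, field="Title/Abstract"):
    """Join the associated words of one key word with OR."""
    return "(" + " OR ".join(field_term(t, field) for t in terms) + ")"


def build_query(groups, field="Title/Abstract"):
    """Join the OR groups of all key words with AND."""
    return " AND ".join(or_group(g, field) for g in groups)


# Associated words taken from the Boolean search in Appendix 1.
groups = [
    ["colorectal", "large intestine", "large bowel", "colon",
     "colonic", "rectal", "rectum"],
    ["cancer", "carcinoma", "tumor", "neoplasm", "cancers"],
    ["serum", "sera", "serums", "blood", "plasma"],
    ["marker", "signature molecule", "molecular marker", "markers",
     "biomarkers", "biomarker", "mark"],
]
query = build_query(groups)
print(query)
```

The printed string can be pasted into the search box and combined with the limits shown above. Keeping the synonym lists as data makes it easy to rerun the search when a term is added.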


Appendix 2

Data extraction for screening or diagnostic markers

In a diagnostic (or screening) study, the four core values, i.e., the numbers of true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN), should be considered first, as these data are used to construct the two-by-two table for meta-analysis. If complete data on the four values cannot be obtained, relevant values should be abstracted as far as possible and used to derive one or more of the four core values. The relevant values generally comprise the numbers in the case and control groups; sensitivity=TP/(TP+FN); specificity=TN/(TN+FP); 95% confidence intervals (CIs); overall accuracy=(TP+TN)/(TP+FN+FP+TN); positive predictive value (PPV)=TP/(TP+FP); negative predictive value (NPV)=TN/(TN+FN); positive likelihood ratio (LR+)=(TP/(TP+FN))/(FP/(FP+TN))=sensitivity/(1-specificity); negative likelihood ratio (LR-)=(FN/(TP+FN))/(TN/(FP+TN))=(1-sensitivity)/specificity; diagnostic odds ratio (DOR)=(TP*TN)/(FP*FN); and the risk ratio=relative risk=(TP/(TP+FP))/(FN/(FN+TN)). In addition to the preceding parameters, the following elements are also extracted from each article: 1) the overall study characteristics, such as the author(s), institution, date of publication, recruitment setting, study design and study years; 2) participant characteristics, for example, the description of the case and control groups; 3) the details of the index marker test, including the positive versus negative cut-off value; and 4) the type of reference test used to confirm the presence or absence of disease in the subjects. A standard form or database should generally be prepared in advance.
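The definitions above translate directly into code. The following Python sketch computes the accuracy measures from a two-by-two table and, going the other way, rebuilds the four core values when a paper reports only the group sizes with sensitivity and specificity. The function names and the example counts are hypothetical, and no continuity correction is applied, so a zero cell will raise a division error.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures of Appendix 2 from the four core values."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "dor": (tp * tn) / (fp * fn),
    }


def table_from_rates(n_cases, n_controls, sens, spec):
    """Rebuild (TP, FP, FN, TN) from group sizes, sensitivity and specificity."""
    tp = round(sens * n_cases)
    tn = round(spec * n_controls)
    return tp, n_controls - tn, n_cases - tp, tn


# Hypothetical study: 100 cases and 100 controls.
tp, fp, fn, tn = table_from_rates(100, 100, sens=0.80, spec=0.90)
for name, value in diagnostic_metrics(tp, fp, fn, tn).items():
    print(f"{name}: {value:.3f}")
```

Rounding in table_from_rates is a design choice: reported percentages are usually rounded themselves, so the rebuilt counts should be checked against any totals given in the paper.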


Appendix 3

Statistical methods used to obtain estimates of loge(HR) and its variance

The following describes the methods used to obtain estimates of loge(HR) and its variance (var[loge(HR)]) in each situation that may be encountered. The methods are based on those of Parmar and co-workers (Parmar MK et al, 1998).

(1) Given loge(HR) and var[loge(HR)]: Extract these direct estimates

(2) Given HR and var(HR):
Calculate loge(HR)
Calculate a 95% CI for HR = HR ± 1.96 × SE(HR)
Calculate a 95% CI for loge(HR) = loge(95% HR CI)
Then use (4)

(3) Given HR and an αi% CI:
Use logs to obtain loge(HR) and its 95% CI
Then use (4)

(4) Given loge(HR) and an αi% CI:
Calculate

(5) Given HR and a p value (pi):
Calculate loge(HR) and use (6)

(6) Given loge(HR) and a p value:
Calculate

(7) Given a p value (pi) for loge(HR) or HR, and the total number of deaths/recurrences (Oi) and the group sizes are unequal with sizes n1 and n2:
Calculate

(8) Given the χ² statistic from the log-rank/Mantel–Haenszel test or Cox regression or Wilcoxon test comparing two groups of patients defined by marker status and the total number of deaths/recurrences (Oi) in each group:
Use (7), since

(9) Given individual patient data (IPD) that include initial marker value, follow-up time and final known status:
Calculate the direct estimate by using a Cox proportional hazards model

(10) Given a survival curve with censoring points on it:
Estimate the observed number of events and patients at risk from each group at each event time, and use these to estimate the expected number of events for each group.
Then

(11) Given an HR and only group numbers and group events:
Calculate loge(HR)
Calculate var[loge(HR)] from (7)
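For cases (3) to (7) and (11), a minimal Python sketch is given here. It assumes the usual expressions attributed to Parmar and co-workers: the variance is recovered from the width of the CI on the log scale, from the z value implied by a two-sided p value, or approximated from the total number of events and the two group sizes. The function names and example numbers are hypothetical, and the exact form of the equations in the original appendix should be checked against the cited paper.

```python
from math import exp, log
from statistics import NormalDist

STD_NORMAL = NormalDist()


def z_for_level(level):
    """Two-sided critical value, about 1.96 for level=0.95."""
    return STD_NORMAL.inv_cdf(1 - (1 - level) / 2)


def var_from_hr_ci(lower, upper, level=0.95):
    """Cases (3)-(4): CI limits reported on the HR scale."""
    return ((log(upper) - log(lower)) / (2 * z_for_level(level))) ** 2


def var_from_p_value(log_hr, p_value):
    """Cases (5)-(6): two-sided p value reported with loge(HR)."""
    z = STD_NORMAL.inv_cdf(1 - p_value / 2)
    return (log_hr / z) ** 2  # undefined when p_value == 1 (z == 0)


def var_from_events(total_events, n1, n2):
    """Cases (7) and (11): approximation from events and group sizes."""
    return (n1 + n2) ** 2 / (total_events * n1 * n2)


# Round trip with hypothetical numbers: HR = 2.0, SE of loge(HR) = 0.2.
log_hr, se = log(2.0), 0.2
z = z_for_level(0.95)
lower, upper = exp(log_hr - z * se), exp(log_hr + z * se)
print(var_from_hr_ci(lower, upper))       # recovers se**2 up to rounding
print(var_from_events(100, 50, 50))       # equal groups: 4 / total events
```

When only a p value is available, the sign of loge(HR) still has to be taken from the direction of effect described in the paper.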


Appendix 4

Random-effects model versus fixed-effects model

To facilitate understanding, we use the simplest mathematical formulas and non-technical language to interpret the relationships among the above terms.

Meta-analysis is used to integrate the same effect sizes from multiple studies; therefore, variability may arise both between studies and within a study. Here, tau-squared (τ²) denotes the difference between studies (i.e., the between-study variance), and delta-squared (δ²) denotes the difference within a study (i.e., the within-study variance). I² is defined as τ²/(τ²+δ²), expressing the ratio of the between-study variance to the total variance (=between-study variance + within-study variance). If I² is large, which implies that the between-study variance is predominant, the heterogeneity should be assessed. The selection of a model must be based solely on the question of which model fits the distribution of effect sizes and takes account of the relevant source(s) of error. When studies are gathered from the published literature, the random-effects model is generally a more plausible match. The strategy of starting with a fixed-effect model and then moving to a random-effect model if the test for heterogeneity is significant is a mistake and should be strongly discouraged (Borenstein M et al, 2009).
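The practical difference between the two models can be seen in the study weights. The Python sketch below assumes inverse-variance weighting, with each study weighted by 1/δ² under the fixed-effect model and by 1/(δ²+τ²) under the random-effects model; the function names and the two hypothetical studies are for illustration only.

```python
def variance_ratio(tau2, delta2):
    """Ratio of between-study variance to total variance, as defined above."""
    return tau2 / (tau2 + delta2)


def relative_weights(within_variances, tau2=0.0):
    """Normalized inverse-variance weights; tau2=0 gives the fixed-effect case."""
    raw = [1.0 / (delta2 + tau2) for delta2 in within_variances]
    total = sum(raw)
    return [w / total for w in raw]


# Two hypothetical studies: a large one (variance 0.01) and a small one (0.04).
within = [0.01, 0.04]
print(relative_weights(within))             # fixed effect
print(relative_weights(within, tau2=0.05))  # random effects
print(variance_ratio(0.05, 0.05))
```

Adding τ² to every study's variance pulls the weights toward equality, so small studies count for more under the random-effects model than under the fixed-effect model.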

Appendix 5

QUADAS checklist for all diagnosis papers

No   Items
1a   representative patient sample
1b   study participants clearly described
2    selection criteria clearly described
3    adequate reference standard
4    acceptable delay between tests
5    partial verification avoided
6    differential verification avoided
7    incorporation avoided
8a   adequate index test description
8b   cut-off value clearly described
9    adequate reference standard description
10   blinding for reference test results
11   blinding for index test results
12   clinical data available as in practice
13   uninterpretable test results reported
14   explaining withdrawals from the study

Appendix 6

Below, we introduce two widely used ways of performing a meta-analysis of the accuracy of diagnostic tests. 1) Forest plots of the sensitivity and specificity estimates and their 95% CIs are constructed for every study using MetaDiSc software (version 1.4) (Zamora J et al, 2006), with the heterogeneity of the accuracy estimates assessed using the I² statistic (Higgins JP et al, 2006). The summary estimates of sensitivity and specificity are calculated using the metandi package for STATA 11 statistical software (STATA Corp, College Station, TX) (Harbord RM et al, 2009) (see “The first way” below for an example and the operation flow). 2) The software program Review Manager (RevMan) 5.1 (Cochrane Collaboration, Copenhagen, Denmark) (Review Manager, 2011) plus the metandi command in STATA can also be used to complete meta-analyses of diagnostic test accuracy (see “The second way” below for an example and the detailed operation flow).

The first way

MetaDiSc + metandi operation flow for meta-analysis of the accuracy of diagnostic tests

1. MetaDiSc (version 1.4) can be freely downloaded from ftp://ftp.hrc.es/pub/programas/metadisc/Metadisc_update.htm and installed on the Windows operating system. Open the program and enter the data. The interface is shown in Figure 1.

Figure 1. The data entry interface of MetaDiSc

2. In the “Analyze” pull-down menu, clicking “plot” produces forest plots for sensitivity and specificity (Figure 2); clicking “export” saves the plots.

Figure 2. Forest plot of the meta-analysis of sensitivity or specificity.
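Each row of such a forest plot is a per-study estimate with its 95% CI. As a rough illustration, the Python sketch below computes the sensitivity and specificity of one study with Wilson score intervals. The choice of the Wilson interval is an assumption made for this sketch; MetaDiSc may use a different interval method, and the function names and counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(successes, n, level=0.95):
    """Wilson score interval for a proportion successes / n."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half


def forest_row(tp, fp, fn, tn):
    """Sensitivity and specificity of one study with their intervals."""
    return {
        "sens": tp / (tp + fn),
        "sens_ci": wilson_ci(tp, tp + fn),
        "spec": tn / (tn + fp),
        "spec_ci": wilson_ci(tn, tn + fp),
    }


# Hypothetical study with 100 cases and 100 controls.
row = forest_row(tp=80, fp=10, fn=20, tn=90)
print(row)
```

Unlike the simple normal approximation, this interval stays inside the range 0 to 1 even when a study reports a sensitivity or specificity of 100%.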

3. First confirm that STATA version 11 is installed correctly by referring to the ‘STATA Release 11 Installation Guide’ (available from https://remote.bus.brocku.ca/files/Published_Resources/STATA_11/ig.pdf or www.agecon.ksu.edu/support/Stata11Manual/ig.pdf). To ensure that your STATA package is up to date, type “update all” in the command box in the STATA window while your system is connected to the internet. STATA will automatically connect to www.STATA.com and download all required updates. Note: STATA is case-sensitive. All STATA commands must be lower case.

4. Clicking the “Data editor (Edit)” button in the toolbar of the main STATA interface opens a data editing window, in which the data are entered (Figure 3).

Figure 3. Interface for inputting data in STATA

5. In the command box of the main STATA interface, first enter the command metandi tp fp fn tn, nolog (the output is omitted here), and then enter the command metandiplot tp fp fn tn; a summary receiver operating characteristic (SROC) plot will be generated automatically (Figure 4).

Figure 4. Plot of the fitted model from metandiplot (sensitivity plotted against specificity; legend: study estimate, summary point, HSROC curve, 95% confidence region, 95% prediction region)

The second way

RevMan 5.1 operation flow for meta-analysis of the accuracy of diagnostic tests

1. Installation and start-up.

Download RevMan 5.1 from http://ims.cochrane.org/revman/download, and install and open it. The two panes, the outline pane and content pane, of the main interface of RevMan 5.1 will be displayed (Figure 5).

Figure 5. The RevMan 5.1 operation interface

In the “File” menu, select “new”, and click the “next” button; select “Diagnostic test accuracy review”, and click “next”; input a title, for example, ‘serological tests’ for ‘pulmonary TB’ in ‘adults and children’, into the boxes; click “next”; select “protocol”; and click “Finish”.

2. Adding studies to a review.

In the ‘outline pane’, click the key icon next to “Studies and references”; click the key icon next to the “Reference to studies” section; place the mouse cursor on “Included studies”; press the right mouse button; and click the “Add Study” button on the outline pane toolbar; input a study ID, for instance, Alifano 1994, into the “New Study Wizard” box; click “next”; from the “Data source” drop-down list, choose “Published and unpublished data”; click “next”, and click “next” again; select “Add another in the same section”; and click “Finish”. Repeat the above operations until the IDs of all studies have been input.

3. Entering data and analyses.

In the ‘outline pane’, select “Data tables by test”; click the “Add Test” button on the outline pane toolbar; input the test name, such as ‘AndaTB IgG’, into the name box of the “New Test Wizard”; and click “Finish”.

In the ‘outline pane’, click the key icon to see the new test(s) listed under “Data tables by test”; select ‘AndaTB IgG’, and click the “Add Test Data” button on the outline pane toolbar; select all studies listed in the “New Test Data Wizard” individually (Figure 6); and click “Finish”. Enter the data into the data table (Figure 7).

Figure 6. Interface to select studies.

Figure 7. Interface for entering data in RevMan 5.1.

4. Generate a plot using the HSROC model or bivariate model.

The HSROC model (Rutter CM et al, 2001) assumes that there is an underlying ROC curve in each study with parameters α and β, which characterize the accuracy and asymmetry of the curve. The 2×2 table for each study then arises from dichotomizing at a positivity threshold, θ. The parameters α and θ are assumed to vary between studies; both are assumed to have normal distributions, as in a conventional random-effects meta-analysis. The accuracy parameter α has a mean of Λ (capital lambda) and a variance of σ²α, while the positivity parameter θ has a mean of Θ (capital theta) and a variance of σ²θ. Because estimation of the shape parameter, β, requires information from more than one study, it is assumed to be constant across studies. Therefore, when no covariates are included in an HSROC model, there are five parameters: Λ, Θ, β, σ²α and σ²θ.

The bivariate model (Reitsma JB et al, 2005) models the sensitivity and specificity more directly. It assumes that their logit (log-odds) transforms follow a bivariate normal distribution between studies. The logit-transformed sensitivities are assumed to have a mean of μA and a variance of σ²A, while the logit-transformed specificities have a mean of μB and a variance of σ²B. The trade-off between sensitivity and specificity is allowed for by including a correlation, ρAB, that is expected to be negative. The bivariate model, like the HSROC model, therefore has five parameters when no covariates are included: μA, μB, σ²A, σ²B and ρAB.
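Without covariates, the two sets of five parameters describe the same fitted model, so one set can be converted into the other. The Python sketch below uses the conversion commonly given for the equivalence of the two models; these relations are not stated in this appendix and are an assumption of the sketch, as is the function name. The inputs are the bivariate estimates from the metandi output reported in this appendix.

```python
from math import log, sqrt


def bivariate_to_hsroc(mu_a, mu_b, var_a, var_b, corr_ab):
    """Convert bivariate parameters to HSROC parameters (no covariates)."""
    sd_a, sd_b = sqrt(var_a), sqrt(var_b)
    cov_ab = corr_ab * sd_a * sd_b   # covariance of the two logits
    b = sqrt(sd_b / sd_a)            # scale factor linked to the shape
    return {
        "Lambda": b * mu_a + mu_b / b,
        "Theta": 0.5 * (b * mu_a - mu_b / b),
        "beta": log(sd_b / sd_a),
        "s2alpha": 2 * (sd_a * sd_b + cov_ab),
        "s2theta": 0.5 * (sd_a * sd_b - cov_ab),
    }


# Bivariate estimates taken from the metandi output in this appendix.
hsroc = bivariate_to_hsroc(
    mu_a=1.143204, mu_b=2.318397,
    var_a=0.2567073, var_b=1.23896, corr_ab=0.6640539,
)
for name, value in hsroc.items():
    print(f"{name}: {value:.4f}")
```

The converted values can be compared with the HSROC block of the same metandi output as a consistency check before either set is typed into RevMan.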

RevMan cannot be used to perform the meta-analysis or calculate summary estimates for diagnostic test accuracy (DTA) reviews. External software is required for this purpose. RevMan is capable of graphical analysis of the data at hand, either on its own or with input parameters from external software. In the Analysis content pane, there is also the option to generate additional figures based on the results of more complex models, such as the HSROC and bivariate models (see the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy). These more complex models cannot be fitted in RevMan, but you can perform these analyses in an external statistical package and then import the results into RevMan.

Externally calculated parameters

A model is chosen to create a corresponding 95% confidence ellipse around a summary point. There are two types of models for this purpose:

a. HSROC Model
Here, you can enter the five parameters of the HSROC model:
Lambda (Λ) - accuracy parameter.
Theta (Θ) - cut-point parameter.
Beta (β) - shape parameter.
Var(accuracy) (σ²α) - variance of the accuracy parameter.
Var(threshold) (σ²θ) - variance of the threshold parameter.

b. Bivariate Model
You can enter the five parameters of the bivariate model:
E(logitSe) (μA) - expected mean value of logit-transformed sensitivity.
E(logitSp) (μB) - expected mean value of logit-transformed specificity.
Var(logitSe) (σ²A) - between-study variance of logit-transformed sensitivity.
Var(logitSp) (σ²B) - between-study variance of logit-transformed specificity.
Corr(logits) (ρAB) - correlation between logit-transformed sensitivity and specificity.

These parameters can be calculated using SAS or STATA software. The details and software code for fitting the bivariate model can be found in Reitsma et al. (Reitsma JB et al, 2005). In the present article, we adopt the metandi command of STATA to calculate these parameters. For this purpose, copy the table from the RevMan 5.1 interface and paste it into the “Data Editor (Edit)” interface of STATA, or export the data from RevMan 5.1 and save them as an .xls file; then, input or paste the data into the “Data Editor (Edit)” interface of STATA. Enter the command metandi tp fp fn tn, nolog in the command box of the main STATA interface. The parameters will be output as shown in Figure 8.

. metandi tp fp fn tn, nolog

Meta-analysis of diagnostic accuracy

Log likelihood = -42.459148                     Number of studies = 7

              |     Coef.   Std. Err.     z    P>|z|    [95% Conf. Interval]
--------------+--------------------------------------------------------------
Bivariate     |
   E(logitSe) |  1.143204   .2366077                     .6794615   1.606947
   E(logitSp) |  2.318397   .4796662                     1.378269   3.258525
 Var(logitSe) |  .2567073   .1905372                     .0599303   1.099588
 Var(logitSp) |   1.23896   .8768902                     .3094626   4.960282
 Corr(logits) |  .6640539   .3415275                    -.3776901   .9638451
--------------+--------------------------------------------------------------
HSROC         |
       Lambda |  3.258616   .5793504                      2.12311   4.394122
        Theta |  .0651412   .3494087                    -.6196871   .7499696
         beta |  .7870456   .4496577   1.75   0.080     -.0942673   1.668358
      s2alpha |  1.876917   1.184224                     .5449883   6.464025
      s2theta |    .09473   .1036661                     .0110915   .8090674
--------------+--------------------------------------------------------------
Summary pt.   |
           Se |  .7582674   .0433697                     .6636185    .832987
           Sp |  .9103892   .0391315                     .7987128   .9629783
          DOR |  31.86796   19.97895                     9.326341   108.8923
          LR+ |  8.461791    3.94637                     3.392212   21.10773
          LR- |  .2655266   .0539648                     .1782841   .3954609
        1/LR- |  3.766101   .7654101                     2.528695   5.609025
--------------+--------------------------------------------------------------
Covariance between estimates of E(logitSe) & E(logitSp)   .0534886

Notes: s2alpha=Var(accuracy); s2theta=Var(threshold); covariance between the estimates of E(logitSe) & E(logitSp)=Cov(logits).

Enter the parameters shown in Figure 8 into the corresponding boxes shown in Figure 9, and the summary receiver operating characteristic (SROC) plot will be obtained via the HSROC or bivariate model (Figure 9).
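Before typing the parameters into RevMan, it can be worth checking that they were copied correctly. The summary point follows from the bivariate means by the inverse logit, and the likelihood ratios and DOR follow from the summary sensitivity and specificity. A minimal Python sketch of this check, with hypothetical function names and the E(logitSe) and E(logitSp) values from the output above:

```python
from math import exp


def expit(x):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + exp(-x))


def summary_point(mu_a, mu_b):
    """Summary Se, Sp, LR+, LR- and DOR from the bivariate means."""
    se = expit(mu_a)
    sp = expit(mu_b)
    lr_pos = se / (1 - sp)
    lr_neg = (1 - se) / sp
    return {"Se": se, "Sp": sp, "LR+": lr_pos, "LR-": lr_neg,
            "DOR": lr_pos / lr_neg}


# E(logitSe) and E(logitSp) as reported by metandi above.
summary = summary_point(1.143204, 2.318397)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

The printed values should agree with the "Summary pt." rows of the output to within rounding; a mismatch usually means a digit was mistyped when the table was transcribed.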


Figure 9. Manual input of the parameters of the HSROC and bivariate models.

Appendix 7

STATA installation and 14 STATA meta-analysis commands

STATA (version 11 or later) should be correctly installed before conducting a meta-analysis (see the ‘STATA Release 11 Installation Guide’, available from www.agecon.ksu.edu/support/Stata11Manual/ig.pdf). To ensure that your STATA package is up to date, type “update all” in the command box in the STATA window while your system is connected to the internet. STATA will automatically connect to www.STATA.com and download all required updates. Also ensure that all 14 STATA meta-analysis commands are installed on your system (see “Checking 14 STATA meta-analysis commands” below).

Before performing a meta-analysis in STATA, let us briefly go over the general syntax of a STATA command. The most common form of a STATA command is as follows: command [varlist] [, options]. Here, command denotes a STATA command and is the only part of the general form that is always required. Specific commands may require additional parameters. varlist denotes a list of variable names, and options denotes a list of options. The square brackets indicate optional qualifiers and are not typed when entering a command. (Notes: STATA is case-sensitive. All STATA commands must be lower case. To obtain help for any STATA command, simply type “help command_name” in the command box; e.g., typing “help metabias” opens a window that describes the metabias command, its options, and so on.)

Checking 14 STATA meta-analysis commands

1. Pull down the help menu in STATA; select “search”; and search for “meta_dialog” with the “search all” option checked. Install the meta_dialog module.
2. Once the meta_dialog module is installed on your system, you will need to create a profile.do file and add it to your STATA do file list. The profile.do file runs automatically each time you open STATA. It should contain the program provided below (look under “Menu Creation Commands”). Copy and paste the program into an empty do file, and save the file in a path that STATA will recognize at the time of initialization: C:\data\STATA\profile.do or C:\ado\personal\profile.do.
3. Once the profile.do file is saved in the right path, open STATA. You should see a line like this on your opening screen: running c:\ado\personal\profile.do, which indicates that the profile.do commands are being run automatically.
4. Now look in the top menu bar, and pull down the “USER” menu. You should see a new pull-down menu called “Meta-analysis” (Figure 10). When you open the Meta-analysis menu, you will see all 14 commands. Click on any command to open its dialog box.

Figure 10. Pull-down menu of the 14 STATA meta-analysis commands

Example 1

MYC-N prognostic marker in neuroblastoma [data type: loge(HR) and its variance or standard error]

Riley RD et al. (Riley RD et al, 2003) reported a systematic review of tumor markers used in pediatric oncology for Ewing’s sarcoma and neuroblastoma; we extracted the data on four main variables, i.e., the paper No, loge(HR), var[loge(HR)] and outcome (the data for this example originate from Table 12 in Riley RD et al. (2003) and are shown in Example data 1 below). The data were input (see “Inputting Data into STATA” below) (continued),

Inputting Data into STATA

First, you must input data into the analysis system, either by typing the data into a worksheet in the Data Editor or by opening a preexisting worksheet stored on the hard disk. Then, you can analyze the data. To open the Data Editor, click the Data Editor button, which lies under the word Window at the top of the screen, or type edit in the Command window and press Enter. This opens a blank worksheet. Every column is a variable, while every row of the worksheet is an observation. For instance, if we had data on the sizes and heights of four subjects, the two values for the first subject would go into columns one and two of the first row, those for the second subject would go into columns one and two of the second row, and so on. The variables are named Var1, Var2, and so on by default, though the names can be changed as needed. The intersection of a column and a row is termed a cell. STATA uses the default notation Var[i] to designate the cell defined by the intersection of the variable Var and the ith observation. Therefore, Var1[2] is the cell defined by the intersection of variable Var1 and the second observation. The variable labeled x in Table 2.6 on page 34 of the text is used to demonstrate data entry. First, click cell Var1[1] until the cell becomes dark. The dark background indicates the cell in which data will be entered when you type the data and press Enter or Tab. Then, type the first value in Table 2.6 (the number 3) in cell Var1[1]. Pressing Enter at this point moves the cursor to the next observation, while pressing Tab moves the cursor to the next variable. Press Enter, and enter the second observation (also a 3). Continue to type until all observations for Var1 have been entered. Naming the variables in the worksheet is a good idea. We will name our variable x because that was the only designation used in Table 2.6. However, we would usually want to use a more informative name, such as “gender” or “TN” for true negative. Variable labels can be employed to make variable names such as TN clearer.

Figure 12: Screen appearance when data from Table 2.6 are entered.

To change the variable name from Var1 to x, double click on any cell in the Var1 column to open the Variable Properties window; then type x in the Name cell and click OK. Click the Preserve button after entering the data and changing Var1 to x. The Preserve button stores your worksheet in the computer memory temporarily. If you later wish to bring back the data as they existed at the time of the last Preserve, you can click the Restore button. However, Preserve does not store your data on your hard disk permanently; this is done with the Save or Save As button. The Data Editor window should now appear as shown in Figure 12. Save this file to the hard disk, thus making a permanent copy. First, close the Data Editor window. Second, choose File ⇒ Save As. Be sure you are saving into the directory you created previously. Name this data file table 2.6 data.dta. You do not need to add the .dta extension, as STATA will automatically do that for you. After you click Save, this worksheet will be available whenever you choose to load it into the Data Editor. To verify this, be sure the Data Editor is closed, type ‘clear’ in the Command window, and press Enter. This removes the current data from the computer’s memory. You can re-open the Data Editor to assure yourself that no data are available to the editor, as the blank worksheet attests. Close the editor. Now, click the open folder icon or click File ⇒ Open. Double click the table 2.6 data.dta file (note that STATA has appended the suffix .dta to the file), or type the file name in the File name window and click Open. There is no obvious result, but table 2.6 data.dta has been placed in the memory, as you can verify by opening the Data Editor. Also note that the variable x appears in the Variables window.

(continued) and describe was typed in the STATA command panel. The data are described below.

. describe

Contains data
  obs:            48
 vars:             4
 size:           768 (99.9% of memory free)
-----------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-----------------------------------------------------------------
paperno         int    %8.0g                  paper no
group           str2   %9s
logehr          float  %8.0g                  loge(HR)
varlogehr       float  %8.0g                  var[loge(HR)]
-----------------------------------------------------------------
Sorted by:
     Note: dataset has changed since last saved

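We start by producing data in the format of Example data 1 and pooling loge(HR) using the default options. Because metan expects the standard error by default, the variance is first converted to a standard error (the square root of the variance):
gen selogehr=sqrt(varlogehr) /* the variance 'varlogehr' is converted to the standard error 'selogehr' */
metan logehr selogehr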

  Heterogeneity chi-squared = 56.22 (d.f. = 47) p = 0.168
  I-squared (variation in ES attributable to heterogeneity) = 16.4%

  Test of ES=0 : z= 13.05 p = 0.000

At the bottom of the output displayed above, it can be seen that I-squared (I2)=16.4%<50% and P=0.168>0.05, which means that the heterogeneity test is not significant and that a fixed-effects model is suitable for this meta-analysis. Thus, the full command is as follows:
metan logehr selogehr, eform effect(Hazard Ratio) title("Fixed-effects meta-analysis") boxsca(0.9) label(namevar=paperno)
or
metan logehr varlogehr, var eform effect(Hazard Ratio) title("Fixed-effects meta-analysis") boxsca(0.9) label(namevar=paperno)
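The I-squared value reported by metan can be checked by hand from the heterogeneity chi-squared statistic Q and its degrees of freedom, using the definition of Higgins et al. (2003): I2 = 100% × (Q − d.f.)/Q, truncated at zero. The following Python sketch only illustrates that arithmetic; the function name i_squared is hypothetical and is not part of STATA.

```python
def i_squared(q, df):
    """Return I-squared (as a percentage) from Cochran's Q and its degrees of freedom."""
    if q <= 0:
        return 0.0
    # Negative values are truncated at zero, as in Higgins et al. (2003).
    return max(0.0, 100.0 * (q - df) / q)


# Values taken from the metan output for Example 1: Q = 56.22 on 47 d.f.
print(round(i_squared(56.22, 47), 1))
```

With the Example 1 values this gives 16.4, in agreement with the I-squared printed by metan.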

Syntactic interpretations: metan is the command for meta-analysis; logehr, selogehr and varlogehr are variables denoting the natural logarithm of the hazard ratio, its standard error and its variance, respectively; eform requests that the output be exponentiated; effect(Hazard Ratio) names the summary statistic on the graph; title("Fixed-effects meta-analysis") adds a title to the forest plot; boxsca(0.9) controls box scaling; label(namevar=paperno) means that the variable paperno is used to label each study in the forest plot; and var means that the user has specified a variable containing the variance of the effect estimate. If this option is not included, the command assumes that the standard error has been specified.
Considering the constraints of article length, the output is omitted, and the forest plot for Example 1 is shown in Figure 11.

Figure 11: Forest plot for Example 1. A meta-analysis will customarily include a forest plot, in which results from each study are displayed as a square and a horizontal line, representing the intervention effect estimate together with its confidence interval. The area of the square reflects the weight that the study contributes to the meta-analysis. The combined-effect estimate and its confidence interval are represented by a diamond.
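The weights and the combined estimate described in the caption can be illustrated with a generic inverse-variance fixed-effects calculation. This is a minimal Python sketch under stated assumptions, not metan's own code: the function name fixed_effect_pool and the three input studies are hypothetical and are not taken from Example data 1.

```python
import math


def fixed_effect_pool(log_hr, var_log_hr, z=1.959964):
    """Inverse-variance fixed-effects pooling of log hazard ratios."""
    weights = [1.0 / v for v in var_log_hr]          # weight = 1 / variance
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_hr)) / total
    se = math.sqrt(1.0 / total)                      # SE of the pooled log HR
    return {
        # Share of the total weight per study (drives the area of each square).
        "percent_weight": [100.0 * w / total for w in weights],
        # Exponentiated summary and 95% CI (the diamond), as with eform.
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - z * se), math.exp(pooled + z * se)),
    }


# Hypothetical log hazard ratios and variances for three studies.
result = fixed_effect_pool([0.5, 1.0, 1.5], [0.25, 0.5, 1.0])
print(result["percent_weight"])
print(result["hr"], result["ci"])
```

The study with the smallest variance receives the largest weight, which is why precise studies appear as larger squares in the forest plot.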

Example data 1

Prognostic marker MYC-N
paper no group logehr varlogehr
27 OS 1.27069 1.646258540 OS 2.37108 0.519245559 OS 1.53749 0.387 OS 2.59002 1.07729487 OS 0.70804 0.393549593 OS 1.75241 0.4083333108 OS 2.62768 0.569877109 OS 2.99172 0.2637121111 OS 1.25903 0.1464032194 OS 1.71722 0.4444444199 OS 0.76198 0.038358247 OS 2.1285 0.6828283256 OS 1.52171 0.213868285 OS 2.65543 0.4658385288 OS 1.82247 0.5005952297 OS 1.21788 0.0462941316 OS 0.95551 0.2164599373 OS 3.54167 0.8286738376 OS 0.92822 0.0721786387 OS 2.59002 1.077294393 OS 1.86553 0.3214286396 OS 2.80061 0.2739106469 OS 0.04879 0.1589515469 OS 1.00202 0.2910289501 OS 5.04262 1.2506 OS 0.6668 0.0850725544 OS 1.7352 0.1989146548 OS 5.70267 3.003571439 OS 3.3329 0.506516944 OS 1.5372 0.26522550 OS 2.0684 0.482330380 OS 1.5135 0.2344496106 OS 2.3113 0.2516026107 OS 2.364 0.3201296173 OS 0.4291 0.6590192

200 OS 1.3233 0.2592846200 OS 2.1691 0.418609214 OS 0.7332 0.5032484216 OS 1.745 0.5194085246 OS 2.8988 1.216609260 OS 2.7462 1.2115405277 OS 1.827 0.2238236540 OS 0.7039 0.3101376181 OS 1.2562 0.2378394193 OS 1.63118 0.6926407306 OS 2.08042 0.4532293337 OS 1.38329 0.1383142356 OS 2.24971 0.7628118388 DFS 1.449 0.328388 DFS 1.022 0.2816687 DFS 1.63511 0.413884102 DFS 2.04122 0.3848234122 DFS 0.77122 0.4687848142 DFS 0.55015 0.1451613188 DFS 2.18605 0.1220517200 DFS 1.05881 0.2918403200 DFS 1.4755 0.2010766254 DFS 1.77071 0.207138276 DFS 2.97696 0.3313636278 DFS 2.50144 0.5779104280 DFS 1.91692 0.1172873295 DFS 2.39393 0.529304306 DFS 0.47 0.2813982317 DFS 1.89995 0.3334014332 DFS 1.61859 0.1730769378 DFS 0.84244 0.0655474386 DFS 1.59948 0.2362874386 DFS 0.84255 0.074337387 DFS 1.63511 0.414245465 DFS 0.25464 0.0844886474 DFS 2.36561 1506 DFS 0.3001 0.0675255548 DFS 5.70267 3.003571444 DFS 1.6375 0.262656380 DFS 1.4552 0.1676903109 DFS 2.1876 0.176568173 DFS 0.4145 0.6689604239 DFS 1.1773 0.3287876

246 DFS 1.8485 0.435864260 DFS 2.9515 1.1752728277 DFS 1.8951 0.2131669288 DFS 1.8983 0.7782768540 DFS 0.7648 0.2367796122 DFS 1.34419 0.2568889145 DFS 3.58047 2.7222222145 DFS 0.93392 0.1052915315 DFS 3.29269 0.2547413337 DFS 1.70149 0.1485597337 DFS 0.41085 0.1932524505 DFS 1.42979 0.1350555152 DFS 1.5216 0.12119335 DFS 2.55723 0.3010159495 DFS 0.51879 0.1712756528 DFS 0.29267 0.3483816

Notes: The table contains data for four variables: paper no, outcome, loge(HR) and var[loge(HR)], derived from Table 12 in Riley RD et al. (2003). Abbreviations: OS: overall survival; DFS: disease-free survival; HR: hazard ratio.

Example 2

An estimate of the efficacy of gemcitabine plus platinum chemotherapy compared with other platinum-containing regimens in advanced non-small-cell lung cancer [data type: the HR and its confidence interval, i.e., when the effect size and its confidence interval are declared]
The data were extracted from Le Chevalier T et al. (2005) (see Example data 2 for details on the data). After inputting the data into the Data Editor (Edit) interface, 4 commands are entered individually in the following order:
gen lnhr=ln(hr) /* note: the hazard ratio is transformed to its natural logarithm */
gen lnll=ln(ll)
gen lnul=ln(ul)
metan lnhr lnll lnul

Heterogeneity calculated by the formula
  Q = SIGMA_i{ (1/variance_i)*(effect_i - effect_pooled)^2 }
where variance_i = ((upper limit - lower limit)/(2*z))^2

  Heterogeneity chi-squared = 18.75 (d.f. = 16) p = 0.282
  I-squared (variation in ES attributable to heterogeneity) = 14.7%

  Test of ES=0 : z= 3.35 p = 0.001

At the bottom of the output displayed above, it can be seen that I-squared (I2)=14.7%<50% and P=0.282>0.05, which means that the heterogeneity test is not significant. A random-effects model is nevertheless fitted in this example, and the full command to perform the meta-analysis with the hazard ratio and its CI is as follows:
metan lnhr lnll lnul, eform label(namevar=studyid) title("random-effects model") boxsca(0.9) random effect(Hazard Ratio)
The output is omitted, and the forest plot for Example 2 is shown in Figure 13.

Figure 13: Forest plot for example 2.
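When only the hazard ratio and its confidence limits are available, as in Example 2, the variance of each study follows from the width of the interval on the log scale, as stated in the metan output: variance_i = ((upper limit - lower limit)/(2*z))^2. The Python sketch below illustrates this conversion for the first two rows of Example data 2; the function name log_effect_and_se is hypothetical, and z = 1.959964 assumes 95% confidence limits.

```python
import math


def log_effect_and_se(hr, ll, ul, z=1.959964):
    """Return (ln HR, SE of ln HR) from a hazard ratio and its confidence limits."""
    # The CI is symmetric on the log scale, so its width equals 2 * z * SE.
    se = (math.log(ul) - math.log(ll)) / (2.0 * z)
    return math.log(hr), se


# First two rows of Example data 2: (hazard ratio, lower limit, upper limit).
rows = [(0.77, 0.55, 1.10), (1.02, 0.78, 1.33)]
for hr, ll, ul in rows:
    lnhr, se = log_effect_and_se(hr, ll, ul)
    print(f"lnhr={lnhr:.4f}  se={se:.4f}  variance={se ** 2:.4f}")
```

The squared standard error is the variance_i that enters the Q statistic, and its reciprocal is the inverse-variance weight of the study.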

Example data 2

Study id              Hazard ratio   Lower limit   Upper limit
rdenal 1999           0.77           0.55          1.1
Crino 1999            1.02           0.78          1.33
Danson 2001           1.12           0.88          1.42
Melo 2002             0.54           0.32          0.93
Rudd 2002             0.76           0.61          0.93
Sandler 2000          0.76           0.63          0.92
Chang 2001            0.93           0.4           2.16
Comella 2000          0.71           0.45          1.13
Gridelli 2002         1.02           0.76          1.35
Melo 2002             0.71           0.41          1.22
Scagliotti 2002       0.87           0.69          1.09
Scagliotti 2002       1.04           0.83          1.31
Schiller 2002         0.94           0.79          1.14
Schiller 2002         0.92           0.76          1.1
Schiller 2002         0.96           0.8           1.15
Thomas 2002           0.89           0.53          1.49
Van Meerbeeck 2001    0.9            0.65          1.25
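The random option used for Example 2 requests a random-effects model, which metan documents as the method of DerSimonian and Laird. The Python sketch below is a generic version of that moment-based calculation, given only to show how the between-study variance enters the weights; it is not metan's code, and the function name dersimonian_laird and the two input studies are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Random-effects pooling with the DerSimonian-Laird moment estimator."""
    k = len(effects)
    w = [1.0 / v for v in variances]                 # fixed-effects weights
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    # Between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]   # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return pooled, se, tau2, q


# Hypothetical log effects and variances for two studies.
pooled, se, tau2, q = dersimonian_laird([0.0, 2.0], [1.0, 1.0])
print(pooled, se, tau2, q)
```

When Q does not exceed its degrees of freedom, the estimated between-study variance is zero and the random-effects result coincides with the fixed-effects result.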

Example 3

(Data from Example data 1)
We used the metabias command to perform a test of small-study effects employing the commonly used Begg test. The commands are invoked as follows:
gen selogehr=sqrt(varlogehr)
metabias logehr selogehr, graph(begg)
or
metabias logehr varlogehr, var graph(begg)

Note: option 'var' specified.

Tests for Publication Bias

Begg's Test

  adj. Kendall's Score (P-Q) =     443
          Std. Dev. of Score =  112.51 (corrected for ties)
           Number of Studies =      48
                          z  =    3.94
                    Pr > |z| =   0.000
                          z  =    3.93 (continuity corrected)
                    Pr > |z| =   0.000 (continuity corrected)

Egger's test
--------------------------------------------------------------------
     Std_Eff |     Coef.   Std. Err.      t    P>|t|   [95% Conf. Interval]
-------------+------------------------------------------------------
       slope |  .3788738   .1927232    1.97   0.055   -.0090579   .7668054
        bias |   2.30764   .3894473    5.93   0.000    1.523724   3.091557
--------------------------------------------------------------------

Both Begg’s test and Egger’s test gave p-values reported as 0.000 (i.e., p<0.001), which strongly indicates the presence of small-study effects. Furthermore, the positive sign of the bias coefficient suggests that small studies overestimate the effect (or, alternatively, that negative and/or nonsignificant small studies are not included in Example data 1). The Begg funnel graph of the data (Figure 14), which was requested with the graph(begg) option, shows the corresponding asymmetry and provides additional support for this interpretation.

Figure 14: Funnel plot of the log hazard ratio against its standard error.
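The two test statistics in the Example 3 output can be illustrated as follows. Egger's test regresses the standardized effect (effect divided by its standard error) on precision (the reciprocal of the standard error); the intercept is the "bias" coefficient and the slope estimates the underlying effect. Begg's z is Kendall's score divided by its standard deviation. This is a minimal Python sketch under those assumptions, not the metabias code; the function names egger_regression and begg_z are hypothetical.

```python
def egger_regression(effects, std_errors):
    """Ordinary least squares of the standardized effect on precision.

    Returns (slope, bias): the slope estimates the underlying effect and the
    intercept ("bias") measures funnel-plot asymmetry.
    """
    x = [1.0 / se for se in std_errors]                    # precision
    y = [e / se for e, se in zip(effects, std_errors)]     # standardized effect
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    bias = mean_y - slope * mean_x
    return slope, bias


def begg_z(score, sd, continuity=False):
    """z statistic for Begg's test from Kendall's score and its standard deviation."""
    s = abs(score) - 1 if continuity else abs(score)
    return s / sd


# Kendall's score and its standard deviation as printed in the Example 3 output.
print(round(begg_z(443, 112.51), 2), round(begg_z(443, 112.51, continuity=True), 2))
```

With the printed score of 443 and standard deviation of 112.51, the two z values round to 3.94 and 3.93, matching the output above.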

Example 4

(Data from Example data 2)
The metabias command is invoked as follows:
metabias hr ll ul, ci graph(begg)
hr, ll and ul denote the hazard ratio, lower limit and upper limit, respectively; ci means that the user has specified the lower and upper confidence limits of the effect estimate, which is assumed to be on a ratio scale (e.g., hazard ratio, odds ratio or risk ratio); and graph(begg) requests Begg's funnel graph. (The output and funnel plot are omitted.)

Example 5

Analysis of Example data 1 using the publication bias test of Egger et al. (p<0.001), as provided in metabias, suggests that a publication bias may affect the data. To examine the potential impact of the publication bias on the interpretation of the data, metatrim is invoked as follows:
metatrim logehr varlogehr, var eform reffect funnel
The random-effects model and the display of the optional funnel graph are requested via the reffect and funnel options. The var option is required because the data were provided as log hazard ratios and variances. metatrim provides the following output:

Note: option 'var' specified.

Meta-analysis

       |  Pooled      95% CI        Asymptotic      No. of
Method |     Est   Lower   Upper   z_value  p_value  studies
-------+-------------------------------------------
Fixed  |   1.404   1.264   1.544    19.658    0.000     48
Random |   1.681   1.439   1.924    13.588    0.000

Test for heterogeneity: Q= 114.532 on 47 degrees of freedom (p= 0.000)
Moment-based estimate of between-study variance = 0.363

Trimming estimator: Linear
Meta-analysis type: Random-effects model

iteration | estimate     Tn   # to trim   diff
----------+--------------------------------------
    1     |   1.681     692       4       1176
    2     |   1.540     788       8        192
    3     |   1.408     878      12        180
    4     |   1.326     929      14        102
    5     |   1.268     956      15         54
    6     |   1.219     997      17         82
    7     |   1.181    1009      18         24
    8     |   1.168    1012      18          6
    9     |   1.168    1012      18          0

Filled
Meta-analysis (exponential form)

       |  Pooled      95% CI        Asymptotic      No. of
Method |     Est   Lower   Upper   z_value  p_value  studies
-------+-------------------------------------------
Fixed  |   3.140   2.759   3.574    17.332    0.000     66
Random |   3.401   2.624   4.407     9.257    0.000

Test for heterogeneity: Q= 217.631 on 65 degrees of freedom (p= 0.000)
Moment-based estimate of between-study variance = 0.692

metatrim finishes with a call to program meta to report an analysis of the trimmed and filled data. It can be seen that there are now 66 studies, composed of the 48 observed studies plus 18 imputed studies. On the log hazard ratio scale, the random-effects summary estimate changes from 1.681 with a confidence interval (CI) of (1.439, 1.924) to 1.224 with a CI of (0.965, 1.483); the latter corresponds to the exponentiated values 3.401 (2.624, 4.407) shown in the filled output. The new estimate, although lower, remains statistically significant, and a correction for publication bias does not change the overall interpretation of the dataset. The addition of “missing” studies results in an increased variance between studies, with the estimate increasing from 0.363 to 0.692, and the evidence of heterogeneity in the dataset remains unchanged, at p<0.001 in both the observed data and the filled data.
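Because the observed-data table is printed on the log scale and the filled table in exponential form, it helps to put both on the same scale before comparing them. The following Python sketch back-transforms the filled random-effects estimate and recovers an approximate z statistic from the confidence limits; it is only a check on the arithmetic, and z = 1.959964 assumes 95% limits.

```python
import math

# Filled random-effects estimate and 95% CI as printed (exponential form).
hr, lower, upper = 3.401, 2.624, 4.407

# Back-transform to the log hazard ratio scale used in the observed-data table.
log_hr = math.log(hr)
log_lower, log_upper = math.log(lower), math.log(upper)

# Standard error recovered from the CI width, and the resulting z statistic.
se = (log_upper - log_lower) / (2 * 1.959964)
z = log_hr / se

print(f"log HR = {log_hr:.3f} ({log_lower:.3f}, {log_upper:.3f})")
print(f"z = {z:.2f}")
```

The log-scale values agree, to within rounding, with the 1.224 (0.965, 1.483) quoted in the text, and the recovered z is close to the printed z_value of 9.257.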

The funnel plot (Figure 15) graphically shows the final filled estimate (as a horizontal line) and the augmented data (as points), along with pseudo confidence-interval limits intended to assist in visualizing the funnel. The plot indicates the imputed data with a square around the data symbol. The filled dataset is much more symmetric than the original data, and the plot shows no evidence of a publication bias.

Figure 15: Funnel plot with ‘filled’ studies for analysis of Example data 1

Example 6

(Data from Example data 2)
The metatrim command is invoked as follows:
metatrim hr ll ul, ci eform reffect funnel
(The output and funnel plots are omitted.)

References

Borenstein M, Hedges LV, Higgins JP, Rothstein HR (2009) Introduction to Meta-Analysis. John Wiley and Sons

Harbord RM, Whiting P (2009) metandi: meta-analysis of diagnostic accuracy using hierarchical logistic regression. In: Sterne JAC (ed) Meta-analysis in STATA: an updated collection from the STATA Journal. College Station, TX, USA: STATA Press, 181–199

Higgins JP, Thompson SG, Deeks JJ, Altman DG (2003) Measuring inconsistency in meta-analyses. BMJ 327(7414): 557–560

Le Chevalier T, Scagliotti G, Natale R, Danson S, Rosell R, Stahel R, Thomas P, Rudd RM, Vansteenkiste J, Thatcher N, Manegold C, Pujol JL, van Zandwijk N, Gridelli C, van Meerbeeck JP, Crino L, Brown A, Fitzgerald P, Aristides M, Schiller JH (2005) Efficacy of gemcitabine plus platinum chemotherapy compared with other platinum containing regimens in advanced non-small-cell lung cancer: a meta-analysis of survival outcomes. Lung Cancer 47(1): 69–80

Parmar MK, Torri V, Stewart L (1998) Extracting summary statistics to perform meta-analyses of the published literature for survival endpoints. Stat Med 17: 2815–2834

Reitsma JB, Glas AS, Rutjes AW, Scholten RJ, Bossuyt PM, Zwinderman AH (2005) Bivariate analysis of sensitivity and specificity produces informative summary measures in diagnostic reviews. J Clin Epidemiol 58(10): 982–990

Review Manager (RevMan) [computer software] (2011) Version 5.1. Copenhagen, Denmark: Cochrane Collaboration

Riley RD, Burchill SA, Abrams KR, Heney D, Lambert PC, Jones DR, Sutton AJ, Young B, Wailoo AJ, Lewis IJ (2003) Systematic review and evaluation of the use of tumour markers in paediatric oncology: Ewing's sarcoma and neuroblastoma. Health Technol Assess 7(5): 1–162

Rothstein HR, Sutton AJ, Borenstein M (2005) Publication Bias in Meta-Analysis: Prevention, Assessment and Adjustments. Chichester, UK: Wiley

Rutter CM, Gatsonis CA (2001) A hierarchical regression approach to meta-analysis of diagnostic test accuracy evaluations. Stat Med 20(19): 2865–2884

Zamora J, Abraira V, Muriel A, Khan K, Coomarasamy A (2006) Meta-DiSc: a software for meta-analysis of test accuracy data. BMC Med Res Methodol 6: 31
