
Multiple Comparisons and SAS Arrays in Clinical Trials

Date post: 25-Jun-2015
Upload: makrocare-clinical-research-limited
Description:
MakroCare is a global functional service provider specializing in biostatistics and SAS programming. The company's state-of-the-art facility in Hyderabad, India, is staffed by highly qualified SAS programmers dedicated to biopharmaceutical development projects.

Most doubts about the results of randomized clinical trials (RCTs) arise either from inadequate sample size or from problems of multiplicity. Problems of multiplicity arise from testing multiple hypotheses or from testing a hypothesis at multiple points in time. Common problems of this type include: multiple analyses of accumulating data at different time points (such as frequent interim analyses), analyses of multiple endpoints, multiple subgroup analyses of subjects, multiple treatment group contrasts, and the interpretation of results from multiple clinical trials, especially in meta-analysis. Clinical trials often require a number of outcomes to be calculated and a number of hypotheses to be tested. Such testing involves comparing treatments on multiple outcome measures using univariate statistical methods. Studies with multiple outcome measures occur frequently in medical research. Some researchers recommend adjusting the p-values when clinical trials use multiple outcome measures, so as to prevent findings from falsely claiming "statistical significance". Other researchers disagree with this strategy, arguing that it is not appropriate and may lead to misleading conclusions from the study.

Multiple tests mean that the traditional 0.05 significance level is no longer necessarily valid, so the overall error rate needs to be controlled. In a study that includes multiple treatment groups and/or multiple endpoints, a multiple hypothesis testing procedure is therefore used to control the Type I error.

The most commonly used multiple testing procedures are:

1. Bonferroni and Sidak procedures:
The Bonferroni procedure is probably the most commonly used, because it is highly flexible, very simple to compute, and can be used with any type of statistical test. We divide the level of significance by the total number of comparisons. For example, if in a clinical trial we compare two treatments within five subsets of subjects, the treatments are declared significantly different at the overall 0.05 level if the P value within any of the subsets is less than 0.01 ($\alpha^* = 0.05/5 = 0.01$). In the Sidak procedure, which is a modification of the Bonferroni procedure, each of the $K$ tests is carried out at level $\alpha^* = 1 - (1-\alpha)^{1/K}$. These methods are recommended when the comparison tests are independent.

2. Dunnett’s test:
This is a classical and frequently used test for many randomized laboratory experiments, and even for clinical trials, in which the means of multiple treatments are compared with that of the control; the multiple treatments are often multiple doses of the same treatment.

3. Scheffe, Tukey, and Tukey–Kramer procedures:
These procedures give great flexibility for many randomized biological experiments in which variation between experimental units is not of major concern and the experimenter is not able to specify particular comparisons or contrasts in advance.

Scheffe’s procedure provides a way of examining all possible linear contrasts of K treatment means while adjusting for inflation of the Type I error rate. Similarly, the Tukey and Tukey–Kramer procedures provide tests and confidence intervals for all possible pairwise comparisons of the treatment means. However, as the number K increases, these methods become quite conservative in declaring significance.
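The per-test significance levels from the Bonferroni and Sidak procedures in item 1 can be checked with a short DATA step. This is only a minimal sketch: it assumes the five-comparison example from the text (overall level 0.05, K = 5), and the variable names are illustrative.

```sas
/* Per-comparison significance levels for K comparisons at overall level alpha. */
/* alpha and k follow the five-subset example in the text; names are illustrative. */
data _null_;
    alpha = 0.05;                          /* overall (family-wise) level */
    k     = 5;                             /* number of comparisons       */
    bonferroni = alpha / k;                /* Bonferroni: alpha / K       */
    sidak      = 1 - (1 - alpha)**(1 / k); /* Sidak: 1 - (1 - alpha)^(1/K) */
    put 'Bonferroni per-test level: ' bonferroni 8.5;
    put 'Sidak per-test level:      ' sidak 8.5;
run;
```

For these inputs the Bonferroni level is 0.01 and the Sidak level is approximately 0.0102, so the Sidak procedure is slightly less conservative.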

Are multiple comparisons really needed?

Here are three situations where corrections for multiple comparisons are not needed.

1. Account for multiple comparisons when interpreting the results rather than in the calculations
Testing multiple hypotheses at once creates a dilemma that cannot be escaped. If we do not make any corrections for multiple comparisons, it becomes very easy to find "significant" results by chance; that is, it is too easy to make a Type I error. But if we do correct for multiple comparisons, we lose power to detect real differences; that is, it is too easy to make a Type II error. The only way to escape this dilemma is to focus the analyses, and thus avoid making multiple comparisons. For example, if the treatments are ordered, do not compare each mean with every other mean (multiple comparisons); instead, perform a single test for trend to check whether the outcomes are linearly related to the treatment order. Another situation arises when positive and negative control groups are included in addition to the experimental groups: do not include them in the ANOVA or in the multiple comparisons. Some statisticians recommend making no correction of the Type I error for multiple comparisons while analyzing the data. Instead, report all individual P values and confidence intervals, and make it clear that no mathematical correction was made for multiple comparisons. When interpreting these results, we then need to account for multiple comparisons informally.

2. Corrections or adjustments may not be needed if we make only a few planned comparisons
Some statisticians recommend not making any formal correction or adjustment for multiple comparisons when the study focuses on only a few scientifically sensible comparisons, rather than on every possible comparison. The term "planned comparison" describes this situation. A planned comparison requires that we focus on a few scientifically sensible comparisons; we cannot decide which comparisons to make after looking at the data. The choice must be based on the scientific questions we are asking, and must be made when we design the experiment.

3. Correction or adjustment for multiple comparisons is not needed when the comparisons are complementary
The example for this situation is taken from the study reported by Ridker and colleagues. They asked whether lowering LDL cholesterol would prevent heart disease in subjects who did not have high LDL concentrations and did not have a prior history of heart disease (but did have an abnormal blood test suggesting the presence of some inflammatory disease). The study included almost 18,000 subjects. Half of the subjects received a statin drug to lower LDL cholesterol and half received placebo. The investigators’ primary goal was to compare the number of “end points” that occurred in the two groups, including deaths from a heart attack or stroke, nonfatal heart attacks or strokes, and hospitalization for chest pain. These events happened about half as often in subjects treated with the drug as in subjects taking placebo. The drug worked. The investigators also analyzed each of the endpoints separately. Those taking the drug had fewer deaths, fewer heart attacks, fewer strokes, and fewer hospitalizations for chest pain than those taking placebo. The data from different demographic groups were then analyzed separately. Separate subgroup analyses were done for men and women, old and young, smokers and nonsmokers, subjects with and without hypertension, and subjects with and without a family history of heart disease. In each of 25 subgroups, subjects receiving the statin drug experienced fewer primary endpoints than those taking placebo, and all these effects were statistically significant. The investigators made no correction for multiple comparisons across all these separate analyses of outcomes and subgroups. No adjustments or corrections were needed, because the results are so consistent. Each of the multiple comparisons asks the same basic question in a different way, and all comparisons pointed to the same conclusion: subjects taking the drug had less cardiovascular disease than those taking placebo.

The treatment comparisons in randomized clinical trials usually involve many endpoints, so that conventional significance testing can seriously inflate the overall Type I error rate. One option is to select a single endpoint for formal statistical inference, but this is not always feasible. Another approach is to apply the Bonferroni correction (i.e. multiply each p-value by the total number of comparisons). Excessive use of multiple significance tests in clinical trials can greatly increase the probability of false positive findings. The problem is complicated by the fact that endpoints are usually correlated and that studies often have a mixture of data types, e.g. quantitative, binary and survival data. Perhaps the most common method in the medical literature is to analyze each endpoint separately, presenting multiple p-values and an overall subjective conclusion. At best, this provides an open display of the data, enabling readers to draw their own (possibly different) conclusions. Whenever multiple comparisons are taking place, we need to adjust the Type I error rate accordingly, except in the few situations mentioned in this paper.

Newsletter February 2011

SAS Arrays in Clinical Trials

Introduction:
Statistical Analysis System (SAS) is an integral part of clinical trial data management and statistical analysis. SAS is widely used for clinical trial data analyses submitted to regulatory agencies such as the FDA. SAS programmers write programs in various ways to produce the tables, listings and figures (TLFs). Efficient programmers, however, produce the final TLFs with fewer lines of code. The objective of this paper is to highlight the use of arrays in SAS programming, which can be efficient, time saving and cost effective in the pharmaceutical industry.


SAS System:
The SAS System helps to analyze and organize a collection of data items using SAS programming statements. A SAS program is a collection of SAS statements in a logical sequence.

SAS is available in multiple computing environments such as Windows and Unix.

SAS Array:
A SAS array is a temporary grouping of SAS variables that are arranged in a particular order.

Once the array has been defined, the programmer can perform the same task on a series of related variables, the array elements. Arrays are widely used in the pharmaceutical industry.

Arrays simplify SAS processing: they help read and analyze repetitive data with a minimum of coding.

The ARRAY statement defines the elements in an array. These elements are processed as a group, and each is referred to by the array name and a subscript.

Syntax:

    ARRAY array-name {subscript} <$> <length> array-elements <(initial-values)>;

array-name – Any valid SAS name

subscript – Number of elements within the array

$ – Indicates that the elements within the array are character type variables

length – A common length for the array elements

array-elements – List of SAS variables to be part of the array

initial-values – Provides the initial values for each of the array elements

The ARRAY statement provides the following information about a SAS array. The array:

- is identified by an array name
- exists only for the duration of the current DATA step
- is not a variable

The following SAS variable lists make it possible to reference variables that have been previously defined in the same DATA step:

- _NUMERIC_ indicates all numeric variables
- _CHARACTER_ indicates all character variables
- _ALL_ indicates both numeric and character variables

RULES FOR ARRAY STATEMENTS:
Some important rules to keep in mind when using arrays in SAS programs:

- An ARRAY statement must contain either all numeric or all character elements, i.e. mixed type variables are not allowed.
- An ARRAY statement must be used to define an array before the array name can be referenced.
- If the elements are not specified on the ARRAY statement, SAS uses the array name, appends an element number as a suffix starting at 1, and checks whether that variable name already exists in the Program Data Vector (PDV). If those variable names do not exist, the ARRAY statement creates them as variables in the PDV.
- _TEMPORARY_ signals to SAS that it does not need to create actual variables in the PDV for this array, and that the elements of the array will be held in memory but not output as variables to the data set.
- When the asterisk (*) is used as the subscript, SAS counts the number of array variables.
- The array name can be any name as long as it does not match any of the variable names in the data set or any SAS keywords, and it must adhere to the SAS naming convention.
- Array names cannot be used in LABEL, FORMAT, DROP, KEEP or LENGTH statements.

USING ARRAY INDEXES:
The array index (subscript) specifies the range of array elements.

For example, the temperature for each of the 24 hours of the day is defined as:

    array temperature_array {24} temp1-temp24;

There may be scenarios in which the index has to begin at a lower bound other than 1 (say 6) and end at an upper bound other than 24 (say 18). This is possible by modifying the subscript value when the array is defined.

    array temperature_array {6:18} temp6-temp18;

The subscript is written as the lower bound and the upper bound of the range, separated by a colon.

ONE DIMENSION ARRAYS:
The ARRAY statement to define a one-dimensional array will be, for example:

    array temperature_array {24} temp1-temp24;

The array has 24 elements, for the variables TEMP1 through TEMP24.

When the array elements are used within the DATA step, they are referenced by the array name and the element number. For example, the reference to the ninth element in the temperature array is: temperature_array{9}

MULTI-DIMENSION ARRAYS:
If there is more than one dimension, then it is a multi-dimensional array.

For example, the ARRAY statement to define a two-dimensional array will be:

    array ae_array {3, 12} aeterm1-aeterm12 Preferredterm1-Preferredterm12 visit1-visit12;

The array contains three sets of twelve elements. When the array is defined, the subscript gives the number of rows (first dimension) and the number of columns (second dimension).
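To illustrate how the rows and columns of a two-dimensional array map onto its variables, here is a minimal sketch. The data set name, the array `m`, and the values assigned are all hypothetical and are not taken from the newsletter.

```sas
/* Hypothetical 3 x 4 array: 3 rows of 4 columns, stored in m1-m12.  */
/* SAS fills the elements row by row: m1-m4 form row 1, m5-m8 row 2, */
/* and m9-m12 row 3.                                                  */
data two_dim_demo;
    array m{3,4} m1-m12;
    do row = 1 to dim(m, 1);          /* number of rows    */
        do col = 1 to dim(m, 2);      /* number of columns */
            m{row, col} = 10*row + col;   /* illustrative value */
        end;
    end;
    drop row col;
run;

proc print data=two_dim_demo noobs;
run;
```

With this layout, the element `m{2,1}` is the variable M5, so the printed value of M5 shows which cell of the array it corresponds to.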

TEMPORARY ARRAYS:
A temporary array is an array that exists only for the duration of the DATA step in which it is defined. A temporary array is useful for storing constant values that are used in calculations. In a temporary array there are no corresponding variables to identify the array elements. The elements are defined by the keyword _TEMPORARY_.

Example: array systolicbp {6} _temporary_ (120 103 114 132 109 105);
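As a sketch of how a temporary array of constants might be used in a calculation, the step below compares each reading with a visit-specific limit. The data set, variable names, limits and readings are hypothetical.

```sas
/* Hypothetical example: flag systolic readings above visit-specific limits. */
data bp_flags;
    input subjid $ sbp1-sbp3;
    array sbp{3}   sbp1-sbp3;
    array limit{3} _temporary_ (140 140 130);  /* constants; not written to the output data set */
    array high{3}  high1-high3;                /* 1 = above the limit, 0 = not above */
    do i = 1 to dim(sbp);
        high{i} = (sbp{i} > limit{i});
    end;
    drop i;
    datalines;
S001 120 145 128
S002 150 138 135
;
run;
```

Because `limit` is temporary, the output data set contains only SUBJID, SBP1-SBP3 and HIGH1-HIGH3.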

EXPLICIT VS IMPLICIT SUBSCRIPTING:
Earlier versions of SAS defined arrays in a more implicit manner, as follows:

    array array-name<(index-variable)> <$> <length> array-elements <(initial-values)>;

When an implicit array is defined, every element in the array can be processed with a DO OVER statement. An index variable may be specified after the array name. For example:

    *** Implicitly subscripted array;
    DATA ftoc;
      INPUT month $ f1-f7;
      ARRAY f(i) f1-f7;
      ARRAY c(i) c1-c7;
      DO OVER f;
        c = (f-32)*5/9;
      END;
      FORMAT c1-c7 4.1;
      CARDS;
    aug 94 98 99 98 99 96 91 90 88 89
    sept 93 92 87 87 89 90 91 92 82 80
    ;
    PROC PRINT;
      TITLE1 'DATA: FTOC';
      TITLE2 'Implicit Array Example';
    RUN;
    TITLE;

This differs from the explicit array discussed previously, in which a constant value or an asterisk, given as the subscript, denotes the array bounds. For example:

    *** Explicitly subscripted array;
    DATA ftoc2;
      INPUT month $ f1-f7;
      ARRAY f{7} f1-f7;
      ARRAY c{7} c1-c7;
      DO i=1 TO 7;
        c{i} = (f{i}-32)*5/9;
      END;
      FORMAT c1-c7 4.1;
      CARDS;
    aug 94 98 99 98 99 96 91 90 88 89
    sept 93 92 87 87 89 90 91 92 82 80
    ;
    PROC PRINT;
      TITLE1 'DATA: FTOC2';
      TITLE2 'Explicit Array Example';
    RUN;

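SORTING ARRAYS:
The CALL SORTC routine can be used for character variables and the CALL SORTN routine can be used to sort numeric variables. An example of sorting several numeric variables is as follows:

    data sorted;                 /* output data set; the name is illustrative */
      set datasetname;
      array xarry{6} x1-x6;
      call sortn(of x1-x6);      /* sorts the six values within each observation */
    run;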
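A self-contained version of the sorting step can be tried with inline data. This is only a sketch: the data set name, the identifiers and the six values per row are hypothetical.

```sas
/* Hypothetical data: six numeric scores per subject, sorted within each row. */
data sorted_scores;
    input id $ x1-x6;
    array x{6} x1-x6;
    call sortn(of x{*});    /* ascending sort of x1-x6 within the observation */
    datalines;
A 5 3 9 1 7 2
B 8 8 4 6 0 3
;
run;

proc print data=sorted_scores noobs;
run;
```

After the step runs, X1-X6 hold 1 2 3 5 7 9 for subject A and 0 3 4 6 8 8 for subject B; the values are reordered across the variables, not across observations.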

Following are some of the functions widely used with arrays.

HBOUND FUNCTION:
This function returns the upper bound of a dimension of an array.

Example 1: One-dimensional Array
In this example, HBOUND returns the upper bound of the dimension, a value of 5. Therefore, SAS repeats the statements in the DO loop five times.

    array big{5} weight sex height state city;
    do i=1 to hbound(big);
      ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the HBOUND function for multidimensional arrays. Both methods return the same value for HBOUND, as shown in the table that follows the SAS code example.

    array mult{2:6,4:13,2} mult1-mult100;

    Syntax          Alternative Syntax   Value
    HBOUND(MULT)    HBOUND(MULT, 1)      6
    HBOUND2(MULT)   HBOUND(MULT, 2)      13
    HBOUND3(MULT)   HBOUND(MULT, 3)      2

LBOUND FUNCTION:
This function returns the lower bound of a dimension of an array.

Example 1: One-dimensional Array
In this example, LBOUND returns the lower bound of the dimension, a value of 2. SAS repeats the statements in the DO loop five times.

    array big{2:6} weight sex height state city;
    do i=lbound(big) to hbound(big);
      ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the LBOUND function for multidimensional arrays. Both methods return the same value for LBOUND; the values are listed in the LBOUND table (Syntax, Alternative Syntax, Value) of this newsletter.

    array mult{2:6,4:13,2} mult1-mult100;

DIM FUNCTION:
This function returns the number of elements in a dimension of an array.

Example 1: One-dimensional Array
In this example, DIM returns a value of 5. Therefore, SAS repeats the statements in the DO loop five times.

    array big{5} weight sex height state city;
    do i=1 to dim(big);
      ...more SAS statements...;
    end;

Example 2: Multidimensional Array
This example shows two ways of specifying the DIM function for multidimensional arrays. Both methods return the same value for DIM; the values are listed in the DIM table (Syntax, Alternative Syntax, Value) of this newsletter.

    array mult{5,10,2} mult1-mult100;
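The three functions can be seen together on an array with non-default bounds. The sketch below reuses the 6:18 temperature subscript from the array index discussion; the array name and the values assigned are hypothetical.

```sas
/* LBOUND, HBOUND and DIM on an array whose subscript runs from 6 to 18. */
data _null_;
    array temp{6:18} temp6-temp18;
    do hour = lbound(temp) to hbound(temp);
        temp{hour} = 15 + hour;     /* illustrative values only */
    end;
    n  = dim(temp);                 /* number of elements: 18 - 6 + 1 */
    lo = lbound(temp);
    hi = hbound(temp);
    put n= lo= hi=;                 /* writes n=13 lo=6 hi=18 to the log */
run;
```

Looping from LBOUND to HBOUND, rather than from 1 to DIM, keeps the DO loop correct when the lower bound is not 1.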

About MakroCare:
MakroCare is a global drug development services firm that operates through 4 main divisions: CRO, SMO, Informatics and Consulting. It offers integrated and innovative services in the areas of regulatory affairs, risk management, site management, patient recruitment, trial management (Phase II/III and late phase), biometrics, QA audits, PV/Safety, and informatics.

Arrays Vs Proc Transpose:
To transpose data (turning variables into observations or turning observations into variables), one can use either PROC TRANSPOSE or array processing within a DATA step.

A Simple Transposition:
Consider a simple situation in which the program should transpose observations into variables. In the original data, each person has 3 observations. In the final version, each person should have just one observation. In the "before" scenario, the data are already sorted BY NAME DATE:

    NAME  DATE
    Amy   Date #A1
    Amy   Date #A2
    Amy   Date #A3
    Bob   Date #B1
    Bob   Date #B2
    Bob   Date #B3

In the "after" scenario, the data will still be sorted by NAME:

    NAME  DATE1     DATE2     DATE3
    Amy   Date #A1  Date #A2  Date #A3
    Bob   Date #B1  Date #B2  Date #B3

The PROC TRANSPOSE program is as follows:

    PROC TRANSPOSE DATA=OLD OUT=NEW PREFIX=DATE;
      VAR DATE;
      BY NAME;
    RUN;

The PREFIX= option controls the names of the transposed variables (DATE1, DATE2, etc.). Without it, the names of the new variables would be COL1, COL2, etc. PROC TRANSPOSE also creates an extra variable, _NAME_. _NAME_ has a value of DATE on both observations, indicating the name of the transposed variable.

The equivalent DATA step code using arrays could be:

    DATA NEW (KEEP=NAME DATE1-DATE3);
      ARRAY DATES {3} DATE1-DATE3;
      DO I=1 TO 3;        /* reads the 3 observations of one person */
        SET OLD;
        DATES{I} = DATE;
      END;
    RUN;

The programmer can therefore choose either PROC TRANSPOSE or arrays in the DATA step.
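To try the two approaches side by side, the following sketch builds a small input data set and compares the results. The dates, the output data set names (`new_proc`, `new_array`) and the use of PROC COMPARE are assumptions made for illustration; only the OLD data set layout follows the text.

```sas
/* Hypothetical input: three visit dates per person, sorted by NAME and DATE. */
data old;
    input name $ date :date9.;
    format date date9.;
    datalines;
Amy 01JAN2011
Amy 15JAN2011
Amy 01FEB2011
Bob 03JAN2011
Bob 17JAN2011
Bob 04FEB2011
;
run;

/* Approach 1: PROC TRANSPOSE (the extra _NAME_ variable is dropped). */
proc transpose data=old out=new_proc(drop=_name_) prefix=date;
    by name;
    var date;
run;

/* Approach 2: DATA step with an array; assumes exactly 3 rows per person. */
data new_array(keep=name date1-date3);
    array dates{3} date1-date3;
    format date1-date3 date9.;
    do i = 1 to 3;
        set old;
        dates{i} = date;
    end;
run;

/* Report any differences between the two results. */
proc compare base=new_proc compare=new_array;
run;
```

The array version depends on every person having exactly three observations, whereas PROC TRANSPOSE handles a varying number of observations per BY group.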

Conclusion:
Arrays play a vital role in SAS programming for clinical trial data management and statistical analysis, and they are very efficient. Since arrays can reduce CPU time, are cost effective and reduce repetitive coding, they are a good choice for SAS programmers in their daily programming activities.


DIM function values for array mult{5,10,2}:

    Syntax       Alternative Syntax   Value
    DIM(MULT)    DIM(MULT, 1)         5
    DIM2(MULT)   DIM(MULT, 2)         10
    DIM3(MULT)   DIM(MULT, 3)         2

LBOUND function values for array mult{2:6,4:13,2}:

    Syntax          Alternative Syntax   Value
    LBOUND(MULT)    LBOUND(MULT, 1)      2
    LBOUND2(MULT)   LBOUND(MULT, 2)      4
    LBOUND3(MULT)   LBOUND(MULT, 3)      1

www.makrocare.com
