Day 2 Track 2
Conference for Statistical Programmers In Clinical Research (ConSPIC) 2011
Bengaluru
Indian Association for Statistics in Clinical Trials
Day 2
Track 1: Lalit 1&2. Track 2: Lalit 3&4.

9.30 – 11.00 Session 1 (D2S1T1 & D2S1T2)
Track 1 – Reporting
• Subject Narratives through SAS - Devayani Deodhar
• Techniques in RTF formatting in clinical trial reports - Sameer Bamnote
• Overview of NLS (National Language Support) in SAS - K. E. Sudarshan
• Template procedure, and customizing style template - Rubia Shaik
Track 2 – Statistics
• Personalized medicines – Role of ROC analysis using SAS 9.2 - Muralikrishna C
• Comparison of multiple SAS procedures to perform statistical activity for same objective - Pradeep Acharya
• Simulating clinical trial data using SAS - Ramsathish S
• Statistical and Graphical Methods used in clinical trials - Meghana Marathe

11.00 – 11.30 Break

11.30 – 1.00 Session 2 (D2S2T1 & D2S2T2)
Track 1 – SAS and Beyond
• Graphics across languages - Tapas Chakraborty
• Bookmarking and hyperlinking in PDF and HTML outputs using SAS - Sugunesh Sivalingam
• Oracle Clinical for SAS programmers - Sandeep Kumar
• Using R to validate SAS outputs - Nikhil Abhyankar
Track 2 – Efficiency
• Hashing Unleashed!!! - Pratibha Jalui
• Dorfman-Whitlock DO- (DOW-) Loop – A Loop for N to one data step programming - Periasamy K
• Let's go for a picture (format) - Senthilkumar Karuppiah
• Proc transpose for Horizontal data - Ramesh Sundaram

1.00 – 2.00 Lunch

2.00 – 3.30 Session 3 (D2S3T1 & D2S3T2)
Track 1 – SDTM/CDISC
• Clinical Data Standardization Methodology - Pankaj Bharadwaj
• SDTM data conversion system - Ashwin Venkat
• Journey into the world of SAS clinical standards toolkit and Proc CDISC - Melanie Vaz
• Validation of SDTM metadata using a utility - Jegan Pillaiyar
Track 2 – Quality
• 10 simple ways to do quality checks on database - Jayapandian N
• Why should stats have all the fun? – Priya Iyer
• Electronic validation for clinical data and summary reports - Amruta Pathak
• RTF to SAS datasets - Vijaybhaskar Reddy

3.30 – 4.00 Break

4.00 – 5.15 AGM (Lalit 1 – 4)
Data Scrambling
By Vinay Mahajan, Mahesh Babu, Ramsathish S
Disclaimer
All material in these slides is the opinion of the speakers and does not reflect the views of Novartis Pharmaceuticals.
Purpose
• To blind (scramble) the data, in order to:
  • produce blinded outputs for clinical team review
  • create output for client review before un-blinding the study
  • create test data
  • create data for edit check testing
ConSPIC 2011 | Vinay Mahajan, Mahesh Babu, Ramsathish S | Sep 29 – 30, 2011
Different types of studies
• Blinded studies (single blind, double blind)
• Un-blinded studies (open label)
• FDA recommendation: keep the study blinded to the greatest extent possible
Scrambling the data: what does it mean? How is it done?
• If data for 100 patients in a given study are available as SAS datasets, then
  • completely dismantle the linkage of patient-level information across and within CRF pages
  • but keep the number of patients in the resulting datasets close to the original number
Types of scrambling data
• Type 1: Subject-level patient number changing
• Type 2: Record-level patient number changing
Scrambling data: flowchart for Type 1
1. Identify a study with available SAS data.
2. Go through the source data library to find all the datasets and the unique identifier.
3. User input: set up macro variables for (1) the unique ID (STYSID1A), (2) the input library name, (3) the output library name.
4. Create a dataset of unique dataset names (one record per dataset).
5. Create a new dataset with a new variable derived from _n_.
6. Using the new dataset, create a new order for the new variable with PROC PLAN.
7. Merge the new dataset and the main dataset, using the new variable as the BY variable.
8. Send the final dataset to the output library.
9. Scrambling complete.
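The Type 1 steps can be sketched roughly as follows. This is a minimal illustration rather than the speakers' macro: the dataset AE, its contents and all WORK dataset names are hypothetical, and only STYSID1A and the use of PROC PLAN and _n_ come from the slide. Every record of a subject receives the same new ID, so within-subject linkage is kept while the link to the real subject is broken (a random permutation may still map a subject to itself).

```sas
%let idvar = STYSID1A;   /* unique subject ID variable named on the slide */

/* Hypothetical source dataset with several records per subject */
data ae;
  input &idvar $ aeterm $;
  datalines;
S001 Headache
S001 Nausea
S002 Rash
S003 Cough
S003 Fever
S004 Dizzy
;
run;

/* One record per subject, numbered 1..N with _n_ */
proc sort data=ae(keep=&idvar) out=ids nodupkey;
  by &idvar;
run;

data ids;
  set ids end=eof;
  seq = _n_;
  if eof then call symputx('nsubj', _n_);
run;

/* Random permutation of 1..N from PROC PLAN */
proc plan seed=20110930;
  factors seq=&nsubj random / noprint;
  output out=perm;
run;
quit;

/* Put the subject IDs into the permuted order */
data perm;
  set perm;
  pos = _n_;
run;

proc sort data=perm; by seq; run;

data shuffled;
  merge ids perm;
  by seq;
run;

proc sort data=shuffled; by pos; run;

/* One-to-one merge (no BY): i-th original ID is paired with i-th shuffled ID */
data idmap;
  merge ids(keep=&idvar rename=(&idvar=_oldid))
        shuffled(keep=&idvar);
run;

/* Apply the mapping: all records of a subject get the same new ID */
proc sort data=ae out=ae_sorted;
  by &idvar;
run;

data ae_scrambled;
  merge ae_sorted(in=insrc rename=(&idvar=_oldid)) idmap;
  by _oldid;
  if insrc;
  drop _oldid;
run;
```

In a real run the last step would be repeated for every dataset in the input library and the result written to the output library instead of WORK.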
Type 1 example: (figure not reproduced)
Scrambling data: flowchart for Type 2
1. Find the unique datasets in the work library.
2. Create a macro variable containing the number of unique datasets in the library.
3. User input: set up a macro variable for the patient ID.
4. For each dataset, as the iteration runs, create a dataset named NEW with two variables: patient ID and _n_. Patient IDs are picked in random order from the source dataset without excluding any patient ID.
5. Create the variable _n_ in each of the original datasets.
6. Merge the records of the original dataset with the newly created dataset NEW by the common variable _n_.
7. The real number of patients in a dataset is the number obtained from the dataset NEW.
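A minimal sketch of the Type 2 idea for a single dataset is shown below. The dataset AE and its contents are hypothetical, and a one-to-one merge is used here as a stand-in for the slide's merge on an explicit _n_ variable; both pair record i of the source with record i of NEW.

```sas
%let idvar = STYSID1A;   /* patient ID variable (user input) */

/* Hypothetical source dataset */
data ae;
  input &idvar $ aeterm $;
  datalines;
S001 Headache
S001 Nausea
S002 Rash
S003 Cough
S003 Fever
S004 Dizzy
;
run;

/* NEW: the patient ID column alone, put into random order.
   No ID is dropped, so the set of patient IDs is unchanged. */
data new;
  set ae(keep=&idvar);
  _rand = ranuni(20110930);
run;

proc sort data=new;
  by _rand;
run;

/* One-to-one merge (no BY statement): record i of AE is paired with
   record i of NEW, so each record receives a randomly chosen ID */
data ae_scrambled;
  merge ae(drop=&idvar) new(keep=&idvar);
run;
```

Because IDs are reassigned record by record, records that belonged to one patient are scattered across patients, which is what breaks the linkage within and across CRF pages.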
Type 2 example: (figure not reproduced)
Pros and Cons
Pros
• Complete dismantling of the data
• Patient data are delinked at visit level as well as across CRF pages
• Could be a very useful way to create dummy data
• Useful tool for creating many data issues, which in turn provides useful data for writing data validation programs
Cons
• No way to check data consistency across reports
• May produce some odd data issues, e.g. visit dates appearing after death
Creating Dummy Data
By Vinay Mahajan, Mahesh Babu, Ramsathish S
Purpose
• To test new programs
• To develop new code
• To create a dummy project
• To train new programmers
About the program
• User input part
• Automatic part
User input part
Defining key variables:
• Subject ID
• Site ID
• Visits
• Treatments
• Days for each visit
• Sex
• Ethnicity
• Race
• Age range
• Childbearing potential options
• Output library name
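As a rough illustration of how such user inputs could drive data generation, the sketch below builds a small demographics dataset and a visit schedule. All macro variable names, dataset names, value lists and the seed are hypothetical; the speakers' program is not shown in the slides.

```sas
/* Hypothetical user inputs */
%let nsites    = 3;            /* number of sites               */
%let npersite  = 5;            /* subjects per site             */
%let agemin    = 18;           /* age range                     */
%let agemax    = 65;
%let visitdays = 1 15 29 43;   /* study day of each visit       */
%let outlib    = work;         /* output library                */

/* Demographics: one record per subject */
data &outlib..dm;
  call streaminit(2011);
  length subjid $8 siteid $3 sex $1 trt $7;
  do s = 1 to &nsites;
    siteid = put(s, z3.);
    do p = 1 to &npersite;
      subjid = cats(siteid, '-', put(p, z3.));
      sex    = ifc(rand('uniform') < 0.5, 'M', 'F');
      /* integer age drawn uniformly from agemin..agemax */
      age    = &agemin + floor(rand('uniform') * (&agemax - &agemin + 1));
      trt    = ifc(mod(p, 2) = 1, 'Active', 'Placebo');
      output;
    end;
  end;
  drop s p;
run;

/* Visit schedule: one record per subject and visit */
data &outlib..visits;
  set &outlib..dm(keep=subjid);
  do visit = 1 to countw("&visitdays", ' ');
    day = input(scan("&visitdays", visit, ' '), best.);
    output;
  end;
run;
```

Other domains (adverse events, medical history) could be generated from the same subject list in a similar way.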
Automatic part
Example:
• Number of adverse events
• Severity levels and grades
• Number of medical history events
• Etc.
Medical History Dataset
Pros and Cons
Pros
• We can generate our own data
• The data can be used in any training
• Generation is very simple
• Very useful for training purposes
Cons
• Limited to a few datasets
• Many user inputs are required
• AE terms and MH terms are not original terms
• The data may contain a few issues
• The data may not be accurate
CONFIDENTIAL Corporate Presentation
STATISTICAL & GRAPHICAL METHODS USED IN EXPLORATORY TRIALS
Meghana Marathe, Pinakin Jani
Agenda
• Introduction
• Exploratory Analysis
• Statistical Methods Overview
• Case Study
• SAS Macro for Cancer Exploratory Trials
• Output Display
• Conclusion
Exploratory Analysis
Why exploratory analysis?
• In the present scenario we see a lot of Investigational New Drug (IND) discovery, and the trend will increase in the coming years.
• With current trends in efficacy and safety concerns and lengthy drug development processes, there is a critical need for efficiency in drug development procedures.
• In an exploratory data review, the goals are to quickly extract, display and review the salient safety and efficacy information in the data, generally using graphical and statistical methods that facilitate quick interpretation and communication.
• More emphasis is given to the published proposals and materials on exploratory trial analyses, which may act as a benchmark for presenting and analyzing the data further in clinical trials.
Statistical Methods Overview
Introduction to Correlation
Correlation is the extent to which two or more quantitative variables are related:
• Positively correlated: the values of one variable vary somewhat in step with the values of another variable
• Negatively correlated: the values of one variable vary somewhat in opposite step to the values of another variable
• Not correlated: the values of one variable vary randomly with respect to the values of another variable
Pearson's Linear Correlation Coefficient (r):
• The Pearson product-moment correlation coefficient is a measure of the linear correlation between two variables x and y.
• Pearson's r reflects the strength of the linear relationship between two variables. It ranges from -1 to +1.
• Value of r near 1: positive correlation
• Value of r near -1: negative correlation
• Value of r near 0: no or poor correlation
Statistical Methods Overview (continued)
Assumptions of Pearson's r
• There is a linear relationship between x and y
• Both x and y are continuous random variables
• Both variables are normally distributed
• Equal differences between measurements represent equivalent intervals
Spearman's Rank Correlation Coefficient (ρ, rho):
• Spearman's rank correlation is a nonparametric measure of the strength of the association between two variables. It is computed on ranks, so it makes no assumptions about the distribution of the variables, i.e. about the linearity, normality or scale of the relationship.
• Value of ρ near 1: positive correlation
• Value of ρ near -1: negative correlation
• Value of ρ near 0: no or poor correlation
Case Study
Overview of graphical and statistical methods used in an exploratory cancer trial:
• Data domain: Laboratory
• Graphical method: scatter plot with regression fit
• Statistical methods used: Spearman rank correlation / Pearson correlation
• Inferential statistics: 95% confidence interval (CI) using parameters from the Spearman correlation
In our examples we show how appropriate statistical graphics elements may be used to extract and highlight salient information in the clinical study data.
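The analysis outlined in the case study could be approached along the following lines. This is only a sketch on simulated data, not the authors' macro: the dataset LAB and its values are invented, the variable names NK and CD3_TUM are borrowed from the output slide, and the FISHER option of PROC CORR is assumed here as the source of the 95% confidence limits.

```sas
/* Simulated laboratory data: three treatments, two markers (hypothetical) */
data lab;
  call streaminit(2011);
  length trt $1;
  do trt = 'A', 'B', 'C';
    do i = 1 to 15;
      nk      = rand('normal', 50, 10);
      cd3_tum = 100 - 1.2 * nk + rand('normal', 0, 4);
      output;
    end;
  end;
  drop i;
run;

/* Pearson and Spearman correlations by treatment, with confidence
   limits based on Fisher's z transformation (95% by default) */
proc corr data=lab pearson spearman fisher(biasadj=no);
  by trt;
  var nk;
  with cd3_tum;
run;

/* Scatter plot with regression fit and confidence limits for the mean */
proc sgplot data=lab;
  by trt;
  reg x=nk y=cd3_tum / clm;
run;
```

Wrapping these steps in a macro with the dataset, BY variable and the two analysis variables as parameters would give a reusable tool of the kind the talk describes.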
SAS Macro for Cancer Exploratory Trials
Output Display (End of Treatment Period)
Interpretation:
• As per the output, the values of the variables are negatively correlated.
• For Treatment A, the value of rho is greater than -0.9, which implies NK is negatively correlated with CD3_TUM.
• A similar trend is observed for Treatments B and C.
Conclusion
• In exploratory cancer trials, we recommend using correlation analysis to find the relationship between the target variables, which provides scientific evidence for easy interpretation and hence decision-making.
• A standard macro can be developed for use across similar studies and can be modified for other functional areas.
• Similarly, there are several statistical and graphical methods that can be optimized for better analysis and presentation of clinical data. These would play a vital role in the Investigational New Drug research market and hence would provide a robust platform for further analysis in clinical trials.
References
• http://www.springer.com/cda/content/document/cda_downloaddocument/9780387988146-c7.pdf?SGWID=0-0-45-101848-p2018642
• http://support.sas.com/resources/papers/proceedings10/234-2010.pdf
Acknowledgement
We gratefully acknowledge the support of the TCS SPA management team and thank IASCT for providing this opportunity.
Thank You
Personalized Medicines – Role of ROC Curve analysis using SAS® 9.2
Muralikrishna Chakravarthula
Senior Statistical Analyst - I
Novartis Oncology
Disclaimer
All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis
| ConSPIC 2011| Muralikrishna Chakravarthula | Sep 29 – 30, 2011 2
AGENDA
• Introduction
• ROC Curve and Its Uses
• Conceptual definitions – Measurements of quality of the test
• Example: PSA as a biomarker for Prostate cancer
• ROC curve - PROC LOGISTIC – SAS 9.2
• Area under the ROC curve
Introduction
• There is an increasing trend towards developing personalized (individualized) medicines through evaluation of new biomarkers.
• Right patient selection is a primary goal.
  • Diagnostic tests play a major role.
• Build a robust diagnostic/clinical test.
• Evaluate the diagnostic test with Receiver Operating Characteristic (ROC) curve analysis.
Note: Receiver operating characteristic (ROC) analysis was originally developed during World War II for radar images. The first applications of this theory within the medical area occurred during the late 1960s.
ROC curve - Uses
• ROC curve analysis is a standard analytical tool for evaluating diagnostic tests.
  • It is the plot of sensitivity on the vertical axis against 1-specificity on the horizontal axis for all possible thresholds in the study data set.
  • The area under the ROC curve is an effective way to summarize the overall diagnostic accuracy of the test.
Generally it is used to
• Describe a diagnostic test.
  - Associated measures: sensitivity, specificity, accuracy, area under the curve
  - Often, the calculated sensitivity and specificity of the test depend on the specific threshold selected.
• Determine a cutoff (threshold) value for a clinical/diagnostic test.
Definitions – Four possible decisions
Test result | Condition (disease) positive (present) | Condition (disease) negative (absent) | Row total
Positive | TP (# of true positives) | FP (# of false positives) | TP + FP (total # of subjects with positive test)
Negative | FN (# of false negatives) | TN (# of true negatives) | FN + TN (total # of subjects with negative test)
Column total | TP + FN (total # of subjects with disease) | FP + TN (total # of subjects without disease) | N = TP + TN + FP + FN (total # of subjects in study)
Measurements of the quality of the test
• Sensitivity: the probability of having a positive test among the patients who have a positive diagnosis = TP/(TP + FN)
• Specificity: the probability of having a negative test among the patients who have a negative diagnosis = TN/(TN + FP)
• Efficiency: TP + TN
• Accuracy: (TN + TP)/(TN + TP + FN + FP) = (sensitivity)(prevalence) + (specificity)(1 - prevalence). Here prevalence is the probability of disease in the population at a given time, and it is known.
• The area under the ROC curve is an effective way to summarize the overall diagnostic accuracy of the test.
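The accuracy identity can be checked directly from the 2x2 table. The short derivation below uses the sample prevalence p = (TP + FN)/N, which is an assumption made for this illustration; with an externally known population prevalence, the right-hand side gives the accuracy expected in that population instead.

```latex
\[
\text{Accuracy}
  = \frac{TP + TN}{N}
  = \underbrace{\frac{TP}{TP + FN}}_{\text{sensitivity}}
    \cdot \underbrace{\frac{TP + FN}{N}}_{p}
  + \underbrace{\frac{TN}{TN + FP}}_{\text{specificity}}
    \cdot \underbrace{\frac{TN + FP}{N}}_{1 - p},
\qquad N = TP + TN + FP + FN .
\]
```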
Example 1: PSA as biomarker for Prostate cancer
• Example: simulated prostate cancer data
• Biomarker: prostate-specific antigen (PSA)
• Cutoff point: 4 ng/mL as the threshold
• Decision: 1 = positive (abnormal) if PSA > 4 ng/mL; 0 = negative (normal) if PSA < 4 ng/mL
• Actual disease status: 1 = Yes, 0 = No
Note: Although PSA alone may not serve as an accurate marker to detect prostate cancer, I have used these data for the purpose of explanation.
Example 1: Prostate cancer data (cont.)
Data set – PROSTPSA
PSA result (0 = < 4 ng/mL, negative; 1 = > 4 ng/mL, positive) | Actual disease condition (0 = no cancer, negative; 1 = cancer, positive)
1 | 1
1 | 1
1 | 0
0 | 1
0 | 0
0 | 0
0 | 1
1 | 1
1 | 0
... | ...
Example 1: Prostate cancer (cont.) – SAS code for frequency counts
/* Formats: with ORDER=FORMATTED, '4+ ng/ml' sorts before '< 4 ng/ml' */
/* and 'Cancer' before 'No Cancer', so the positive categories come first */
PROC FORMAT;
  VALUE cutoffmt 0 = '< 4 ng/ml'
                 1 = '4+ ng/ml';
  VALUE prostfmt 0 = 'No Cancer'
                 1 = 'Cancer';
RUN;

/* Cross-tabulate test result by disease status */
PROC FREQ DATA=prostpsa ORDER=FORMATTED;
  FORMAT psa cutoffmt. resp prostfmt.;
  LABEL psa='PSA level' resp='Prostate Cancer';
  TABLES psa * resp / NOROW NOPERCENT;
RUN;
Example 1: Prostate cancer (cont.) – Frequency counts from SAS output
Test result | Disease status: cancer | Disease status: no cancer | Total
Positive | 66 (TP) | 18 (FP) | 84
Negative | 13 (FN) | 25 (TN) | 38
Total | 79 (TP + FN) | 43 (FP + TN) | 122
Sensitivity = TP/(TP + FN) = 66/79 = 0.8354
Specificity = TN/(TN + FP) = 25/43 = 0.5814
Accuracy = (TN + TP)/(TN + TP + FN + FP) = 91/122 = 0.7459
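The three measures can be reproduced from the four cell counts with a short DATA step. The dataset and variable names below are illustrative only; the counts are those of the table above.

```sas
data metrics;
  /* Cell counts from the 2x2 table */
  tp = 66; fp = 18; fn = 13; tn = 25;
  n  = tp + fp + fn + tn;

  sensitivity = tp / (tp + fn);
  specificity = tn / (tn + fp);
  accuracy    = (tp + tn) / n;

  /* Same accuracy via the prevalence-weighted form */
  prevalence   = (tp + fn) / n;
  accuracy_chk = sensitivity * prevalence + specificity * (1 - prevalence);

  /* Write the results to the log with four decimals */
  put sensitivity= 6.4 specificity= 6.4 accuracy= 6.4 accuracy_chk= 6.4;
run;
```

Both accuracy expressions give 91/122, since the prevalence used here is the sample prevalence 79/122.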
Example 1: Prostate cancer (cont.) - 95% CI
PROC FORMAT;
  VALUE cutoffmt 0 = '< 4 ng/ml'
                 1 = '4+ ng/ml';
  VALUE prostfmt 0 = 'No Cancer'
                 1 = 'Cancer';
RUN;

/* Restrict to diseased subjects: the proportion testing positive is the sensitivity */
PROC FREQ DATA=prostpsa ORDER=FORMATTED;
  FORMAT psa cutoffmt. resp prostfmt.;
  LABEL psa='PSA level' resp='Prostate Cancer';
  TABLES psa / BINOMIAL;
  EXACT BINOMIAL;
  WHERE resp = 1;
RUN;
Example 1: Confidence Intervals from SAS output
NOTE:
• SAS generates confidence intervals for the proportion in the first row of the output (in this instance, 83.54%). Therefore, you should make sure that the proportions are listed in the order that places sensitivity (or specificity) in the first row.
• The code above uses ORDER=formatted to instruct SAS to use formatted values for ordering. Otherwise, the output would have generated confidence intervals for 16.46%.
• The BINOMIAL option in the TABLES statement along with the EXACT BINOMIAL statement instructs SAS to produce confidence intervals using the normal approximation of the binomial distribution (asymptotic standard error or ASE) and the exact binomial distribution.
• SAS is able to efficiently calculate exact confidence intervals, which is the preferred method.
Example 1: Prostate Cancer data - ROC Analysis using the LOGISTIC procedure in SAS 9.2
GENERATING THE ROC CURVE
The empirical ROC curve is the plot of sensitivity on the vertical axis and 1-specificity on the horizontal axis for all possible thresholds in the study data set. It is often used to explore thresholds for the application of a new biomarker in clinical practice or to visually assess the overall performance of the biomarker.
With the release of SAS 9.2, ROC curves can be generated using standard ODS STATISTICAL GRAPHICS and simple LOGISTIC procedure statements. The code below generates an ROC curve for the Prostate Cancer data.
ODS GRAPHICS ON;
PROC LOGISTIC DATA=prostpsa PLOTS(ONLY)=ROC;
  MODEL resp(EVENT='1') = psa;
RUN;
ODS GRAPHICS OFF;
Example 1: Prostate Cancer data - ROC Curve
• PLOTS(ONLY)=ROC directs ODS STATISTICAL GRAPHICS to plot an ROC curve without plotting the other standard graphs associated with PROC LOGISTIC.
• The MODEL statement is constructed using the standard PROC LOGISTIC syntax (dependent variable = covariates) with EVENT='1' specified as the outcome we want to predict.
Figure 1: ROC curve for biomarker PSA.
Example 1: Prostate Cancer Data – The Area under the ROC curve
The area under the ROC curve (AUC) is the average sensitivity of the biomarker over the range of specificities. It is often used as a summary statistic representing the overall performance of the biomarker. A biomarker with no predictive value would have an AUC of 0.5 (also represented by the diagonal "chance" line above), while a biomarker with perfect ability to predict disease would have an AUC of 1.
In SAS 9.2, the empirical AUC is calculated and printed at the top of the ROC curve generated by PROC LOGISTIC. As shown in Figure 1, the PSA biomarker has an AUC of 0.7084 for the diagnosis of prostate cancer in the sample population.
Accuracy classification rule for a diagnostic test
A commonly used classification of a diagnostic test by its AUC is summarized in the table below.
AUC range | Classification
0.9 < AUC < 1.0 | Excellent
0.8 < AUC < 0.9 | Good
0.7 < AUC < 0.8 | Fair
0.6 < AUC < 0.7 | Not good
In short, the ROC curve is a good tool for selecting a possible optimal cut-point for a given diagnostic test.
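One way to look for a candidate cut-point is to save the ROC coordinates with the OUTROC= option of the MODEL statement and rank them by a criterion. The sketch below uses the Youden index (sensitivity + specificity - 1), which is a choice made for this illustration and is not taken from the slides; the dataset PSA_SIM and the continuous marker PSA_VAL are simulated and hypothetical.

```sas
/* Simulated continuous marker (hypothetical data) */
data psa_sim;
  call streaminit(92);
  do id = 1 to 200;
    resp    = (rand('uniform') < 0.5);
    psa_val = exp(rand('normal', 1.2 + 0.8 * resp, 0.6));
    output;
  end;
run;

/* OUTROC= stores one record per candidate threshold */
proc logistic data=psa_sim;
  model resp(event='1') = psa_val / outroc=roc;
run;

/* Youden index: sensitivity - (1 - specificity) */
data roc;
  set roc;
  youden = _sensit_ - _1mspec_;
run;

proc sort data=roc;
  by descending youden;
run;

/* Threshold with the largest Youden index */
proc print data=roc(obs=1);
  var _prob_ _sensit_ _1mspec_ youden;
run;
```

Note that _PROB_ is a cut-point on the predicted probability scale; it has to be translated back through the fitted model to obtain the corresponding marker value.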
Example 1: Comparing with Chance AUC of 0.5
The AUC of a biomarker is often compared to chance, which has an AUC of 0.5. The statistical test involves estimating AUCtest - AUCchance, which is asymptotically normal. The code below performs this task using standard features in PROC LOGISTIC of SAS 9.2 for comparing ROC curves.
ODS GRAPHICS ON;
PROC LOGISTIC DATA=prostpsa PLOTS=ROC ROCOPTIONS(NODETAILS);
  MODEL resp(EVENT='1') = psa / NOFIT;
  ROC 'PSA' psa;
  ROC 'Chance';
  ROCCONTRAST REFERENCE('Chance') / ESTIMATE;
RUN;
ODS GRAPHICS OFF;
Example 1: Comparing with Standard AUC of 0.5
In our example, the estimated AUC for PSA is statistically significantly greater than 0.5, providing evidence that the PSA biomarker is useful for distinguishing prostate cancer patients from patients without prostate cancer.
Example 2: Pancreatic cancer, CA-125 and CA19-9 Biomarkers
Pancreatic cancer data (Wieand et al. 1989)
Example 2: Pancreatic cancer, CA-125 and CA19-9 Biomarkers (contd.)
As displayed in the figure, CA19-9 appears to perform better than CA-125, particularly in the area of the curve representing high specificity. Overall, the AUC of 0.86 for CA19-9 is significantly greater than the AUC of 0.71 for CA-125 (p=0.0065).
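A comparison of this kind can be set up with the ROC and ROCCONTRAST statements of PROC LOGISTIC, in the same way as the comparison with chance shown earlier. The sketch below runs on simulated data, not on the Wieand et al. data, so its AUCs and p-value will differ from those quoted; the dataset and variable names are hypothetical.

```sas
/* Simulated two-marker data (hypothetical, not the published data) */
data panc_sim;
  call streaminit(1989);
  do id = 1 to 140;
    disease = (id <= 90);
    ca199 = exp(rand('normal', 2.5 + 2.0 * disease, 1.2));
    ca125 = exp(rand('normal', 2.7 + 0.8 * disease, 0.9));
    output;
  end;
run;

ods graphics on;
proc logistic data=panc_sim plots=roc rocoptions(nodetails);
  /* NOFIT: the full model is only a container for the ROC statements */
  model disease(event='1') = ca199 ca125 / nofit;
  roc 'CA19-9' ca199;
  roc 'CA-125' ca125;
  /* Test and estimate the AUC difference against the CA-125 curve */
  roccontrast reference('CA-125') / estimate e;
run;
ods graphics off;
```

Because both markers are measured on the same subjects, the contrast accounts for the correlation between the two estimated AUCs.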
Q & A
Thank you
Comparison of Multiple SAS Procedures to Perform Statistical Activity for the same Objective
Pradeep Acharya
IASCT – ConSPIC 2011
Contents
• Introduction to hypothetical study and its objectives
• Introduction to Analysis of Covariance (ANCOVA)
• SAS procedures to perform ANCOVA
• ANCOVA results with GLM procedure
• ANCOVA results with MIXED procedure
• Comparison of GLM and MIXED procedures
• Conclusion
• Acknowledgement
• References
Pradeep Acharya | IASCT - ConSPIC 2011 | Bangalore | 30-Sep-2011 2
Introduction to hypothetical study & its objectives
A multi-center, randomized, double-blind, 24-week study to compare the efficacy and safety of Treatment A with Treatment B in patients with hypertension
Introduction to the study continued...
Inclusion Criteria:
• Clinical diagnosis of hypertension
• Age between 18 years and 75 years
Exclusion Criteria:
• Systolic blood pressure >170 mmHg and/or diastolic blood pressure >105 mmHg
• Second or third-degree atrio-ventricular block
• Neurological disorders (such as Parkinson's disease, multiple sclerosis or peripheral neuropathy)
Introduction to the study continued...
Primary objective :
• To evaluate the efficacy of Treatment-A versus Treatment-B in patients with hypertension by assessing the reduction in Blood pressure from baseline after 24 weeks of treatment
Secondary objective:
• To evaluate the safety of Treatment-A versus Treatment-B in patients with hypertension over 24 weeks of treatment
Structure of the dataset
Introduction to ANCOVA
Analysis of Covariance is a statistical method that combines Analysis of Variance (ANOVA) and regression analysis.
Purposes:
• To increase the precision of comparisons between groups by accounting for variation in important prognostic variables
• To "adjust" comparisons between groups for imbalances in important prognostic variables between these groups
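For the hypothetical hypertension study, an ANCOVA of change from baseline with treatment as the factor and baseline as the covariate could be coded in either procedure as sketched below. The dataset BP, its variables and the simulated values are assumptions made for illustration; the code on the original slides is not available in this text.

```sas
/* Simulated analysis data: one record per patient (hypothetical) */
data bp;
  call streaminit(24);
  length trt $1;
  do patid = 1 to 40;
    trt  = ifc(mod(patid, 2) = 1, 'A', 'B');
    base = 150 + rand('normal', 0, 8);                 /* baseline SBP       */
    chg  = -10 - 4 * (trt = 'A') - 0.3 * (base - 150)  /* change at week 24  */
           + rand('normal', 0, 6);
    output;
  end;
run;

/* ANCOVA with PROC GLM */
proc glm data=bp;
  class trt;
  model chg = trt base / solution;
  lsmeans trt / pdiff cl stderr;
run;
quit;

/* The same fixed-effects model with PROC MIXED */
proc mixed data=bp;
  class trt;
  model chg = trt base / solution;
  lsmeans trt / pdiff cl;
run;
```

With no RANDOM or REPEATED statement, the two procedures fit the same fixed-effects model, so the adjusted treatment means and their difference should agree.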
SAS procedures to perform ANCOVA
ANCOVA results with GLM procedure
ANCOVA results with MIXED procedure
Comparison of GLM and MIXED procedures
Conclusion
• Depending on the programmer's confidence and flexibility, either of these methods can be used to perform ANCOVA
• Both of these methods are robust for performing ANCOVA
Acknowledgement
• Sincere thanks to my manager Priti Pandey and my colleagues for their support and inspiration, which enabled me to attend and present at this conference
References
• Analysis of Covariance, Medical Statistics course: MD/PhD students, Faculty of Medicine & MED819: ANCOVA. (can be obtained @ http://www.mas.ncl.ac.uk/~njnsm/medfac/docs/ancova.pdf)
• Comparing the SAS GLM and MIXED procedures for repeated measures, Russ Wolfinger and Ming Chang, SAS Institute Inc., Cary, NC. (can be obtained @ http://www.ats.ucla.edu/stat/sas/library/mixedglm.pdf)
• Statistical considerations in a protocol, Pradeep Acharya, Thesis work of Biostatistics.
• SAS/STAT 9.2 user guide, second edition, SAS Institute Inc., Cary, NC, USA
• http://www.ats.ucla.edu/stat/sas/library/SASExpDes_os.htm
• http://rfd.uoregon.edu/files/rfd/StatisticalResources/glm04_mixed_why.txt
Dorfman-Whitlock DO- (DOW) Loop
A Loop for N to one data step programming
Periasamy K
Novartis Healthcare Pvt Ltd.
Disclaimer
All opinions expressed in this presentation are the authors' personal views, and do not reflect the views or opinions of Novartis
ConSPIC 2011 | Periasamy K | Sep 30, 2011
Do Until - loop
o The DOW loop is actually a DO UNTIL loop.
o The W comes from the name of the person who devised the technique (Whitlock).
o The DOW loop is a technique that moves the DATA step SET statement inside an explicitly coded DO loop.
o The programmer can control the retention of variable values and the population of the Program Data Vector (PDV) by controlling a break-event.
o The DOW loop goes through every record of a block in a single iteration of the DATA step, so variables in the PDV are not re-initialised between those records.
Structure of a DO UNTIL loop
data ... ;
  <Statements before do-loop> ;
  do <Iteration (optional)> until ( break-event ) ;
    set … ;
    by … ;
    <Statements inside the loop> ;
    output ;
  end ;
  <Statements after break-event ...> ;
run ;
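As a concrete illustration of the DO UNTIL structure, the sketch below totals a value per BY group. The dataset `visits` and the variables `patient`, `dose`, `n_rec` and `total_dose` are made up for illustration and do not come from the presentation.

```sas
/* Hypothetical input: several dose records per patient, sorted by patient */
data visits;
  input patient dose;
  datalines;
1 10
1 20
2 5
2 5
2 15
;
run;

data totals(keep=patient n_rec total_dose);
  /* One pass through the DO UNTIL loop reads a whole BY group. */
  /* n_rec counts the records read for the current patient.     */
  do n_rec = 1 by 1 until (last.patient);
    set visits;
    by patient;
    /* total_dose is not reset between records of the same group */
    total_dose = sum(total_dose, dose);
  end;
  /* After the break-event: the implicit OUTPUT writes one row per patient */
run;
```

With this input, `totals` should hold one row per patient (2 records totalling 30 for patient 1, 3 records totalling 25 for patient 2). Because there is no explicit OUTPUT inside the loop, the step collapses N input records to one output record, which is the "N to one" pattern in the talk title.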
Double DO UNTIL loop
data ... ;
  <Statements before do-loop> ;
  do <Iteration (optional)> until ( break-event ) ;
    set … ;
    by … ;
    <Statements inside the loop> ;
  end ;
  <Statements after break-event ...> ;
  do <Iteration (optional)> until ( break-event ) ;
    set … ;
    by … ;
    <Statements inside the loop> ;
    output ;   /* explicit OUTPUT statement */
  end ;
run ;
DOW loop
o Before the loop: steps required before the first record in the BY group is read.
o Inside the loop: statements executed for each record in the BY group.
o After the loop: statements executed once the last record in the BY group has been processed.
o Double DOW loop: the second loop writes the calculated value back to every observation of the BY group.
o An explicit OUTPUT statement inside the loop writes every observation to the output dataset; without it, only one observation per BY group is written.
Useful scenarios
o Baseline and change from baseline calculation
o Summary calculation with average for baseline calculation
o Abnormal value flagging
o Transposing more than one variable in one DATA step
Baseline and change from baseline calculation
o The baseline value is the last non-missing value on or before the treatment start day.
o Calculate the change from baseline.
o Approach 1: two DATA steps, one for the baseline and one for the change from baseline.
o Approach 2: one DATA step with a DOW loop.
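For comparison, Approach 1 might look like the sketch below. It reuses the dataset name `bpp` and the variables PATIENT, DAY and SUPSYS from the presentation's one-step code; the sample values and the intermediate dataset names `baseline` and `change` are assumptions made for illustration.

```sas
/* Hypothetical sample data, sorted by PATIENT and DAY */
data bpp;
  input PATIENT DAY SUPSYS;
  datalines;
1 -7 130
1 0 .
1 14 124
2 -1 118
2 0 121
2 14 117
;
run;

/* Step 1: keep the last non-missing value on or before treatment start */
data baseline(keep=PATIENT b_supsys);
  set bpp;
  by PATIENT;
  where DAY <= 0 and SUPSYS is not missing;
  if last.PATIENT;          /* FIRST./LAST. are evaluated after the WHERE filter */
  b_supsys = SUPSYS;
run;

/* Step 2: merge the baseline back and derive the change from baseline */
data change;
  merge bpp baseline;
  by PATIENT;
  if DAY > 0 then c_sysbp = SUPSYS - b_supsys;
run;
```

With this sample the baselines should be 130 and 121, giving changes of -6 and -4 on day 14. The double DOW loop reaches the same result in a single DATA step, at the cost of reading the input twice within that step.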
Baseline and change from baseline in one step
data base;
  /* First loop: baseline is the last non-missing value on or before treatment start */
  /* (assumes bpp is sorted by PATIENT and DAY) */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if DAY <= 0 and not missing(SUPSYS) then b_supsys = SUPSYS;
  end;
  /* Second loop: change from baseline for post-baseline records */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if DAY > 0 then c_sysbp = SUPSYS - b_supsys;
    output;  /* explicit OUTPUT statement to keep all observations */
  end;
run;
Baseline calculation with summary
o Baseline is the mean of all pre-dose assessments.
o Count the non-missing assessments used in the mean.
o Derive the minimum and maximum value over all visits for the subject.
o Approach 1: one PROC step and two DATA steps.
o Approach 2: one DATA step with a DOW loop.
Baseline calculation with summary (one DATA step)
data base2;
  /* First loop: pre-dose sum and count, plus min and max over all visits */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if first.patient then nb = 0;
    if SUPSYS > . then do;  /* use non-missing values only */
      if DAY <= 0 then do;
        nb + 1;
        b_supsys = sum(SUPSYS, b_supsys);
      end;
      if max = . or SUPSYS > max then max = SUPSYS;
      if min = . or SUPSYS < min then min = SUPSYS;
    end;
  end;
  /* Baseline is the mean of the non-missing values on or before treatment start */
  if nb > 0 then s_supsys = round(b_supsys / nb, 0.001);
  /* Second loop: change from baseline, written to every observation */
  do until (last.patient);
    set bpp;
    by PATIENT;
    if DAY > 0 then c_sysbp = SUPSYS - s_supsys;
    output;
  end;
run;
Output dataset
o The output dataset holds max, min, the number of pre-dose assessments (nb) and the pre-dose mean (s_supsys).
Abnormal value flagging
o Flag subjects with an abnormal BP value (< 120).
o Flag 'Y' for a subject with at least one abnormal BP value.
o Flag 'N' for a subject with only normal BP values.
o Approach 1: two DATA steps. Step 1 flags the individual abnormal values; step 2 merges the flag with the source dataset.
o Approach 2: PROC SQL with a subquery.
o Approach 3: one DATA step with a DOW loop.
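Approach 2 could be sketched as follows, with the subquery written as an in-line view. The dataset name `bpp2` and the variables PATIENT and SUPSYS follow the presentation's code; the sample rows and the output name `base_sql` are assumptions for illustration only.

```sas
/* Hypothetical sample data */
data bpp2;
  input PATIENT DAY SUPSYS;
  datalines;
1 0 130
1 14 118
2 0 125
2 14 .
;
run;

proc sql;
  create table base_sql as
  select a.*,
         /* a patient found in the in-line view has at least one value below 120 */
         case when b.PATIENT is null then 'N' else 'Y' end as LOW
  from bpp2 as a
       left join
       (select distinct PATIENT
          from bpp2
          /* missing values sort below any number, so exclude them explicitly */
          where SUPSYS is not missing and SUPSYS < 120) as b
       on a.PATIENT = b.PATIENT;
quit;
```

In this sample, every row of patient 1 should be flagged 'Y' and every row of patient 2 'N'. Unlike the DOW loop, the SQL version does not need the input to be sorted.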
Abnormal value flagging (one DATA step)
data base;
  /* First loop: flag the subject if any non-missing BP value is below 120 */
  do until (last.patient);
    set bpp2;
    by PATIENT;
    if 120 > SUPSYS > . then miss = 1;
  end;
  /* Second loop: write the subject-level flag (Y or N) to every observation */
  do until (last.patient);
    set bpp2;
    by PATIENT;
    if miss = 1 then LOW = "Y";
    else if miss = . then LOW = "N";
    output;
  end;
run;
Output dataset
o All observations for the subject are flagged.
Transpose more than one variable in one DATA step
o Transpose three variables by group.
o The summary dataset variables (N, Mean, Median) need to be transposed by the group variable.
o Approach 1: three PROC TRANSPOSE steps and one DATA step merge.
o Approach 2: one DATA step with a DOW loop.
(Slide figures: source dataset and transposed dataset.)
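A rough sketch of Approach 1 is given below for contrast. The input name `trns` and its variables follow the presentation's DOW-loop code; the sample values and the intermediate dataset names (`t_n`, `t_mean`, `t_median`, `trans1`) are assumptions for illustration.

```sas
/* Hypothetical summary data: three treatment rows per group, sorted by group */
data trns;
  input group treatmnt $ SUPSYS_N SUPSYS_Mean SUPSYS_Median;
  datalines;
1 A 10 120.5 121
1 B 12 118.2 118
1 C 11 125.0 124
2 A 9 130.1 131
2 B 10 127.4 128
2 C 10 129.9 130
;
run;

/* One PROC TRANSPOSE per variable; PREFIX= names the new columns n1-n3 etc. */
proc transpose data=trns out=t_n(drop=_name_) prefix=n;
  by group;
  var SUPSYS_N;
run;

proc transpose data=trns out=t_mean(drop=_name_) prefix=Mean;
  by group;
  var SUPSYS_Mean;
run;

proc transpose data=trns out=t_median(drop=_name_) prefix=Median;
  by group;
  var SUPSYS_Median;
run;

/* Bring the three transposed datasets back together */
data trans1;
  merge t_n t_mean t_median;
  by group;
run;
```

This takes four steps and three intermediate datasets, which is the overhead the single DOW-loop step avoids.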
Transpose more than one variable in one DATA step (code)
data Trans2(drop=i treatmnt SUPSYS_N SUPSYS_Mean SUPSYS_Median);
  /* One array per variable to transpose (assumes at most three records per group) */
  array n_arr    n1-n3;
  array mean_arr Mean1-Mean3;
  array med_arr  Median1-Median3;
  /* i counts the records within the BY group and indexes the arrays */
  do i = 1 by 1 until (last.group);
    set trns;
    by group;
    n_arr(i)    = SUPSYS_N;
    mean_arr(i) = SUPSYS_Mean;
    med_arr(i)  = SUPSYS_Median;
  end;
  output;
run;
Reference:
* Practical Uses of the DOW-loop, R.R. Allen, Phuse 2009.
Questions ?
THANK YOU!
Strategic Engagements , Integrating Excellence !
Hashing Unleashed!!!!
Pratibha Jalui
Alliance Manager,
SCEDAM (SBU of SIRO Clinpharm Pvt. Ltd)
Introduction
Merging datasets is a common and, at times, challenging task.
The popular approaches (sort, merge, SQL joins) are limited by declining performance and increased turnaround time.
With today's large volumes of clinical data, performing merges efficiently is critical.
Introduction (continued)
Hash tables can help:
Achieve great I/O efficiencies.
Save an enormous amount of time when merging.
Provide a fast, easy way to perform lookups without sorting or indexing.
Consume memory only as needed, at run time.
What is a Hash object?
An in-memory lookup table accessible from the DATA step.
It is loaded with records and is available only from the DATA step that creates it.
Two parts:
– Key part: one or more character or numeric values.
– Data part: zero or more character or numeric values.
How does a hash object work?
Once a hash object is loaded with records, a lookup occurs by passing a key to the hash object's FIND method.
If a record with the particular key is found, the data part of the record is copied into DATA step variables.
In addition to being able to add and find records, there are methods to replace records, remove records, and output records to a data set.
Syntax for Hashing
data New_Dataset_Name;
  length variable1 format1 … variableN $ formatN;
  if _N_ = 1 then do;
    declare hash h(dataset: 'lookup_dataset_name');
    h.defineKey('common_variable');
    h.defineData('variable1', .., 'variableN');
    h.defineDone();
    call missing(variable1, .., variableN);
  end;
  set Input_Dataset;
  if h.find() = 0 then output;
run;
Applications of Hashing
Example to show how a hash object is declared, loaded, and how lookups occur.
Dataset 01: Participants.sas7bdat
NAME     GENDER  TREATMENT
John     M       Placebo
Ronald   M       Drug-A
Barbara  F       Drug-B
Alice    F       Drug-A
Applications of Hashing
Dataset 02: Weight.sas7bdat
DATE      NAME     WEIGHT
5-May-06  Barbara  125
5-May-06  Alice    130
5-May-06  Ronald   170
5-May-06  John     160
4-Jun-06  Barbara  122
4-Jun-06  Alice    133
4-Jun-06  Ronald   168
4-Jun-06  John     155
Applications of Hashing
Objective: to merge the name, gender, and treatment with the weight data for a later analysis. The expected resultant dataset, e.g. Results.sas7bdat, should look like this:
NAME     TREATMENT  GENDER  DATE      WEIGHT
Barbara  Drug-B     F       5-May-06  125
Alice    Drug-A     F       5-May-06  130
Ronald   Drug-A     M       5-May-06  170
John     Placebo    M       5-May-06  160
Barbara  Drug-B     F       4-Jun-06  122
Alice    Drug-A     F       4-Jun-06  133
Ronald   Drug-A     M       4-Jun-06  168
John     Placebo    M       4-Jun-06  155
Applications of Hashing
How to do it using hashing
data results;
  length name treatment $ 8 gender $ 1;
  if _N_ = 1 then do;
    declare hash h(dataset: 'participants');
    h.defineKey('name');
    h.defineData('gender', 'treatment');
    h.defineDone();
    call missing(gender, treatment);
  end;
  set weight;
  if h.find() = 0 then output;
run;
Steps to define a hash object
STEP 01: LENGTH STATEMENT
length name treatment $ 8 gender $ 1;
Each variable used to define the hash table must be defined in a LENGTH statement.
Otherwise SAS reports: ERROR: Variable <variable name> has been defined as both character and numeric.
Steps to define a hash object
STEP 02: DECLARE THE HASH OBJECT
declare hash h(dataset: 'participants');
'participants' is the name of the SAS dataset from which the hash table, h, is populated. It must be enclosed in single or double quotation marks.
The DECLARE (or DCL) statement creates an object.
The keyword HASH is specified after DECLARE.
To manipulate the hash object in the code, it must be given a name after the keyword HASH.
Steps to define a hash object
STEP 03: DEFINE THE KEY FOR THE HASH OBJECT
rc = h.defineKey('name'); or h.defineKey('name');
DefineKey is the method used to define the hash table's key.
h is the name of the hash object.
rc is a numeric variable that stores the return code from executing the DefineKey method.
'name' is the column from the dataset 'participants' that is used as the hash table key. It must be enclosed in single or double quotation marks.
Steps to define a hash object
STEP 04: DEFINE THE DATA VALUES FOR THE HASH OBJECT
rc = h.defineData('gender', 'treatment'); or h.defineData('gender', 'treatment');
DefineData is the method that defines the values returned when the hash table is searched.
It is optional.
'gender' and 'treatment' are the columns from the dataset 'participants' that are used as the hash table's data values. They must be enclosed in single or double quotation marks.
Steps to define a hash object
STEP 05: CONCLUDE THE HASH OBJECT DECLARATION
rc = h.defineDone(); or h.defineDone();
DefineDone is the method used to conclude the declaration of the hash table.
STEP 06: INITIALIZE THE VARIABLES USED IN THE HASH OBJECT TO MISSING VALUES
call missing(gender, treatment);
This suppresses the following notes in the SAS log:
NOTE: Variable gender is uninitialized.
NOTE: Variable treatment is uninitialized.
Steps to define a hash object
STEP 07: SEARCHING THE HASH OBJECT
rc = h.find();
Find is the method used to search the hash table.
A return code of zero indicates that the Find method succeeded, i.e. the key was found.
The key value can also be passed explicitly: h.find(key: variable_name);
NOTE: Only variables listed in the DefineKey and DefineData methods are stored in the hash table.
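The `key:` form of FIND is useful when the lookup value sits in a variable whose name differs from the hash key. The sketch below illustrates this under made-up assumptions: the datasets `participants` and `weight_in`, the variable `subject`, and the extra unmatched row "Zed" are all hypothetical.

```sas
data participants;
  input name $ gender $ treatment $;
  datalines;
John M Placebo
Ronald M Drug-A
Barbara F Drug-B
Alice F Drug-A
;
run;

/* Hypothetical input in which the subject name is stored in SUBJECT, not NAME */
data weight_in;
  input subject $ weight;
  datalines;
Barbara 125
Alice 130
Zed 150
;
run;

data matched;
  length name $ 8 gender $ 8 treatment $ 8;
  if _N_ = 1 then do;
    declare hash h(dataset: 'participants');
    h.defineKey('name');
    h.defineData('name', 'gender', 'treatment');
    h.defineDone();
    call missing(name, gender, treatment);
  end;
  set weight_in;
  rc = h.find(key: subject);   /* look up using SUBJECT as the key value */
  if rc = 0 then output;       /* a subject with no match is not written */
  drop rc;
run;
```

Here `name` is listed in both DefineKey and DefineData so that it is copied into the output row on a successful lookup; a key that is not also defined as data is not returned by FIND.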
Applications of Hashing
Example to ADD, REPLACE, AND OUTPUT using hashing
Objective: to concatenate each player's goal times into a comma-separated list and output the results to a dataset.
Input dataset: goals.sas7bdat
PLAYER  WHEN
Hill    1st 01:24
Jones   1st 09:43
Santos  1st 12:45
Santos  2nd 00:42
Santos  2nd 03:46
Jones   2nd 11:15
Applications of Hashing
Example to ADD, REPLACE, AND OUTPUT using hashing
Output dataset: goal_summary
PLAYER  GOALS_LIST
Hill    1st 01:24
Santos  1st 12:45, 2nd 00:42, 2nd 03:46
Jones   1st 09:43, 2nd 11:15
Applications of Hashing
Example: ADD, REPLACE, AND OUTPUT using hashing
data _null_;
  length goals_list $ 64;
  if _N_ = 1 then do;
    declare hash h();
    h.defineKey('player');
    h.defineData('player', 'goals_list');
    h.defineDone();
  end;
  set goals end=done;
  if h.find() ^= 0 then do;  /* key variable player has to exist in goals */
    goals_list = when;
    h.add();
  end;
  else do;
    goals_list = trim(goals_list) || ', ' || when;
    h.replace();
  end;
  if done then h.output(dataset: 'goal_summary');
run;
Benefits of Hashing
Processing time for huge datasets can be reduced by almost 90%.
Key lookup occurs in memory, avoiding costly disk access.
When a key lookup occurs, only a small subset of the records is searched, reducing the work per lookup.
The hash object allocates memory as records are added, using memory efficiently.
When loading a hash object from a dataset, the dataset need not be sorted or indexed.
A smart strategy for programming!
Common messages & missteps of Hashing
NOTE: VARIABLE XXXX IS UNINITIALIZED.
To avoid these notes, use the CALL MISSING routine to initialize the variables to missing.
ERROR: UNDECLARED DATA SYMBOL XXXX FOR HASH OBJECT AT LINE N COLUMN M.
Ensure that the variables named in DEFINEKEY and DEFINEDATA are declared with a LENGTH statement and given an initial value.
Common messages & missteps of Hashing (continued)
ERROR 559-185: INVALID OBJECT ATTRIBUTE REFERENCE XX.YY.
Ensure there are parentheses after the method name, for example after FIND.
ERROR 557-185: VARIABLE XXXX IS NOT AN OBJECT.
Ensure the name of the hash object (H here) is not misspelled in any of the method calls.
DUPLICATE KEYS
Ensure that the keys are unique.
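When the lookup data legitimately contains repeated keys, one option (assuming SAS 9.2 or later) is the `multidata: 'yes'` argument together with the FIND_NEXT method, rather than forcing the keys to be unique. The sketch below is illustrative only; the datasets `visits_lookup` and `subjects` and their contents are made up.

```sas
/* Hypothetical lookup data with a repeated key (Alice appears twice) */
data visits_lookup;
  input name $ visit $;
  datalines;
Alice V1
Alice V2
John V1
;
run;

data subjects;
  input name $;
  datalines;
Alice
John
Mary
;
run;

data all_matches;
  length name $ 8 visit $ 8;
  if _N_ = 1 then do;
    /* multidata: 'yes' lets the hash object keep several records per key */
    declare hash h(dataset: 'visits_lookup', multidata: 'yes');
    h.defineKey('name');
    h.defineData('visit');
    h.defineDone();
    call missing(visit);
  end;
  set subjects;
  rc = h.find();               /* first record for this key, if any */
  do while (rc = 0);
    output;                    /* one output row per matching record */
    rc = h.find_next();        /* next record with the same key */
  end;
  drop rc;
run;
```

Without `multidata: 'yes'`, only one record per key is kept when the hash object is loaded from a dataset, so the second visit for Alice would be lost.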
Conclusion
The SAS hash object is much more than an in-memory table look-up.
For very large datasets and multiple common variables (keys), hashing is one of the most efficient and viable solutions.
SAS programmers should see the hash object as an alternative to a PROC SQL join or a DATA step merge.
The only limit on the amount of data that can be loaded into a hash object is the amount of memory available to the SAS session.
Contact information
Your comments /suggestions / questions are valued and
encouraged.
Contact the author at:
Pratibha Jalui
SIRO Clinpharm Pvt. Ltd.
Email id: [email protected]
Any Questions
Proc Transpose for Horizontal Data
Author: Ramesh Sundaram
Date: 30-Sep-2011
Company: Novartis Healthcare Pvt. Ltd.
2 | ConSPIC 2011 | Ramesh sundaram | Sep 29 – 30, 2011
Disclaimer
All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis
Outline
Proc Transpose
Horizontal data
Vertical data
Baseline calculation for Horizontal data
• Without Transpose
• With Transpose
Summary Report for Horizontal data
• Without Transpose
• With Transpose
Conclusion
Proc Transpose
Converts horizontal data into vertical data, or vice versa.
Syntax
PROC TRANSPOSE <DATA=input-data-set> <LABEL=label> <LET> <NAME=name> <OUT=output-data-set> <PREFIX=prefix>;
  BY <DESCENDING> variable-1 <…<DESCENDING> variable-n> <NOTSORTED>;
  COPY variable(s);
  ID variable;
  IDLABEL variable;
  VAR variable(s);
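A small sketch of the horizontal-to-vertical conversion is shown below. The dataset `vs_wide`, its columns `visit1` to `visit3`, and the output names `vs_long`, `visit` and `sysbp` are hypothetical and chosen only to illustrate the syntax.

```sas
/* Hypothetical horizontal vital-sign data: one row per patient */
data vs_wide;
  input patient visit1 visit2 visit3;
  datalines;
101 120 118 115
102 135 . 128
;
run;

/* One output row per patient and visit column */
proc transpose data=vs_wide
               out=vs_long(rename=(col1=sysbp))
               name=visit;
  by patient;
  var visit1-visit3;
run;
```

NAME= sets the name of the variable that holds the original column names, and the transposed values arrive in COL1, which is renamed here through a dataset option on OUT=. The input must be sorted by the BY variable unless NOTSORTED is specified.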
Horizontal data
VS: Vital sign dataset (figure not reproduced)
Vertical data
VS: Vital sign dataset (figure not reproduced)
Baseline calculation for Horizontal data
Input: horizontal data requiring a baseline (LOCF) calculation (figure not reproduced)
Output: (figure not reproduced)
Code without Transpose: Method 1 (code figure not reproduced)
Code without Transpose: Method 2 (code figure not reproduced)
Code with Transpose: Method 3 (code figure not reproduced)
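The slide code for these methods did not survive extraction. Purely as an illustration of the transpose-based idea, and not a reconstruction of the author's Method 3, a LOCF baseline could be derived along the following lines. All names here (`vs_h`, `pre1` to `pre3`, `vs_v`, `base_val`) are hypothetical.

```sas
/* Hypothetical horizontal data: three pre-dose assessments in time order */
data vs_h;
  input patient pre1 pre2 pre3;
  datalines;
101 120 118 .
102 . 135 128
;
run;

/* Turn the assessment columns into rows, keeping their original order */
proc transpose data=vs_h
               out=vs_v(rename=(col1=value))
               name=assessment;
  by patient;
  var pre1-pre3;
run;

/* Baseline (LOCF): the last non-missing pre-dose value per patient */
data baseline(keep=patient base_val);
  set vs_v;
  by patient;
  where value is not missing;
  if last.patient;
  base_val = value;
run;
```

With this sample, the baseline should be 118 for patient 101 and 128 for patient 102. No macro or array loop over the columns is needed, because the column order becomes row order after the transpose.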
Advantages of Method 3 over Methods 1 & 2
Simple
No redundancies
Macros/macro functions are not used
Loops are not used
Summary Report for Horizontal data (figure not reproduced)
Code without Transpose: Method 1 (code figure not reproduced)
Code without Transpose: Method 2 (code figure not reproduced)
Code with Transpose: Method 3 (code figure not reproduced)
Conclusion
Using the Transpose procedure on horizontal data makes the program simple and efficient in the following scenarios:
- Baseline calculation
- Endpoint calculation
- Summary reports
From a programming perspective, vertical data is more flexible than horizontal data in these scenarios.
Let’s go for a Picture
by SenthilKumar
ConSPIC, September 29 – 30, 2011
What is a Format?
Formats:
- contain the instructions used by the SAS System to display, or portray, the values of variables
- fall into two broad classes:
1. VALUE: either "supplied" or "internal" to the SAS System; assigns a label or text string to a range of values
2. PICTURE: creates a series of templates for displaying the data
Agenda
What is a Picture Format?
Why do we need it?
How to use it?
Options in Picture Format
What's happening behind?
Getting with Examples
Challenges of using Picture Format
What is a Picture Format?
• Picture format: creates a template that is used to display the values of a variable.
• Template: controls how the values are displayed in SAS-generated output.
• Picture formats are created and used in the same way as value formats.
• As with value formats, PROC FORMAT tools such as FMTLIB, CNTLIN, CNTLOUT and MULTILABEL can also be used with picture formats.
Why do we need a Picture Format?
(Slide figures: the default output and the layout required by the mockup.)
Why do we need a Picture Format? (contd.)
(Slide figure: PROC FORMAT code and its output, annotated with callouts 1 to 5.)
1. Name of the Picture Format
2. Range of values to which the format will be applied (0, 1-9)
3. PREFIX option, which, like the DEFAULT option
4. The template, showing a series of digit selectors
5. Default length of the Picture Format: 16 characters
What's happening behind the Picture Format?
(Slide figure: output, not reproduced.)
Options in Picture Format
Options that control the attributes of each picture in the format:
FILL= Specify a character that completes the formatted value.
MULTIPLIER= Specify a number to multiply the variable's value by before it is formatted.
NOEDIT Specify that numbers are message characters rather than digit selectors.
PREFIX= Specify a character prefix for the formatted value.
Options in Picture Format (contd.)
Options that control the attributes of the format:
DATATYPE= Specify that you can use directives in the picture as a template to format date, time, or datetime values.
DEFAULT= Specify the default length of the format.
DECSEP= Specify the separator character for the fractional part of a number.
DIG3SEP= Specify the three-digit separator character for a number.
FUZZ= Specify a fuzz factor for matching values to a range.
MAX= Specify a maximum length for the format.
MIN= Specify a minimum length for the format.
MULTILABEL Specify multiple pictures for a given value or range and for overlapping ranges.
NOTSORTED Store values or ranges in the order that you define them.
ROUND Round the value to the nearest integer before formatting.
Getting into the Picture: Picture 1, Frequency Report
(Slide figures: the output and the layout required by the mockup.)
Getting into the Picture: Picture 2, Summary Report
(Slide figure: the layout required by the mockup.)
Challenges
Any text in front of the first digit selector is ignored.
Ex: low - high = 'XX999) - 999 – 9999'
To include a text message that is not meant as digit selectors, use the NOEDIT option. Otherwise, SAS interprets digits in the text as digit selectors.
Ex: picture miles 1-1000 = '0000' 1000<-high = '>1000 miles' (noedit);
If you use the FILL= and PREFIX= options in the same picture, the format places the prefix and then the fill characters.
Ex: low-high = '00,000,000.00' (fill='*' prefix='$'); Output: ****$1,259.45
A nonzero digit selector such as 9 prints leading zeros, whereas a 0 digit selector leaves leading positions blank.
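The NOEDIT and FILL=/PREFIX= examples from this slide can be tried with a sketch like the one below. The format name `dol`, the test values and the DATA _NULL_ step are assumptions added for illustration; the ROUND option on `dol` is also an addition, included so that the result does not depend on how the cents are truncated.

```sas
proc format;
  /* NOEDIT: the text for values above 1000 is printed as-is */
  picture miles
    1 - 1000     = '0000'
    1000 <- high = '>1000 miles' (noedit);

  /* PREFIX= is placed first, then FILL= pads the remaining width */
  picture dol (round)
    low - high = '00,000,000.00' (fill='*' prefix='$');
run;

data _null_;
  do distance = 5, 950, 2500;
    put distance= 'formatted as: ' distance miles.;
  end;
  amount = 1259.45;
  put amount= 'formatted as: ' amount dol.;
run;
```

The formatted values appear in the SAS log, which makes it easy to check the template against a few boundary values (here 5, 950 and 2500) before using the format in a report.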
Challenges (contd.)
Use the ROUND option for continuous data so that values are rounded; by default SAS truncates them.
It is easy to produce misleading output with a PICTURE format if the template does not accommodate the data, or if the specified ranges do not capture all the data values.
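The difference between truncation and rounding can be seen with two otherwise identical pictures. The format names `trunc1d` and `round1d` and the test value are hypothetical.

```sas
proc format;
  /* Default behaviour: digits beyond the template are truncated */
  picture trunc1d low - high = '0009.9';
  /* ROUND: the value is rounded before it is placed in the template */
  picture round1d (round) low - high = '0009.9';
run;

data _null_;
  x = 12.38;
  put 'Without ROUND: ' x trunc1d.;
  put 'With ROUND:    ' x round1d.;
run;
```

For this value the log is expected to show 12.3 without ROUND and 12.4 with it, which is the kind of silent discrepancy the slide warns about.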
Thank You!
PICTURE FORMAT is a powerful tool for displaying data. (If handled with CARE)
Electronic Validation for Clinical Data and Summary Reports
Amruta Pathak, Bhushan Kulkarni
CONFIDENTIAL Corporate Presentation - 1 -
Agenda
• Validation
• Electronic Validation
• Comparing Proc Report Datasets
• Examples of Report dataset
• Use of "diff" command in Unix
• Example of output by "diff" command
• Other methods of Electronic Validation
• Advantages of Electronic Validation
Validation
• Importance
  – High-quality deliverables
  – Required at different phases of a clinical trial study
  – General practice
• Create two reports and compare
• Methods
  – Traditional method
    • Manual checks
    • Spot checks (random checks), cross-checking reports
  – Programming method
    • Use of SAS procedures
    • 100% check
Electronic Validation
• Required for multiple runs
• No manual checks
• Input 1 (first-line report) and Input 2 (QC report) are compared to give the difference between the two reports
• Various ways of automated approach
  – Comparing Proc Report datasets
  – Use of the "diff" command on UNIX
  – Use of customized shells
  – Use of SAS/ASSIST
  – Use of SAS Format
Comparing Proc Report Datasets
• Most common procedure to produce reports: PROC REPORT
• Proc Report dataset: the dataset that goes into the PROC REPORT procedure
• Compare the two datasets with PROC COMPARE
• Transforming the report to a dataset
• Need to define the specification of the Proc Report dataset
• The QC report needs to be in the proper format
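A minimal sketch of the comparison step follows. The dataset names `rpt_prod` and `rpt_qc`, their columns and their values are invented for illustration; in practice both would be the final datasets passed to PROC REPORT by the first-line and QC programmers.

```sas
/* Hypothetical report dataset from the first-line programmer */
data rpt_prod;
  input stat $ col1 $ col2 $;
  datalines;
N 25 24
Mean 52.3 51.9
;
run;

/* Hypothetical report dataset from the QC programmer */
data rpt_qc;
  input stat $ col1 $ col2 $;
  datalines;
N 25 24
Mean 52.3 51.8
;
run;

/* Observation-by-observation comparison of the two report datasets */
proc compare base=rpt_prod compare=rpt_qc listall;
run;

/* SYSINFO holds the PROC COMPARE return code: 0 means no differences found */
%put NOTE: PROC COMPARE return code = &sysinfo;
```

Because the comparison is row by row, both datasets need the same row order and structure, which is why the slide asks for an agreed specification of the Proc Report dataset. The `&sysinfo` value must be read immediately after the PROC COMPARE step, before another step resets it.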
Example of Reports 1: Report 1 and Report 2 (figures not reproduced)
Output by Comparing Proc Report Datasets: Report 1 Dataset and Report 2 Dataset (figures not reproduced)
Proc Compare Output (figure not reproduced)
Example of Reports 2: Report 1 and Report 2 (figures not reproduced)
Output by Comparing Proc Report Datasets: Report 1 Dataset and Report 2 Dataset (figures not reproduced)
Example of listing: Listing 1 and Listing 2 (figures not reproduced)
Validation of listing: Listing 1 Dataset and Listing 2 Dataset (figures not reproduced)
Use of "diff" command in Unix
• Need to provide paths for the two reports
• Does a line-by-line comparison
• A difference of a single space is also captured
• Used to compare various versions of the same code
• Used when there is a huge number of re-runs
Example of output by "diff" command (figure not reproduced)
Other Methods of Electronic Validation
• Use of customized shells
  – If multiple tables have the same template
  – Reports created by macros and validated
  – Macros written for validation code
  – Proc Compare of the input data to the macro
• Use of SAS/ASSIST
  – SAS/ASSIST provides the tools for the user to produce output using report writing and graphing to help facilitate the verification process.
  – To validate listings, Proc Prints of the dataset(s) being used are compared to the listing output.
  – A dummy data set is created (Data _null_).
  – The same data is printed using a Proc Print defined through SAS/ASSIST.
  – Assurance comes from spot-checking the Proc Print against the Data _null_ output.
  – Summary tables can be verified using the interactive Proc Tabulate definition through SAS/ASSIST.
Other Methods of Electronic Validation (contd.)
• Use of SAS Format
  – The SAS solution described here involves a macro and a user-defined format that performs validation checks on a dataset, then produces a report.
  – Create several formats that contain validation checks. For example, the numeric formats qcpat, qclab, and qcdose contain collections of checks specific to the datasets PATIENTS, LABS, and DOSE, respectively.
  – You can add or delete validation checks simply by modifying the format.
  – The format library containing the validation checks and the datasets can reside in separate data libraries.
  – Each format is an independent collection of Boolean expressions pertinent to a specific SAS dataset, used for the sole purpose of validating it.
Advantages of Electronic Validation
• Saves time when there is a huge number of re-runs
• Applicable to pooled studies
• Less risk of manual errors
• More accurate for huge data
• Reduces the time to market
References:
http://www2.sas.com/proceedings/sugi31/018-31.pdf
http://www.phuse.eu/download.aspx?type=cms&docID=536
http://www2.sas.com/proceedings/sugi22/POSTERS/PAPER230.PDF
http://www.nesug.org/Proceedings/nesug97/advtut/gerlach.pdf
Acknowledgement:
• We gratefully acknowledge the support provided by the TCS Management Team, and thank IASCT for providing this opportunity.
Contact:
• [email protected]
• [email protected]
Thank You.!
Why should stats have all the fun?
Priya Iyer, Neha Singh and Anand Boopalan
Oncology Biometrics, Hyderabad
Disclaimer
All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis
ConSPIC 2011 | Priya Iyer, Neha Singh & Anand Boopalan | Sep 29 – 30, 2011
Agenda
Quality concerns and the facts
The way forward
Are we First Time Right? Let’s review
Explore the story behind the statistical procedures
What to keep in mind: a checklist
What makes them say... "This report is not what I want"
• Alignment & indentation
• Spelling mistakes
• Missing titles and footnotes
• Truncation
• Precision & accuracy
• Formatting
• Data used
• Logical concept
• Reports not making sense
• Reporting measures
• Procedures used
• Information displayed
The fact
Science: Are we sure of the science behind the analysis?
• Reports cannot always be generalized.
Reports: Do we understand what our reports are trying to say?
• Every report tells a story and has its own importance.
Procedures: Are we always sure of what is produced when STAT procedures are run?
• We are often unaware of the concepts behind the SAS procedures, and sometimes unsure what to choose from the large volume of output these procedures display.
The way forward
Today, we shall focus on
• Helping the programmer with primary checks on SAS/STAT-based reports before sending them out for review
- Showcasing the common mistakes made while displaying summary statistics, counts and percentages
- An overview of some advanced statistical procedures such as LIFETEST, LOGISTIC and PHREG
- Recommending which information to select from the output these procedures produce
- Some hand-check tips for verifying small statistical concepts
Are we first time right?
Precision: Up to what level?
Indentation on decimal points?
Mean? Isn’t that looking strange?
Summaries convey a lot of information
Are we first time right?
Reporting need not be generalized
Are we first time right?
Are we missing some stars here?
A report is complete when everything speaks the same language
Explore the story behind statistical procedures
• PROC LOGISTIC
• PROC LIFETEST
• PROC PHREG
Reports and Procedures - Using Odds ratio
Required: To find the number of responses in each arm and calculate the odds ratio
Programming Steps
Procedures that could be used:
• PROC FREQ
• PROC LOGISTIC
Confirm your output is right
Odds in Arm A = 1/(5-1) = 1/4 = 0.25
Odds in Arm B = 20/(76-20) = 20/56 = 0.357
Odds Ratio = 0.25/0.357 = 0.700
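The hand check above can also be scripted outside SAS. The following is a minimal Python sketch, not part of the original slides; the function name `odds` is an illustrative assumption, and the counts are the ones used in the hand calculation (1 responder out of 5 in Arm A, 20 out of 76 in Arm B).

```python
def odds(events, total):
    """Odds of response = responders / non-responders."""
    return events / (total - events)

# Counts taken from the hand calculation on the slide
odds_a = odds(1, 5)      # Arm A: 1 responder out of 5
odds_b = odds(20, 76)    # Arm B: 20 responders out of 76
odds_ratio = odds_a / odds_b

print(f"Odds A = {odds_a:.3f}, Odds B = {odds_b:.3f}, OR = {odds_ratio:.3f}")
```

Comparing the printed odds ratio with the value reported by PROC FREQ or PROC LOGISTIC is a quick way to confirm that the event level and the reference arm were set as intended.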
Some facts on the procedures
Choose the right information from the generated output, or you may end up displaying the wrong information
• Do not confuse relative risk with odds ratio
OR = ad/bc
RR = (a/(a + b))/(c/(c + d))
Defining the right values
- Event = 1 and Non-event = 0
Response level ordering
- PROC LOGISTIC by default models the probability of the response level with the lower ordered value. With 0/1 coding this is the non-event (0), unless the EVENT= response option or the DESCENDING option is specified
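To see how far the two measures can differ on the same data, the formulas above can be evaluated side by side. This is a minimal Python sketch under an assumed 2x2 layout (a and b are events and non-events in the first arm, c and d in the second arm); the layout and function names are assumptions made for illustration, and the counts reuse the Arm A and Arm B numbers from the odds ratio hand check.

```python
def odds_ratio(a, b, c, d):
    """OR = ad / bc for a 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)

def relative_risk(a, b, c, d):
    """RR = (a / (a + b)) / (c / (c + d)) for the same table."""
    return (a / (a + b)) / (c / (c + d))

# Assumed layout: rows are arms, columns are event / non-event
a, b = 1, 4      # Arm A: 1 event, 4 non-events
c, d = 20, 56    # Arm B: 20 events, 56 non-events

or_value = odds_ratio(a, b, c, d)
rr_value = relative_risk(a, b, c, d)
print(f"OR = {or_value:.3f}, RR = {rr_value:.3f}")
```

Because the two values differ, reporting one under the label of the other would be a real error rather than a rounding difference.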
Reports and Procedures - Using LIFETEST
Required: To find the number of deaths and censored observations, and to produce the KM estimates, median, and 25th and 75th percentiles for OS
Programming Steps
Procedures to use:
• PROC LIFETEST
Number of events and censored observations
Time variable * censor variable
Programming Steps
KM estimates and confidence intervals
Median and quartile estimates
Confidence intervals
Reporting graphs
Some facts on the procedures
• The OUTSURV= option outputs the survival probabilities at each time point, with their lower and upper confidence limits
• To output other information, use the corresponding ODS OUTPUT statement
• Some of the default options are
- ALPHA = 0.05
- METHOD = KM/PL
- CONFTYPE = LOGLOG
• Wilcoxon and log-rank tests are produced to test the homogeneity of survival curves across strata
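For a small dataset, the KM (product-limit) estimates can be hand-checked with a few lines of code. The sketch below is an illustrative Python version, not the PROC LIFETEST implementation; the data are made up, and events are coded 1 = event, 0 = censored, which may be the reverse of the censor flag used in a given study.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.
    events[i] is 1 for an event and 0 for a censored observation.
    Returns (time, survival) pairs at each distinct event time."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_at_t = sum(1 for tt, _ in data if tt == t)
        deaths = sum(1 for tt, e in data if tt == t and e == 1)
        if deaths > 0:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= n_at_t   # events and censored both leave the risk set
        i += n_at_t
    return curve

# Made-up example data
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)

# First time at which the estimated survival drops to 0.5 or below
median = next(t for t, s in curve if s <= 0.5)
for t, s in curve:
    print(t, round(s, 4))
print("median:", median)
```

The median here is simply the first event time with survival at or below 0.5; a survival estimate of exactly 0.5 may be handled differently by the procedure, so treat this as a rough cross-check only.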
Reports and Procedures - Using PHREG
Required: To find the Hazard Ratio and CI based
Programming Steps
Procedures to use:
• PROC PHREG
What to keep in mind – A checklist
Setting the data
• Choose the right population for determining the N’s
• Ensure the censor and time variables are clearly defined (e.g. days, months)
• Keep the definitions of event and non-event consistent
Using the procedures
• Use the right procedure for the requirement (e.g. stratified, PHREG)
• Pass the right options to the procedure (e.g. reference variables, censor(1))
Choose what to display from the output
• Base the selection on what needs to be reported (e.g. survival, failure)
Reporting
• Use functions consistently across summary statistics (e.g. PUT, ROUND)
• Keep the graphs and the displayed statistics in sync
What to keep in mind – A checklist
Check your reports for p-values and CIs
• Confidence intervals
- The summary value should fall within the interval
- If the null value lies within the CI, the corresponding p-value should be non-significant (p-value > 0.05 for a 95% CI)
• Probabilities
- Always between 0 and 1
- Survival probabilities never increase over time
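These checklist items lend themselves to simple automated checks on the numbers pulled into a report. The following Python sketch is one possible way to express them; the function names and example values are assumptions made for illustration. A CI and a p-value produced by different methods can legitimately disagree near the boundary, so a flagged row is a prompt to look closer rather than proof of an error.

```python
def check_ci(estimate, lower, upper, null_value, p_value, alpha=0.05):
    """Return messages describing inconsistencies between estimate, CI and p-value."""
    issues = []
    if not (lower <= estimate <= upper):
        issues.append("estimate outside CI")
    null_inside = lower <= null_value <= upper
    if null_inside and p_value <= alpha:
        issues.append("null value inside CI but p-value significant")
    if not null_inside and p_value > alpha:
        issues.append("null value outside CI but p-value not significant")
    return issues

def check_survival(probs):
    """Return messages for probabilities outside [0, 1] or increasing over time."""
    issues = []
    if any(p < 0 or p > 1 for p in probs):
        issues.append("probability outside [0, 1]")
    if any(later > earlier for earlier, later in zip(probs, probs[1:])):
        issues.append("survival probability increases over time")
    return issues

# Made-up report values: odds ratio with CI and p-value (null value is 1)
print(check_ci(0.70, 0.08, 6.50, 1.0, 0.75))
print(check_ci(0.70, 0.50, 0.90, 1.0, 0.40))
# Made-up survival estimates in time order
print(check_survival([1.0, 0.83, 0.67, 0.70]))
```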
10 SIMPLE WAYS TO PERFORM QUALITY CHECK ON CLINICAL DATABASE
Presenter: Ravinder Arakati
Disclaimer
All opinions expressed in this presentation are the authors’ personal views, and do not reflect the views or opinions of Novartis.
ConSPIC 2011 | Ravinder Arakati; Jayapandian N | Sep 29 – 30, 2011
Possible Data Issues
Duplicate records
• Duplicate treatment records, multiple demographic records, vital signs data, etc.
Missing data
• Demography, lab data, vital signs data, adverse event terms, etc.
Same data collected in different units
• Height, weight, lab data, etc.
Data ranges
• Height, weight, lab data, etc.
Inconsistency between dates
• Start date > end date; following visit date < previous visit date; overlapping dates
Possible Data Issues (cont.)
Study-specific data or mandatory laboratory data not collected
• Hematology, biochemistry and urine parameters
Different date formats
• Dates contain alphabetic characters; length of dates is not consistent
Data variation between two consecutive visits
Macro 1: %check_variation
It checks the variation of a value between two consecutive visits, or between a visit and the baseline value.
Parameters:
datain = input dataset
unik_id = unique ID for a patient
baseline = value of the parameter at baseline - needed if the variation is calculated from baseline
visit = variable containing the chronological order - needed if the variation is calculated by visit
param = name of the parameter for which the variation is calculated - needed
rang_inf = lower limit of the variation; variations below it are reported as incorrect
rang_sup = upper limit of the variation; variations above it are reported as incorrect
var_pct = Y if the variation is a percentage, otherwise the variation is an absolute value - needed - Y by default
centr_id = creates an additional report by center, e.g. ctr1n - not needed (only 1)
Macro 1: %check_variation (Example)
%check_variation (
    datain=vsn,
    unik_id=stysid1a,
    baseline=,
    visit=vis1n,
    param=wgt1n,
    rang_inf=-15,
    rang_sup=30,
    var_pct=Y,
    centr_id=ctr1n
);
Here, we want to be sure that no patient has lost more than 15% or gained more than 30% of their weight between two consecutive visits.
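The macro source is not shown in the slides, so the logic can only be illustrated. The following Python sketch shows one plausible reading of the check for the by-visit, percentage case; the function, its arguments and the sample weights are all assumptions made for illustration.

```python
def check_variation(records, rang_inf=-15, rang_sup=30):
    """records: (patient_id, visit, value) tuples.
    Flags percentage changes between consecutive visits of the same patient
    that fall outside [rang_inf, rang_sup]."""
    flagged = []
    previous = {}
    for pid, visit, value in sorted(records, key=lambda r: (r[0], r[1])):
        prev = previous.get(pid)
        if prev not in (None, 0) and value is not None:
            pct = 100.0 * (value - prev) / prev
            if pct < rang_inf or pct > rang_sup:
                flagged.append((pid, visit, round(pct, 1)))
        previous[pid] = value
    return flagged

# Made-up weights in kg
records = [
    ("P01", 1, 70.0), ("P01", 2, 58.0), ("P01", 3, 60.0),
    ("P02", 1, 80.0), ("P02", 2, 82.0),
]
print(check_variation(records))
```

In this made-up data, patient P01 loses about 17% between visits 1 and 2 and is therefore listed; the other changes stay within the limits.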
Macro 2: %check_order_date
It checks the consistency between two consecutive dates
- Finds overlaps between dates
- Finds discrepancies between visit numbers and dates. Example: visit number 3 cannot have a date before visit number 1
Parameters:
datain = input dataset
unik_id = unique ID for a patient
visitstt = variable containing the chronological order, start date - needed
visitend = stop date - needed if this date exists
visitnum = variable containing the chronological order, visit number - needed if the comparison is made between the visit number and the visit date
grp_by = grouping variable, e.g. pt_txt - not needed (only 1)
add_rep = variables to add to the report - not needed (<10)
centr_id = creates an additional report by center, e.g. ctr1n - not needed (only 1)
Macro 2: %check_order_date (Example)
• Example 1
%check_order_date (datain=dar,
    unik_id=stysid1a,
    visitstt=smdstt1o,
    visitend=smdend1o,
    visitnum=,
    grp_by=,
    add_rep=dartyp1c tdd1n tddunt1a rsndos2c doschg4c,
    centr_id=
);
Here, we want to be sure that there is no overlap between treatment dates and that no treatment start date is after its end date.
• Example 2
%check_order_date (datain=import_s.vis,
    unik_id=stysid1a,
    visitstt=vis1o,
    visitend=,
    visitnum=vis1n,
    grp_by=,
    add_rep=visnam1a,
    centr_id=ctr1n
);
Here, we want to be sure that there are no discrepancies between visit number and date. Example: visit number 3 cannot have a date before visit number 1.
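The two conditions checked in Example 1 (start after end, and overlapping periods) can be sketched as follows. This is an illustrative Python version, not the macro itself; the function name, the record layout and the choice to treat a period starting on the previous end date as an overlap are assumptions.

```python
from datetime import date

def check_order_date(records):
    """records: (patient_id, start, end) tuples with date objects.
    Returns (patient_id, message) pairs for start > end and for overlapping periods."""
    issues = []
    by_patient = {}
    for pid, start, end in records:
        if start > end:
            issues.append((pid, f"start {start} after end {end}"))
        by_patient.setdefault(pid, []).append((start, end))
    for pid, periods in by_patient.items():
        periods.sort()
        for (s1, e1), (s2, e2) in zip(periods, periods[1:]):
            if s2 <= e1:  # assumption: sharing a day counts as an overlap
                issues.append((pid, f"period starting {s2} overlaps period ending {e1}"))
    return issues

# Made-up treatment periods
records = [
    ("P01", date(2011, 1, 1), date(2011, 1, 10)),
    ("P01", date(2011, 1, 8), date(2011, 1, 20)),
    ("P02", date(2011, 2, 5), date(2011, 2, 1)),
]
issues = check_order_date(records)
for pid, message in issues:
    print(pid, message)
```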
Macro 3: %check_missing_value
This macro reports the number and the percentage of missing values for a reference variable.
Parameters:
datain = input dataset - needed
unik_id = unique ID for a patient - not needed - stysid1a by default
missvar = variable that may contain missing values - needed - _all_ by default. This parameter can be a list of up to 9 variables.
centr_id = creates an additional report by center, e.g. ctr1n - not needed (only 1)
del_lst = deletes the last record when sorted by the &del_lst. variable
printopt = if equal to Y, prints the listing of unik_id values with missing parameters - not needed - missing by default
Macro 3: %check_missing_value (Example)
%check_missing_value (datain=dar,
    unik_id=stysid1a,
    missvar=smdend1o,
    del_lst=smdstt1o,
    printopt=Y
);
Here, we want to print the list of all missing treatment end dates other than the latest one, since the study is not frozen and patients are still on treatment.
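The core count behind this check is small enough to sketch. The Python below is illustrative only: it counts missing values for one variable and lists the affected patients, and it does not reproduce the `del_lst` behaviour of dropping each patient's latest record. The row layout and sample values are assumptions.

```python
def check_missing_value(rows, missvar, unik_id="stysid1a"):
    """rows: list of dicts. Returns (n_missing, pct_missing, ids) for one variable.
    None and the empty string are both treated as missing."""
    missing_ids = [r[unik_id] for r in rows if r.get(missvar) in (None, "")]
    pct = 100.0 * len(missing_ids) / len(rows) if rows else 0.0
    return len(missing_ids), pct, missing_ids

# Made-up dosing records
rows = [
    {"stysid1a": "P01", "smdend1o": "2011-01-10"},
    {"stysid1a": "P02", "smdend1o": None},
    {"stysid1a": "P03", "smdend1o": ""},
    {"stysid1a": "P04", "smdend1o": "2011-03-02"},
]
result = check_missing_value(rows, "smdend1o")
print(result)
```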
Macro 4: %check_lab_units
This helps to check whether the same data were collected in different units.
Parameters:
dsn = one- or two-level SAS dataset name
parm = parameter name
unitvar = name of the variable that holds the unit information
parmvar = name of the parameter variable
Macro 4: %check_lab_units (Example)
%check_lab_units(dsn=data_a.a_lrs,
    parm=FERR,
    unitvar=parunt1c,
    parmvar=parnam1c
);
Here, it lists the number of distinct units for the parameter SERUM FERRITIN in the laboratory dataset.
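The idea behind this check is a frequency count of units within one parameter. A minimal Python sketch follows; the function and the sample records are assumptions for illustration, with the variable names borrowed from the macro call above.

```python
from collections import defaultdict

def check_lab_units(rows, parm, parmvar="parnam1c", unitvar="parunt1c"):
    """Count the records per distinct unit for one lab parameter."""
    counts = defaultdict(int)
    for r in rows:
        if r[parmvar] == parm:
            counts[r[unitvar]] += 1
    return dict(counts)

# Made-up laboratory records
rows = [
    {"parnam1c": "FERR", "parunt1c": "ng/mL"},
    {"parnam1c": "FERR", "parunt1c": "ng/mL"},
    {"parnam1c": "FERR", "parunt1c": "ug/L"},
    {"parnam1c": "HGB", "parunt1c": "g/dL"},
]
units = check_lab_units(rows, "FERR")
print(units)
```

More than one key in the result means the parameter was collected in more than one unit and needs conversion or a data query before analysis.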
Macro 5: %check_lrs_range
This checks how far the values deviate from the lower and upper normal limits.
Parameters:
dsn = one- or two-level dataset name
parm = parameter name
parmvar = name of the variable that contains the parameter
labvar = variable that contains the laboratory value
lowvar = lower normal limit of the parameter
uppvar = upper normal limit of the parameter
percent = chosen percentage deviation from the lower or upper normal limit
patidvar = patient ID variable
Macro 5: %check_lrs_range (Example)
%check_lrs_range(dsn=data_s.lrs,
    parm=FERR,
    parmvar=parnam1c,
    lowvar=nrgllm1n,
    uppvar=nrgulm1n,
    labvar=labrsl1n,
    percent=50,
    patidvar=stysid1a);
Here, it lists all the patient IDs whose lab values deviate by 50% from the lower or upper normal limits.
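The slides do not define exactly how the percentage deviation is computed. The Python sketch below assumes one plausible reading: a value is flagged when it is more than `percent` % below the lower limit or more than `percent` % above the upper limit. The function and sample values are illustrative assumptions.

```python
def check_lrs_range(rows, percent=50):
    """rows: (patient_id, value, low, high) tuples.
    Flags values more than `percent` % below the lower normal limit
    or more than `percent` % above the upper normal limit (assumed definition)."""
    factor = percent / 100.0
    flagged = []
    for pid, value, low, high in rows:
        if value < low * (1 - factor) or value > high * (1 + factor):
            flagged.append(pid)
    return flagged

# Made-up ferritin values with a normal range of 20 to 300
rows = [
    ("P01", 150, 20, 300),   # within range
    ("P02", 500, 20, 300),   # above 300 * 1.5 = 450
    ("P03", 8, 20, 300),     # below 20 * 0.5 = 10
    ("P04", 320, 20, 300),   # above range, but by less than 50%
]
flagged = check_lrs_range(rows)
print(flagged)
```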
Macro 6: %check_dar_incorrect_date
This macro identifies any incorrect date formats present.
Parameters:
dsn = one- or two-level dataset name
chardt = character date variable
percent = chosen percentage deviation from the lower or upper normal limit
patidvar = patient ID variable
Macro 6: %check_dar_incorrect_date (Example)
%check_dar_incorrect_date(dsn=data_s.dar,
    chardt=smdstt1d,
    patidvar=stysid1a);
It displays all patient IDs whose medication start dates have an incorrect date format.
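A character date can be validated by attempting to parse it. The sketch below is an illustrative Python version that assumes a DDMONYYYY layout such as 01Feb2006; the expected layout, the function name and the sample values are assumptions, and month-name parsing with `%b` depends on the locale (English month abbreviations under the default C locale).

```python
from datetime import datetime

def is_valid_date(text, fmt="%d%b%Y"):
    """True if text parses as a complete calendar date in the given format."""
    try:
        datetime.strptime(text, fmt)
        return True
    except (ValueError, TypeError):
        return False

# Made-up character start dates by patient
dates = {"P01": "01Feb2006", "P02": "UNFEB2006", "P03": "31Feb2006"}
bad = [pid for pid, d in dates.items() if not is_valid_date(d)]
print(bad)
```

Partial dates (unknown day) and impossible dates (31 February) both fail to parse, so they end up in the same listing and can be queried together.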
Macro 7: %check_dar_dup_dose
This checks for duplicate dose records and dose interruptions.
Parameters:
dsn = one- or two-level dataset name
patidvar = patient ID variable
bygrpdup = list of variables used to identify duplicates
dosevar = variable that contains the dose information
Macro 7: %check_dar_dup_dose (Example)
%check_dar_dup_dose(dsn=data_s.dar,
    patidvar=stysid1a,
    bygrpdup=smdstt1o smdend1o,
    dosevar=tdd1n);
It displays the list of patient IDs with missing doses or interruptions, and finds duplicate records based on the by-group variables smdstt1o and smdend1o.
Macro 8: %check_aev_fpfv_lplv
This helps to identify records with
- Any patient visit start or end date after Last Patient Last Visit (LPLV)
- Any patient visit start or end date before First Patient First Visit (FPFV)
- An end date earlier than the start date
Parameters:
dsn = one- or two-level dataset name
patidvar = patient ID variable
lplvdate = last patient last visit date
fpfvdate = first patient first visit date
aevstvar = adverse event start date
aevendvar = adverse event end date
Macro 8: %check_aev_fpfv_lplv (Example)
%check_aev_fpfv_lplv(dsn=data_s.aev,
    lplvdate=,
    fpfvdate="01Feb2006"d,
    aevstvar=aevstt1o,
    aevendvar=aevend1o,
    patidvar=stysid1a);
It displays the list of patient IDs who have adverse events before first patient first visit.
It also displays the list of patient IDs with an adverse event end date earlier than the adverse event start date.
Macro 9: %check_aev_reason
This helps to identify records with AEs leading to discontinuation but with no reason given for the discontinuation.
Parameters:
dsn = one- or two-level dataset name
patidvar = patient ID variable
aevservar = name of the variable that holds the seriousness of adverse events
acntknvar = name of the variable that indicates whether action was taken
Example:
%check_aev_reason(dsn=data_s.aev,
    aevservar=aevser1c,
    acntknvar=acntak1n,
    patidvar=stysid1a);
Macro 10: %check_aev_meddra
This helps to check for missing MedDRA terms (SOC, PT) and duplicate records.
Parameters:
dsn = one- or two-level dataset name
patidvar = patient ID variable
socvar = SOC variable name
ptvar = preferred term variable name
dtvar = adverse event start and/or end date
Example:
%check_aev_meddra(dsn=data_s.aev,
    socvar=soc_txt,
    ptvar=pt_txt,
    patidvar=stysid1a,
    dtvar=aev);
Acknowledgement
Laura Robin, Senior Statistical Programmer, Novartis, Basel
Jayapandian N, Senior Statistical Analyst, Novartis, Hyderabad
QUESTIONS!
TAKE Solutions
RTF_Read
Presented by Vijayabhaskar Reddy
Sr. Clinical SAS Programmer
Agenda
• Why RTF_READ
• Know about RTF
• Control Words in RTF
• Identify the Titles and Footnotes in RTF
• Dynamic Titles and Footnotes
• Extract Unique Titles and Footnotes into a SAS dataset
• Read Spanning Headers
• Group Column Headers
• Read Special Symbols
• Extract the Data into SAS dataset
• Distinguish between different rows and columns
• Create Horizontal Dataset
• Create Vertical Dataset
Why RTF_READ?
Validation of RTF Tables programmatically
Comparison of multiple versions of the same RTF Table file
RTF File Generated by SAS
RTF File - NOTEPAD
Validate RTF
• Confirm that the RTF was generated by SAS
• Confirm that the RTF was not modified outside SAS
PRXPARSE Function:
• Compiles a Perl regular expression (PRX) that can be used for pattern matching of a character value
PRXMATCH Function:
• Searches for a pattern match and returns the position at which the pattern is found
Validate RTF
SAS CODE
data READFILE;
  length ver $1000;
  infile "&RTFFile." missover length = l end = lastobs lrecl = 2000;
  input string $varying1500. l;
  rownum = _n_;
  string = _infile_;
  rc1 = prxparse("/\\*\\generator/");
  rc2 = prxparse("/\\version\d+/");
  if prxmatch(rc1,string) then MWFLG=1;
  ……………
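The same two patterns can be tried out quickly outside SAS. The Python sketch below searches RTF lines for the `\*\generator` destination and the `\version` control word; the function names and the sample lines are made up for illustration, and the exact generator text written by a given SAS release is not assumed.

```python
import re

# Literal "\*\generator" followed by the generator text up to ";" or "}"
GENERATOR_RE = re.compile(r"\\\*\\generator\s+([^;}]*)")
# "\version" followed by its numeric argument
VERSION_RE = re.compile(r"\\version(\d+)")

def rtf_generator(lines):
    """Return the generator text from the first matching line, or None."""
    for line in lines:
        m = GENERATOR_RE.search(line)
        if m:
            return m.group(1).strip()
    return None

def rtf_version(lines):
    """Return the \\version number as an int, or None if absent."""
    for line in lines:
        m = VERSION_RE.search(line)
        if m:
            return int(m.group(1))
    return None

# Made-up header lines for illustration
sample = [
    r"{\rtf1\ansi\ansicpg1252\uc1\deff0",
    r"{\*\generator SAS 9.2;}",
    r"{\info{\version1}}",
]
print(rtf_generator(sample), rtf_version(sample))
```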
Control Words
Although RTF uses many control words, only a few are needed to understand the basic structure of an RTF file and extract data from it.
No.  Control Word  Comment
1.   \header       Identifies titles in the header section
2.   \headery      Identifies titles in the document section
3.   \footer       Identifies footnotes in the footer section
4.   \footery      Identifies footnotes in the document section
5.   \trhdr        Identifies a column header
6.   \trowd        Identifies a table row
7.   \cell         Identifies a table cell
8.   \cellx        Identifies the cell size (right boundary of the cell) in twips
9.   \line         Identifies a line break
10.  \li200        Works like a tab; used to enter extra space
Note: the numeric value represents twips.
{\rtf1\ansi\ansicpg1252\uc1\deff0\deflang1033\deflangfe1033{\fonttbl..........\pard\sectd\linex0\endnhere\pgwsxn15840\pghsxn12240\lndscpsxn\headery1440\footery540\marglsxn540\margrsxn1440\margtsxn1440\margbsxn540
\trowd\trkeep\trhdr\trgaph10\cltxlrtb\clvertalt\cellx6929\cltxlrtb\clvertalt\cellx13858\pard\plain\intbl\keepn\sb10\sa10\ql\f1\fs20\cf1{Company Name - Protocol123\cell}\pard\plain\intbl\keepn\sb10\sa10\qr\f1\fs20\cf1{Page 1 of 1\cell}{\row}........{\pard}.......\trowd\trkeep\trqc\trgaph10\cltxlrtb\clvertalt\cellx13860\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{Names of input datasets: ADSL\cell}{\row}.......\trowd\trkeep\trhdr\trqc\trgaph22\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx4838....\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{Parameter\cell}....{\row}\trowd\trkeep\trqc\trgaph22\cltxlrtb\clvertalt\cellx4838.....\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{Age at Screening (Years)\cell}.....{\row}
Control Words: RTF File
Titles & Footnotes in Document or Header/Footer Section
[Figure: screenshots of titles and footnotes placed in the Header Section versus the Document Section]
Titles & Footnotes in Document or Header/Footer Section
SAS CODE
%* Header Section;
if index(string,'\header')>0 and index(string,'\footer')=0 then hdrflg=1;
if hdrflg=1 then do; %* HEADER SECTION;
  %* PATTERN 1 HEADER INFORMATION *;
  if index(string,'\trowd')>0 then hdr= hdr + 1;
  else if index(string,'\pard{\par}')>0 then do;
    hdr= 0;
    hdrflg=0;
  end;
end;
else do; %* DOCUMENT SECTION;
  %* PATTERN 2 HEADER INFORMATION **;
  if index(string,'\headery')>0 and index(string,'\footery')>0 then hdrflg1=1;
  if index(string,'\pard{\par}')>0 then hdrflg1=0;
  if hdr=1 and index(string,'\trowd')>0 then tblid = tblid + 1;
  …….
Dynamic Titles & Footnotes
Table 14.1.1.1 DEMOGRAPHICS AND BASELINE CHARACTERISTICS
FULL ANALYSIS SET
Program name: ‘Path’ Creation date of output 14.1.1.1: 22SEP2009 4:43
Names of input datasets: ADSL
…..\pard\plain\intbl\keepn\sb10\sa10\ql\f1\fs20\cf1{Company Name - Protocol123\cell}\pard\plain\intbl\keepn\sb10\sa10\qr\f1\fs20\cf1{Page 1 of 1\cell}{\row}…\pard\plain\intbl\keepn\sb10\sa10\qc\f1\fs20\cf1{Table 14.1.1.1 \line DEMOGRAPHICS AND BASELINE CHARACTERISTICS\line FULL ANALYSIS SET\cell}{\row}…..\pard\plain\intbl\keepn\sb10\sa10\qr\f1\fs20\cf1{\cell}{\row}
Company Name – Protocol123 Page 1 of 1
…..\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{\cell}{\row}…..\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{Names of input datasets: ADSL\cell}{\row}…..\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\ql\f1\fs18\cf1{Program name: ‘Path’ \cell}\pard\plain\intbl\posyb\posyb\keepn\sb10\sa10\qr\f1\fs18\cf1{Creation date of output 14.1.1.1: 22SEP2011 4:43\cell}{\row}
Extract Titles and Footnotes into a SAS dataset
Create a Title and Footnote (T&F) dataset with unique titles and footnotes
Spanning Headers
\trowd\trkeep\trhdr\trqc\trgaph22\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx4838\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx5776\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx10108\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx12280\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{\cell}\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{\cell}\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Treatment\cell}\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{\cell}{\row}
\trowd\trkeep\trhdr\trqc\trgaph22\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx4838\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx5776\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx7942\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx10108\clbrdrt\brdrs\brdrw6\brdrcf1\clbrdrb\brdrs\brdrw6\brdrcf1\cltxlrtb\clvertalb\cellx12280\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{Parameter\cell}\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Statistics\cell}\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Trt Group 1\cell}\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Trt Group 2\cell}\pard\plain\intbl\keepn\sb22\sa22\qc\f1\fs18\cf1{Total\cell}{\row}
Spanning Headers
SAS CODE
%*Extract twip values and generate twip data;
data &prefix._twipdata;
  set &prefix._hdr1;
  by tblid hdr;
  where index(string,'\cellx')>0;
  if index(string,'\cellx')>0 then twpval=scan(string,-1,'\cellx');
run;
The "Treatment" header is a spanning header: its cell runs from 5776 to 10108 twips, which covers the Trt Group 1 (5776 to 7942) and Trt Group 2 (7942 to 10108) cells in the row below.
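The twip comparison can be sketched as follows. This Python example is illustrative only: it pulls the `\cellx` boundaries out of two header rows and reports which lower-row cells each upper-row cell covers. The function names are assumptions, the rows are shortened to their `\cellx` words, and the first cell is assumed to start at 0 twips.

```python
import re

CELLX_RE = re.compile(r"\\cellx(\d+)")

def cell_bounds(row):
    """Return (left, right) twip boundaries for each cell in one table row."""
    rights = [int(v) for v in CELLX_RE.findall(row)]
    lefts = [0] + rights[:-1]   # assumption: the first cell starts at 0
    return list(zip(lefts, rights))

def spanned_columns(span_row, header_row):
    """For each cell in span_row, list the indexes of header_row cells it covers."""
    lower = cell_bounds(header_row)
    return [
        [i for i, (l, r) in enumerate(lower) if l >= left and r <= right]
        for left, right in cell_bounds(span_row)
    ]

# Boundaries taken from the two header rows shown in the deck
span_row = r"\trowd\cellx4838\cellx5776\cellx10108\cellx12280"
header_row = r"\trowd\cellx4838\cellx5776\cellx7942\cellx10108\cellx12280"
spans = spanned_columns(span_row, header_row)
print(spans)
```

The third cell of the upper row covers two cells of the lower row, which is how a spanning header can be recognised without looking at the rendered table.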
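Group Column Headers
Spanning header row: Treatment (over Trt Group 1 and Trt Group 2)
Column header row: Parameter, Statistics, Trt Group 1, Trt Group 2, Total
Grouped column headers (spanning header joined to each column header with |):
Parameter
Statistics
Treatment|Trt Group 1
Treatment|Trt Group 2
Total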
Special Symbols
RTF Code
\\pard\plain\intbl\keepn\sb22\sa22\ql\f1\fs18\cf1{BMI (kg/m\super 2\nosupersub{})\uc1\u0956\\uc1\u0945\\cell}
Rendered symbols: μα
PRXCHANGE Function
• Performs a pattern-matching replacement.
CALL PRXNEXT Routine
• Returns the position and length of a substring that matches a pattern, and iterates over multiple matches within one string.
Special Symbols – SAS Code
if _N_=1 then do;
  %* Takes care of tokens for Super and Subscripts;
  re1=prxparse('s/\\super ?(.+?)\\nosupersub(\{\}| ?)/_super-\1_/');
  %* Handling Special Symbols present in $token format;
  tokenw_re = prxparse('/\\uc1\\u\d{4}\\~|\\uc1\\u\d{4}/');
end;
%* Replace the Special characters with SAS tokens;
call prxchange(re1,2,newvar);
start=1;
stop=length(newvar);
call prxnext(tokenw_re,start,stop,newvar,stpos,len);
do i=1 to 5 while(stpos>0);
  tok[i]=substr(newvar,stpos,len);
  call prxnext (tokenw_re,start,stop,newvar,stpos,len);
  fval[i]=put(compress(tok[i],'() '),$token.);
end;

proc format;
  value $token
    "\uc1\u0956" = '_mu_'
    "\uc1\u0945" = '_alpha_'
    "¹" = '_super_1'
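The two substitutions can be prototyped with the same regular expressions in another language. The Python sketch below mirrors the superscript pattern and the `\uc1\uNNNN` pattern from the SAS code; the function name, the token dictionary and the sample string are assumptions made for illustration, and unknown code points are left untouched.

```python
import re

# Same superscript pattern as the PRXPARSE call above
SUPER_RE = re.compile(r"\\super ?(.+?)\\nosupersub(\{\}| ?)")
# "\uc1\uNNNN" optionally followed by "\~"
UNICODE_RE = re.compile(r"\\uc1\\u(\d{4})(?:\\~)?")
# Hypothetical token table keyed by decimal code point
TOKENS = {956: "_mu_", 945: "_alpha_"}

def detokenize(text):
    """Replace superscript markup and known Unicode escapes with text tokens."""
    text = SUPER_RE.sub(r"_super-\1_", text)
    return UNICODE_RE.sub(
        lambda m: TOKENS.get(int(m.group(1)), m.group(0)), text
    )

# Made-up cell content for illustration
sample = r"BMI (kg/m\super 2\nosupersub{}) \uc1\u0956\~\uc1\u0945\~"
print(detokenize(sample))
```

If the actual characters are wanted instead of tokens, `chr(956)` and `chr(945)` give μ and α, since the number after `\u` is the decimal Unicode code point.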
Extract the Data into SAS dataset
{<data>\cell} Control words used to extract the table cell data into col1
• { saves all the current character formatting attributes.
• <data> is any text contained in the column.
• } restores the character formatting attributes to their most recently saved values.
The \cell control word can appear in several layouts. This code snippet captures the data in each case.
SAS CODE
%*Extract data from file and store it in Col1 variable.;
if index(string,'{')>0 and index(string,'\cell')>0 and index(string,'}')>0 then do;
  _col1='';
  col1=trim(left(substr(string,(index(string,'{')+1))));
  col1=tranwrd(col1, '\cell}' , '');
end;
Extract the Data into SAS dataset
else if index(string,'{')>0 and index(string,'\cell')=0 and index(string,'}')=0 then do;
  _col1=trim(left(substr(string,(index(string,'{')+1))));
end;
else if index(string,'{')=0 and index(string,'\cell')=0 and index(string,'}')=0 and index(string,'\')=0 then do;
  * if \ is missing means no tokens only data, read full string;
  _col1=trim(left(_col1))|| " " || trim(left(string));
end;
else if index(string,'{')=0 and index(string,'\cell')>0 and index(string,'}')>0 then do;
  col1=trim(left(substr(string,1,(index(string,'\cell')-1))));
  col1=trim(left(_col1)) || " " || trim(left(col1));
  _col1='';
end;
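The branches above deal with cell text that is either complete on one line or continued across several lines. A compact way to prototype the same idea is to join the lines first and then match each `{<data>\cell}` group. The Python sketch below is an illustrative alternative, not the RTF_READ logic itself; the function name and sample rows are assumptions, and cells containing nested braces are not handled.

```python
import re

# Text between "{" and "\cell}" with no nested braces
CELL_RE = re.compile(r"\{([^{}]*?)\\cell\}")

def extract_cells(rtf_text):
    """Return the text of every cell group, joining continuation lines
    and collapsing repeated whitespace."""
    joined = " ".join(line.strip() for line in rtf_text.splitlines())
    return [" ".join(m.split()) for m in CELL_RE.findall(joined)]

# Single-line row (shortened from the header row shown in the deck)
row = (r"\pard\plain\intbl\ql\f1\fs18\cf1{Parameter\cell}"
       r"\pard\plain\intbl\qc\f1\fs18\cf1{Statistics\cell}{\row}")
# Made-up cell whose text continues on a second line
multi = "{Age at Screening\n(Years)\\cell}"

print(extract_cells(row))
print(extract_cells(multi))
```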
Distinguish between different rows and columns
* Initialize counters for identifying Rows and Cols;
data DATARECS;
  set PARSEFILE1;
  by tblid;
  where indexw(string,'\cellx')=0; * Keep only data records;
  retain rowid colid 0;
  if first.tblid then rowid=0;
  if index(string,'\trowd')>0 or index(string,'\pard{\par}')>0 then do;
    rowid=rowid + 1;
    colid=0;
  end;
  else do;
    if index(string,'\cell')>0 then colid=colid+1;
    if (index(string,'\bkmkstart')>0 or index(string,'\bkmkend')>0) then colid=colid + 1;
    if index(string,'{\row}')>0 or index(string,'\footer')>0 then colid=0;
  end;
run;
Distinguish between different rows and columns
*Transpose Vertical data to Horizontal table like structure;
proc transpose data=DATARECS1 out=TDATARECS(drop=_NAME_) prefix=c;
  by tblid rowid hdr;
  id colid;
  var col1;
run;
Horizontal Data Set Structure
Protocol:123                                                                                          Page 1 of 1
TABLE 14.3.1.6.1
Treatment Emergent Adverse Events Considered Related (Possibly, Probably, or Definitely) to Treatment by System Organ Class and Preferred Term
Safety Set

                            Treatment 1       Treatment 2       Treatment 3       Treatment 4       Treatment 5       Overall
                            (N=3)             (N=3)             (N=2)             (N=2)             (N=1)             (N=11)
Body System or Organ Class  Events  Patients  Events  Patients  Events  Patients  Events  Patients  Events  Patients  Events  Patients
Dictionary-Derived Term     n       n (%)     n       n (%)     n       n (%)     n       n (%)     n       n (%)     n       n (%)
Gastrointestinal disorders  0       0 ( 0)    0       0 ( 0)    1       1 ( 50)   1       1 ( 50)   2       1 ( 100)  4       3 ( 27)
Nausea                      0       0 ( 0)    0       0 ( 0)    1       1 ( 50)   0       0 ( 0)    1       1 ( 100)  2       2 ( 18)
Abdominal pain              0       0 ( 0)    0       0 ( 0)    0       0 ( 0)    1       1 ( 50)   0       0 ( 0)    1       1 ( 9)
Vomiting                    0       0 ( 0)    0       0 ( 0)    0       0 ( 0)    0       0 ( 0)    1       1 ( 100)  1       1 ( 9)

Note: MedDRA Dictionary Version 11.1 was used for coding
Names of input datasets: ADS.ADAE and ADS.ADSL
Program name: ‘Path’\
Creation date of output 14.3.1.6.1: 12JUN2011 11:17
Output Data Set Structure
Problem: How do you represent ALL types of tables and listings in a data set?
Solution? Turn every column into a variable?
What about:
• Multiple pages
• Tables that flow horizontally onto another page
Some don’t fit on one page
RTF_READ training                                                                              Page 1 of 2
This is Title 1
Safety Population
Parameter                       Long Treatment Name 1  Long Treatment Name 2  Long Treatment Name 3  Long Treatment Name 4
This is a categorical Variable  14                     13                     13                     14
Some don’t fit on one page
RTF_READ training                                                                              Page 2 of 2
This is Title 1
Safety Population
Parameter                       Long Treatment Name 5  Long Treatment Overall
This is a categorical Variable  13                     1
Some don’t fit on one page
• What is column 2? Treatment 1 or Treatment 5?
• How do you know (without looking) how many columns fit across the page?
• Are you sure they are not going to add or remove any columns?
- Request 1: Can you make it fit on one page?
- Request 2: Can you add a total column?
Our Solution
Normalize the structure of the table
Two ways to think about it:
• Instead of reading across, think about reading each column all the way down the table, stopping when the header text changes or the table ends. Then move to the next column.
• Or, think of it as transposed down:
PROC TRANSPOSE DATA=table; BY statistic; VAR columns;
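The "read each column all the way down" idea can be sketched in a few lines. The Python below is an illustration rather than the RTF_READ implementation: it turns a header list and a set of rows into one record per cell, column by column. The function name and the tiny sample table are assumptions; the record keys H1, C1, hvalue and cvalue follow the vertical dataset layout shown later in this deck.

```python
def normalize(headers, rows):
    """headers: column header texts, the first one belonging to the row-label column.
    rows: lists whose first item is the row label.
    Returns one record per cell, reading each data column from top to bottom."""
    records = []
    for col, hvalue in enumerate(headers[1:], start=1):
        for row in rows:
            records.append({
                "H1": headers[0],     # header of the row-label column
                "C1": row[0],         # row label
                "hvalue": hvalue,     # header of the data column
                "cvalue": row[col],   # cell value
            })
    return records

# Made-up two-row, two-column extract
headers = ["Parameter", "Treatment 1|(N=3)|Events n", "Treatment 1|(N=3)|Patients n (%)"]
rows = [["Nausea", "0", "0 ( 0)"], ["Vomiting", "0", "0 ( 0)"]]
records = normalize(headers, rows)
for rec in records:
    print(rec)
```

Because every table ends up with the same four variables, adding or removing a treatment column changes only the number of records, not the dataset structure.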
Vertical aka. Normalized
Standard CRF Page
Metadata
Library
Always same variables Number of Records in Table [R] - 4 Number of columns per Treatment [N(C)] -12 Number of Records after RTF_READ [N(Obs)] - 48
Protocol: 123                                                   Page 1 of 1
TABLE 14.3.1.6.1
Treatment Emergent Adverse Events Considered Related (Possibly, Probably, or Definitely) to Treatment by System Organ Class and Preferred Term
Safety Set

                            Treatment 1       Treatment 2       Treatment 3       Treatment 4       Treatment 5       Overall
                            (N=3)             (N=3)             (N=2)             (N=2)             (N=1)             (N=11)
Body System or Organ Class  Events  Patients  Events  Patients  Events  Patients  Events  Patients  Events  Patients  Events  Patients
Dictionary-Derived Term     n       n (%)     n       n (%)     n       n (%)     n       n (%)     n       n (%)     n       n (%)
Gastrointestinal disorders  0       0 ( 0)    0       0 ( 0)    1       1 ( 50)   1       1 ( 50)   2       1 ( 100)  4       3 ( 27)
Nausea                      0       0 ( 0)    0       0 ( 0)    1       1 ( 50)   0       0 ( 0)    1       1 ( 100)  2       2 ( 18)
Abdominal pain              0       0 ( 0)    0       0 ( 0)    0       0 ( 0)    1       1 ( 50)   0       0 ( 0)    1       1 ( 9)
Vomiting                    0       0 ( 0)    0       0 ( 0)    0       0 ( 0)    0       0 ( 0)    1       1 ( 100)  1       1 ( 9)

Note: MedDRA Dictionary Version 11.1 was used for coding
Names of input datasets: ADS.ADAE and ADS.ADSL
Program name: ‘Path’\
Creation date of output 14.3.1.6.1: 12JUN2011 11:17
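To make the record count concrete, the sketch below rebuilds the vertical records for this table by hand. It is written in Python rather than SAS and is not the RTF_READ macro. The variable names tf_id, H1, C1, hvalue and cvalue follow the vertical data structure used in this talk; the function name `to_vertical`, the hard-coded cell values and the joining of header levels with "|" are assumptions made for the illustration.

```python
from typing import Dict, List

# Treatment groups and their N, as printed in the column headers.
groups = [("Treatment 1", 3), ("Treatment 2", 3), ("Treatment 3", 2),
          ("Treatment 4", 2), ("Treatment 5", 1), ("Overall", 11)]
stats = ["Events n", "Patients n (%)"]

# One header string per data column; header levels are joined with "|".
headers = [f"{name}|(N={n})|{stat}" for name, n in groups for stat in stats]

# Header text of the row-label column (two empty levels, then the label).
row_header = "||Body System or Organ Class Dictionary-Derived Term"

# Cell text per table row, in the same left-to-right order as `headers`.
rows: Dict[str, List[str]] = {
    "Gastrointestinal disorders": ["0", "0 ( 0)", "0", "0 ( 0)", "1", "1 ( 50)",
                                   "1", "1 ( 50)", "2", "1 ( 100)", "4", "3 ( 27)"],
    "Nausea":                     ["0", "0 ( 0)", "0", "0 ( 0)", "1", "1 ( 50)",
                                   "0", "0 ( 0)", "1", "1 ( 100)", "2", "2 ( 18)"],
    "Abdominal pain":             ["0", "0 ( 0)", "0", "0 ( 0)", "0", "0 ( 0)",
                                   "1", "1 ( 50)", "0", "0 ( 0)", "1", "1 ( 9)"],
    "Vomiting":                   ["0", "0 ( 0)", "0", "0 ( 0)", "0", "0 ( 0)",
                                   "0", "0 ( 0)", "1", "1 ( 100)", "1", "1 ( 9)"],
}


def to_vertical(tf_id: int) -> List[Dict[str, object]]:
    """Read each data column top to bottom and emit one record per cell."""
    out: List[Dict[str, object]] = []
    for col_index, hvalue in enumerate(headers):
        for c1, cells in rows.items():
            out.append({"tf_id": tf_id, "H1": row_header, "C1": c1,
                        "hvalue": hvalue, "cvalue": cells[col_index]})
    return out


vertical = to_vertical(tf_id=1)
print(len(rows), len(headers), len(vertical))  # 4 12 48
```

The record order matches the column-by-column reading: the first four records are the Events n cells of Treatment 1, the next four its Patients n (%) cells, and so on.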
Vertical Data Structure
Obs  tf_id  H1                                                    C1                          hvalue                            cvalue
  1  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 1|(N=3)|Events n        0
  2  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 1|(N=3)|Events n        0
  3  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 1|(N=3)|Events n        0
  4  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 1|(N=3)|Events n        0
  5  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 1|(N=3)|Patients n (%)  0 ( 0)
  6  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 1|(N=3)|Patients n (%)  0 ( 0)
  7  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 1|(N=3)|Patients n (%)  0 ( 0)
  8  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 1|(N=3)|Patients n (%)  0 ( 0)
  9  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 2|(N=3)|Events n        0
 10  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 2|(N=3)|Events n        0
 11  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 2|(N=3)|Events n        0
 12  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 2|(N=3)|Events n        0
 13  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 2|(N=3)|Patients n (%)  0 ( 0)
 14  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 2|(N=3)|Patients n (%)  0 ( 0)
 15  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 2|(N=3)|Patients n (%)  0 ( 0)
 16  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 2|(N=3)|Patients n (%)  0 ( 0)
 17  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 3|(N=2)|Events n        1
 18  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 3|(N=2)|Events n        1
 19  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 3|(N=2)|Events n        0
 20  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 3|(N=2)|Events n        0
 21  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 3|(N=2)|Patients n (%)  1 ( 50)
 22  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 3|(N=2)|Patients n (%)  1 ( 50)
 23  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 3|(N=2)|Patients n (%)  0 ( 0)
 24  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 3|(N=2)|Patients n (%)  0 ( 0)
 25  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 4|(N=2)|Events n        1
 26  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 4|(N=2)|Events n        0
 27  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 4|(N=2)|Events n        1
 28  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 4|(N=2)|Events n        0
 29  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 4|(N=2)|Patients n (%)  1 ( 50)
 30  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 4|(N=2)|Patients n (%)  0 ( 0)
 31  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 4|(N=2)|Patients n (%)  1 ( 50)
 32  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 4|(N=2)|Patients n (%)  0 ( 0)
 33  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 5|(N=1)|Events n        2
 34  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 5|(N=1)|Events n        1
 35  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 5|(N=1)|Events n        0
 36  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 5|(N=1)|Events n        1
 37  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Treatment 5|(N=1)|Patients n (%)  1 ( 100)
 38  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Treatment 5|(N=1)|Patients n (%)  1 ( 100)
 39  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Treatment 5|(N=1)|Patients n (%)  0 ( 0)
 40  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Treatment 5|(N=1)|Patients n (%)  1 ( 100)
 41  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Overall|(N=11)|Events n           4
 42  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Overall|(N=11)|Events n           2
 43  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Overall|(N=11)|Events n           1
 44  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Overall|(N=11)|Events n           1
 45  1      ||Body System or Organ Class Dictionary-Derived Term  Gastrointestinal disorders  Overall|(N=11)|Patients n (%)     3 ( 27)
 46  1      ||Body System or Organ Class Dictionary-Derived Term  Nausea                      Overall|(N=11)|Patients n (%)     2 ( 18)
 47  1      ||Body System or Organ Class Dictionary-Derived Term  Abdominal pain              Overall|(N=11)|Patients n (%)     1 ( 9)
 48  1      ||Body System or Organ Class Dictionary-Derived Term  Vomiting                    Overall|(N=11)|Patients n (%)     1 ( 9)
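One consequence of this layout is that a cell can be addressed by its header text instead of its position. The Python sketch below is an illustration only, not part of RTF_READ: it assumes that "|" separates the three header levels in hvalue, as the listing suggests, and the names `Header`, `parse_hvalue` and `lookup` are hypothetical.

```python
from typing import Any, Dict, List, NamedTuple


class Header(NamedTuple):
    treatment: str
    n: int
    statistic: str


def parse_hvalue(hvalue: str) -> Header:
    """Split 'Treatment 3|(N=2)|Patients n (%)' into its three header levels."""
    treatment, n_text, statistic = hvalue.split("|")
    n = int(n_text.strip("()").split("=")[1])  # "(N=2)" -> 2
    return Header(treatment, n, statistic)


# Three records copied from the listing (Obs 21, 37 and 45).
records: List[Dict[str, Any]] = [
    {"tf_id": 1, "C1": "Gastrointestinal disorders",
     "hvalue": "Treatment 3|(N=2)|Patients n (%)", "cvalue": "1 ( 50)"},
    {"tf_id": 1, "C1": "Gastrointestinal disorders",
     "hvalue": "Treatment 5|(N=1)|Patients n (%)", "cvalue": "1 ( 100)"},
    {"tf_id": 1, "C1": "Gastrointestinal disorders",
     "hvalue": "Overall|(N=11)|Patients n (%)", "cvalue": "3 ( 27)"},
]


def lookup(recs: List[Dict[str, Any]], c1: str, treatment: str, statistic: str) -> str:
    """Return the cell text for one row label and one column header."""
    for rec in recs:
        head = parse_hvalue(rec["hvalue"])
        if rec["C1"] == c1 and head.treatment == treatment and head.statistic == statistic:
            return rec["cvalue"]
    raise KeyError((c1, treatment, statistic))


print(parse_hvalue(records[0]["hvalue"]))
print(lookup(records, "Gastrointestinal disorders", "Overall", "Patients n (%)"))  # 3 ( 27)
```

The same lookup keeps working if the table is re-paginated or a column is added, because nothing in it depends on column order.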
Return on Investment
Significant increase in quality
– 100% QC on all tables
  • Whether 1 page or 1000 pages
– QC that we can re-run
  • Initial QC takes longer
  • Re-QC is much faster
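A re-runnable QC step on the vertical structure could look roughly like the Python sketch below. It is an assumption about how such a comparison might be written, not the authors' process: the function names are hypothetical, the production values are copied from the Treatment 5 column of the example table, and the QC values contain one deliberately altered cell so that the check has something to report.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[int, str, str]  # (tf_id, C1, hvalue) identifies one cell


def index_records(records: List[Dict[str, Any]]) -> Dict[Key, str]:
    """Index vertical records by table id, row label and column header."""
    return {(r["tf_id"], r["C1"], r["hvalue"]): r["cvalue"] for r in records}


def compare(production: List[Dict[str, Any]], qc: List[Dict[str, Any]]) -> List[str]:
    """List every cell that is missing on one side or differs in value."""
    prod, ref = index_records(production), index_records(qc)
    findings: List[str] = []
    for key in sorted(prod.keys() | ref.keys()):
        if key not in ref:
            findings.append(f"only in production: {key}")
        elif key not in prod:
            findings.append(f"only in QC: {key}")
        elif prod[key] != ref[key]:
            findings.append(f"mismatch at {key}: {prod[key]!r} vs {ref[key]!r}")
    return findings


def cell(c1: str, hvalue: str, cvalue: str) -> Dict[str, Any]:
    return {"tf_id": 1, "C1": c1, "hvalue": hvalue, "cvalue": cvalue}


# Production values as read from the table (Treatment 5, Events n).
production = [
    cell("Gastrointestinal disorders", "Treatment 5|(N=1)|Events n", "2"),
    cell("Nausea", "Treatment 5|(N=1)|Events n", "1"),
]
# Hypothetical independent QC values; the Nausea cell is altered on purpose.
qc = [
    cell("Gastrointestinal disorders", "Treatment 5|(N=1)|Events n", "2"),
    cell("Nausea", "Treatment 5|(N=1)|Events n", "2"),
]

findings = compare(production, qc)
for line in findings:
    print(line)
```

Once both sides are in the same vertical structure, the comparison is identical for a 1-page and a 1000-page table, and re-running it after a data refresh costs only the run time.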
Questions?