Using SAS to read and write annotations on Adobe .pdf files

John Fulda

DCDISC

November 15, 2012

11/15/2012 1

Disclaimer:

This presentation reflects the views and expertise of the author, and should not be construed to represent the expertise or views of any of the organizations the author has been associated with, either now or in the past.

(with kudos to the FDA)

Background:

The FDA requires a blank Case Report Form (CRF) to be annotated with the variable names and coding for each CRF item included in the data tabulation datasets. This annotated CRF is provided to FDA as an Adobe Acrobat PDF file, named blankcrf.pdf.

The process of creating and validating these CRF annotations is labor-intensive, time-consuming, and subject to human error.

This presentation shows the results of a proof-of-concept project called “AnnoGen” to use SAS to improve this process.

Exporting comments in SAS-readable form is a built-in feature of Adobe Reader XI.

Select: <Comment> <Comment List> <Options> <Export all to data file>

All comments in the parent PDF file will be exported in .FDF file format.

When a .FDF file is selected in Windows Explorer, it launches the parent .PDF file, whose name is embedded in the FDF file. All comments defined in the FDF file will be ADDED to the parent PDF file, as long as the mapped pages in the FDF file exist in the parent PDF file.

Understanding the FDF file function and format is the key to reading and writing comments to a PDF file.

Good News:

The FDF file contains all of the text, and the metadata of the box which contains the text, including font size and family, box size, orientation, color, background color, box line thickness, and box location on a specific page of the parent PDF file.

Knowing the structure of the FDF file permits the program to read, change, create, and write any comment box on any page of a named PDF file.

Bad news: FDF file structure and function are poorly documented.

ISO 32000-1 is the open standard for basic PDF structure.

Adobe: Portable Document Format 1.6: 1,236 pages.

Adobe: Document management – Portable document format 1.7: 756 pages.

Adobe Supplement to ISO 32000: 140 pages.

Only paper: PharmaSUG 2004 Paper CC02: Dirk Spruck, Monika Kawohl. Using SAS to Speed up Annotating Case Report Forms in PDF format. http://www.lexjansen.com/pharmasug/2004/coderscorner/cc02.pdf

The paper used SAS 8.2 and Adobe Acrobat 5.0. It showed the basics of creating an FDF file, but left out some critical information in the FDF build and did not address reading FDF.

Acrobat version 11 (XI) added significant elements to the FDF file structure. Most are optional. Some involve changes in data structures, and require different programming.

Resources from Adobe:

Acrobat Forms Data Format (FDF) Toolkit.

Developed for Acrobat 7.

Libraries for C, Perl, and Java.

But: the FDF Toolkit must be installed on a server. It cannot run on a client machine.

Offers some clues into the FDF structure, but does not explain.

Google searches did not yield any additional information.

Example: A simple, two page Case Report Form (CRF) in .PDF format.

Creates this FDF file:

%FDF-1.2
%âãÏÓ
1 0 obj<</FDF<</Annots[2 0 R 3 0 R 4 0 R 5 0 R 6 0 R 7 0 R 8 0 R 9 0 R 10 0 R]/ID[<0313D2E49F5AAE52B44BA4C875E28D5D><912D01A6E7F4694E8BCE3B36AF297226>]/F(/D/SAS_PROJECTS/AnnoGen/DCDISC Documents/ANNOTATIEDBLANK.pdf)>>>>
endobj
2 0 obj<</Rect[530.0 770.0 600.0 792.0]/NM(c6900a73-89d3-494c-a137-00a73dfa700b)/Subtype/FreeText/C[1.0 1.0 0.75]/F 4/Contents(\(DOMAIN \(C2\) = 'DM'\))/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#000000 )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>\(DOMAIN \(C2\) = 'DM'\)</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 0>>
endobj
3 0 obj<</Rect[336.0 549.0 426.0 560.0]/NM(9455db73-535e-4f03-a4c1-2da95c497b99)/Subtype/FreeText/C[1.0 0.0 0.0]/F 4/Contents(STUDYID \(C10\) = 'ABC-01')/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#4055FF )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>STUDYID \(C10\) = 'ABC-01'</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 0>>
endobj
4 0 obj<</Rect[407.0 420.0 537.0 442.0]/NM(5cd3250a-9df7-4d73-934f-f5a9e56b3fa1)/Subtype/FreeText/C[1.0 0.0 0.0]/F 4/Contents(\(USUBJID \(C20\) = 'ABC-01-' || DM.SUBJID\))/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#4055FF )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>\(USUBJID \(C20\) = 'ABC-01-' || DM.SUBJID\)</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 0>>
endobj
5 0 obj<</Rect[285.0 401.0 326.0 423.0]/NM(d51314f5-f107-4a7b-8b0f-d18e87eaeaa4)/Subtype/FreeText/C[1.0 1.0 0.666656]/F 4/Contents(SITEID \(C2\))/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#000000 )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>SITEID \(C2\)</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 0>>
endobj
6 0 obj<</Rect[327.0 377.0 422.0 399.0]/NM(cb65e251-d6c7-4e68-ac5c-5d10316dab93)/Subtype/FreeText/C[1.0 1.0 0.666656]/F 4/Contents(SUBJID \(C10\) format is XX-YYY)/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#4055FF )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>SUBJID \(C10\) format is XX-YYY</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 0>>
endobj
7 0 obj<</Rect[530.0 770.0 600.0 792.0]/NM(c6900a73-89d3-494c-a137-00a73dfa700b)/Subtype/FreeText/C[1.0 1.0 0.75]/F 4/Contents(\(DOMAIN \(C2\) = 'DM'\))/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#4055FF )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>\(DOMAIN \(C2\) = 'DM'\)</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 1>>
endobj
8 0 obj<</Rect[388.0 671.0 438.0 693.0]/NM(4e012da9-034e-4762-91d2-8cb86d0eb04d)/Subtype/FreeText/C[0.75 0.75 0.75]/F 4/Contents([Not Submitted])/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#000000 )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>[Not Submitted]</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 1>>
endobj
9 0 obj<</Rect[361.0 585.0 475.0 607.0]/NM(5c6176de-2620-45b5-9140-e919cf57a480)/Subtype/FreeText/C[1.0 1.0 0.666656]/F 4/Contents(ARM \(C50\) IN \('Three Week Schedule', 'Four Week Schedule'\))/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#4055FF )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>ARM \(C50\) IN \('Three Week Schedule', 'Four Week Schedule'\)</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 1>>
endobj
10 0 obj<</Rect[362.0 570.0 466.0 592.0]/NM(30a35bca-6c40-451d-b66b-bd6af0607052)/Subtype/FreeText/C[1.0 1.0 0.666656]/F 4/Contents(ARMCD \(C8\) IN \('THREE', 'FOUR'\))/M(D:20120924152729)/T(A)/DS(font: Arial 6.0p; text-align: left color:#4055FF )/RC(<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:10.0.0" xfa:spec="2.0.2" style="font-size:6.0pt;font-weight:normal;font-style:normal;font-family:Arial;font-stretc\
h:normal"><p>ARMCD \(C8\) IN \('THREE', 'FOUR'\)</p></body>)/RD[0.5 0.5 0.5 0.5]/Type/Annot/Page 1>>
endobj
trailer
<</Root 1 0 R>>
%%EOF

Results of creating SAS code to read, modify, and write an FDF file:

Proof of Concept Project: AnnoGen (Annotations Generator).

Solved the problem of reading an FDF file, to the degree that SAS could create an FDF file that would be read as valid by Adobe Acrobat, viewer or Professional.

Both the text of the comment boxes and all the metadata about the comment boxes could be manipulated by SAS.

Examples from this project are presented in the next set of slides.

Captured Comments:

Both the contents of each Comment box and the metadata about that box were captured.

Data was exported into Excel for ease of checking.

Following slide is an example of captured data.

Changing Metadata and Creating a new PDF.

Text and Metadata shown on the Excel sheet were used to create an updated version of the original PDF.

Often a new CRF page will not match the old CRF page.

This is particularly true in a CRO environment, when mapping the same Domain to different layouts used by different sponsors.

One way to solve this problem is to list all the fields on the new page, and then have the annotator drag the boxes to the appropriate location. This frees the annotator from entering the content of the boxes. Their only task is to match the comment to the field on the page. Any comments not used can be deleted.

Comments can be moved from page to page if needed when a single domain spans more than one page of the CRF.

Ghost Comments were discovered when reading the FDF file.

These are comments that are copied and pasted from a source PDF file into a destination PDF file, usually by a SELECT ALL command.

If the geography of the comments on the source page does not match exactly the geography of the comments on the destination page, some of the copied comments will be completely off the page, and invisible unless VIEW; ZOOM is set to less than 100%, showing the space around the page.

If the view is 100%, or is ZOOM TO PAGE LEVEL (as is normally used), these comments will not be seen.
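Once the /Rect values have been read, ghost comments can be flagged automatically. The following is a minimal sketch in Python (not the author's SAS code). It assumes a US Letter page of 612 x 792 points; the real page size has to come from the PDF itself, and the function name is hypothetical.

```python
def is_ghost(rect, page_width=612.0, page_height=792.0):
    """Return True when the comment box lies completely outside the page.

    rect is (lower-left x, lower-left y, upper-right x, upper-right y) in points.
    The default page size (US Letter) is an assumption for illustration.
    """
    llx, lly, urx, ury = rect
    return urx <= 0 or ury <= 0 or llx >= page_width or lly >= page_height


print(is_ghost((530.0, 770.0, 600.0, 792.0)))   # box from the sample CRF
print(is_ghost((700.0, 100.0, 800.0, 120.0)))   # box to the right of the page
```

A box that is only partly off the page is not flagged by this check; whether to treat those as ghosts is a design choice.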

Mapping from old to new CRF.

Mapping from an old CRF to a new CRF required creating a map file.

This file, called agPageMap.xls, was completed by the annotator.

The mapper was given the option to do an identical geography map, from one page to another, or to stack all the comments into the margin.

An option was also included to rotate all comments from a portrait source to a landscape destination.

Process Flow.

Following is a flowchart, showing how a source CRF is copied to a destination CRF.

FDF Header:

With one change in the process, we eliminate the need for a physical, original, source PDF.

Using any page map in agPageMap.xls format to locate the comments, SAS can create the Old.FDF file programmatically.

The page map itself can be created manually.

It could also be the result of a SAS program that read the data and metadata from a source PDF, and used that information to create a new agPageMap Excel sheet.

The Process Flow to create a new PDF is the next slide.

Alternative to reading an existing PDF file: SAS can create all the text and metadata directly from a SAS database, without the need for a source document.

Technical issues in reading the FDF file:

This next section gets more technical, but is important to understand how SAS needs to process the FDF data to achieve the results shown in the previous flow chart.

It is not necessary to understand the technical details presented.

The intent is to scope out the complexity of what needs to be understood when processing FDF files.

If you understand the complexity of some of these issues, you should be able to appreciate the level of technical expertise needed to solve the programming issues.

These details are only important if you decide to do your own programming.

Hang in there!

Technical issues in reading the FDF file:

The size, record length, and number of physical records can change by the very act of reading the FDF file, even if the file is NOT explicitly saved!

This behavior may or may not occur, depending on a number of factors. But when it does, it plays havoc with any program logic trying to read the file. You never know from run to run what file structure you will encounter.

Cause: FDF was designed to be platform independent, and to run in Unix and Mac environments. FDF uses a Carriage Return (CR) '0D'x as the only record delimiter.

Windows expects either a Line Feed (LF) '0A'x or a CR, LF byte sequence as a record delimiter. Under certain conditions, Windows will INSERT the missing LF byte into the FDF file.

Solution: Read the FDF file one byte at a time. Create a new record from the concatenated bytes. Ignore LF. When CR is found, output the concatenated bytes as a physical record, clear the concatenation, and continue until end of file.
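The byte-by-byte solution can be sketched as follows. This is an illustration in Python, not the SAS code used in the project, and the function name is hypothetical. It assumes CR is the record delimiter, as described above; a file that uses LF alone as its delimiter would need different handling.

```python
def split_fdf_records(raw):
    """Split raw FDF bytes into physical records on CR, ignoring every LF."""
    records = []
    current = bytearray()
    for byte in raw:
        if byte == 0x0A:          # LF: ignore, it may have been inserted
            continue
        if byte == 0x0D:          # CR: close the current physical record
            records.append(bytes(current))
            current.clear()
        else:
            current.append(byte)
    if current:                   # the last record may lack a trailing CR
        records.append(bytes(current))
    return records


# The same content with and without inserted LF bytes yields the same records.
clean = b"%FDF-1.2\r1 0 obj<<>>\rendobj\r"
altered = b"%FDF-1.2\r\n1 0 obj<<>>\r\nendobj\r\n"
print(split_fdf_records(clean) == split_fdf_records(altered))  # True
```

The file should be opened in binary mode (for example `open(path, "rb").read()`) so that no newline translation happens before the bytes are inspected.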

File Structure:

FDF uses a backslash (\) at the end of a physical record to indicate that the logical record continues on the next physical record.

Process the SAS observations to concatenate multiple physical records into a single logical record.
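A minimal sketch of the continuation rule, in Python rather than SAS. The function name is hypothetical. One simplification is assumed: a record that ends in an escaped backslash (two backslashes) is not treated specially.

```python
def join_continuations(physical_records):
    """Join records that end with a backslash to the record that follows."""
    logical = []
    pending = ""
    for record in physical_records:
        if record.endswith("\\"):
            pending += record[:-1]        # drop the continuation backslash
        else:
            logical.append(pending + record)
            pending = ""
    if pending:                           # file ended on a continuation
        logical.append(pending)
    return logical


records = ['style="font-family:Arial;font-stretc\\', 'h:normal">', "endobj"]
print(join_continuations(records))
```

In the sample FDF file, this is what turns the split text `font-stretc\` / `h:normal` back into `font-stretch:normal`.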

File Structure:

There are three types of FDF records: header, body, and trailer.

All logical body records end with a physical record containing only the text ‘endobj’. Use this to group physical records into logical records.

Comments are not the only type of body records.

However, in reading comments from an existing PDF, any body records that are not comments can be ignored.
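The grouping rule can be sketched as below, in Python rather than SAS. Two assumptions are made for illustration: lines beginning with % (such as %FDF-1.2 and %%EOF) are set aside, and a "comment" is an annotation whose subtype is FreeText, as in the sample file. The function names are hypothetical.

```python
def group_logical_records(physical_records):
    """Group physical records into objects, each terminated by 'endobj'.

    Returns (objects, leftover); leftover holds the trailer records that
    follow the last 'endobj'.
    """
    objects, current = [], []
    for record in physical_records:
        text = record.strip()
        if text.startswith("%"):          # %FDF-1.2, binary marker, %%EOF
            continue
        if text == "endobj":
            objects.append("".join(current))
            current = []
        else:
            current.append(text)
    return objects, current


def is_comment(obj_text):
    # Assumption: comment boxes are annotations of subtype FreeText.
    return "/Type/Annot" in obj_text and "/Subtype/FreeText" in obj_text


records = [
    "%FDF-1.2",
    "1 0 obj<</FDF<</Annots[2 0 R]>>>>",
    "endobj",
    "2 0 obj<</Subtype/FreeText/Contents(SITEID)/Type/Annot/Page 0>>",
    "endobj",
    "trailer",
    "<</Root 1 0 R>>",
    "%%EOF",
]
objects, leftover = group_logical_records(records)
comments = [obj for obj in objects if is_comment(obj)]
print(len(objects), len(comments), leftover)
```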

IGNORE HTML:

FDF records also include large blocks of HTML code, as well as native FDF code. In some cases the metadata and the text of the HTML code conflict with the native FDF code.

For example, the native FDF code may specify font family: Arial, while the HTML code may specify font family: Helvetica.

SAS programming logic is greatly simplified by not supporting any HTML code.

Throw away any HTML code when reading FDF records.

Do not create any HTML code when writing FDF records.

Acrobat will dynamically add any needed missing HTML code to the PDF document if the HTML code is not found in the FDF file.
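In the sample FDF file, the HTML block is the /RC(...) entry of each comment. A minimal Python sketch (not the project's SAS code) of throwing it away is shown below. It assumes the block is a parenthesized string in which nested parentheses are either balanced or escaped with a backslash; the function names are hypothetical.

```python
def skip_pdf_string(text, start):
    """Given the index of an opening '(', return the index just past its ')'."""
    depth = 0
    i = start
    while i < len(text):
        ch = text[i]
        if ch == "\\":
            i += 2                # skip the backslash and the escaped character
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unbalanced parentheses in string")


def drop_rich_text(record):
    """Remove the /RC(...) entry from one logical record, if present."""
    start = record.find("/RC(")
    if start == -1:
        return record
    end = skip_pdf_string(record, start + 3)
    return record[:start] + record[end:]


record = r"/Contents(SITEID \(C2\))/RC(<p>SITEID \(C2\)</p>)/RD[0.5 0.5 0.5 0.5]"
print(drop_rich_text(record))
```

A simple search for the next ")" would stop too early, because the comment text itself contains escaped parentheses.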

Three Generations of FDF records

The FDF file goes through three generations.

The First is the FDF file exported from the source PDF file.

The Second is the FDF file created by SAS, and imported back into a new and usually comment-free PDF file. It will be a bare subset of the first file, with changes made by SAS.

The Third FDF file is exported from the new PDF file, which was created from the Second file. Adobe Acrobat fills in all missing and optional fields not found when it read the Second file, and includes them when it exports this new FDF file. The Third file is shown for learning purposes only, and is not used in the creation process.

Note the changes and commonalities between the three versions of the FDF files. Some fields in red do not change at all. The XML section in blue was not imported or created by SAS, but is dynamically created by Acrobat when the new file is exported. The key fields, /NM and /Page have the same value, but do not have to appear in the same sequence from file to file.

The value of /RD was changed by the SAS program, and retained in the new file, as was /Rect.

As long as the key fields are present, the new PDF file will be annotated with the FDF information. Fields in the FDF file do not have to be in the same sequence. When the new FDF file is created from the new PDF file, the output fields will be resequenced.

FDF Footer:

Programming Tips:

Following are tips and tricks that were learned during this proof-of-concept project. They will help the SAS programmer and developer in writing the SAS code needed to read, modify, and create FDF files. This presentation is not intended to show a complete, working system from start to finish. This presentation provides the SAS programmer and developer with a way to “jump-start” the development of an FDF read-write system. Because this task has been done once, it can be done again.

Programming Tips:

Yuk!

Programming Tips:

Yum!

Programming Tips:

Read the original FDF file byte by byte.

Convert into SAS data where each observation is a logical record.

Eliminate record types: LineArrow and other non-comment records.

SAS-created FDF records can have fields in any sequence.

Not all FDF records will have the same fields.

Older versions of Acrobat use leading blanks in numeric fields; later versions use leading zeros.

Programming Tips:

For human readability, create logical groups of FDF text on different lines.

Blank lines in FDF files are ignored. So use them to separate each comment.

FDF pages start with 0; PDF pages start with 1.

There must be one entry in the header record for each comment, following the pattern: comment number, “ 0 R”.

The first two records in all FDF files have the same values:

Record one is: “%FDF-1.2”

Record two is: "%" || 'E2'x || 'E3'x || 'CF'x || 'D3'x;
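The header rules above can be sketched as follows, in Python rather than SAS. The function name and the example path are hypothetical. The sketch assumes that the header is object 1, that comments are numbered from 2 upward (as in the sample file), and that the /ID entry can be left out; that last point is an assumption, not something the slides confirm.

```python
def build_header(comment_count, parent_pdf):
    """Build the first records of an FDF file as bytes.

    Assumptions: object 1 is the header, comments are objects 2..N+1,
    and the optional /ID entry is omitted.
    """
    signature = b"%FDF-1.2"
    binary_marker = b"%" + bytes([0xE2, 0xE3, 0xCF, 0xD3])
    # One "n 0 R" entry per comment.
    refs = " ".join(f"{n} 0 R" for n in range(2, comment_count + 2))
    header = f"1 0 obj<</FDF<</Annots[{refs}]/F({parent_pdf})>>>>"
    return [signature, binary_marker, header.encode("latin-1"), b"endobj"]


for record in build_header(3, "/D/work/blankcrf.pdf"):
    print(record)
```

The records are kept as bytes because the second record contains values above 127, which a text encoding could alter.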

Programming Tips:

FDF Colors are Red, Green, Blue, from 0.0 to 1.0.

Use SAS to round color values to standard values.

The human eye can only see so many shades of color.
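A minimal sketch of color rounding, in Python rather than SAS. It assumes the standard values are five evenly spaced levels per channel (0, 0.25, 0.5, 0.75, 1.0), which gives 5 x 5 x 5 = 125 colors; the exact levels used in the project are not stated on the slide. The function name is hypothetical.

```python
def snap_color(rgb, levels=5):
    """Round each channel (0.0 to 1.0) to the nearest of `levels` even steps."""
    steps = levels - 1
    return tuple(round(channel * steps) / steps for channel in rgb)


# A background color taken from the sample FDF file.
print(snap_color((1.0, 1.0, 0.666656)))  # (1.0, 1.0, 0.75)
```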

Programming Tips: Sample of standard colors:

Programming Tips: Full set of 125 standard colors

Programming Tips:

Box coordinates are x-y values measured from the lower-left corner of the page.

Units are points, where 72 points = 1 inch.

Comment boxes are defined as:

Lower-Left X, Lower-Left Y; Upper-Right X, Upper-Right Y.

Code from the FDF file example: /Rect[336.000 549.000 426.000 560.000]

336 = Lower left of box is 4.7 inches from the left margin of the page.

549 = Lower left of box is 7.6 inches from the bottom of the page.

426 = Upper right of box is 5.9 inches from the left margin. Width = 426 - 336 = 90 points = 1.25 inches.

560 = Upper right of box is 7.8 inches from the bottom of the page. Height = 560 - 549 = 11 points, or about 0.15 inches.
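The arithmetic above can be checked with a short Python sketch (an illustration, not the project's SAS code; the function names are hypothetical). It parses a /Rect entry and converts it from points to inches.

```python
import re

POINTS_PER_INCH = 72.0


def parse_rect(record):
    """Return the four /Rect numbers as floats, or None if there is no /Rect."""
    match = re.search(r"/Rect\[([^\]]+)\]", record)
    if match is None:
        return None
    return tuple(float(value) for value in match.group(1).split())


def rect_in_inches(rect):
    """Convert (llx, lly, urx, ury) in points to position and size in inches."""
    llx, lly, urx, ury = rect
    return {
        "left": llx / POINTS_PER_INCH,
        "bottom": lly / POINTS_PER_INCH,
        "width": (urx - llx) / POINTS_PER_INCH,
        "height": (ury - lly) / POINTS_PER_INCH,
    }


rect = parse_rect("/Rect[336.000 549.000 426.000 560.000]")
box = rect_in_inches(rect)
print(box)
```

Computing width and height from the point values, rather than from the rounded inch values, avoids rounding drift in small boxes.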

Programming Tips:

Standardize on font name and font size for consistent look.

Standard font size makes it easier to compute box size when creating your own comment box from scratch. Use an algorithm to optimize box height and width.

Check for unbalanced bracketing symbols: { }, ( ), [ ] in text. Unbalanced symbols will cause the FDF file not to load, and no diagnostics are given.

Use a binary search: eliminate half of the comments at a time until the comment causing the error is identified. Keep eliminating records until the FDF file loads correctly.

This is the most frustrating part of FDF builds, and can normally be avoided if syntax checking is added to the FDF build.
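A syntax check of this kind can be sketched in a few lines. The following Python example (not the project's SAS code; the function name is hypothetical) tests the raw comment text, before any escaping, for unbalanced or mismatched brackets.

```python
PAIRS = {")": "(", "]": "[", "}": "{"}


def has_unbalanced_brackets(text):
    """Return True if (), [] or {} in text are unbalanced or mismatched."""
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return True
    return bool(stack)          # anything left open is unbalanced


print(has_unbalanced_brackets("ARM (C50) IN ('Three Week Schedule')"))
print(has_unbalanced_brackets("SITEID (C2"))
```

Running this over every comment before the FDF file is written points directly at the offending comment, so the trial-and-error elimination is only needed as a fallback.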

Programming Tips:

Consider eliminating non-ASCII characters (value >= 128). ASCII is a 7-bit standard and does NOT include or permit 8-bit values.

High-ASCII codes are no longer permitted in SDTM or ADaM as of v3.1.3.

OpenCDISC will find these characters and flag them as errors.

Consider equivalent replacements where appropriate.

Also check for low ASCII (value < 32).

High-value codes usually come from text cut and pasted from PDF typeset documents, Excel files, or outside-USA sources.
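A minimal sketch of such a check, in Python rather than SAS. The replacement table is hypothetical and covers only a few typeset characters (curly quotes and an en dash); a real table would be built from the characters actually found.

```python
def suspect_characters(text):
    """List (position, character, code) for codes >= 128 or < 32."""
    return [(i, ch, ord(ch)) for i, ch in enumerate(text)
            if ord(ch) >= 128 or ord(ch) < 32]


# Hypothetical replacement table for common typeset characters.
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",     # curly single quotes
    "\u201c": '"', "\u201d": '"',     # curly double quotes
    "\u2013": "-",                    # en dash
}


def to_plain_ascii(text):
    """Replace known typeset characters with plain ASCII equivalents."""
    return "".join(REPLACEMENTS.get(ch, ch) for ch in text)


print(suspect_characters("AGE \u2013 years"))
print(to_plain_ascii("DOMAIN = \u2018DM\u2019"))
```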

Programming Tips:

Parse FDF elements to get values. Store in SAS variables. Use to create new FDF text.

Unique comment id: NMID and Page Number are the unique key.

NMID values appear to be arbitrary, with no internal references. They must be unique.

Example: /NM(9455db73-535e-4f03-a4c1-2da95c497b99) /Page 0

Trick: embed the following: ad0be in a SAS-generated /NM value, as a human-readable flag. Or use any other combination of words that can be created from the following characters: abcde 01.
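The key and the flag trick can be sketched as below, in Python rather than SAS. The function names are hypothetical. The sketch assumes /NM values contain no parentheses and that a generated /NM should keep the 8-4-4-4-12 layout seen in the example.

```python
import re
import uuid


def comment_key(record):
    """Return (NM value, FDF page number) for one comment record, or None."""
    nm = re.search(r"/NM\(([^)]*)\)", record)
    page = re.search(r"/Page\s+(\d+)", record)
    if nm is None or page is None:
        return None
    return nm.group(1), int(page.group(1))


def flagged_nm(flag="ad0be"):
    """Generate a random /NM value that starts with a human-readable flag."""
    raw = uuid.uuid4().hex                      # 32 hexadecimal characters
    tagged = flag + raw[len(flag):]
    return "-".join([tagged[:8], tagged[8:12], tagged[12:16],
                     tagged[16:20], tagged[20:]])


print(comment_key("/NM(9455db73-535e-4f03-a4c1-2da95c497b99)/Type/Annot/Page 0"))
print(flagged_nm())
```

Because FDF pages start at 0, the page number in the key is one less than the page number shown in the PDF viewer.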

Programming Tips:

Not all FDF records will follow the same format. Different versions of Adobe create differently structured FDF files. However, all versions of FDF will have the same core elements. Be flexible with parsing. Expect some differences. Test on different samples. Build in exception testing to handle new formats.

To test what different elements do, create a PDF file with only one comment. Generate FDF file from it. Then edit FDF, change one item, and load FDF. That is the best way to learn what the different components of the FDF record do.

Programming Tips:

By exporting data to Excel, it is easier to visualize the data and metadata for each comment.

Once a unique comment is identified, by page number, and text value, or location on page, then changes can be made by SAS to the text, or to the metadata, including changing the page number and location on the page.

Programming Tips:

The FDF file must contain the name of the parent file with which it is associated. You must use the UNC format and not the Windows path. This is particularly important for networked files. If in doubt, export the FDF file, and look at the structure in that header.

/F(/Server01/shared/AG/Work/BlankCRF.pdf)

Do not create an XML segment in the new FDF file. Not needed.

Programming Tips:

Many elements in the FDF file are optional. Adobe Acrobat will create them if they are missing. When building a FDF file from scratch, start with the basics.
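As a starting point, a bare comment record might be built as sketched below, in Python rather than SAS. Which entries Acrobat treats as the minimum is an assumption here: the sketch writes only /Type, /Subtype, /Rect, /NM, /Contents, and /Page, taken from the sample file, and should be verified by loading the result. The function names are hypothetical.

```python
def escape_pdf_text(text):
    """Escape backslashes and parentheses for use inside /Contents(...)."""
    return (text.replace("\\", "\\\\")
                .replace("(", "\\(")
                .replace(")", "\\)"))


def free_text_record(obj_number, nm, page, rect, contents):
    """Build one bare FreeText comment record (assumed minimal field set)."""
    rect_text = " ".join(f"{value:.1f}" for value in rect)
    return (
        f"{obj_number} 0 obj<</Type/Annot/Subtype/FreeText"
        f"/Rect[{rect_text}]/NM({nm})"
        f"/Contents({escape_pdf_text(contents)})/Page {page}>>"
    )


print(free_text_record(2, "ad0be001", 0, (285, 401, 326, 423), "SITEID (C2)"))
print("endobj")
```

Font, color, and border entries can then be added one at a time, reloading the FDF file after each change to see its effect.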

If a PDF file is NOT BLANK, then the FDF file will overlay any existing comments. This permits multiple and complementary FDF files to be combined into a single PDF file.

The danger is retaining unwanted comments from the target PDF file named in the FDF file header.

Programming Tips:

Text read from the comments can be processed against an SDTM database for compliance with standards.

Text boxes for one purpose, such as Oracle Clinical, can have the text replaced using a 1 to 1 match from a lookup table.

Using an external page map in Excel, predefined sets of comment boxes can be loaded onto a page. These can show up either in the margins, or on predefined locations of the CRF.

Programming Tips:

Standard CRFs can be associated with standard annotation sets, by Domain.

Internal text or metadata can permit an annotated CRF to be indexed as to content, and then have new comments defined by page.

There is no easy way to delete selected comments from a CRF using SAS.

Adobe Acrobat does permit the manual selection and deletion of all comments.

This is an easy way to get a blank CRF as a working target.

Expect surprises. Test all new .PDF documents. This learning is all trial-and-error. It is based on experimental results, not on reading the manual. However, it has worked with over 30 different .PDF files from many different sources.

Summary Results of SAS Programming:

1. Move comments from old page to new page.

2. Copy same source page to multiple destination pages.

3. Copy from multiple source pages to the same destination page.

4. Copy from source page to destination page that already has comments.

5. Rotate and reposition all comments from a portrait-format page to a landscape-format page, and vice versa.

6. Change font and font-size for all comments to standard font and font-size.

7. Change font color, border line color, border line thickness, and background color to standard attributes.

8. Change all colors from 16 million possible colors (256^3) to 125 colors (5^3). This improves the look of the layout.

9. Dynamically resize comment boxes to give enough space to text, while avoiding excessive white space.

What Was Learned:

1. SAS can be used to read, change and write Adobe FDF files.

2. The text of PDF comments, and the text metadata (font family, font size, box size, location of box on page), can be created or modified by a SAS program.

3. Text and metadata can be “harvested” from any existing PDF file, and integrated into a SAS system.

4. Text and metadata can be originated by SAS, and used to populate either a blank PDF file, or a PDF file with existing comments.

5. A limited subset of FDF structure and function is all that is required to create a functional system in SAS.

6. Additional knowledge of FDF structure and function will lead to new features in SAS programming.

Conclusions:

1. The AnnoGen project presented here successfully demonstrates that SAS can be used to manage comments in a PDF file.

2. The risk of failure has been significantly decreased by the programming of a working proof-of-concept system.

3. Lessons learned can be easily applied to the development of a live application.

4. The benefits of adopting this solution are limited only by the needs and imagination of the users.

Lesson learned:

SAS can successfully be used to read and write annotations on Adobe .pdf files.

Questions?

443.600.9271

39 Washington Street

Frostburg, Maryland 21532
