SICPA Test Automation

Consulting Proposal

©2012, Cognizant

PDF Automation Pro: The third eye for all your PDF Automation needs

PDF Automation Pro has been created by the Research and Development team from Cognizant’s Automation Centre of Excellence

PDF Automation Pro is continuously enhanced and updated by the R&D team, based on feedback from the end users of the tool

PDF Automation Pro has a dedicated helpdesk to assist end users with implementation and troubleshooting

PDF Automation Pro helps to significantly reduce the manual effort required for PDF automation

PDF Automation Pro eliminates manual errors which might creep in, especially for large documents

PDF Automation Pro fits into the existing test infrastructure and enables integration with end-to-end tests

Cognizant’s solution for test automation of PDF documents, consisting of a suite of 3 tools

Designed to address most aspects of PDF automation such as comparison of similar documents, extraction of specific data from a document, automating an interactive form, etc.

Supports integration with most of the functional testing tools in the market

PDF Probe

Provides a solution for comparing PDF documents and reporting the differences, if any

Also supports comparison of a PDF document with an MS Word document

PDF Assist

Provides a solution to extract specific content from within a PDF document based on user defined criteria.

The extracted content can subsequently be validated against an expected result for testing purposes.

PDF PerFORM

Provides a solution for automation of PDF interactive forms.

Supports filling in empty forms as well as extracting data from filled-in forms

Core Features

Each of these tools comes with a simple and user friendly GUI which can be used directly to automate PDF documents.

In addition, all 3 tools expose APIs which enable easy integration with most of the functional automation tools in the market.

A handy code generator is included with each tool, which automatically generates the API calls required to automate the PDF documents. These code generators support multiple languages including VBScript, C#, VB.NET and Java, and generate code which is consistent with Cognizant’s accepted standards and conventions.

Comparison Features

Textual content comparison

Font size comparison

Font family comparison

Font style comparison (Bold and Italics)

Font colour comparison

Line spacing comparison

Whitespace comparison

Special Features

Ability to compare a specified range of pages

Batch comparison of multiple document sets

Batch comparison of multiple documents against a specified template

Provision to ignore the case (uppercase/lowercase) while comparing

Supports comparison of multi-column text, tables, headers and footers

Supports comparison of password protected documents

Comparison Reports

Visual report in HTML format

Detailed report in Excel format (optional)

Both reports contain a high level summary, as well as corresponding performance statistics

Highlights: Document Comparator
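As a rough illustration of the text comparison options listed above (page range, ignoring case, reporting differences), the following Python sketch compares two documents that have already been reduced to plain text, one string per page. It does not use the PDF Probe API, whose calls are not described in this proposal; `compare_pages` and its parameters are hypothetical names chosen for the example, and font, colour and spacing comparison are out of scope here.

```python
import difflib


def _page_lines(pages, page_no, ignore_case):
    """Return the lines of a 1-based page, or an empty list if the page is missing."""
    text = pages[page_no - 1] if page_no <= len(pages) else ""
    if ignore_case:
        text = text.lower()
    return text.splitlines()


def compare_pages(source_pages, target_pages, page_range=None, ignore_case=False):
    """Compare two documents given as lists of page texts.

    page_range is an inclusive (first, last) tuple of 1-based page numbers;
    by default every page of the longer document is compared.
    Returns a list of dicts, one per differing block of lines.
    """
    first, last = page_range or (1, max(len(source_pages), len(target_pages)))
    differences = []
    for page_no in range(first, last + 1):
        src = _page_lines(source_pages, page_no, ignore_case)
        tgt = _page_lines(target_pages, page_no, ignore_case)
        matcher = difflib.SequenceMatcher(None, src, tgt, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag != "equal":
                differences.append({
                    "page": page_no,
                    "change": tag,  # 'replace', 'delete' or 'insert'
                    "source": src[i1:i2],
                    "target": tgt[j1:j2],
                })
    return differences


if __name__ == "__main__":
    source = ["Invoice 42\nTotal: 100", "Thank you"]
    target = ["INVOICE 42\nTotal: 120", "Thank you"]
    for diff in compare_pages(source, target, ignore_case=True):
        print(diff)
```

A list of differences in this shape could feed either a summary report or a detailed one; the report formats themselves are not modelled in the sketch.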

Text Extraction features

Get the occurrence count of a specified word

Get the word next to a given search key

Get the text in between two specified words

Get the hash value for a given key based on a specified delimiter (for key-value pairs separated by a delimiter such as “:”)

Get the metadata of a given word, including font name, colour, width, etc.

Get the document metadata, including PDF author, PDF title, PDF producer, etc.

Special features

Enables fine-tuning the content extraction with features such as limiting the search to a specified range of pages, case sensitive searching, etc.

Supports searching within tables as well as document headers/footers

Supports extracting content from password protected documents

Image Extraction features

Extract the specified image from the document

Get the metadata of a specified image, including the position, dimensions, and pixel-by-pixel data

UI features

Clearly displays the description, input parameters and return values for the API selected

Validates the user inputs to ensure that they are within acceptable boundaries

Highlights: Extract, Search, Validate, Application Programming Interface
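The text extraction features listed above (occurrence count, word next to a search key, text between two words, value for a key in a delimited pair) can be illustrated on plain text that has already been extracted from a document. The Python sketch below is only an approximation of those ideas under that assumption; the function names are hypothetical and are not the calls exposed by the tool's API.

```python
import re


def occurrence_count(text, word, case_sensitive=True):
    """Count whole-word occurrences of `word` in `text`."""
    flags = 0 if case_sensitive else re.IGNORECASE
    return len(re.findall(r"\b" + re.escape(word) + r"\b", text, flags))


def word_after(text, key):
    """Return the whitespace-delimited word that follows the first `key`, or None."""
    words = text.split()
    for index, word in enumerate(words[:-1]):
        if word == key:
            return words[index + 1]
    return None


def text_between(text, start_word, end_word):
    """Return the text between the first `start_word` and the next `end_word`, or None."""
    pattern = re.escape(start_word) + r"\s*(.*?)\s*" + re.escape(end_word)
    match = re.search(pattern, text, re.DOTALL)
    return match.group(1) if match else None


def value_for_key(text, key, delimiter=":"):
    """Return the value of a `key<delimiter>value` pair found on its own line, or None."""
    for line in text.splitlines():
        name, separator, value = line.partition(delimiter)
        if separator and name.strip() == key:
            return value.strip()
    return None


if __name__ == "__main__":
    sample = (
        "Invoice No: INV-001\n"
        "Customer: Acme Ltd\n"
        "Total due 250 EUR payable by Friday. Total paid 0 EUR."
    )
    print(occurrence_count(sample, "Total"))
    print(word_after(sample, "due"))
    print(text_between(sample, "due", "payable"))
    print(value_for_key(sample, "Customer"))
```

Each extracted value can then be checked against an expected result with an ordinary assertion in whichever test framework is in use.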

Form filling features

Get the complete list of form fields from the document loaded

Select specific fields to be filled in – this includes all types of fields such as textboxes, checkboxes, radio buttons, etc.

Specify appropriate values for the selected fields

Fill the specified values and save the filled form into a specified location

Form values extraction/validation features

Get the complete list of form fields from the document loaded

Select specific fields whose values are to be extracted – this includes all types of fields such as textboxes, checkboxes, radio buttons, etc.

If required, specify the expected values for the selected fields

Extract the values from the fields specified

Compare the extracted values with the expected results (if specified), and report any differences found

Highlights: Application Programming Interface, Validate, Extract
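The final validation step described above, comparing extracted field values with expected results and reporting differences, might look like the following Python sketch. It assumes the form values have already been extracted into a dictionary keyed by field name; `validate_form_values` and the field names are hypothetical and do not represent the PDF PerFORM API.

```python
def validate_form_values(extracted, expected):
    """Compare extracted form values with the expected ones.

    Only fields named in `expected` are checked, mirroring the idea of
    selecting specific fields. Returns a list of
    (field, expected_value, actual_value, reason) tuples.
    """
    differences = []
    for field, expected_value in expected.items():
        if field not in extracted:
            differences.append((field, expected_value, None, "field missing"))
        elif extracted[field] != expected_value:
            differences.append((field, expected_value, extracted[field], "value mismatch"))
    return differences


if __name__ == "__main__":
    # Textbox values as strings, checkbox values as booleans (an assumption of this sketch).
    extracted = {"name": "Jane Doe", "newsletter": True, "plan": "basic"}
    expected = {"name": "Jane Doe", "newsletter": False, "signature": "J. Doe"}
    for difference in validate_form_values(extracted, expected):
        print(difference)
```

An empty result means every selected field matched its expected value.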

Tools compared: DiffPDF, DiffDoc, Adobe Acrobat Pro, i-net PDFC, PDF Probe

Textual content comparison (including headers, footers, tables, multi-column text, etc.)

Comparison of metadata such as font color, font family, font size, font style, etc.

Comparison of images

Partially possible, using the "Compare Appearance" mode

Integration with functional automation tools

DiffPDF: Execution can be triggered using the command line, but the comparison results cannot be retrieved and reported from the automation tool

DiffDoc: Execution can be triggered using the command line, but the comparison results cannot be retrieved and reported from the automation tool

Adobe Acrobat Pro: No API or command line execution possible to enable integrations with functional automation tools

i-net PDFC: A Java API is provided, which enables integration with any Java based automation tool; this can be used for continuous integration as well. Apart from this, a command line option is also available; however, the comparison results cannot be retrieved and reported from the automation tool in this case.

PDF Probe: Yes, the API provided enables integration with most of the automation tools. This can be used for continuous integration as well. The API calls are automatically generated by the tool.

Support for password protected documents

Provision for bulk comparison and template comparison

Visual report highlighting the differences

Detailed report documenting the differences

Compare MS Word with PDF

Licensing: DiffPDF: Open Source; DiffDoc: Licensed; Adobe Acrobat Pro: Licensed; i-net PDFC: Licensed; PDF Probe: Priced

Document content extraction tools:

There are many tools which enable extraction of content from a PDF document

However, such tools provide only basic features such as extracting all the text from the document or from a specific

None of these tools provides the range of search criteria offered by PDF Assist, which helps to zero in on the exact content to be extracted from the document

To sum up, PDF Assist is probably the most advanced tool in the PDF content extraction space

Interactive PDF forms automation tools:

There are many APIs available which enable the automation of PDF forms by writing appropriate scripts

Adobe has also released its Adobe Test Toolkit to cater to this requirement; however, the tool has not really matured

The USP of PDF PerFORM in this space is the code generation facility, as well as the ability to directly fill an empty form or extract content from a filled-in form through the GUI provided

Appendix

Limitations

General:

Documents created by non-standard PDF writers may not be processed properly.

If a single word or a single line contains multiple font faces, the results may be unexpected.

The time taken to load the document for processing is directly proportional to the size of the document. Large documents may take a long time to load.

PDF Probe:

Images cannot be compared. The recommended approach here is to use PDF Assist to extract the required images and then apply any available image comparison algorithm.

Values in form fields like checkboxes, radio buttons, etc. cannot be compared, and the presence of such fields may affect the accuracy of the comparison.

When images are present in the document, the line spacing comparison might be affected.

The comparison may be inaccurate if there are significant differences with respect to margin and line spacing between the documents.

Split sections within documents are supported; however, the word wrapping must be similar across the source and target documents.


PDF Probe (contd.):

Word documents with tables in headers cannot be compared.

If there is a text content deviation together with any other deviation like font size, color, etc., only the text deviation will be highlighted in the tool’s HTML report. The Excel report, however, will capture all the differences.

Documents may not be compared properly if the font size of the words in the document is too small.

Border lines, underlines and table borders may not be displayed in the HTML report.

The comparison may be inaccurate if the same content of the source document is scattered across different positions (pages) of the target document.

PDF Probe does not support page ranges for Word-PDF comparison.

Word-PDF comparison is slower than PDF-PDF comparison.

The HTML report is generated from the coordinates retrieved by a third party tool (used internally for retrieving the PDF content). The accuracy of the HTML report therefore depends on the quality of the PDF.

Word documents containing images can give unexpected results.

The tool reads content line by line, even within a table. It does not read the values cell by cell or column by column. Therefore, if a line contains a text deviation together with any other deviation like font size, color, etc., only the text deviation will be highlighted in the tool’s HTML report and in the Excel report.
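The PDF Probe limitations above recommend extracting images with PDF Assist and then applying any available image comparison algorithm. One very simple option is a pixel-by-pixel comparison, sketched below in Python. The sketch assumes each image is already available as rows of (red, green, blue) tuples; how that data is obtained from the tool is not shown, and `diff_pixels` is a hypothetical helper, not part of either tool.

```python
def diff_pixels(image_a, image_b, tolerance=0):
    """Return the (x, y) positions at which two images differ.

    Each image is a list of rows, and each row is a list of (r, g, b) tuples.
    A pixel counts as different when any channel differs by more than `tolerance`.
    """
    same_height = len(image_a) == len(image_b)
    same_widths = all(len(row_a) == len(row_b) for row_a, row_b in zip(image_a, image_b))
    if not (same_height and same_widths):
        raise ValueError("images must have the same dimensions")

    mismatches = []
    for y, (row_a, row_b) in enumerate(zip(image_a, image_b)):
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b)):
            if any(abs(a - b) > tolerance for a, b in zip(pixel_a, pixel_b)):
                mismatches.append((x, y))
    return mismatches


if __name__ == "__main__":
    expected = [[(0, 0, 0), (255, 255, 255)], [(10, 10, 10), (20, 20, 20)]]
    actual = [[(0, 0, 0), (250, 255, 255)], [(10, 10, 10), (20, 20, 20)]]
    print(diff_pixels(expected, actual))
    print(diff_pixels(expected, actual, tolerance=5))
```

A non-zero tolerance is useful when the two documents were produced with different compression settings, where small channel differences are expected.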


PDF Assist:

General

Images accessed using PDF Assist reflect the properties of the original image file, even if some of these properties may have changed while embedding it into the document. For example:

The image may have been resized within the document, but PDF Assist will return only the original size of the image.

The image may have been rotated by some angle while placing it into the document, but PDF Assist will return the original orientation of the image.

In rare cases, words in upper case may be wrongly perceived by PDF Assist to be lower case.

Values in form fields like checkboxes and special characters cannot be extracted.

Split sections within documents are supported; however, the following points must be taken into consideration:

PDF Assist treats each line of text as a single line cutting across all the sections.

In some documents, the split sections may not be aligned equally on the horizontal plane, causing PDF Assist to read each of the sectioned portions as a separate line.

In some cases, images in “.tiff” format may be recognized as “.png” images.

Though the API supports Java, it is not possible to use the API on platforms other than Windows.


Certain documents may not load properly in the UI; however, this will not affect the working of the API. For example:

If there is any text overlapping on top of an image, it may not be rendered properly.

If there is any text which is aligned vertically in the document, it will be rendered horizontally within the

For API functions which return an array, the UI generates code only for the first element in the array. This code has to be extended if the user needs to access other elements of the array.

PDF PerFORM:

The API does not have any provision to obtain the page number on which each form field is present (unless the document contains bookmarks)

Thank you