Post on 23-Dec-2015
©2012, Cognizant
SICPA Test Automation
Consulting Proposal
PDF Automation Pro
The third eye for all your PDF automation needs
Introduction to PDF Automation Pro

What?
Cognizant’s solution for test automation of PDF documents, consisting of a suite of 3 tools
Designed to address most aspects of PDF automation, such as comparison of similar documents, extraction of specific data from a document, automating an interactive form, etc.
Supports integration with most of the functional testing tools in the market

Who?
PDF Automation Pro has been created by the Research and Development team from Cognizant’s Automation Centre of Excellence
PDF Automation Pro is continuously enhanced and updated by the R&D team, based on feedback from the end users of the tool
PDF Automation Pro has a dedicated helpdesk to assist end users with implementation and troubleshooting

Why?
PDF Automation Pro helps to significantly reduce the manual effort required for PDF automation
PDF Automation Pro eliminates any manual errors which might creep in, especially for large documents
PDF Automation Pro fits perfectly into the existing test infrastructure and enables integration with end-to-end tests
Overview of PDF Automation Pro

PDF Probe
Provides a solution for comparing PDF documents and reporting the differences, if any.
Also supports comparison of a PDF document with an MS Word document.

PDF Assist
Provides a solution to extract specific content from within a PDF document based on user-defined criteria.
The extracted content can subsequently be validated against an expected result for testing purposes.

PDF PerFORM
Provides a solution for automation of PDF interactive forms.
Supports filling in empty forms as well as extracting data from filled-in forms.
Core Features
Each of these tools comes with a simple and user-friendly GUI which can be used directly to automate the PDF documents as required.
In addition, all 3 tools expose APIs which enable easy integration with most of the functional automation tools in the market.
A handy code generator is included with all the tools, which automatically generates the API calls required to automate the PDF documents. These code generators support multiple languages including VBScript, C#, VB.NET and Java, and generate code which is consistent with Cognizant’s accepted standards and conventions.
Overview of PDF Probe
Comparison Features
Textual content comparison
Font size comparison
Font family comparison
Font style comparison (Bold and Italics)
Font colour comparison
Line spacing comparison
Whitespace comparison

Special Features
Ability to compare a specified range of pages
Batch comparison of multiple document sets
Batch comparison of multiple documents against a specified template
Provision to ignore the case (uppercase/lowercase) while comparing
Supports comparison of multi-column text, tables, header and footer
Supports comparison of password protected documents

Comparison Reports
Visual report in HTML format
Detailed report in Excel format (optional)
Both reports contain a high level summary, as well as corresponding performance statistics
Highlights
[Figure: Document Comparator taking a source (SRC) and a target (TRG) document]
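The textual comparison options listed for PDF Probe (ignoring case, whitespace handling) can be illustrated with a minimal Python sketch. This is not PDF Probe's API, which is not shown in this deck; it assumes the text of both documents has already been extracted into lists of lines, and the function names `normalize` and `compare_text` are hypothetical.

```python
import difflib


def normalize(line, ignore_case, ignore_whitespace):
    """Apply the optional normalizations before two lines are compared."""
    if ignore_whitespace:
        line = " ".join(line.split())  # collapse runs of whitespace
    if ignore_case:
        line = line.lower()
    return line


def compare_text(source, target, ignore_case=False, ignore_whitespace=False):
    """Return a list of (operation, source_lines, target_lines) differences."""
    src = [normalize(l, ignore_case, ignore_whitespace) for l in source]
    tgt = [normalize(l, ignore_case, ignore_whitespace) for l in target]
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    differences = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            # Report the original (un-normalized) lines for readability.
            differences.append((op, source[i1:i2], target[j1:j2]))
    return differences


# Hypothetical sample content standing in for text extracted from two PDFs.
source_lines = ["Invoice Number: 1001", "Total:  250.00", "Thank you"]
target_lines = ["INVOICE NUMBER: 1001", "Total: 275.00", "Thank you"]

strict = compare_text(source_lines, target_lines)
relaxed = compare_text(source_lines, target_lines,
                       ignore_case=True, ignore_whitespace=True)
print(strict)
print(relaxed)
```

With both options enabled, only the changed amount is reported; the strict comparison also flags the line that differs only in case.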
Overview of PDF Assist
Text Extraction features
Get the occurrence count of a specified word
Get the word next to a given search key
Get the text in between two specified words
Get the hash value for a given key based on a specified delimiter (for key-value pairs separated by a delimiter such as “:”)
Get the metadata of a given word, including font name, colour, width, etc.
Get the document metadata, including PDF author, PDF title, PDF producer, etc.

Special features
Enables fine-tuning the content extraction with features such as limiting the search to a specified range of pages, case sensitive searching, etc.
Supports searching within tables as well as document headers/footers
Supports extracting content from password protected documents

Image Extraction features
Extract the specified image from the document
Get the metadata of a specified image, including the position, dimensions, and pixel-by-pixel data

UI features
Clearly displays the description, input parameters and return values for the API selected
Validates the user inputs to ensure that they are within acceptable boundaries
Highlights
[Figure: Application Programming Interface offering Extract, Search and Validate operations]
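The text extraction features listed for PDF Assist can be pictured with a small Python sketch that works on text already extracted from a document. The deck does not show PDF Assist's actual API calls, so every function name here (`occurrence_count`, `word_after`, `text_between`, `value_for_key`) is hypothetical, and the sample text is invented for illustration.

```python
import re


def occurrence_count(text, word, case_sensitive=True):
    """Count how many times a whole word occurs in the text."""
    words = re.findall(r"\w+", text)
    if not case_sensitive:
        words = [w.lower() for w in words]
        word = word.lower()
    return words.count(word)


def word_after(text, key):
    """Return the word that follows the first occurrence of the search key."""
    words = re.findall(r"\w+", text)
    for index, current in enumerate(words[:-1]):
        if current == key:
            return words[index + 1]
    return None


def text_between(text, start, end):
    """Return the text found between two specified words or phrases."""
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def value_for_key(text, key, delimiter=":"):
    """Look up the value of a key in 'key<delimiter>value' lines."""
    for line in text.splitlines():
        name, separator, value = line.partition(delimiter)
        if separator and name.strip() == key:
            return value.strip()
    return None


# Hypothetical text standing in for content extracted from a PDF.
sample_text = (
    "Policy Holder: Jane Doe\n"
    "Policy Number: PN-4471\n"
    "Premium due on 01 March for the policy"
)

print(occurrence_count(sample_text, "policy", case_sensitive=False))
print(word_after(sample_text, "Premium"))
print(text_between(sample_text, "due on", "for"))
print(value_for_key(sample_text, "Policy Number"))
```

Each helper returns a value that a test script could then validate against an expected result.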
Overview of PDF PerFORM
Form filling features
Get the complete list of form fields from the document loaded
Select specific fields to be filled in – this includes all types of fields such as textboxes, checkboxes, radio buttons, etc.
Specify appropriate values for the selected fields
Fill the specified values and save the filled form into a specified location

Form values extraction/validation features
Get the complete list of form fields from the document loaded
Select specific fields whose values are to be extracted – this includes all types of fields such as textboxes, checkboxes, radio buttons, etc.
If required, specify the expected values for the selected fields
Extract the values from the fields specified
Compare the extracted values with the expected results (if specified), and report any differences found
Highlights
[Figure: Application Programming Interface offering Fill, Validate and Extract operations]
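The final validation step described for PDF PerFORM, comparing extracted field values with expected results and reporting differences, can be sketched in Python as follows. This assumes the field values have already been extracted into a dictionary; the function `validate_fields` and the field names are hypothetical and do not represent the tool's API.

```python
def validate_fields(extracted, expected):
    """Compare extracted form values with expected ones.

    Returns a list of (field_name, expected_value, actual_value) tuples,
    one per mismatch. A field that is absent from the extracted values is
    reported with an actual value of None.
    """
    differences = []
    for name, expected_value in expected.items():
        if name not in extracted:
            differences.append((name, expected_value, None))
        elif extracted[name] != expected_value:
            differences.append((name, expected_value, extracted[name]))
    return differences


# Hypothetical values standing in for data read from a filled-in form.
extracted_values = {"full_name": "Jane Doe", "newsletter": True, "plan": "Basic"}
expected_values = {"full_name": "Jane Doe", "plan": "Premium", "country": "CH"}

report = validate_fields(extracted_values, expected_values)
for field, wanted, actual in report:
    print(f"{field}: expected {wanted!r}, found {actual!r}")
```

Only the fields named in the expected results are checked, which mirrors the idea of selecting specific fields rather than validating the whole form.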
Probe Comparison with other tools
Tools compared: DiffPDF, DiffDoc, Adobe Acrobat Pro, i-net PDFC, PDF Probe

Textual content comparison (including headers, footers, tables, multi-column text, etc.)
Comparison of metadata such as font color, font family, font size, font style, etc.
Comparison of images
Partially possible, using the "Compare Appearance" mode

Integration with functional automation tools
DiffPDF: Execution can be triggered using the command line, but the comparison results cannot be retrieved and reported from the automation tool.
DiffDoc: Execution can be triggered using the command line, but the comparison results cannot be retrieved and reported from the automation tool.
Adobe Acrobat Pro: No API or command line execution possible to enable integrations with functional automation tools.
i-net PDFC: A Java API is provided, which enables integration with any Java based automation tool; this can be used for continuous integration as well. Apart from this, a command line option is also available; however, the comparison results cannot be retrieved and reported from the automation tool in this case.
PDF Probe: Yes, the API provided enables integration with most of the automation tools. This can be used for continuous integration as well. The API calls are automatically generated by the tool.

Support for password protected documents
Provision for bulk comparison and template comparison
Visual report highlighting the differences
Detailed report documenting the differences
Compare MS Word with PDF

Licensing
DiffPDF: Open Source
DiffDoc: Licensed
Adobe Acrobat Pro: Licensed
i-net PDFC: Licensed
PDF Probe: Priced
Comparison with other tools (contd.)

Document content extraction tools:
There are many tools which enable extraction of content from a PDF document.
However, such tools provide only basic features such as extracting all the text from the document or from a specific page.
None of the tools provide the range of search criteria provided by PDF Assist, which helps to zero in on the exact content required to be extracted from the document.
To sum up, PDF Assist is probably the most advanced tool in the PDF content extraction space.

Interactive PDF forms automation tools:
There are many APIs available which enable the automation of PDF forms by writing appropriate scripts.
Adobe has also released its Adobe Test Toolkit to cater to this requirement; however, the tool has not really matured yet.
The USP of PDF PerFORM in this space is the code generation facility, as well as the ability to directly fill an empty form or extract content from a filled-in form through the GUI provided.
Appendix
Limitations of PDF Automation Pro
General:
Documents created by non-standard PDF writers may not be processed properly.
If a single word or a single line contains multiple font faces, the results may be unexpected.
The time taken to load the document for processing is directly proportional to the size of the document. Large documents may take a long time to load.

PDF Probe:
Images cannot be compared. The recommended approach here is to use PDF Assist to extract the required images and use any available image comparison algorithms.
Values in form fields like checkboxes, radio buttons, etc. cannot be compared, and the presence of such fields may affect the accuracy of the comparison.
When images are present in the document, the line spacing comparison might be affected.
The comparison may be inaccurate if there are significant differences with respect to margin and line spacing between the documents.
Split sections within documents are supported; however, the word wrapping must be similar across the source and target documents.
PDF Probe (contd.):
Word documents with tables in headers cannot be compared.
If there is a text content deviation together with any other deviation like font size, color, etc., only the text deviation will be highlighted in the tool’s HTML report. The Excel report, however, will capture all the differences.
Documents may not be compared properly if the font size of the words in the document is too small.
Border lines, underlines and table borders may not be displayed in the HTML report.
The comparison may be inaccurate if content from the source document is scattered across different positions (pages) of the target document.
PDF Probe does not support page ranges for Word-PDF comparison.
Word-PDF comparison is slower than PDF-PDF comparison.
The HTML report is generated from the coordinates retrieved by a third-party tool (used internally for retrieving the PDF content). The accuracy of the HTML report therefore depends on the quality of the PDF.
Word documents containing images can give unexpected results.
The tool reads content line by line, even within a table; it does not read the values cell by cell or column by column. Therefore, if a line contains a text deviation together with any other deviation like font size, color, etc., only the text deviation will be highlighted in the tool’s HTML report and in the Excel report.
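For the image limitation noted under PDF Probe, the recommended workaround is to extract the images with PDF Assist and apply an image comparison algorithm separately. A minimal sketch of one such algorithm, a pixel-by-pixel comparison, is given below in Python. It assumes each image is available as a grid (list of rows) of RGB tuples; this representation and the function `differing_pixels` are hypothetical and are not PDF Assist's output format.

```python
def differing_pixels(image_a, image_b):
    """Return the (x, y) coordinates at which two pixel grids differ.

    Both images must have identical dimensions; otherwise a ValueError
    is raised, since a pixel-by-pixel comparison is not meaningful.
    """
    same_height = len(image_a) == len(image_b)
    same_widths = all(len(ra) == len(rb) for ra, rb in zip(image_a, image_b))
    if not (same_height and same_widths):
        raise ValueError("images must have the same dimensions")
    return [
        (x, y)
        for y, (row_a, row_b) in enumerate(zip(image_a, image_b))
        for x, (pixel_a, pixel_b) in enumerate(zip(row_a, row_b))
        if pixel_a != pixel_b
    ]


# Two hypothetical 2x2 images that differ in the top-right pixel only.
WHITE, BLACK, RED = (255, 255, 255), (0, 0, 0), (255, 0, 0)
source_image = [[WHITE, BLACK], [BLACK, WHITE]]
target_image = [[WHITE, RED], [BLACK, WHITE]]

mismatches = differing_pixels(source_image, target_image)
print(mismatches)
```

An exact pixel match is the strictest possible check; a real test suite might prefer a tolerance-based comparison when images are re-encoded between documents.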
PDF Assist:
General
Images accessed using PDF Assist reflect the properties of the original image file, even if some of these properties may have changed while embedding it into the document. For example:
The image may have been resized within the document, but PDF Assist will return only the original size of the image.
The image may have been rotated by some angle while placing it into the document, but PDF Assist will return the original orientation of the image.
In rare cases, words in upper case may be wrongly perceived by PDF Assist to be lower case.
API
Values in form fields like checkboxes and special characters cannot be extracted.
Split sections within documents are supported; however, the following points must be taken into consideration:
PDF Assist treats each line of text as a single line cutting across all the sections.
In some documents, the split sections may not be aligned equally on the horizontal plane, causing PDF Assist to read each of the sectioned portions as a separate line.
In some cases, images in “.tiff” format may be recognized as “.png” images.
Though the API supports Java, it is not possible to use the API on platforms other than Windows.
UI
Certain documents may not load properly in the UI; however, this will not affect the working of the API. For example:
If there is any text overlapping on top of an image, it may not be rendered properly.
If there is any text which is aligned vertically in the document, it will be rendered horizontally within the UI.
For API functions which return an array, the UI generates code only for the first element in the array. This code has to be extended if the user needs to access other elements of the array.

PDF PerFORM:
The API does not have any provision to obtain the page number on which each form field is present (unless the document contains bookmarks).
©2011, Cognizant
Thank you