Date post: 26-Dec-2015

MHE - Consultants for Document and Datament Technologies

The Document In The 21st Century

William J. “Bill” McCalpin

MIT, LIT, CDIA, EDP

Principal, MHE

Who MHE Is...

MHE is the consulting firm which specializes in the transition of information both within and between the electronic printing, imaging, and Internet environments.

Introduction

The Hegelian Dialectic

Thesis, Antithesis, Synthesis

In the philosophy of Hegel, these words show the inevitable transition of thought, by contradiction and reconciliation, from an initial conviction to its opposite and then to a new, higher conception that involves but transcends both of them.

The Hegelian Dialectic

• Thesis: Most businesses have well-established, productive legacy systems

• Antithesis: XML is springing forth everywhere and will replace most legacy systems

• Synthesis: XML will be integrated with legacy systems - enhancing some processes, changing many others, and eliminating some altogether

• In short, XML will change - not destroy - what you do

The Document In The 21st Century

What Is A Document?

• The American Heritage Dictionary defines a document as “information in writing placed on a medium such as paper, often used as a record.”

• Documents have been placed on clay tablets, gold leaf, animal skins, all types of paper, microfilm, optical storage, and so on

Information And Presentation

• In every case, the document represents a fundamental union of information and presentation

• But “presentation” presumes that the primary audience for the document is a human being

• With the coming of the Internet, this is no longer the case

The Curse Of Presentation

• Composition products require that you specify a printer, even before you know where the document will print

Why Are Print, Image, And Presentation Formats Incompatible?

Printing And Imaging Formats

• Many printing formats: AFP, Metacode, DJDE, XES (UDK), PostScript, PCL, etc.

• All formats use external resources like fonts, forms, graphics, etc., although sometimes inconsistently

• Most are escape-sequence based, some are formal data architectures, and some are almost programming languages

Printing And Imaging Formats

• Many imaging formats - while most use CCITT Group 4 for image compression, most also have proprietary data wrappers

• Later systems adopted text-based formats such as PDF, although storing other print streams is not unknown

• Systems which store text-based formats must wrestle with resource issues

Different Print Formats

• Why do printers have different formats? Because of physical constraints imposed by the hardware:

– resources reduce the amount of data sent through the pipeline to the printer

– pages must be imaged in a fraction of a second

– complex graphics can be developed on the printer, but this needs a special language

Different Imaging Formats

• Why do imaging systems have different formats? Because of physical constraints imposed by the hardware:

– Mass storage was expensive

– Indexing schemes were too close to the application

– Text is avoided sometimes because of resource issues

– Interoperability with other products was an issue

Result

• In each case, data architecture decisions were made in order to enhance some aspect of legibility of the stored objects.

• If there were no requirement to present the information (to a human reader), then the requirement for custom data formats for each vendor would probably disappear!

Information Exchanges

• B2C - business to consumer

• B2B - business to business

• B2B2C - business to business to consumer

• *2C requires presentation information

• B2B requires no presentation information, if the recipient is a process, not a person

Why B2B?

• NYSE (New York Stock Exchange)

– Formerly, 100 million trades in a day was considered very heavy

– Now 1 billion trades a day is considered very heavy

– The difference is automation; the same multiplier applies to B2B

• #1 effect of XML is the separation of information from presentation

The Nature Of XML

XML And SGML

• XML is eXtensible Markup Language

• XML is a simplified subset (an application profile) of SGML, Standard Generalized Markup Language, an ISO standard (ISO 8879)

• XML is “extensible” because people and enterprises with common interests get together to define the tags which describe their data

XML And Print Formats

• In most print formats, something like an account number would be:

– AMB 200 AMI 300 SCFL 01 STO 0, 90 TRN 12345-67890

• In XML, the same information is:

<account_number>12345-67890</account_number>
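A minimal sketch of why the tagged form is easier for a program to consume, assuming Python and its standard `xml.etree.ElementTree` module (neither appears in the original slides). The `account_number` element comes from the slide; the wrapping `statement` element is a hypothetical addition for illustration.

```python
import xml.etree.ElementTree as ET

# The tagged value from the slide, wrapped in a hypothetical <statement> root.
xml_text = "<statement><account_number>12345-67890</account_number></statement>"

root = ET.fromstring(xml_text)

# The program asks for the value by name; no page coordinates or font
# selections have to be decoded first.
account_number = root.findtext("account_number")
print(account_number)
```

Getting the same value out of the print stream would require knowing which positioning and font commands precede the text, which differs for every print format.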

XML and Image Formats

• Raster-based image formats contain only bitmaps

• To read the text data within the bitmap requires an OCR/ICR process, which can fail

• Most usable data is extracted from the document and placed in the index

XML And Electronic Formats

• The nature of all electronic presentation formats is to be focused on the presentation of the information.

• The nature of XML is focused on the “author’s content”, that is, information is described as what it is, not how it looks.

Separating Information From Presentation

• XML enables the total separation of information from presentation

• Thus, some XML objects have only tagged information, while others have content and presentation information

[Diagram labels: XML, XSL, XML]
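The separation can be sketched in a few lines: one piece of tagged information, two independent presentations. This is only an illustration under stated assumptions. It uses plain Python functions as a stand-in for a stylesheet language such as XSL, and the `bill` and `amount_due` element names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical tagged information with no presentation details at all.
data = ET.fromstring(
    "<bill><account_number>12345-67890</account_number>"
    "<amount_due>42.50</amount_due></bill>"
)

def as_plain_text(bill):
    """One presentation: 'name: value' lines, e.g. for an e-mail body."""
    return "\n".join(f"{child.tag}: {child.text}" for child in bill)

def as_html(bill):
    """Another presentation of the same data: an HTML table."""
    rows = "".join(
        f"<tr><th>{escape(child.tag)}</th><td>{escape(child.text or '')}</td></tr>"
        for child in bill
    )
    return f"<table>{rows}</table>"

text_view = as_plain_text(data)
html_view = as_html(data)
```

Neither function changes the data; a recipient that is a process rather than a person can skip both and read the elements directly.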

How To Relate XML to Everyman

• You might think that XML is too esoteric for most people to understand

• But XML is based on the basic human need to exchange information

• XML couples the communication skills we have used over the last several thousand years to modern, Internet technology

• So how can you understand it?

Communication Difficulty #1

• In order for any communication to take place, both parties must share the same fundamental mechanism which carries information

• For example, in writing, if a boy and girl don’t even share the same writing schemes, they can’t possibly understand...

Chinese Characters vs Latin Alphabet

“I Love You”

Underlying Structure of XML

• Text characters

• Tags are delimited by “<” and “>”, e.g., <xml>

• Ending tags have “/”, e.g., </xml>

• Parameters (attributes) have their values in quotes, e.g., <PAPER track="Application">

• XML is a series of tags and data, e.g., <STATE>Texas</STATE>
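These syntax rules are exactly what a parser checks. The sketch below, assuming Python's standard `xml.etree.ElementTree` module, reads each piece named on the slide. Nesting `STATE` inside `PAPER` is a hypothetical arrangement made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document combining the two fragments from the slide.
doc = '<PAPER track="Application"><STATE>Texas</STATE></PAPER>'

paper = ET.fromstring(doc)
tag_name = paper.tag               # the name between "<" and ">"
track = paper.get("track")         # the attribute value given in quotes
state = paper.find("STATE").text   # the data between start tag and end tag

# A missing end tag breaks the shared syntax, so the parser rejects it.
try:
    ET.fromstring("<STATE>Texas")
    well_formed = True
except ET.ParseError:
    well_formed = False
```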

Communication Difficulty #2

• Once both parties agree to the fundamental syntax, then both parties must next agree to the words to be used

• In the case of XML, how do both parties know that <STATE> means a political subdivision and not one of {gas,liquid,solid}?
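One mechanism that helps with this question is XML Namespaces, which the Standards slide later lists among the base specifications. The sketch below assumes Python's standard `xml.etree.ElementTree` module; the namespace URIs and the `record` element are hypothetical, and a namespace only identifies a vocabulary. The parties still have to agree on what that vocabulary means.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; each one names a vocabulary whose owners
# have agreed on what STATE means.
doc = """
<record xmlns:geo="http://example.org/geography"
        xmlns:phys="http://example.org/physics">
  <geo:STATE>Texas</geo:STATE>
  <phys:STATE>liquid</phys:STATE>
</record>
"""

ns = {
    "geo": "http://example.org/geography",
    "phys": "http://example.org/physics",
}

root = ET.fromstring(doc)
political = root.findtext("geo:STATE", namespaces=ns)  # political subdivision
physical = root.findtext("phys:STATE", namespaces=ns)  # state of matter
```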

A Date Gone Bad

• One evening in the hotel lobby bar, two young Italian men spent a while talking to an attractive Venezuelan girl...and her aunt

• They spoke Italian and she spoke Spanish, but they communicated passably

A Date Still Going Bad

• However, the aunt wanted to go up to her room with her niece

• The Italians wanted to take the young lady out dancing...

• So they asked her:

Oops

• What the boys said:

“Vuoi andare con noi ‘sta sera?”

• What the young lady needed to hear:

“Quisieras ir con nosotros esta tarde?”

Miscommunication

• Even though Italian and Spanish use the same sounds, the same grammar, and have a common ancestry in Latin, some words are different

• Unfortunately, the most common words in both languages are likely to be the most different

The Cost Of Data Differences

“NASA lost a $125 million Mars orbiter because one engineering team used metric units while another used English units for a key spacecraft operation...” CNN 9/30/99

XML “Words”

• HTML has a certain number of fixed tags - everyone knows what they are, but they can’t be augmented

• In XML, everyone can make up their own tags to suit their needs - but how do we avoid a Tower of CyberBabel?

Communication Difficulty #3

• Even when you agree to common tags, you still need to agree to a common understanding

• In XML, the Schema (now replacing the DTD) defines what tags are allowed to describe a particular collection of data

• For example, in the field of human relations, what is a “date”?

One DTD For A “Date”

• A woman thinks:

– Invitation – formal

– Dress-up – nicely

– Eat out – dinner with wine at nice restaurant

– Entertainment – see a movie

– Private moment – good night kiss

<!DOCTYPE date [
<!ELEMENT date (invitation, dress, meal, entertainment+, intimacy) >
<!ELEMENT invitation (#PCDATA) >
<!ELEMENT dress (#PCDATA) >
<!ELEMENT meal (#PCDATA) >
<!ELEMENT entertainment (#PCDATA) >
<!ELEMENT intimacy (#PCDATA) >
]>
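What the content model in this DTD enforces can be sketched by hand. Python's standard library does not validate against DTDs, so the example below is not a DTD validator: it rewrites the one content model as a regular expression over child element names, which is a simplification chosen for illustration. Element names are written in lowercase to match the sample documents on the following slides, since XML names are case-sensitive.

```python
import re
import xml.etree.ElementTree as ET

# The content model for a "date", rewritten by hand as a regular expression
# over the space-separated child element names ("+" means one or more).
DATE_MODEL = re.compile(r"invitation dress meal( entertainment)+ intimacy")

def matches_model(xml_text):
    """Return True if the root's children appear in the declared order."""
    root = ET.fromstring(xml_text)
    child_names = " ".join(child.tag for child in root)
    return DATE_MODEL.fullmatch(child_names) is not None

# The two sample documents from the slides that follow.
full_date = (
    "<date><invitation>Telephone call</invitation>"
    "<dress>Long dress</dress><meal>4-star restaurant</meal>"
    "<entertainment>the theatre</entertainment>"
    "<intimacy>A passionate, romantic kiss</intimacy></date>"
)
short_date = "<date><meal>six-pack of beer</meal><intimacy>necking</intimacy></date>"

full_ok = matches_model(full_date)
short_ok = matches_model(short_date)
```

The second document is well-formed XML but does not fit this content model, which is the point of the slide: both parties need the same definition before they can exchange a "date".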

A Woman’s View Of A “Date”

<date>

<invitation>Telephone call</invitation>

<dress>Long dress</dress>

<meal>4-star restaurant</meal>

<entertainment>the theatre</entertainment>

<intimacy>A passionate, romantic kiss</intimacy>

</date>

Another DTD For A “Date”

• A man thinks:

– Eat out – six-pack of beer

– Private moment – necking

<!DOCTYPE date [
<!ELEMENT date (meal, intimacy+) >
<!ELEMENT meal (#PCDATA) >
<!ELEMENT intimacy (#PCDATA) >
]>

A Man’s View Of A “Date”

<date>

<meal>six-pack of beer</meal>

<intimacy>necking</intimacy>

</date>

When Men And Women Agree

<date>

<invitation>Telephone call</invitation>

<dress>Long dress</dress>

<meal>4-star restaurant</meal>

<entertainment>the theatre</entertainment>

<intimacy>A passionate, romantic kiss</intimacy>

</date>

<date>

<invitation>Honking</invitation>

<dress>Not the shirt he changed the oil in</dress>

<meal>food and beer</meal>

<entertainment>rent a video</entertainment>

<intimacy>A passionate, romantic kiss while necking</intimacy>

</date>

The Four Stages Of XML Evolution

The Evolution Of Technology

• Creation of basic technology

• Growth of technical tools

• Conversion of technology into business applications - the penetration into verticals

• Reduction to commodity

#1 Creation Of The Basic Technology Of XML

Creation Of Basic Technology

• In 1998, the World Wide Web Consortium declared XML to be a “recommendation”, that is, a world-wide standard

• This phase began in 1990 with the creation of the Web and browsers, and is now substantially complete

#2 The Growth Of Technical Tools

Growth Of Technical Tools

• Once the underlying technology has been created, tools and utilities are built to use this technology

• These tools are often somewhat primitive and are not focused on the business problem

• This phase has been going furiously since 1998

The World Wide Web Consortium and XML

World Wide Web Consortium

• The World Wide Web Consortium was created in October 1994 to develop common protocols that promote the Web’s evolution and ensure its interoperability

• The W3C has more than 500 Member organizations from around the world

• The W3C has many roles

The Roles of the W3C

• Standards Body (XML and others)

• Software and Services

• Working Groups

• Initiatives

• Activities with other standards bodies

W3C and Standards

• XML

• XSL

• CSS1 & CSS2

• DOM

• HTML

• MathML

• PICS

• PNG

• RDF

• SMIL

• SVG

• XHTML

• XPath, XPointer, XML Base, XLink

• XML Schema

Standards

• XML (eXtensible Markup Language) is the universal format for structured documents and data on the Web. The base specifications are XML 1.0 Feb '98, and Namespaces, Jan '99.

Standards (Cont.)

• XSL (eXtensible Stylesheet Language)

– XSL is a language (in XML) for expressing stylesheets. It consists of two parts:

• XSL Transformations (XSLT): a language for transforming XML documents

• An XML vocabulary for specifying formatting semantics (XSL Formatting Objects)

Standards (Cont.)

• CSS (Cascading Style Sheets) CSS1 and CSS2 describe how documents are presented on screens, in print, or perhaps how they are pronounced

• Authors and readers can influence the presentation of documents without sacrificing device-independence or adding new HTML tags

Standards (Cont.)

• CSS3 is now a Working Draft

• The main purpose of CSS3 is to modularize the specification, so that dozens of changes don’t have to be “shove(d) ... into a single monolithic specification”

• Devices which are constrained (such as an aural browser) can choose to support only certain modules instead of all of CSS.

Why Two Style Sheet Languages?

Style Sheet Format          CSS    XSL
Can be used with HTML?      Yes    No
Can be used with XML?       Yes    Yes
Transformation language?    No     Yes
Syntax                      CSS    XML

Standards (Cont.)

• DOM (Document Object Model)

– a standard API to the document structure that aims to make it easy for programmers to access components of a document and delete, add or edit their content, attributes and style.

• HTML (HyperText Markup Language)

– The current language of the Internet, which is being redefined as XHTML 1.0
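The access, add, edit, and delete operations that the DOM bullet describes can be sketched with Python's standard `xml.dom.minidom` module, which offers a minimal implementation of the DOM interface. The `bill`, `amount`, and `note` elements are hypothetical names chosen for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical document; the element names are illustrative only.
dom = parseString('<bill><amount currency="USD">10.00</amount></bill>')
bill = dom.documentElement

# Access: find an element, then read its attribute and text content.
amount = bill.getElementsByTagName("amount")[0]
currency = amount.getAttribute("currency")
original_value = amount.firstChild.data

# Edit: change the text content and the attribute.
amount.firstChild.data = "12.50"
amount.setAttribute("currency", "EUR")

# Add: create a new element with text and append it to the root.
note = dom.createElement("note")
note.appendChild(dom.createTextNode("adjusted"))
bill.appendChild(note)

# Delete: remove the element again.
bill.removeChild(note)

result = bill.toxml()
```

Because the calls follow the DOM interface rather than a product-specific one, the same sequence of operations carries over to DOM implementations in other languages.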

Standards (Cont.)

• MathML (Mathematical Markup Language)

– provides a much needed foundation for the inclusion of mathematical expressions in Web pages.

• PICS – Platform for Internet Content Selection

– The PICS specification enables labels (metadata) to be associated with Internet content. It was originally designed to help parents and teachers control what children access on the Internet.

Standards (Cont.)

• PNG – Portable Network Graphics

– a patent-free replacement for GIF and many common uses of TIFF

• RDF – Resource Description Framework

– provides a lightweight metadata system to support the exchange of knowledge on the Web.

Standards (Cont.)

• SMIL – Synchronized Multimedia Integration Language

– for television-like multimedia on the Web

• SVG – Scalable Vector Graphics

– SVG is a language for describing two-dimensional graphics in XML

Standards (Cont.)

• XHTML – eXtensible HyperText Markup Language

– What is the difference between XHTML 1.0, XHTML Basic and XHTML 1.1?

• XHTML 1.0 = HTML 4.01 reformulated in XML

• XHTML Basic - subset for mobile apps

• XHTML 1.1 - modularized tags to help support other applications

Standards (Cont.)

• XPath, XPointer, XML Base, XLink

– define linking, pointers, base URIs, etc.

• XML Schema

– offers facilities for describing the structure and constraining the contents of XML 1.0 documents

– The major difference between DTDs and Schemas is that Schemas allow better data typing (and Schemas are in XML)

– Became a recommendation on May 2, 2001
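As a small illustration of the XPath item, the sketch below selects elements by path and attribute value. It assumes Python's standard `xml.etree.ElementTree` module, which supports only a limited subset of XPath, and the `policies`, `policy`, and `holder` elements are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; element and attribute names are illustrative only.
doc = ET.fromstring(
    "<policies>"
    '<policy type="auto"><holder>Smith</holder></policy>'
    '<policy type="home"><holder>Jones</holder></policy>'
    "</policies>"
)

# Path expression: holders of every policy whose type attribute is "auto".
auto_holders = [h.text for h in doc.findall("./policy[@type='auto']/holder")]
```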

Software and Services

• Amaya - W3C's Editor/Browser

– Amaya is a browser/authoring tool that allows you to publish documents on the Web.

– From http://www.w3.org/Amaya/

• CSS Validator - W3C CSS Validation Service

– At http://jigsaw.w3.org/css-validator/

Software and Services (cont.)

• HTML Tidy

– Tidy is a utility which is able to fix up a wide range of HTML problems.

– From http://www.w3.org/People/Raggett/tidy/

• HTML Validator

– It checks HTML documents for conformance to W3C HTML and XHTML Recommendations and other HTML standards.

– From http://validator.w3.org/

Software and Services (cont.)

• Jigsaw – W3C’s Java Server

– Jigsaw is W3C's leading-edge Web server platform, providing a sample HTTP 1.1 implementation on top of an advanced architecture implemented in Java. From http://www.w3.org/Jigsaw/

• Libwww

– Libwww is a highly modular, general-purpose client side Web API written in C for Unix and Windows (Win32). From http://www.w3.org/Library/

Working Groups

• CC/PP – Composite Capabilities/Preference Profiles

– Automating the way in which your agent (PC, cell phone, PDA) identifies its capabilities and preferences

• Device Independence Activity

– These Groups are working towards making the information of the World Wide Web accessible to various devices and achieving Web device independent authoring.

Working Groups (cont.)

• Internationalization Working Group and Internationalization Interest Group

– These groups promote the use of Unicode in other recommendations and activities

• Micropayments

– The Internet enables commerce in intangibles (like information), but conventional payment methods are too expensive for this

Working Groups (cont.)

• XForms - Interactive forms in XML

• XML Encryption - encrypting/decrypting XML documents and their contents

• XML Protocol - using XML as an encapsulation language in communications

• XML Query - enabling collections of XML files to be accessed like databases

Working Groups (cont.)

• Voice Browser Activity

– This group has created a number of working drafts, such as on a Speech Recognition Grammar and a Speech Synthesis Markup Language

– The W3C working group is basing its proposal for Dialog Markup Language on VoiceXML, from the VoiceXML Forum (www.voicexml.org), which is an IEEE group

Initiatives

• Web Accessibility Initiative (WAI)

– These guidelines explain how to make Web content accessible to people with disabilities

• P3P - Platform for Privacy Preferences

– P3P is an industry standard providing a simple, automated way for users to gain more control over the use of personal information on Web sites they visit.

Where Can I find...?

• Each of the preceding items can be found (today) at www.w3c.org

• Everyone should check here periodically to obtain updates

• Members can participate in projects and setting standards

• www.xml.com is a commercial site with a newsletter and a huge amount of educational material

#3 Conversion Of Technology Into Business Applications

XML In The Verticals

• The next step in the evolution of XML is the integration of XML objects into the processes of “verticals”, e.g., insurance, telecommunications, banking, finance, etc.

• In each vertical, groups will come together to create standards for that vertical

• This phase is just beginning in most verticals

The Insurance Vertical

• ACORD (www.acord.org) is a well-known body in the insurance vertical

• ACORD, the Association for Cooperative Operations Research and Development, describes itself as “the insurance industry's nonprofit standards developer”

• ACORD initially developed standard forms to enable information sharing in the vertical

ACORD And P&C

• “In the Property and Casualty business, the main driver to the Internet is the real-time exchange of data between producers, carriers, rating bureaus, service providers, and more.”

• “The ACORD XML standard is designed to address the real-time requirement by defining P&C transactions that include both a request and a response message.”

• from http://www.acord.org/xml_frame.htm

#4 Reduction To Commodity

Reduction To Commodity

• In the last phase, the “technology” disappears from the view of the user

• Older technologies are invisibly replaced with the newer technology, e.g. EDI by XML

• Users perform business-oriented tasks without being aware of underlying technology

Past Progressions - Example #1

• #1 - Computer chips

• #2 - assembler

• #3 - COBOL, Fortran, PL/I, C, and a host of 3rd generation languages

• #4 - GUI-based code generators

• We are now well into phase #4

Past Progressions - Example #2

• #1 - Laser printer

• #2 - FDL (Xerox), PPFA (IBM), etc.

• #3 - Business-user friendly composition and formatting tools

• #4 - GUI-based products with multiple, transparent drivers

• We are now in phase #4

The Growth Of The XML Bubble

[Diagram, repeated over several builds. Labels: Policy Print, Reports, 1:1 Mark., Billing, EDI, Compliance, Campaign Manage., CRM, Pol. & Proc., Archive, Notices, New Sales, HR, Reprints. Later builds add the labels “XML Bubble” and “EBPP”.]

Today’s Billing Process

[Diagram labels: Billing Extract, Print/Format, DataBase, Post Process]

Today’s Billing Process + XML

[Diagram labels: Billing Extract, Print/Format, DataBase, Post Process, XML App.]

As the Bubble Grows

[Diagram labels: Billing Extract, Print/Format, DataBase, Post Process, XML App.]

[Diagram labels: XML Applications with business rules; Driver (four instances); Email]

Composition Systems Before XML - #1

[Diagram labels: Database, Business Rules, Composition]

Composition Systems Before XML - #2

[Diagram labels: Database, Business Rules, Composition]

[Diagram labels: Composition; XML Applications with business rules; Driver (three instances); Email; Business Rules]

The Effect on Complex Systems

• Over time, simple tools became complex systems

• Due to competition, these systems added functionality beyond the core product

• The XML Bubble will cause these systems to split again

• Much of the added functionality was and will be vertically specific, and fall into the XML Bubble

Reference

• www.w3c.org - the official World Wide Web Consortium site (you’ll find links to the XML spec here)

• http://www.w3.org/XML/ - a long but not exhaustive list of XML sites, software, and information

• “Taming The Web With XML” - an entry level article describing XML at http://www.mhe-consulting.com/writep1.html

William J. “Bill” McCalpin

MIT, LIT, CDIA, EDP

Principal, MHE

1400 Cheyenne Dr.

Richardson, Texas 75080-3921

972-231-3660 (v) 972-690-4521 (f)
