
Moving from Unstructured Documents to Structured XML

Date post: 14-Dec-2014
Upload: scott-abel
Description:
Presented by Thomas Aldous at Documentation and Training West, May 6-9, 2008 in Vancouver, BC.

Have you thought about converting to XML, but were afraid it was too difficult? Have you talked to consultants who make the process seem long and expensive? Wondering if you should adopt a standard like DITA or go it alone?

Well, if you have a laptop, Adobe FrameMaker 7.2 or Adobe FrameMaker 8, and some sample unstructured documents (Word or FrameMaker), we'll walk through the steps that it takes to convert Word and FrameMaker files to XML, using both a custom DTD and DITA. We will also edit those documents with some of the industry’s leading XML editors.

This session is all about getting you started without the hype. Whether you own FrameMaker or not, this session is a good starting place for those thinking of making the move to structured documentation.
Thomas Aldous, Integrated Technologies, Inc. [email protected]
Transcript
Page 1: Moving from Unstructured Documents to Structured XML

Thomas Aldous, Integrated Technologies, Inc.

[email protected]

Page 2: Moving from Unstructured Documents to Structured XML

Import Non Unstructured FrameMaker Documents into Structured FrameMaker

Clean Documents before structuring

Learn conversion rule syntax

Generate conversion table

Learn how to create conversion table from scratch

Structure current unstructured document

Structure group of unstructured files

Structure unstructured book

What Does It Take to Export XML in FrameMaker

Parts of Application File

Page 3: Moving from Unstructured Documents to Structured XML

From Structured FrameMaker, open Non Unstructured FrameMaker Document

Take the default filter and choose Convert

If filter fails, open source document in native application and backward save to a lesser version.

Retry Open in Structured FrameMaker

Page 4: Moving from Unstructured Documents to Structured XML

With document open in Structured FrameMaker, choose File, Save As – MIF

Close document. Reopen MIF document in Structured FrameMaker

Save document as FM (Binary FrameMaker)

Hidden characters have been cleansed.

Page 5: Moving from Unstructured Documents to Structured XML

Method One: Manually – Element by Element

Page 6: Moving from Unstructured Documents to Structured XML

Two Ways to Wrap Unstructured Data

Start by wrapping subparagraph objects like text ranges and tables

Then wrap contents of paragraphs together in Para elements

Then wrap sequences of Head and Para elements in Section elements

And so on until entire document is wrapped in single highest-level element

Page 7: Moving from Unstructured Documents to Structured XML

Method Two: Automatically

Page 8: Moving from Unstructured Documents to Structured XML

Two Ways to Wrap Unstructured Data

Similar to adding structure manually

FrameMaker 7.x and 8 begin by applying rules to document objects below paragraph level

Then applies rules at paragraph level

Proceeds through successively higher levels

Stops when it reaches single highest-level element or when no more rules can be applied
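The pass order described above can be sketched as a loop that runs lower-level passes first and repeats until one root element remains or nothing changes. This is a minimal Python illustration, not FrameMaker's implementation; the tuple node shape and the `wrap_sections` and `wrap_document` passes are hypothetical stand-ins for conversion rules.

```python
def wrap_sections(nodes):
    """Hypothetical pass: wrap each Head and the Paras that follow it in a Section."""
    out = []
    for node in nodes:
        tag = node[0]
        if tag == "Head":
            out.append(("Section", [node]))
        elif tag == "Para" and out and out[-1][0] == "Section":
            out[-1][1].append(node)
        else:
            out.append(node)
    return out


def wrap_document(nodes):
    """Hypothetical pass: wrap a run of Sections in one top-level Document."""
    if len(nodes) > 1 and all(node[0] == "Section" for node in nodes):
        return [("Document", list(nodes))]
    return nodes


def add_structure(nodes, passes):
    """Apply passes lowest level first; stop at a single root or when nothing changes."""
    changed = True
    while changed and len(nodes) > 1:
        changed = False
        for apply_pass in passes:
            new_nodes = apply_pass(nodes)
            if new_nodes != nodes:
                nodes, changed = new_nodes, True
    return nodes


# Each node is (tag, children); leaves have an empty child list.
flat = [("Head", []), ("Para", []), ("Para", []), ("Head", []), ("Para", [])]
result = add_structure(flat, [wrap_sections, wrap_document])
root_tag, sections = result[0]
```

Here the flat Head/Para run ends up as two Section nodes under one Document node, and the loop exits because a single top-level node remains.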

Page 9: Moving from Unstructured Documents to Structured XML

Usually created by application developer

Provides table of mappings to automate task of adding structure to unstructured documents

Uses paragraph and character tags, and object types (such as equations or footnotes), to identify how to wrap document components in elements

Also specifies how to wrap child elements in parent elements

Page 10: Moving from Unstructured Documents to Structured XML

Regular table, with at least 3 columns and 1 body row

Additional columns and heading/footing rows can hold comments

Each body row holds 1 rule

Column 1: Specifies document object, child element, or sequence to wrap

Column 2: Specifies element in which to wrap

Column 3: Specifies optional qualifier (“nickname”) to use as temporary label.

Page 11: Moving from Unstructured Documents to Structured XML

Wrap this object In this element With this qualifier

P:Bullet E:Item Bull

• Conversion table can be split up into several tables with text or graphics in between for comments

• Cannot have any tables other than conversion tables

• Must be saved before it can be used

• Can be in structured or unstructured document
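One way to picture a rule row such as `P:Bullet E:Item Bull` is as a three-field record. The sketch below is only an illustration in Python; the `Rule` class and `parse_row` helper are hypothetical names, not part of FrameMaker.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    wrap: str                        # Column 1: object, child element, or sequence
    element: str                     # Column 2: element in which to wrap
    qualifier: Optional[str] = None  # Column 3: optional temporary label


def parse_row(cells):
    """Build a Rule from the first three cells of a body row; extra cells are comments."""
    if len(cells) < 2 or not cells[0].strip() or not cells[1].strip():
        raise ValueError("a rule needs at least columns 1 and 2")
    element = cells[1].strip()
    if element.startswith("E:"):     # the E: identifier is optional in Column 2
        element = element[2:]
    qualifier = cells[2].strip() if len(cells) > 2 else ""
    return Rule(cells[0].strip(), element, qualifier or None)


rule = parse_row(["P:Bullet", "E:Item", "Bull", "comment: bulleted list items"])
plain = parse_row(["P:Body", "Para"])
```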

Page 12: Moving from Unstructured Documents to Structured XML

Method One: FrameMaker 7.x and 8 Generates Initial Table

Page 13: Moving from Unstructured Documents to Structured XML

Use if you already have an unstructured document

FrameMaker 7.x and 8 looks through body page flows and identifies every object that can be structured

Lists object type and format tag (if any) used in document

Maps object to element

Element tag named same as format tag

If object does not have format, element tag is default name such as CELL or BODY

Removes parentheses and other characters to create valid element tag

Object type identifier in lowercase is prepended to any duplicate element tag

Developer should add additional rules to wrap elements in higher-level elements
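The naming behavior listed above can be approximated in a few lines of Python. This is a sketch under assumptions: the exact set of characters FrameMaker strips is not given in the slides, so the regular expression below is a guess, and `generate_tags` is a hypothetical helper. The default names come from the generated tables shown in this deck.

```python
import re

# Default element tags for objects without a format tag (names as they appear
# in the generated conversion tables in this deck).
DEFAULT_TAGS = {"TT": "TITLE", "TH": "HEADING", "TB": "BODY", "TF": "FOOTING",
                "TR": "ROW", "TC": "CELL", "G": "GRAPHIC"}


def generate_tags(objects):
    """Return one element tag per (type_code, format_tag) pair, in order."""
    used, tags = set(), []
    for code, fmt in objects:
        if fmt:
            # Assumption: keep letters, digits, underscore, period, and hyphen.
            tag = re.sub(r"[^A-Za-z0-9_.-]", "", fmt)
        else:
            tag = DEFAULT_TAGS.get(code, code)
        if tag in used:
            tag = code.lower() + tag  # prepend lowercase type code to duplicates
        used.add(tag)
        tags.append(tag)
    return tags


tags = generate_tags([
    ("P", "Head1"),
    ("P", "Code"),
    ("SV", "Current Date (Long)"),
    ("C", "Code"),
    ("TC", ""),
])
```

With these inputs, the character format `Code` collides with the paragraph format of the same name and becomes `cCode`.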

Page 14: Moving from Unstructured Documents to Structured XML

Method Two: Create Conversion Table from Scratch

Page 15: Moving from Unstructured Documents to Structured XML

Wrap this object In this element With this qualifier

P:Head1                     Head1
P:Head2                     Head2
P:Body                      Body
P:Code                      Code
SV:Current Date \(Long\)    CurrentDateLong
C:Code                      cCode
TC:                         CELL
TR:                         ROW

Page 16: Moving from Unstructured Documents to Structured XML

1. Open document with objects you want to structure in Structured FrameMaker Interface

2. From File menu, choose Developer Tools->Generate Conversion Table (FrameMaker 7.x), or from StructureTools menu, choose Generate Conversion Table (FrameMaker 8)

3. Generate Conversion Table dialog box appears

4. Turn on Generate New Conversion Table

5. Click Generate

NoName conversion table appears with rules based on objects in document and element tags based on format tags

Page 17: Moving from Unstructured Documents to Structured XML

Wrap this object    In this element    With this qualifier

P:ReportTitle       ReportTitle
P:ReportAuthor      ReportAuthor
P:ReportPurpose     ReportPurpose
P:BodyAfterHead     ReportTitle
TH:                 HEADING
TB:                 BODY
TF:                 FOOTING
TR:                 ROW
P:Body              Body
P:Heading1          Heading1
P:Heading2          Heading2
P:Extract           Extract
TC:                 CELL
P:Equation          Equation
G:                  GRAPHIC
P:Heading3          Heading3
P:Bulleted          Bulleted
P:BulletedCont      BulletedCont
F:flow              FOOTNOTE
TT:                 TITLE
T:FormatA           FormatA
P:HeadingRunIn      HeadingRunIn
P:Numbered1         Numbered1
P:Numbered          Numbered
P:NumberedCont      NumberedCont
P:TableTitle        TableTitle
P:CellBody          CellBody
P:Figure            Figure

Page 18: Moving from Unstructured Documents to Structured XML

Create new document and insert table with at least three columns and one body row

Can add extra columns for comments

Can add extra heading and footing rows for comments

Can break table into multiple smaller tables

Insert one rule in each row

Save file

Page 19: Moving from Unstructured Documents to Structured XML

Updating a conversion table to get a more complete list of objects in current document

Page 20: Moving from Unstructured Documents to Structured XML

Open document with objects you want to structure

From File menu, choose Developer Tools->Generate Conversion Table (FrameMaker 7.x), or from StructureTools menu, choose Generate Conversion Table (FrameMaker 8)

Generate Conversion Table dialog box appears

Turn on Update Conversion Table

From Update Conversion Table popup menu, choose conversion table to update

Selected table may have been generated from a different document

Click Generate

Selected conversion table is updated

Page 21: Moving from Unstructured Documents to Structured XML

Case-Sensitivity in Tags

Special character (%) in Tags

A space character in Tags

Wildcard character (%) in Tags

Page 22: Moving from Unstructured Documents to Structured XML

Format tags and element tags are case-sensitive and must be specified as defined in their catalogs

Qualifier tags are case-sensitive and two occurrences of one qualifier must match exactly

Page 23: Moving from Unstructured Documents to Structured XML

( ) & | , * + ? % [ ] : \

In format tags and qualifier tags allowed but must be preceded by backslash (\) in table

In element tags not allowed

Page 24: Moving from Unstructured Documents to Structured XML

A space character does not need to be preceded with backslash

For example, you can write tag Format A
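The escaping rule for format and qualifier tags can be sketched as a small helper: every special character gets a leading backslash, and spaces pass through unchanged. `escape_tag` is a hypothetical name used only for this Python illustration.

```python
# Characters that must be preceded by a backslash in format and qualifier tags.
SPECIAL_CHARS = set("()&|,*+?%[]:\\")


def escape_tag(tag):
    """Return the tag as it would be typed in a conversion table cell."""
    return "".join("\\" + ch if ch in SPECIAL_CHARS else ch for ch in tag)


escaped_variable = escape_tag("Current Date (Long)")
escaped_format = escape_tag("Format A")
```

The first call yields `Current Date \(Long\)`, the form used in the sample tables, while `Format A` needs no change.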

Page 25: Moving from Unstructured Documents to Structured XML

Use % as wildcard in format or element tag to match zero, one, or more characters (similar to * in general rule)

For example, P:%Body matches paragraphs with format tag Body, FirstBody, or BulletBody
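To make the wildcard behavior concrete, the sketch below translates a tag pattern into a Python regular expression: an unescaped `%` matches any run of characters, a backslash makes the next character literal, and matching stays case-sensitive. The function names are hypothetical, and this only approximates what FrameMaker does internally.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a tag pattern where % is a wildcard and backslash escapes a character."""
    parts, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
        elif ch == "%":
            parts.append(".*")                       # zero, one, or more characters
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return re.compile("".join(parts))


def tag_matches(pattern, tag):
    """True if the whole tag matches the pattern (case-sensitive)."""
    return wildcard_to_regex(pattern).fullmatch(tag) is not None


matches = [tag for tag in ["Body", "FirstBody", "BulletBody", "body", "BodyText"]
           if tag_matches("%Body", tag)]
```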

Page 26: Moving from Unstructured Documents to Structured XML

In Column 1:
◦ Type one- or two-letter code to identify type of item
◦ Type format (optional) to narrow definition
◦ Without format tag, FrameMaker 7.x and 8 finds all objects of specified type not identified in other rules

Page 27: Moving from Unstructured Documents to Structured XML

For this object     Use this code    Followed by optional
Paragraph           P:               Paragraph format tag
Text range          C:               Character format tag
Table               T:               Table format tag
Table title         TT:              (none)
Table heading       TH:              (none)
Table body          TB:              (none)
Table row           TR:              (none)
Table cell          TC:              (none)
System variable     SV:              Variable format name
User variable       UV:              Variable format name
Graphic             G:               (none)
Footnote            F:               Location of footnote: Table or Flow
Marker              M:               Marker type
Cross-reference     X:               Cross-reference format
Text inset          TI:              (none)
Equation            Q:               Size of equation: Small, Medium, or Large
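The code table above lends itself to a lookup. As a rough Python sketch, a Column 1 cell can be split at the first colon into an object type and an optional format, with backslash escapes removed. `parse_object` is a hypothetical helper; `TF` is included because it appears in the generated table earlier in the deck, even though the code table does not list it.

```python
import re

OBJECT_CODES = {
    "P": "Paragraph", "C": "Text range", "T": "Table", "TT": "Table title",
    "TH": "Table heading", "TB": "Table body", "TF": "Table footing",
    "TR": "Table row", "TC": "Table cell", "SV": "System variable",
    "UV": "User variable", "G": "Graphic", "F": "Footnote", "M": "Marker",
    "X": "Cross-reference", "TI": "Text inset", "Q": "Equation",
}


def parse_object(cell):
    """Split a Column 1 cell into (object type, format or None)."""
    code, sep, fmt = cell.strip().partition(":")
    if not sep or code not in OBJECT_CODES:
        raise ValueError(f"unknown object code in {cell!r}")
    fmt = re.sub(r"\\(.)", r"\1", fmt.strip())  # drop the escaping backslashes
    return OBJECT_CODES[code], fmt or None


variable = parse_object("SV:Current Date \\(Long\\)")
cell = parse_object("TC:")
```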

Page 28: Moving from Unstructured Documents to Structured XML

Wrap this object In this element With this qualifier

P:Body

C:ReportName

T:Format Part

TT:

TH:

TB:

TR:

TC:

SV:Current Date \(Long\)

UV:Customer

G:

F:Flow

M:Index

X:ElemNumTextPage

TI:

Q:Large

Page 29: Moving from Unstructured Documents to Structured XML

Type the object identifier E: (optional) in Column 2 followed by the element tag.

Page 30: Moving from Unstructured Documents to Structured XML

Wrap this object            In this element    With this qualifier

P:Body                      Para
C:ReportName                Report
T:Format Part               PartsTable
TT:                         TableTitle
TH:                         TableHeading
TB:                         TableBody
TR:                         PartsRow
TC:                         PartName
SV:Current Date \(Long\)    Date
UV:Customer                 Customer
G:                          Graphic
F:Flow                      Footnote
M:Index                     IndexEntry
X:ElemNumTextPage           XRef
TI:                         Para
Q:Large                     EQ

Page 31: Moving from Unstructured Documents to Structured XML

Type qualifier (optional) for new element tag in Column 3

The qualifier is used in later rules to differentiate elements of same name when wrapping into higher-level elements

Page 32: Moving from Unstructured Documents to Structured XML

Wrap this object In this element With this qualifier

P:Bullet          Item    Bull
P:Step Restart    Item    Step1
P:Step            Item    Step

Page 33: Moving from Unstructured Documents to Structured XML

Type the following into Column One:

Type E: for element
Type element tag
Type qualifier (optional) in brackets
Add more element tags with code identifiers
Use symbols to further describe sequence (same as in general rule in EDD)

Page 34: Moving from Unstructured Documents to Structured XML

Rule Syntax Identifying Sequence to Wrap

Symbol               Meaning
Plus sign (+)        Item is required and can occur more than once
Question mark (?)    Item is optional and can occur once
Asterisk (*)         Item is optional and can occur more than once
Comma (,)            Items must occur in order given
Ampersand (&)        Items can occur in any order
Vertical bar (|)     Any one of items in sequence can occur
Parentheses ()       Beginning and end of sequence

Page 35: Moving from Unstructured Documents to Structured XML

Wrap this object In this element With this qualifier

P:Bullet                        Item       Bull
P:StepRestart                   Item       Step1
P:Step                          Item       Step
E:Item[Bull]+                   List
E:Item[Step1], E:Item[Step]+    List
E:Head, (Para | List)+          Section
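The sequence rules in this table can be tried out with a small Python sketch. It maps every (element tag, qualifier) pair to a single character, turns the rule into a regular expression, and wraps each match in the parent element. This is only an approximation under stated assumptions: `Node` and `wrap_sequences` are hypothetical names, every item in a rule string is written as a full `E:tag[qualifier]` token, a token without a qualifier only matches elements without one, and the ampersand (any order) symbol is not supported.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    tag: str
    qualifier: Optional[str] = None
    children: List["Node"] = field(default_factory=list)


# One rule token: optional E: prefix, element tag, optional [qualifier].
TOKEN = re.compile(r"(?:E:)?(\w+)(?:\[(\w+)\])?")


def wrap_sequences(nodes, rule, parent):
    """Wrap every run of sibling nodes that matches the rule in a new parent node."""
    symbols = {}

    def symbol(tag, qualifier):
        # Give each (tag, qualifier) pair its own private-use character.
        key = (tag, qualifier)
        if key not in symbols:
            symbols[key] = chr(0xE000 + len(symbols))
        return symbols[key]

    encoded = "".join(symbol(n.tag, n.qualifier) for n in nodes)
    pattern = TOKEN.sub(lambda m: symbol(m.group(1), m.group(2)), rule)
    pattern = pattern.replace(",", "").replace(" ", "")  # comma = in order given

    out, pos = [], 0
    for match in re.finditer(pattern, encoded):
        if match.start() == match.end():
            continue  # skip empty matches that * or ? can produce
        out.extend(nodes[pos:match.start()])
        out.append(Node(parent, children=nodes[match.start():match.end()]))
        pos = match.end()
    out.extend(nodes[pos:])
    return out


nodes = [Node("Head"), Node("Item", "Bull"), Node("Item", "Bull"),
         Node("Item", "Step1"), Node("Item", "Step"), Node("Item", "Step"),
         Node("Para")]
nodes = wrap_sequences(nodes, "E:Item[Bull]+", "List")
nodes = wrap_sequences(nodes, "E:Item[Step1], E:Item[Step]+", "List")
nodes = wrap_sequences(nodes, "E:Head, (E:Para | E:List)+", "Section")
```

The qualifiers are what keep the bulleted items and the numbered steps in separate List elements, even though both use the Item tag.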

Page 36: Moving from Unstructured Documents to Structured XML

Attributes are optional in Column 2

Type attribute name and value in brackets after element tag

Separate name and value with equal sign, and enclose value in double quotation marks
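As an illustration of the attribute syntax, a Column 2 cell could look like `Item[Type="Bullet"]`. That example cell, the attribute name `Type`, and the `parse_element_cell` helper are all hypothetical; the Python sketch also assumes that each attribute sits in its own pair of brackets, which the slides do not confirm.

```python
import re

# name="value" inside one pair of brackets
ATTRIBUTE = re.compile(r'\[\s*([^=\]\s]+)\s*=\s*"([^"]*)"\s*\]')


def parse_element_cell(cell):
    """Split a Column 2 cell into (element tag, {attribute name: value})."""
    cell = cell.strip()
    if cell.startswith("E:"):  # optional object identifier
        cell = cell[2:]
    tag = cell.split("[", 1)[0].strip()
    return tag, dict(ATTRIBUTE.findall(cell))


with_attribute = parse_element_cell('Item[Type="Bullet"]')
without_attribute = parse_element_cell("E:Para")
```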

Page 37: Moving from Unstructured Documents to Structured XML

Wrap this object                In this element    With this qualifier

P:Bullet                        Item               Bull
P:StepRestart                   Item               Step1
P:Step                          Item               Step
E:Item[Bull]+                   List
E:Item[Step1], E:Item[Step]+    List
E:Head, (Para | List)+          Section

Page 38: Moving from Unstructured Documents to Structured XML

Allows you to name table element from one or more child elements

Rather than naming it from table format tag (with T: identifier)

Type the following in Column One:
◦ Type object identifier TE:
◦ Followed by E:
◦ Followed by element tag
◦ Type qualifier (optional) in brackets

Wrap this object               In this element    With this qualifier

TE:E:TableTitle,E:TableBody    PartsTable

Page 39: Moving from Unstructured Documents to Structured XML

Allows you to break table or graphic out of its paragraph and promote it one level

When user adds structure to document, table or graphic becomes child of paragraph with anchor

FrameMaker 7.x and 8 can break table or graphic out of its paragraph and promote element to be sibling of paragraphs

Type element tag for table or graphic and add keyword promote in parentheses after element tag

Wrap this object    In this element              With this qualifier

T:Format A          ProcedureTable (promote)

Page 40: Moving from Unstructured Documents to Structured XML

Allows you to tag:

Instances when Paragraph or Character Designer was used to make formatting changes without saving to catalog format

In Column One:

◦ Add rule flag paragraph format overrides

◦ Add rule flag character format overrides

Adds attribute called Override with value Yes

Wrap this object                   In this element    With this qualifier

flag paragraph format overrides

flag character format overrides

Page 41: Moving from Unstructured Documents to Structured XML

Allows you to tag:

Instances when commands from Font, Size, and Style submenus in Format menu were used and not character format at all

To wrap untagged formatted text:

In Column 1, add rule untagged character formatting

In Column 2, add element tag

Adds catchall element and wraps text in it

Wrap this object                 In this element    With this qualifier

untagged character formatting    UntaggedText

Page 42: Moving from Unstructured Documents to Structured XML

1. Open saved conversion table file.

2. Open unstructured document.

3. In unstructured document, import element definitions from existing structured template or EDD

Makes elements available in Element Catalog

If you do not perform this step, next steps produce elements in Element Catalog defined by rules specified in conversion table

Can always import element definitions after generating structure

Page 43: Moving from Unstructured Documents to Structured XML

4. From File menu in unstructured document, choose Utilities->Structure Current Document

Structure Current Document dialog box appears

5. From Conversion Table Document popup menu, choose saved conversion table file

6. Click Add Structure

New NoName document appears with content wrapped into elements as defined in rules of conversion table

7. Validate, correct errors, save file

Page 44: Moving from Unstructured Documents to Structured XML

1. Place group of files in separate directory

If directory contains other files not needing structuring, give files to be structured unique extension

2. Open saved conversion table file

3. From File menu, choose Utilities->Structure Documents

Structure Documents dialog box appears

4. From Conversion Table Document popup menu, choose saved conversion table file

5. In Input Unstructured Files text box, type directory containing unstructured files or choose from Browse

Page 45: Moving from Unstructured Documents to Structured XML

6. Optionally, if files have unique extension, in Suffix text box, type extension

Otherwise, all files in directory will be structured

7. In Output Structured Files text box, type directory for saving structured files or choose from Browse

8. Turn on Allow Existing Files to Be Overwritten

As you add structure to documents, resulting files might have same names as some existing files in directory specified for storing structured files

When on, overwrites older versions

When off, skips over files with existing matching filenames and presents log file
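The suffix filter and the overwrite option can be modeled outside FrameMaker to see which files a batch run would touch. The Python sketch below is a dry run only; `plan_batch`, the directory layout, and the `.fm` suffix are assumptions made for the example, and no structuring is performed.

```python
import tempfile
from pathlib import Path


def plan_batch(input_dir, output_dir, suffix="", allow_overwrite=False):
    """Return (files to structure, names skipped because output already exists)."""
    to_structure, skipped = [], []
    for src in sorted(Path(input_dir).iterdir()):
        if not src.is_file():
            continue
        if suffix and not src.name.endswith(suffix):
            continue  # with a suffix set, other files in the directory are ignored
        dest = Path(output_dir) / src.name  # output keeps the input filename
        if dest.exists() and not allow_overwrite:
            skipped.append(src.name)        # would be reported in the log file
        else:
            to_structure.append((src, dest))
    return to_structure, skipped


with tempfile.TemporaryDirectory() as tmp:
    src_dir, out_dir = Path(tmp, "in"), Path(tmp, "out")
    src_dir.mkdir()
    out_dir.mkdir()
    for name in ("ch1.fm", "ch2.fm", "notes.txt"):
        (src_dir / name).write_text("")
    (out_dir / "ch1.fm").write_text("")  # an older structured copy already exists

    planned, skipped = plan_batch(src_dir, out_dir, suffix=".fm")
    planned_names = [src.name for src, _ in planned]

    forced, forced_skipped = plan_batch(src_dir, out_dir, suffix=".fm",
                                        allow_overwrite=True)
    forced_names = [src.name for src, _ in forced]
```

With overwriting off, `ch1.fm` is skipped because a file of that name already exists in the output directory; with it on, both `.fm` files are processed.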

Page 46: Moving from Unstructured Documents to Structured XML

9. Click Add Structure

Structured Documents Report log file appears, indicating progress

“Operation completed normally” alert appears

10. Click OK to dismiss alert

Structured files appear in output directory with filenames matching those in input directory.

11. Open each file and import element definitions from any existing structured template or EDD

Makes elements in Element Catalog match those in structured template or EDD

12. Validate, correct errors, save files.

Page 47: Moving from Unstructured Documents to Structured XML

To structure book and all its component files:

1. Open saved conversion table file

2. Open unstructured book

3. In unstructured book, import element definitions from any structured template or EDD

Makes elements available in Element Catalog

If you do not perform this step, next steps produce elements in Element Catalog defined by rules specified in conversion table

Can always import element definitions after generating structure

4. From File menu in book, choose Utilities-> Structure->Current Book

Structure Book dialog box appears

5. From Conversion Table Document popup menu, choose saved conversion table file

6. In Output Directory text box, type directory for saving structured files or choose from Browse

Page 48: Moving from Unstructured Documents to Structured XML

7. Turn on Allow Existing Files to Be Overwritten

As you add structure to documents, resulting files might have same names as some existing files in specified directory for storing structured files

When on, overwrites older versions

When off, skips over files with existing matching filenames and presents log file

8. Click Add Structure

Structured book and files appear in output directory with filenames matching those in input directory

9. Validate, correct errors, save

Page 49: Moving from Unstructured Documents to Structured XML

File, SaveAs XML or StructureTools, Convert Structured Documents

But … Setup Files Need to Be Set Up First

Page 50: Moving from Unstructured Documents to Structured XML

Application Name
DTD
ReadWriteRules
Template
DocType
XSLT Preferences
Conditional Text Handling

