XML Basics

Wednesday May 12, 1999 SD99

Copyright 1999 Elliotte Rusty Harold

elharo@metalab.unc.edu

http://metalab.unc.edu/xml/slides/

What is XML?

• Extensible Markup Language

• A syntax for documents

• A Meta-Markup Language

• A Structural and Semantic language, not a formatting language

• Not just for Web pages

XML is a Meta Markup Language

• Not like HTML, troff, LaTeX

• Make up the tags you need as you need them

• The tags you create can be documented in a Document Type Definition (DTD)

• A meta syntax for domain-specific markup languages like MusicML, MathML, and CML

XML describes structure and semantics, not formatting

• XML documents form a tree

• Element and attribute names reflect the kind of the element

• Formatting can be added with a style sheet

A Song Description in HTML

<dt>Hot Cop
<dd> by Jacques Morali, Henri Belolo, and Victor Willis

<ul>
<li>Producer: Jacques Morali
<li>Publisher: PolyGram Records
<li>Length: 6:20
<li>Written: 1978
<li>Artist: Village People
</ul>

A Song Description in XML

<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
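Because the element names carry the meaning, a program can pull out exactly the fields it wants. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module (not something the slides themselves use); the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The song description from the slide, as a string.
SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""

song = ET.fromstring(SONG_XML)

# Each kind of data is addressed by name, not by its position in a list.
title = song.findtext("TITLE")
composers = [c.text for c in song.findall("COMPOSER")]

print(title, composers)
```

The HTML version offers no comparable handle: the composers are only recoverable by parsing the free text after "by".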

Style Sheets provide formatting

SONG {display: block}
TITLE {display: block; font-family: Helvetica, serif; font-size: 20pt; font-weight: bold}
COMPOSER {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-style: italic}
ARTIST {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt; font-weight: bold; font-style: italic}
PUBLISHER {display: block; font-size: 14pt; font-family: Times, Times New Roman, serif}
LENGTH {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}
YEAR {display: block; font-family: Times, Times New Roman, serif; font-size: 14pt}

Attaching style sheets to documents

• Processing Instruction

<?xml-stylesheet type="text/css" href="song.css"?>

• Converter Program

What is XML used for?

• Domain-Specific Markup Languages

• Self-Describing Data

• Interchange of Data Among Applications

• Structured and Integrated Data

Domain-Specific Markup Languages

• Non-proprietary format

• Don’t pay for what you don’t use

Self-Describing Data

• Much data is lost due to format problems

• XML is very simple

• XML is self-describing

• XML is well documented

<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>
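As a rough illustration of what "self-describing" buys a program, the sketch below reads the record above by element and attribute name. It assumes Python's standard xml.etree.ElementTree; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

PERSON_XML = """<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>"""

person = ET.fromstring(PERSON_XML)

# Attributes are read with get(); nested elements with a simple path.
person_id = person.get("ID")
full_name = person.findtext("NAME/GIVEN") + " " + person.findtext("NAME/SURNAME")
born = person.findtext("BIRTH/DATE")
died = person.findtext("DEATH/DATE")

print(person_id, full_name, born, died)
```

The two DATE elements are told apart by their parents (BIRTH and DEATH), which is the tree structure doing the describing.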

Interchange of Data Among Applications

• E-commerce

• Syndication

Structured and Integrated Data

• Can specify relationships between elements

• Can assemble data from multiple sources

XML Applications

• A specific markup language that uses the XML meta-syntax is called an XML application

• Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax

• Further syntax can be layered on top of this; e.g. data typing through DCDs or other schemas

Example XML Applications

• Web Pages

• Mathematical Equations

• Music Notation

• Vector Graphics

• Metadata

• and more…

Mathematical Markup Language

Channel Definition Format

<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>

Classic Literature

• The Complete Plays of Shakespeare

• The Bible

• The Koran

• The Book of Mormon

Vector Graphics

• Vector Markup Language (VML)

– Internet Explorer 5.0

– Microsoft Office 2000

• Scalable Vector Graphics (SVG)

The Resource Description Framework (RDF)

• Meta-data

• Dublin Core

• Better Web searching

An Example of RDF

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>
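The rdf: and dc: prefixes are shorthand for namespace URIs, and a program has to look elements up by those URIs. A minimal sketch, assuming Python's standard xml.etree.ElementTree and a copy of the example in which every attribute value is quoted on both sides so that it parses; the NS mapping is an illustrative helper, not part of RDF.

```python
import xml.etree.ElementTree as ET

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>"""

# Prefix-to-URI mapping used only for lookups in this script.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/DC/",
}

root = ET.fromstring(RDF_XML)
description = root.find("rdf:Description", NS)

resource = description.get("about")
creator = description.findtext("dc:CREATOR", namespaces=NS)
page_title = description.findtext("dc:TITLE", namespaces=NS)

print(resource, creator, page_title)
```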

XML for XML

• XSL: The Extensible Stylesheet Language

• DCD: The Document Content Description Schema Language

• XLL: The Extensible Linking Language

XSL: The Extensible Stylesheet Language

• XSL Transformations

• XSL Formatting Objects

DCD: The Document Content Description Schema Language

• Data Typing in XML is Weak

• <MONTH>9</MONTH>

<DCD>
  <ElementDef Type="MONTH" Model="Data" Datatype="i1" Min="1" Max="12" />
</DCD>
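Without a schema language, the range check that the DCD fragment expresses has to live in application code. The sketch below is a hand-written stand-in for that check, not a DCD processor; the function name month_is_valid and its parameters are hypothetical, and the parser is Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET


def month_is_valid(xml_text, minimum=1, maximum=12):
    """Return True if xml_text is a MONTH element holding an integer in range."""
    element = ET.fromstring(xml_text)
    try:
        value = int(element.text)
    except (TypeError, ValueError):
        # Empty element or non-numeric text such as "September".
        return False
    return element.tag == "MONTH" and minimum <= value <= maximum


print(month_is_valid("<MONTH>9</MONTH>"))
print(month_is_valid("<MONTH>13</MONTH>"))
```

To the XML parser all three of 9, 13, and September are just text; only the extra layer can tell them apart.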

XLL: The Extensible Linking Language

• Any element can be a link

• Links can be bi-directional

• Links can be separated from the documents they connect

<footnote xlink:form="simple" href="footnote7.xml">7</footnote>

File Formats, In-house applications, and other behind the scenes uses

• Microsoft Office 2000

• Federal Express Web API

• Netscape What’s Related

Hello XML

<?xml version="1.0" standalone="yes"?>
<FOO>Hello XML!</FOO>

• Plain ASCII or UTF-8 text

• .xml is standard file extension

• Any standard text editor will work

The XML Declaration

• version attribute

– required

– always has the value 1.0

• standalone attribute

– yes

– no

• encoding attribute

– UTF-8

– ISO-8859-1

– etc.

<?xml version="1.0" standalone="yes"?>

The FOO element

• Start tag <FOO>

• Contents "Hello XML!"

• End tag </FOO>

<FOO>Hello XML!</FOO>

greeting.xml

<?xml version="1.0" standalone="yes"?>
<GREETING>Hello XML!</GREETING>

Style sheets

• Separate from the XML document

• Different Languages

– Cascading Style Sheets Level 1 (CSS1): Internet Explorer 5.0, Mozilla 5.0

– Cascading Style Sheets Level 2 (CSS2): Internet Explorer 5 (partial), Mozilla 5.0 (partial)

– Extensible Stylesheet Language (XSL): Internet Explorer 5.0 (older draft, buggy); LotusXSL, XT, other non-browser converters

– Document Style Semantics and Specification Language (DSSSL): Jade

xml-stylesheet

• Style sheets are attached via an xml-stylesheet processing instruction in the prolog

<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="greeting.css"?>

<GREETING>Hello XML!</GREETING>

– type attribute has the value text/css or text/xsl

– href attribute is a URL to the stylesheet, possibly relative

• Can also use non-browser converters like XT, LotusXSL, and Jade

greeting.css

GREETING {display: block; font-size: 24pt; font-weight: bold}

A larger example: Baseball statistics

• Examine the data

• Design a vocabulary for the data

• Write a style sheet

Sample statistics

http://cbs.sportsline.com/u/baseball/mlb/stats.htm

Organizing the Data

• XML documents are trees.

• XML elements contain other elements as well as text

• Within these limits there's more than one way to organize the data

– Hierarchically

– Relationally

– Objects

What is the Root Element?

• The League?

• The Season?

• A custom Document element?

The Root Element

<?xml version="1.0"?>
<SEASON>
</SEASON>

• Choose SEASON for the root element

• Everything else will be a descendant of SEASON

• This is not the only possible choice

What are the Immediate Children of the Root?

• Leagues?

• Teams?

• Players?

• Games?

Child Elements

<?xml version="1.0"?>
<SEASON>
  <YEAR> 1998 </YEAR>
</SEASON>

White space in XML is not especially significant

<?xml version="1.0"?>

<SEASON><YEAR>1998</YEAR></SEASON>
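One way to see what "not especially significant" means in practice: the parser still hands the white space to the application, which is free to discard it. A small sketch, assuming Python's standard xml.etree.ElementTree; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

spaced = ET.fromstring("<SEASON> <YEAR> 1998 </YEAR></SEASON>")
compact = ET.fromstring("<SEASON><YEAR>1998</YEAR></SEASON>")

# The parser reports the text exactly as written, spaces included.
raw_year = spaced.findtext("YEAR")

# The application decides that surrounding white space carries no meaning.
same_year = raw_year.strip() == compact.findtext("YEAR")

print(repr(raw_year), same_year)
```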

Leagues

• Major league baseball is divided into two leagues

• Each league has

– a name

– three divisions

Divisions

• Each division has

– name

– 4-6 teams

Teams

• Each team has

– Name

– City

– Players

Player Data

• Each player has

– First name

– Last name

– Position

– Statistics
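The season, league, division, team, and player hierarchy described so far maps directly onto nested elements. The sketch below builds one branch of that tree with Python's standard xml.etree.ElementTree. The element names are borrowed from the style sheet and DTD slides later in the deck; the player values are placeholders, not real statistics.

```python
import xml.etree.ElementTree as ET

season = ET.Element("SEASON")
ET.SubElement(season, "YEAR").text = "1998"

league = ET.SubElement(season, "LEAGUE")
ET.SubElement(league, "LEAGUE_NAME").text = "National"

division = ET.SubElement(league, "DIVISION")
ET.SubElement(division, "DIVISION_NAME").text = "East"

team = ET.SubElement(division, "TEAM")
ET.SubElement(team, "TEAM_CITY").text = "Atlanta"
ET.SubElement(team, "TEAM_NAME").text = "Braves"

# Placeholder player: the values below are made up for illustration.
player = ET.SubElement(team, "PLAYER")
for tag, value in [("GIVEN_NAME", "Example"),
                   ("SURNAME", "Player"),
                   ("POSITION", "Shortstop")]:
    ET.SubElement(player, tag).text = value

xml_text = ET.tostring(season, encoding="unicode")
print(xml_text)
```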

Player Batting Statistics

• G Games Played

• GS Games Started

• AB At Bats

• R Runs

• H Hits

• 2B Doubles

• 3B Triples

• HR Home Runs

• RBI Runs Batted In

• SB Stolen Bases

• CS Caught Stealing

• SH Sacrifice Hits

• SF Sacrifice Flies

• Err Errors

• PB Pitcher Balked

• BB Base on Balls (Walks)

• SO Strike Outs

• HBP Hit By Pitch

What does a player look like?

• Long names vs. short names

The Complete 1998 Major League

• Long version

A Style Sheet

• 1998shortstats.xml

• baseballstats.css

• <?xml-stylesheet type="text/css" href="baseballstats.css"?>

• styled1998shortstats.xml

Cascading Style Sheets

• Partially supported by Mozilla and IE 5.0

• Full W3C Recommendation

The Default Rule

• Not every element needs a rule

• The root element should be at least display: block

SEASON { font-size: 14pt; background-color: white; color: black; display: block}

A style rule for the YEAR element

• Make it look like a title

YEAR { display: block; font-size: 32pt; font-weight: bold; text-align: center}

Style Rules for Division and League Names

LEAGUE_NAME { display: block; text-align: center; font-size: 28pt; font-weight: bold}

DIVISION_NAME { display: block; text-align: center; font-size: 24pt; font-weight: bold}

Alternate Style Rules for Division and League Names

LEAGUE_NAME, DIVISION_NAME { display: block; text-align: center; font-weight: bold}
LEAGUE_NAME {font-size: 28pt }
DIVISION_NAME {font-size: 24pt }

Style Rules for Teams

• Team name and Team city must be one title

• Must be inline elements

• Previous and following must be block elements

TEAM_CITY { font-size: 20pt; font-weight: bold; font-style: italic}

TEAM_NAME { font-size: 20pt; font-weight: bold; font-style: italic}

TEAM, PLAYER {display: block}

Style Rules for Players

TEAM {display: table}
TEAM_CITY {display: table-caption}
TEAM_NAME {display: table-caption}
PLAYER {display: table-row}

SURNAME, GIVEN_NAME, POSITION, GAMES, GAMES_STARTED, AT_BATS, RUNS, HITS, DOUBLES, TRIPLES, HOME_RUNS, RBI, STEALS, CAUGHT_STEALING, SACRIFICE_HITS, SACRIFICE_FLIES, ERRORS, WALKS, STRUCK_OUT, HIT_BY_PITCH {display: table-cell}

Finished Style Sheet

SEASON {font-size: 14pt; background-color: white; color: black; display: block}
YEAR {display: block; font-size: 32pt; font-weight: bold; text-align: center}
LEAGUE_NAME {display: block; text-align: center; font-size: 28pt; font-weight: bold}
DIVISION_NAME {display: block; text-align: center; font-size: 24pt; font-weight: bold}
TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM {display: block}
PLAYER {display: block}

Possible Extensions

• There should be captions like "RBI" or "At Bats."

• Derived numbers like batting averages are not included.

• The titles are short. E.g. "1998" instead of "1998 Major League Baseball".

• The document is so long it's hard to read. Something similar to IE5's collapsible outline view would be nice.

• Pitcher stats should be separated from batter stats.

Possible Solutions

• CSS Level 2

• XSL

• XSL + JavaScript

Well-formedness Rules

• Open and close all tags

• Empty tags end with />

• There is a unique root element

• Elements may not overlap

• Attribute values are quoted

• < and & are only used to start tags and entities

• Only the five predefined entity references are used
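Any XML parser enforces these rules and refuses to continue at the first violation. The sketch below, assuming Python's standard xml.etree.ElementTree, wraps that behavior in a hypothetical helper named is_well_formed and tries it on a few small samples written for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts text as a well-formed document."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


samples = {
    "closed tag": "<P>Hello</P>",
    "unclosed tag": "<P>Hello",
    "empty tag": "<BR/>",
    "two roots": "<A/><B/>",
    "overlap": "<B><I>Hello</B></I>",
    "unquoted attribute": "<A HREF=x>link</A>",
    "bare ampersand": "<H1>O'Reilly & Associates</H1>",
}

for label, text in samples.items():
    print(label, is_well_formed(text))
```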

Open and close all tags

Empty tags end with />

• <BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>

• Web browsers deal inconsistently with these

• Can use <BR></BR> <HR></HR> <IMG></IMG> instead

There is a unique root element

• One element completely contains all other elements of the document

• In HTML files, this is the HTML element

• XML Declaration is not an element

<?xml version="1.0" standalone="yes"?>
<GREETING>Hello XML!</GREETING>

Elements may not overlap

• If an element contains a start tag for an element, it must also contain the corresponding end tag

• Empty elements may appear anywhere

• Every non-root element has a parent element

Attribute values are quoted

• Good:

– <A HREF="http://metalab.unc.edu/xml/">

• Bad:

– <A HREF=http://metalab.unc.edu/xml/>

< and & are only used to start tags and entities

• Good: <H1>O'Reilly &amp; Associates</H1>

• Bad: <H1>O'Reilly & Associates</H1>

• Good: <CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>

• Bad: <CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
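When text is generated by a program, the escaping is best left to a library routine. A minimal sketch, assuming Python's standard xml.sax.saxutils.escape (which replaces &, <, and >) and xml.etree.ElementTree for the round trip; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

code = "for (int i = 0; i <= args.length; i++ ) {"

# escape() turns the raw "<" into "&lt;" so it cannot be mistaken for a tag.
escaped = escape(code)
document = "<CODE>" + escaped + "</CODE>"

# Parsing the document gives back the original text unchanged.
roundtrip = ET.fromstring(document).text

print(escaped)
print(escape("O'Reilly & Associates"))
```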

Only the five predefined entity references are used

• Good:

– &amp;

– &lt;

– &gt;

– &quot;

– &apos;

• Bad:

– &copy;

– &reg;

– &tm;

– &alpha;

– &eacute;

– &nbsp;

– etc.
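A quick way to check this rule is to hand each kind of reference to a parser. The sketch below assumes Python's standard xml.etree.ElementTree and a hypothetical helper named parses. It also shows a numeric character reference, which XML allows without any declaration and which is one way to write characters such as the copyright sign in a document that has no DTD.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the parser accepts text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


predefined = "<P>&amp; &lt; &gt; &quot; &apos;</P>"
undeclared = "<P>&copy; 1999</P>"   # HTML entity, not declared in this document
numeric = "<P>&#169; 1999</P>"      # numeric character reference for the same sign

print(parses(predefined), parses(undeclared), parses(numeric))
print(ET.fromstring(predefined).text)
```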

DTDs and Validity

• A Document Type Definition describes the elements and attributes that may appear in a document

• Validation compares a particular document against a DTD

• Well-formedness is a prerequisite for validity

What is a DTD?

• a list of the elements, tags, attributes, and entities contained in a document, and their relationship to each other

• internal vs. external DTDs

The importance of validation

• Ensures that data is correct before feeding it into a program

• Ensures that a format is followed

• Establishes what must be supported

• Not all documents need to be valid; sometimes well-formed is enough

A DTD for greeting.xml

• greeting.xml:

<?xml version="1.0"?>
<GREETING>Hello XML!</GREETING>

• greeting.dtd:

<!ELEMENT GREETING (#PCDATA)>

Document Type Declarations

<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd">

<GREETING>Hello XML!</GREETING>

• specifies the root element

• gives a URL for the DTD

Invalid Documents

• Valid:

<GREETING>various random text but no markup</GREETING>

• Invalid: anything else, including

<GREETING>
  <sometag>various random text</sometag>
  <someEmptyTag/>
</GREETING>

– or

<GREETING>
  <GREETING>various random text</GREETING>
</GREETING>
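The parsers in Python's standard library check well-formedness only and do not validate against a DTD, so the sketch below imitates by hand the single rule that greeting.dtd imposes: the root is GREETING and it contains text but no child elements. The function valid_greeting is a hypothetical stand-in for illustration, not a general validator.

```python
import xml.etree.ElementTree as ET


def valid_greeting(text):
    """Hand-written check equivalent to <!ELEMENT GREETING (#PCDATA)>."""
    root = ET.fromstring(text)  # raises ParseError if not well-formed
    # (#PCDATA) allows character data only, so no child elements at all.
    return root.tag == "GREETING" and len(root) == 0


good = "<GREETING>various random text but no markup</GREETING>"
bad_children = ("<GREETING><sometag>various random text</sometag>"
                "<someEmptyTag/></GREETING>")
bad_nested = "<GREETING><GREETING>various random text</GREETING></GREETING>"

print(valid_greeting(good), valid_greeting(bad_children), valid_greeting(bad_nested))
```

All three samples are well-formed; only the first would be valid. A real validating parser derives this check from the DTD instead of having it hard-coded.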

Validating Tools

• Command line programs like XJParse

• Online validators

– http://www.stg.brown.edu/service/xmlvalid/

– http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html

• Browsers

Element Declarations

• Each element must be declared in a <!ELEMENT> declaration.

• A <!ELEMENT> declaration gives the name and content model of the element

• The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element

Content Specifications

• ANY

• #PCDATA

• Sequences

• Choices

• Mixed Content

• Modifiers

• Empty

ANY

<!ELEMENT SEASON ANY>

• A SEASON can contain any child element and/or raw text (parsed character data)

#PCDATA

<!ELEMENT YEAR (#PCDATA)>

• Parsed Character Data; i.e. raw text, no markup

#PCDATA

• Valid:

<YEAR>1999</YEAR>

<YEAR>99</YEAR>

<YEAR>1999 C.E.</YEAR>

<YEAR>
  The year of our Lord one thousand, nine hundred, and ninety-nine
</YEAR>

• Invalid:

<YEAR>
  <MONTH>January</MONTH>
  <MONTH>February</MONTH>
  <MONTH>March</MONTH>
  <MONTH>April</MONTH>
  <MONTH>May</MONTH>
  <MONTH>June</MONTH>
  <MONTH>July</MONTH>
  <MONTH>August</MONTH>
  <MONTH>September</MONTH>
  <MONTH>October</MONTH>
  <MONTH>November</MONTH>
  <MONTH>December</MONTH>
</YEAR>

Child Elements

• To declare that a LEAGUE element must have a LEAGUE_NAME child:

<!ELEMENT LEAGUE (LEAGUE_NAME)>

<!ELEMENT LEAGUE_NAME (#PCDATA)>

Sequences

• Separate multiple required child elements with commas; e.g.

<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>

<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>

One or More Children +

<!ELEMENT DIVISION_NAME (#PCDATA)>

<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>

Zero or More Children *

<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>

<!ELEMENT TEAM_CITY (#PCDATA)>

<!ELEMENT TEAM_NAME (#PCDATA)>

Zero or One Children ?

<!ELEMENT PLAYER (GIVEN_NAME, SURNAME, POSITION, GAMES, GAMES_STARTED,
  AT_BATS?, RUNS?, HITS?, DOUBLES?, TRIPLES?, HOME_RUNS?, RBI?, STEALS?,
  CAUGHT_STEALING?, SACRIFICE_HITS?, SACRIFICE_FLIES?, ERRORS?, WALKS?,
  STRUCK_OUT?, HIT_BY_PITCH?, WINS?, LOSSES?, SAVES?, COMPLETE_GAMES?,
  SHUT_OUTS?, ERA?, INNINGS?, EARNED_RUNS?, HIT_BATTER?, WILD_PITCHES?,
  BALK?, WALKED_BATTER?, STRUCK_OUT_BATTER?)>

Finished DTD

Choices

<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>

<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>

Grouping With Parentheses

• Parentheses combine several elements into a single element.

• A parenthesized element can be nested inside other parentheses in place of a single element.

• The parenthesized element can be suffixed with a plus sign, an asterisk, or a question mark.

<!ELEMENT dl (dt, dd)*>

<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
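Content models read much like regular expressions over the sequence of child element names. As a rough illustration of that analogy only, the sketch below translates (dt, dd)* by hand into a Python regular expression and tests two small dl elements against it. The translation and the helper children_match are assumptions made for this example; a validating parser does not work this way internally.

```python
import re
import xml.etree.ElementTree as ET

# (dt, dd)* rewritten by hand: each child name is followed by a comma.
DL_MODEL = re.compile(r"(dt,dd,)*")


def children_match(element, model):
    """Return True if the element's child names, in order, match the model."""
    names = "".join(child.tag + "," for child in element)
    return model.fullmatch(names) is not None


good = ET.fromstring("<dl><dt>XML</dt><dd>A meta-markup language</dd></dl>")
bad = ET.fromstring("<dl><dt>XML</dt><dt>SGML</dt></dl>")
empty = ET.fromstring("<dl></dl>")

print(children_match(good, DL_MODEL),
      children_match(bad, DL_MODEL),
      children_match(empty, DL_MODEL))
```

The empty dl matches because the asterisk permits zero repetitions of the dt, dd pair.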

Mixed Content

• Both #PCDATA and child elements in a choice

<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>

• #PCDATA must come first

• #PCDATA cannot be used in a sequence

Empty elements

<!ELEMENT BR EMPTY>

<!ELEMENT IMG EMPTY>

<!ELEMENT HR EMPTY>

Internal DTDs

<?xml version="1.0"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>

Internal DTD Subsets

<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>

• Internal declarations override external declarations

Programming with XML

• Java works best

• C, Perl, Python etc. can also be used

• Unicode support is the biggest issue

SAX, the Simple API for XML

• Event based

• Programs can plug in different parsers
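"Event based" means the parser calls back into the program as it meets each start tag, run of text, and end tag, rather than building a tree. The sketch below uses the SAX binding in Python's standard library (xml.sax) as one illustration; the class name EventLogger and its fields are hypothetical.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records the events the parser reports while reading a document."""

    def __init__(self):
        super().__init__()
        self.tags = []   # ("start" or "end", element name) in document order
        self.text = []   # character data, possibly delivered in several pieces

    def startElement(self, name, attrs):
        self.tags.append(("start", name))

    def characters(self, content):
        self.text.append(content)

    def endElement(self, name):
        self.tags.append(("end", name))


handler = EventLogger()
xml.sax.parseString(b"<GREETING>Hello XML!</GREETING>", handler)

print(handler.tags)
print("".join(handler.text))
```

The handler never sees the parser itself, which is what lets a program swap one SAX parser for another.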

The Document Object Model (DOM)
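In contrast to an event-based API, the DOM loads the whole document into a tree that can be walked and changed in memory. A minimal sketch, assuming the DOM implementation in Python's standard library (xml.dom.minidom); the LEAGUE element added at the end is only an illustration.

```python
from xml.dom.minidom import parseString

document = parseString("<SEASON><YEAR>1998</YEAR></SEASON>")
root = document.documentElement

# Navigate the tree: the YEAR element's first child is a text node.
year = root.getElementsByTagName("YEAR")[0]
year_text = year.firstChild.data

# Modify the tree in memory, then serialize it again.
league = document.createElement("LEAGUE")
root.appendChild(league)
serialized = root.toxml()

print(root.tagName, year_text)
print(serialized)
```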

To Learn More: Books

• XML: Extensible Markup Language

– IDG Books 1998

– ISBN 0-76453-199-9

• The XML Bible

– IDG Books 1999

– ISBN 0-76453-236-7

Questions?

