Intro to XML
Monday, May 10, 1999 SD99
Copyright 1999 Elliotte Rusty Harold
http://metalab.unc.edu/xml/slides/
What is XML?
• Extensible Markup Language
• A syntax for documents
• A Meta-Markup Language
• A Structural and Semantic language, not a formatting language
• Not just for Web pages
XML is a Meta-Markup Language
• Not like HTML, troff, LaTeX
• Make up the tags you need as you need them
• The tags you create can be documented in a Document Type Definition (DTD)
• A meta syntax for domain-specific markup languages like MusicML, MathML, and CML
XML describes structure and semantics, not formatting
• XML documents form a tree
• Element and attribute names reflect the kind of the element
• Formatting can be added with a style sheet
A Song Description in HTML
<dt>Hot Cop<dd> by Jacques Morali, Henri Belolo, and Victor Willis
<ul><li>Producer: Jacques Morali<li>Publisher: PolyGram Records<li>Length: 6:20<li>Written: 1978<li>Artist: Village People</ul>
A Song Description in XML
<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>
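Because the XML version names each piece of data, a program can pick fields out by name instead of guessing from layout. A minimal sketch using Python's standard xml.etree.ElementTree module (the choice of Python and the variable names are illustrative assumptions, not part of the original slides):

```python
import xml.etree.ElementTree as ET

SONG_XML = """<SONG>
  <TITLE>Hot Cop</TITLE>
  <COMPOSER>Jacques Morali</COMPOSER>
  <COMPOSER>Henri Belolo</COMPOSER>
  <COMPOSER>Victor Willis</COMPOSER>
  <PRODUCER>Jacques Morali</PRODUCER>
  <PUBLISHER>PolyGram Records</PUBLISHER>
  <LENGTH>6:20</LENGTH>
  <YEAR>1978</YEAR>
  <ARTIST>Village People</ARTIST>
</SONG>"""

song = ET.fromstring(SONG_XML)

# Child elements are looked up by name, not by position in the page.
title = song.findtext("TITLE")
composers = [composer.text for composer in song.findall("COMPOSER")]

print(title, "by", ", ".join(composers))
```

The same lookup against the HTML version would have to rely on the order of the list items, which is exactly the fragility the slide is pointing at.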
Style Sheets provide formatting
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <html>
      <head><title>Song</title></head>
      <body>
        <xsl:value-of select="."/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Attaching style sheets to documents
• <?xml-stylesheet type="text/xsl" href="song1.xsl"?>
• C:\> java -Dcom.jclark.xsl.sax.parser=com.jclark.xml.sax.CommentDriver com.jclark.xsl.sax.Driver hotcop.xml song1.xsl hotcop.html
• C:\> xt hotcop.xml xtsong1.xsl hotcop.html
Templates for Other Elements
<xsl:template match="/">
  <html>
    <head><title>Song</title></head>
    <body>
      <xsl:apply-templates/>
    </body>
  </html>
</xsl:template>
<xsl:template match="TITLE">
  <h1><xsl:value-of select="."/></h1>
</xsl:template>
Style Sheets can be quite complex
<xsl:template match="SONG"> <h1><xsl:value-of select="TITLE"/> by the <xsl:value-of select="ARTIST"/></h1>
<ul> <li>Length: <xsl:value-of select="LENGTH"/> </li>
<li>Producer: <xsl:value-of select="PRODUCER"/> </li>
<li>Publisher: <xsl:value-of select="PUBLISHER"/> </li>
<li>Year: <xsl:value-of select="YEAR"/> </li> <xsl:apply-templates select="COMPOSER"/> </ul></xsl:template>
What is XML used for?
• Domain-Specific Markup Languages
• Self-Describing Data
• Interchange of Data Among Applications
• Structured and Integrated Data
Domain-Specific Markup Languages
• Non proprietary format
• Don’t pay for what you don’t use
Self-Describing Data
• Much data is lost due to format problems
• XML is very simple
• XML is self-describing
• XML is well documented
<PERSON ID="p1100" SEX="M">
  <NAME>
    <GIVEN>Judson</GIVEN>
    <SURNAME>McDaniel</SURNAME>
  </NAME>
  <BIRTH>
    <DATE>21 Feb 1834</DATE>
  </BIRTH>
  <DEATH>
    <DATE>9 Dec 1905</DATE>
  </DEATH>
</PERSON>
Interchange of Data Among Applications
• E-commerce
• Syndication
Structured and Integrated Data
• Can specify relationships between elements
• Can assemble data from multiple sources
XML Applications
• A specific markup language that uses the XML meta-syntax is called an XML application
• Different XML applications have their own more constricted syntaxes and vocabularies within the broader XML syntax
• Further syntax can be layered on top of this; e.g. data typing through DCDs or other schemas
Example XML Applications
• Web Pages
• Mathematical Equations
• Music Notation
• Vector Graphics
• Metadata
• and more…
Mathematical Markup Language
Channel Definition Format
<?xml version="1.0"?>
<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/tradeshows.html">
    <TITLE>Trade shows and conferences about XML</TITLE>
  </ITEM>
  <ITEM HREF="http://metalab.unc.edu/xml/lists.htm">
    <TITLE>Mailing Lists dedicated to XML</TITLE>
  </ITEM>
</CHANNEL>
Classic Literature
• The Complete Plays of Shakespeare
• The Bible
• The Koran
• The Book of Mormon
Vector Graphics
• Vector Markup Language (VML)
– Internet Explorer 5.0
– Microsoft Office 2000
• Scalable Vector Graphics (SVG)
The Resource Description Framework (RDF)
• Meta-data
• Dublin Core
• Better Web searching
An Example of RDF
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/DC/">
  <rdf:Description about="http://metalab.unc.edu/xml/">
    <dc:CREATOR>Elliotte Rusty Harold</dc:CREATOR>
    <dc:TITLE>Cafe con Leche</dc:TITLE>
  </rdf:Description>
</rdf:RDF>
XML for XML
XML for XMLXML for XML
• XSL: The Extensible Stylesheet Language
• DCD: The Document Content Description Schema Language
• XLL: The Extensible Linking Language
XSL: The Extensible Stylesheet Language
• XSL Transformations
• XSL Formatting Objects
DCD: The Document Content Description Schema Language
• Data Typing in XML is Weak
• <MONTH>9</MONTH>
<DCD> <ElementDef Type="MONTH" Model="Data" Datatype="i1" Min="1" Max="12" /></DCD>
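To a plain XML parser the 9 in <MONTH>9</MONTH> is just text; the range and integer type that the DCD declares would otherwise have to be enforced by the application. A rough sketch of that hand-written check in Python (the function name parse_month is hypothetical, and this is not a DCD processor):

```python
import xml.etree.ElementTree as ET


def parse_month(xml_text: str) -> int:
    """Return the MONTH value as an int, enforcing the 1..12 range by hand."""
    element = ET.fromstring(xml_text)
    if element.tag != "MONTH":
        raise ValueError(f"expected MONTH, got {element.tag}")
    # The parser only hands back text; int() raises ValueError if it is not numeric.
    value = int((element.text or "").strip())
    if not 1 <= value <= 12:
        raise ValueError(f"month out of range: {value}")
    return value


print(parse_month("<MONTH>9</MONTH>"))
```

A schema language moves this kind of check out of every application and into one shared declaration.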
XLL: The Extensible Linking Language
• Any element can be a link
• Links can be bi-directional
• Links can be separated from the documents they connect
<footnote xlink:form="simple" href="footnote7.xml">7</footnote>
File Formats, in-house applications, and other behind the scenes uses
• Microsoft Office 2000
• Federal Express Web API
• Netscape What’s Related
Hello XML
<?xml version="1.0" standalone="yes"?>
<FOO>Hello XML!</FOO>
• Plain ASCII or UTF-8 text
• .xml is standard file extension
• Any standard text editor will work
The XML Declaration
• version attribute
– required
– always has the value 1.0
• standalone attribute
– yes
– no
• encoding attribute
– UTF-8
– ISO-8859-1
– etc.
<?xml version="1.0" standalone="yes"?>
The FOO element
• Start tag <FOO>
• Contents "Hello XML!"
• End tag </FOO>
<FOO>Hello XML!</FOO>
greeting.xml
<?xml version="1.0" standalone="yes"?>
<GREETING>Hello XML!</GREETING>
Style sheets
• Separate from the XML document
• Different Languages
– Cascading Style Sheets Level 1 (CSS1): Internet Explorer 5.0, Mozilla 5.0
– Cascading Style Sheets Level 2 (CSS2): Internet Explorer 5 (partial), Mozilla 5.0 (partial)
– Extensible Style Language (XSL): Internet Explorer 5.0 (older draft, buggy); LotusXSL, XT, other non-browser converters
– Document Style Semantics and Specification Language (DSSSL): Jade
xml-stylesheet
• Style sheets are attached via an xml-stylesheet processing instruction in the prolog
<?xml version="1.0" standalone="yes"?>
<?xml-stylesheet type="text/css" href="greeting.css"?>
<GREETING>Hello XML!</GREETING>
• Can also use non-browser converters like XT, LotusXSL, and Jade
greeting.css
GREETING {display: block; font-size: 24pt; font-weight: bold}
greeting.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <html>
      <body>
        <h1><xsl:value-of select="GREETING"/></h1>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
Attaching a style sheet to an XML document
• xml-stylesheet processing instruction after the XML declaration and before the root element
• type attribute has the value text/css or text/xsl
• href attribute is a URL to the stylesheet, possibly relative
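The processing instruction is ordinary document content that a program can read. As a sketch, Python's standard xml.dom.minidom exposes prolog processing instructions as children of the document node (the variable names here are illustrative assumptions):

```python
from xml.dom import Node, minidom

GREETING_XML = (
    '<?xml version="1.0" standalone="yes"?>'
    '<?xml-stylesheet type="text/css" href="greeting.css"?>'
    "<GREETING>Hello XML!</GREETING>"
)

document = minidom.parseString(GREETING_XML)

# Keep only processing instructions whose target is xml-stylesheet.
# node.data holds everything after the target: the type and href pseudo-attributes.
stylesheet_pis = [
    node.data
    for node in document.childNodes
    if node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
    and node.target == "xml-stylesheet"
]

print(stylesheet_pis)
```

Note that type and href look like attributes but are just text inside the instruction; the parser does not split them apart.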
A larger example: Baseball statistics
• Examine the data
• Design a vocabulary for the data
• Write a style sheet
Sample statistics
http://cbs.sportsline.com/u/baseball/mlb/stats.htm
Organizing the Data
• XML documents are trees.
• XML elements contain other elements as well as text
• Within these limits there's more than one way to organize the data
– Hierarchically
– Relationally
– Objects
What is the Root Element
• The League?
• The Season?
• A custom Document element?
The Root Element
<?xml version="1.0"?>
<SEASON></SEASON>
• Choose SEASON for the root element
• Everything else will be a descendant of SEASON
• This is not the only possible choice
What are the Immediate Children of The root?
• Leagues?
• Teams?
• Players?
• Games?
Child Elements
<?xml version="1.0"?>
<SEASON>
  <YEAR> 1998 </YEAR>
</SEASON>
White space in XML is not especially significant
<?xml version="1.0"?>
<SEASON><YEAR>1998</YEAR></SEASON>
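A parser still reports the white space inside an element; whether it matters is up to the application. A small sketch in Python's xml.etree.ElementTree (the helper name year_of is hypothetical) showing that the spaced-out and compact documents yield the same year once the text is stripped:

```python
import xml.etree.ElementTree as ET

SPACED = "<SEASON>\n  <YEAR>\n    1998\n  </YEAR>\n</SEASON>"
COMPACT = "<SEASON><YEAR>1998</YEAR></SEASON>"


def year_of(xml_text: str) -> str:
    # findtext() returns the raw character data, white space included;
    # stripping it is the application's decision, not the parser's.
    return ET.fromstring(xml_text).findtext("YEAR").strip()


print(year_of(SPACED) == year_of(COMPACT))
```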
Leagues
• Major league baseball is divided into two leagues
• Each league has
– a name
– three divisions
Divisions
• Each division has
– name
– 4-6 teams
Teams
• Each team has
– Name
– City
– Players
Player Data
• Each player has
– First name
– Last name
– Position
– Statistics
Player Batting Statistics
• G Games Played
• GS Games Started
• AB At Bats
• R Runs
• H Hits
• 2B Doubles
• 3B Triples
• HR Home Runs
• RBI Runs Batted In
• SB Stolen Bases
• CS Caught Stealing
• SH Sacrifice Hits
• SF Sacrifice Flies
• Err Errors
• PB Pitcher Balked
• BB Base on Balls (Walks)
• SO Strike Outs
• HBP Hit By Pitch
What does a player look like
• Long names vs. short names
The Complete 1998 Major League
• Long version
A Style Sheet
• 1998shortstats.xml
• baseballstats.css
• <?xml-stylesheet type="text/css" href="baseballstats.css"?>
• styled1998shortstats.xml
Cascading Style Sheets
• Partially supported by Mozilla and IE 5.0
• Full W3C Recommendation
The Default Rule
• Not every element needs a rule
• The root element should be at least display: block
SEASON { font-size: 14pt; background-color: white; color: black; display: block}
A style rule for the YEAR element
• Make it look like a title
YEAR { display: block; font-size: 32pt; font-weight: bold; text-align: center}
Style Rules for Division and League Names
LEAGUE_NAME { display: block; text-align: center; font-size: 28pt; font-weight: bold}
DIVISION_NAME { display: block; text-align: center; font-size: 24pt; font-weight: bold}
Alternate Style Rules for Division and League Names
LEAGUE_NAME, DIVISION_NAME { display: block; text-align: center; font-weight: bold}
LEAGUE_NAME {font-size: 28pt }
DIVISION_NAME {font-size: 24pt }
Style Rules for Teams
• Team name and Team city must be one title
• Must be inline elements
• Previous and following must be block elements
TEAM_CITY { font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME { font-size: 20pt; font-weight: bold; font-style: italic}
TEAM, PLAYER {display: block}
Style Rules for Players
TEAM {display: table}
TEAM_CITY {display: table-caption}
TEAM_NAME {display: table-caption}
PLAYER {display: table-row}
SURNAME, GIVEN_NAME, POSITION, GAMES, GAMES_STARTED, AT_BATS, RUNS, HITS, DOUBLES, TRIPLES, HOME_RUNS, RBI, STEALS,CAUGHT_STEALING, SACRIFICE_HITS, SACRIFICE_FLIES, ERRORS, WALKS, STRUCK_OUT, HIT_BY_PITCH {display: table-cell}
Finished Style Sheet
SEASON {font-size: 14pt; background-color: white; color: black; display: block}
YEAR {display: block; font-size: 32pt; font-weight: bold; text-align: center}
LEAGUE_NAME {display: block; text-align: center; font-size: 28pt; font-weight: bold}
DIVISION_NAME {display: block; text-align: center; font-size: 24pt; font-weight: bold}
TEAM_CITY {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM_NAME {font-size: 20pt; font-weight: bold; font-style: italic}
TEAM {display: block}
PLAYER {display: block}
Possible Extensions
• There should be captions like "RBI" or "At Bats."
• Derived numbers like batting averages are not included.
• The titles are short. E.g. "1998" instead of "1998 Major League Baseball".
• The document is so long it's hard to read. Something similar to IE5's collapsible outline view would be nice.
• Pitcher stats should be separated from batter stats.
Possible Solutions
• CSS Level 2
• XSL
• XSL + JavaScript
Attributes
• name=value
• Values must be quoted
• An element may not have two attributes with the same name
<IMG SRC="cup.gif" WIDTH="89" HEIGHT="67" ALT="Cup of coffee"></IMG>
Attributes in the Baseball Example
<SEASON YEAR="1998"> <!--leagues go here --></SEASON>
Leagues are still child elements
• Attribute names must be unique
• Leagues have sub-structure (e.g. they contain divisions, teams, players, etc.)
• Not well-formed, because the LEAGUE attribute appears twice:
<SEASON YEAR="1998" LEAGUE="National" LEAGUE="American"></SEASON>
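A parser enforces the uniqueness rule directly: an element that repeats an attribute name is not well-formed, while repeated child elements are fine. A sketch with Python's xml.etree.ElementTree (the child-element layout with a NAME attribute is one possible arrangement, chosen here for illustration):

```python
import xml.etree.ElementTree as ET

# The same attribute name used twice on one element.
duplicated = '<SEASON YEAR="1998" LEAGUE="National" LEAGUE="American"></SEASON>'

# The same information carried by two child elements instead.
as_children = (
    '<SEASON YEAR="1998">'
    '<LEAGUE NAME="National"/><LEAGUE NAME="American"/>'
    "</SEASON>"
)

try:
    ET.fromstring(duplicated)
    duplicate_accepted = True
except ET.ParseError as error:
    duplicate_accepted = False
    print("rejected:", error)

league_names = [league.get("NAME") for league in ET.fromstring(as_children)]
print(league_names)
```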
Team Attributes
• Divisions and teams can also have NAME attributes without any fear of confusion with the name of a league
• Thus you can use names like NAME instead of LEAGUE_NAME
• Team cities are also good as attributes
Team Attributes
<LEAGUE NAME="American League">
  <DIVISION NAME="East">
    <TEAM NAME="Orioles" CITY="Baltimore"></TEAM>
    <TEAM NAME="Red Sox" CITY="Boston"></TEAM>
    <TEAM NAME="Yankees" CITY="New York"></TEAM>
    <TEAM NAME="Blue Jays" CITY="Toronto"></TEAM>
  </DIVISION>
</LEAGUE>
Player Attributes
<PLAYER GIVEN_NAME="Joe" SURNAME="Girardi" GAMES="78" AT_BATS="254" RUNS="31" HITS="70" DOUBLES="11" TRIPLES="4" HOME_RUNS="3" RUNS_BATTED_IN="31" WALKS="14" STRUCK_OUT="38" STOLEN_BASES="2" CAUGHT_STEALING="4" SACRIFICE_FLY="1" SACRIFICE_HIT="8" HIT_BY_PITCH="2"></PLAYER>
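With the statistics stored as attributes, derived numbers such as a batting average can be computed by the consumer rather than stored in the document. A brief sketch in Python, using a shortened copy of the PLAYER element (only the attributes needed for the calculation are kept):

```python
import xml.etree.ElementTree as ET

PLAYER_XML = (
    '<PLAYER GIVEN_NAME="Joe" SURNAME="Girardi" AT_BATS="254" HITS="70"></PLAYER>'
)

player = ET.fromstring(PLAYER_XML)

# Attribute values are always strings; convert before doing arithmetic.
at_bats = int(player.get("AT_BATS"))
hits = int(player.get("HITS"))

# Guard against players with no at-bats (pitchers, for example).
batting_average = hits / at_bats if at_bats else 0.0

name = f'{player.get("GIVEN_NAME")} {player.get("SURNAME")}'
print(f"{name}: {batting_average:.3f}")
```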
Attributes and Elements
<P> On Tuesday <PLAYER GAMES="78" AT_BATS="254" RUNS="31" HITS="70" DOUBLES="11" TRIPLES="4" HOME_RUNS="3" RUNS_BATTED_IN="31" WALKS="14" STRIKE_OUTS="38" STOLEN_BASES="2" CAUGHT_STEALING="4" SACRIFICE_FLY="1" SACRIFICE_HIT="8" HIT_BY_PITCH="2"> <FIRST_NAME>Joe</FIRST_NAME> <SURNAME>Girardi </SURNAME></PLAYER> struck out twice and...</P>
Attributes vs. Elements
• Attributes are for meta-data; elements are for data
• Does the reader want to see the information? If yes, use element content; if no, use attributes
• Attributes are good for ID numbers, URLs, references, and other information not directly relevant to the reader
When not to use attributes
• Attributes can't hold structure well.
• Elements allow you to include meta-meta-data (information about the information about the information).
• Not everyone always agrees on what is and isn't meta-data.
• Elements are more extensible in the face of future changes.
Empty Tags
• End with a />
– e.g. <PLAYER/>
• Same as <PLAYER></PLAYER>
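The equivalence can be checked with a parser: both spellings produce an element with the same name, no attributes, no text, and no children. A sketch in Python (the summarize helper is a hypothetical name used only for the comparison):

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<PLAYER/>")
long_form = ET.fromstring("<PLAYER></PLAYER>")


def summarize(element):
    # Tag name, attributes, text content, and number of child elements.
    return (element.tag, dict(element.attrib), element.text or "", len(element))


print(summarize(short_form) == summarize(long_form))
```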
The Extensible Style Language
• Partially supported by IE 5.0
• Many third party tools
• W3C Working Draft
The Two Parts of XSL
• Transformation Language
• Formatting Objects
Templates
<HTML> <HEAD> <TITLE> XSL Instructions to get the title </TITLE> </HEAD> <H1>XSL Instructions to get the title</H1> <BODY> XSL Instructions to get the statistics </BODY></HTML>
XSL Instructions
• An XSL style sheet is a well-formed XML document
• XSL instructions are particular XML elements
– xsl:apply-templates
– xsl:template
– xsl:for-each
– xsl:value-of
– a few others
An XSL style sheet
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/TR/WD-xsl">
  <xsl:template match="/">
    <HTML xmlns:xsl="http://www.w3.org/TR/WD-xsl">
      <HEAD><TITLE>Major League Baseball Statistics</TITLE></HEAD>
      <BODY>
        <H1>Major League Baseball Statistics</H1>
      </BODY>
    </HTML>
  </xsl:template>
</xsl:stylesheet>
xsl:for-each and xsl:value-of
<xsl:template match="/">
  <HTML xmlns:xsl="http://www.w3.org/TR/WD-xsl">
    <HEAD>
      <TITLE>
        <xsl:for-each select="SEASON">
          <xsl:value-of select="@YEAR"/>
        </xsl:for-each>
        Major League Baseball Statistics
      </TITLE>
    </HEAD>
    <BODY>
      <xsl:for-each select="SEASON">
        <H1><xsl:value-of select="@YEAR"/> Major League Baseball Statistics</H1>
      </xsl:for-each>
    </BODY>
  </HTML>
</xsl:template>
xsl:for-each and xsl:value-of
<xsl:for-each select="SEASON"> <xsl:value-of select="@YEAR"/> </xsl:for-each>
Namespaces and XSL
• XSL instructions are in the xsl namespace to distinguish them from output HTML elements.
• The namespace is identified by the xmlns:xsl attribute of the root element of the style sheet.
• The value of that attribute is http://www.w3.org/TR/WD-xsl.
Leagues
Divisions and Teams
Players
CSS or XSL?
• CSS has broader support
• CSS is more stable
• XSL is more powerful
Well-formedness Rules
• Open and close all tags
• Empty tags end with />
• There is a unique root element
• Elements may not overlap
• Attribute values are quoted
• < and & are only used to start tags and entities
• Only the five predefined entity references are used
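Any conforming parser enforces these rules and refuses documents that break them. A sketch in Python's xml.etree.ElementTree that runs one sample per rule (the sample strings and the check helper are illustrative assumptions, not taken from the slides):

```python
import xml.etree.ElementTree as ET

SAMPLES = {
    "well-formed": "<GREETING>Hello XML!</GREETING>",
    "unclosed tag": "<GREETING>Hello XML!",
    "two root elements": "<A></A><B></B>",
    "overlapping elements": "<B><I>text</B></I>",
    "unquoted attribute": "<A HREF=index.html>link</A>",
    "bare ampersand": "<H1>O'Reilly & Associates</H1>",
    "undeclared entity": "<P>&copy; 1999</P>",
}


def check(xml_text: str) -> str:
    """Return 'ok' for well-formed input, otherwise the parser's complaint."""
    try:
        ET.fromstring(xml_text)
        return "ok"
    except ET.ParseError as error:
        return f"rejected ({error})"


for label, text in SAMPLES.items():
    print(f"{label}: {check(text)}")
```

Unlike an HTML browser, the parser does not try to recover; the first violation stops processing.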
Open and close all tags
Empty tags end with />
• <BR/>, <HR/>, and <IMG/> instead of <BR>, <HR>, and <IMG>
• Web browsers deal inconsistently with these
• Can use <BR></BR> <HR></HR> <IMG></IMG> instead
There is a unique root element
• One element completely contains all other elements of the document
• This is HTML in HTML files
• XML Declaration is not an element
<?xml version="1.0" standalone="yes"?><GREETING>Hello XML!</GREETING>
Elements may not overlap
• If an element contains a start tag for an element, it must also contain the corresponding end tag
• Empty elements may appear anywhere
• Every non root element has a parent element
Attribute values are quoted
• Good:
– <A HREF="http://metalab.unc.edu/xml/">
• Bad:
– <A HREF=http://metalab.unc.edu/xml/>
< and & are only used to start tags and entities
• Good: <H1>O'Reilly &amp; Associates</H1>
• Bad: <H1>O'Reilly & Associates</H1>
• Good:
– <CODE>for (int i = 0; i &lt;= args.length; i++ ) { </CODE>
• Bad:
– <CODE>for (int i = 0; i <= args.length; i++ ) { </CODE>
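When text is generated by a program, the escaping is best left to a library function rather than done by hand. A sketch using escape() from Python's standard xml.sax.saxutils module (the two sample strings mirror the slide's examples):

```python
from xml.sax.saxutils import escape

heading = "O'Reilly & Associates"
code = "for (int i = 0; i <= args.length; i++ ) {"

# escape() replaces &, <, and > with &amp;, &lt;, and &gt;.
print(f"<H1>{escape(heading)}</H1>")
print(f"<CODE>{escape(code)}</CODE>")
```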
Only the five predefined entity references are used
• Good:
– &amp;
– &lt;
– &gt;
– &quot;
– &apos;
• Bad:
– &copy;
– &reg;
– &tm;
– &alpha;
– &eacute;
– &nbsp;
– etc.
DTDs and Validity
• A Document Type Definition describes the elements and attributes that may appear in a document
• Validation compares a particular document against a DTD
• Well-formedness is a prerequisite for validity
What is a DTD?
• a list of the elements, tags, attributes, and entities contained in a document, and their relationship to each other
• internal vs. external DTDs
The importance of validation
• Ensures that data is correct before feeding it into a program
• Ensures that a format is followed
• Establishes what must be supported
• Not all documents need to be valid; sometimes well-formed is enough
A DTD for greeting.xml
• greeting.xml:
<?xml version="1.0"?>
<GREETING>Hello XML!</GREETING>
• greeting.dtd:
<!ELEMENT GREETING (#PCDATA)>
Document Type Declarations
<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd">
<GREETING>Hello XML!</GREETING>
• specifies the root element
• gives a URL for the DTD
Invalid Documents
• Valid:
<GREETING>various random text but no markup</GREETING>
• Invalid: anything else, including
<GREETING>
  <sometag>various random text</sometag>
  <someEmptyTag/>
</GREETING>
– or
<GREETING>
  <GREETING>various random text</GREETING>
</GREETING>
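Python's standard library parsers check well-formedness but do not validate against a DTD, so the sketch below hand-codes the single rule from greeting.dtd, <!ELEMENT GREETING (#PCDATA)>. It is an illustration of what validity means for this one declaration, not a general validator, and the function name is hypothetical:

```python
import xml.etree.ElementTree as ET


def greeting_is_valid(xml_text: str) -> bool:
    """Hand-rolled check for <!ELEMENT GREETING (#PCDATA)>; not a real validator."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False  # not even well-formed, so it cannot be valid
    # Valid means: the root is GREETING and it holds text only, no child elements.
    return root.tag == "GREETING" and len(root) == 0


print(greeting_is_valid("<GREETING>various random text but no markup</GREETING>"))
print(greeting_is_valid("<GREETING><GREETING>various random text</GREETING></GREETING>"))
```

A validating parser derives the same decision from the DTD itself, which is why the declaration only has to be written once.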
Validating Tools
• Command line programs like XJParse
• Online validators
– http://www.stg.brown.edu/service/xmlvalid/
– http://www.cogsci.ed.ac.uk/%7Erichard/xml-check.html
• Browsers
Element Declarations
• Each tag must be declared in a <!ELEMENT> declaration.
• A <!ELEMENT> declaration gives the name and content model of the element
• The content model uses a simple regular expression-like grammar to precisely specify what is and isn't allowed in an element
Content Specifications
• ANY
• #PCDATA
• Sequences
• Choices
• Mixed Content
• Modifiers
• Empty
ANY
<!ELEMENT SEASON ANY>
• A SEASON can contain any child element and/or raw text (parsed character data)
#PCDATA
<!ELEMENT YEAR (#PCDATA)>
• Parsed Character Data; i.e. raw text, no markup
#PCDATA
• Valid:
<YEAR>1999</YEAR>
<YEAR>99</YEAR>
<YEAR>1999 C.E.</YEAR>
<YEAR> The year of our Lord one thousand, nine hundred, and ninety-nine </YEAR>
• Invalid:
<YEAR>
  <MONTH>January</MONTH>
  <MONTH>February</MONTH>
  <MONTH>March</MONTH>
  <MONTH>April</MONTH>
  <MONTH>May</MONTH>
  <MONTH>June</MONTH>
  <MONTH>July</MONTH>
  <MONTH>August</MONTH>
  <MONTH>September</MONTH>
  <MONTH>October</MONTH>
  <MONTH>November</MONTH>
  <MONTH>December</MONTH>
</YEAR>
Child Elements
• To declare that a LEAGUE element must have a LEAGUE_NAME child:
<!ELEMENT LEAGUE (LEAGUE_NAME)>
<!ELEMENT LEAGUE_NAME (#PCDATA)>
Sequences
• Separate multiple required child elements with commas; e.g.
<!ELEMENT SEASON (YEAR, LEAGUE, LEAGUE)>
<!ELEMENT LEAGUE (LEAGUE_NAME, DIVISION, DIVISION, DIVISION)>
One or More Children +
<!ELEMENT DIVISION_NAME (#PCDATA)>
<!ELEMENT DIVISION (DIVISION_NAME, TEAM+)>
Zero or More Children *
<!ELEMENT TEAM (TEAM_CITY, TEAM_NAME, PLAYER*)>
<!ELEMENT TEAM_CITY (#PCDATA)>
<!ELEMENT TEAM_NAME (#PCDATA)>
Zero or One Children ?
<!ELEMENT PLAYER (GIVEN_NAME, SURNAME, POSITION, GAMES, GAMES_STARTED, AT_BATS?, RUNS?, HITS?, DOUBLES?, TRIPLES?, HOME_RUNS?, RBI?, STEALS?, CAUGHT_STEALING?, SACRIFICE_HITS?, SACRIFICE_FLIES?, ERRORS?, WALKS?, STRUCK_OUT?, HIT_BY_PITCH?, WINS?, LOSSES?, SAVES?, COMPLETE_GAMES?, SHUT_OUTS?, ERA?, INNINGS?, EARNED_RUNS?, HIT_BATTER?, WILD_PITCHES?, BALK?, WALKED_BATTER?, STRUCK_OUT_BATTER?)>
Finished DTD
Choices
<!ELEMENT PAYMENT (CASH | CREDIT_CARD)>
<!ELEMENT PAYMENT (CASH | CREDIT_CARD | CHECK)>
Grouping With Parentheses
• Parentheses combine several elements into a single element.
• A parenthesized element can be nested inside other parentheses in place of a single element.
• The parenthesized element can be suffixed with a plus sign, an asterisk, or a question mark.
<!ELEMENT dl (dt, dd)*>
<!ELEMENT ARTICLE (TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)>
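Because content models are regular-expression-like, an element-only model can be translated almost mechanically into an ordinary regular expression over the sequence of child names. The Python sketch below does that translation; the function names are hypothetical, #PCDATA and mixed content are deliberately not handled, and a real validating parser does considerably more:

```python
import re


def content_model_to_regex(model: str):
    """Translate an element-only DTD content model into a compiled regex."""
    # Each element name becomes a group that matches "NAME;".
    pattern = re.sub(
        r"[A-Za-z_][\w.-]*",
        lambda match: "(?:" + re.escape(match.group(0)) + ";)",
        model,
    )
    # Commas mean "followed by" (plain concatenation); white space is irrelevant.
    # The |, ?, * and + operators already mean the same thing in a regex.
    pattern = re.sub(r"[,\s]", "", pattern)
    return re.compile(pattern)


def children_match(model: str, child_names) -> bool:
    """True if the list of child element names satisfies the content model."""
    sequence = "".join(name + ";" for name in child_names)
    return content_model_to_regex(model).fullmatch(sequence) is not None


ARTICLE = "(TITLE, (P | PHOTO | GRAPH | SIDEBAR | PULLQUOTE | SUBHEAD)*, BYLINE?)"

print(children_match("(dt, dd)*", ["dt", "dd", "dt", "dd"]))
print(children_match("(dt, dd)*", ["dt", "dt"]))
print(children_match(ARTICLE, ["TITLE", "P", "PHOTO", "P", "BYLINE"]))
```

The trailing ";" after each name keeps P from matching the start of PHOTO or PULLQUOTE.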
Mixed Content
• Both #PCDATA and child elements in a choice
<!ELEMENT TEAM (#PCDATA | TEAM_CITY | TEAM_NAME | PLAYER)*>
• #PCDATA must come first
• #PCDATA cannot be used in a sequence
Empty elements
<!ELEMENT BR EMPTY>
<!ELEMENT IMG EMPTY>
<!ELEMENT HR EMPTY>
Attribute Declarations
• Consider this element:
<GREETING LANGUAGE="Spanish"> Hola!</GREETING>
• It is declared like this:
<!ELEMENT GREETING (#PCDATA)>
<!ATTLIST GREETING LANGUAGE CDATA "English">
• <!ATTLIST Element_name Attribute_name Type Default_value>
Multiple Attribute Declarations
• Consider this element:
<RECTANGLE LENGTH="70px" WIDTH="85px"/>
• With two attribute declarations:
<!ELEMENT RECTANGLE EMPTY>
<!ATTLIST RECTANGLE LENGTH CDATA "0px">
<!ATTLIST RECTANGLE WIDTH CDATA "0px">
• With one attribute declaration:
<!ATTLIST RECTANGLE LENGTH CDATA "0px"
                    WIDTH CDATA "0px">
• Indentation is a convention, not a requirement
Attribute Types
• CDATA
• ID
• IDREF
• IDREFS
• ENTITY
• ENTITIES
• NOTATION
• NMTOKEN
• NMTOKENS
• Enumerated
CDATA
• Most general attribute type
• Value can be any string of text not containing a less-than sign (<) or quotation marks (")
ID
• Value must be an XML name
– May include letters, digits, underscores, hyphens, and periods
– May not include whitespace
– May contain colons only if used for namespaces
• Value must be unique within ID type attributes in the document
• Generally the default value is #REQUIRED
IDREF
• Value matches the ID of an element in the same document
• Used for links and the like
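A validating parser checks both constraints: every ID value is unique, and every IDREF points at an ID that exists. The Python sketch below performs the same two checks by hand. The FAMILY document, the second PERSON, and the FATHER attribute are made-up illustrations (only the p1100 ID comes from the earlier PERSON example), and check_ids is a hypothetical helper:

```python
import xml.etree.ElementTree as ET

# Hypothetical data: FATHER plays the role of an IDREF attribute.
FAMILY_XML = """<FAMILY>
  <PERSON ID="p1100"><SURNAME>McDaniel</SURNAME></PERSON>
  <PERSON ID="p1101" FATHER="p1100"><SURNAME>McDaniel</SURNAME></PERSON>
</FAMILY>"""


def check_ids(root, id_attr="ID", ref_attr="FATHER"):
    """Return (duplicate ID values, references that match no ID)."""
    seen, duplicates = set(), []
    for element in root.iter():
        value = element.get(id_attr)
        if value is None:
            continue
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    # A reference is dangling when no element in the document carries that ID.
    dangling = [
        element.get(ref_attr)
        for element in root.iter()
        if element.get(ref_attr) is not None and element.get(ref_attr) not in seen
    ]
    return duplicates, dangling


print(check_ids(ET.fromstring(FAMILY_XML)))
```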
IDREFS
• A list of ID values in the same document
• Separated by white space
ENTITY
• Value is the name of an unparsed general entity declared in the DTD
ENTITIES
• Value is a list of unparsed general entities declared in the DTD
• Separated by white space
NOTATION
• Value is the name of a notation declared in the DTD
NMTOKEN
• Value is any legal XML name token (like an XML name, except that any name character may come first)
NMTOKENS
• Value is a list of XML name tokens
• Separated by white space
Enumerated
• Not a keyword
• Refers to a list of possible values from which one must be chosen
• Default value is generally provided explicitly
<!ATTLIST P VISIBLE (TRUE | FALSE) "TRUE">
Attribute Default Values
• A literal string value
• One of these three keywords
– #REQUIRED
– #IMPLIED
– #FIXED
#REQUIRED
• No default value is provided in the DTD
• Document authors must provide attribute value for each element
<!ELEMENT IMG EMPTY>
<!ATTLIST IMG ALT CDATA #REQUIRED>
<!ATTLIST IMG WIDTH CDATA #REQUIRED>
<!ATTLIST IMG HEIGHT CDATA #REQUIRED>
#IMPLIED
• No default value in the DTD
• Author may (but does not have to) provide a value with each element
#FIXED
• Value is the same for all elements
• Default value must be provided in DTD
• Document author may not change default value
<!ELEMENT AUTHOR EMPTY>
<!ATTLIST AUTHOR NAME CDATA #REQUIRED>
<!ATTLIST AUTHOR EMAIL CDATA #REQUIRED>
<!ATTLIST AUTHOR EXTENSION CDATA #IMPLIED>
<!ATTLIST AUTHOR COMPANY CDATA #FIXED "TIC">
Internal DTDs
<?xml version="1.0"?>
<!DOCTYPE GREETING [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>
Internal DTD Subsets
<?xml version="1.0"?>
<!DOCTYPE GREETING SYSTEM "greeting.dtd" [
  <!ELEMENT GREETING (#PCDATA)>
]>
<GREETING>Hello XML!</GREETING>
• Internal declarations override external declarations
Programming with XML
• Java works best
• C, Perl, Python etc. can also be used
• Unicode support is the biggest issue
SAX, the Simple API for XML
• Event based
• Programs can plug in different parsers
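In an event-based API the parser calls back into the program as it meets each start tag, run of text, and end tag, so the whole document never has to be held in memory. A sketch using Python's standard xml.sax module on a shortened copy of the Channel Definition Format example (the TitleCollector class is a hypothetical handler written for this illustration):

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the text of every TITLE element as parse events arrive."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # None means "not currently inside a TITLE"

    def startElement(self, name, attrs):
        if name == "TITLE":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate rather than assign.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "TITLE" and self._buffer is not None:
            self.titles.append("".join(self._buffer))
            self._buffer = None


CHANNEL_XML = b"""<CHANNEL HREF="http://metalab.unc.edu/xml/index.html">
  <TITLE>Cafe con Leche</TITLE>
  <ITEM HREF="http://metalab.unc.edu/xml/books.html">
    <TITLE>Books about XML</TITLE>
  </ITEM>
</CHANNEL>"""

handler = TitleCollector()
xml.sax.parseString(CHANNEL_XML, handler)
print(handler.titles)
```

The handler knows nothing about which parser produced the events, which is what lets programs plug in different parsers.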
The Document Object Model (DOM)
Additional Technologies
• Namespaces
• XLinks
• XPointers
• RDF
Namespaces
• Attach a prefix to each element
• Prefix is connected to a unique URI by an xmlns attribute
– Uniform Resource Identifier
– URI need not point to a page
xmlns:bb="http://metalab.unc.edu/xml/baseball/"
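What identifies an element is the URI, not the prefix. As a sketch, Python's xml.etree.ElementTree reports each name as {URI}local-name, and a lookup may use any prefix that is mapped to the same URI (the SEASON and YEAR elements and the stats prefix below are illustrative assumptions):

```python
import xml.etree.ElementTree as ET

BB = "http://metalab.unc.edu/xml/baseball/"
DOC = f'<bb:SEASON xmlns:bb="{BB}"><bb:YEAR>1998</bb:YEAR></bb:SEASON>'

root = ET.fromstring(DOC)

# The prefix is gone; the tag carries the full namespace URI instead.
print(root.tag)

# Lookups take a prefix-to-URI mapping. The prefix need not match the document's.
year = root.find("bb:YEAR", {"bb": BB}).text
same_year = root.find("stats:YEAR", {"stats": BB}).text
print(year, same_year)
```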
XLinks
XPointers
The Resource Description Framework
To Learn More: Books
• XML: Extensible Markup Language
– IDG Books 1998
– ISBN 0-76453-199-9
• The XML Bible
– IDG Books 1999
– ISBN 0-76453-236-7
Questions?