Final Exam Review
CIS-189
“Markup” refers to the use of tags to describe data
◦ Data describing data is metadata
◦ Tags identify where data begins and ends, and carry some information about that data
◦ Often referred to as “self-describing”
Standard Generalized Markup Language (SGML) was created to offer universal standards for sharing and moving information
Markup
Extensible Markup Language (XML) fills the gap between the display focus of HTML and the complexity of SGML
◦ XML is compatible with the rules of SGML
XML isn’t a language
◦ It is a set of standards about how to create a language to define and work with particular data
XML
Tags are used much as in HTML
◦ A tag must always be closed: <name>Randy</name> or <middle />
Tags are defined as needed
◦ No set of predefined tags as in HTML
◦ Tags typically aren’t about display
Display is separated from data, unlike HTML
Using XML
XML is hierarchical
◦ Individual items in XML are elements
◦ One element can belong to another (child and parent)
◦ Similar to a one-to-many relationship
Structure is called a “tree”
◦ An item with children is called a branch
◦ An item with no children is a leaf
XML Structure
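A minimal Python sketch of the tree idea, using the standard library's xml.etree.ElementTree. The sample document and the classify helper are made up for illustration; only the name and middle elements come from the earlier slide.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: one root, one branch below it, two leaves.
doc = "<students><student><name>Randy</name><middle /></student></students>"
root = ET.fromstring(doc)

def classify(elem):
    # len(elem) is the number of child elements of this element.
    return "branch" if len(elem) > 0 else "leaf"

# Walk every element in the tree and label it.
kinds = {elem.tag: classify(elem) for elem in root.iter()}
print(kinds)
```

Each student here is one child of students, which is the one-to-many relationship described above.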
An element can contain data
An element can contain other elements
An element can contain data and other elements
The element definitions for a specific kind of data make up a vocabulary
Elements
Complies with rules
◦ Rules allow easy transfer and reading of data independent of platform or application
Parser reads XML file
◦ Parser typically runs as a service to another application
◦ A file that doesn’t comply with the rules has a fatal error and the parser cannot continue
◦ By definition, a file that isn’t well formed has a fatal error
Any violation of the rules is an error
Well-Formed XML
Start tag must have an end tag or be self-closing
Tags cannot overlap
Must have one – and only one – root element
Element names obey naming conventions
XML is case-sensitive
Whitespace is maintained in PCDATA
Well Formed
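The rules above can be tried out with a short Python sketch. It assumes the standard library parser (xml.etree.ElementTree), which is non-validating and raises ParseError on the first fatal error; the helper name and the sample strings are invented for illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text, False on a fatal error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Hypothetical samples, one per rule.
samples = {
    "closed": "<name>Randy</name>",
    "self_closing": "<student><middle /></student>",
    "overlap": "<a><b></a></b>",          # tags overlap
    "two_roots": "<a/><b/>",              # more than one root element
    "case_mismatch": "<Name>Randy</name>" # XML is case-sensitive
}
results = {label: is_well_formed(text) for label, text in samples.items()}
print(results)
```

Only the first two samples parse; the other three stop the parser, which is what "fatal error" means in practice.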
A start tag can include a space after the element name
A start tag cannot have a space between < and the element name
Element name must begin with a letter or underscore
◦ After the first character, numbers, hyphens, and periods are acceptable
Cannot use spaces in names, and should not use : (colon)
◦ Colon is reserved for special uses (namespace prefixes)
“XML” cannot be used as the first 3 letters of a name (upper, lower, or mixed case)
Naming Elements
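A rough Python sketch of a name check, assuming the XML 1.0 rule that a name starts with a letter or underscore. It is deliberately simplified to ASCII (real XML also allows many non-ASCII letters), and the function name is hypothetical.

```python
import re

# ASCII-only simplification: first character is a letter or underscore,
# later characters may also be digits, hyphens, or periods.
# The colon is left out because it is reserved for namespace prefixes.
NAME_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_.\-]*")

def is_valid_name(name):
    # Names beginning with "xml" in any letter case are reserved.
    if name.lower().startswith("xml"):
        return False
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["student", "_id", "course-1.a", "-id", "2nd", "first name", "XmlData"]:
    print(candidate, is_valid_name(candidate))
```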
CDATA refers to character data
◦ Values that are treated literally, as those characters
PCDATA refers to parsed character data
◦ Values that are parsed, so markup and entity references are translated for a specific meaning or purpose
Whitespace is treated differently than in HTML
◦ Maintained
◦ A carriage return, or a carriage return/linefeed pair, is treated as a single linefeed by the parser
Working with Data
Provide another way to represent values
Defined within the start tag of an element
Work in a name/value pair
◦ Must include both a name and value for a valid statement (an empty string is a valid value)
◦ Value must be enclosed in single or double quotes
◦ Opening quote must be same as closing (can’t pair a single quote and double quote)
◦ Be consistent for ease of coding, reading, and maintenance
Attributes
Attribute names must conform to same rules as element names
◦ Start with letter or underscore
◦ Can use numbers, hyphens, periods after the first character
◦ Name of each attribute must be unique within an element
Attribute Names
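The attribute rules can be checked with a small Python sketch using the standard library parser. The movie element and its attribute names are hypothetical sample data.

```python
import xml.etree.ElementTree as ET

def parses(text):
    """Return True if the text is accepted by the parser."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Single and double quotes are both fine, and an empty string is a valid value.
movie = ET.fromstring("<movie title='Jaws' rating=\"PG\" note=''/>")
attrs = dict(movie.attrib)

# An attribute name may appear only once within an element.
duplicate_ok = parses('<movie title="Jaws" title="Up"/>')

# The opening and closing quote must be the same kind.
mismatched_ok = parses("<movie title='Jaws\"/>")

print(attrs, duplicate_ok, mismatched_ok)
```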
Elements can be more complex
◦ Can include child elements if needed
◦ Attributes are about a single value
Attributes can simplify logic
◦ Can avoid or reduce nesting
Choice of element or attribute is most often simply a design choice or preference
Using Attributes and Elements
Provide information to aid the processing of the file
<?xml version="1.0"?>
If included, the XML declaration must be the first entry
◦ Cannot have any character preceding the opening <?xml
If included, the XML declaration must have at least the version
◦ Versions are 1.0 and 1.1
XML Declarations
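A short Python sketch of the two declaration rules, assuming the standard library's expat-based parser. The student element and the helper name are placeholders.

```python
import xml.etree.ElementTree as ET

def parses(data):
    """Return True if the parser accepts the document bytes."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False

# The declaration is the very first entry: accepted.
first_entry = parses(b'<?xml version="1.0"?><student/>')

# A single space before the declaration is already a fatal error.
leading_space = parses(b' <?xml version="1.0"?><student/>')

# A declaration without the version is rejected as well.
no_version = parses(b'<?xml encoding="UTF-8"?><student/>')

print(first_entry, leading_space, no_version)
```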
Optional settings are encoding, standalone
◦ Encoding specifies which character set is being used (how characters are represented)
◦ Standalone tells the parser if the document is complete by itself, or relies on another file
Optional XML Declarations
Processing instructions are for the consuming application
◦ Not used by the XML parser
◦ Include information/commands that the application needs to complete some task
<?target statement?>
Processing Instructions
Some symbols have special meaning
◦ Less than (<)
◦ Greater than (>)
◦ Ampersand (&)
Should not use these characters directly in content unless wrapped in a CDATA section (< and & are never allowed directly)
If a single symbol is needed, can substitute an entity reference
◦ &lt; for <, &gt; for >, &amp; for &
Special Characters
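Both options, substitution and a CDATA section, can be compared in a brief Python sketch. It uses xml.sax.saxutils.escape from the standard library; the note element and the sample string are made up.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "AT&T <rates> 5 > 3"  # hypothetical text containing all three symbols

# Option 1: substitute entity references for the special characters.
escaped = escape(raw)
parsed = ET.fromstring("<note>" + escaped + "</note>").text

# Option 2: wrap the raw text in a CDATA section.
cdata_parsed = ET.fromstring("<note><![CDATA[" + raw + "]]></note>").text

print(escaped)
print(parsed == raw, cdata_parsed == raw)
```

Either way the consuming application receives the original characters back.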
DTD stands for Document Type Definition
Allows an XML document to go further than meeting the requirements of being well-formed
Specifies requirements to be valid
◦ A valid XML document matches definitions of allowable elements, attributes
DTD Overview
Validation can be done in code (e.g., using JavaScript or VB with the DOM)
DTDs allow use of a validating parser that compares the document against specifications
◦ Typically makes application changes and maintenance easier
◦ Less tied to a particular programming language/environment
Validation
Includes name of root element
Allows specification of where the DTD is located
◦ DTD can be embedded in the XML file (local)
◦ DTD can refer to an external file by Uniform Resource Identifier (URI)
◦ Local takes precedence over external
Document Type Declaration
Element Declaration has 3 parts:
◦ Declaration
◦ Element name
◦ Element content
Element content can include a list of child elements or data
Element Declaration
DTD included in XML document
Definition of a student:
<!DOCTYPE student [
<!ELEMENT student (first, last, studentID)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT studentID (#PCDATA)>
]>
Local DTD
Element Declaration
Document Type Declaration
A student element is made up of first name, last name, and student id elements
DTD exists in external file/location
Must use keyword to specify type of location
◦ SYSTEM is a reference to a file on the local file system or at a URI
◦ PUBLIC is a reference to a DTD accessed through a catalog
Can use both together
◦ If the catalog reference can’t be found, the specified file is used
External Definition
Reference in XML file:
<!DOCTYPE student SYSTEM "student.dtd">
External file:
<!ELEMENT student (first, last, studentID)>
<!ELEMENT first (#PCDATA)>
<!ELEMENT last (#PCDATA)>
<!ELEMENT studentID (#PCDATA)>
Sample External Definition
Document Type Declaration
Element Declaration
Element name must match name in XML document
◦ If using namespaces, prefixes must match
Content Model defines what the element can store
◦ An element
◦ Mixed (i.e., data and elements)
◦ Empty
◦ Any
Working With Elements
Error raised if an element is missing
Error raised if there are extra elements
Error raised if elements are in a different order
For a student, our content must be in first, last, studentID order
◦ If an element “major” is found, error
◦ If order varies, error
◦ If missing first, last, or studentID, error
Content by Sequence
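The three error cases can be imitated with a hand-rolled Python check. This is only a sketch of what a validating parser does for a sequence content model, not a real DTD validator (the standard library parser does not validate); the function name and sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

# Mirrors the content model (first, last, studentID) for a student.
EXPECTED = ["first", "last", "studentID"]

def matches_sequence(xml_text):
    """Compare the child element names, in order, against the expected sequence."""
    student = ET.fromstring(xml_text)
    return [child.tag for child in student] == EXPECTED

good = "<student><first>A</first><last>B</last><studentID>1</studentID></student>"
missing = "<student><first>A</first><last>B</last></student>"
extra = ("<student><first>A</first><last>B</last>"
         "<studentID>1</studentID><major>CIS</major></student>")
reordered = "<student><last>B</last><first>A</first><studentID>1</studentID></student>"

checks = [matches_sequence(doc) for doc in (good, missing, extra, reordered)]
print(checks)
```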
Can allow content to vary between elements
◦ | (vertical bar or pipe) indicates OR
If we add a grade element to a student that can be a letter or percent:
<!ELEMENT grade (letter | percent)>
<!ELEMENT letter (#PCDATA)>
<!ELEMENT percent (#PCDATA)>
Indicates that grade must contain either a letter or a percent element
Content by Choice
Allows combination of elements and parsed character data
◦ Can include additional information within an element, e.g., how to display
Rules:
1. Managed by using choice (|)
2. #PCDATA must appear first in the list
3. List cannot include an inner content model (only simple element names)
4. If there are child elements, include * after the closing parenthesis
◦ * indicates that the items may appear zero or more times
Mixed Content
If we want to include emphasis with the letter grade
Data: <letter><em>4</em></letter>
Declaration:
<!ELEMENT letter (#PCDATA | em)*>
Describes a letter element whose content is character data (PCDATA) plus optional em (emphasis) elements
Mixed Content -2
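A small Python sketch of how mixed content looks once parsed, assuming the standard library's ElementTree model, where text before the first child is stored in .text and text after a child is stored in that child's .tail. The sample sentence is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical mixed content: character data with one em child inside it.
letter = ET.fromstring("<letter>Grade: <em>A</em> (final)</letter>")
em = letter[0]

# text before the child, the child itself, and text after the child
parts = (letter.text, em.tag, em.text, em.tail)

# itertext() walks all character data in document order.
full_text = "".join(letter.itertext())

print(parts)
print(full_text)
```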
An element can be empty
◦ <br /> (never has children or content)
Declaration includes EMPTY:
<!ELEMENT br EMPTY>
Means that the element CANNOT contain content
Empty Content
An element can contain any kind of value (or be empty)
◦ Any elements declared in the DTD can occur, any number of times
◦ Only elements that are part of the DTD can be part of the document!
◦ May be empty
◦ May contain PCDATA
Least restrictive model
Any Content
How many times can an element occur? How many times must an element occur?
Cardinality
Indicator   Purpose
(none)      No indicator means an element must appear once – and ONLY once
?           An element may appear once or not at all
+           An element may appear one or more times
*           An element may appear zero or more times
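The indicator table translates directly into a count check. A minimal Python sketch follows; the function name is hypothetical and an empty string stands for "no indicator".

```python
def occurrence_allowed(indicator, count):
    """Return True if an element appearing `count` times satisfies the indicator."""
    if indicator == "?":
        return count in (0, 1)   # once or not at all
    if indicator == "+":
        return count >= 1        # one or more times
    if indicator == "*":
        return count >= 0        # zero or more times
    return count == 1            # no indicator: once and only once

for indicator in ["", "?", "+", "*"]:
    print(repr(indicator), [occurrence_allowed(indicator, n) for n in (0, 1, 2)])
```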
Elements tend to be used to describe a logical unit of information
Attributes are typically used to store data about characteristics (properties)
May have a Movie element with attributes for Title, Rental Price, Rental Days
No specific rules about how to use elements and attributes
Attributes and DTD’s
Attributes allow more limits on data
◦ Can have a list of acceptable values
◦ Can have a default value
◦ Some ability to specify a data type
◦ Concise, about a single name/value pair
Attributes have limits
◦ Poorly suited to long strings of text
◦ Can’t nest values
◦ Whitespace can’t be ignored
Attributes and Elements
Declaration:
<!ATTLIST ElementName AttrName AttrType Default>
Specify the element the attribute belongs to
Specify the name of the attribute
Specify the type of data the attribute stores
Specify characteristics of the values (default or attribute value)
◦ List either the default value or another characteristic of the value: required, optional
Specifying Attributes
CDATA – unparsed character data
Enumerated – series/list of string values
ENTITY/ENTITIES – reference entity definition(s)
ID – unique identifier for the element
IDREF – refers to the ID of another element
IDREFS – list of IDs of other elements, separated by whitespace
NMTOKEN/NMTOKENS – value(s) of attribute can be anything that follows rules for XML name
Sample Attribute Data Types
Specifies that attribute value must be found in a particular list
Each value in list must be a valid XML name
◦ Limits on spaces, characters
Use | (pipe) to separate members of list
If specifying a list of letter grades for a student:
<!ATTLIST student grade (A | B | C | D | F | V | W | I) #IMPLIED>
Enumerated Attributes
Callouts for the declaration above: Element (student), Attribute (grade), Enumerated List (the values in parentheses)
An ID specifies that the element must have a unique value within the document
◦ Allows reliable way to refer to a specific element
◦ No spaces allowed in value (typically replace space with underscore)
◦ An element’s attribute list can include only one ID
IDREF and IDREFS allow an element to be associated with one or multiple other elements
A student element must have a student ID:
<!ATTLIST student studentID ID #REQUIRED>
ID, IDREF, IDREFS
Attributes can refer to entities
◦ “Entity” refers to substituting a reference for a text value
◦ &amp; refers to the & character
◦ An unparsed entity is a reference to data that isn’t parsed
◦ Can reuse references for long values, or hard-to-manage characters (e.g., tab, line feed)
Entity must be declared in the DTD
<!ENTITY classTitle "XML">
◦ When &classTitle; is found in the document, it is replaced with XML
Entities and Attributes
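Entity substitution can be seen with a short Python sketch. It assumes the standard library parser, which expands entities declared in a local (internal) DTD even though it does not validate; the course element and its text are sample data.

```python
import xml.etree.ElementTree as ET

# Local DTD declaring the classTitle entity from the slide.
doc = """<!DOCTYPE course [
<!ELEMENT course (#PCDATA)>
<!ENTITY classTitle "XML">
]>
<course>Intro to &classTitle; &amp; related tools</course>"""

# Both the declared entity and the built-in &amp; are replaced during parsing.
text = ET.fromstring(doc).text
print(text)
```

Only parse trusted documents this way, since entity expansion in untrusted input can be abused.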
Can specify how the value will appear in the document
◦ Must always specify a value declaration
A default value sets a value for an attribute if a value isn’t provided
◦ Include default value in quotes
#FIXED sets a value that must occur; if an attribute has a different value, a validation error occurs
#REQUIRED specifies that the attribute (and value) must exist
#IMPLIED means the attribute is optional
Attribute Value Declarations
Alternative to DTDs as a way to define structure
◦ Essentially defining a language
◦ Structure may also be referred to as vocabulary
Ensures that data matches specifications
Serves as basis for other XML-related technologies
XML Schemas
Use XML for definition
◦ Doesn’t have a separate syntax like DTDs
◦ Schema must be well-formed
Support Namespace recommendations
◦ Allows same name to be used in different Schemas and properly understood
Provides for built-in and user-defined data types
Can be easily reused
Supports concepts such as inheritance
◦ One object is based on another
Working with Schemas
Allows more specificity than DTDs
◦ Can specify dates, numbers, ranges
Datatypes fall into two categories:
◦ Simple deals with basic values
◦ Complex describes more intricate values or structures
Schema Datatypes
Schema file uses an .xsd extension
Root element is the schema element
◦ Can nest all elements within the schema
◦ Everything is hierarchical
OR
◦ Can have multiple elements as child elements of the schema root
◦ Allows use of a definition any place in the document (data) file
◦ Elements which are child elements of schema are global
Creating Schemas
Simple data type is about text, numbers, dates
◦ Sometimes referred to as “primitives”
Data types built in to Schema vocabulary (and related elements, attributes) are in the XML Schema namespace
◦ Need reference to namespace to have valid XML (where to find the definition)
Elements that are Simple Datatypes don’t have attributes
◦ Including an attribute makes an element Complex
Simple Datatypes
The simpleType allows customization of base types
Can create limits on values
◦ Specify ranges
◦ Specify lists
<xsd:simpleType name="Degrees">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value="AA" />
    <xsd:enumeration value="AS" />
  </xsd:restriction>
</xsd:simpleType>
Defining (Simple) Datatypes
Allows combination of different elements and specification of order, new data types
Can create an element Course which is comprised of simple types
<xsd:element name="course">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="department" type="xsd:string"/>
      <xsd:element name="number" type="xsd:string"/>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="credits" type="xsd:integer"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Complex Datatypes
When using a schema, need to create a reference from data file
Use either the schemaLocation or noNamespaceSchemaLocation attribute of the root element
<course xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="course.xsd">
Using a Schema
Qualification refers to whether a name (element, attribute) must be qualified by its namespace
◦ When an element (or attribute) doesn’t have a namespace declaration it’s unqualified
◦ Determines how name is used in data (instance) document
A schema has the attributes elementFormDefault and attributeFormDefault
◦ Set to qualified or unqualified
◦ By setting to qualified, must include a namespace when using attributes or elements
Qualification
Allows elements to appear in any order or not at all
Rules governing use
1. Must be the only content model declaration of a <complexType> definition
◦ For example, can’t follow with <sequence>
2. Can only have element declarations as children
3. The children of the <all> element may appear once – or not at all
<all> Declaration
Can create a group of attributes similar to element groups
Allows re-use of common members without multiple definitions
◦ Attribute groups cannot be recursive (refer to themselves)
Attribute Groups
A list allows an element or attribute to store multiple values
◦ Values are separated by whitespace, so whitespace cannot be part of the content
itemType attribute defines the data type
◦ Can be a built-in XML Schema type or a defined simpleType data type
<list> Declarations
<union> allows the combination of two or more data types for an element or attribute
If we have a possiblePoints element, expected value would be an integer; <union> would allow a string entry to note a “Missing” value
Separate data types with whitespace
<simpleType name="CreditValue">
  <union memberTypes="xs:integer xs:string" />
</simpleType>
<union> Declarations
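The effect of a union of integer and string can be approximated in Python: try the first member type, then fall back to the second. This is only an analogy for how an application might read such a value, not schema validation; the function name is hypothetical, and Python's int() accepts a few forms (such as digit separators) that xs:integer does not.

```python
def parse_credit_value(text):
    """Read a value as an integer if possible, otherwise keep it as a string."""
    try:
        return int(text)   # first member type: integer
    except ValueError:
        return text        # second member type: string

values = [parse_credit_value(v) for v in ["3", "Missing"]]
print(values)
```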
XSL stands for Extensible Stylesheet Language
◦ Stylesheets are used to manage organization and presentation of data
Implemented as an XML language
◦ Rules of XML apply
Made up of
◦ XSL-FO (Formatting Objects)
◦ XSLT (Transformations)
XSL
XSL-FO focused on presentation to screen and paper
◦ Not well-supported by browsers
XSLT emphasizes re-organization of data
◦ Typically used for presentation but can also be used for conversion of data storage format
◦ XSLT is a declarative language
◦ Similar to SQL, describe results not steps
XSL Implementation
Cascading Style Sheets used to separate presentation from data
XSLT used to change – transform – data
◦ Convert an XML document to XHTML
Can use both together
XSLT v CSS
XSL requires several steps
1. XML processor reads document
◦ Creates document tree
2. XSL processor applies rules from stylesheet
◦ Rules applied to document tree
◦ Rules applied by using pattern matching to identify nodes to apply rules to
◦ Rules are stored as templates
Using XSL
XSL works by using an Input Tree
◦ Input Tree comes from XML processor
Process of changing input values is called Tree Transformation
Result of transformation is the Result Tree
◦ Result Tree can include XML, HTML (must adhere to XML rules, i.e., XHTML), or Formatting Objects
XSL Process
Extensible Stylesheet Language Transformations (XSLT) is a method of changing (transforming) XML based on rules of a stylesheet
XPath allows selection of parts of an XML document
◦ Not XML-based
◦ Provides compact references useful in URIs, attributes
◦ Document must exist as nodes (previously parsed)
XSLT
Templates are definitions of rules, organization
Patterns define values searching for (where to apply templates)
Expressions allow use of functions using nodes as inputs
When referring to document attributes preface name with “@”
XSLT Constructs
<xsl:stylesheet> is root element
◦ Uses namespace to define elements, attributes valid in a stylesheet
<xsl:template> defines the rules/transformations to apply
◦ match attribute specifies pattern to apply rules to
◦ Functions similar to criteria
<xsl:apply-templates> applies the rules defined for a particular element
◦ select attribute specifies elements to apply to
XSLT Elements
<xsl:value-of> returns the value of a specified node or function
◦ select attribute specifies value source
<xsl:copy> copies a node to the result tree without any child nodes or attributes
<xsl:copy-of> copies a node and child/attribute nodes
<xsl:output> controls the result tree
◦ method="xml|html|text"
XSLT Elements – 2
<xsl:if> provides a boolean test to determine processing
<xsl:choose> offers an IF ... THEN ... ELSE construct
<xsl:for-each> allows each node in a group to be processed
<xsl:sort> specifies order for a group of nodes
XSLT Elements – 3
Match can use
◦ node name
◦ current position (represented by “.”)
◦ relative position (for example, parent = “..”)
Specifies where the transformation is to be applied
Match
XPath provides a logical model for working with an XML document
Nodes are used to represent serialized XML (in memory)
◦ Not all parts of XML document are represented (XML declaration, DOCTYPE)
XPath used in combination with other tools (such as XSLT)
XPATH Introduction
Legal XPath code is called an expression
◦ An XPath expression that returns a node-set is called a location path
Expressions can be absolute or relative
◦ Absolute path includes a full definition of how to find node
◦ Relative path is based on current context (location)
XPath Expressions
Root node represents document
◦ Can have only one element child (the document element)
Element node represents elements
◦ QName (qualified name) includes namespace prefix and element name
Attribute node represents attributes
◦ Have name and value
◦ Are not represented as child nodes
Text node represents text value of an element
◦ Does not have a name
Namespace node gives access to the namespace URI and prefix
Comment node
Processing Instruction node
Node Types
Boolean
◦ Written as true() and false()
String
Number – floating point values
Node-set – unordered set
◦ Follows document order
XPath 1.0 Types
Element node references can be spelled out or abbreviated
/child::movies/child::movie/child::price
OR
/movies/movie/price
◦ child::nodename can also be written nodename
Attribute node references
attribute::attributename
OR
@attributename
XPath Abbreviations
self child attribute ancestor ancestor-or-self descendant descendant-or-self
following following-sibling namespace parent preceding preceding-sibling
13 Axes
Default axis
Selects nodes that are immediate children of the context (current) node
Can use * to refer to all child elements
Child Axis
Can use node() to return all child nodes including comments, processing instructions, and text nodes
Can return just text nodes using text()
◦ Text nodes are unnamed
Child Axis References
Used to select attributes belonging to a particular element node
To return all attributes
◦ attribute::*
◦ @*
To return particular attribute
◦ attribute::attributename
◦ @attributename
Attribute Axis
Used to filter node sets
◦ Predicate similar to query criteria
Can use specific values or location references
Predicates
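Predicates can be tried in Python with the standard library's ElementTree, which supports only a limited subset of XPath (abbreviated child steps and a few predicate forms; spelled-out axes such as child:: are not available there). The movies document below reuses the element names from the abbreviation slide, with made-up values and a hypothetical id attribute.

```python
import xml.etree.ElementTree as ET

doc = """<movies>
  <movie id="1"><title>Jaws</title><price>3.99</price></movie>
  <movie id="2"><title>Up</title><price>2.99</price></movie>
</movies>"""
root = ET.fromstring(doc)

# Relative location path: every price under every movie child of the context node.
prices = [p.text for p in root.findall("./movie/price")]

# Predicate on an attribute value, using the @ abbreviation.
by_attribute = [t.text for t in root.findall("movie[@id='2']/title")]

# Predicate on the text of a child element.
by_child_text = [m.get("id") for m in root.findall("movie[price='3.99']")]

# Predicate on position (positions start at 1).
second_title = root.find("movie[2]/title").text

print(prices, by_attribute, by_child_text, second_title)
```

Each predicate in square brackets narrows the node set selected by the step before it, much like criteria in a query.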