
Final Exam Review CIS-189

“Markup” refers to the use of tags to describe data
◦ Data describing data is metadata
◦ Tags identify where data begins and ends, and carry some information about that data
◦ Often referred to as “self-describing”

Standard Generalized Markup Language was created to offer universal standards for sharing and moving information

Markup

Extensible Markup Language (XML) fills the gap between the display focus of HTML and the complexity of SGML
◦ XML is compatible with the rules of SGML
XML isn’t a language in itself
◦ It is a set of standards for creating a language to define and work with particular data

XML

Tags are used much as in HTML
◦ A tag must always be closed
  <name>Randy</name><middle />
Tags are defined as needed
◦ No set of predefined tags as in HTML
◦ Tags typically aren’t about display
  Display is separated from data, unlike HTML

Using XML

XML is hierarchical
◦ Individual items in XML are elements
◦ One element can belong to another
  Child and parent
  Similar to a one-to-many relationship
Structure is called a ‘tree’
◦ An item with children is called a branch
◦ An item with no children is a leaf

XML Structure
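
To make the branch and leaf vocabulary concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The sample document and the classify helper are hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<student>
  <name><first>Randy</first><middle /></name>
  <studentID>12345</studentID>
</student>"""


def classify(element, depth=0, rows=None):
    """Walk the tree and label each element as a branch or a leaf."""
    if rows is None:
        rows = []
    # len(element) is the number of child elements.
    kind = "branch" if len(element) else "leaf"
    rows.append((depth, element.tag, kind))
    for child in element:
        classify(child, depth + 1, rows)
    return rows


rows = classify(ET.fromstring(SAMPLE))
for depth, tag, kind in rows:
    print("  " * depth + f"{tag} ({kind})")
```

The root element (student) has children, so it is reported as a branch; the empty middle element is a leaf.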

An element can contain data
An element can contain other elements
An element can contain data and other elements
Definitions of elements for specific data make up a vocabulary

Elements

Complies with rules
◦ Rules allow easy transfer and reading of data independent of platform, application
Parser reads XML file
◦ Parser typically runs as a service to another application
◦ A file that doesn’t comply with the rules has a fatal error and the parser cannot continue
◦ By definition, a file that isn’t well-formed has a fatal error
  Any violation of the rules is an error

Well-Formed XML

Start tag must have an end tag or be self-closing
Tags cannot overlap
Must have one – and only one – root element
Element names obey naming conventions
XML is case-sensitive
Whitespace is maintained in PCDATA

Well Formed
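
As a rough illustration of these rules, the sketch below feeds a few small strings to Python's standard-library parser and records whether each one is accepted. The helper name is_well_formed and the sample strings are assumptions made for this example. A non-validating parser such as this one checks well-formedness only.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a fatal error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule from the slide.
samples = {
    "closed tags": "<name>Randy</name>",
    "self-closing": "<student><middle /></student>",
    "missing end tag": "<name>Randy",
    "overlapping tags": "<a><b></a></b>",
    "two root elements": "<first>A</first><last>B</last>",
    "case mismatch": "<Name>Randy</name>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'fatal error'}")
```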

A start tag can include whitespace after the element name
A start tag cannot have a space between the opening < and the element name
Element name must begin with a letter or underscore
◦ After the first character, numbers, hyphens, periods are acceptable
Cannot use spaces or : (colon) in names
◦ Colon is reserved for special uses (namespace prefixes)
XML cannot be used as the first 3 letters of a name (upper, lower or mixed case)

Naming Elements

CDATA refers to character data
◦ Values that are treated literally as those characters
PCDATA refers to parsed character data
◦ Values that are translated for a specific meaning or purpose
Whitespace is treated differently than in HTML
◦ Maintained
◦ A carriage return followed by a linefeed is treated as a single linefeed by the parser

Working with Data
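
A small sketch of the whitespace behavior, using Python's standard-library parser (the note element is a made-up example): leading and trailing spaces survive, and the carriage return plus linefeed pair comes back as a single linefeed.

```python
import xml.etree.ElementTree as ET

# The carriage return + linefeed pair inside the element is written as \r\n.
doc = "<note>  line one\r\n  line two  </note>"
text = ET.fromstring(doc).text

# repr() makes the preserved spaces and the normalized line ending visible.
print(repr(text))
```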

Provide another way to represent values
Defined within the start tag of an element
Work as a name/value pair
◦ Must include both a name and a value for a valid statement (an empty string is a valid value)
◦ Value must be enclosed in single or double quotes
  Opening quote must be the same as the closing quote (can’t pair a single quote and a double quote)
  Be consistent for ease of coding, reading, and maintenance

Attributes

Attribute names must conform to the same rules as element names
◦ Start with a letter or underscore
◦ Can use numbers, hyphens, periods after the first character
◦ Name of each attribute must be unique within an element

Attribute Names

Elements can be more complex
◦ Can include child elements if needed
◦ Attributes are about a single value
Attributes can simplify logic
◦ Can avoid or reduce nesting
Choice of element or attribute is most often simply a design choice or preference

Using Attributes and Elements

Provides information to aid the processing of the file
  <?xml version="1.0"?>
If included, the XML declaration must be the first entry
◦ No character, not even whitespace, can precede the opening <?xml
If included, the XML declaration must have at least the version
◦ Versions are 1.0 and 1.1

XML Declarations

Optional settings are encoding, standalone
◦ Encoding specifies which character set is being used (how characters are represented)
◦ Standalone tells the parser if the document is complete by itself, or relies on another file

Optional XML Declarations

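Processing instructions are for the consuming application
◦ Not acted on by the XML parser, which passes them along
◦ Include information/commands that the application needs to complete some task
  <?target statement?>
◦ The target name follows <? directly, with no space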

Processing Instructions

Some symbols have special meaning
◦ Less than (<)
◦ Greater than (>)
◦ Ampersand (&)
Cannot use < or & directly in content unless wrapped in a CDATA section; > is usually escaped as well
If a single symbol is needed, can substitute an entity reference (the trailing semicolon is required)
◦ &lt; for <, &gt; for >, &amp; for &

Special Characters
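
The two options (entity references versus a CDATA section) can be compared with a short Python sketch. It uses xml.sax.saxutils.escape from the standard library; the code element and the sample string are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && b > c"

# escape() replaces &, < and > with &amp;, &lt; and &gt;
escaped = escape(raw)
print(escaped)

# Option 1: entity references inside ordinary (parsed) content.
as_reference = ET.fromstring(f"<code>{escaped}</code>").text

# Option 2: the raw characters wrapped in a CDATA section.
as_cdata = ET.fromstring(f"<code><![CDATA[{raw}]]></code>").text

# Both routes hand the application the same characters.
print(as_reference == as_cdata == raw)
```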

DTD stands for Document Type Definition
Allows an XML document to go further than meeting the requirements of being well-formed
Specifies requirements to be valid
◦ A valid XML document matches definitions of allowable elements, attributes

DTD Overview

Validation can be done in code (e.g. using JavaScript or VB with the DOM)
DTDs allow use of a validating parser that compares the document against specifications
◦ Typically makes application changes and maintenance easier
◦ Less tied to a particular programming language/environment

Validation

Includes name of root element
Allows specification of where the DTD is located
◦ DTD can be embedded in the XML file (local)
◦ DTD can refer to an external file, by Uniform Resource Identifier (URI)
◦ Local takes precedence over external

Document Type Declaration

Element Declaration has 3 parts:
◦ Declaration
◦ Element name
◦ Element content

Element content can include a list of child elements or data

Element Declaration

DTD included in XML document
Definition of a student:
  <!DOCTYPE student [
    <!ELEMENT student (first, last, studentID)>
    <!ELEMENT first (#PCDATA)>
    <!ELEMENT last (#PCDATA)>
    <!ELEMENT studentID (#PCDATA)>
  ]>

Local DTD


A student element is made up of first name, last name, and student id elements

DTD exists in external file/location
Must use a keyword to specify the type of location
◦ SYSTEM is a reference to a specific file, given as a URI (often on the local file system)
◦ PUBLIC is a reference to a DTD accessed through a catalog
Can use both together
◦ If the catalog reference can’t be found, the specified file is used

External Definition

Reference in XML file:
  <!DOCTYPE student SYSTEM "student.dtd">
External file (no DOCTYPE wrapper or closing ]> is needed):
  <!ELEMENT student (first, last, studentID)>
  <!ELEMENT first (#PCDATA)>
  <!ELEMENT last (#PCDATA)>
  <!ELEMENT studentID (#PCDATA)>

Sample External Definition

Element name must match name in XML document
◦ If using namespaces, prefixes must match
Content Model defines what the element can store
◦ An element
◦ Mixed (i.e. data and element)
◦ Empty
◦ Any

Working With Elements

Error raised if an element is missing
Error raised if there are extra elements
Error raised if elements are in a different order
For a student, our content must be in first, last, studentID order
◦ If an element “major” is found, error
◦ If order varies, error
◦ If missing first, last, or studentID, error

Content by Sequence
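
The three error cases can be imitated in a few lines of Python. This is only a hand-written sketch of what a validating parser checks for the student sequence; it is not a DTD validator, and the check_sequence helper and its messages are invented for this example.

```python
import xml.etree.ElementTree as ET

EXPECTED = ["first", "last", "studentID"]  # order from the student declaration


def check_sequence(xml_text, expected=EXPECTED):
    """Return a list of problems; an empty list means the children match."""
    children = [child.tag for child in ET.fromstring(xml_text)]
    problems = []
    for tag in expected:
        if tag not in children:
            problems.append(f"missing element: {tag}")
    for tag in children:
        if tag not in expected:
            problems.append(f"unexpected element: {tag}")
    # Only report ordering once the set of names itself is right.
    if not problems and children != expected:
        problems.append("elements repeated or out of order")
    return problems


good = "<student><first>A</first><last>B</last><studentID>1</studentID></student>"
extra = "<student><first>A</first><last>B</last><studentID>1</studentID><major>CIS</major></student>"
swapped = "<student><last>B</last><first>A</first><studentID>1</studentID></student>"
short = "<student><first>A</first><studentID>1</studentID></student>"

for label, doc in [("good", good), ("extra", extra), ("swapped", swapped), ("short", short)]:
    print(label, check_sequence(doc))
```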

Can allow content to vary between elements
◦ | (vertical bar or pipe) indicates OR
If adding a grade element to a student that can be a letter or percent:
  <!ELEMENT grade (letter | percent)>
  <!ELEMENT letter (#PCDATA)>
  <!ELEMENT percent (#PCDATA)>
Indicates that grade must have a letter or a percent element

Content by Choice

Allows combination of elements and parsed character data
◦ Can include additional information within an element, e.g. how to display
Rules:
1. Managed by using Choice (or)
2. PCDATA must appear first in list of elements
3. List cannot include inner content model (only simple elements)
4. If there are child elements, include *
◦ * indicates that an item may appear zero or more times

Mixed Content

If want to include emphasis with the letter grade
Data:
  <letter><em>4</em></letter>
Declaration:
  <!ELEMENT letter (#PCDATA | em)*>
Describes a letter element as the content (PCDATA) plus an emphasis element

Mixed Content -2

An element can be empty
◦ <br /> (never has a child or content)

Declaration includes EMPTY:

<!ELEMENT br EMPTY>

Means that the element CANNOT contain content

Empty Content

An element can contain any kind of value (or be empty)
◦ Any elements declared in the DTD can occur, any number of times
  Only elements that are part of the DTD can be part of the document!
◦ May be empty
◦ May contain PCDATA
Least restrictive model

Any Content

How many times can an element occur? How many times must an element occur?

Cardinality

Indicator   Purpose
(none)      No indicator means an element must appear once – and ONLY once
?           An element may appear once or not at all
+           An element must appear one or more times
*           An element may appear zero or more times
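
The indicators ?, + and * mean the same thing as the matching regular-expression quantifiers, which allows a quick experiment in Python. The content model below, (first, middle?, last, phone+, note*), and the helper names are hypothetical; this is a sketch of the idea, not how a validating parser is implemented.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical content model: (first, middle?, last, phone+, note*)
MODEL = [("first", ""), ("middle", "?"), ("last", ""), ("phone", "+"), ("note", "*")]


def model_to_regex(model):
    """Build a regex over a list of child names, each followed by a space."""
    parts = [f"(?:{re.escape(name)} ){indicator}" for name, indicator in model]
    return re.compile("".join(parts))


def matches(xml_text, model=MODEL):
    """True if the child element names of the root satisfy the model."""
    children = "".join(f"{child.tag} " for child in ET.fromstring(xml_text))
    return model_to_regex(model).fullmatch(children) is not None


minimal = "<s><first/><last/><phone/></s>"
full = "<s><first/><middle/><last/><phone/><phone/><note/></s>"
no_phone = "<s><first/><last/></s>"
two_first = "<s><first/><first/><last/><phone/></s>"

for label, doc in [("minimal", minimal), ("full", full),
                   ("no_phone", no_phone), ("two_first", two_first)]:
    print(label, matches(doc))
```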

Elements tend to be used to describe a logical unit of information

Attributes are typically used to store data about characteristics (properties)

May have a Movie element with attributes for Title, Rental Price, Rental Days

No specific rules about how to use elements and attributes

Attributes and DTD’s

Attributes allow more limits on data
◦ Can have a list of acceptable values
◦ Can have a default value
◦ Some ability to specify a data type
◦ Concise, about a single name/value pair
Attributes have limits
◦ Can’t store long strings of text
◦ Can’t nest values
◦ Whitespace can’t be ignored

Attributes and Elements

Declaration:
  <!ATTLIST ElementName AttrName AttrType Default>
Specify the Element the attribute belongs to
Specify the Name of the attribute
Specify the Type of data the attribute stores
Specify characteristics of the values (Default or attribute value)
◦ List either the default value or another characteristic of the value – required, optional

Specifying Attributes

CDATA – unparsed character data
Enumerated – series/list of string values
ENTITY/ENTITIES – reference entity definition(s)
ID – unique identifier for the element
IDREF – refer to the ID of another element
IDREFS – list of IDs of other elements separated by whitespace
NMTOKEN/NMTOKENS – value(s) of attribute can be anything made up of characters allowed in an XML name

Sample Attribute Data Types

Specifies that the attribute value must be found in a particular list
Each value in the list must be a valid XML name
◦ Limits on spaces, characters
Use | (pipe) to separate members of the list
If specifying a list of letter grades for a student:
  <!ATTLIST student grade (A | B | C | D | F | V | W | I) #IMPLIED>

Enumerated Attributes


An ID specifies that the element must have a unique value within the document
◦ Allows a reliable way to refer to a specific element
◦ No spaces allowed in value
  Typically replace space with underscore
◦ An element’s attribute list can include only one ID
IDREF and IDREFS allow an element to be associated with one or multiple other elements
A student element must have a student ID:
  <!ATTLIST student studentID ID #REQUIRED>

ID, IDREF, IDREFS
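
The uniqueness and cross-reference checks behind ID, IDREF and IDREFS can be sketched by hand in Python. The school document, the advisor element and its advisees attribute are hypothetical; a validating parser would perform these checks from the ATTLIST declarations rather than from hard-coded names.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: studentID plays the ID role, advisees the IDREFS role.
DOC = """<school>
  <student studentID="s_100" />
  <student studentID="s_200" />
  <advisor name="Lee" advisees="s_100 s_200" />
  <advisor name="Kim" advisees="s_300" />
</school>"""


def check_ids(xml_text):
    """Report duplicate IDs and references that point at no ID."""
    root = ET.fromstring(xml_text)
    seen, problems = set(), []
    for student in root.iter("student"):
        value = student.get("studentID")
        if value in seen:
            problems.append(f"duplicate ID: {value}")
        seen.add(value)
    for advisor in root.iter("advisor"):
        # IDREFS values are separated by whitespace.
        for ref in advisor.get("advisees", "").split():
            if ref not in seen:
                problems.append(f"IDREF with no matching ID: {ref}")
    return problems


problems = check_ids(DOC)
print(problems)
```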

Attributes can refer to entities
◦ “Entity” refers to substituting a reference for a text value
  &amp; refers to the & character
◦ An unparsed entity is a reference that isn’t parsed
◦ Can reuse references for long values, or hard-to-manage characters (e.g. tab, line feed)
Entity must be declared in the DTD
  <!ENTITY classTitle "XML">
◦ When &classTitle; is found in the document, it is replaced with XML

Entities and Attributes
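
Entity substitution can be observed with Python's standard-library parser, which expands entities declared in a local (internal) DTD. The course document below is an assumption built around the classTitle entity from the slide.

```python
import xml.etree.ElementTree as ET

# Local DTD declares the entity; the document then references it.
DOC = """<!DOCTYPE course [
  <!ENTITY classTitle "XML">
]>
<course>
  <title>Intro to &classTitle;</title>
  <topic>Tags &amp; attributes</topic>
</course>"""

root = ET.fromstring(DOC)
title = root.findtext("title")   # declared entity is replaced with its value
topic = root.findtext("topic")   # built-in entity &amp; becomes &

print(title)
print(topic)
```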

Can specify how the value will appear in the document
◦ Must always specify a value declaration
A default value is used for the attribute if a value isn’t provided
◦ Written as the quoted value itself, with no keyword
#FIXED sets a value that must occur; if an attribute has a different value, a validation error occurs
#REQUIRED specifies that the attribute (and value) must exist
#IMPLIED means the attribute is optional

Attribute Value Declarations

Alternative to DTDs as a way to define structure
◦ Essentially defining a language
◦ Structure may also be referred to as vocabulary
Ensures that data matches specifications
Serves as basis for other XML-related technologies

XML Schemas

Use XML for definition
◦ Doesn’t have a separate syntax like DTDs
◦ Schema must be well-formed
Support Namespace recommendations
◦ Allows the same name to be used in different Schemas and properly understood
Provides for built-in and user-defined data types
Can be easily reused
Supports concepts such as inheritance
◦ One object is based on another

Working with Schemas

Allows more specificity than DTDs
◦ Can specify dates, numbers, ranges
Datatypes fall into two categories:
◦ Simple deals with basic values
◦ Complex describes more intricate values or structures

Schema Datatypes

Schema file uses an .xsd extension
Root element is the schema
◦ Can nest all elements within the schema
  Everything is hierarchical
OR
◦ Can have multiple elements as child elements of the schema root
  Allows use of a definition any place in the document (data) file
  Elements which are child elements of schema are global

Creating Schemas

Simple data type is about text, numbers, dates
◦ Sometimes referred to as “primitives”
Data types built in to the Schema vocabulary (and related elements, attributes) are in the XML Schema namespace
◦ Need a reference to the namespace to have valid XML – where to find the definition
Elements that are Simple Datatypes don’t have attributes
◦ Including an attribute makes an element Complex

Simple Datatypes

The simpleType allows customization of base types
Can create limits on values
◦ Specify ranges
◦ Specify lists
  <xsd:simpleType name="Degrees">
    <xsd:restriction base="xsd:string">
      <xsd:enumeration value="AA" />
      <xsd:enumeration value="AS" />
    </xsd:restriction>
  </xsd:simpleType>

Defining (Simple) Datatypes

Allows combination of different elements and specification of order, new data types
Can create an element course which is composed of simple types
  <xsd:element name="course">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="department" type="xsd:string"/>
        <xsd:element name="number" type="xsd:string"/>
        <xsd:element name="title" type="xsd:string"/>
        <xsd:element name="credits" type="xsd:integer"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

Complex Datatypes

When using a schema, need to create a reference from the data file
Use either the schemaLocation or noNamespaceSchemaLocation attribute on the root element
  <course xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:noNamespaceSchemaLocation="course.xsd">

Using a Schema
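
A short Python sketch shows how a namespace-qualified attribute such as noNamespaceSchemaLocation looks to a program. The course document is illustrative. Note the assumption: the standard-library parser used here only reads the attribute, it does not load course.xsd or validate against it.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOC = f"""<course xmlns:xsi="{XSI}"
        xsi:noNamespaceSchemaLocation="course.xsd">
  <title>XML</title>
</course>"""

root = ET.fromstring(DOC)

# ElementTree stores qualified names as {namespace-URI}localname,
# so the xsi: prefix itself is not part of the lookup key.
schema_file = root.get(f"{{{XSI}}}noNamespaceSchemaLocation")
print(schema_file)
```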

Qualification refers to whether a value (element, attribute) must be qualified by its namespace
◦ When an element (or attribute) doesn’t have a namespace declaration it’s unqualified
◦ Determines how the name is used in the data (instance) document
A schema has the attributes elementFormDefault and attributeFormDefault
◦ Set to qualified or unqualified
◦ By setting to qualified, must include a namespace when using attributes or elements

Qualification

Allows elements to appear in any order or not at all
Rules governing use
1. Must be the only content model declaration of a <complexType> definition
   For example, can’t follow with <sequence>
2. Can only have element declarations as children
3. The children of the <all> element may appear once – or not at all

<all> Declaration

Can create a group of attributes similar to element groups
Allows re-use of common members without multiple definitions
◦ Attribute groups cannot be recursive (refer to themselves)

Attribute Groups

A list allows an element or attribute to store multiple values
◦ Values are separated by whitespace, so whitespace cannot be part of the content
itemType attribute defines the data type
◦ Can be a built-in XML Schema type or a defined simpleType data type

<list> Declarations

<union> allows the combination of two data type for an element or attribute

If have a possiblePoints element, expected value would be an integer; <union> would allow a string entry to note a “Missing” value
Separate data types with whitespace
  <simpleType name="CreditValue">
    <union memberTypes="xs:integer xs:string" />
  </simpleType>

<union> Declarations

XSL stands for Extensible Stylesheet Language
◦ Stylesheets are used to manage organization and presentation of data
Implemented as an XML language
◦ Rules of XML apply
Made up of
◦ XSL-FO (Formatting Objects)
◦ XSLT (Transformations)

XSL

XSL-FO is focused on presentation to screen and paper
◦ Not well-supported by browsers
XSLT emphasizes re-organization of data
◦ Typically used for presentation but can also be used for conversion of data storage format
◦ XSLT is a declarative language
  Similar to SQL, describe results not steps

XSL Implementation

Cascading Style Sheets used to separate presentation from data

XSLT used to change – transform – data◦ Convert an XML document to XHTML

Can use both together

XSLT v CSS

XSL requires several steps
1. XML processor reads document
◦ Creates document tree
2. XSL processor applies rules from stylesheet
◦ Rules applied to document tree
◦ Rules applied by using pattern matching
  Identify nodes to apply rules to
◦ Rules are stored as templates

Using XSL

XSL works by using an Input Tree
◦ Input Tree comes from the XML processor
Process of changing input values is called Tree Transformation
Result of transformation is the Result Tree
◦ Result Tree can include
  XML
  HTML (must adhere to XML rules, i.e., XHTML)
  Formatting Objects

XSL Process

Extensible Stylesheet Language Transformations (XSLT) is a method of changing (transforming) XML based on the rules of a stylesheet
XPath allows manipulation of parts of an XML document
◦ Not XML-based
◦ Provides compact references
  Useful in URIs, attributes
◦ Document must exist as nodes (previously parsed)

XSLT

Templates are definitions of rules, organization

Patterns define values searching for (where to apply templates)

Expressions allow use of functions using nodes as inputs

When referring to document attributes preface name with “@”

XSLT Constructs

<xsl:stylesheet> is root element◦ Uses namespace to define elements, attributes

valid in a stylesheet <xsl:template> defines the rules/

transformations to apply◦ Match attribute specifies pattern to apply rules to

Functions similar to criteria <xsl:apply-templates> applies the rules

defined for a particular element◦ Select attribute specifies elements to apply to

XSLT Elements

<xsl:value-of> returns the value of a specified node, function◦ Select attribute specifies value source

<xsl:copy> copies a node to the result tree without any child nodes or attributes

<xsl:copy-of> copies a node and child/attribute nodes

<xsl:output> controls the result tree◦ method=“xml|html|text”

XSLT Elements – 2

<xsl:if> provides a boolean test to determine processing

<xsl:choose> offers an IF ... THEN ... ELSE construct

<xsl:for-each> allows each node in a group to be processed

<xsl:sort> specifies order for a group of nodes

XSLT Elements – 3

Match can use
◦ node name
◦ current position (represented by “.”)
◦ relative position (for example, parent = “..”)
Specifies where the transformation is to be applied

Match

XPath provides a logical model for working with an XML document
Nodes are used to represent the XML document in memory
◦ Not all parts of the XML document are represented (XML declaration, DOCTYPE)
XPath is used in combination with other tools (such as XSLT)

XPATH Introduction

Legal XPath code is called an expression
◦ An XPath expression that returns a node set is a location path
Expressions can be absolute or relative
◦ Absolute path includes a full definition of how to find the node
◦ Relative path is based on the current context (location)

XPath Expressions

Root node represents the document
◦ Can have only one element child (the document element)
Element node represents elements
◦ QName (qualified name) includes namespace prefix and element name
Attribute node represents attributes
◦ Have name and value
◦ Are not represented as child nodes
Text node represents text value of an element
◦ Does not have a name
Namespace node gives access to the namespace URI and prefix
Comment node
Processing Instruction node

Node Types

Boolean
◦ Written as true() and false()
String
Number – floating point values
Node-set – unordered collection of nodes with no duplicates
◦ Typically processed in document order

XPath 1.0 Types

Element node references can be spelled out or abbreviated
  /child::movies/child::movie/child::price
  OR
  /movies/movie/price
◦ child::nodename can also be written nodename
Attribute node references
  attribute::attributename
  OR
  @attributename

XPath Abbreviations

self, child, attribute, ancestor, ancestor-or-self, descendant, descendant-or-self,
following, following-sibling, namespace, parent, preceding, preceding-sibling

13 Axes

Default axis
Selects nodes that are immediate children of the context (current) node
Can use * to refer to all child element nodes

Child Axis

Can use node() to return all child nodes including comments, processing instructions, and text nodes
Can return just text nodes using text()
◦ Text nodes are unnamed

Child Axis References

Used to select attributes belonging to a particular element node
To return all attributes
◦ attribute::*
◦ @*
To return a particular attribute
◦ attribute::attributename
◦ @attributename

Attribute Axis

Used to filter node sets
◦ Predicate is similar to query criteria
Can use specific values or location references

Predicates
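
Abbreviated location paths and predicates can be tried out with Python's xml.etree.ElementTree. Two assumptions to keep in mind: ElementTree implements only a limited subset of XPath (no spelled-out axes such as child::, and attributes are read with get() rather than selected as nodes), and the movies document below is invented sample data.

```python
import xml.etree.ElementTree as ET

DOC = """<movies>
  <movie rating="PG"><title>First</title><price>3.50</price></movie>
  <movie rating="R"><title>Second</title><price>4.00</price></movie>
  <movie rating="PG"><title>Third</title><price>2.75</price></movie>
</movies>"""

movies = ET.fromstring(DOC)  # the context node is the <movies> element

# Relative location path: child movie elements, then their price children.
prices = [p.text for p in movies.findall("./movie/price")]

# Predicate on an attribute value, similar to query criteria.
pg_titles = [m.findtext("title") for m in movies.findall("movie[@rating='PG']")]

# Predicate on position (1-based) and on the text of a child element.
second = movies.find("movie[2]").findtext("title")
third_price = movies.find("movie[title='Third']").findtext("price")

print(prices, pg_titles, second, third_price)
```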