COMS E6125 Web-enHanced Information Management (WHIM)

Page 1: COMS E6125 Web-enHanced Information Management (WHIM)

12 February 2008 Kaiser: COMS E6125 1

COMS E6125 Web-enHanced Information Management (WHIM)

Prof. Gail Kaiser

Spring 2008

Page 2: COMS E6125 Web-enHanced Information Management (WHIM)

Today’s Topics:

• Document Structure Definition
– Document Type Definition (DTD)
– XML Schema (XSD)

• Querying XML Documents
– NOT the same as Web search engines!
– XPath
– XQuery

Page 3: COMS E6125 Web-enHanced Information Management (WHIM)

Pure XML - Instance Model

<A> <B>foo</B> <C>bar</C> <C>psl</C></A>

[Figure: the same document drawn two ways, as a labeled tree (root A with children B: "foo", C: "bar", C: "psl") and as nested boxes. Children are ordered.]

• XML 1.0 implicit data model:
– nested containers ("boxes within boxes")
– labeled ordered trees (= semistructured data model)
– relational or object-oriented data are easy to encode
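A minimal sketch of this labeled, ordered tree view, assuming Python's standard-library `xml.etree.ElementTree` as the parser; `outline` is a hypothetical helper written for illustration. It rebuilds the A/B/C example as nested (label, content) pairs, and the two C children come back in document order.

```python
import xml.etree.ElementTree as ET

doc = "<A> <B>foo</B> <C>bar</C> <C>psl</C></A>"
root = ET.fromstring(doc)

def outline(elem):
    """Return (label, text) for a leaf, or (label, [children]) otherwise.

    Whitespace-only text between child elements is ignored here.
    """
    children = list(elem)  # ElementTree keeps children in document order
    if not children:
        return (elem.tag, elem.text)
    return (elem.tag, [outline(child) for child in children])

tree = outline(root)
print(tree)  # → ('A', [('B', 'foo'), ('C', 'bar'), ('C', 'psl')])
```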

Page 4: COMS E6125 Web-enHanced Information Management (WHIM)

XML Namespaces

• Allow mixing of different tag vocabularies

• Only identifies the vocabulary (lexicon)

• Additional mechanisms required for structure and meaning (or at least metadata) of tags

Page 5: COMS E6125 Web-enHanced Information Management (WHIM)

From Documents to Data

• We want to be able to:
– Extract the element structure of a document

– Re-use this structure for other similar documents

– Share structure and metadata with others

– Automate processing of this structure and metadata

<invoice> <orderDate>2007-12-01</orderDate> <shipDate>2007-12-26</shipDate><billingAddress> <name>Gail Kaiser</name> <street>500 West 120th Street</street> <city>New York</city> <state>NY</state> <zip>10027</zip> </billingAddress> <voice>212-555-1234</voice> <fax>212-555-4321</fax> </invoice>

<memo importance='high' date='2008-02-11'>

<from>Gail Kaiser</from> <to>Swapneel Sheth</to>

<subject>whim tomorrow</subject>

<body>Remember to pick up the sign-in sheet after class tomorrow

</body>

</memo>

Page 6: COMS E6125 Web-enHanced Information Management (WHIM)

Adding Structure and Semantics

• A Document Structure Description (DSD) defines the syntax of XML documents for a particular application domain

• Defines the grammar for an XML-based markup language

Page 7: COMS E6125 Web-enHanced Information Management (WHIM)

Processing XML

• Non-validating parser:
– checks that an XML document is syntactically well-formed, e.g., all open-tags have matching close-tags and are properly nested, an attribute appears at most once in an element, etc.

• Validating parser:
– checks that an XML document is also valid with respect to a given DSD (now usually an XML Schema)
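As a rough illustration, Python's built-in `xml.etree.ElementTree` parser (built on expat) checks well-formedness but does not validate against a DTD or schema, so it behaves like the non-validating parser described above. The sample documents and the helper `is_well_formed` are made up for this sketch; note that the last sample is accepted even though no DSD for memos would allow a `banana` element.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if text parses as well-formed XML; no validity checking is done."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

samples = {
    "ok": "<memo><to>Swapneel Sheth</to></memo>",
    "unclosed tag": "<memo><to>Swapneel Sheth</memo>",
    "improper nesting": "<memo><to><subject></to></subject></memo>",
    "duplicate attribute": '<memo date="2008-02-11" date="2008-02-12"/>',
    "unknown element": "<memo><banana/></memo>",  # well-formed, never validated
}
for label, text in samples.items():
    print(f"{label}: {is_well_formed(text)}")
```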

Page 8: COMS E6125 Web-enHanced Information Management (WHIM)

Using DSD Validators

• A DSD processor can be useful both on the server side (when writing XML documents) and on the client side (when processing XML documents) for:
– Checking validity (conformance) of XML documents
– Performing default insertion (inserting missing fragments)

Page 9: COMS E6125 Web-enHanced Information Management (WHIM)

DSD Processing

Page 10: COMS E6125 Web-enHanced Information Management (WHIM)

Several Proposed DSDs

• XML Document Type Definitions (DTDs):
– Define the structure of “allowed” documents (analogous to a database schema)
– Non-XML syntax

• XML Schemas (XSDs):
– Define structure and data types
– Allow developers to build their own libraries of interchangeable data types
– Written in an XML vocabulary

• Others (e.g., RELAX NG, Schematron)

Page 11: COMS E6125 Web-enHanced Information Management (WHIM)

Document Type Definitions

• A DTD is a grammar defining XML structure:
– An XML document specifies an associated DTD, plus the root element
– The DTD specifies the children of the root element, their children, and so on

Page 12: COMS E6125 Web-enHanced Information Management (WHIM)

Example DTD

<!ELEMENT bib (book*)>
<!ELEMENT book (thesis | article)>
<!ELEMENT thesis (title, author, year, school, committeemember*)>
<!ATTLIST thesis
    date     CDATA  #REQUIRED
    key      ID     #REQUIRED
    advisor  CDATA  #IMPLIED
    idref    IDREF  #IMPLIED>
<!ELEMENT article (title, (author+ | editor+), publisher)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (name)>
<!ATTLIST author id ID #REQUIRED>
. . .

Page 13: COMS E6125 Web-enHanced Information Management (WHIM)

DTD Interpretation

• CDATA: “Character Data”, a sequence of characters
• #PCDATA: “Parsed Character Data”, text and character entities (e.g., &amp; -> &, or &eacute; -> é if that entity is declared)
• ID: a value that is unique within the document
• IDREF: a reference to an element's ID
• #IMPLIED: the attribute is optional; no default value is provided
• ( ... ): specifies a group
• A | B: either A or B (exactly one) may occur
• A , B: A must occur before B
• A & B: A and B must both occur once, but may do so in any order (SGML only; XML DTDs do not allow &)
• A?: A can occur zero or one times
• A*: A can occur zero or more times
• A+: A can occur one or more times
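One way to see that a DTD content model is a regular expression over child-element names: the sketch below (an illustration, not how real validating parsers are implemented) rewrites an element-only model into a Python regex. It handles only names, `,`, `|`, parentheses and `?`/`*`/`+`; `#PCDATA`, mixed content and SGML's `&` are out of scope, and `content_model_to_regex` and `matches` are hypothetical helper names.

```python
import re

NAME = re.compile(r"[A-Za-z_][\w.\-]*")

def content_model_to_regex(model):
    """Translate e.g. "(title, (author+ | editor+), publisher)" into a regex
    over a string in which every child name is followed by one space."""
    compact = re.sub(r"\s+", "", model)  # drop insignificant whitespace
    # Each element name becomes an atomic unit "(?:name )";
    # the operators | ? * + and the parentheses carry over unchanged.
    wrapped = NAME.sub(lambda m: "(?:" + m.group(0) + " )", compact)
    return wrapped.replace(",", "")      # ',' means plain concatenation

def matches(model, child_names):
    """True if the sequence of child element names fits the content model."""
    text = "".join(name + " " for name in child_names)
    return re.fullmatch(content_model_to_regex(model), text) is not None

thesis = "(title, author, year, school, committeemember*)"
article = "(title, (author+ | editor+), publisher)"
print(matches(thesis, ["title", "author", "year", "school"]))            # → True
print(matches(article, ["title", "author", "editor", "publisher"]))      # → False
```

Authors and editors cannot be mixed in one article because `|` picks exactly one alternative, and that alternative is then repeated by `+`.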

Page 14: COMS E6125 Web-enHanced Information Management (WHIM)

DTD Defines Special Significance for Attributes

• IDs – special attributes that are analogous to relational database keys (IDs unique within the document)

• IDREF – reference to an ID
• IDREFS – a whitespace-separated list of IDREFs

Page 15: COMS E6125 Web-enHanced Information Management (WHIM)

Instance Visualization as a Graph

<?xml version="1.0"?>

<!DOCTYPE bib SYSTEM "http://webserver/bib.dtd">

<bib>

<author id="author1"> <name>John Smith</name>

</author>

<article>

<author idref="author1" />

<title>Paper1</title>

</article>

<article>

<author idref="author1" />

<title>Paper2</title>

</article>

. . .
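ID/IDREF links can be followed by hand once the document is parsed. The sketch below assumes a cut-down, closed copy of the instance above and Python's `xml.etree.ElementTree`, which does not read the DTD and so cannot know which attributes are declared as ID or IDREF; it simply assumes the attributes named `id` and `idref` play those roles.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the bib instance above: DOCTYPE dropped, </bib> added.
doc = """<bib>
  <author id="author1"><name>John Smith</name></author>
  <article><author idref="author1"/><title>Paper1</title></article>
  <article><author idref="author1"/><title>Paper2</title></article>
</bib>"""
root = ET.fromstring(doc)

# Index every element carrying an "id" attribute (assumed to be the ID attribute).
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

resolved = []
for article in root.findall("article"):
    ref = article.find("author").get("idref")
    target = by_id[ref]  # a KeyError here would mean a dangling IDREF
    resolved.append((article.findtext("title"), target.findtext("name")))

print(resolved)  # → [('Paper1', 'John Smith'), ('Paper2', 'John Smith')]
```

Both articles end up pointing at the same author node, which is why the instance is better pictured as a graph than as a tree.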

Page 16: COMS E6125 Web-enHanced Information Management (WHIM)

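Graph Data Model

[Figure: the instance above drawn as a graph. Root has children ?xml, !DOCTYPE and bib; bib has an author child (id = author1, name = John Smith) and two article children (titles Paper1 and Paper2), each containing an author whose idref = author1 refers back to that author node.]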

Page 17: COMS E6125 Web-enHanced Information Management (WHIM)

Drawbacks of DTDs

• Not themselves XML - additional effort to build tools

• No support for data types - cannot do data validation

• No support for OO-like structures (e.g., inheritance)

• Horrible syntax

Page 18: COMS E6125 Web-enHanced Information Management (WHIM)

Several Proposed DSDs

• XML Document Type Definitions (DTDs):
– Define the structure of “allowed” documents (analogous to a database schema)
– Non-XML syntax

• XML Schemas (XSDs):
– Define structure and data types
– Allow developers to build their own libraries of interchangeable data types
– Written in an XML vocabulary

• Others (e.g., RELAX NG, Schematron)

Page 19: COMS E6125 Web-enHanced Information Management (WHIM)

XML Schema Design Principles

1. More expressive than DTDs (which came from SGML, although modified slightly in XML 1.0)
2. Notation is itself an XML vocabulary
3. Self-describing
4. Usable by a wide variety of applications that employ XML
5. Straightforwardly usable on the Internet
6. Optimized for interoperability
7. Simple enough to be implemented with modest design and runtime resources
8. Coordinated with relevant W3C specs

Page 20: COMS E6125 Web-enHanced Information Management (WHIM)

Purpose of an XML Schema

• Defines a class of XML instances
• Neither instances nor schemas need to exist as documents per se; they may exist as:
– Byte streams sent between applications
– Fields in a database record
– Collections of XML “infoset” information items

Page 21: COMS E6125 Web-enHanced Information Management (WHIM)

What is an XML “infoset”?

• XML Information Set, 2nd edition, W3C Recommendation February 2004

• For use by other specs that need to refer to the information in a well-formed XML document [or in the PSVI = post-schema-validation infoset]

• Defines an abstract data set, generated by a parser or by other means; conceptually a tree of information items, each with several properties

Page 22: COMS E6125 Web-enHanced Information Management (WHIM)

(Some) Information Items

• Document (root of the infoset) – properties include base URI, XML version, character encoding, etc.

• One root element, and its children
• Attributes of elements
• Namespace scoping for elements
• Processing instructions
• Unexpanded entity references (a processor may or may not expand all entities)

Page 23: COMS E6125 Web-enHanced Information Management (WHIM)

Example Instance Document

<?xml version="1.0"?> <purchaseOrder orderDate="2007-10-20"> <shipTo country="US"> <name>Alice Smith</name> <street>123 Maple Street</street> <city>Mill Valley</city> <state>CA</state> <zip>90952</zip> </shipTo> <billTo country="US"> <name>Robert Smith</name> <street>8 Oak Avenue</street> <city>Old Town</city> <state>PA</state> <zip>95819</zip> </billTo> <comment>Hurry, my lawn is going wild!</comment> <items> <item partNum="872-AA"> <productName>Lawnmower</productName> <quantity>1</quantity> <USPrice>148.95</USPrice> <comment>Confirm this is electric</comment> </item> <item partNum="926-AA"> . . . </item> </items> </purchaseOrder>

file po.xml

Page 24: COMS E6125 Web-enHanced Information Management (WHIM)

Where is the Schema?

• The instance document may reference a schema explicitly, or a processor may obtain a schema separately, without any reference from the instance

• Schema defines elements and attributes, and their complex and simple types

• Determines the appearance of elements and their content in instance documents

Page 25: COMS E6125 Web-enHanced Information Management (WHIM)

Example Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:annotation> . . . </xsd:annotation> <xsd:element name="purchaseOrder" type="PurchaseOrderType"/> <xsd:element name="comment" type="xsd:string"/> <xsd:complexType name="PurchaseOrderType"> . . . </xsd:complexType></xsd:schema>

• The schema consists of a schema element and various subelements, e.g., element, complexType

• The prefix xsd: associates names with the XML Schema namespace specified in the xmlns:xsd declaration

• Same prefix, and hence same association, also appears on names of built-in types, e.g., xsd:string

• The prefix identifies these elements and simple types as belonging to the XML Schema language vocabulary rather than to the vocabulary of the schema author

file po.xsd

Page 26: COMS E6125 Web-enHanced Information Management (WHIM)

Example Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:annotation> . . . </xsd:annotation> <xsd:element name="purchaseOrder" type="PurchaseOrderType"/> <xsd:element name="comment" type="xsd:string"/> <xsd:complexType name="PurchaseOrderType"> . . . </xsd:complexType></xsd:schema>

• An annotation element may appear at the beginning of most schema constructions

• It may contain two kinds of subelements:
– documentation: human-readable material
– appinfo: information for tools and applications

file po.xsd

Page 27: COMS E6125 Web-enHanced Information Management (WHIM)

Complex Type Definitions

<xsd:complexType name="USAddress"> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="street" type="xsd:string"/> <xsd:element name="city" type="xsd:string"/> <xsd:element name="state" type="xsd:string"/> <xsd:element name="zip" type="xsd:decimal"/> </xsd:sequence> <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/> </xsd:complexType>

• New complex types are defined using the complexType element; it contains element declarations, attribute declarations and element references

• This example says elements of type USAddress must have:
– 5 subelements, which must be called name, street, city, state and zip (in this order), each having the corresponding type declared above
– 1 attribute, called country, which may appear with the element; NMTOKEN represents an atomic, indivisible value

• All element declarations within USAddress involve simple types

Page 28: COMS E6125 Web-enHanced Information Management (WHIM)

Complex Type Definitions

<xsd:complexType name="USAddress"> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> <xsd:element name="street" type="xsd:string"/> <xsd:element name="city" type="xsd:string"/> <xsd:element name="state" type="xsd:string"/> <xsd:element name="zip" type="xsd:decimal"/> </xsd:sequence> <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/> </xsd:complexType>

• An attribute may be specified as fixed or default.
• Default attribute values apply when the attribute is missing.
• For a fixed attribute, if the attribute appears, its value must be the declared fixed value.
• The schema processor will provide the value for a missing default or fixed attribute.
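A minimal sketch of what a schema processor does for the fixed `country` attribute, written as a hypothetical hand-coded check in Python (`apply_fixed` and `FIXED_ATTRIBUTES` are illustrative names, not part of any schema library). For a default attribute only the first branch would apply: a missing value is filled in, but any other value is accepted.

```python
import xml.etree.ElementTree as ET

# Hypothetical hand-coded rule mirroring
# <xsd:attribute name="country" type="xsd:NMTOKEN" fixed="US"/>
FIXED_ATTRIBUTES = {"country": "US"}

def apply_fixed(elem, fixed=FIXED_ATTRIBUTES):
    """Supply missing fixed attributes and reject any conflicting value."""
    for name, value in fixed.items():
        actual = elem.get(name)
        if actual is None:
            elem.set(name, value)  # the processor fills in the fixed value
        elif actual != value:
            raise ValueError(f"{name}={actual!r}, but the fixed value is {value!r}")
    return elem

ok = apply_fixed(ET.fromstring("<shipTo><name>Alice Smith</name></shipTo>"))
print(ok.get("country"))  # → US
try:
    apply_fixed(ET.fromstring('<shipTo country="CA"/>'))
except ValueError as err:
    print("invalid:", err)
```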

Page 29: COMS E6125 Web-enHanced Information Management (WHIM)

Complex Type Definitions

<xsd:complexType name="PurchaseOrderType"> <xsd:sequence> <xsd:element name="shipTo" type="USAddress"/> <xsd:element name="billTo" type="USAddress"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="items" type="Items"/> </xsd:sequence> <xsd:attribute name="orderDate" type="xsd:date"/> </xsd:complexType>

• A declaration may reference an existing element, e.g., comment; the value of the ref attribute must reference a global element (i.e., declared under schema)

• Every element of type PurchaseOrderType must contain, in this order, the subelements shipTo and billTo (each containing the five subelements declared as part of USAddress), an optional comment, and items; it may have one attribute called orderDate

Page 30: COMS E6125 Web-enHanced Information Management (WHIM)

Complex Type Definitions

<xsd:complexType name="PurchaseOrderType"> <xsd:sequence> <xsd:element name="shipTo" type="USAddress"/> <xsd:element name="billTo" type="USAddress"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="items" type="Items"/> </xsd:sequence> <xsd:attribute name="orderDate" type="xsd:date"/> </xsd:complexType>

• An occurrence constraint may specify minOccurs and/or maxOccurs (both default to 1)

Page 31: COMS E6125 Web-enHanced Information Management (WHIM)

Complex Type Definitions

<xsd:complexType name="PurchaseOrderType"> <xsd:sequence> <xsd:element name="shipTo" type="USAddress"/> <xsd:element name="billTo" type="USAddress"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="items" type="Items"/> </xsd:sequence> <xsd:attribute name="orderDate" type="xsd:date"/> </xsd:complexType>

• An attribute may appear once or not at all, but never more than once

• use may be specified as optional (the default), required, or prohibited

Page 32: COMS E6125 Web-enHanced Information Management (WHIM)

Simple Built-in Types

• string, normalizedString, token
• byte, unsignedByte
• integer, positiveInteger, etc.
• long, short, etc.
• decimal, float, double
• boolean
• time, dateTime, duration, date, etc.
• anyURI
• etc.

• ID
• IDREF, IDREFS
• ENTITY, ENTITIES
• NMTOKEN, NMTOKENS

• The types in this second group (ID through NMTOKENS) should only be used in attributes (to retain compatibility with XML 1.0 DTDs)

Page 33: COMS E6125 Web-enHanced Information Management (WHIM)

Simple Derived Types

• The simpleType element is used to define and name a new simple type

• The restriction element indicates the base type and identifies the “facets” that constrain the range of values (here minInclusive and maxInclusive)

<xsd:simpleType name="myInteger"> <xsd:restriction base="xsd:integer"> <xsd:minInclusive value="10000"/> <xsd:maxInclusive value="99999"/> </xsd:restriction></xsd:simpleType>
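The two facets amount to a base-type check followed by a range check. A sketch in Python with the hypothetical helper `is_my_integer`; the lexical check covers only the plain-digits form of xsd:integer, after trimming the surrounding XML whitespace that the integer type collapses.

```python
import re

def is_my_integer(lexical):
    """Check a value against myInteger: xsd:integer restricted by
    minInclusive 10000 and maxInclusive 99999."""
    text = lexical.strip(" \t\r\n")             # XML whitespace only
    if not re.fullmatch(r"[+-]?[0-9]+", text):  # base type: xsd:integer
        return False
    return 10000 <= int(text) <= 99999          # the two facets

for value in ["10000", "99999", "9999", "100000", "12345.0"]:
    print(value, is_my_integer(value))
```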

Page 34: COMS E6125 Web-enHanced Information Management (WHIM)

Simple Derived Types (pattern facet)

<!-- Stock Keeping Unit, a code for identifying products -->

<xsd:simpleType name="SKU"> <xsd:restriction base="xsd:string"> <xsd:pattern value="\d{3}-[A-Z]{2}"/> </xsd:restriction> </xsd:simpleType>

• Constrain the values of SKU using the pattern facet in conjunction with the regular expression "\d{3}-[A-Z]{2}" (3 digits followed by a hyphen followed by 2 upper-case ASCII letters)
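The same pattern can be tried out with Python's `re` module. XSD patterns are implicitly anchored (the whole value must match), so the sketch uses `fullmatch`; the two regex dialects differ in places, but this pattern only uses features they share. `is_sku` is a hypothetical helper name.

```python
import re

# XSD patterns are implicitly anchored: the whole value must match,
# hence fullmatch rather than search.
SKU = re.compile(r"\d{3}-[A-Z]{2}")

def is_sku(value):
    return SKU.fullmatch(value) is not None

for value in ["872-AA", "926-AA", "872-aa", "8720-AA", "x872-AA"]:
    print(value, is_sku(value))
```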

Page 35: COMS E6125 Web-enHanced Information Management (WHIM)

Simple Derived Types (enumeration facet)

• The enumeration facet limits a simple type to a set of distinct values

• Enables a better definition of USAddress type

<xsd:simpleType name="USState"> <xsd:restriction base="xsd:string"> <xsd:enumeration value="AK"/> <xsd:enumeration value="AL"/> <xsd:enumeration value="AR"/> <!-- and so on ... --> </xsd:restriction></xsd:simpleType>

<xsd:complexType name="USAddress"> . . . <xsd:element name="state" type="USState"/> . . .</xsd:complexType>
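A sketch of the enumeration check with Python's `xml.etree.ElementTree` and the hypothetical helper `check_state`; the value set holds only the three states listed above, so a real state such as CA is rejected purely because the list is truncated.

```python
import xml.etree.ElementTree as ET

# Only the values spelled out above; the real list continues ("and so on").
US_STATE = {"AK", "AL", "AR"}

def check_state(address):
    """Return the state child's value if it belongs to the USState enumeration."""
    state = address.findtext("state")
    if state not in US_STATE:
        raise ValueError(f"{state!r} is not a USState value")
    return state

print(check_state(ET.fromstring("<shipTo><state>AK</state></shipTo>")))  # → AK
try:
    check_state(ET.fromstring("<shipTo><state>CA</state></shipTo>"))
except ValueError as err:
    print("invalid:", err)  # CA fails only because this list is truncated
```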

Page 36: COMS E6125 Web-enHanced Information Management (WHIM)

Anonymous Type Definitions

<xsd:complexType name="Items"> <xsd:sequence> <xsd:element name="item" minOccurs="0" maxOccurs="unbounded"> <xsd:complexType> <xsd:sequence> <xsd:element name="productName" type="xsd:string"/> <xsd:element name="quantity"> <xsd:simpleType> <xsd:restriction base="xsd:positiveInteger"> <xsd:maxExclusive value="100"/> </xsd:restriction> </xsd:simpleType> </xsd:element> <xsd:element name="USPrice" type="xsd:decimal"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/> </xsd:sequence> <xsd:attribute name="partNum" type="SKU" use="required"/> </xsd:complexType> </xsd:element> </xsd:sequence> </xsd:complexType>

Page 37: COMS E6125 Web-enHanced Information Management (WHIM)

Recap Example Schema

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <xsd:annotation> . . . </xsd:annotation> <xsd:element name="purchaseOrder" type="PurchaseOrderType"/> <xsd:element name="comment" type="xsd:string"/>

<xsd:complexType name="PurchaseOrderType"> <xsd:sequence> <xsd:element name="shipTo" type="USAddress"/> <xsd:element name="billTo" type="USAddress"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="items" type="Items"/> </xsd:sequence> <xsd:attribute name="orderDate" type="xsd:date"/> </xsd:complexType>

<xsd:complexType name="USAddress"> . . . </xsd:complexType>

<xsd:complexType name="Items"> . . . </xsd:complexType></xsd:schema>

file po.xsd

Page 38: COMS E6125 Web-enHanced Information Management (WHIM)

XML Schema Data Types

• Complex types
• Built-in simple types
• Derived simple types
• Also derived complex types, lists and unions of simple types

Define structure – what about the content?

Page 39: COMS E6125 Web-enHanced Information Management (WHIM)

Element Content: Simple content

<xsd:element name="internationalPrice"> <xsd:complexType> <xsd:simpleContent> <xsd:extension base="xsd:decimal"> <xsd:attribute name="currency" type="xsd:string"/> </xsd:extension> </xsd:simpleContent> </xsd:complexType> </xsd:element>

• Declare an element that has an attribute and contains a simple value

<internationalPrice currency="EUR">423.46</internationalPrice>

Page 40: COMS E6125 Web-enHanced Information Management (WHIM)

Element Content: Empty content

• Declare an element with attributes only - no content at all

<xsd:element name="internationalPrice"> <xsd:complexType> <xsd:attribute name="currency" type="xsd:string"/> <xsd:attribute name="value" type="xsd:decimal"/> </xsd:complexType></xsd:element>

<internationalPrice currency="EUR" value="423.46"/>

Page 41: COMS E6125 Web-enHanced Information Management (WHIM)

Element Content: Entire element omitted

• The absence of an element does not carry any particular meaning; it could be:
– Information unknown
– Information not applicable
– I just forgot to enter the information

• Absence does/should not imply some value like zero, empty string, empty list, etc.

• Database systems faced with similar problems have introduced “null” values

• XML does not provide a null value representation that actually appears in element content; instead, there is an attribute to indicate content is nil

<xsd:element name="shipDate" type="shipDateType" nillable="true"/>

<shipDate xsi:nil="true"></shipDate>
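Applications reading the document still have to tell the three cases apart themselves. A sketch with Python's `xml.etree.ElementTree`, where attributes in the `xsi` namespace appear under their expanded `{namespace-URI}local` names; `ship_date_status` is a hypothetical helper and the item elements are cut-down samples.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

def ship_date_status(item):
    """Tell apart an omitted shipDate, an explicit nil, and an actual date."""
    elem = item.find("shipDate")
    if elem is None:
        return "absent"  # no particular meaning: unknown, n/a, or forgotten
    if elem.get(f"{{{XSI}}}nil") in ("true", "1"):  # xsd:boolean lexical forms
        return "nil"     # explicitly marked as having no content
    return elem.text

absent = ET.fromstring("<item><productName>Lawnmower</productName></item>")
nil = ET.fromstring(f'<item xmlns:xsi="{XSI}"><shipDate xsi:nil="true"/></item>')
dated = ET.fromstring("<item><shipDate>1999-05-21</shipDate></item>")
for item in (absent, nil, dated):
    print(ship_date_status(item))
```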

Page 42: COMS E6125 Web-enHanced Information Management (WHIM)

Element Content: Mixed content

• Text appears between the elements salutation, quantity, productName, and shipDate (all children of letterBody)

• To allow this, the mixed attribute of the parent’s complexType must be set to true

<letterBody><salutation>Dear Mr.<name>Robert Smith</name>.</salutation>Your order of <quantity>1</quantity> <productName>BabyMonitor</productName> shipped from our warehouse on<shipDate>1999-05-21</shipDate>. ....</letterBody>
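One way to see how a parser exposes mixed content: Python's `xml.etree.ElementTree` keeps the text before the first child in `.text` and the text following each child in that child's `.tail`. The letter below is the example above with the elided "...." dropped and spaces added after "Mr." and before the date.

```python
import xml.etree.ElementTree as ET

letter = ET.fromstring(
    "<letterBody><salutation>Dear Mr. <name>Robert Smith</name>.</salutation>"
    "Your order of <quantity>1</quantity> <productName>BabyMonitor</productName>"
    " shipped from our warehouse on <shipDate>1999-05-21</shipDate>.</letterBody>"
)

# Text before the first child lives in .text; text after each child in its .tail.
print(repr(letter.text))  # → None (the body starts with <salutation>)
for child in letter:
    print(child.tag, repr(child.tail))

# itertext() stitches text and tails back together in document order.
print("".join(letter.itertext()))
```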

Page 43: COMS E6125 Web-enHanced Information Management (WHIM)

Element Content: Mixed content

<xsd:element name="letterBody"> <xsd:complexType mixed="true"> <xsd:sequence> <xsd:element name="salutation"> <xsd:complexType mixed="true"> <xsd:sequence> <xsd:element name="name" type="xsd:string"/> </xsd:sequence> </xsd:complexType> </xsd:element> <xsd:element name="quantity" type="xsd:positiveInteger"/> <xsd:element name="productName" type="xsd:string"/> <xsd:element name="shipDate" type="xsd:date" minOccurs="0"/> <!-- etc. --> </xsd:sequence> </xsd:complexType></xsd:element>

• The order and number of child elements appearing in an instance must agree with order/number of child elements specified in the content model

Page 44: COMS E6125 Web-enHanced Information Management (WHIM)

Element Content: anyType

• The anyType type does not constrain its content in any way

• When no type is defined, anyType is the default, so these two declarations are equivalent:

<xsd:element name="anything" type="xsd:anyType"/>

<xsd:element name="anything"/>

Page 45: COMS E6125 Web-enHanced Information Management (WHIM)

Grouping Content Elements – group & sequence

• group – groups elements so that they can be used as a unit to build up types
• sequence grouping (default) – elements in the instance document must appear in the listed order

<xsd:group name="shipAndBill"> <xsd:sequence> <xsd:element name="shipTo" type="USAddress"/> <xsd:element name="billTo" type="USAddress"/> </xsd:sequence></xsd:group>

Page 46: COMS E6125 Web-enHanced Information Management (WHIM)

Content Groups - choice

• choice grouping – only one of the alternatives (an element or a group) appears in an instance

<xsd:complexType name="PurchaseOrderType"> <xsd:sequence> <xsd:choice> <xsd:group ref="shipAndBill"/> <xsd:element name="singleUSAddress" type="USAddress"/> </xsd:choice> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="items" type="Items"/> </xsd:sequence> <xsd:attribute name="orderDate" type="xsd:date"/></xsd:complexType>

Page 47: COMS E6125 Web-enHanced Information Management (WHIM)

Content Groups - all

• all grouping – elements may appear in any order, and each element appears at most once (exactly once unless it has minOccurs="0")
• An all group must appear as the sole child at the top of a content model

<xsd:complexType name="PurchaseOrderType"> <xsd:all> <xsd:element name="shipTo" type="USAddress"/> <xsd:element name="billTo" type="USAddress"/> <xsd:element ref="comment" minOccurs="0"/> <xsd:element name="items" type="Items"/> </xsd:all> <xsd:attribute name="orderDate" type="xsd:date"/></xsd:complexType>
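The all-group rule can be sketched as a check on the multiset of child names that ignores their order; `satisfies_all_group` is a hypothetical helper mirroring the declaration above, not a schema processor.

```python
from collections import Counter

# Hypothetical hand-coded mirror of the xsd:all model above.
REQUIRED = {"shipTo", "billTo", "items"}
OPTIONAL = {"comment"}  # minOccurs="0"

def satisfies_all_group(child_names):
    counts = Counter(child_names)
    if any(n > 1 for n in counts.values()):
        return False                      # no element may repeat
    if not set(counts) <= REQUIRED | OPTIONAL:
        return False                      # no undeclared children
    return REQUIRED <= set(counts)        # order is irrelevant; presence is not

print(satisfies_all_group(["items", "billTo", "shipTo"]))           # → True
print(satisfies_all_group(["shipTo", "billTo", "items", "items"]))  # → False
```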

Page 48: COMS E6125 Web-enHanced Information Management (WHIM)

Attribute Grouping

• We can create a named attribute group containing all the desired attributes and reference this group by name in an element

<xsd:element name="Item"> <xsd:complexType> . . . <xsd:attribute name="partNum" type="SKU" use="required"/> <xsd:attribute name="weightKg" type="xsd:decimal"/> <xsd:attribute name="shipBy"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="air"/> <xsd:enumeration value="land"/> <xsd:enumeration value="any"/> </xsd:restriction> </xsd:simpleType> </xsd:attribute> </xsd:complexType></xsd:element>

Page 49: COMS E6125 Web-enHanced Information Management (WHIM)

Attribute Groups

<xsd:element name="Item"> <xsd:complexType> . . . <xsd:attributeGroup ref="ItemDelivery"/> </xsd:complexType></xsd:element>

<xsd:attributeGroup name="ItemDelivery"> <xsd:attribute name="partNum" type="SKU" use="required"/> <xsd:attribute name="weightKg" type="xsd:decimal"/> <xsd:attribute name="shipBy"> <xsd:simpleType> <xsd:restriction base="xsd:string"> <xsd:enumeration value="air"/> <xsd:enumeration value="land"/> <xsd:enumeration value="any"/> </xsd:restriction> </xsd:simpleType> </xsd:attribute></xsd:attributeGroup>

Page 50: COMS E6125 Web-enHanced Information Management (WHIM)

Target Namespaces

• Tired of repeating the prefix xsd:?
• We could make the XML Schema namespace the default namespace (so no more xsd: prefixes), but then we would have to prefix the locally defined types and locally declared elements and attributes
• The solution is target namespaces
• Target namespaces enable distinguishing between definitions and declarations from different vocabularies

Page 51: COMS E6125 Web-enHanced Information Management (WHIM)

Target Namespace Example

<schema targetNamespace="http://www.example.com/PO" xmlns="http://www.w3.org/2001/XMLSchema" xmlns:po="http://www.example.com/PO">

<element name="purchaseOrder" type="po:PurchaseOrderType"/> <element name="comment" type="string"/>

<complexType name="PurchaseOrderType"> <sequence> <element name="shipTo" type="po:USAddress"/> <element name="billTo" type="po:USAddress"/> <element ref="po:comment" minOccurs="0"/> <!-- etc. --> </sequence> </complexType>

<complexType name="USAddress"> <sequence> <element name="name" type="string"/> <!-- etc. --> </sequence> </complexType></schema>
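A sketch of how a namespace-aware parser sees an instance of this schema, using Python's `xml.etree.ElementTree`, which reports qualified names as `{namespace-URI}local`. It assumes the schema keeps the default elementFormDefault="unqualified", so the global elements (purchaseOrder, comment) are in the target namespace while locally declared ones such as shipTo are not; the instance is cut down and would not validate as-is.

```python
import xml.etree.ElementTree as ET

PO = "http://www.example.com/PO"
# Cut-down instance: globals qualified with po:, locals left unqualified.
doc = f"""<po:purchaseOrder xmlns:po="{PO}" orderDate="2007-10-20">
  <shipTo country="US"><name>Alice Smith</name></shipTo>
  <po:comment>Hurry, my lawn is going wild!</po:comment>
</po:purchaseOrder>"""

root = ET.fromstring(doc)
print(root.tag)  # → {http://www.example.com/PO}purchaseOrder
print(root.find("shipTo").findtext("name"))  # unqualified local element
print(root.findtext("po:comment", namespaces={"po": PO}))
```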

Page 52: COMS E6125 Web-enHanced Information Management (WHIM)

Undeclared Target Namespaces

• What is the target namespace when a schema does not declare one?
– All its definitions and declarations are referenced without qualification
– It can only validate unqualified names in instance documents

• What if an instance document uses no namespace?
– Many pre-XML Schema XML 1.0 documents are like this
– To validate such instance documents, the validation processor must be provided with a schema with no target namespace

Page 53: COMS E6125 Web-enHanced Information Management (WHIM)

Other XML Schema Issues

• A schema can be distributed across multiple documents, one of which is topmost and the rest “included”
• Types can be “imported” from other schemas
• Abstract types allow a form of inheritance [beyond derived types] with substitution groups
• Keys (as in relational databases)
• …

Page 54: COMS E6125 Web-enHanced Information Management (WHIM)

Drawbacks of XML Schemas

• Another vocabulary to learn
• Verbose (like XML itself)
• Many constraints cannot be expressed (without adding a separate stylesheet or code)

<Demo xmlns="http://www.demo.org" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.demo.org demo.xsd">

<A>10</A> <B>20</B> </Demo>

• Can constrain: the Demo element contains a sequence of elements A followed by B; the A element contains an integer; the B element contains an integer

• Can’t constrain (in XML Schema 1.0): A > B
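A co-constraint like this has to be checked outside the schema, for example in application code after validation. A minimal sketch in Python; `a_greater_than_b` is a hypothetical helper, and the instance is the one above without the xsi:schemaLocation attribute. That instance is schema-valid yet fails the check.

```python
import xml.etree.ElementTree as ET

DEMO = "http://www.demo.org"

def a_greater_than_b(text):
    """Check the co-constraint A > B that the schema cannot express."""
    root = ET.fromstring(text)
    ns = {"d": DEMO}  # the instance uses DEMO as its default namespace
    a = int(root.findtext("d:A", namespaces=ns))
    b = int(root.findtext("d:B", namespaces=ns))
    return a > b

instance = f'<Demo xmlns="{DEMO}"><A>10</A><B>20</B></Demo>'
print(a_greater_than_b(instance))  # → False: schema-valid, yet violates A > B
```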

Page 55: COMS E6125 Web-enHanced Information Management (WHIM)

Today’s Topics:

• Document Structure Definition
– Document Type Definitions (DTDs)
– XML Schemas (XSD)

• Querying XML Documents
– NOT the same as Web search engines!
– XPath
– XQuery

Page 56: COMS E6125 Web-enHanced Information Management (WHIM)

Why not use SQL?

• Table rows vs. XML elements
• Homogeneous vs. heterogeneous - two elements of the same type may have different structure (due to minOccurs, maxOccurs, choice, etc.)
• Flat vs. multi-nested
• Unordered sets/tuples vs. ordered elements
• “Dense” vs. “sparse” - not all potential subelements are present or have values

Page 57: COMS E6125 Web-enHanced Information Management (WHIM)

XML Data Model

• Basically a sequence: an ordered list of zero or more items
• No sequences of sequences
• An item is either a node or an atomic value
• An atomic value – a value of a built-in data type or of a simple type derived by restriction
• A node is one of seven kinds: element, attribute, text, document, comment, processing instruction, and namespace

Page 58: COMS E6125 Web-enHanced Information Management (WHIM)

Document Order

• Among all nodes in a hierarchy there is a total order, called document order, in which each node appears before its children
• This is the order of a preorder traversal
• Informally, document order corresponds to the order in which the first character of the XML representation of each node occurs in the XML representation of the document
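Document order is exactly what a recursive preorder walk produces. A sketch in Python over a made-up tree; ElementTree exposes only elements here (no separate attribute or text nodes), so only element order is shown, and `preorder` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<A><B><D/></B><C><E/><F/></C></A>")

def preorder(elem):
    yield elem                 # a node is visited before its children...
    for child in elem:         # ...which are visited left to right
        yield from preorder(child)

order = [e.tag for e in preorder(root)]
print(order)  # → ['A', 'B', 'D', 'C', 'E', 'F']
# ElementTree's own iter() also walks elements in document order.
print(order == [e.tag for e in root.iter()])  # → True
```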

Page 59: COMS E6125 Web-enHanced Information Management (WHIM)

XPath Overview

• Language that expresses simple queries on individual XML documents (or streams) for retrieving parts of the XML document

• Operates on the abstract, logical structure of an XML document, rather than its surface syntax

• Basic facilities for manipulating strings, arithmetic and boolean expressions

• Compact, non-XML syntax to facilitate use within URIs and XML attribute values

Page 60: COMS E6125 Web-enHanced Information Management (WHIM)

XPath Expressions

• Similar to filesystem addressing
• Consists of a series of steps, separated by “/” or “//”
• Each step is evaluated in the context of a particular node, called the context node
• The result of each step is a sequence of nodes, which serve in turn as context nodes for the following step

• The value of a path expression is the node sequence that results from the last step
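To make the step-by-step evaluation concrete, here is a deliberately tiny evaluator for absolute paths made only of child steps with name tests or `*`. It is a hypothetical illustration in Python, not a real XPath engine (no `//`, predicates or other axes): each step maps every context node to its matching children, and the last step's result is the value of the path.

```python
import xml.etree.ElementTree as ET

def evaluate(root, path):
    """Evaluate an absolute location path made only of child steps,
    e.g. "/AAA/DDD/BBB" or "/*/*" (no "//", predicates or other axes)."""
    steps = path.strip("/").split("/")
    # Step 1 starts from the document node, whose only element child is root.
    context = [root] if steps[0] in ("*", root.tag) else []
    for step in steps[1:]:
        # Every node produced so far serves as a context node for this step.
        context = [child for node in context for child in node
                   if step == "*" or child.tag == step]
    return context  # the node sequence produced by the last step

root = ET.fromstring("<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>")
print([e.tag for e in evaluate(root, "/AAA/*")])
print(len(evaluate(root, "/AAA/DDD/BBB")))  # → 1
```

Because every element has exactly one parent, child steps never produce duplicates, and visiting context nodes in order keeps the result in document order.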

Page 61: COMS E6125 Web-enHanced Information Management (WHIM)

XPath Example

• If the path starts with the slash “/” , then it represents an absolute path to the required element

/AAA/DDD/BBB
Select all elements BBB that are children of DDD, which is a child of the root element AAA

<AAA>
    <BBB/>
    <CCC/>
    <BBB/>
    <BBB/>
    <DDD>
        <BBB/>
    </DDD>
    <CCC/>
</AAA>
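Python's standard `xml.etree.ElementTree` implements a small XPath subset through `find`/`findall`, with paths relative to an element. A sketch of this example, assuming the document above:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<AAA><BBB/><CCC/><BBB/><BBB/><DDD><BBB/></DDD><CCC/></AAA>"
)
# ElementTree paths are relative to an element, so the absolute /AAA/DDD/BBB
# becomes a test on the root's own name plus the relative path DDD/BBB.
result = root.findall("DDD/BBB") if root.tag == "AAA" else []
print(len(result))  # → 1
```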

Page 62: COMS E6125 Web-enHanced Information Management (WHIM)

XPath Example

• If the path starts with double-slash “//” , then all elements in the document which fulfill the criteria are selected

//BBB

Select all elements BBB

<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC>
    <DDD>
      <BBB/>
      <BBB/>
    </DDD>
  </CCC>
</AAA>


XPath Example

//DDD/BBB

Select all elements BBB that are children of DDD

<AAA>
  <BBB/>
  <CCC/>
  <BBB/>
  <DDD>
    <BBB/>
  </DDD>
  <CCC>
    <DDD>
      <BBB/>
      <BBB/>
    </DDD>
  </CCC>
</AAA>
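Here is a rough ElementTree equivalent of the two descendant examples above, using Python's standard library. `iter(tag)` walks the whole tree, including the root element itself. This matches how “//” can also select the root.

```python
import xml.etree.ElementTree as ET

DOC = ("<AAA><BBB/><CCC/><BBB/><DDD><BBB/></DDD>"
       "<CCC><DDD><BBB/><BBB/></DDD></CCC></AAA>")
root = ET.fromstring(DOC)

# //BBB: every BBB element anywhere in the document.
all_bbb = list(root.iter("BBB"))
# //DDD/BBB: BBB children of any DDD element.
bbb_under_ddd = [b for d in root.iter("DDD") for b in d.findall("BBB")]

print(len(all_bbb), len(bbb_under_ddd))  # 5 3
```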


XPath Example

• The star “*” selects all elements located by the preceding path

/AAA/CCC/DDD/*

Select all elements enclosed by elements /AAA/CCC/DDD

<AAA>
  <XXX>
    <DDD>
      <BBB/>
      <BBB/>
      <EEE/>
      <FFF/>
    </DDD>
  </XXX>
  <CCC>
    <DDD>
      <BBB/>
      <BBB/>
      <EEE/>
      <FFF/>
    </DDD>
  </CCC>
  <CCC>
    <BBB>
      <BBB>
        <BBB/>
      </BBB>
    </BBB>
  </CCC>
</AAA>


XPath Example

/*/*/*/BBB

Select all elements BBB that have 3 ancestors

<AAA>
  <XXX>
    <DDD>
      <BBB/>
      <BBB/>
      <EEE/>
      <FFF/>
    </DDD>
  </XXX>
  <CCC>
    <DDD>
      <BBB/>
      <BBB/>
      <EEE/>
      <FFF/>
    </DDD>
  </CCC>
  <CCC>
    <BBB>
      <BBB>
        <BBB/>
      </BBB>
    </BBB>
  </CCC>
</AAA>
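Both wildcard examples can be approximated with ElementTree's `findall`, which supports “*” as a child wildcard. As before, this is a sketch built on Python's standard library. The root element already matches the leading step, so the remaining steps are written relative to it.

```python
import xml.etree.ElementTree as ET

DOC = ("<AAA>"
       "<XXX><DDD><BBB/><BBB/><EEE/><FFF/></DDD></XXX>"
       "<CCC><DDD><BBB/><BBB/><EEE/><FFF/></DDD></CCC>"
       "<CCC><BBB><BBB><BBB/></BBB></BBB></CCC>"
       "</AAA>")
root = ET.fromstring(DOC)

# /AAA/CCC/DDD/*: every child of a DDD under a CCC under the root AAA.
under_ccc_ddd = root.findall("CCC/DDD/*") if root.tag == "AAA" else []
# /*/*/*/BBB: the root matches the first *, so two more * steps lead to
# BBB elements with exactly three ancestors.
depth3_bbb = root.findall("*/*/BBB")

print([e.tag for e in under_ccc_ddd])  # ['BBB', 'BBB', 'EEE', 'FFF']
print(len(depth3_bbb))                 # 5
```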


Moving Through the Node Hierarchy

• A kind of step in XPath, called an axis step, helps move through the node hierarchy in a particular direction, called an axis

• Forward axis – only contains the context node or nodes that are after the context node in document order
  – child, descendant, self, descendant-or-self, following, following-sibling, attribute, namespace

• Reverse axis – only contains the context node or nodes that are before the context node in document order
  – parent, ancestor, preceding, preceding-sibling, ancestor-or-self
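ElementTree elements carry no parent pointers, so any axis other than child has to be computed. The minimal sketch below builds a child-to-parent dictionary once up front. The helper names `ancestor_axis`, `preceding_sibling_axis` and `following_sibling_axis` are hypothetical. The reverse-axis helpers return the nearest node first, which is how positions are counted on a reverse axis.

```python
import xml.etree.ElementTree as ET

DOC = ("<AAA><BBB><DDD><CCC><DDD/><EEE/></CCC></DDD></BBB>"
       "<CCC><DDD><EEE><DDD><FFF/></DDD></EEE></DDD></CCC></AAA>")
root = ET.fromstring(DOC)
# Record each element's parent once, since ElementTree keeps no parent links.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestor_axis(node):
    """Reverse axis: parent, grandparent, ... up to the root (nearest first)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def preceding_sibling_axis(node):
    """Reverse axis: earlier siblings, nearest first."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[:siblings.index(node)][::-1]


def following_sibling_axis(node):
    """Forward axis: later siblings, in document order."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


eee = root.find("BBB/DDD/CCC/EEE")
print([e.tag for e in ancestor_axis(eee)])           # ['CCC', 'DDD', 'BBB', 'AAA']
print([e.tag for e in preceding_sibling_axis(eee)])  # ['DDD']
print([e.tag for e in following_sibling_axis(eee)])  # []
```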


XPath Examples

//CCC/descendant::*

Select all elements that have CCC among their ancestors

<AAA>
  <BBB>
    <DDD>
      <CCC>
        <DDD/>
        <EEE/>
      </CCC>
    </DDD>
  </BBB>
  <CCC>
    <DDD>
      <EEE>
        <DDD>
          <FFF/>
        </DDD>
      </EEE>
    </DDD>
  </CCC>
</AAA>

//DDD/parent::*

Select all parents of DDD elements

<AAA>
  <BBB>
    <DDD>
      <CCC>
        <DDD/>
        <EEE/>
      </CCC>
    </DDD>
  </BBB>
  <CCC>
    <DDD>
      <EEE>
        <DDD>
          <FFF/>
        </DDD>
      </EEE>
    </DDD>
  </CCC>
</AAA>
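Below are rough ElementTree counterparts of these two expressions, using Python's standard library. A “//” between steps approximates the descendant axis, and “..” selects the parent (ElementTree returns each parent only once).

```python
import xml.etree.ElementTree as ET

DOC = ("<AAA><BBB><DDD><CCC><DDD/><EEE/></CCC></DDD></BBB>"
       "<CCC><DDD><EEE><DDD><FFF/></DDD></EEE></DDD></CCC></AAA>")
root = ET.fromstring(DOC)

# //CCC/descendant::* : all descendants of each CCC (no CCC is nested in
# another CCC here, so no element is reported twice).
desc = root.findall(".//CCC//*")
# //DDD/parent::* : ".." steps from each DDD to its parent.
parents = root.findall(".//DDD/..")

print([e.tag for e in desc])     # ['DDD', 'EEE', 'DDD', 'EEE', 'DDD', 'FFF']
print([e.tag for e in parents])  # ['BBB', 'CCC', 'CCC', 'EEE']
```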


XPath Predicates

• Expressions in square brackets “[ ]” can further specify an element

• Used to filter a sequence of values in a step

• A number in the brackets gives the ordinal position of the element in the selected set

• The function “last()” returns the position of the last item, so “[last()]” selects the last element in the selection

• Function “count()” counts the number of selected elements

• Many other functions and operators


XPath Examples

//BBB[position() mod 2 = 0]

Select the BBB elements at even positions

<AAA>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <BBB/>
  <CCC/>
  <CCC/>
  <CCC/>
</AAA>

//*[count(BBB)=2]

Select elements that have exactly two BBB children

<AAA>
  <CCC>
    <BBB/>
    <BBB/>
    <BBB/>
  </CCC>
  <DDD>
    <BBB/>
    <BBB/>
  </DDD>
  <EEE>
    <CCC/>
    <DDD/>
  </EEE>
</AAA>
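ElementTree handles numeric and last() predicates natively, but it has no count() and no position() arithmetic. This sketch therefore uses len() and explicit sibling positions instead. The helper `even_children` is hypothetical. Like XPath, it counts positions among the BBB children of each parent.

```python
import xml.etree.ElementTree as ET

DOC1 = "<AAA>" + "<BBB/>" * 8 + "<CCC/>" * 3 + "</AAA>"
DOC2 = ("<AAA><CCC><BBB/><BBB/><BBB/></CCC>"
        "<DDD><BBB/><BBB/></DDD><EEE><CCC/><DDD/></EEE></AAA>")
root1 = ET.fromstring(DOC1)
root2 = ET.fromstring(DOC2)

# Numeric position and last() predicates are supported directly.
first_bbb = root1.findall("BBB[1]")
last_bbb = root1.findall("BBB[last()]")
# count() has no ElementTree equivalent; len() of the selection plays its role.
bbb_count = len(root1.findall("BBB"))


def even_children(root, tag):
    """//tag[position() mod 2 = 0]: positions count same-named siblings per parent.

    Results are gathered parent by parent, which equals document order here
    because every BBB shares one parent.
    """
    result = []
    for parent in root.iter():
        same = [c for c in parent if c.tag == tag]
        result.extend(c for pos, c in enumerate(same, start=1) if pos % 2 == 0)
    return result


# //*[count(BBB)=2]: elements with exactly two BBB children.
two_bbb = [e.tag for e in root2.iter() if len(e.findall("BBB")) == 2]

print(len(first_bbb), len(last_bbb), bbb_count)  # 1 1 8
print(len(even_children(root1, "BBB")))          # 4
print(two_bbb)                                   # ['DDD']
```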


XPath Predicates

• Attribute names are specified by the at-sign “@” prefix

• Non-prefixed names are the names of element nodes


XPath Examples

//BBB[@id]

Select BBB elements that have attribute id

<AAA>
  <BBB id="b1"/>
  <CCC id="b2"/>
  <BBB name="bbb"/>
  <BBB/>
</AAA>

//BBB[not(@*)]

Select BBB elements without any attribute

<AAA>
  <BBB id="b1"/>
  <CCC id="b2"/>
  <BBB name="bbb"/>
  <BBB/>
</AAA>
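In ElementTree, the attribute test [@id] is supported directly, but not(@*) has no counterpart. This sketch, based on Python's standard library, checks the element's attribute dictionary instead.

```python
import xml.etree.ElementTree as ET

DOC = '<AAA><BBB id="b1"/><CCC id="b2"/><BBB name="bbb"/><BBB/></AAA>'
root = ET.fromstring(DOC)

# //BBB[@id]: the [@attr] predicate is built in.
with_id = root.findall(".//BBB[@id]")
# //BBB[not(@*)]: an empty attrib dict means the element has no attributes.
no_attrs = [b for b in root.iter("BBB") if not b.attrib]

print([b.get("id") for b in with_id])  # ['b1']
print(len(no_attrs))                   # 1
```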


XPath Predicates

• Function “name()” returns the name of the element

• The “starts-with” function returns true if the first argument string starts with (is prefixed by) the second argument string

• The “contains” function returns true if the first string contains the second string

• The “string-length” function returns the number of characters in the string


XPath Examples

//*[name()='BBB']

Select all elements with name BBB, equivalent to //BBB

<AAA>
  <BCC>
    <BBB/>
    <BBB/>
    <BBB/>
  </BCC>
  <DDB>
    <BBB/>
    <BBB/>
  </DDB>
  <BEC>
    <CCC/>
    <DBD/>
  </BEC>
</AAA>

//*[contains(name(),'C')]

Select all elements whose names contain the letter C

<AAA>
  <BCC>
    <BBB/>
    <BBB/>
    <BBB/>
  </BCC>
  <DDB>
    <BBB/>
    <BBB/>
  </DDB>
  <BEC>
    <CCC/>
    <DBD/>
  </BEC>
</AAA>
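For elements without a namespace, ElementTree's `Element.tag` plays the role of name(). Namespaced tags appear as `{uri}local`, so this mapping breaks down for them. In this sketch, Python's `str.startswith` stands in for starts-with and `in` stands in for contains. `len()` would similarly play the role of string-length.

```python
import xml.etree.ElementTree as ET

DOC = ("<AAA><BCC><BBB/><BBB/><BBB/></BCC><DDB><BBB/><BBB/></DDB>"
       "<BEC><CCC/><DBD/></BEC></AAA>")
root = ET.fromstring(DOC)

# //*[name()='BBB']
named_bbb = [e for e in root.iter() if e.tag == "BBB"]
# //*[contains(name(),'C')]
with_c = [e.tag for e in root.iter() if "C" in e.tag]
# //*[starts-with(name(),'B')]
b_prefixed = [e.tag for e in root.iter() if e.tag.startswith("B")]

print(len(named_bbb))   # 5
print(with_c)           # ['BCC', 'BEC', 'CCC']
print(len(b_prefixed))  # 7
```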


More XPath

• Several paths can be combined with the union (“|”), intersect and except operators

• Values can be concatenated into a sequence with the comma operator “,”, and the “to” operator builds a sequence of consecutive integers (e.g., 1 to 5)

• A variable is a name that begins with a dollar sign “$”; it may be bound to a value and used in an expression

• Originated ~1998 as part of XSLT, XPath 1.0 W3C Recommendation November 1999, XPath 2.0 W3C Recommendation January 2007
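The sequence operators can be mimicked in Python, with one behavior worth stating. Union, intersect and except return nodes without duplicates and in document order. The comma operator simply concatenates, keeping order and duplicates. The helper `doc_order` and the small document are illustrative only.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<AAA><BBB/><CCC/><BBB/><DDD/></AAA>")
position = {e: i for i, e in enumerate(root.iter())}


def doc_order(nodes):
    """Deduplicate nodes and sort them into document order."""
    return sorted(set(nodes), key=position.__getitem__)


children = root.findall("*")
bbb = root.findall("BBB")
ccc = root.findall("CCC")

union = doc_order(ccc + bbb)                  # /AAA/CCC | /AAA/BBB
common = doc_order(set(children) & set(bbb))  # /AAA/* intersect /AAA/BBB
rest = doc_order(set(children) - set(bbb))    # /AAA/* except /AAA/BBB
sequence = ccc + bbb                          # (/AAA/CCC, /AAA/BBB): order as written
numbers = list(range(1, 6))                   # 1 to 5

print([e.tag for e in union])     # ['BBB', 'CCC', 'BBB']
print([e.tag for e in common])    # ['BBB', 'BBB']
print([e.tag for e in rest])      # ['CCC', 'DDD']
print([e.tag for e in sequence])  # ['CCC', 'BBB', 'BBB']
print(numbers)                    # [1, 2, 3, 4, 5]
```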


XPath Limitations

• Path expressions are very powerful but there are some drawbacks

• XPath can only select existing nodes

• Cannot construct new elements and attributes or specify their contents and relationships

• XPath operates on a single document


XQuery Design Goals

• Express at least the queries possible in known query languages like SQL and various OO query languages

• Query the many kinds of data XML contains

• Implementable in many environments
  – Databases, XML programming environments


XQuery Overview

• Operates on the XPath data model

• Can query over multiple documents (e.g., a database of XML documents)

• Data is a sequence (ordered list) of ordered trees

• A document is a sequence of size 1

• Functional query language – made up of expressions that return values and do not have side effects


Using XPath

doc("mp3.xml")//chapter//figure[caption = "ipod nano"]

In any chapter of the document mp3.xml find figures with caption “ipod nano”

[Figure: tree of mp3.xml, a book with parts, chapters and an appendix; one chapter/section/paragraph/figure/caption path ends in “ipod nano”, and another figure has the caption “ipod classic”]
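The actual mp3.xml is not given here, so this sketch uses a hypothetical miniature loosely shaped like the tree in the figure. ElementTree supports the [child='text'] predicate, which compares the child's full text content with the string.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of mp3.xml (the real file is not part of these notes).
MP3 = """<book>
  <chapter>
    <section><paragraph><figure><caption>ipod nano</caption></figure></paragraph></section>
  </chapter>
  <part>
    <chapter><paragraph><figure><caption>ipod classic</caption></figure></paragraph></chapter>
  </part>
  <appendix/>
</book>"""
root = ET.fromstring(MP3)

# doc("mp3.xml")//chapter//figure, with and without the caption predicate
all_figures = root.findall(".//chapter//figure")
nano_figures = root.findall(".//chapter//figure[caption='ipod nano']")

print(len(all_figures), len(nano_figures))  # 2 1
print(nano_figures[0].findtext("caption"))  # ipod nano
```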


Using XQuery

<result> { doc("mp3.xml")//chapter//figure[caption = "ipod nano"] } </result>

In a chapter of the document mp3.xml, find the figures with caption “ipod nano” and place them into an element called result

[Figure: the new result element containing the matching figure and its caption “ipod nano”; the new element has its own node identity]


Element Construction

• XQuery provides for the construction of new elements

• An element constructor looks exactly like an XML element

• Using XQuery expressions, we may have computed values

<result>
  <figure>
    <caption>ipod nano</caption>
  </figure>
</result>
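Python's ElementTree can mimic element construction, as a rough analogy only. A new `result` element is created and copies of the selected figures are appended. The new content therefore has its own node identity, as in XQuery. The source document here is a hypothetical fragment.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source fragment containing the figure a query would select.
source = ET.fromstring(
    "<chapter><figure><caption>ipod nano</caption></figure></chapter>")

result = ET.Element("result")  # plays the role of the <result> constructor
for fig in source.iter("figure"):
    if fig.findtext("caption") == "ipod nano":  # computed content: the query's matches
        result.append(copy.deepcopy(fig))       # a copy, not the original node

print(ET.tostring(result, encoding="unicode"))
# <result><figure><caption>ipod nano</caption></figure></result>
print(result[0] is source[0])  # False
```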


Bibliography Example Data Set

<bib>
  <book>
    <author>Aho</author>
    <author>Lam</author>
    <author>Sethi</author>
    <author>Ullman</author>
    <title>Compilers</title>
    <publisher>Addison Wesley</publisher>
    <year>2006</year>
  </book>
  <book>
    <author>Rowling</author>
    <title>Harry Potter 6</title>
    <publisher>Scholastic</publisher>
    <year>2005</year>
  </book>
  <book>
    <author>Patton</author>
    <title>Software Testing</title>
    <publisher>SAMS</publisher>
    <year>2005</year>
  </book>
</bib>


Reviews Example Data Set

<reviews>
  <review>
    <title>Compilers</title>
    <comment>It’s the best</comment>
    <comment>A definitive textbook</comment>
  </review>
  <review>
    <title>Harry Potter 6</title>
    <comment>Spoiler: Dumbledore dies</comment>
    <comment>When will the next book come out?</comment>
  </review>
  …
</reviews>


FOR-WHERE-RETURN

for $b in doc("bib.xml")//book
where $b/year/text() = "2005"
return $b/title

List the titles of books published in 2005

[Figure: the bib tree, with three book elements whose title, publisher and year children are Addison Wesley 2006, Scholastic 2005 and SAMS 2005]


Tuples of variable bindings

• FOR/LET: ordered lists of tuples of variable bindings ($b bound to each of the three book elements)

• WHERE: tuples that satisfy the conditions ($b bound to the two books from 2005)

• RETURN: list of trees (the title of each remaining book)
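The three stages can be traced in plain Python over the bibliography data set, copied from the slide with the padding spaces inside elements dropped. This is an illustrative analogy using `xml.etree.ElementTree`, not an XQuery implementation. FOR produces one binding tuple per book, WHERE filters the tuples, and RETURN evaluates an expression once per surviving tuple.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Aho</author><author>Lam</author><author>Sethi</author><author>Ullman</author>
    <title>Compilers</title><publisher>Addison Wesley</publisher><year>2006</year>
  </book>
  <book>
    <author>Rowling</author>
    <title>Harry Potter 6</title><publisher>Scholastic</publisher><year>2005</year>
  </book>
  <book>
    <author>Patton</author>
    <title>Software Testing</title><publisher>SAMS</publisher><year>2005</year>
  </book>
</bib>"""
bib = ET.fromstring(BIB)

# FOR: an ordered list of binding tuples, one per book ($b -> book element).
tuples = [{"b": book} for book in bib.iter("book")]
# WHERE: keep only the tuples that satisfy the condition.
kept = [t for t in tuples if t["b"].findtext("year") == "2005"]
# RETURN: evaluate $b/title for each surviving tuple.
titles = [t["b"].find("title") for t in kept]

print(len(tuples), len(kept))    # 3 2
print([t.text for t in titles])  # ['Harry Potter 6', 'Software Testing']
```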


RETURN

for $b in doc("bib.xml")//book
where $b/year/text() = "2005"
return $b/author

Return the list of authors who published in 2005


WHERE

for $b in doc("bib.xml")//book
where $b/publisher/text() = "Addison Wesley" and $b/year/text() = "2006"
return $b/title

List the titles of books published by “Addison Wesley” in 2006
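The RETURN and WHERE variations translate the same way. This sketch uses the same bibliography data and ElementTree, and the XQuery `and` in the condition becomes Python's `and`.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Aho</author><author>Lam</author><author>Sethi</author><author>Ullman</author>
    <title>Compilers</title><publisher>Addison Wesley</publisher><year>2006</year>
  </book>
  <book>
    <author>Rowling</author>
    <title>Harry Potter 6</title><publisher>Scholastic</publisher><year>2005</year>
  </book>
  <book>
    <author>Patton</author>
    <title>Software Testing</title><publisher>SAMS</publisher><year>2005</year>
  </book>
</bib>"""
books = list(ET.fromstring(BIB).iter("book"))

# where year = "2005" return $b/author
authors_2005 = [a.text for b in books if b.findtext("year") == "2005"
                for a in b.findall("author")]
# where publisher = "Addison Wesley" and year = "2006" return $b/title
titles = [b.findtext("title") for b in books
          if b.findtext("publisher") == "Addison Wesley" and b.findtext("year") == "2006"]

print(authors_2005)  # ['Rowling', 'Patton']
print(titles)        # ['Compilers']
```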


Joins

for $b in doc("bib.xml")//book,
    $r in doc("review.xml")//review
where $b/title/text() = $r/title/text()
return <book_with_review> {$b/@*} {$b/*} {$r/comment} </book_with_review>

For every book with a matching review, output a book_with_review that contains all the attributes and subelements of book and the comment subelements of review
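Here is a nested-loop sketch of this join in Python, using ElementTree. It includes only the two reviews shown in the data set, since the rest are elided on the slide. The hypothetical helper `clone` copies a node and drops its trailing whitespace so the new element serializes cleanly. As in the query, a book without a matching review produces no output.

```python
import copy
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Aho</author><author>Lam</author><author>Sethi</author><author>Ullman</author>
    <title>Compilers</title><publisher>Addison Wesley</publisher><year>2006</year>
  </book>
  <book>
    <author>Rowling</author>
    <title>Harry Potter 6</title><publisher>Scholastic</publisher><year>2005</year>
  </book>
  <book>
    <author>Patton</author>
    <title>Software Testing</title><publisher>SAMS</publisher><year>2005</year>
  </book>
</bib>"""
REVIEWS = """<reviews>
  <review><title>Compilers</title><comment>It's the best</comment>
    <comment>A definitive textbook</comment></review>
  <review><title>Harry Potter 6</title><comment>Spoiler: Dumbledore dies</comment>
    <comment>When will the next book come out?</comment></review>
</reviews>"""
bib = ET.fromstring(BIB)
reviews = ET.fromstring(REVIEWS)


def clone(elem):
    """Copy a node for use inside a new element, dropping the source's trailing whitespace."""
    c = copy.deepcopy(elem)
    c.tail = None
    return c


joined = []
for b in bib.iter("book"):                                # for $b in ...//book,
    for r in reviews.iter("review"):                      #     $r in ...//review
        if b.findtext("title") == r.findtext("title"):    # where the titles match
            out = ET.Element("book_with_review", b.attrib)         # {$b/@*}
            out.extend([clone(c) for c in b])                      # {$b/*}
            out.extend([clone(c) for c in r.findall("comment")])   # {$r/comment}
            joined.append(out)

print([o.findtext("title") for o in joined])  # ['Compilers', 'Harry Potter 6']
print(len(joined[0]))                         # 9 (7 book children + 2 comments)
```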


Join Example Result

<book_with_review>
  <author>Aho</author>
  <author>Lam</author>
  <author>Sethi</author>
  <author>Ullman</author>
  <title>Compilers</title>
  <publisher>Addison Wesley</publisher>
  <year>2006</year>
  <comment>It’s the best</comment>
  <comment>A definitive textbook</comment>
</book_with_review>
<book_with_review>
  <author>Rowling</author>
  …
</book_with_review>


Nested queries

for $a in distinct-values(doc("bib.xml")//author/text())
return <author>
         <name> {$a} </name>
         { for $b in doc("bib.xml")//book
           where $a = $b/author/text()
           return $b/title }
       </author>

Invert the structure of the input document so that there is a list of author elements, each containing the name of the author and the titles of the books he or she wrote
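The inversion can be sketched in Python with ElementTree. XQuery leaves the order of distinct-values unspecified, and this sketch simply keeps first-occurrence order via `dict.fromkeys`. The test `name in [...]` mirrors the general comparison $a = $b/author/text(), which is true if any author of the book matches. The helper `clone` is hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Aho</author><author>Lam</author><author>Sethi</author><author>Ullman</author>
    <title>Compilers</title><publisher>Addison Wesley</publisher><year>2006</year>
  </book>
  <book>
    <author>Rowling</author>
    <title>Harry Potter 6</title><publisher>Scholastic</publisher><year>2005</year>
  </book>
  <book>
    <author>Patton</author>
    <title>Software Testing</title><publisher>SAMS</publisher><year>2005</year>
  </book>
</bib>"""
bib = ET.fromstring(BIB)


def clone(elem):
    """Copy a node and drop its trailing whitespace."""
    c = copy.deepcopy(elem)
    c.tail = None
    return c


names = list(dict.fromkeys(a.text for a in bib.iter("author")))  # distinct author names
inverted = []
for name in names:
    author = ET.Element("author")
    ET.SubElement(author, "name").text = name
    for b in bib.iter("book"):
        if name in [a.text for a in b.findall("author")]:  # any author matches
            author.append(clone(b.find("title")))
    inverted.append(author)

print(len(inverted))  # 6
print(ET.tostring(inverted[0], encoding="unicode"))
# <author><name>Aho</name><title>Compilers</title></author>
```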


Conditionals

for $b in doc("bib.xml")//book
return <short>
         {$b/title}
         <author>
           { if (count($b/author) < 3)
             then $b/author
             else ($b/author[1], <author>and others</author>) }
         </author>
       </short>

Leave books with fewer than 3 authors unchanged

Otherwise shorten the author list to the first author followed by “and others”
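Here is a Python sketch of the conditional, using ElementTree and a hypothetical `clone` helper. It keeps the slide's output shape, in which the selected authors are wrapped in an outer author element.

```python
import copy
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Aho</author><author>Lam</author><author>Sethi</author><author>Ullman</author>
    <title>Compilers</title><publisher>Addison Wesley</publisher><year>2006</year>
  </book>
  <book>
    <author>Rowling</author>
    <title>Harry Potter 6</title><publisher>Scholastic</publisher><year>2005</year>
  </book>
</bib>"""
bib = ET.fromstring(BIB)


def clone(elem):
    """Copy a node and drop its trailing whitespace."""
    c = copy.deepcopy(elem)
    c.tail = None
    return c


shorts = []
for b in bib.iter("book"):
    short = ET.Element("short")
    short.append(clone(b.find("title")))
    wrapper = ET.SubElement(short, "author")
    authors = b.findall("author")
    if len(authors) < 3:  # then: keep every author
        wrapper.extend([clone(a) for a in authors])
    else:                 # else: first author plus a constructed "and others"
        others = ET.Element("author")
        others.text = "and others"
        wrapper.extend([clone(authors[0]), others])
    shorts.append(short)

for s in shorts:
    print(ET.tostring(s, encoding="unicode"))
```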


Existential Quantification

for $b in doc("bib.xml")//book
where some $author in $b/author satisfies $author/text() = "Aho"
return $b

Return books where at least one of the authors is “Aho”


Universal Quantification

for $b in doc("bib.xml")//book
where every $author in $b/author satisfies $author/text() = "Aho"
return $b

Return books where all authors are “Aho” (a book with no author elements also qualifies, because every is true for an empty sequence)
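Python's any() and all() correspond closely to some and every, as this ElementTree sketch shows. Like every, all() over an empty sequence is True.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book>
    <author>Aho</author><author>Lam</author><author>Sethi</author><author>Ullman</author>
    <title>Compilers</title><publisher>Addison Wesley</publisher><year>2006</year>
  </book>
  <book>
    <author>Rowling</author>
    <title>Harry Potter 6</title><publisher>Scholastic</publisher><year>2005</year>
  </book>
</bib>"""
books = list(ET.fromstring(BIB).iter("book"))

# some $author in $b/author satisfies $author/text() = "Aho"
some_aho = [b.findtext("title") for b in books
            if any(a.text == "Aho" for a in b.findall("author"))]
# every $author in $b/author satisfies $author/text() = "Aho"
every_aho = [b.findtext("title") for b in books
             if all(a.text == "Aho" for a in b.findall("author"))]

print(some_aho)                           # ['Compilers']
print(every_aho)                          # []
print(all(a.text == "Aho" for a in []))   # True (vacuous truth, as with every)
```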


More XQuery Notation

• Also order by (sorting the results)

• Can define and use functions

• Supported by all major database engines

• XQuery has been around in draft form since ~2001, but only became a W3C Recommendation in January 2007


Summary

• XPath expresses simple queries on individual XML documents (or streams)
  – Optimization analogous to string matching

• XQuery expresses sophisticated SQL-like queries, joins and views on databases, messages, etc., whether stored as XML or not
  – Optimization analogous to database query/join

• Both are orthogonal to XML Schema (or DTD)
  – But it's hard to write queries without knowing what syntax will occur in the documents…


Second Assignment: Revised Paper Proposal

• Due Monday February 18th at 5pm

• Maximum three pages (not including figures, if any), plus references (required)

• Plan and outline your paper (which will be ~15 pages)

• See http://york.cs.columbia.edu/classes/cs6125/revised_paper_proposal.htm


Revised Paper Proposal

• Each full paper should have title, author, abstract (~200 words), introduction, body sections, conclusions, bibliography (cited references)

• The point of this assignment is to determine what will be in those sections

• Assume a reader who is taking the class but may not know anything at all about your specific topic


Revised Paper Proposal: Introduction and Conclusion

• What is your topic?

• What is the problem being addressed?

• What is the solution, or design space of solutions, proposed or actualized?

• What is your argument?

• What is your point of view?

• What is the opposing point of view?


Revised Paper Proposal: Body Sections

• What sections? (usually 3-5)

• What subsections? (perhaps down to subsubsections)

• Motivate your literature reading to fill those sections

• Full paper will be due Friday March 14th


A Note about Citations and Bibliographic References

• References should be cited in the text like this: “Pogue said blah blah [1]” or this: “[Pog07] describes mumble”

• Bibliography entries should appear something like this:
[1] David Pogue, Behind the Scenes of “iPhone: The Musical”, The New York Times, online edition, July 12, 2007. <http://pogue.blogs.nytimes.com/2007/07/12/> accessed February 8, 2008.


Another Note about Bibliographic References

• Bibliographic references should be as complete as possible (but official MLA, APA, etc. format is not required)

• There is a variety of free software available to help manage reference lists
  – http://www.easybib.com/, http://www.bibme.org/ and others used online (web search for “free bibliography”)
  – Downloads from http://www.columbia.edu/acis/software/libraries.html (requires your uni/password)


Second Assignment: Logistics

• Submit by posting in Revised Paper Proposal folder on CourseWorks

• Must be in a format I can read, which means PDF 8 or earlier, Word 2003 or earlier, PowerPoint 2003 or earlier, HTML, or plain ASCII text

• With all figures embedded in the file or separately viewable in Firefox 2 or IE 7 on Windows XP

• Bundled archives must be openable with WinZip 11


Heads Up on Project

• Preliminary Proposal due Monday March 10th (note this is before the full paper)

• Optionally work in teams (see http://york.cs.columbia.edu/classes/cs6125/team_advice)

• Build a new system or extend an existing system – submit code, demo system

• OR evaluate/compare one or more existing system(s) – submit procedures and findings, show system(s)

• You may "continue" your paper topic towards the project, or do something entirely different


Heads Up on Presentation

• Individual ~10-minute talk in class during one of the last few class sessions

• No proposal, just do it

• May be based on paper, project, or some other topic (in the case of team members all presenting on the same project, please coordinate to avoid redundancy and discuss your plans with the instructor in advance)


Reminders

• Class participation is important! (10% corresponds to a whole letter grade)

• Revised paper proposal due Monday February 18th by 5pm

• Preliminary project proposal due March 10th

• Paper must be individual, projects may optionally be done in teams
