A Short PMML Tutorial by LatentView

Post on 20-Jan-2015


Ramesh Hariharan

PMML Tutorial

www.LatentView.com

This presentation is solely for the use of LatentView. No part of this presentation may be circulated, quoted, or reproduced for distribution without prior written approval from LatentView.

12-Feb-2009


www.latentview.com/blog

Agenda

• PMML Overview

• Constructing a PMML

• XSD Overview

• Reading the PMML Specification

• Next Steps…

LatentView Analytics Pvt. Ltd (Confidential)


PMML Overview

PMML – Predictive Model Markup Language
• Used for model scoring
• XML document
• Owned by the Data Mining Group (DMG), a consortium led by SPSS, SAS, IBM, Microsoft, Oracle and others
• Currently in version 3.2

Advantages of PMML
• Portability of models
• Metadata standardization
• Model once, score anywhere (MOSA ☺)

Drawbacks of PMML
• Least common denominator
• Potential loss of precision
• Lack of support for complex transformations
• Lack of support from tools

Some of the Model Types Supported
• Association Rules, Clustering, General Regression, Naïve Bayes, Neural Networks, Support Vector Machines

Capabilities of PMML
• Model composition: model sequencing and model selection
• Built-in and user-defined functions
• Usual data types: date, numbers, category
• Model verification: sample results for testing
• Output fields: create output tables based on the models
• Extension mechanisms

PMML in the Decision Management Architecture

[Figure: architecture diagram. Recoverable labels: Enterprise Data (Product Data, Channel Data, Customer Data, Payment History Data, Interaction Data); Analytics Data Backbone; Analytic Modeling and Model Development (LatentView Analysts); Model Repository; Enterprise Decision Engine (Business Rules, Decision Models); Client Managers (Create Rules, Business Rules formulation); Operational Systems (Sales & Marketing, Customer Management, Risk Management, Other Applications); Requests; Scores and Decisions.]


Constructing a PMML

<?xml version="1.0"?>
<PMML version="3.2"
      xmlns="http://www.dmg.org/PMML-3_2"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Header copyright="Example.com"/>
  <DataDictionary> ... </DataDictionary>
  ... a model ...
</PMML>

References:
• www.dmg.org
• http://dmg.org/v3-2/GeneralStructure.html
• http://dmg.org/v3-2/pmml-3-2.xsd
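As a quick way to inspect the top-level structure described on this slide, the following minimal sketch reads a PMML skeleton with Python's standard `xml.etree.ElementTree`. The `SKELETON` string and the `describe` helper are illustrative assumptions, not part of the PMML specification; the empty `DataDictionary` and the missing model are placeholders.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-3_2"  # namespace shown on the slide

# Hypothetical minimal skeleton: empty dictionary, no model.
SKELETON = """<?xml version="1.0"?>
<PMML version="3.2"
      xmlns="http://www.dmg.org/PMML-3_2"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Header copyright="Example.com"/>
  <DataDictionary/>
</PMML>"""


def describe(pmml_text):
    """Return (version, namespace, child element names) of a PMML document."""
    root = ET.fromstring(pmml_text)
    # ElementTree spells qualified tags as "{namespace}localname".
    namespace, _, local_name = root.tag[1:].partition("}")
    if local_name != "PMML":
        raise ValueError("root element is not a namespaced PMML element")
    children = [child.tag.split("}", 1)[1] for child in root]
    return root.get("version"), namespace, children


version, namespace, children = describe(SKELETON)
print(version, namespace, children)
```

This only checks that the document is well-formed XML with the expected root; it does not validate against pmml-3-2.xsd.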



XSD Overview

XSD – XML Schema Definition

The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.

An XML Schema:
• defines elements that can appear in a document
• defines attributes that can appear in a document
• defines which elements are child elements
• defines the order of child elements
• defines the number of child elements
• defines whether an element is empty or can include text
• defines data types for elements and attributes
• defines default and fixed values for elements and attributes

A First Example

Look at this simple XML document called "note.xml":

<?xml version="1.0"?>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Now look at the XML Schema for the same document:

<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://www.w3schools.com"
           xmlns="http://www.w3schools.com"
           elementFormDefault="qualified">
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

Note: because the schema declares a targetNamespace, note.xml validates against it only if the note element is in that namespace, for example <note xmlns="http://www.w3schools.com">.

Simple Elements

<xs:element name="xxx" type="yyy"/>

XML Schema has a lot of built-in data types. The most common types are:
• xs:string
• xs:decimal
• xs:integer
• xs:boolean
• xs:date
• xs:time

Example

<lastname>Refsnes</lastname>
<age>36</age>
<dateborn>1970-03-27</dateborn>

<xs:element name="lastname" type="xs:string"/>
<xs:element name="age" type="xs:integer"/>
<xs:element name="dateborn" type="xs:date"/>

XSD Attributes

Simple elements cannot have attributes. If an element has attributes, it is considered to be of a complex type. But the attribute itself is always declared as a simple type.

<xs:attribute name="xxx" type="yyy"/>

where xxx is the name of the attribute and yyy specifies the data type of the attribute. XML Schema has a lot of built-in data types. The most common types are:
• xs:string
• xs:decimal
• xs:integer
• xs:boolean
• xs:date
• xs:time

Example

<lastname lang="EN">Smith</lastname>

<xs:attribute name="lang" type="xs:string"/>

Simple Elements: Restrictions

Restrictions are used to define acceptable values for XML elements or attributes. Restrictions on XML elements are called facets.

Restrictions on Values

<xs:element name="age">
  <xs:simpleType>
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="120"/>
    </xs:restriction>
  </xs:simpleType>
</xs:element>

Restrictions on a Set of Values

<xs:element name="car" type="carType"/>

<xs:simpleType name="carType">
  <xs:restriction base="xs:string">
    <xs:enumeration value="Audi"/>
    <xs:enumeration value="Golf"/>
    <xs:enumeration value="BMW"/>
  </xs:restriction>
</xs:simpleType>
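To make the effect of these facets concrete, here is a minimal sketch that reads the two restrictions above with Python's standard `xml.etree.ElementTree` and applies them by hand. It is not a schema validator: the `facets` and `satisfies` helpers are hypothetical names and handle only the three facet kinds shown on the slide.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="age">
    <xs:simpleType>
      <xs:restriction base="xs:integer">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="120"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
  <xs:simpleType name="carType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="Audi"/>
      <xs:enumeration value="Golf"/>
      <xs:enumeration value="BMW"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>"""


def facets(restriction):
    """Collect the facets of one xs:restriction as (facet name, value) pairs."""
    return [(f.tag.replace(XS, ""), f.get("value")) for f in restriction]


def satisfies(value, facet_list):
    """Hand-rolled check for enumeration, minInclusive and maxInclusive only."""
    allowed = [v for name, v in facet_list if name == "enumeration"]
    if allowed and value not in allowed:
        return False
    for name, v in facet_list:
        if name == "minInclusive" and int(value) < int(v):
            return False
        if name == "maxInclusive" and int(value) > int(v):
            return False
    return True


root = ET.fromstring(SCHEMA)
age_facets = facets(root.find(f"{XS}element/{XS}simpleType/{XS}restriction"))
car_facets = facets(root.find(f"{XS}simpleType/{XS}restriction"))

print(satisfies("36", age_facets))     # True
print(satisfies("121", age_facets))    # False
print(satisfies("Audi", car_facets))   # True
print(satisfies("Volvo", car_facets))  # False
```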

Complex Elements

<employee>
  <firstname>John</firstname>
  <lastname>Smith</lastname>
</employee>

The element can refer to a named complex type:

<xs:element name="employee" type="personinfo"/>

<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

Alternatively, the complex type can be declared inline:

<xs:element name="employee">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="firstname" type="xs:string"/>
      <xs:element name="lastname" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

More Complex Elements

You can also base a complex element on an existing complex element and add some elements, like this:

<xs:element name="employee" type="fullpersoninfo"/>

<xs:complexType name="personinfo">
  <xs:sequence>
    <xs:element name="firstname" type="xs:string"/>
    <xs:element name="lastname" type="xs:string"/>
  </xs:sequence>
</xs:complexType>

<xs:complexType name="fullpersoninfo">
  <xs:complexContent>
    <xs:extension base="personinfo">
      <xs:sequence>
        <xs:element name="address" type="xs:string"/>
        <xs:element name="city" type="xs:string"/>
        <xs:element name="country" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>

XSD Indicators

There are seven indicators:

Order indicators:
• All
• Choice
• Sequence

Occurrence indicators:
• maxOccurs
• minOccurs

Group indicators:
• Group name
• attributeGroup name

Complex Type: Example

Let's have a look at this XML document called "shiporder.xml":

<?xml version="1.0" encoding="ISO-8859-1"?>
<shiporder orderid="889923"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="shiporder.xsd">
  <orderperson>John Smith</orderperson>
  <shipto>
    <name>Ola Nordmann</name>
    <address>Langgt 23</address>
    <city>4000 Stavanger</city>
    <country>Norway</country>
  </shipto>
  <item>
    <title>Empire Burlesque</title>
    <note>Special Edition</note>
    <quantity>1</quantity>
    <price>10.90</price>
  </item>
  <item>
    <title>Hide your heart</title>
    <quantity>1</quantity>
    <price>9.90</price>
  </item>
</shiporder>

Complex Type: Example Solution

The XSD for the file:

<?xml version="1.0" encoding="ISO-8859-1" ?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:simpleType name="stringtype">
    <xs:restriction base="xs:string"/>
  </xs:simpleType>

  <xs:simpleType name="inttype">
    <xs:restriction base="xs:positiveInteger"/>
  </xs:simpleType>

  <xs:simpleType name="dectype">
    <xs:restriction base="xs:decimal"/>
  </xs:simpleType>

  <xs:simpleType name="orderidtype">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{6}"/>
    </xs:restriction>
  </xs:simpleType>

  <xs:complexType name="shiptotype">
    <xs:sequence>
      <xs:element name="name" type="stringtype"/>
      <xs:element name="address" type="stringtype"/>
      <xs:element name="city" type="stringtype"/>
      <xs:element name="country" type="stringtype"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="itemtype">
    <xs:sequence>
      <xs:element name="title" type="stringtype"/>
      <xs:element name="note" type="stringtype" minOccurs="0"/>
      <xs:element name="quantity" type="inttype"/>
      <xs:element name="price" type="dectype"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="shipordertype">
    <xs:sequence>
      <xs:element name="orderperson" type="stringtype"/>
      <xs:element name="shipto" type="shiptotype"/>
      <xs:element name="item" maxOccurs="unbounded" type="itemtype"/>
    </xs:sequence>
    <xs:attribute name="orderid" type="orderidtype" use="required"/>
  </xs:complexType>

  <xs:element name="shiporder" type="shipordertype"/>

</xs:schema>
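The occurrence indicators in this schema can be read back programmatically. The sketch below is an illustration under assumptions: `SCHEMA` is an excerpt of the schema above (the `shipto` element and the simple types are left out), and `occurrences` is a hypothetical helper. It lists each child element with its `minOccurs` and `maxOccurs`, both of which default to 1 when omitted.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

# Excerpt of the shiporder schema: two of its complex types only.
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="itemtype">
    <xs:sequence>
      <xs:element name="title" type="stringtype"/>
      <xs:element name="note" type="stringtype" minOccurs="0"/>
      <xs:element name="quantity" type="inttype"/>
      <xs:element name="price" type="dectype"/>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="shipordertype">
    <xs:sequence>
      <xs:element name="orderperson" type="stringtype"/>
      <xs:element name="item" maxOccurs="unbounded" type="itemtype"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>"""


def occurrences(schema_text):
    """Map 'type/element' to (minOccurs, maxOccurs); both default to '1'."""
    result = {}
    for ctype in ET.fromstring(schema_text).iter(f"{XS}complexType"):
        for el in ctype.iter(f"{XS}element"):
            key = f"{ctype.get('name')}/{el.get('name')}"
            result[key] = (el.get("minOccurs", "1"), el.get("maxOccurs", "1"))
    return result


occ = occurrences(SCHEMA)
for key, (low, high) in occ.items():
    print(f"{key}: minOccurs={low} maxOccurs={high}")
```

Read this way, `note` is optional (0 to 1), `item` may repeat without limit, and every other element must appear exactly once, which matches the two `item` entries in shiporder.xml.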


PMML: Headers

<Header copyright="Copyright (c) 2009 LatentView" description="LatentView Logit Model v1.0">
  <Extension name="timestamp" value="2009-01-19 19:38:13" extender="Rattle"/>
  <Extension name="description" value="Administrator" extender="Rattle"/>
  <Application name="Rattle/PMML" version="1.2.0"/>
</Header>


PMML: Data Dictionary

<DataDictionary numberOfFields="23">
  <DataField name="ind_Sale" optype="continuous" dataType="double"/>
  …
  <DataField name="STATE" optype="categorical" dataType="string"/>
</DataDictionary>
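A scoring application typically starts by reading the data dictionary. The following is a minimal sketch using Python's standard `xml.etree.ElementTree`; the two-field `DICTIONARY` string is a shortened, illustrative stand-in for the 23-field dictionary on the slide, and `read_fields` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-3_2"}

# Illustrative two-field dictionary; the slide's dictionary has 23 fields.
DICTIONARY = """<DataDictionary xmlns="http://www.dmg.org/PMML-3_2" numberOfFields="2">
  <DataField name="ind_Sale" optype="continuous" dataType="double"/>
  <DataField name="STATE" optype="categorical" dataType="string"/>
</DataDictionary>"""


def read_fields(dictionary_text):
    """Return {field name: (optype, dataType)} and check numberOfFields."""
    root = ET.fromstring(dictionary_text)
    fields = {
        f.get("name"): (f.get("optype"), f.get("dataType"))
        for f in root.findall("p:DataField", NS)
    }
    declared = root.get("numberOfFields")
    if declared is not None and int(declared) != len(fields):
        raise ValueError(f"numberOfFields={declared}, found {len(fields)}")
    return fields


fields = read_fields(DICTIONARY)
print(fields)
```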


PMML Transformations

PMML defines various kinds of simple data transformations:
• Normalization: map values to numbers; the input can be continuous or discrete.
• Discretization: map continuous values to discrete values.
• Value mapping: map discrete values to discrete values.
• Functions: derive a value by applying a function to one or more parameters.
• Aggregation: summarize or collect groups of values, e.g., compute average.

Value Mapping

<DerivedField name="ETHNICGROUPCODE_02" optype="ordinal" dataType="integer">
  <MapValues outputColumn="derived" defaultValue="0" mapMissingTo="0">
    <FieldColumnPair field="ETHNICGROUPCODE" column="original"/>
    <InlineTable>
      <row><original>02</original><derived>1</derived></row>
    </InlineTable>
  </MapValues>
</DerivedField>
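To show how a scoring engine might interpret the value mapping above, here is a minimal sketch in Python. It assumes a single `FieldColumnPair` and an `InlineTable`, as on the slide; `map_value` and `local` are hypothetical helpers, and results are returned as strings without conversion to the declared `dataType`.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-3_2"}

DERIVED_FIELD = """<DerivedField xmlns="http://www.dmg.org/PMML-3_2"
    name="ETHNICGROUPCODE_02" optype="ordinal" dataType="integer">
  <MapValues outputColumn="derived" defaultValue="0" mapMissingTo="0">
    <FieldColumnPair field="ETHNICGROUPCODE" column="original"/>
    <InlineTable>
      <row><original>02</original><derived>1</derived></row>
    </InlineTable>
  </MapValues>
</DerivedField>"""


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.split("}", 1)[-1]


def map_value(derived_field_text, record):
    """Look up the record's input value in the inline table."""
    mapping = ET.fromstring(derived_field_text).find("p:MapValues", NS)
    pair = mapping.find("p:FieldColumnPair", NS)
    value = record.get(pair.get("field"))
    if value is None:
        return mapping.get("mapMissingTo")      # input is missing
    for row in mapping.findall("p:InlineTable/p:row", NS):
        cells = {local(cell.tag): cell.text for cell in row}
        if cells.get(pair.get("column")) == value:
            return cells[mapping.get("outputColumn")]
    return mapping.get("defaultValue")          # no row matched


print(map_value(DERIVED_FIELD, {"ETHNICGROUPCODE": "02"}))  # 1
print(map_value(DERIVED_FIELD, {"ETHNICGROUPCODE": "05"}))  # 0
print(map_value(DERIVED_FIELD, {}))                         # 0
```

In effect, this derived field is a dummy variable that flags ethnic group code 02.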

Built-in Function

<DerivedField name="I1EXACTAGE_dr" optype="continuous" dataType="double">
  <Apply function="sum">
    <FieldRef field="I1EXACTAGE"/>
    <FieldRef field="I1ESTIMATEDAGE"/>
  </Apply>
</DerivedField>
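The built-in function example can be evaluated in the same spirit. This is a minimal sketch, assuming the `Apply` element contains only `FieldRef` children and that all inputs are numeric; the `FUNCTIONS` table covers just a few built-ins, and `evaluate` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-3_2"}

DERIVED_FIELD = """<DerivedField xmlns="http://www.dmg.org/PMML-3_2"
    name="I1EXACTAGE_dr" optype="continuous" dataType="double">
  <Apply function="sum">
    <FieldRef field="I1EXACTAGE"/>
    <FieldRef field="I1ESTIMATEDAGE"/>
  </Apply>
</DerivedField>"""

# Only a few built-in functions, enough for this sketch.
FUNCTIONS = {"sum": sum, "min": min, "max": max}


def evaluate(derived_field_text, record):
    """Apply the named built-in function to the referenced fields."""
    apply_el = ET.fromstring(derived_field_text).find("p:Apply", NS)
    func = FUNCTIONS[apply_el.get("function")]
    args = [
        float(record[ref.get("field")])
        for ref in apply_el.findall("p:FieldRef", NS)
    ]
    return func(args)


result = evaluate(DERIVED_FIELD, {"I1EXACTAGE": 40, "I1ESTIMATEDAGE": 2})
print(result)  # 42.0
```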

PMML: Mining Schema

<MiningSchema>
  <MiningField name="ind_Sale" usageType="predicted" missingValueReplacement="-1" missingValueTreatment="asValue"/>
  <MiningField name="I1ESTIMATEDAGE" usageType="active" missingValueReplacement="-1" missingValueTreatment="asValue"/>
  <MiningField name="I2ESTIMATEDAGE" usageType="active" missingValueReplacement="-1" missingValueTreatment="asValue"/>
  …
  <MiningField name="I1EXACTAGE" usageType="active" missingValueReplacement="-1" missingValueTreatment="asValue"/>
</MiningSchema>
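The mining schema tells the scorer which fields are inputs and what to substitute when a value is missing. The sketch below illustrates that step under assumptions: `MINING_SCHEMA` is a shortened version of the schema above, all fields are treated as numeric, and `prepare_inputs` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {"p": "http://www.dmg.org/PMML-3_2"}

# Shortened version of the slide's schema (elided fields left out).
MINING_SCHEMA = """<MiningSchema xmlns="http://www.dmg.org/PMML-3_2">
  <MiningField name="ind_Sale" usageType="predicted"
               missingValueReplacement="-1" missingValueTreatment="asValue"/>
  <MiningField name="I1ESTIMATEDAGE" usageType="active"
               missingValueReplacement="-1" missingValueTreatment="asValue"/>
  <MiningField name="I1EXACTAGE" usageType="active"
               missingValueReplacement="-1" missingValueTreatment="asValue"/>
</MiningSchema>"""


def prepare_inputs(schema_text, record):
    """Return the active fields as floats, substituting missingValueReplacement."""
    prepared = {}
    for field in ET.fromstring(schema_text).findall("p:MiningField", NS):
        if field.get("usageType", "active") != "active":
            continue  # the predicted field is not a model input
        name = field.get("name")
        value = record.get(name)
        if value is None:
            value = field.get("missingValueReplacement")
        prepared[name] = None if value is None else float(value)
    return prepared


inputs = prepare_inputs(MINING_SCHEMA, {"I1EXACTAGE": 34})
print(inputs)  # {'I1ESTIMATEDAGE': -1.0, 'I1EXACTAGE': 34.0}
```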


Next Steps

• Create a PMML file from your models: one each for Logistic, Clustering and Decision Tree models

• Build PMML manually, and validate it using an XML editor such as XMLFox (a syntactically valid PMML may not be logically valid)
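One example of a logical check that an XML editor will not perform is confirming that every mining field is declared in the data dictionary. The sketch below assumes each `MiningField` should name a `DataField`; documents that reference other kinds of fields would need a wider lookup. `PMML_DOC` is a deliberately incomplete, hypothetical document (the model body is omitted), and `undeclared_mining_fields` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

PMML_NS = "http://www.dmg.org/PMML-3_2"

# Hypothetical, well-formed but incomplete PMML: the model has no body,
# and I1EXACTAGE is used without being declared in the dictionary.
PMML_DOC = """<PMML version="3.2" xmlns="http://www.dmg.org/PMML-3_2">
  <Header copyright="Example.com"/>
  <DataDictionary numberOfFields="2">
    <DataField name="ind_Sale" optype="continuous" dataType="double"/>
    <DataField name="STATE" optype="categorical" dataType="string"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="ind_Sale" usageType="predicted"/>
      <MiningField name="I1EXACTAGE" usageType="active"/>
    </MiningSchema>
  </RegressionModel>
</PMML>"""


def undeclared_mining_fields(pmml_text):
    """Return sorted MiningField names that have no matching DataField."""
    root = ET.fromstring(pmml_text)
    declared = {f.get("name") for f in root.iter(f"{{{PMML_NS}}}DataField")}
    used = {f.get("name") for f in root.iter(f"{{{PMML_NS}}}MiningField")}
    return sorted(used - declared)


missing = undeclared_mining_fields(PMML_DOC)
print(missing)  # ['I1EXACTAGE']
```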


Thank You !


JVL Plaza, Ground Floor, 626 Anna Salai, Teynampet, Chennai – 600 018
Phone: +91-44-4509 4039/40

80, Broad Street, 5th Floor, New York, NY 10004
Phone: +1-212-837-7874