Decision Tree

Machine learning - Decision tree learning

Decision tree learning uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.
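
As a minimal sketch of that idea (using scikit-learn; the toy weather data below is invented for illustration), a tree can be fit to observed items and then asked for the target value of a new item:

    # Decision tree as a predictive model (toy data; scikit-learn assumed).
    from sklearn.tree import DecisionTreeClassifier

    X = [[30, 85], [27, 90], [22, 70], [18, 65], [25, 80]]  # observations: [temp, humidity]
    y = [0, 0, 1, 1, 0]                                     # target value: 1 = play outside

    model = DecisionTreeClassifier(max_depth=2).fit(X, y)
    print(model.predict([[21, 68]]))  # maps a new observation to a target value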


Alternating decision tree - History

However, the algorithm as presented had several typographical errors. Clarifications and optimizations were later presented by Bernhard Pfahringer, Geoffrey Holmes and Richard Kirkby (Optimizing the Induction of Alternating Decision Trees, Proceedings of the Fifth Pacific-Asia Conference on Advances in Knowledge Discovery and Data Mining, 2001, pp. 477-487). Implementations are available in Weka and JBoost.


Alternating decision tree - Motivation

Boosting algorithms typically use either decision stumps or decision trees as weak hypotheses. As an example, boosting decision stumps creates a set of T weighted decision stumps, where T is the number of boosting iterations.


Boosting a simple learner results in an unstructured set of T hypotheses, making it difficult to infer correlations between attributes. Alternating decision trees introduce structure to the set of hypotheses by requiring that they build off a hypothesis that was produced in an earlier iteration. The resulting set of hypotheses can be visualized in a tree based on the relationship between a hypothesis and its parent.


Alternating decision tree - Alternating decision tree structure

An alternating decision tree consists of decision nodes and prediction nodes.
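
A rough sketch of those two node types and the usual sign-of-summed-contributions classification rule (illustrative names only, not the paper's implementation):

    # Sketch of alternating decision tree node types. Decision nodes hold a
    # predicate; prediction nodes hold a real-valued contribution. An instance
    # is classified by the sign of the contributions summed along all paths
    # whose predicates it satisfies.
    from dataclasses import dataclass, field
    from typing import Callable, List

    @dataclass
    class DecisionNode:
        predicate: Callable[[dict], bool]   # e.g. lambda x: x["a"] < 4.5
        yes: "PredictionNode"
        no: "PredictionNode"

    @dataclass
    class PredictionNode:
        value: float                        # real-valued contribution
        children: List[DecisionNode] = field(default_factory=list)

    def score(node: PredictionNode, x: dict) -> float:
        """Sum the contributions of every prediction node reached by x."""
        total = node.value
        for d in node.children:
            branch = d.yes if d.predicate(x) else d.no
            total += score(branch, x)
        return total

    root = PredictionNode(0.5, [DecisionNode(lambda x: x["a"] < 4.5,
                                             PredictionNode(-0.7),
                                             PredictionNode(+0.2))])
    print(1 if score(root, {"a": 3.0}) >= 0 else -1)  # classification = sign of score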


Alternating decision tree - Empirical results

Figure 6 in the original paper demonstrates that ADTrees are typically as robust as boosted decision trees and boosted decision stumps. Typically, equivalent accuracy can be achieved with a much simpler tree structure than with recursive partitioning algorithms.


Gene expression programming - Decision trees

Decision trees (DT) are classification models in which a series of questions and answers is mapped using nodes and directed edges.


Decision trees have three types of nodes: a root node, internal nodes, and leaf or terminal nodes. The root node and all internal nodes represent test conditions for different attributes or variables in a dataset. Leaf nodes specify the class label for all the different paths in the tree.
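
Sketched as data structures (hypothetical names, one possible encoding):

    # Sketch of the three node types: a test node serves as root or internal
    # node; a leaf carries the class label for its path.
    from dataclasses import dataclass, field
    from typing import Dict, Union

    @dataclass
    class Leaf:
        label: str                          # class label for this path

    @dataclass
    class TestNode:                         # root or internal node
        attribute: str                      # attribute under test
        children: Dict[str, Union["TestNode", Leaf]] = field(default_factory=dict)

    # Root node testing "outlook"; each edge value leads to a subtree or leaf.
    root = TestNode("outlook", {"sunny": Leaf("no"), "overcast": Leaf("yes")})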


Most decision tree induction algorithms involve selecting an attribute for the root node and then making the same kind of informed decision about all the other nodes in the tree.
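
A compressed sketch of that greedy, recursive scheme; the attribute-scoring function is deliberately left abstract, and all names are hypothetical:

    # Sketch of top-down induction: pick the best-scoring attribute for the
    # current node, split the rows on its values, and recurse.
    def induce(rows, attributes, score):
        labels = [r["label"] for r in rows]
        if len(set(labels)) == 1 or not attributes:
            return max(set(labels), key=labels.count)      # leaf: majority label
        best = max(attributes, key=lambda a: score(rows, a))
        tree = {best: {}}
        for value in {r[best] for r in rows}:
            subset = [r for r in rows if r[best] == value]
            rest = [a for a in attributes if a != best]
            tree[best][value] = induce(subset, rest, score)
        return tree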


Decision trees can also be created by gene expression programming, with the advantage that all the decisions concerning the growth of the tree are made by the algorithm itself without any kind of human input.


This aspect of decision tree induction also carries over to gene expression programming, and there are two GEP algorithms for decision tree induction: the evolvable decision trees (EDT) algorithm, for dealing exclusively with nominal attributes, and EDT-RNC (EDT with random numerical constants), for handling both nominal and numeric attributes.


In the decision trees induced by gene expression programming, the attributes behave as function nodes in the basic gene expression algorithm, whereas the class labels behave as terminals.


This again ensures that all decision trees designed by GEP are always valid programs.


For example, consider a decision tree that decides whether to play outside.


Then the chromosomes are expressed as decision trees and their fitness is evaluated against a training dataset.


Decision trees with both nominal and numeric attributes are also easily induced with gene expression programming, using the GEP-RNC framework described above for dealing with random numerical constants. The chromosomal architecture includes an extra domain (Dc) for encoding random numerical constants, which are used as thresholds for splitting the data at each branching node. For example, in a gene with a head size of 5, the Dc starts at position 16.


These random numerical constants are encoded in the Dc domain and their expression follows a very simple scheme: from top to bottom and from left to right, the elements in Dc are assigned one by one to the elements in the decision tree.


The resulting expression can also be represented as a conventional decision tree.


Decision tree learning

'Decision tree learning' uses a decision tree as a predictive model which maps observations about an item to conclusions about the item's target value.


In decision analysis, a decision tree can be used to visually and explicitly represent decisions and decision making. In data mining, a decision tree describes data but not decisions; rather, the resulting classification tree can be an input for decision making. This page deals with decision trees in data mining.


Decision tree learning - General

Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables. Each interior node corresponds to one of the input variables; there are edges to children for each of the possible values of that input variable. Each leaf represents a value of the target variable given the values of the input variables represented by the path from the root to the leaf.
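
Prediction under this representation is just a walk from the root to a leaf; a sketch with a hypothetical nested-dict encoding:

    # Sketch: classify by following, at each interior node, the edge that
    # matches the item's value for that input variable until a leaf is reached.
    def predict(node, item):
        while isinstance(node, dict):       # interior node: {variable: {value: child}}
            variable, edges = next(iter(node.items()))
            node = edges[item[variable]]
        return node                         # leaf holds the target value

    tree = {"outlook": {"sunny": {"humidity": {"high": "no", "normal": "yes"}},
                        "overcast": "yes"}}
    print(predict(tree, {"outlook": "sunny", "humidity": "normal"}))  # -> "yes"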


A decision tree is a simple representation for classifying examples. Decision tree learning is one of the most successful techniques for supervised classification learning. For this section, assume that all of the features have finite discrete domains, and there is a single target feature called the classification. Each element of the domain of the classification is called a class.


A decision tree or a classification tree is a tree in which each internal (non-leaf) node is labeled with an input feature. The arcs coming from a node labeled with a feature are labeled with each of the possible values of the feature. Each leaf of the tree is labeled with a class or a probability distribution over the classes.


This process of top-down induction of decision trees (Quinlan, J. R. Induction of Decision Trees. Machine Learning 1: 81-106, Kluwer Academic Publishers) is an example of a greedy algorithm, and it is by far the most common strategy for learning decision trees from data.


In data mining, decision trees can also be described as the combination of mathematical and computational techniques to aid the description, categorisation and generalisation of a given set of data.


Decision tree learning - Types

Decision trees used in data mining are of two main types: classification trees, in which the predicted outcome is the class to which the data belongs, and regression trees, in which the predicted outcome can be considered a real number.


Some techniques, often called ensemble methods, construct more than one decision tree:


* 'Bagging' (bootstrap aggregating) decision trees, an early ensemble method, builds multiple decision trees by repeatedly resampling training data with replacement, and voting the trees for a consensus prediction (Breiman, L. (1996). Bagging Predictors. Machine Learning, 24: pp. 123-140). A short sketch of this procedure follows the list.


* A 'Random Forest' classifier uses a number of decision trees, in order to improve the classification rate.


* 'Rotation forest', in which every decision tree is trained by first applying principal component analysis (PCA) on a random subset of the input features (Rodriguez, J.J., Kuncheva, L.I. and Alonso, C.J. (2006), Rotation forest: A new classifier ensemble method, IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1619-1630).
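
A small sketch of the bagging procedure referenced above (bootstrap resampling plus majority vote; scikit-learn trees assumed, helper names invented):

    # Sketch of bagging: train trees on bootstrap resamples (sampling the
    # training data with replacement), then vote for a consensus prediction.
    import random
    from collections import Counter
    from sklearn.tree import DecisionTreeClassifier

    def bagged_trees(X, y, n_trees=25):
        trees, n = [], len(X)
        for _ in range(n_trees):
            idx = [random.randrange(n) for _ in range(n)]   # resample with replacement
            trees.append(DecisionTreeClassifier().fit([X[i] for i in idx],
                                                      [y[i] for i in idx]))
        return trees

    def vote(trees, x):
        preds = [t.predict([x])[0] for t in trees]
        return Counter(preds).most_common(1)[0][0]          # consensus prediction

scikit-learn also ships a ready-made BaggingClassifier implementing this idea.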


'Decision tree learning' is the construction of a decision tree from class-labeled training tuples. A decision tree is a flow-chart-like structure, where each internal (non-leaf) node denotes a test on an attribute, each branch represents the outcome of a test, and each leaf (or terminal) node holds a class label. The topmost node in a tree is the root node.


* MARS (multivariate adaptive regression splines): extends decision trees to better handle numerical data.


ID3 and CART were invented independently at around the same time (between 1970 and 1980), yet follow a similar approach for learning a decision tree from training tuples.


Decision tree learning - Formulae

Algorithms for constructing decision trees usually work top-down, by choosing a variable at each step that best splits the set of items. Different algorithms use different metrics for measuring "best". These generally measure the homogeneity of the target variable within the subsets. Some examples are given below. These metrics are applied to each candidate subset, and the resulting values are combined (e.g., averaged) to provide a measure of the quality of the split.
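
That combine step is typically a size-weighted average of a per-subset impurity metric; a minimal sketch with the metric left abstract (names hypothetical):

    # Sketch: score a candidate split by applying an impurity metric to each
    # resulting subset and combining the values, weighted by subset size.
    def split_quality(subsets, impurity):
        total = sum(len(s) for s in subsets)
        return sum(len(s) / total * impurity(s) for s in subsets)  # lower is better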


Decision tree learning - Information gain

Used by the ID3, C4.5 and C5.0 tree-generation algorithms. Information gain is based on the concept of entropy from information theory.
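
As a standard textbook sketch (not any particular library's API), entropy and the information gain of a split can be computed as:

    # Information gain = entropy of the parent labels minus the size-weighted
    # entropy of the child subsets' labels.
    from collections import Counter
    from math import log2

    def entropy(labels):
        n = len(labels)
        return -sum(c / n * log2(c / n) for c in Counter(labels).values())

    def information_gain(parent, children):
        n = len(parent)
        return entropy(parent) - sum(len(c) / n * entropy(c) for c in children)

    print(information_gain(["y", "y", "n", "n"], [["y", "y"], ["n", "n"]]))  # 1.0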


Decision tree learning - Decision tree advantages

Amongst other data mining methods, decision trees have various advantages:


* 'Simple to understand and interpret.' People are able to understand decision tree models after a brief explanation.


Decision tree learning - Limitations

Such algorithms cannot guarantee to return the globally optimal decision tree.


* Decision-tree learners can create over-complex trees that do not generalise well from the training data. (This is known as overfitting.) Mechanisms such as pruning are necessary to avoid this problem.


* There are concepts that are hard to learn because decision trees do not express them easily, such as XOR, parity or multiplexer problems. In such cases, the decision tree becomes prohibitively large. Approaches to solve the problem involve either changing the representation of the problem domain (known as propositionalisation) or using learning algorithms based on more expressive representations (such as statistical relational learning or inductive logic programming).


* For data including categorical variables with different numbers of levels, information gain in decision trees is biased in favor of those attributes with more levels.


Decision tree learning - Decision graphs

In a decision tree, all paths from the root node to the leaf node proceed by way of conjunction, or AND.


In general, decision graphs infer models with fewer leaves than decision trees.


Decision tree learning - Alternative search methods

Evolutionary techniques have also been used to search for decision trees; see, e.g., Breeding Decision Trees Using Evolutionary Techniques, Proceedings of the Eighteenth International Conference on Machine Learning, pp. 393-400, June 28-July 1, 2001; and Barros, Rodrigo C., Basgalupp, M.


Decision trees

A 'decision tree' is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm.


Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal.


Decision trees - Overview

A decision tree is a flowchart-like structure in which each internal node represents a test on an attribute, each branch represents the outcome of the test, and each leaf node represents a class label (the decision taken after computing all attributes). A path from root to leaf represents classification rules.


In decision analysis, a decision tree and the closely related influence diagram are used as a visual and analytical decision support tool, where the expected values (or expected utility) of competing alternatives are calculated.
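
A toy sketch of that expected value calculation (all figures invented): each alternative's expected value is the probability-weighted sum over its chance outcomes.

    # Expected value of each competing alternative (illustrative figures only).
    alternatives = {
        "launch product": [(0.4, 120000), (0.6, -30000)],   # (probability, payoff)
        "do nothing":     [(1.0, 0)],
    }
    for name, outcomes in alternatives.items():
        ev = sum(p * payoff for p, payoff in outcomes)
        print(f"{name}: EV = {ev:.0f}")
    # Pick the alternative with the highest expected value.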


If, in practice, decisions have to be taken online with no recall under incomplete knowledge, a decision tree should be paralleled by a probability model as a best choice model or online selection model algorithm. Another use of decision trees is as a descriptive means for calculating conditional probabilities.


Decision trees, influence diagrams, utility functions, and other decision analysis tools and methods are taught to undergraduate students in schools of business, health economics, and public health, and are examples of operations research or management science methods.


Decision trees - Decision tree elements

Drawn from left to right, a decision tree has only burst nodes (splitting paths) but no sink nodes (converging paths). Therefore, used manually, they can grow very big and are then often hard to draw fully by hand. Traditionally, decision trees have been created manually, although increasingly, specialized software is employed.


Decision trees - Decision tree using flow chart symbols

Commonly a decision tree is drawn using flow chart symbols, as it is easier for many people to read and understand.


Decision trees - Another example

Shaw, Induction of fuzzy decision trees (available on ScienceDirect).


Decision trees - Influence diagram

A decision tree can be represented more compactly as an influence diagram, focusing attention on the issues and relationships between events.


Decision trees - Advantages and disadvantages

Amongst decision support tools, decision trees (and influence diagrams) have several advantages. Decision trees:


* Are simple to understand and interpret. People are able to understand decision tree models after a brief explanation.


* For data including categorical variables with different numbers of levels, information gain in decision trees is biased in favor of those attributes with more levels.


List of important publications in computer science - Induction of Decision Trees

Description: Decision trees are a common learning algorithm and a decision representation tool. Development of decision trees was done by many researchers in many areas, even before this paper, though this paper is one of the most influential in the field.


Game complexity - Decision trees

A decision tree is a subtree of the game tree, with each position labelled with 'player A wins', 'player B wins' or 'drawn', if that position can be proved to have that value (assuming best play by both sides) by examining only other positions in the graph.


Information gain in decision trees

In information theory and machine learning, 'information gain' is a synonym for Kullback–Leibler divergence. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the expected value of the Kullback–Leibler divergence of a conditional probability distribution.
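
Stated as a formula (standard information theory): the mutual information of X and Y is the expected KL divergence of the conditional distribution of X given Y from the marginal of X,

    I(X;Y) = \sum_{y} p(y) \, D_{\mathrm{KL}}\big( P(X \mid Y=y) \,\|\, P(X) \big)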


In machine learning, this concept can be used to define a preferred sequence of attributes to investigate to most rapidly narrow down the state of X. Such a sequence (which depends on the outcome of the investigation of previous attributes at each stage) is called a decision tree. Usually an attribute with high mutual information should be preferred to other attributes.


Decision tree model

In computational complexity and communication complexity theories, the 'decision tree model' is the model of computation or communication in which an algorithm or communication process is considered to be basically a decision tree, i.e., a sequence of branching operations based on comparisons of some quantities, the comparisons being assigned the unit computational cost.


Several variants of decision tree models may be considered, depending on the complexity of the operations allowed in the computation of a single comparison and the way of branching.


The computational complexity of a problem or an algorithm expressed in terms of the decision tree model is called 'decision tree complexity' or 'query complexity'.


Decision tree model - Simple decision tree

The model in which every decision is based on the comparison of two numbers within constant time is called simply a decision tree model. It was introduced to establish the computational complexity of sorting and searching (Data Structures and Algorithms, by Alfred V. Aho, John E. Hopcroft, Jeffrey D. Ullman).


In this case the decision tree model is a binary tree.


Decision tree model - Linear decision tree

Linear decision trees, just like the simple decision trees, make a branching decision based on a set of values as input. As opposed to binary decision trees, linear decision trees have three output branches. A linear function f(x_1, ..., x_n) is tested, and branching decisions are made based on the sign of the function (negative, positive, or zero).
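
A toy sketch of a single three-way branch on the sign of a linear function (coefficients and names invented):

    # Sketch: a linear decision tree node computes a linear function of the
    # inputs and branches three ways on its sign (negative, zero, positive).
    def linear_branch(x, w, negative, zero, positive):
        value = sum(wi * xi for wi, xi in zip(w, x))   # f(x_1, ..., x_n)
        if value < 0:
            return negative
        return zero if value == 0 else positive

    print(linear_branch([1.0, 2.0], [3.0, -1.0], "neg", "zero", "pos"))  # -> "pos"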


Decision tree model - Algebraic decision tree

Algebraic decision trees are a generalization of linear decision trees that allow the test functions to be polynomials of degree d. Geometrically, the space is divided into semi-algebraic sets (a generalization of the hyperplane). The evaluation of the complexity is more difficult.


Decision tree model - Deterministic decision tree

If the output of a decision tree is f(x) for all x ∈ {0,1}^n, the decision tree is said to compute f. The depth of a tree is the maximum number of queries that can happen before a leaf is reached and a result obtained. 'D(f)', the 'deterministic decision tree' complexity of f, is the smallest depth among all deterministic decision trees that compute f.


Decision tree model - Randomized decision tree

'R_2(f)' is defined as the complexity of the lowest-depth randomized decision tree whose result is f(x) with probability at least 2/3 for all x ∈ {0,1}^n (i.e., with bounded two-sided error).


'R_2(f)' is known as the Monte Carlo randomized decision-tree complexity, because the result is allowed to be incorrect with bounded two-sided error. The Las Vegas decision-tree complexity 'R_0(f)' measures the expected depth of a decision tree that must be correct (i.e., has zero error). There is also a one-sided bounded-error version known as 'R_1(f)'.


Decision tree model - Nondeterministic decision tree

The nondeterministic decision tree complexity of a function is known more commonly as the certificate complexity of that function. It measures the number of input bits that a nondeterministic algorithm would need to look at in order to evaluate the function with certainty.


Decision tree model - Quantum decision tree

Q_2(f) and Q_E(f) are more commonly known as 'quantum query complexities', because the direct definition of a quantum decision tree is more complicated than in the classical case.


Decision tree model - Relationship between different models

Noam Nisan found that the Monte Carlo randomized decision tree complexity is also polynomially related to deterministic decision tree complexity: D(f) = O(R_2(f)^3).


The quantum decision tree complexity Q_2(f) is also polynomially related to D(f). Midrijanis showed that D(f) = O(Q_E(f)^3), improving a quartic bound due to Beals et al. Beals et al. also showed that D(f) = O(Q_2(f)^6), and this is still the best known bound. However, the largest known gap between deterministic and quantum query complexities is only quadratic. A quadratic gap is achieved for the OR function (cf. Grover's algorithm): D(OR_n) = n while Q_2(OR_n) = Θ(√n).
