Page 1

Inferring phylogenetic graphs of Natural Languages using Minimum Message Length

Jane N. Ooi and David L. Dowe,
Monash University, Australia,
www.csse.monash.edu.au/~dld

Page 2

Table of Contents

Motivation and Background
What is a phylogenetic model?
Phylogenetic Trees and Graphs
Types of evolution of languages
Minimum Message Length (MML)
Multistate distribution – modelling of mutations
Results/Discussion
Conclusion and future work

Page 3

Motivation

To study how languages have evolved (phylogeny of languages), e.g. artificial languages, European languages.

To refine natural language compression methods.

Page 4

Evolution of languages

What is phylogeny?
Phylogeny means evolution.

What is a phylogenetic model?
A phylogenetic tree/graph is a tree/graph showing the evolutionary interrelationships among various species or other entities that are believed to have a common ancestor.

Page 5

Difference between a phylogenetic tree and a phylogenetic graph

Phylogenetic trees: each child node has exactly one parent node.

Phylogenetic graphs (new concept): each child node can descend from one or more parent node(s).

[Diagrams: a tree in which X is the sole parent of Y and Z, and a graph in which a child descends from both X and Y.]
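To make the distinction concrete, here is a minimal sketch in code (illustrative only, not part of the authors' system): a tree node carries at most one parent, while a graph node may carry several.

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class TreeNode:
        """A node in a phylogenetic tree: at most one parent."""
        name: str
        parent: Optional["TreeNode"] = None

    @dataclass
    class GraphNode:
        """A node in a phylogenetic graph: one or more parents allowed."""
        name: str
        parents: List["GraphNode"] = field(default_factory=list)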

Page 6

Evolution of languages

3 types of evolution:
  Evolution of phonology/pronunciation
  Evolution of written script/spelling
  Evolution of grammatical structures

Examples:

  Words      US         UK
  leisure    leezhure   lezhure
  schedule   skedule    shedule

  English      Malay
  Television   Televisyen
  Mobile       Mobil

Page 7

Minimum Message Length (MML)

What is MML?
A measure of goodness of classification based on information theory (Wallace and Boulton, 1968; Wallace and Dowe, 1999a; Wallace, 2005).

Data can be described using "models".
MML methods favour the "best" description of data, where "best" = shortest overall two-part message length.

Two-part message:
Msglength = msglength(model) + msglength(data|model)
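As a rough illustration of the two-part idea (a minimal sketch, not the coding scheme used on these slides; the function name and the crude parameter-precision term are assumptions):

    import math
    from collections import Counter

    def two_part_length(data: str, alphabet: str) -> float:
        """Toy two-part message length in bits:
        part 1 states a multinomial model of character frequencies,
        part 2 codes the data with the stated probabilities."""
        counts = Counter(data)
        n = len(data)
        # Part 1: cost of stating the model. Here each of the (K-1) free
        # probabilities is charged a crude log2(n+1) bits of precision.
        part1 = (len(alphabet) - 1) * math.log2(n + 1)
        # Part 2: cost of the data given the stated model.
        part2 = -sum(math.log2(counts[ch] / n) for ch in data)
        return part1 + part2

A more detailed model shortens the data part but lengthens the model part; MML picks the model that minimises the sum.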

Page 8

Minimum Message Length (MML)

Degree of similarity between languages can be measured by compressing them in terms of one another.

Example: Language A, Language B. 3 possibilities:
  Unrelated – shortest message length when compressed separately.
  A descended from B – shortest message length when B compressed and then A compressed in terms of B.
  B descended from A – shortest message length when A compressed and then B compressed in terms of A.
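A minimal sketch of how the three hypotheses could be compared, using zlib as a crude stand-in compressor rather than the MML word-by-word coder described later (function names are illustrative):

    import zlib

    def clen(text: str) -> int:
        """Compressed length in bytes, zlib standing in for a real coder."""
        return len(zlib.compress(text.encode("utf-8"), 9))

    def compare_hypotheses(vocab_a: str, vocab_b: str) -> dict:
        """Total description length under each of the three hypotheses."""
        unrelated = clen(vocab_a) + clen(vocab_b)
        # "A descended from B": code B alone, then A in terms of B,
        # approximated here by the extra length of compressing B followed by A.
        a_from_b = clen(vocab_b) + (clen(vocab_b + vocab_a) - clen(vocab_b))
        b_from_a = clen(vocab_a) + (clen(vocab_a + vocab_b) - clen(vocab_a))
        return {"unrelated": unrelated, "A from B": a_from_b, "B from A": b_from_a}

The hypothesis with the smallest total description length is preferred.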

Page 9

Minimum Message Length (MML)

The best phylogenetic model is the tree/graph that achieves the shortest overall two-part message length.

Page 10

Modelling mutation between words

Root language:
  Equal frequencies for all characters: log(size of alphabet) * no. of chars.
  Some characters occur more frequently than others (e.g. English "x" compared with "a"): multi-state (multinomial) distribution of characters.
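The two coding options above could be computed along these lines (a sketch; the adaptive code is one possible stand-in for the exact MML multinomial cost, and the 27-character alphabet assumes 26 letters plus the "." end-of-string marker used on later pages):

    import math
    from collections import Counter

    ALPHABET_SIZE = 27  # 26 letters plus the "." end-of-string marker

    def uniform_cost(words: list) -> float:
        """Equal character frequencies: log2(alphabet size) bits per character."""
        n_chars = sum(len(w) for w in words)
        return n_chars * math.log2(ALPHABET_SIZE)

    def multistate_cost(words: list) -> float:
        """Adaptive multistate code: each character is coded with probability
        (times seen so far + 1) / (characters seen so far + alphabet size)."""
        seen, total, bits = Counter(), 0, 0.0
        for ch in "".join(words):
            p = (seen[ch] + 1) / (total + ALPHABET_SIZE)
            bits -= math.log2(p)
            seen[ch] += 1
            total += 1
        return bits

On realistic text the multistate cost is typically lower, because frequent characters such as "a" end up with short codes.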

Page 11

Modelling mutation between words

Child languages:
  Multi-state distribution with 4 states: insert, delete, copy, change.
  Use string alignment techniques to find the best alignment between words.
  Dynamic programming algorithm to find the alignment between strings.
  MML favours the alignment between words that produces the shortest overall message length.
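A sketch of the dynamic-programming alignment under the four-state model, scored in bits (an illustration of the approach, not the authors' implementation; the log2(26) charge for stating a new character follows the note on the later results pages):

    import math

    def alignment_cost(parent: str, child: str, probs: dict, alphabet: int = 26) -> float:
        """Cheapest cost in bits of coding `child` given `parent` under a
        copy/change/insert/delete model with the given state probabilities."""
        cpy  = -math.log2(probs["copy"])
        chg  = -math.log2(probs["change"]) + math.log2(alphabet)   # also state the new char
        ins  = -math.log2(probs["insert"]) + math.log2(alphabet)   # also state the new char
        dele = -math.log2(probs["delete"])
        n, m = len(parent), len(child)
        # cost[i][j] = best cost of producing child[:j] from parent[:i]
        cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(n + 1):
            for j in range(m + 1):
                c = cost[i][j]
                if c == math.inf:
                    continue
                if i < n:                         # delete parent[i]
                    cost[i + 1][j] = min(cost[i + 1][j], c + dele)
                if j < m:                         # insert child[j]
                    cost[i][j + 1] = min(cost[i][j + 1], c + ins)
                if i < n and j < m:               # copy or change
                    step = cpy if parent[i] == child[j] else chg
                    cost[i + 1][j + 1] = min(cost[i + 1][j + 1], c + step)
        return cost[n][m]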

Page 12

Example:

  recommend--
  |||||||||||
  recommander
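Using the alignment sketch from the previous page with, say, the mutation probabilities listed on page 20 (the parent/child direction chosen here is only for illustration):

    probs = {"copy": 0.65, "change": 0.20, "insert": 0.05, "delete": 0.10}
    bits = alignment_cost("recommander", "recommend", probs)
    # With these costs the DP settles on the alignment shown above:
    # eight copies, one change ("a" coded as "e"), and two deletions
    # of the parent's trailing "er".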

Page 13

Work to date

Preliminary model:
  Only copy and change mutations.
  Words of the same length.
  Artificial and some European languages.

Expanded model:
  Copy, change, insert and delete mutations.
  Words of different length.
  Artificial and some European languages.

Page 14

Results – Preliminary model

Artificial languages:
  A – random
  B – 5% mutation from A
  C – 5% mutation from B
  Full stop "." marks the end of string.

Vocabularies (words 1 to 50):

  #    A          B          C
  1    asdfge.    assfge.    assfge.
  2    zlsdrya.   zlcdrya.   zlchrya.
  3    wet.       wet.       wbt.
  4    vsert.     vsegt.     vsagt.
  ...  ...        ...        ...
  50   ...        ...        ...

Page 15

Results – Preliminary model

Possible tree topologies for 3 languages X, Y, Z:

[Diagrams of candidate topologies, labelled: null hypothesis (totally unrelated), partially related, fully related, and expected topology; one structure is marked "expanded model only".]

Page 16

Results – Preliminary model

Possible graph topologies for 3 languages:

[Diagrams: two graph topologies, one with non-related parents and one with related parents.]

Page 17

Results – Preliminary model

Results:

Best tree: Language B is the root, with Language A and Language C as its children.
  Pmut(B,A) ~ 0.051648
  Pmut(B,C) ~ 0.049451

Overall message length = 2933.26 bits
  Cost of topology = log(5)
  Cost of fixing root language (B) = log(3)
  Cost of root language = 2158.7186 bits
  Branch 1: cost of child language (Lang. A), binomial distribution = 392.069784 bits
  Branch 2: cost of child language (Lang. C), binomial distribution = 378.562159 bits
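As an arithmetic check (base-2 logs, consistent with the bit costs elsewhere in the slides): log2(5) + log2(3) + 2158.7186 + 392.069784 + 378.562159 ≈ 2.32 + 1.58 + 2929.35 ≈ 2933.26 bits, matching the overall message length quoted above.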

Page 18

Results – Preliminary model

European languages (with accents removed): French, English, Spanish.

Vocabularies (words 1 to 30):

  #    English    French     Spanish
  1    baby.      bebe.      nene.
  2    beach.     plage.     playa.
  3    biscuits.  biscuits.  bizcocho.
  4    cream.     creme.     crema.
  ...  ...        ...        ...
  30   ...        ...        ...

Page 19

Results – Preliminary model

Inferred graph: French is the root; Spanish descends from French; English descends from both French and Spanish.
  Pmut(French, Spanish) ~ 0.245174
  For English: P(from French) ~ 0.834297, P(from Spanish not French) ~ 0.090559, P(from neither) ~ 0.075145

Cost of "parent" language (French) = 1226.76 bits
Cost of language (Spanish), binomial distribution = 734.59 bits
Cost of child language (English), trinomial distribution = 537.70 bits
Total tree cost = log(5) + log(3) + log(2) + 1226.76 + 734.59 + 537.70 = 2503.95 bits

Page 20

Results – Expanded model

16 sets of 4 languages; different-length vocabularies:
  A – randomly generated
  B – mutated from A
  C – mutated from A
  D – mutated from B

Mutation probabilities:
  Copy – 0.65
  Change – 0.20
  Insert – 0.05
  Delete – 0.10
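One simple way such a child vocabulary could be generated from its parent under these probabilities (a sketch only; the exact generator used for the experiments is not described on the slides, and handling of the "." terminator is omitted):

    import random
    import string

    MUTATIONS = {"copy": 0.65, "change": 0.20, "insert": 0.05, "delete": 0.10}

    def mutate_word(word: str, rng: random.Random) -> str:
        """Walk the parent word and apply one mutation decision per character."""
        ops, weights = zip(*MUTATIONS.items())
        out = []
        for ch in word:
            op = rng.choices(ops, weights=weights)[0]
            if op == "copy":
                out.append(ch)
            elif op == "change":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "insert":              # insert a new char, then keep the original
                out.append(rng.choice(string.ascii_lowercase))
                out.append(ch)
            # "delete": emit nothing for this character
        return "".join(out)

    # e.g. mutate each word of language A to build B and C, and each word of B
    # to build D, following the experimental setup above.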

Page 21

Results – Expanded model

Example of a set of 4 vocabularies used (words 1 to 50):

  #    A          B         C          D
  1    awjmv.     afjmv.    wqmv.      afjnv.
  2    bauke.     baxke.    auke.      bave.
  3    doinet.    domnit.   deoinet.   domnit.
  4    eni.       eol.      enc.       eol.
  5    foijgnw.   fiogw.    foijnw.    fidgw.
  ...  ...        ...       ...        ...
  50   ...        ...       ...        ...

Page 22

Results – Expanded model

Possible tree structures for 4 languages A, B, C, D:

[Diagrams: the null hypothesis (A, B, C and D totally unrelated) and several partially related structures.]

Page 23

Results – Expanded model

[Diagrams, continued: fully related structures and the expected topology (A as root with children B and C, and D a child of B).]

Page 24

Results – Expanded model

Correct tree structure inferred 100% of the time.

Sample of inferred tree and cost (A as root, with children B and C, and D a child of B):
  Language A: size = 383 chars, cost = 1821.121913 bits

Page 25

Results – Expanded model

Branch A -> B:
  Pr(Delete) = 0.076250
  Pr(Insert) = 0.038750
  Pr(Mismatch) = 0.186250
  Pr(Match) = 0.698750
  4-state multinomial cost = 930.108894 bits

Branch A -> C:
  Pr(Delete) = 0.071250
  Pr(Insert) = 0.038750
  Pr(Mismatch) = 0.183750
  Pr(Match) = 0.706250
  4-state multinomial cost = 916.979371 bits

Note that all multinomial costs include an extra cost of log(26) to state the new character for each mismatch and insert.

Page 26

Results – Expanded model

Branch B -> D:
  Pr(Delete) = 0.066580
  Pr(Insert) = 0.035248
  Pr(Mismatch) = 0.189295
  Pr(Match) = 0.708877
  4-state multinomial cost = 873.869382 bits

Cost of fixing topology = log(7) = 2.81 bits
Total tree cost = 930.11 + 916.98 + 873.87 + 1821.11 + log(7) + log(4) + log(3) + log(2) = 4549.46 bits
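A sketch of how a 4-state multinomial cost of this kind might be computed (the adaptive code below is a common stand-in for the MML multistate cost and is not intended to reproduce the exact figures above; the log2(26) charge per mismatch and insert follows the note on page 25):

    import math

    def four_state_cost(counts: dict, alphabet: int = 26) -> float:
        """Bits to transmit a sequence of match/mismatch/insert/delete events
        with an adaptive code, plus log2(alphabet) bits per mismatch and per
        insert to state the new character."""
        states = ("match", "mismatch", "insert", "delete")
        seen = {s: 0 for s in states}
        total, bits = 0, 0.0
        # The adaptive total does not depend on the order of events,
        # so simply stream each state's events in turn.
        for s in states:
            for _ in range(counts[s]):
                p = (seen[s] + 1) / (total + len(states))
                bits -= math.log2(p)
                seen[s] += 1
                total += 1
        bits += (counts["mismatch"] + counts["insert"]) * math.log2(alphabet)
        return bits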

Page 27

Results – Expanded model

European languages: French, English, German.

Vocabularies (words 1 to 601):

  #    English    French    German
  1    even.      meme.     sogar.
  2    eyes.      oeil.     auge.
  3    false.     faux.     falsch.
  4    fear.      peur.     angst.
  ...  ...        ...       ...
  601  ...        ...       ...

Page 28

Results – Expanded model

Inferred structure: French -> English -> German.

Total cost of this tree = 56807.155 bits
  Cost of fixing topology = log(4) = 2 bits
  Cost of fixing root language (French) = log(3) = 1.585 bits
  Cost of French = no. of chars * log(27) = 21054.64 bits

Page 29

Results – Expanded model

Cost of fixing parent/child language (English) = log(2) = 1 bit

Cost of multistate distribution (French -> English) = 15567.98 bits
MML inferred probabilities:
  Pr(Delete) = 0.164322
  Pr(Insert) = 0.071429
  Pr(Mismatch) = 0.357143
  Pr(Match) = 0.407106

Cost of multistate distribution (English -> German) = 20179.95 bits
MML inferred probabilities:
  Pr(Delete) = 0.069480
  Pr(Insert) = 0.189866
  Pr(Mismatch) = 0.442394
  Pr(Match) = 0.298260

Note that an extra cost of log(26) is needed for each mismatch and log(27) for each insert to state the new character.
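As an arithmetic check, the components on pages 28 and 29 add up to the quoted total (base-2 logs): log2(4) + log2(3) + log2(2) + 21054.64 + 15567.98 + 20179.95 = 2 + 1.585 + 1 + 56802.57 ≈ 56807.155 bits.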

Page 30

Conclusion

MML methods have managed to:
  infer the correct phylogenetic trees/graphs for artificial languages;
  infer phylogenetic trees/graphs for languages by encoding them in terms of one another.

We cannot (or can we?) conclude that one language really descends from another language. We can only conclude that they are related.

Page 31

Future work:

Compression – grammar and vocabulary.
Compression – phonemes of languages.
Endangered languages – Indigenous languages.
Refine coding scheme:
  Some characters occur more frequently than others, e.g. English "x" compared with "a".
  Some characters are more likely to mutate from one language to another language.

Page 32

Questions?

Page 33

Some further reading on MML

C. S. Wallace and P. R. Freeman. Single factor analysis by MML estimation. Journal of the Royal Statistical Society, Series B, 54(1):195-209, 1992.

C. S. Wallace. Multiple factor analysis by MML estimation. Technical Report CS 95/218, Department of Computer Science, Monash University, 1995.

C. S. Wallace and D. L. Dowe. MML estimation of the von Mises concentration parameter. Technical Report CS 93/193, Department of Computer Science, Monash University, 1993.

C. S. Wallace and D. L. Dowe. Refinements of MDL and MML coding. The Computer Journal, 42(4):330-337, 1999.

P. J. Tan and D. L. Dowe. MML inference of decision graphs with multi-way joins. In Proceedings of the 15th Australian Joint Conference on Artificial Intelligence, Canberra, Australia, 2-6 December 2002, Lecture Notes in Artificial Intelligence (LNAI) 2557, pages 131-142. Springer-Verlag, 2002.

S. L. Needham and D. L. Dowe. Message length as an effective Ockham's razor in decision tree induction. In Proceedings of the 8th International Workshop on Artificial Intelligence and Statistics (AI+STATS 2001), Key West, Florida, U.S.A., January 2001, pages 253-260, 2001.

Y. Agusta and D. L. Dowe. Unsupervised learning of correlated multivariate Gaussian mixture models using MML. In Proceedings of the Australian Conference on Artificial Intelligence 2003, Lecture Notes in Artificial Intelligence (LNAI) 2903, pages 477-489. Springer-Verlag, 2003.

J. W. Comley and D. L. Dowe. General Bayesian networks and asymmetric languages. In Proceedings of the Hawaii International Conference on Statistics and Related Fields, June 5-8, 2003.

J. W. Comley and D. L. Dowe. Minimum Message Length, MDL and Generalised Bayesian Networks with Asymmetric Languages, chapter 11, pages 265-294. M.I.T. Press, 2005. [Camera ready copy submitted October 2003.]

P. J. Tan and D. L. Dowe. MML inference of oblique decision trees. In Proc. 17th Australian Joint Conference on Artificial Intelligence (AI04), Cairns, Qld., Australia, pages 1082-1088. Springer-Verlag, December 2004.

www.csse.monash.edu.au/~dld/CSWallacePublications

