
Title: Agent Intelligence through Data Mining

Presenters:

Andreas L. Symeonidis
Pericles A. Mitkas

Department of Electrical and Computer Engineering, Aristotle University of Thessaloniki
& Laboratory of Intelligent Systems and Software Engineering, Informatics and Telematics Institute / CERTH
Thessaloniki, Greece

Contact Info:

Department of Electrical and Computer Engineering
Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece

E-mail: [email protected]

Tel.: +30 2310 99 6399

Fax: +30 2310 99 6398

E-mail: [email protected]

Tel.: +30 2310 99 6390

Fax: +30 2310 99 6398

THE 17TH EUROPEAN CONFERENCE ON MACHINE LEARNING

AND THE 10TH EUROPEAN CONFERENCE ON PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES

~ECML/PKDD 2006~

September 18th, 2006, Berlin, Germany


Agent Intelligence through Data Mining

ECML/PKDD 2006, Berlin, Germany, 18/09/2006

Andreas L. Symeonidis
Research Associate
Informatics & Telematics Institute
Center for Engineering Research and Technology – Hellas (CERTH)
Email: [email protected]

Pericles A. Mitkas
Associate Professor
Electrical and Computer Engineering
Aristotle Univ. of Thessaloniki (AUTh) and CERTH
Email: [email protected]

Thessaloniki, GREECE

The goal of the tutorial…

[Figure: diagram with the labels Reasoning Agent, Software Agent Paradigm, Knowledge Discovery Process, Historical Data, Knowledge Model, Application Domain, Agent Modelling, Agent Type, Inference Structure]

Related Technologies

Data Mining (DM): the extraction of interesting non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases.

Software Agent (SA): a software entity that acts autonomously (on behalf of another entity) in a goal-oriented manner. It is able to perceive its environment through sensors and act on it through effectors.

Multi-agent Systems (MAS): a Software Engineering methodology for developing applications with the deployment of agents and agent primitives.
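The software agent definition above (perceive the environment through sensors, act on it through effectors, behave autonomously toward a goal) can be illustrated with a minimal sketch. The thermostat scenario, the class name `ThermostatAgent` and all of its methods are hypothetical and chosen only for illustration; they are not taken from the tutorial or from any agent platform.

```python
from dataclasses import dataclass


@dataclass
class ThermostatAgent:
    """Toy goal-oriented agent: keep the temperature near a target."""
    target: float            # the goal the agent pursues
    tolerance: float = 0.5   # accepted deviation from the goal

    def perceive(self, environment: dict) -> float:
        # Sensor: read the current temperature from the environment.
        return environment["temperature"]

    def decide(self, temperature: float) -> str:
        # Autonomous decision driven by the goal, not by an external command.
        if temperature < self.target - self.tolerance:
            return "heat"
        if temperature > self.target + self.tolerance:
            return "cool"
        return "idle"

    def act(self, action: str, environment: dict) -> None:
        # Effector: change the environment according to the chosen action.
        step = {"heat": 1.0, "cool": -1.0, "idle": 0.0}[action]
        environment["temperature"] += step


def run(agent: ThermostatAgent, environment: dict, steps: int) -> list:
    """Run the perceive-decide-act loop and return the actions taken."""
    actions = []
    for _ in range(steps):
        action = agent.decide(agent.perceive(environment))
        agent.act(action, environment)
        actions.append(action)
    return actions


if __name__ == "__main__":
    env = {"temperature": 18.0}
    print(run(ThermostatAgent(target=21.0), env, steps=5))
```

In a multi-agent system, several such entities would run side by side and exchange messages; this sketch deliberately leaves communication out.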

Merging Technologies (2/2)

Software agents have been repeatedly used for executing DM tasks, but DM results have not yet been dynamically incorporated into MAS.

The reason: the inductive nature of DM and the lack of appropriate tools hinder the smooth incorporation of knowledge into MAS.

Presentation Outline (1/3)

1 Basic Primitives of DM technology

Data Preprocessing

DM Techniques:

Cluster Analysis – Unsupervised Learning

Classification – Supervised Learning

Association Rules

2 Data Mining & Semantics

3 Embedding Domain Knowledge into DM

Presentation Outline (2/3)

4 Data Mining Applications & Trends

5 Intelligent Agents

Definitions

Attributes & Communication

6 Agent Intelligence Infusion

The Levels of Intelligence Infusion

Tools

Methodologies

Presentation Outline (3/3)

7 MAS exploiting DM extracted intelligence

An ERP add-on for intelligent CRM/SRM

A near real-time EMS

A decentralized maintenance management system

A self-organizing MAS, “in danger”

An agent-based, e-auction system

8 Open Issues

Part 1 - Introduction

Motivation

Data explosion problem
Automated data collection tools and mature database technology lead to tremendous amounts of data stored in databases, data warehouses and other information repositories

Data warehousing and data mining

Data warehousing and on-line analytical processing

Extraction of interesting knowledge (rules, regularities, patterns, constraints) from data in large databases

Knowledge Starvation
The need to see through and interpret all this “useless” data

Parts of the KDD process

The KDD process

What is KDD?

Knowledge Discovery in Databases (KDD) is the extraction of interesting non-trivial, implicit, previously unknown and potentially useful information or patterns from data in large databases.

What is Data Mining?

Data Mining is the most important step in the KDD process, consisting of applying data analysis and discovery algorithms that, under acceptable computational efficiency limitations, produce a particular enumeration of patterns over the data.

Data Mining related technologies

Machine Learning
Knowledge Extraction – Extended retrieval
Data/Pattern Analysis
Statistical Analysis
Business Intelligence

What is not data mining:
(Deductive) query processing
Expert systems or small ML/statistical programs

Data Mining vs. Machine Learning

The size of the dataset is different:
For machine learning, datasets are loaded to main memory – thus small
For data mining, there is no such restriction (usually large datasets)

The objective is different:
ML focuses on the inference mechanisms involved in the learning process
DM focuses on the business exploitation of extracted results

Data Mining vs. Extended Retrieval

The approach is different:
Extended Retrieval (ER) is based on individual examples (retrieved and stored as analogs), whereas DM is ONLY interested in a flood of data.

The extracted knowledge is different:
DM-generated knowledge contains the condensed information extracted from structured databases. In the case of ER, knowledge comes through the mapping of structure information.

ER and DM are complementary approaches

Data Mining vs. Statistical Analysis

The objective of statistical data analysis:
To model the underlying structures in order to formulate and evaluate hypotheses.

For statistical analysis (SA), computational efficiency is not a concern

Data for SA are static and clean, and datasets are small

The inference procedure is different:
The SA procedure involves repeated sampling under a given statistical model from an unknown distribution of the data
DM seeks to identify modeling procedures with a high probability of near-optimality over all possible distributions of data.

Data Mining: Confluence of Multiple Disciplines

[Figure: Data Mining as the confluence of Database Technology, Statistics, Machine Learning, Visualization, Information Science and Other Disciplines]

Why Data Mining?

Data mining is a computer-driven application, as it is performed by computers and not by humans.

Data mining solves the “query formulation” problem.

Data mining confronts the visualisation and understanding of large data sets.

The evolution…

60's: Data Collection (static and summarizing data analysis)
80's: Data Access (dynamic and summarizing data analysis)
90's: Data Warehousing & DSS (multiple-level dynamic and summarizing data analysis)
00's: Data Mining (proactive and summarizing data analysis)

The KDD process in detail (1/3)

1. Identify the goal of the KDD process: Develop an understanding of the application domain and the relevant prior knowledge.

2. Create a target data set: Select a data set, or focus on a subset of variables or data samples, on which discovery will be performed.

3. Clean and pre-process data: Remove noise, handle missing data fields, account for time sequence information and known changes.

The KDD process in detail (2/3)

4. Reduce and project data: Find useful features to represent the data depending on the goal of the task.

5. Identify data mining method: Match the goals of the KDD process to a particular data mining method: e.g. summarization, classification, regression, clustering, etc.

6. Choose a data mining algorithm: Select method(s) to be used for searching for patterns in the data.

7. Apply data mining

The KDD process in detail (3/3)

8. Evaluate data mining results: Interpret mined patterns, possibly return to steps 1-7 for further iteration.

9. Consolidate discovered knowledge: Incorporate this knowledge into another system for further action, or simply document it and report it to interested parties.
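The nine KDD steps can be sketched as a small pipeline, one function per stage. This is only an illustration under stated assumptions: the records in `RAW`, the field names, the "young"/"older" age grouping and the counting-based summarization are all hypothetical and stand in for whatever data and mining method a real project would choose.

```python
from collections import Counter

# Hypothetical raw records; None marks a missing field.
RAW = [
    {"id": 1, "age": 25, "income": 24000, "buys_pc": True},
    {"id": 2, "age": 27, "income": None, "buys_pc": True},
    {"id": 3, "age": 45, "income": 52000, "buys_pc": False},
    {"id": 4, "age": 23, "income": 21000, "buys_pc": True},
    {"id": 5, "age": 51, "income": 60000, "buys_pc": False},
]


def select_target(records, fields):
    # Step 2: keep only the variables relevant to the goal.
    return [{f: r[f] for f in fields} for r in records]


def clean(records):
    # Step 3: here, simply drop records with missing fields.
    return [r for r in records if all(v is not None for v in r.values())]


def project(records):
    # Step 4: replace raw values with coarser, more useful features.
    return [
        ("young" if r["age"] < 30 else "older", r["buys_pc"])
        for r in records
    ]


def mine(features):
    # Steps 5-7: a very simple summarization method that counts how
    # often each (age group, buys_pc) combination occurs.
    return Counter(features)


def evaluate(patterns, min_count):
    # Step 8: keep only patterns that occur often enough to matter.
    return {p: c for p, c in patterns.items() if c >= min_count}


def kdd(records, min_count=2):
    # Step 1 (the goal) is implicit: relate age to PC purchases.
    target = select_target(records, ["age", "income", "buys_pc"])
    return evaluate(mine(project(clean(target))), min_count)


if __name__ == "__main__":
    # Step 9: report the discovered knowledge.
    print(kdd(RAW))
```

Each stage takes the previous stage's output, so returning to an earlier step (as step 8 allows) amounts to changing one function and rerunning `kdd`.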

A graphical representation of the KDD process

[Figure: the KDD process, with the labels Unformatted data, Data Cleaning, Data Integration, Data Warehouse, Selection, Task-relevant Data, Data Mining, Pattern Evaluation, Knowledge]

Data Mining: On What Kind of Data? (1/2)

Relational databases

Data warehouses

Transactional databases

Object-oriented and object-relational databases

Spatial databases

Data Mining: On What Kind of Data? (2/2)

Time-series data and temporal data

Text databases and multimedia databases

Heterogeneous and legacy databases

Genomics databases

World Wide Web

Data Mining Functionalities (1/6)

Concept description

Data characterization: Summarize the features of the class under study (target class) in general terms.

E.g. Summarize the characteristics of customers who spent more than $1000 during 2003.

Data discrimination: Compare the features of the target class with those of one or more comparative (contrasting) classes.

E.g. Create a comparative profile of customers who shop often versus customers who shop rarely in our store.

Usual concept description outputs are:

Characteristic rules

Discriminant rules

Data Mining Functionalities (2/6)

Association (correlation and causality)

The discovery of association rules showing attribute-value conditions that frequently occur together

Widely used for market-basket analysis and transaction data analysis

Example: If the age of a customer is between 20 and 29 and their income is between $20,000 and $29,000, then they buy a PC with a certainty of 60%:

Age(“20-29”) ∧ Income(“$20K..$29K”) ⇒ Buys(“PC”)

[support = 2%, confidence = 60%]

Data Mining Functionalities (3/6)

Cluster analysis

The identification of a finite set of categories or clusters to describe the data.

Class labels are unknown: the training data carries no labels.

Clustering principle: Objects that belong to the same cluster must be similar to each other, while objects that belong to different clusters must be dissimilar to each other.

Example: Identify homogeneous subpopulations of customers.

Data Mining Functionalities (4/6)

Classification and Prediction

The process of finding a set of models (or functions) that describe and distinguish data classes or concepts.

The derived model is based on the analysis of training data, whose class label is known.

The derived model can be presented as:

Classification rules (If-Then rules)

Decision tree (a flow-chart-like tree structure)

Mathematical formulae

Neural networks

Examples

Classify countries based on climate

Classify cars based on gas mileage

Data Mining Functionalities (5/6)

Outlier analysis

Identify outliers in a set of data.

Outlier: a data object that does not comply with the general behavior of the data

It can be treated as noise or as an exception, but is quite useful in fraud detection and rare-event analysis

Example: Discovery of fraudulent usage of credit cards.

Data Mining Functionalities (6/6)

Trend and evolution analysis

Trend and deviation: regression analysis

Sequential pattern mining, periodicity analysis

Similarity-based analysis

Other pattern-directed or statistical analysis

Are all the “Discovered” Patterns Interesting?

A data mining system/query may generate thousands of patterns. Not all of them are interesting.

Interestingness

A pattern is interesting if it is easily understood by humans, valid on new or test data with some degree of certainty, potentially useful, novel, or validates some hypothesis that a user seeks to confirm.

Are all the “Discovered” Patterns Interesting?

Different types of interestingness

Objective: based on statistics and the structure of discovered patterns (support, confidence, etc.)

Subjective: based on the user’s beliefs about the data (unexpected, novel, actionable, etc.)

An interesting pattern represents knowledge

Major Issues in Data Mining (1/3)

Mining methodology and user interaction

Mining different kinds of knowledge in databases

Interactive mining of knowledge at multiple levels of abstraction

Incorporation of background knowledge

Data mining query languages and ad-hoc data mining

Expression and visualization of data mining results

Handling noise and incomplete data

Pattern evaluation: the interestingness problem

Major Issues in Data Mining (2/3)

Performance and scalability

Efficiency and scalability of data mining algorithms

Parallel, distributed and incremental mining methods

Issues relating to the diversity of data types

Handling relational and complex types of data

Mining information from heterogeneous databases and global information systems (WWW)

Major Issues in Data Mining (3/3)

Issues related to applications and social impacts

Application of discovered knowledge

Domain-specific data mining tools

Intelligent query answering

Process control and decision making

Integration of the discovered knowledge with existing knowledge: A knowledge fusion problem

Protection of data security, integrity, and privacy

Summary

Mining can be performed on a variety of information repositories

Data mining functionalities: characterization, discrimination, association, classification, clustering, outlier and trend analysis, etc.

Classification of data mining systems

Major issues in data mining

Part 2 - Data Preprocessing

Major Tasks in Data Preprocessing

1. Data cleaning

2. Data integration

3. Data transformation

4. Data reduction

5. Data discretization

1. Data Cleaning

Real-world data tend to be “dirty” and incomplete…

Data cleaning tasks

Fill in missing values

Identify outliers and smooth out noisy data

Correct inconsistent data

Missing Data

Data is not always available

Many tuples have no recorded value for several attributes, e.g. customer income in sales data

Missing data may be due to:

Equipment malfunction

Inconsistency with other recorded data, which led to deletion

Data not entered due to misunderstanding

Certain data not being considered important at the time of entry

History or changes of the data not being registered

Missing data may need to be inferred.

Noisy Data

Noise: Random error or variance in a measured variable.

Incorrect attribute values may be due to:

Faulty data collection instruments

Data entry problems

Data transmission problems

Technology limitations

Inconsistency in naming conventions

Other data problems which require data cleaning:

Duplicate records

Incomplete data

Inconsistent data

2. Data Integration

Data integration: Integration of multiple databases, data cubes, or files. Data integration combines data from multiple sources into a coherent store.

Schema integration

Integrate metadata from different sources

Entity identification problem: identify real world entities from multiple data sources, e.g., A.cust-id ≡ B.cust-id#

Detecting and resolving data value conflicts

For the same real world entity, attribute values from different sources are different

Possible reasons: different representations, different scales, e.g., metric vs. British units

3. Data Transformation

Smoothing: remove noise from data

Aggregation: summarization, data cube construction

Generalization: concept hierarchy climbing

Normalization: values are scaled to fall within a small, specified range:

min-max normalization

z-score normalization

normalization by decimal scaling

Attribute/feature construction: new attributes are constructed from the given ones

4. Data Reduction Strategies

Data reduction: Obtains a reduced representation of the data set that is much smaller in volume, yet produces the same (or almost the same) analytical results

Data reduction strategies

Data cube aggregation

Dimensionality reduction

Numerosity reduction

Discretization and concept hierarchy generation

5. Data Discretization (1/2)

Part of data reduction, but of particular importance

Discretization methods are applied to both numeric and categorical data

Discretization methods for categorical data

Specification of a partial ordering of attributes explicitly at the schema level by users or experts

Specification of a portion of a hierarchy by explicit data grouping

Specification of a set of attributes, but not of their partial ordering

Mainstream DM techniques

Clustering

Classification

Association Rules

Part 3 - Clustering (Unsupervised Learning)

Cluster Analysis

The identification of a finite set of categories or clusters to describe the data.

Clustering: The Basics…

What is a Cluster?

A cluster is a collection of data objects that are:

Similar to one another within the same cluster

Dissimilar to the objects in other clusters

Clustering as learning:

Clustering is unsupervised learning, which means that there are no predefined classes and no examples that would show what kind of desirable relations should be valid among the data.

Clustering criteria

High intra-cluster similarity

Low inter-cluster similarity

[Figure: three clusters (Cluster 1, Cluster 2, Cluster 3), annotated with intra-class similarity within a cluster and inter-class similarity between clusters]

Requirements of Clustering in Data Mining

Scalability

Ability to deal with different types of attributes

Discovery of clusters with arbitrary shape

Minimal requirements for domain knowledge to determine input parameters

Ability to deal with noise and outliers

Insensitivity to the order of input records

Ability to handle high dimensionality

Interpretability and usability

Steps to develop a clustering process

1. Feature selection: Properly select the features on which clustering is to be performed

2. Clustering algorithm: Choose a proximity measure and a clustering criterion

3. Validation of the results: The correctness of the clustering algorithm's results is verified using appropriate criteria and techniques

4. Interpretation of the results

A Categorization of Clustering Methods

Technique used in order to define clusters

Partitional Methods

Hierarchical Methods

Density-Based Methods

Grid-Based Methods

The type of variables

Statistical - Numerical Data

Conceptual - Categorical Data

Theory used in order to extract clusters

Fuzzy Clustering

Crisp Clustering

Kohonen net clustering

Partitioning Algorithms: Basic Concept

Partitional method: Decompose the data set into a set of k disjoint clusters.

Problem Definition

Given an integer k, find a partition of k clusters that optimizes the chosen partitioning criterion

Best-known representative of the partitioning algorithms: The K-Means Method

The K-Means Clustering Method

$$E = \sum_{k=1}^{K} \sum_{x \in c_k} (x - m_k)^2$$

where m_k is the mean of cluster c_k.
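A minimal sketch of the K-Means iteration that minimizes the criterion E above, assuming Euclidean distance and randomly chosen initial means (the slide does not say how the means are initialized). The names `kmeans` and `sq_dist` are illustrative.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, labels, E) for a list of numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)      # assumption: random initial means
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break                          # no point changed cluster
        labels = new_labels
        # Update step: each mean m_k becomes the average of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    # E: sum of squared distances of every point to its cluster mean.
    error = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return centroids, labels, error

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, error = kmeans(points, k=2)
```

Each pass costs O(kn) distance computations, which is where the O(tkn) figure for t iterations comes from.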

Comments on the K-Means Method

Advantages

Relatively efficient: O(tkn), where n is the number of objects, k is the number of clusters, and t is the number of iterations. Normally, k, t << n.

Problems

Often terminates at a local optimum

Applicable only to numerical data sets

Need to specify the number of clusters in advance

Unable to handle noisy data and outliers

Not suitable for discovering clusters with non-convex shapes

The K-Medoids Clustering Method

The K-Medoids method is one way of overcoming the problems of the K-Means method

Medoid: a representative object of a cluster.

Best-known K-Medoids algorithms:

PAM (Kaufman & Rousseeuw, 1987)

CLARA (Kaufman & Rousseeuw, 1990)

CLARANS (Ng & Han, 1994): Randomized sampling

Hierarchical Clustering Algorithms

BIRCH (1996): uses a CF-tree and incrementally adjusts the quality of sub-clusters

CURE (1998): is robust to outliers and identifies clusters of non-spherical shapes.

ROCK (1999): is a robust clustering algorithm for Boolean and categorical data. It introduces two new concepts: a point's neighbours and links.

BIRCH - CF

What is the CF? A triplet summarizing information about subclusters of objects. The CF of a subcluster is defined as:

CF = (N, LS, SS)

N: Number of data points

LS: $\sum_{i=1}^{N} X_i$

SS: $\sum_{i=1}^{N} X_i^2$

Scales linearly: Finds a good clustering with a single scan and improves the quality with a few additional scans.

Problem: Handles only numeric data, and is sensitive to the order of the data records.
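The CF triplet is useful because it is additive: the CF of two merged subclusters is the component-wise sum of their CFs, so BIRCH never needs to revisit the raw points. A small sketch, assuming points are numeric tuples and SS is stored as a single scalar (the sum of squared norms); the helper names are illustrative.

```python
def cf_of(points):
    """Clustering feature (N, LS, SS) of a list of equal-length numeric tuples."""
    n = len(points)
    ls = tuple(sum(coord) for coord in zip(*points))   # linear sum, per dimension
    ss = sum(x * x for p in points for x in p)         # scalar sum of squares
    return n, ls, ss

def cf_merge(a, b):
    """Merging two disjoint subclusters adds their CFs component-wise."""
    return a[0] + b[0], tuple(x + y for x, y in zip(a[1], b[1])), a[2] + b[2]

def centroid(cf):
    """Cluster centroid recovered from the CF alone: LS / N."""
    n, ls, _ = cf
    return tuple(x / n for x in ls)

def radius(cf):
    """Root-mean-square distance of the members from the centroid."""
    n, ls, ss = cf
    return max(ss / n - sum((x / n) ** 2 for x in ls), 0.0) ** 0.5

a = [(1, 2), (3, 4)]
b = [(5, 6)]
merged = cf_merge(cf_of(a), cf_of(b))
```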

Density-Based Clustering Algorithms

Clustering based on density (local cluster criterion), such as density-connected points.

Major features:

Discover clusters of arbitrary shape

Handle noise

Need density parameters as termination condition

Representative algorithms:

DBSCAN: Ester, et al. (KDD’96)

DENCLUE: Hinneburg & D. Keim (KDD’98)

DBSCAN

Relies on a density-based notion of cluster: A cluster is defined as a maximal set of density-connected points

Discovers clusters of arbitrary shape in spatial databases with noise

[Figure: core, border and outlier points, for Eps = 1cm and MinPts = 5]

DBSCAN: The Algorithm

1. Arbitrarily select a point p

2. Retrieve all points density-reachable from p w.r.t. Eps and MinPts.

3. If p is a core point, a cluster is formed.

4. If p is a border point, no points are density-reachable from p and DBSCAN visits the next point of the database.

5. Continue the process until all points are processed.
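The five steps can be sketched as follows. This is a brute-force illustration that assumes Euclidean distance and counts a point as its own neighbour; a real implementation would use a spatial index for the neighbourhood query. The names `region_query` and `dbscan` are illustrative.

```python
import math

NOISE = -1

def region_query(points, i, eps):
    """Indices of all points within eps of points[i] (including i itself)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)          # None means "not yet visited"
    cluster = 0
    for i in range(len(points)):           # steps 1 and 5: visit every point
        if labels[i] is not None:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:      # step 4: not a core point
            labels[i] = NOISE              # may later be claimed as a border point
            continue
        labels[i] = cluster                # step 3: a core point starts a cluster
        queue = [j for j in neighbours if j != i]
        while queue:                       # step 2: collect density-reachable points
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours) # j is itself a core point: keep expanding
        cluster += 1
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The isolated point at (50, 50) has no dense neighbourhood and ends up labelled as noise.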

Part 4 - Classification (Supervised Learning)

Classification

Classification can be described as a function that maps (classifies) a data item into one of several predefined classes.

Requirements

A well-defined set of classes

A training set of pre-classified examples that characterize the classification.

Goal

Induce a model that can be used to classify future data items whose classification is unknown.

Classification Methods

Bayesian classification

Decision Trees

Neural Networks

Bayesian classification

Aim: To classify a sample x to one of the given classes c1, c2, ..., cN using a probability model defined according to Bayes theory

Requirements

A priori probability p(ci) for each class ci.

Conditional probability density function p(x/ci) ∈ [0,1]

⇓ Bayes Formula

Posterior probability:

$$q(c_i / x) = \frac{p(x / c_i)\, p(c_i)}{\sum_{j=1}^{N} p(x / c_j)\, p(c_j)}$$

A pattern is classified in the class with the highest posterior probability.

Problem: Complete knowledge of probability laws is necessary in order to perform the classification
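As a worked illustration of the Bayes formula, the sketch below computes the posterior of every class from its prior and from the class-conditional value at the observed sample, then picks the class with the highest posterior. The class names and all probabilities are made-up assumptions.

```python
def posteriors(priors, likelihoods):
    """priors[c] = p(c); likelihoods[c] = p(x/c) evaluated at the observed x."""
    evidence = sum(likelihoods[c] * priors[c] for c in priors)   # the denominator
    return {c: likelihoods[c] * priors[c] / evidence for c in priors}

def classify(priors, likelihoods):
    """Return the class with the highest posterior probability."""
    post = posteriors(priors, likelihoods)
    return max(post, key=post.get)

# Hypothetical two-class problem: does a customer buy a PC?
priors = {"buyer": 0.3, "non_buyer": 0.7}
likelihoods = {"buyer": 0.8, "non_buyer": 0.1}    # p(x/c) for one observed x

post = posteriors(priors, likelihoods)
decision = classify(priors, likelihoods)
```

Although non-buyers are more common a priori, the observed sample is so much more likely under the "buyer" class that the posterior favours it.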

Decision Trees

One of the most widely used techniques for classification and prediction.

Requirements

Classes (categories)

A training set of pre-classified data.

Characteristics

Internal node: A test of an attribute

Branch descending from a node: One of the possible values for this attribute

Leaf: One of the defined classes

Decision Trees

A decision tree example:

[Figure: decision tree rooted at outlook (branches: sunny, overcast, rain), with humidity (high, normal) and wind (true, false) as internal nodes and leaves labelled P or N]

The mechanism

Building phase: The training data set is recursively partitioned until all the instances in a partition have the same class

Pruning phase: Nodes are pruned to prevent overfitting and to obtain a tree with higher accuracy

Decision Tree Algorithms

Decision tree algorithms differ in the test criterion used for partitioning a set of records

ID3, C4.5: Information gain algorithms

CLS: Examines the solution space of all possible decision trees to some fixed depth. It selects a test that minimizes the computational cost of classifying a record.

SLIQ, SPRINT: select the attribute to test based on the GINI index

ID3: The Algorithm

The mechanism

Step 1: If all instances in C are positive, then create a YES node and halt. If all instances in C are negative, create a NO node and halt. Otherwise select an attribute A with values v1, ..., vn and create a decision node.

Step 2: Partition the training instances in C into subsets C1, C2, ..., Cn according to the values of A.

Step 3: Apply the algorithm recursively to each of the sets Ci.

ID3: Definitions (1/2)

How does ID3 decide which attribute is the best?

A statistical property, called information gain, is used.

The information needed to identify the class of an element of S, called the Entropy of S, is:

$$Info(S) = -\sum_{I} p(I) \log_2 p(I)$$

where p(I) is the proportion of S belonging to class I.

ID3: Definitions (2/2)

The information needed to identify the class of an element of S after we partition S on the basis of the value of an attribute A into sets Sv, Info(S, A), is:

$$Info(S, A) = \sum_{v} \left[ \frac{|S_v|}{|S|} \times Entropy(S_v) \right]$$

Gain(S, A) = Info(S) - Info(S, A)

Gain(S, A) is the information gain of example set S on attribute A.

ID3: Example

Outlook    Temperature  Humidity  Wind    Play_ball
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         High      Weak    No
Sunny      Cool         Normal    Weak    Yes
Rain       Mild         Normal    Weak    Yes
Sunny      Mild         Normal    Strong  Yes
Overcast   Mild         High      Strong  Yes
Overcast   Hot          Normal    Weak    Yes
Rain       Mild         High      Strong  No

ID3: Attribute Selection

Which attribute will be the root node?

Gain(S, Outlook) = 0.246

Gain(S, Temperature) = 0.029

Gain(S, Humidity) = 0.151

Gain(S, Wind) = 0.048

[Figure: Outlook is chosen as the root node, with branches Sunny, Overcast and Rainy; the subtree under Sunny is still undecided (?)]

Gain(Sunny, Humidity) = 0.970

Gain(Sunny, Temperature) = 0.570

Gain(Sunny, Wind) = 0.019

[Figure: under the Sunny branch, Humidity is tested: High leads to No, Normal leads to Yes]
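The gain values can be recomputed directly from the 14-row Play_ball table. The sketch below is one possible way to do it in Python; the function names are illustrative. The computed gains agree with the listed figures to within 0.001 (the listed values appear to be truncated rather than rounded).

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Wind"]
ROWS = [  # (Outlook, Temperature, Humidity, Wind, Play_ball)
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]

def info(rows):
    """Entropy, in bits, of the class label (last column)."""
    total = len(rows)
    counts = Counter(r[-1] for r in rows)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def gain(rows, attr_index):
    """Info(S) minus the weighted entropy of the partitions induced by one attribute."""
    partitions = {}
    for r in rows:
        partitions.setdefault(r[attr_index], []).append(r)
    remainder = sum(len(p) / len(rows) * info(p) for p in partitions.values())
    return info(rows) - remainder

gains = {a: gain(ROWS, i) for i, a in enumerate(ATTRIBUTES)}
root = max(gains, key=gains.get)

# Repeat the selection inside the Sunny branch, where Outlook is already fixed.
sunny = [r for r in ROWS if r[0] == "Sunny"]
sunny_gains = {a: gain(sunny, i) for i, a in enumerate(ATTRIBUTES) if a != "Outlook"}
sunny_best = max(sunny_gains, key=sunny_gains.get)
```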

ID3 Example: Final Decision Tree

[Figure: final decision tree. Outlook = Sunny: test Humidity (High: No, Normal: Yes). Outlook = Overcast: Yes. Outlook = Rain: test Wind (Strong: No, Weak: Yes)]
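Once built, the final tree is just a nested lookup structure. One possible encoding, assuming each internal node is an (attribute, branches) pair and each leaf is a class label string:

```python
# Internal node: (attribute name, {attribute value: subtree}); leaf: class label.
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

def classify(tree, record):
    """Follow the branch matching the record's value at each test until a leaf."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[record[attribute]]
    return tree

day = {"Outlook": "Sunny", "Temperature": "Cool", "Humidity": "High", "Wind": "Weak"}
prediction = classify(TREE, day)
```

Note that Temperature never appears in the tree: its information gain was the lowest at every split, so it is ignored during classification.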

C4.5

C4.5 is an extension of ID3 that accounts for:

Unavailable values

Continuous attribute value ranges

Pruning of decision trees

Rule derivation

Part 5 - Association Rules

S1 ⇒ S2

Association mining: The idea…

The problem:

Given: 1) a database of transactions, 2) each transaction is a list of items (purchased by a customer in a visit)

Find: All rules that correlate the presence of one set of items with that of another set of items

Association mining: The solution…

Association mining

Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

The basic concepts on rule mining…

Support: There is enough support for the rule S1=>S2 if the number of records whose attributes include both S1 and S2 (i.e. S1 ∪ S2) is at least minsupp.

Confidence: We have enough confidence in an association rule if the ratio of records having attributes that include both S1 and S2 over records having attributes that include S1 is at least minconf.

Association Rule: a definition

An association rule is an expression of the form S1=>S2, where S1, S2 are sets of attributes with sufficient support and confidence.

When to pick an association rule?

An association rule A=>B is thought to be interesting when:

$$\frac{P(A \cap B)}{P(A)} - P(B) > d$$

where d is an appropriate threshold value

Counting Confidence and Support…

TransactionID   Items Bought
2000            A, B, C
1000            A, C
4000            A, D
5000            B, E, F

[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]

What is the support and confidence for the rules: 1) A ⇒ C 2) C ⇒ A

1) A and C appear together in 2 of the 4 transactions, so support is 50%. In addition, A appears in 3 transactions, two of which contain C too, so confidence is 2/3 = 66.7%.

2) A and C appear together in 2 of the 4 transactions, so support is 50%. In addition, A appears in all the transactions in which C appears, so confidence is 2/2 = 100%.
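The same counts can be reproduced with a few lines of Python over the four transactions. The helper names `support` and `confidence` are illustrative; support is expressed as a fraction of all transactions.

```python
TRANSACTIONS = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}

def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(itemset <= items for items in transactions.values())
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

sup_ac = support({"A", "C"}, TRANSACTIONS)            # shared by both rules
conf_a_c = confidence({"A"}, {"C"}, TRANSACTIONS)     # rule A => C
conf_c_a = confidence({"C"}, {"A"}, TRANSACTIONS)     # rule C => A
```

Support is symmetric in A and C, while confidence depends on the direction of the rule, which is why the two rules share one support value but differ in confidence.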

Mining Frequent Itemsets: the Key Step

Frequent itemsets: Sets of items that have minimum support

A subset of a frequent itemset must also be a frequent itemset, i.e., if {A, B} is a frequent itemset, both {A} and {B} should be frequent itemsets

Iteratively find frequent itemsets with cardinality from 1 to k (k-itemset)

Use the frequent itemsets to generate association rules.

Association rule algorithms

Apriori

DHP

Trie Data Structure

Iceberg Queries

The Apriori Algorithm

Procedure Synopsis:

Constructs candidate sets of items

Counts the number of appearances of each set

Determines the large itemsets based on a predetermined minsupp

The major Steps…

1) Join Step: Ck is generated by joining Lk-1 with itself

2) Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset

Candidate generation - Pseudo Code

In order to estimate Ck, Apriori uses Lk-1 × Lk-1. Ck then consists of the k-itemsets obtained by joining pairs of frequent (k-1)-itemsets; for example, C2 consists of $\binom{|L_1|}{2}$ 2-itemsets.

Ck: Candidate itemset of size k

Lk: frequent itemset of size k

Pseudo-code:

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t
        Lk+1 = candidates in Ck+1 with min_support
    end
    return ∪k Lk;

Candidate Generation Example…

1. L3 = {abc, abd, acd, ace, bcd}

2. Self-joining: L3 × L3

abcd from abc and abd

acde from acd and ace

3. Pruning:

acde is removed because ade is not in L3

4. C4 = {abcd}
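The join and prune steps of this example can be sketched as a small Python function. It assumes itemsets are stored as sorted tuples, so that two (k-1)-itemsets are joinable exactly when they agree on everything but their last item; the name `apriori_gen` is illustrative.

```python
from itertools import combinations

def apriori_gen(frequent, k):
    """Candidate k-itemsets from the frequent (k-1)-itemsets (sorted tuples)."""
    prev = sorted(frequent)
    prev_set = set(prev)
    joined = set()
    for a, b in combinations(prev, 2):
        # Join step: merge two itemsets that share their first k-2 items.
        if a[:-1] == b[:-1]:
            joined.add(a + (b[-1],))
    # Prune step: drop any candidate that has an infrequent (k-1)-subset.
    return sorted(c for c in joined
                  if all(s in prev_set for s in combinations(c, k - 1)))

L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"),
      ("a", "c", "e"), ("b", "c", "d")]
C4 = apriori_gen(L3, 4)
```

Here the join produces abcd and acde, and the prune step removes acde because its subset ade is not in L3, leaving only abcd.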

Association Rule Extraction - An Example…

Database D:
TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5

Scan D -> C1:
itemset   sup.
{1}       2
{2}       3
{3}       3
{4}       1
{5}       3

L1:
itemset   sup.
{1}       2
{2}       3
{3}       3
{5}       3

C2 (generated from L1):
{1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}

Scan D -> C2 with supports:
itemset   sup
{1 2}     1
{1 3}     2
{1 5}     1
{2 3}     2
{2 5}     3
{3 5}     2

L2:
itemset   sup
{1 3}     2
{2 3}     2
{2 5}     3
{3 5}     2

C3:
{2 3 5}

Scan D -> L3:
{2 3 5} (sup 2)
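
The level-wise trace of this example can be reproduced with a short Python sketch. It is an illustration only: the function name `apriori` is hypothetical, a minimum support count of 2 is assumed (it is implied by the itemsets dropped between C1/L1 and C2/L2), and the join is written as a union of two frequent k-itemsets rather than the prefix-based join, which yields the same candidates after pruning.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # Scan D once to count single items (C1) and keep the frequent ones (L1).
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: n for s, n in counts.items() if n >= min_support}

    frequent = dict(level)
    k = 1
    while level:
        # Join step: unions of two frequent k-itemsets that have k+1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k))}
        # Scan D to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
    return frequent


D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(D, min_support=2)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

With these four transactions the sketch finds four frequent 1-itemsets, four frequent 2-itemsets and the single frequent 3-itemset {2 3 5}.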


Methods to Improve Apriori’s Efficiency

- Hash-based itemset counting: A k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent
- Transaction reduction: A transaction that does not contain any frequent k-itemset is useless in subsequent scans
- Partitioning: Any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
- Sampling: Mine on a subset of given data, lower support threshold + a method to determine the completeness
- Dynamic itemset counting: Add new candidate itemsets only when all of their subsets are estimated to be frequent
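
The partitioning idea can be illustrated with a small Python sketch. It is a toy under stated assumptions: the helper names `frequent_itemsets` and `partition_mine` are hypothetical, support is expressed as a fraction of the transactions, and the local mining step is a brute-force enumeration that is only practical for tiny inputs.

```python
from itertools import combinations
import math


def frequent_itemsets(transactions, min_fraction):
    """Brute force: itemsets contained in at least min_fraction of the transactions."""
    needed = min_fraction * len(transactions)
    counts = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                counts[combo] = counts.get(combo, 0) + 1
    return {s for s, n in counts.items() if n >= needed}


def partition_mine(transactions, min_fraction, n_parts):
    """Two scans: locally frequent itemsets first, then a global count."""
    size = math.ceil(len(transactions) / n_parts)
    parts = [transactions[i:i + size] for i in range(0, len(transactions), size)]

    # Scan 1: anything globally frequent is frequent in at least one partition.
    candidates = set()
    for part in parts:
        candidates |= frequent_itemsets(part, min_fraction)

    # Scan 2: count the candidates over the whole database.
    needed = min_fraction * len(transactions)
    result = {}
    for c in candidates:
        count = sum(1 for t in transactions if set(c) <= set(t))
        if count >= needed:
            result[c] = count
    return result


DB = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
found = partition_mine(DB, min_fraction=0.5, n_parts=2)
```

The union of the locally frequent itemsets is a superset of the globally frequent ones, so the second scan only has to discard false positives.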


Integration Algorithms (1/2)

DHP
Creates a hash table that checks the eligibility of k-itemsets and, as a result, dramatically reduces the size of C2 and of the itemset database.

FTDA
Used to extract association rules on quantitative values. Fuzzy methods are applied and fuzzy rules are extracted.


Integration Algorithms (2/2)

Partition
Divides the itemset database into partitions and needs only two scans to execute the algorithm.

DIC
Optimization through dynamic itemset counting.

TRIE Data Structure
Used to create covers: itemsets whose support is greater than or equal to a specified minimum support threshold. Extracts exclusive association rules.


Exclusive Association Rules

Definition: We say that S1 ∪ S2 => S3 is an excluding association, if S1, S2 and S3 are mutually disjoint and the following conditions hold:

1. $\frac{\mathrm{supp}(S_1 \cup S_3)}{\mathrm{supp}(S_1)} < \mathrm{minconf}$

2. $\mathrm{supp}(S_1 \cup S_3) - \mathrm{supp}(S_1 \cup S_2 \cup S_3) \geq \mathrm{minsupp}$

3. $\frac{\mathrm{supp}(S_1 \cup S_3) - \mathrm{supp}(S_1 \cup S_2 \cup S_3)}{\mathrm{supp}(S_1) - \mathrm{supp}(S_1 \cup S_2)} \geq \mathrm{minconf}$
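
The three conditions can be checked mechanically. The following Python sketch is an illustration under assumptions: the function names, the toy transactions, and the use of absolute counts for supp and minsupp are all choices made here, not part of the definition.

```python
def supp(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def is_excluding_association(s1, s2, s3, transactions, minsupp, minconf):
    """Check conditions 1-3 of the definition for mutually disjoint S1, S2, S3."""
    s1, s2, s3 = set(s1), set(s2), set(s3)
    if s1 & s2 or s1 & s3 or s2 & s3:
        return False  # the three sets must be mutually disjoint

    supp_s1 = supp(s1, transactions)
    supp_s1_s3 = supp(s1 | s3, transactions)
    supp_s1_s2 = supp(s1 | s2, transactions)
    supp_all = supp(s1 | s2 | s3, transactions)
    if supp_s1 == 0:
        return False

    cond1 = supp_s1_s3 / supp_s1 < minconf
    cond2 = supp_s1_s3 - supp_all >= minsupp
    remaining = supp_s1 - supp_s1_s2  # transactions with S1 but without S2
    cond3 = remaining > 0 and (supp_s1_s3 - supp_all) / remaining >= minconf
    return cond1 and cond2 and cond3


# Toy data: "a" rarely implies "b" overall, but always does once "x" is excluded.
toy = [{"a", "b"}] * 3 + [{"a", "x"}] * 4
holds = is_excluding_association({"a"}, {"x"}, {"b"}, toy, minsupp=2, minconf=0.8)
```

In the toy data the plain rule a => b has confidence 3/7, while among the three transactions that contain a but not x the confidence is 3/3, so all three conditions hold.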


Part 7 - Data Mining and Semantics


The different perspectives on Semantics

The DM community refers to semantics as a way of representing knowledge by the use of “some” unified language

The AI community deals with semantics as a way of creating meta-knowledge through the refinement of decision structures


DM Perspective - The Reasons…

Each DM tool uses a different way of describing knowledge (same metrics, same concepts)

Not all tools provide all DM techniques

In order to incorporate results in your system, reusability is now mandatory


The Solution…

The development of a modeling language that would allow users to:
- Develop models within one vendor's application
- Use other vendors' applications to test, visualize, analyze, evaluate or otherwise use the models


The Language…

PMML
Predictive Model Markup Language

http://www.dmg.org

- Developed by the DMG (Oracle, SAS, SPSS, MineIt, etc.)
- XML-based language with its own DTD, so mistakes are unlikely to occur
- It has a stable version (2.0)


A PMML document example

The Play_Golf model: A model that decides whether the golf game should be played, depending on humidity, wind, temperature and weather outlook.

After performing data mining on the Play_Golf dataset, the resulting decision tree is:

And the resulting PMML document is :


AI perspective - The Reasons…

In order to incorporate meta-knowledge into data mining, new interestingness measures are applied, in order to indicate the validity of the knowledge extracted, according to domain understanding.

Fuzzy weights

Penalty matrices


Part 8 - New data mining algorithms: the AI way…


The new agent-oriented algorithm…

The need to develop an algorithm that deals with agent actions has led us to the development of κ-Profile.

κ-Profile is a data mining mechanism that was first introduced for the dynamic segregation of web roaming attitudes.

It is now being adapted in order to be able to predict agent behaviors and actions.


The web roaming mechanism…

Offline
- Data extraction
- Data preprocessing
- Transaction identification
- Pageview identification
- Clustering
- Profiling

Online
- Active session
- Recommendation


Techniques and Algorithms used

- Weighted vectors
- FIS (Fuzzy Inference System)

Clustering
- Simple k-means
- Maximin

Profiling
- The k-Profiler

Recommendation

Evaluation


Data Preprocessing

Transaction vectors
- Outliers
- Robots, spiders, etc.

Pageview vectors
- Homepage
- Banners, etc.
- Very frequent pages
- Very rare pages

Fuzziness


Clustering and Profiling

Maximin
- Decides on the number of clusters
- Finds primary cluster centers

Simple k-means
- Creates transaction clusters

K-Profiler
- Creates the transaction profiles


κ-Profiler - From theory...

T = {t1, t2, …, tm} (Transaction Set)

P = {p1, p2, …, pn} (Pageview Set)

t = <w(p1,t), …, w(pn,t)> (Transaction Vector)

TC = {c1, c2, ..., ck} (Transaction Groups)

prc = {<p, weight(p, prc)> | p ∈ P, weight(p, prc) ≥ μ} (User Profiles)

$weight(p, pr_c) = \frac{1}{|c|} \cdot \sum_{t \in c} w(p, t)$


... to practice

Group 1:
  t1 = {p1, p2}
  t2 = {p3, p7}
  t3 = {p1, p7}

Pageview weights:
  p1: (1+1)/3 = 66.67%
  p2: 1/3 = 33.33%
  p3: 1/3 = 33.33%
  p7: (1+1)/3 = 66.67%

k-Profiler filter (> 50%) gives Profile 1:
  (p1, 0.6667)
  (p7, 0.6667)
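
The profile computation for a transaction group can be sketched in a few lines of Python. This is a minimal sketch under assumptions: the function name `k_profile` is hypothetical, each transaction is stored as a dictionary mapping a pageview to its weight w(p, t), and binary weights (1 for a visited page) are used to mirror the Group 1 example.

```python
def k_profile(group, mu):
    """Average each pageview's weight over the group and keep weights >= mu."""
    pageviews = set().union(*group)  # every pageview seen in the group
    size = len(group)
    profile = {}
    for p in pageviews:
        weight = sum(t.get(p, 0.0) for t in group) / size
        if weight >= mu:
            profile[p] = weight
    return profile


group_1 = [
    {"p1": 1, "p2": 1},  # t1
    {"p3": 1, "p7": 1},  # t2
    {"p1": 1, "p7": 1},  # t3
]
profile_1 = k_profile(group_1, mu=0.5)
print(sorted((p, round(w, 4)) for p, w in profile_1.items()))
# prints [('p1', 0.6667), ('p7', 0.6667)]
```

Pageviews p2 and p3 appear in only one of the three transactions, so their weight of 1/3 falls below the 50% filter.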


The Recommendation engine

User profile $C = \langle w_1^C, w_2^C, \ldots, w_n^C \rangle$

where $w_i^C = \begin{cases} weight(p_i, C), & p_i \in C \\ 0, & \text{otherwise} \end{cases}$

$match(S, C) = \frac{\sum_k w_k^C \cdot S_k}{\sqrt{\sum_k (S_k)^2 \times \sum_k (w_k^C)^2}}$

$Rec(S, p) = \sqrt{weight(p, C) \cdot match(S, C)}$

$UREC(S) = \{\, w_i^C \mid C \in UP, \text{ and } Rec(S, w_i^C) \geq \rho \,\}$
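
A rough Python sketch of the recommendation step follows. It rests on several assumptions that are not stated on the slide: match is taken to be the cosine similarity between the active session S and a profile C, Rec is taken to be the square root of weight times match, pages already present in the session are skipped, and all function and variable names are hypothetical.

```python
import math


def match(session, profile):
    """Cosine similarity between a session vector and a profile vector."""
    dot = sum(w * session.get(p, 0.0) for p, w in profile.items())
    norm = math.sqrt(sum(v * v for v in session.values())
                     * sum(w * w for w in profile.values()))
    return dot / norm if norm else 0.0


def recommend(session, profiles, rho):
    """Return {pageview: score} for pageviews whose score reaches rho."""
    scores = {}
    for profile in profiles:
        m = match(session, profile)
        for p, weight in profile.items():
            if p in session:
                continue  # assumption: do not recommend pages already visited
            score = math.sqrt(weight * m)
            if score >= rho:
                scores[p] = max(scores.get(p, 0.0), score)
    return scores


session = {"p1": 1.0}                     # active session: the user has seen p1
profiles = [{"p1": 2 / 3, "p7": 2 / 3}]   # one profile with two weighted pageviews
recs = recommend(session, profiles, rho=0.5)
```

Here the session matches the profile with a similarity of about 0.71, so p7 is recommended with a score of roughly 0.69.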


Work on agents

- Modeling agent actions according to the ‘κ-Profile’ paradigm
- Implementation of the agent behavior that shall incorporate the recommendation engine into the agents
- Testing and dissemination…


Part 9 - Data Mining Applications and Trends


Basic Data Mining application domains

- Web mining
- Biomedical and DNA data analysis
- Financial data analysis
- Retail industry
- Telecommunication industry

Market analysis and management

Risk analysis and management

Fraud detection and management


Web mining - Data on the Web

Primary data (Web content):
- Mainly text, with some multimedia content and mark-up commands.
- Underlying databases (not directly accessible).

Secondary data (Web usage):
- Access logs collected by a Web server.
- A variety of navigational information collected by Web clients.


Approaches to Web mining

Web content mining:
- Pattern discovery in Web content data.
- Mainly mining unstructured textual data.

Web structure mining:
- Pattern discovery in the Web graph.
- The graph is defined by the hyperlinks.

Web usage mining:
- Discovery of interesting usage patterns.
- Mainly in server logs.


Web content mining

Information Access
- Document category modeling
- Construction of thematic hierarchies

Fact Extraction
- Extraction of product information, presented in different formats

Information Extraction
- Structured “event” summaries from large textual corpora


Web structure mining (1/2)

Information retrieval can be improved by:
- Identifying authoritative pages.
- Identifying resource index pages.
- Summarizing common references.

Linked pages often contain complementary information (e.g. product offers).

Structural analysis of a Web site facilitates its improvement.


Web structure mining (2/2)

Social network analysis:
- Nodes with large fan-in (authorities) provide high quality information.
- Nodes with large fan-out (hubs) are good starting points.

Disconnected subgraphs correspond to different social (e.g. research) communities.
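
Fan-in and fan-out are plain degree counts on the hyperlink graph, which a short Python sketch can make concrete. The page names and links below are invented for illustration, and this is simple degree counting rather than an iterative hub/authority ranking algorithm.

```python
from collections import Counter


def fan_in_out(links):
    """Count incoming and outgoing hyperlinks per page from (source, target) pairs."""
    fan_out = Counter(src for src, _ in links)
    fan_in = Counter(dst for _, dst in links)
    return fan_in, fan_out


# Hypothetical Web graph: "hub" links to a, b and c; a and b also link to c.
links = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "c"), ("b", "c")]
fan_in, fan_out = fan_in_out(links)

authority = fan_in.most_common(1)[0][0]   # page with the largest fan-in
best_hub = fan_out.most_common(1)[0][0]   # page with the largest fan-out
print(authority, best_hub)                # prints: c hub
```

Page c is pointed to by three pages and is the authority candidate, while the page named hub points to three pages and is a natural starting point.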


Web Usage Mining (1/2)

Personalization:
- Better service for the user:
  - Reduction of the information overload.
  - More accurate information retrieval and extraction.

Customer relationship management:
- Customer segmentation and targeted advertisement.
- Customer attraction and retention strategy.
- Service improvement (site structure and content).


Web Usage Mining (2/2)

- Data collection: Collection of usage data by the server and the client.
- Data pre-processing: Data cleaning, user identification, session identification.
- Pattern discovery: Construction of user models.
- Knowledge post-processing: Report generation, visualization, personalization module.


Biomedical Data Mining and DNA Analysis (1/2)

DNA sequences: 4 basic building blocks (nucleotides):
- adenine (A)
- cytosine (C)
- guanine (G)
- thymine (T)

Gene: a sequence of hundreds of individual nucleotides arranged in a particular order


Biomedical Data Mining and DNA Analysis (2/2)

Tremendous number of ways that the nucleotides can be ordered and sequenced to form distinct genes

Semantic integration of heterogeneous, distributed genome databases
- Current: highly distributed, uncontrolled generation and use of a wide variety of DNA data
- Data cleaning and data integration methods developed in data mining will help


Data Mining for Financial Data Analysis (1/2)

Financial data collected in banks and financial institutions are often relatively complete, reliable, and of high quality

Design and construction of data warehouses for multidimensional data analysis and data mining
- View the debt and revenue changes by month, by region, by sector, and by other factors
- Access statistical information such as max, min, total, average, trend, etc.

Loan payment prediction/consumer credit policy analysis
- Feature selection and attribute relevance ranking
- Loan payment performance
- Consumer credit rating


Data Mining for Financial Data Analysis (2/2)

Classification and clustering of customers for targeted marketing
- Multi-dimensional segmentation by nearest-neighbor, classification, decision trees, etc. to identify customer groups or associate a new customer to an appropriate customer group

Detection of money laundering and other financial crimes
- Integration of data from multiple DBs (e.g., bank transactions, federal/state crime history DBs)
- Tools: Data visualization, linkage analysis, classification, clustering tools, outlier analysis, and sequential pattern analysis tools (find unusual access sequences)


Data Mining for Retail Industry

Retail industry: Huge amounts of data on sales, customer shopping history, etc.

Applications of retail data mining:
- Identify customer buying behaviors
- Discover customer shopping patterns and trends
- Improve the quality of customer service
- Achieve better customer retention and satisfaction
- Enhance goods consumption ratios
- Design more effective goods transportation and distribution policies


Data Mining for Telecommunications

A rapidly expanding and highly competitive industry with a great demand for data mining
- Understand the business involved
- Identify telecommunication patterns
- Catch fraudulent activities
- Make better use of resources
- Improve the quality of service

Multidimensional analysis of telecommunication data
- Intrinsically multidimensional: calling-time, duration, location of caller, location of callee, type of call, etc.


Data Mining Tools - Criteria (1/3)

Data types:
- Relational
- Transactional
- Text
- Time sequence
- Spatial, etc.

Data sources:
- ASCII text files, multiple relational data sources
- Support for ODBC connections (OLE DB, JDBC)?

Need for multiple dimensional views in selection


Data Mining Tools - Criteria (2/3)

Data mining functions and methodologies:
- One vs. multiple data mining functions
- One vs. variety of methods per function

Coupling with DB and/or data warehouse systems:
- Four forms of coupling: no coupling, loose coupling, semi-tight coupling, and tight coupling.

Scalability
- Row (or database size) scalability
- Column (or dimension) scalability
- Curse of dimensionality: it is much more challenging to make a system column scalable than row scalable


Data Mining Tools - Criteria (3/3)

Visualization tools
- “A picture is worth a thousand words”
- Visualization categories: data visualization, mining result visualization, mining process visualization, and visual data mining

System issues
- Running on only one or on several operating systems?
- Client/server architecture?
- Provide Web-based interfaces and allow XML data as input and/or output?


The Generations of DM systems

First generation: Systems that performed classification or clustering and were based on a certain algorithm

Second generation: Systems that provide better support throughout the whole KDD process

Third generation: Systems that deal with the business end user rather than advanced data analysis


Part 10 - Intelligent Agent Technology


Positioning agents in the software development process

Agent technology is the next step in object-oriented programming.

It satisfies all the requirements, while it supports major key properties, since agents are:

autonomous, goal-oriented, cooperative, communicative, adaptive…


Agents: A system-building paradigm

Related technologies:
- Distributed Systems
- Mobile Code
- Database & Knowledge base Technology
- Machine Learning
- AI & Cognitive Science
- Information Retrieval

Timeline: structured programming (1974), objects (1982), agents (2004)


Intelligent Agents: An Intro…

Intelligent Agents (IAs) are considered to be very important because they promise to change the way that people interact with computers.

The underpinning concepts of IAs can be traced back to the early years of Artificial Intelligence, 40 years ago

Research on IAs is considered to be in its first stages


Definitions of Agents

There is no consensus definition of IAs!

There are several operational definitions:
- Dictionary Definition
- Common Definition Patterns
- Definition of IAs based on their characteristics


Dictionary Definition

“An agent is one who acts for or in the place of another by authority from him…”


Common Definition Patterns

An agent is a software entity that…
- Acts autonomously in a goal-oriented manner
- Functions continuously
- Acts on behalf of another entity (human, IA)
- Is able to perceive its environment through sensors and act on it through effectors
- Employs some degree of knowledge


Attributes of an Agent (1/2)

- Autonomy
- Interactivity
- Reactivity
- Pro-activity
- Adaptivity
- Sociability
- Collaborative Behavior


Attributes of an Agent (2/2)

- Competitiveness
- Temporal Continuity
- Personality
- Mobility
- Learning


Taxonomies of Agents (1/2)

Nwana’s 3-dimensional classification:
- Dimensions: Cooperate, Learn, Autonomous
- Agent types: Collaborative Agents, Collaborative Learning Agents, Interface Agents, Smart Agents


Taxonomies of Agents (2/2)

We can also classify agents by:
- What they do (Web Search, Information Filtering, Notification, Financial Services, Entertainment, etc.)
- Where they act (Desktop, Internet, Intranet)
- The degree of the characteristics they enjoy (Mobile, Collaborative, Reactive, etc.)
- A combination of the above (Hybrid Agents)


Underlying Technologies

Technologies for Developing Agents
- Internal Infrastructure
- Programming Languages
- Standards for distributed computing

Agent Communication


Internal Infrastructures

Knowledge-based expert systems

Agents and Objects

AI technologies (Neural Networks, Genetic Algorithms, Fuzzy Systems)


Programming Languages

- Telescript
- Smalltalk
- JAVA
- C++
- Other languages and development frameworks


Standards for Distributed Systems

CORBA (Common Object Request Broker Architecture)

DCOM (Distributed Component Object Model)

JAVA / RMI (Remote Method Invocation)


Agent Communication

KQML (Knowledge Query and Manipulation Language)

KIF (Knowledge Interchange Format)

FIPA ACL (Agent Communication Language)


Some Important Issues of Agent Design

Multi-agent architecture vs. Single-agent architecture

Mobility

Security


Why MAS instead of a Single Agent? (1/2)

A single Agent that handles a large number of tasks lacks performance, reliability, maintainability, etc.

MAS can provide modularity, flexibility, modifiability, extensibility, due to Distributed Environments


Why MAS instead of a Single Agent? (2/2)

A single Agent cannot obtain extensive and specialized knowledge

A MAS can access more knowledge resources

Applications requiring Distributed Computing are better supported by MAS

Intelligence, in neuroscience terms, can be approached by a multi-processing system such as a wide distributed environment of MAS


Mobility

Mobility: The transportation of an Agent to remote service resources
Rationale for mobility: Improved performance

New requirements are introduced:
- Presence of an agent server
- Security
- Common standards for IAs Communication
- Complexities about system maintenance and IAs identification in a distributed environment


Security

Unauthorized disclosure

Unauthorized alteration

Unauthorized damage

Unauthorized copy and replay

Denial of Service

Repudiation

Spoofing and Masquerading


Current Status of Agent Applications (2/2)

Electronic Commerce

Information Retrieval and Knowledge Management

Mobile Computing

Planning and Scheduling

Scientific Applications


Current Status of Agent Applications

Distributed Project Management

Manufacturing

Networking

Other applications (economics, business, military, etc.)

Tools for Agent Development


Standards and Organizations

AgentCities

AgentLink

FIPA

Knowledge Sharing Effort

OMG

Other Organizations (Agentx Working Group, International Foundation for MAS, Ontology.Org)


Part 11 - Agent Intelligence & Data Mining


The main problem…

Data Mining and Intelligent Agents

Do they need each other?

Symbiosis?


Agent-oriented Software Engineering

[Figure labels: Knowledge Discovery Roadmap; Application Domain; Historical Data; Knowledge Model; Software Agent Paradigm; Agent Modelling; Agent Type; Inference Structure; Reasoning Agent]


The main idea...

[Figure labels: Data / Problem model; Knowledge Extraction; Knowledge Model 1, Knowledge Model 2, …, Knowledge Model n; Agent Modeling; Agent Type 1, Agent Type 2, …, Agent Type m; Agent A, Agent B, Agent C, Agent D, Agent E, Agent F]


The main goal

The development of a unified methodology that:
- Takes logic limitations into account
- Is supported by the appropriate tools
- Has been applied to a satisfactory number of applications

in order to:

Provide the capability of dynamically incorporating knowledge to SAs and MAS. This knowledge has been extracted with the use of DM techniques


Knowledge diffusion levels

Extracting knowledge on a MAS application level:
Perform data mining techniques on application data, in order to discover useful rules, associations and patterns.

Extracting knowledge on a MAS behavioral level:
Perform data mining techniques on agent behavior data, in order to predict their behaviors and, thus, reduce system work load. The extracted knowledge is related to agent actions.

Extracting knowledge on evolutionary agent communities:
Deploy evolutionary DM techniques, in order to study societal issues. It has to do with the satisfaction of the goal of a community, which evolves and learns through interaction.


Defining training

Training: The process of dynamically incorporating DM-extracted knowledge models to SAs and MAS.
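
One way to picture training in this sense is an agent whose decision logic is a replaceable knowledge model. The Python sketch below is purely illustrative: the class `TrainableAgent`, its methods, and the two hand-written rule functions that stand in for DM-extracted models are all hypothetical and are not taken from any of the tools mentioned in this tutorial.

```python
class TrainableAgent:
    """An agent whose knowledge model can be replaced while it is running."""

    def __init__(self, name, model=None):
        self.name = name
        self.model = model  # callable: observation -> decision

    def train(self, model):
        """Incorporate a newly extracted knowledge model (training or retraining)."""
        self.model = model

    def decide(self, observation):
        if self.model is None:
            raise RuntimeError("agent has not been trained yet")
        return self.model(observation)


# Stand-ins for models that data mining would normally produce (hypothetical rules).
def model_v1(obs):
    return obs["outlook"] != "rainy"


def model_v2(obs):
    return obs["outlook"] != "rainy" and obs["humidity"] < 80


agent = TrainableAgent("golf-advisor")
agent.train(model_v1)
today = {"outlook": "sunny", "humidity": 90}
first_decision = agent.decide(today)    # True under the first model

agent.train(model_v2)                   # retraining swaps the model in place
second_decision = agent.decide(today)   # False under the refined model
```

Keeping the model behind a single callable means the agent's other behaviors stay untouched when a newer knowledge model becomes available.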


The Methodology

[Figure labels: Multi-agent system; Common steps; Application Level; Behavior Level; Evolutionary Communities]


The Common steps

[Flowchart labels: Create agents; Create agent types; Create agent behaviors; Extract knowledge model; Apply data mining techniques; Instantiate application; Agent Monitoring; Create Ontology; Retrain agents]


Application-oriented DM…


Application-oriented DM…

Data mining (clustering, classification…) is performed on application-specific data

For example, develop a MAS that decides whether a golf game will be played, depending on humidity, outlook, wind, etc.

The decision is extracted through data mining


Prerequisites…

A significant data volume must be available (the bigger the better)

The DM expert has to decide on the best way to exploit the resulting knowledge

The multi-agent system architecture has to take specific limitations seriously into consideration, such as the safety and soundness of the application.

Advantages…

The size of knowledge bases of intelligent agents can dramatically affect their performance.

Agent retraining on new data is feasible

The retraining process can be automated and dynamic, providing versatility to MAS implementation.

Tools and techniques…

An integrated tool for embedding data-mining-extracted intelligence into agents is Agent Academy (AA): http://sourceforge.net/projects/agentacademy

The AA hypothesis…

Using Agent Academy and its data-mining capabilities, the MAS developer can reduce the effort required to:

a. Develop new or expand existing applications with intelligent agents, and

b. Upgrade them, as needed, by retraining the agents deployed

The AA-1.0 architecture…

[Figure: the AA-1.0 architecture. Labels: Agent Academy; Agent Factory; Data Acquisition; Agent Training Module; Agent Use Repository; Data Miner; Agent Request; Agent Tracking; Data; WEB; Application Data; Agent Behavior Data.]

Components of the AA-1.0 System

[Figure: the AA-1.0 components. Labels: Agent Academy; Agent Factory (main GUI); AUR; DMM; ATM; Protégé-2000; Ontology definition; Behaviour Types Definition; Agent Types Definition; MAS Creation Tool; Application Data; instantiated MAS.]

Agent Factory

1. Allows developers to design a complete agent-based scenario through GUIs

2. Agent behaviors are modeled as workflow processes and designed through a GUI

Agent Use Repository (AUR)

1. A database for all the information

• Related to DMM:

i. Application and tracking data

ii. Decision structures

• Related to AF and ATM:

i. Ontologies

ii. Behavior types

iii. Agent types

iv. Agent instances

v. And more…

2. Application data stored in a generic way

3. Data structures defined by the ontologies

Data Mining Module (DMM)

1. DMM applies data mining techniques on “application data” and “tracking data”

2. “Application data” describe the application domain

3. “Tracking data” reveal agent-specific interaction patterns (mainly used for retraining)

4. Selection from a certain data set

5. Choice of specific DM algorithms

6. Allows the user to evaluate, store, visualize data mining results

Agent Training Module (ATM)

1. ATM keeps track of

1. Which agent implements which behaviors

2. Which behavior uses which decision structures and physical assets

2. ATM informs the related agents

1. Whenever there is an update in the physical assets or modifications in the behaviors

2. When a change in a decision structure is reported by DMM
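The bookkeeping described above can be sketched as a small registry that maps agents to behaviors and behaviors to decision structures, and that works out which agents to inform when a decision structure changes. The class `TrainingRegistry`, its method names and the identifiers used in the usage note are hypothetical; the actual ATM implementation is not shown in the slides.

```python
from collections import defaultdict

class TrainingRegistry:
    """Hypothetical stand-in for the ATM bookkeeping (not the AA code)."""

    def __init__(self):
        self.behaviors_of_agent = defaultdict(set)      # agent -> behaviors
        self.structures_of_behavior = defaultdict(set)  # behavior -> decision structure ids
        self.notifications = []                         # (agent, ds_id) pairs sent so far

    def register_behavior(self, agent, behavior):
        self.behaviors_of_agent[agent].add(behavior)

    def register_structure(self, behavior, ds_id):
        self.structures_of_behavior[behavior].add(ds_id)

    def agents_using(self, ds_id):
        """Agents that implement at least one behavior relying on ds_id."""
        return sorted(
            agent
            for agent, behaviors in self.behaviors_of_agent.items()
            if any(ds_id in self.structures_of_behavior.get(b, ()) for b in behaviors)
        )

    def on_structure_changed(self, ds_id):
        """Called when a change in a decision structure is reported."""
        affected = self.agents_using(ds_id)
        self.notifications.extend((agent, ds_id) for agent in affected)
        return affected
```

For example, after `register_behavior("agent1", "classify")` and `register_structure("classify", "ds-7")`, a call to `on_structure_changed("ds-7")` returns `["agent1"]`.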

The Integrated Platform

[Figure: the AA-1.0 component diagram annotated with the platform workflow. Annotations: Design of agent application; Data storage; XML; Application Data; Extraction of the decision model; Instantiation of a new MAS; Dummy agents; Insertion of the decision model into agents; Agent with reasoning.]

Multi-Agent Scenario Creation

[Figure: the AA-1.0 component diagram annotated for scenario creation. Annotations: Instantiation of a new MAS; Dummy agents; Insertion of the decision model into agents; Agent with reasoning; DSs; PAs; Ontologies; ACL; RMI.]

Agent Retraining

[Figure: agent retraining. Labels: Application Developer; AA Platform; Application Owner; Knowledge-base.]

Retraining through DMM

[Figure: the AA-1.0 component diagram annotated for retraining. Annotations: XML; Application Data; decision structure id; new decision model; JESS Rules; RMI; Retrained Agent.]

1. Decision structures are modified by DMM

2. DMM passes the id of the modified DS to ATM

3. ATM obtains the content of the modified DS from AUR

4. ATM converts the DS into JESS rules

5. ATM informs the related task agents
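The conversion in step 4 can be pictured as flattening a decision tree into one if-then rule per leaf. The sketch below shows this general idea only: the nested-dict tree layout, the example tree `GOLF_TREE` and the plain-text rule format are assumptions, and the output is deliberately not JESS syntax.

```python
def tree_to_rules(tree, conditions=()):
    """Flatten a nested decision tree into (conditions, decision) pairs."""
    if not isinstance(tree, dict):            # leaf: a decision label
        return [(list(conditions), tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        test = (tree["attr"], value)          # one attribute test on the path
        rules.extend(tree_to_rules(subtree, conditions + (test,)))
    return rules

def format_rule(conditions, decision):
    """Render one rule as generic text (illustrative format, not JESS)."""
    if not conditions:
        return f"IF true THEN decision = {decision}"
    lhs = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
    return f"IF {lhs} THEN decision = {decision}"

# Hypothetical decision structure for the golf example mentioned earlier.
GOLF_TREE = {
    "attr": "outlook",
    "branches": {
        "sunny": {"attr": "humidity", "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rainy": {"attr": "windy", "branches": {"true": "no", "false": "yes"}},
    },
}

RULES = [format_rule(c, d) for c, d in tree_to_rules(GOLF_TREE)]
```

Each path from the root to a leaf becomes one rule, so the five leaves of the example tree yield five rules.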

AA-1.0…

Nevertheless…

Although ambitious, AA-1.0 was not very easy to use.

No software code is produced (agent code is not transparent)

Training and retraining are very complicated

AA-1.0 is a MAS itself, which imposes limitations on module communication

This is why AA-2.0 has been released

AA-2.0 Main Frame

[Screenshot: the AA-2.0 main frame. Callouts: Behavior Design Tool; Agent Design Tool; MAS Design Tool; Project Management (Clean – Build – Run).]

AA-2.0 Project Actions

[Screenshot: project actions. Callouts: Project Build Successful; Project Notepad; Behaviors, Types, Agents, etc.]

AA-2.0 Behavior Design Tool

[Screenshot: the Behavior Design Tool. Callouts: Structured Source Code Editing; Method Blocks; Behavior Tools; Generated Source Code.]

AA-2.0 Agent Design Tool

[Screenshot: the Agent Design Tool. Callouts: Generated Source Code; Behavior Execution; Debug Tool; Data Mining Tool; Agent Behaviors Tab.]

AA-2.0 Data Mining Steps

AA-2.0 DM Integration

[Screenshot: DM integration. Callouts: Newly created AI Behavior ready to be imported; Add AI Behavior Button.]

AA-2.0 MAS Design Tool

[Screenshot: the MAS Design Tool. Callouts: Add Agents into MAS; Run Agent Scenario; Agents in MAS; Agent State.]

AA-2.0 Running Scenarios

[Screenshot: a running scenario. Callouts: Agent is printing a message (Cyclic Behavior); Agent is active; Agent present in JADE Platform.]

Agent behavior-oriented DM…

Why Behavior-oriented DM (1/2)

Interaction among agents is crucial for the efficiency of MAS

New agents enter the system without the necessary knowledge and skills

New agents are not able to learn from the others’ behavior

It is not possible to define and represent a priori the relevant knowledge the agents need for the interaction

Why Behavior-oriented DM (2/2)

In order to improve its behavior, a new agent should act consistently with the knowledge and the behaviors (culture) of the other agents.

A way of supporting multi-agent interaction is based on the idea of Implicit Culture (IC): http://www.science.unitn.it/~pgiorgio/ic/

The main idea of Implicit Culture…

A situation of the agent environment is represented as a triple <a, σ, t>, where:

a: set of agents

σ: set of scenes (environment + actions)

t: time slot

The goal is to predict the next executed action

IC is realized through SICS…

Systems for Implicit Culture Support (SICS)

The main goals of SICS are to:

Establish the implicit culture phenomenon, and

Propose the next expected situated actions based on the already “played” scenes.

The SICS architecture...

[Figure: the SICS architecture. Labels: agents a, b, c; scenes σt, σ’t, σ”t; actions α, β, γ; groups G and G’; ε; Observer; DB; Inductive Module (Σ0, Σ); Composer.]

Observer: stores in a database the situated executed actions of the agents of G.

Inductive Module: using the data of the DB and the a priori theory Σ0, induces a cultural constraint theory Σ.

Composer: proposes to a group G’ a set of scenes such that the expected situated actions satisfy Σ.

[Figure, continued: the same SICS architecture diagram, now showing the scenes σt+1, σ’t+1 and σ”t+1.]
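The three SICS components can be pictured with the heavily simplified sketch below. It is an assumption-laden illustration, not the SICS implementation: the "cultural constraint theory" is reduced to a table mapping each scene to its most frequently executed action, the a priori theory is a plain dict, and the names `Observer`, `induce_theory` and `compose` are hypothetical.

```python
from collections import Counter, defaultdict

class Observer:
    """Stores the situated executed actions of the observed group."""

    def __init__(self):
        self.db = []  # (agent, scene, action, time slot)

    def record(self, agent, scene, action, t):
        self.db.append((agent, scene, action, t))

def induce_theory(db, prior=None):
    """Simplified inductive step: scene -> most frequent executed action.

    Entries of the a priori theory are kept unless the data override them.
    """
    counts = defaultdict(Counter)
    for _agent, scene, action, _t in db:
        counts[scene][action] += 1
    theory = dict(prior or {})
    for scene, actions in counts.items():
        theory[scene] = actions.most_common(1)[0][0]
    return theory

def compose(theory, scene):
    """Simplified composer: the action expected in the given scene, if any."""
    return theory.get(scene)

observer = Observer()
observer.record("a", "checkout", "pay", 1)
observer.record("b", "checkout", "pay", 1)
observer.record("c", "checkout", "leave", 2)
THEORY = induce_theory(observer.db, prior={"search": "ask_peer"})
```

A new agent facing the scene `"checkout"` would then be offered `compose(THEORY, "checkout")` as the culturally expected action.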

The methodology

[Figure: the methodology diagram. Labels: Common Steps; Application Level; Behavior Level; Evolutionary Communities; Multi-agent system.]

Behavior level: The training framework

Offline process: Agent action monitoring; Agent action dataset processing; Apply Data Mining techniques; Profile storing

Online process: Agent action monitoring; Agent action dataset processing; Recommendation Engine (Compare current action sequence with stored profiles; Produce recommendations for agent actions)

The developed DM algorithm

κ-Profile is a mechanism which:

Represents agent actions,

Creates agent action profiles,

Produces recommendations, and

Evaluates them.

Apply Maximin

Apply K-Means

Create profiles

Produce recommendations

Evaluate recommendations
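The first three steps above can be sketched as follows, under several assumptions that the slides do not confirm: agent actions are encoded as numeric vectors, Maximin is used to choose the number of clusters and their initial centers, K-Means refines them, and each resulting center is treated as a profile. The parameter `gamma` and the function names are hypothetical, and the evaluation step is omitted.

```python
import math

def maximin_centers(points, gamma=0.5):
    """Maximin heuristic: keep adding the point farthest from all centers."""
    centers = [points[0]]
    while len(centers) < len(points):
        # Candidate: the point farthest from its nearest existing center.
        candidate = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        gap = min(math.dist(candidate, c) for c in centers)
        if gap == 0:
            break
        if len(centers) >= 2:
            pairs = [(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]]
            mean_gap = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
            if gap <= gamma * mean_gap:  # candidate is not isolated enough
                break
        centers.append(candidate)
    return centers

def kmeans(points, centers, iterations=20):
    """Standard K-Means refinement starting from the given centers."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        new_centers = [
            tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

def nearest_profile(profiles, actions):
    """Index of the stored profile closest to the current action vector."""
    return min(range(len(profiles)), key=lambda i: math.dist(actions, profiles[i]))
```

A recommendation step could then look up `nearest_profile(profiles, current_vector)` and suggest the actions that dominate that profile.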

The parameters to determine...

In order to develop a MAS with recommendation abilities (that follows the presented methodology), the parameters that have to be specified are:

1. All possible agent actions.

2. The specific MAS goal.

3. The fuzzy inference engine parameters.

Knowledge extraction in agent communities

The orientation…

Self-organization is 100% pure AI (Artificial Intelligence) oriented

A lot of research work on this topic…

The goal of self-organizing MAS…

To achieve a certain MAS goal through “self-learning”

Self-learning has to do with the penalties or rewards the MAS receives whenever it performs an action.

The most difficult part in developing a self-organizing MAS is to find the best reward/penalty function…

The main technologies…

Neural Networks

Mainly used for classification (the output is categorical)

Really efficient when confronted with a specific problem

Genetic Algorithms

Mainly used for optimization problems where dynamic programming cannot be used

The correct representation of the problem is crucial

Agent Communities (1/2)

Agent communities simulate problems that are complicated, heterogeneous and non-linear.

The internal structure of such systems is inherently complicated, due to continually varying interactions.

The goal of agent communities is collective, and its evaluation is done through indicators.

Agent Communities (2/2)

Agent communities do not offer historical data on which DM techniques could be applied and from which knowledge models could be extracted.

The presented approach is based on the way evolutionary techniques can be exploited in order to augment agent community intelligence.

The knowledge extraction mechanism employed evaluates agent decisions (rewards or punishes) and updates the agent decision model.
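A minimal sketch of such an evaluate-and-update loop is given below. Everything specific in it is an assumption made for illustration: the decision model is a dict of rule strengths, a decision is judged by whether a single indicator rose, a reward adds a fixed amount and a punishment removes a fixed fraction. The slides do not specify the actual evaluation function.

```python
def evaluate_decision(indicator_before, indicator_after):
    """Hypothetical evaluation: +1 (reward) if the indicator rose, else -1."""
    return 1 if indicator_after > indicator_before else -1

def update_model(strengths, rule_id, outcome, reward=10.0, penalty_rate=0.1):
    """Return a new decision model with the strength of rule_id adjusted."""
    updated = dict(strengths)
    if outcome > 0:
        updated[rule_id] += reward                 # reward: fixed bonus
    else:
        updated[rule_id] *= (1.0 - penalty_rate)   # punishment: fractional cut
    return updated

model = {"r1": 100.0, "r2": 50.0}
model = update_model(model, "r1", evaluate_decision(40, 35))  # indicator fell
model = update_model(model, "r2", evaluate_decision(35, 60))  # indicator rose
```

Rules that keep leading to good outcomes accumulate strength, so they are more likely to be selected by whatever selection scheme the community uses.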

The methodology

[Figure: the methodology diagram. Labels: Common Steps; Application Level; Behavior Level; Evolutionary Communities; Multi-agent System.]

Evolutionary communities – training framework

[Figure: the training framework. Labels: Agent action; Agent action monitoring; Agent Knowledge Model; Agent Decision Evaluation; Decision Evaluation Mechanism; Genetic Algorithm Mechanism; Knowledge Model Update.]

Part 12 – MAS exploiting DM-extracted intelligence

The presented systems…

An ERP add-on, providing intelligent policy recommendations on customer and supplier management.

An intelligent environmental monitoring system.

A decentralized maintenance management system.

A self-organizing agent community “in danger”.

An agent-based e-auction system.

The ERP add-on

Provides an Intelligent Shell on top of an ERP, working in close cooperation with it.

Incorporates intelligence inside a business process: the handling of a customer’s order.

Makes a specific recommendation for a specific customer and a specific order.

Manipulates important ERP data with efficient algorithms for reducing information overload.

Main Goals (2/2)

[Figure: main goals. Labels: Order Recommendation; Customer Relationship Management (CRM); Supplier Relationship Management (SRM); Demand Forecasting; Market Basket Analysis; Customers; Inventory; Suppliers.]

Why do it with agents?

Multiple Loci of Control: Changes in the goals and behaviors of one agent are adequate to adjust the whole system.

Software & Business Engineering Perspective:

Coordination is done easily both in the design and the development phase.

Adoption of Agents - no questions to us

Why do it with data mining?

Has been widely used for CRM

Extended to SRM

With Clustering

Deal with noisy data (missing, unknown or erroneous)

Find non-trivial, unknown, implicit, potentially useful patterns

Learn and provide quick responses

Market Basket Analysis and Association Rules

Discover customer-oriented buying habits

Discover market-oriented buying habits
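For association rules, the two standard measures are support (the share of orders containing an itemset) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both on a made-up set of orders; the item names and the `ORDERS` list are purely illustrative and do not come from the ERP system described here.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical order history: each order is the set of products it contains.
ORDERS = [
    {"bolts", "nuts"},
    {"bolts", "nuts", "washers"},
    {"bolts"},
    {"nuts", "washers"},
]

SUPPORT_BOLTS_NUTS = support(ORDERS, {"bolts", "nuts"})
CONF_BOLTS_TO_NUTS = confidence(ORDERS, {"bolts"}, {"nuts"})
```

Rules whose support and confidence exceed chosen thresholds would be the candidates for describing buying habits.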

Architectural Diagram

[Figure: the MAS on top of the ERP. Labels: Customer; MAS; ERP; Customer Data; Supplier Data; Inventory Data; Profile Repository; and the agents listed below.]

Customer Order Agent: GUI agent for accepting orders and displaying the final recommendation.

Recommendation Agent: Gets the order preferences and requests the profiles of the involved entities from the PIAs. Aggregates and manipulates the resulting profiles into a single recommendation.

Customer Pattern Identification Agent: Responsible for producing customer profiles based on historical data.

Inventory Pattern Identification Agent: Responsible for producing product profiles based on historical data.

Supplier Pattern Identification Agent: Responsible for producing supplier profiles based on historical data.

ERP Agent: Transducer from XML to ACL. Responsible for retrieving up-to-date data.

Final Recommendation

[Figure: the Recommendation Agent (RA). Labels: Product Profiles; Availabilities; Ordered Quantities; Split Policy; sS Point; Customer & Supplier Profiles; Order Preferences; Static Business Rules; Delivery Quantities; Quantities to Procure; Delivery Days; Due Days; Additional Discounts (Positive, Negative).]

Conclusions (1/2)

Added value without rebuilding or reengineering the system

Reduction of work effort through adaptability of business rules and automatic generation of the recommendation

Reusable and customizable system (match different ERP platforms with minimum effort)

Conclusions (2/2)

Improvement | Classic ERP | ERP + DKE
Autonomy | No | Yes
Customer Management & Pricing Policy | No | Yes
Supplier Management | No | Yes
Adaptability | Low | High
Recommendations | Indirectly, through reports | Automatically
Market Basket Analysis | No | Yes

The system…

Real-time Environmental Monitoring

Agents detect ozone alarms and notify interested parties

The application domain

Real-time systems management

Environmental assessment

Emergency real-time systems

Real-time applications

Decision support systems

The architectural diagram

[Figure: the system architecture. Labels: Sensor 1, Sensor 2, … Sensor n; Measurements; Diagnosis Agent 1, Diagnosis Agent 2, … Diagnosis Agent n; Alarm Agent; Alarms; Early Alarms; Distribution Agent; FeedBack Agent; Short Forecast Agent; Request; Short Time Prediction; D.B.; Semantic Representation; Human Expert; DISTRIBUTION SYSTEM; MANAGEMENT SYSTEM.]

Diagnosis Agents

Diagnosis Agents:

Get measurements in real time from sensors

Verify that the sensors operate properly

Inductive reasoning engines:

Validate incoming measurements

Predict a missing ozone measurement level

Knowledge Extraction

From raw data to data-driven reasoning

Quinlan’s C4.5 for decision tree induction

Two reasoning engines in Diagnosis Agent:

1. Incoming Data Validation

2. Missing Measurement Estimation

Validation Reasoning Engine

Attributes

O3 | The current ozone value
O3_30 | The ozone value 30 min ago
O3_90 | The ozone value 90 min ago
MinMax60 | The difference between the maximum and the minimum ozone value in the last 60 min
MinMax150 | The difference between the maximum and the minimum ozone value in the last 150 min
O3val | The corresponding validation tag (valid/erroneous)

Confusion Matrix: Validation Decision Model (C4.5)

Records classified as: | valid | erroneous
No. records in class ‘valid’: | 34,454 | 21
No. records in class ‘erroneous’: | 63 | 420

Decision Tree Parameters

Size of Decision Tree: 29 (15 Leaves)

Correctly classified records: 99.71%

Estimation Reasoning Engine

Attributes

NO | concurrent value of NO concentration
NO2 | concurrent value of NO2 concentration
NOX | concurrent value of NOx concentration
TEM | concurrent value of Temperature
HR | concurrent value of Relative Humidity
O3_15 | ozone value 15 min ago
O3_30 | ozone value 30 min ago
O3Class | missing ozone value level (low/med)

Confusion Matrix: Estimation Decision Model (C4.5)

Records classified as: | low | med
No. records in class ‘low’: | 9,905 | 2,351
No. records in class ‘med’: | 752 | 4,384

Decision Tree Parameters

Size of Decision Tree: 29 (15 Leaves)

Correctly classified records: 93.80%
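As a quick check, overall accuracy can be recomputed from a confusion matrix as the share of records on its diagonal. The sketch below does this for the two matrices, using the counts as they appear in this transcript; the helper name `accuracy` is an assumption. With these counts the validation model comes out close to the reported figure, while the estimation model does not reproduce the reported 93.80%, so that percentage may refer to a different evaluation run than the matrix shown.

```python
def accuracy(matrix):
    """Share of correctly classified records (diagonal over total)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Rows: actual class; columns: predicted class (counts as transcribed).
VALIDATION = [[34454, 21],   # actual 'valid'
              [63, 420]]     # actual 'erroneous'
ESTIMATION = [[9905, 2351],  # actual 'low'
              [752, 4384]]   # actual 'med'

VALIDATION_ACC = round(accuracy(VALIDATION) * 100, 2)
ESTIMATION_ACC = round(accuracy(ESTIMATION) * 100, 2)
```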

Conclusions on EMS

In this scenario, agents act as mediators, delivering validated information in a distributed environment.

Inductive Reasoning Agents seem suitable for building Intelligent Environmental Software Applications

Distributed Service Management

Management of physical (maintenance) service to be carried out at the customer’s location.

[Figure: the service workflow. Labels: Customer of Company XY; Company XY; Call Center; Field Service Technician; Controlling Department; Subcontractor of Company XY; Service contract; Subcontractor contract; 1. Problem description; 2. Info and order; 3. Execution of order; 4. Invoice; 5. Invoice.]

Features: Web based application; Process monitoring; Automated helpdesk; Decision support; Proactive event predictor; Adaptability; Customizability

Architectural Diagram

[Figure: actors and roles in the maintenance scenario.

Institutional: Subcontractor; Petrol Station; Union Technik.

H: Field service technician; Call Center Employee; Controlling department employee; Petrol station leaseholder; User.

Artificial: SLIM; BUISY; GUI Agent; Authentication Agent; Personal Agent; Broker; Directory Server; Authentication System; EENEX Platform; Plausibility check; Technician Assignment; Problem Solver; Request Assistant.

Relations and processes: Subcontracting Contract; Breakdown Management; Breakdown Financial Control; Maintenance of Incidental Breakdowns; authentication; customization; system-user interface; development solution; is-a.]

Embedding Agents into a Legacy Architecture

[Figure: embedding agents into the legacy architecture. Labels: Web Interface; AMADEE Workflow Control; AMADEE Connector (several instances); AGENT ACADEMY Agent Action (two instances); BUISY system Action; MS SQL Database.]

Full process control through AMADEE Workflow Control

Seamless embedding in client’s and AA architecture

Agent Communication Flow (for Plausibility checks)

[Figure: communication flow. Labels: Web Interface; AMADEE Workflow Control; AMADEE Connector; AGENT ACADEMY Agent Action.]

Request:

<Request Environment="Development">
  <Message Type="plausibility">
    <FacilityID>74143</FacilityID>
    <Amount>50</Amount>
    <Duration>10</Duration>
    <UnitID>47085</UnitID>
    <SpareParts>
      <SparePart>
        <ID>9</ID>
        <Name>Abdeckband 80 x 2500 x 1,8 mm</Name>
        <Supplier>ETC</Supplier>
        <Price>28.12</Price>
      </SparePart>
      <SparePart>
        <ID>12</ID>
        <Name>Abdeckband rot, neue Ausf. f. H105</Name>
        <Supplier>ETG</Supplier>
        <Price>33.75</Price>
      </SparePart>
    </SpareParts>
    <SparePartID/>
    <Action>Check</Action>
    <UnitName>1/2 2306</UnitName>
  </Message>
</Request>

Response:

<Message>
  <Plausible>True</Plausible>
</Message>

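The exchange above can be reproduced with Python's standard `xml.etree.ElementTree` module, as in the sketch below. The parsing follows the element names of the request shown on this slide; the plausibility rule itself (`is_plausible`, with its `max_part_price` limit) is a hypothetical placeholder, since the slides do not state how the agent reaches its verdict.

```python
import xml.etree.ElementTree as ET

REQUEST = """<Request Environment="Development">
  <Message Type="plausibility">
    <FacilityID>74143</FacilityID><Amount>50</Amount><Duration>10</Duration>
    <UnitID>47085</UnitID>
    <SpareParts>
      <SparePart><ID>9</ID><Name>Abdeckband 80 x 2500 x 1,8 mm</Name>
        <Supplier>ETC</Supplier><Price>28.12</Price></SparePart>
      <SparePart><ID>12</ID><Name>Abdeckband rot, neue Ausf. f. H105</Name>
        <Supplier>ETG</Supplier><Price>33.75</Price></SparePart>
    </SpareParts>
    <SparePartID/><Action>Check</Action><UnitName>1/2 2306</UnitName>
  </Message>
</Request>"""

def parse_request(xml_text):
    """Extract the fields of a plausibility request into a dict."""
    message = ET.fromstring(xml_text).find("Message")
    parts = [
        {"id": sp.findtext("ID"),
         "supplier": sp.findtext("Supplier"),
         "price": float(sp.findtext("Price"))}
        for sp in message.iter("SparePart")
    ]
    return {
        "type": message.get("Type"),
        "amount": float(message.findtext("Amount")),
        "duration": float(message.findtext("Duration")),
        "parts": parts,
    }

def is_plausible(request, max_part_price=1000.0):
    """Hypothetical rule: positive duration and sane spare-part prices."""
    prices_ok = all(0 < p["price"] <= max_part_price for p in request["parts"])
    return request["duration"] > 0 and prices_ok

def build_response(plausible):
    """Serialize the verdict in the response format shown on the slide."""
    message = ET.Element("Message")
    ET.SubElement(message, "Plausible").text = str(plausible)
    return ET.tostring(message, encoding="unicode")

RESPONSE = build_response(is_plausible(parse_request(REQUEST)))
```

In the deployed system the verdict would presumably come from a DM-extracted model rather than from a fixed rule like the one used here.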
Biotope: A self-organizing agent community

Simulates a parametrical ecosystem, with food, traps and obstacles.

The living organisms of this ecosystem are intelligent agents.

The agents live in an unknown, probably “hostile” environment.

[Figure: the Biotope grid with X and Y axes; each cell holds an integer code from 0 to 4.]

Agent Sight

[Figure: the Biotope grid with the agent's field of vision highlighted.]

Vision Field (the agent's own cell, marked A, is not encoded):

0 0 0 0 0
0 0 1 1 3
2 3 A 0 3
0 0 0 0 0
0 0 0 0 0

Vision Vector:

0 0 0 0 0 0 0 1 1 3 2 3 0 3 0 0 0 0 0 0 0 0 0 0
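The step from vision field to vision vector can be sketched as a row-by-row flattening of the 5 × 5 neighbourhood that skips the agent's own cell, giving 24 values. In the sketch below the function name `vision_vector`, the `radius` parameter, the use of code 4 for the agent's cell and the treatment of cells outside the grid as 0 are all assumptions.

```python
def vision_vector(grid, row, col, radius=2):
    """Flatten the cells around (row, col) row by row, skipping the agent."""
    vector = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if (r, c) == (row, col):
                continue  # the agent's own cell is not part of the vector
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            vector.append(grid[r][c] if inside else 0)  # assumption: outside = 0
    return vector

# The vision field from the slide, with the agent (assumed code 4) in the centre.
FIELD = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 3],
    [2, 3, 4, 0, 3],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

VECTOR = vision_vector(FIELD, 2, 2)
```

With this field the result has 24 entries and matches the vision vector listed on the slide.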

Agent Movement

[Figure: the agent's surroundings on the grid before and after a move.]

The cells of the vision field are numbered 1 to 24, row by row, skipping the agent's own cell (A):

1 2 3 4 5
6 7 8 9 10
11 12 A 13 14
15 16 17 18 19
20 21 22 23 24

Example: the vision vector 0 0 0 0 0 0 0 0 0 0 0 3 0 0 1 1 2 3 3 0 0 0 0 0, followed by the cell number 10.

Agent Intelligence

They perceive part of the environment (according to their vision capabilities), and learn what to eat and what to avoid.

They communicate with each other (when “in sight”) and exchange knowledge

0 00 0 0 001 13 2 3 0 3 0 0 000 0 0 000

Vision Vector
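A minimal sketch of how a vision field could be flattened into a vision vector. It assumes a 5×5 window centred on the agent, read row by row with the agent's own cell skipped, which gives 24 entries like the vector shown on the slide. The function name, the choice of 0 for cells beyond the map edge, and the sample grid are illustrative assumptions, not details taken from the slides.

```python
from typing import List, Sequence


def vision_vector(grid: Sequence[Sequence[int]], row: int, col: int,
                  radius: int = 2) -> List[int]:
    """Flatten the window around (row, col) row by row, skipping the agent's cell."""
    vector = []
    for r in range(row - radius, row + radius + 1):
        for c in range(col - radius, col + radius + 1):
            if r == row and c == col:
                continue  # the agent's own cell is not part of the vector
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            # Assumption: anything outside the map is read as an empty cell (0).
            vector.append(grid[r][c] if inside else 0)
    return vector


# Hypothetical 5x5 environment; the codes are arbitrary cell types and
# the value 9 only marks where the agent stands.
world = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 3, 0],
    [2, 3, 9, 0, 3],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(vision_vector(world, 2, 2))
```

With radius 2 the vector always has 24 entries, so it can be compared position by position against the condition part of a rule.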


The scope of Biotope

To use the simulation environment for evolution experiments.

To develop suitable tools for monitoring the evolution of the ecosystem.

To monitor and model agent behaviors.

To enhance agent intelligence through the use of genetic algorithms.


Agent Reproduction

Reproduction condition: Energy > 800 units

Reproduction outcome
  1 offspring
  Inherits part of the parent’s Classifier set

Dispersal distance $d_s$
  Distance: $d_s \sim \mathrm{Exp}(m)$, with density $f(d_s) = \frac{1}{m} e^{-d_s/m}$
  Orientation: $\theta \sim U(0, 2\pi)$

Energy variation
  Parent: exponential decrease 400 --> 0
  Offspring: exponential increase 200 --> 600
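The dispersal step can be sketched as follows. The sketch takes m to be the mean dispersal distance and the orientation to be uniform on (0, 2π). The function names, the continuous coordinates, and the value m = 5 are assumptions made for illustration; the slides only give the energy threshold and the two distributions.

```python
import math
import random

REPRODUCTION_THRESHOLD = 800  # energy units, as stated on the slide


def can_reproduce(energy: float) -> bool:
    """Reproduction condition: energy strictly above the threshold."""
    return energy > REPRODUCTION_THRESHOLD


def offspring_position(parent_xy, mean_distance, rng):
    """Place the offspring at an exponentially distributed distance, random direction."""
    # expovariate takes the rate, i.e. 1 / mean, so the mean distance is m.
    ds = rng.expovariate(1.0 / mean_distance)
    theta = rng.uniform(0.0, 2.0 * math.pi)
    x, y = parent_xy
    return x + ds * math.cos(theta), y + ds * math.sin(theta)


rng = random.Random(0)
samples = [offspring_position((0.0, 0.0), 5.0, rng) for _ in range(10_000)]
mean_ds = sum(math.hypot(x, y) for x, y in samples) / len(samples)
print(round(mean_ds, 2))  # should be close to the chosen mean distance of 5
```

On a cell grid the resulting coordinates would still have to be rounded to a free cell, which this sketch leaves out.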


Rule evaluation

[Figure: the agent's vision field next to its classifier set and the rule strengths.]

Classifier Set:
0 00 0 00 000 0 03001 1 2 330 00 0 0 10
0 00 0 00 000 0 03001 1 2 330 00 0 0 21
0 00 1 03 300 0 00330 1 0 020 10 0 0 7

Strengths shown: 176, 210, 240


Rule evaluation

[Figure: the agent's vision field before and after the move, next to its classifier set and the rule strengths.]

Classifier Set:
0 00 0 00 000 0 03001 1 2 330 00 0 0 10
0 00 0 00 000 0 03001 1 2 330 00 0 0 21
0 00 1 03 300 0 00330 1 0 020 10 0 0 7

Strength values shown: 176, 210, 216, 200, +8, 224, 24


Different Environments - Conclusions (1/2)…

Parameter                  SIMULATION 1   SIMULATION 2
Food refresh Steps         30             30
Vision Error Percentage    1              30
Food                       100            140
Obstacles                  240            240
Traps                      20             20

[Figure: trap collision rate (0.01 to 0.04) versus steps (0 to 1200), for Simulation 1 and Simulation 2]

[Figure: effectiveness (1 to 4) versus steps (0 to 1200), for Simulation 1 and Simulation 2]


Different Agents - Conclusions (2/2)…

Parameter                    SIMULATION 1   SIMULATION 2
Energy                       500            500
Ageing Step                  3000           3000
Number of Rules              3000           500
Genetic Algorithm Step       200            1000
Communication Step           200            200
Exchanged Rules Percentage   120 [the two column values are fused in the source]

[Figure: effectiveness (1 to 4) versus steps (0 to 1200), for Simulation 1 and Simulation 2]

[Figure: unknown situation rate (0.2 to 0.8) versus steps (0 to 1200), for Simulation 1 and Simulation 2]


Environmental Indicators

Resource availability

Environmental variety

Environmental reliability


Agent Performance Indicators

Energy Effectiveness

Aging

Food Consumption Rate

Trap Collision Rate

Reproduction Rate

Unknown Situation Rate


Conclusions & Future Directions

In agent communities, the emphasis is on how to model the common problem.

The combination of Genetic Algorithms, a classifier system, and an advanced agent communication framework proves capable of handling dynamic and complex problems.

The Biotope infrastructure can be used to model and simulate distributed computational systems, where the agents are the computational entities, food represents the resources of the system, traps represent resource losses, and obstacles represent system incompatibilities.

A test case could be a community of agents roaming the Web, either collaborating or competing over its digital resources, while fragmentarily perceiving their environments.


An agent-based e-auction environment

The protocol resembles English double auctions.

Many different strategies are deployed by the agents.

Agents: multiple Buyers and multiple Sellers that trade over many goods.


The auction environment

[Figure: the auction workspace, with sellers (seller i, seller j) and buyers b1, b2, …, bN; a speech bubble reads “Hello!”]


Negotiating agents

Negotiations take place between a buyer and a seller in order to reach an agreement.

[Figure: message exchange between buyer and seller: PROPOSAL bid1, ask1, …, ACCEPT bidn; n transactions]
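A minimal sketch of such a bid/ask exchange, written only to make the message count concrete. The fixed concession steps, the acceptance test, the "ASK" label for the seller's counter-offer, and all prices are illustrative assumptions; the slides show PROPOSAL and ACCEPT messages but do not specify the agents' strategies.

```python
from typing import List, Tuple


def negotiate(opening_bid: float, opening_ask: float,
              buyer_step: float, seller_step: float,
              max_rounds: int = 100) -> Tuple[List[Tuple[str, float]], bool]:
    """Alternate buyer bids and seller asks until a bid is accepted."""
    log: List[Tuple[str, float]] = []
    bid, ask = opening_bid, opening_ask
    for _ in range(max_rounds):
        log.append(("PROPOSAL", bid))      # buyer -> seller
        if bid >= ask:                     # seller is satisfied with this bid
            log.append(("ACCEPT", bid))    # seller -> buyer
            return log, True
        log.append(("ASK", ask))           # seller's counter-offer
        bid += buyer_step                  # buyer concedes upwards
        ask -= seller_step                 # seller concedes downwards
    return log, False                      # no agreement within max_rounds


log, agreed = negotiate(opening_bid=50, opening_ask=100,
                        buyer_step=10, seller_step=10)
bids_sent = sum(1 for kind, _ in log if kind == "PROPOSAL")
print(agreed, bids_sent, log[-1])
```

In this toy setting a buyer whose opening bid starts closer to the eventual agreement needs fewer proposals before the seller accepts.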


Improving the behavior of a buyer

[Figure: Transactional Historical Data feed Data Mining and Trend Analysis, which inform the Buyer Agent's Strategy and Rule Base. The buyer and seller then exchange PROPOSAL bid1, ask1, …, ACCEPT bidm, with m < n transactions.]

Reducing communication by training


How can DM improve decisions in auctions?

The number of messages in a negotiation can be decreased.

DM can provide a tool for identifying hidden patterns (trends) in the history of previous transactions.

Evaluation criteria (e.g. fuzzy criteria) can be used to update and improve an agent’s rule base.
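As one illustration of a trend extracted from previous transactions, the sketch below fits a straight line to past agreed prices and extrapolates one step ahead, giving a value a buyer could use as its opening bid. The linear trend, the function name, and the price history are assumptions chosen for brevity; the slides do not say which mining technique the buyer agent uses.

```python
from statistics import mean
from typing import Sequence


def trend_forecast(prices: Sequence[float]) -> float:
    """Least-squares line through past agreed prices, extrapolated one step ahead.

    Expects a non-empty sequence ordered from oldest to newest.
    """
    n = len(prices)
    if n < 2:
        return prices[0]  # a single observation carries no trend
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(prices)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, prices))
             / sum((x - x_bar) ** 2 for x in xs))
    return y_bar + slope * (n - x_bar)  # value of the fitted line at x = n


# Hypothetical history of prices at which earlier negotiations closed.
history = [70.0, 72.0, 74.0, 76.0]
opening_bid = trend_forecast(history)
print(opening_bid)
```

Opening near the forecast rather than at an arbitrary low price is one simple way the number of messages per negotiation might be reduced.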


Part 13 – Open Issues


Data mining and intelligent agents (1/2)

How to determine safety and soundness in multi-agent systems

How to specify a methodology for developing intelligent (through data mining) multi-agent applications

When and how to perform agent retraining


Data mining and intelligent agents (2/2)

How to develop self-improving agents through DM

How to develop tools and techniques for data mining agent behavior

How to specify DM metrics that take semantics seriously into account

How to evaluate agent “intelligence”


Thank you

Pericles A. Mitkas
Associate Professor
Electrical and Computer Engineering
Aristotle Univ. of Thessaloniki
Email: [email protected]

Andreas L. Symeonidis
Research Associate
Informatics & Telematics Institute
Center for Engineering Research and Technology – Hellas (CERTH)
Email: [email protected]

