Intelligent Internet Agents for Distributed Data Mining
{yzhang, sowen, sprasad, raj}@[email protected]
Yanqing Zhang, Scott Owen, Sushil Prasad and Raj Sunderraman
Department of Computer Science
Georgia State University
George Vachtsevanos
School of Electrical and Computer Engineering
Georgia Institute of Technology
Outline
• Motivation
• Architecture of Intelligent Internet Agents
• Program Libraries of Intelligent Middleware
• Smart Web Search Agents
• Intelligent Soft Computing Agents
• Benefits
• Deliverables
• Conclusion
Motivation
• Distributed Web KDD: Useful information and knowledge mined in distributed Web databases
• QoS (Efficiency, Web Speed, User Time): Huge amounts of useless data flow on the Internet
• From Data Web to Information Web: Upgrade the current data-flow-oriented Internet to a future information-flow-oriented Internet
• Intelligent Web Middleware: with reusable, portable and scalable intelligent functionality
• Smart E-Business: Use intelligent Web agents to do better E-Business on the Internet
Architecture of Intelligent Internet Agents
Application Layer: E-Commerce, E-Education, other E-B
Intelligent Layer: Data Mining, Soft Computing, ES, etc
Network Layer: Backbone, gigaPoPs, other hardware
Program Libraries of Intelligent Middleware
1. Binary Association Rule Generator
2. Fuzzy Association Rule Generator
3. Neural-Net-based Data Classifier and Pattern Generator
4. Fuzzy c-means Program for Data Clustering
5. Genetic Algorithms for Data Refinement and Optimization
6. Granular Neural Nets for Linguistic Data Mining
7. XML-based Smart Web Search Sub-Programs
8. Connection Programs between Database and Middle Layer
9. Local Cache Database Manager
10. Local Cache Informationbase Manager
11. Basic GUI Programs
12. Client-Server Creation and Communication Programs
13. Distributed Operation Manager
14. Distributed Data Mining Synchronization
15. Web Customer Log Miner
... and so on.
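To make the first library entry concrete, here is a minimal sketch of what a binary association rule generator might compute. It is an illustration only, not the authors' implementation: the function name `pair_rules`, the thresholds, and the restriction to rules between two single items are all assumptions.

```python
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) for rules between two single items.

    support(A, B)      = fraction of transactions containing both A and B
    confidence(A -> B) = count(A and B) / count(A)
    """
    n = len(transactions)
    item_count = {}
    pair_count = {}
    for transaction in transactions:
        items = sorted(set(transaction))  # binary view: present or absent
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in combinations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (a, b), count in pair_count.items():
        support = count / n
        if support < min_support:
            continue
        # Test the rule in both directions, since confidence is asymmetric.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_count[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["bread", "butter"],
        ["bread", "milk", "butter"],
        ["milk"],
    ]
    for rule in pair_rules(baskets):
        print(rule)
```

A fuzzy variant (entry 2) would presumably replace the present/absent test with membership degrees, but that is not shown here.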
Smart Web Search Agents
• Data Search Engines >> Information Search Agents
- Traditional searching on the Web is done using one of the following three:
- Directories (Yahoo, Lycos, etc.)
- Search Engines (AltaVista, NorthernLight, etc.)
- Metasearch Engines (MetaCrawler, SavvySearch, AskJeeves, etc.)
All of these involve keyword searches.
Drawback: not easily personalized, too many results (although many give relevancy factors)
- Smart Search Agents will provide
- more personalized searches
- domain-based searches
- more efficient searches
Smart Search Agents will employ
- local cache databases (containing frequently asked queries/results; possibly updated periodically, e.g. nightly)
- local cache information base (containing mined information and discovered knowledge for efficient personal use)
- domain-based agents (e.g. Job Search; Sports-NBA Stats; Bibliography-Digital Libraries)
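The local cache database idea can be sketched as follows. This is a hypothetical illustration, not code from the project: `CachedSearchAgent`, the `backend` callable, and the one-day expiry are assumed names and values standing in for whatever remote search and refresh policy a real agent would use.

```python
import time
from typing import Callable, Dict, List, Optional, Tuple


class CachedSearchAgent:
    """Hypothetical agent that answers repeated queries from a local cache."""

    def __init__(self, backend: Callable[[str], List[str]],
                 max_age_seconds: float = 24 * 3600):
        self.backend = backend                  # assumed remote search function
        self.max_age_seconds = max_age_seconds  # e.g. a nightly refresh
        self.cache: Dict[str, Tuple[float, List[str]]] = {}
        self.backend_calls = 0

    @staticmethod
    def _normalize(query: str) -> str:
        # Treat queries differing only in case or spacing as the same query.
        return " ".join(query.lower().split())

    def search(self, query: str, now: Optional[float] = None) -> List[str]:
        now = time.time() if now is None else now
        key = self._normalize(query)
        entry = self.cache.get(key)
        if entry is not None and now - entry[0] < self.max_age_seconds:
            return entry[1]                     # fresh cached result
        self.backend_calls += 1                 # cache miss or stale entry
        results = self.backend(key)
        self.cache[key] = (now, results)
        return results


if __name__ == "__main__":
    agent = CachedSearchAgent(lambda q: [f"result for {q}"])
    agent.search("NBA  stats")
    agent.search("nba stats")                   # served from the cache
    print(agent.backend_calls)
```

The `now` parameter exists only so that expiry can be exercised without waiting; a deployed agent would more likely refresh entries with a scheduled job.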
Some initial results:
• M. Nagarajan, Metagenie - A metasearch engine for multi-databases, M.S. thesis, GSU (July 1999)
Domains: Jobs, Books
• S. Ahmed, EXACT-FINDER: A cache-based meta-search engine, M.S. thesis, GSU (May 2000)
Local cache database storing personalized frequently asked queries and results, updated periodically
• R. Sunderraman, ReQueSS: Relational Querying of semi-structured data, ICDE 2000 (demo session), San Diego, CA, March 2000.
• X. Li, Querying unified sources of Web data, M.S. thesis, GSU (July 1999)
Data wrappers for Web sources (NBA stats/box scores, DBLP Bibliography database)
Intelligent Tools for E-Business
• Computational Intelligence, Neural Networks, Fuzzy Logic, Genetic Algorithms, Hybrid Systems
• Learning Algorithms, Heuristic Searching
• Data Analysis and Modeling, Data Fusion and Mining, Knowledge Discovery
• Prediction & Time Series Analysis
• Information Retrieval, Intelligent User Interface
• Intelligent Agents, Distributed IA and Multi-Agents, Cooperative Knowledge-based Systems
Enhancing E-Business Process Through Data Mining
• Quality of discovered knowledge
– Having the right data
– Having appropriate data mining tools
[Figure: Data Mining (Knowledge Discovery) applied to several Data Warehouses, yielding Failure Patterns and Success Patterns]
• Traditional Data Mining Tools
– Simple query and reporting
– Visualization driven data exploration tools, OLAP
– Discovery process is user driven
Intelligent Data Mining Tools
• Automate the process of discovering patterns/knowledge in data
• Require hypothesis, exploration
• Derive business knowledge (patterns) from data
• Combine business knowledge of users with results of discovery algorithms
Intelligent Information Agents
• The Data Mining Problem:
– Clustering/Classification
– Association
– Sequencing
• Viewed as an Optimization Problem
• Tools: Genetic Algorithms
Fuzzy Rule Discovery
• Rule discovery: The discovery of associations between business events, e.g. which items are purchased together
• To support flexible querying and intelligent searching, fuzzy queries are developed to uncover potentially valuable knowledge
• A fuzzy query uses fuzzy terms like tall, small, and near to define linguistic concepts and formulate a query
• The automated search for fuzzy rules is carried out by discovering fuzzy clusters or segments in the data
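A fuzzy query of the kind described above can be sketched as a membership function plus a threshold. This is a minimal illustration under stated assumptions: the linear ramp shape, the 170 cm and 190 cm breakpoints for "tall", and the helper names `ramp_up` and `fuzzy_query` are all hypothetical choices, not taken from the slides.

```python
def ramp_up(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def tall(height_cm):
    # Assumed breakpoints: nobody under 170 cm is "tall",
    # everybody at or above 190 cm fully is.
    return ramp_up(height_cm, 170.0, 190.0)


def fuzzy_query(records, membership, key, threshold=0.5):
    """Return (degree, record) pairs whose degree meets the threshold, best first."""
    scored = [(membership(record[key]), record) for record in records]
    matches = [(degree, record) for degree, record in scored if degree >= threshold]
    return sorted(matches, key=lambda pair: pair[0], reverse=True)


if __name__ == "__main__":
    players = [
        {"name": "A", "height": 165},
        {"name": "B", "height": 180},
        {"name": "C", "height": 195},
    ]
    for degree, player in fuzzy_query(players, tall, "height"):
        print(player["name"], degree)
```

Unlike a crisp query such as `height >= 180`, each match carries a degree, so results can be ranked rather than merely filtered.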
Fuzzy Decision Making: Match Users with Dynamic Products, Services, and Pricing
(Risk-Response-Retention (R³) Model)
[Figure: three axes, Loss Ratio (Risk), Response, and Persistency (Retention), each graded Low, Medium, High]
Low Risk, High Response, High Retention
-> Customer: Preferred
Pricing: according to Life-time Value
Cross-Selling: Bundle Extra Liability Insurance
Example of 3 Service Provider’s Features
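The "low risk, high response, high retention -> preferred customer" rule can be sketched as a single fuzzy rule. The sketch assumes each input has been normalized to the range 0 to 1, uses simple piecewise-linear membership functions, and takes the minimum as the fuzzy AND; none of these choices comes from the slides, and the function names are hypothetical.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))


def low(x):
    """Assumed membership: 1 at x = 0, falling to 0 at x = 0.5."""
    return clamp01(1.0 - 2.0 * x)


def medium(x):
    """Assumed membership: peaks at x = 0.5, 0 at both ends."""
    return clamp01(1.0 - abs(2.0 * x - 1.0))


def high(x):
    """Assumed membership: 0 up to x = 0.5, rising to 1 at x = 1."""
    return clamp01(2.0 * x - 1.0)


def preferred_strength(risk, response, retention):
    """Firing strength of: IF risk is low AND response is high
    AND retention is high THEN customer is preferred (min as fuzzy AND)."""
    return min(low(risk), high(response), high(retention))


if __name__ == "__main__":
    # A customer with fairly low loss ratio and strong response/retention.
    print(preferred_strength(risk=0.25, response=0.75, retention=1.0))
```

A fuller model would hold one rule per cell of the Low/Medium/High grid and combine their outputs; only the single rule shown on the slide is sketched here.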
Measuring Performance of Intelligent Agents
• Accuracy : distance or variance measure of IAs’ performance from their goal, i.e. Fuzzy Entropy
• Speed : latency of response
• Cost : resources consumed, consequences of failures
• Benefit : payoff for goals achieved
$$\mathrm{IAP} = w_1 \cdot \mathrm{Accuracy} + w_2 \cdot \mathrm{Speed} + w_3 \cdot \mathrm{Cost} + w_4 \cdot \mathrm{Benefit} + \dots$$
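As a rough illustration of this weighted performance index, the sketch below combines the four measures with weights w1 to w4. The weight values, the 0 to 1 scaling of each measure, and the negative sign on the cost weight (so that higher cost lowers the score) are assumptions made for the example, not values given in the slides.

```python
def agent_performance(measures, weights):
    """Weighted sum of performance measures: sum of w_i * measure_i."""
    missing = set(weights) - set(measures)
    if missing:
        raise KeyError(f"no measurement for: {sorted(missing)}")
    return sum(weights[name] * measures[name] for name in weights)


if __name__ == "__main__":
    # Hypothetical weights; cost is weighted negatively by assumption.
    weights = {"accuracy": 0.5, "speed": 0.25, "cost": -0.25, "benefit": 0.5}
    # Hypothetical measurements, each scaled to the range 0..1.
    measures = {"accuracy": 1.0, "speed": 0.5, "cost": 0.5, "benefit": 0.5}
    print(agent_performance(measures, weights))
```

Keeping the weights in a dictionary leaves room for the further terms implied by the trailing dots in the index.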
Performance Assessment, Learning and Optimization
[Figure: Data Warehouses yield Failure Patterns and Success Patterns; a Performance Evaluation Module compares them with Goals/Objectives and drives Learning/Adaptation]
Examples
• Product Information Clustering
– Use a GA as the Heuristic Search Engine
– Apply the GA selection and inversion operators
– Evaluate information content
– Estimate system entropy
– Apply reinforcement learning strategy
• Dynamic Pricing
– In addition to above steps, explore association and sequencing relations
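The clustering steps above can be sketched with a small genetic algorithm. This is a hedged illustration, not the authors' method: it assumes products are described by keyword lists, a chromosome is a list of cluster labels, fitness is the size-weighted Shannon entropy of keywords within each cluster (lower is better), selection is by tournament, and the best individual is carried over unchanged. The reinforcement learning step is not covered.

```python
import math
import random
from collections import Counter


def weighted_entropy(assignment, items, k):
    """Sum over clusters of (keyword count) * (keyword entropy in bits)."""
    total = 0.0
    for cluster in range(k):
        words = Counter(
            word
            for label, keywords in zip(assignment, items)
            if label == cluster
            for word in keywords
        )
        n = sum(words.values())
        if n == 0:
            continue
        entropy = -sum((m / n) * math.log2(m / n) for m in words.values())
        total += n * entropy
    return total


def tournament(population, scores, rng, size=3):
    """Selection: copy the lowest-entropy individual among `size` random picks."""
    contenders = rng.sample(range(len(population)), size)
    best = min(contenders, key=lambda i: scores[i])
    return list(population[best])


def invert(chromosome, rng):
    """Inversion: reverse the labels between two random positions."""
    i, j = sorted(rng.sample(range(len(chromosome)), 2))
    chromosome[i:j + 1] = reversed(chromosome[i:j + 1])
    return chromosome


def ga_cluster(items, k=2, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    n = len(items)
    population = [[rng.randrange(k) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [weighted_entropy(ind, items, k) for ind in population]
        elite = list(population[min(range(pop_size), key=lambda i: scores[i])])
        children = [elite]  # elitism: the best individual always survives
        while len(children) < pop_size:
            children.append(invert(tournament(population, scores, rng), rng))
        population = children
    scores = [weighted_entropy(ind, items, k) for ind in population]
    best = min(range(pop_size), key=lambda i: scores[i])
    return population[best], scores[best]


if __name__ == "__main__":
    products = [
        ("laptop", "cpu"), ("laptop", "ram"), ("cpu", "ram"),
        ("shoe", "leather"), ("shoe", "lace"), ("lace", "leather"),
    ]
    labels, score = ga_cluster(products)
    print(labels, score)
```

Because inversion only reorders labels within a chromosome, the search is limited to rearrangements of the label counts present in the initial population; a practical version would likely add mutation or crossover.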
The “New Technology” Paradigm
[Figure: Internet-Related Technologies over Time: Euphoria/Optimism, Reality, Back to Basics]
INFORMATION IS SELLING NOW!
Intelligent Agents will give your information product bargaining power
Benefits
• Better QoS:
- Web users get information (not raw data)
- Smart agents can make decisions for users
- Smart agents can save users’ surfing time
• Faster Internet:
- Information flows on the Internet quickly (e.g., 1k information << 100 k raw data)
- Reduce data redundancy on the Internet
- Reduce Web communication congestion
Deliverables
• Intelligent Middle Layer
- Data Mining Program Libraries
- Soft Computing Program Libraries (e.g., Neural Networks, Fuzzy Logic, Genetic Algorithms, Neuro-fuzzy Systems)
• Application Layer
- Smart Web Search Agents
- Intelligent Soft Computing Agents
Conclusion
• To make the future Internet more intelligent and more efficient, it is necessary to design relevant "Intelligent Middleware" between network hardware and high-level Web application systems.
• We will first design a basic intelligent middle layer with basic intelligent functionality, and then implement two Web application systems for distributed data mining and E-Business.