www.monash.edu.au
Advanced Topics in Data Mining and Research Directions
CSE5610 Intelligent Software Systems
Semester 1, 2006
Outline
• Mining Different Data Types
– Spatial, Temporal, Time Series, Data Streams, Multimedia, XML, Web, Text etc.
• Distributed Data Mining (DDM)
• Mobile & Ubiquitous Data Mining (UDM)
• Data Mining E-Services
• Anytime, Anywhere Data Mining E-Services
Generations of Data Mining
• Four Generations of Data Mining Systems – Robert Grossman
• First Generation
– Stand Alone, Centralised, Single Algorithm
• Second Generation
– Integration with databases, support for high-dimensionality, complex data types
• Third Generation
– Distribution and Heterogeneity
• Fourth Generation
– Support for mining embedded, mobile and ubiquitous data sources
Distributed Data Mining
• Inherently distributed data
• Multinational companies (MNCs) + global markets
• => Physical/geographical separation of users from the data sources
• Traditional data mining model involving the co-location of users, data and computational resources is inadequate
Distributed Data Mining (DDM)
• The inherent distribution of data and other resources as a result of organisations being distributed.
• The large volumes of data, the transfer of which results in exorbitant communication costs.
• The need to mine heterogeneous data, the integration of which is both non-trivial and expensive.
• The performance and scalability bottlenecks of data mining.
Distributed Data Mining (DDM)
• DDM = Data Mining (DM) + Knowledge Integration (KI)
• DM - Performing traditional knowledge discovery at each distributed data site.
• KI - Merging the results generated from the individual sites into a body of cohesive and unified knowledge.
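The DM + KI split above can be illustrated with a minimal sketch. It assumes that each site mines itemset counts locally and that knowledge integration simply sums those counts; the function names `mine_local` and `integrate`, the two sites and the support threshold are all hypothetical and do not come from any of the systems named in these slides.

```python
from collections import Counter
from itertools import combinations


def mine_local(transactions, max_size=2):
    """DM step: count itemsets (up to max_size items) at one data site."""
    counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    return counts, len(transactions)


def integrate(site_results, min_support):
    """KI step: merge per-site counts and keep globally frequent itemsets."""
    total_counts = Counter()
    total_rows = 0
    for counts, rows in site_results:
        total_counts.update(counts)
        total_rows += rows
    return {
        itemset: count / total_rows
        for itemset, count in total_counts.items()
        if count / total_rows >= min_support
    }


# Two hypothetical sites; only the counts leave each site, not the raw data.
site_a = [["bread", "milk"], ["bread", "jam"], ["milk"]]
site_b = [["bread", "milk"], ["bread", "milk", "jam"]]

site_results = [mine_local(site_a), mine_local(site_b)]
frequent = integrate(site_results, min_support=0.5)
for itemset, support in sorted(frequent.items()):
    print(itemset, round(support, 2))
```

Summing counts is only one possible form of knowledge integration; merging classifiers or cluster models from different sites generally needs more than addition.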
Parallel Data Mining (PDM)
• Principal distinction between DDM and parallel DM: parallel mining involves parallel processors, with or without shared memory
• Parallel data mining also includes development of parallel versions of traditional data mining techniques.
• The two can be integrated, e.g. DecisionCentre
DDM – Algorithms & Architectures
• Research in distributed data mining can be divided into two broad categories [Fu01]:
• Data Mining Algorithms
– focus on efficient techniques for knowledge integration
• Distributed Data Mining Architectures
– focus on development of distributed data mining architectures
– emphasizes the processes and technologies that support construction of software systems to perform distributed data mining
Taxonomy of DDM Architectures
[Figure: taxonomy tree of Distributed Data Mining Systems. Architectures: Client-Server and Agents. Agents: Stationary and Mobile. Further label: Self-directed migration.]
Classification – DDM Systems
DDM Architectural Model | DDM Systems
Client-server | DecisionCentre [CDG99], IntelliMiner [PaS99, PaS01], InterAct [PaD02]
Agents (Mobile Agent, Stationary Agent) | JAM [SPT97], Infosleuth [UMG98, MUU99], BODHI [KPH99], Papyrus [Ram98], PADMA [KHS97a, KHS97b]
Client-Server DDM
[Figure: client-server DDM. Labels: Users (PC, Workstation, Laptop), Data Mining Server, Data Server 1, Data Server 2, Data Transfer, User Data Mining Request, Data Mining Results.]
Mobile Agent Model for DDM
[Figure: mobile agent model for DDM. Labels: Users (PC, Workstation, Laptop), Agent System, Task Controlling Agent, Directory Service, Data Resource Agents, Data Mining Agents, Data Mining Result Agents, Knowledge Integration Agent, Data Servers.]
Hybrid Model for DDM
[Figure: hybrid model for DDM. Labels: DDM Server, Optimiser, Agent Centre, Client Server, Agent, Data Source 1, Data Source 2, Data Source n.]
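One way to picture the Optimiser in the hybrid model is as a component that estimates the response time of each strategy and picks the cheaper one for each data source. The sketch below is illustrative only: the cost formulas, the `DataSource` fields and the default agent and result sizes are assumptions, not the cost model of any system cited in these slides.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    data_mb: float        # size of the data set held at the source
    link_mb_per_s: float  # link speed between the source and the DDM server
    host_mb_per_s: float  # mining throughput of the source's own host


def client_server_time(src, server_mb_per_s):
    """Ship the data to the DDM server, then mine it there."""
    return src.data_mb / src.link_mb_per_s + src.data_mb / server_mb_per_s


def mobile_agent_time(src, agent_mb=0.5, result_mb=0.1):
    """Ship a mining agent to the data, mine locally, ship the results back."""
    transfer = (agent_mb + result_mb) / src.link_mb_per_s
    return transfer + src.data_mb / src.host_mb_per_s


def choose_strategy(src, server_mb_per_s=50.0):
    """Return the strategy with the lower estimated response time."""
    cs = client_server_time(src, server_mb_per_s)
    ma = mobile_agent_time(src)
    return ("client-server", cs) if cs <= ma else ("mobile-agent", ma)


# Hypothetical sources: a large data set behind a slow link,
# and a small data set on a weak host behind a fast link.
slow_link = DataSource("branch-office", data_mb=1000, link_mb_per_s=1, host_mb_per_s=10)
fast_link = DataSource("head-office", data_mb=100, link_mb_per_s=100, host_mb_per_s=1)

for source in (slow_link, fast_link):
    strategy, seconds = choose_strategy(source)
    print(f"{source.name}: {strategy} (~{seconds:.1f} s)")
```

Under these assumed numbers, moving the agent wins when the link is the bottleneck, and moving the data wins when the remote host is the bottleneck.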
Ubiquitous Data Mining
Ubiquitous Data Mining (UDM)
• Mining data in a resource-constrained environment to support the time critical information needs of mobile users
• Typical Characteristics
– Mobile User – frequent disconnections
– Handheld Device
> Resource constraints – memory, battery, processor, screen real-estate
– Time critical
– Real-time & On-line – Data Streams
• Example Scenarios
• Many Challenges
Current Research
• Kargupta’s Group
– MobiMine
• @CSSE, Monash Univ.
– AgentUDM
– Adaptive, Cost-efficient & Light-weight data mining techniques for data streams
> Mohamed Medhat
> LWC, LWF & LWClass
> Watch this space!!!
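To give a feel for what light-weight, one-pass mining of a data stream under a memory bound can look like, here is a generic threshold-based clustering sketch. It is not the LWC algorithm named above; the class name, the distance threshold and the cluster cap are assumptions chosen for illustration.

```python
import math


class OnePassClusterer:
    """Single-pass clustering that stores only centres and counts."""

    def __init__(self, threshold, max_clusters):
        self.threshold = threshold        # max distance to join a cluster
        self.max_clusters = max_clusters  # memory bound on stored centres
        self.centres = []
        self.weights = []

    def _nearest(self, point):
        best, best_dist = None, math.inf
        for i, centre in enumerate(self.centres):
            d = math.dist(point, centre)
            if d < best_dist:
                best, best_dist = i, d
        return best, best_dist

    def add(self, point):
        """Process one stream element and return its cluster index."""
        idx, dist = self._nearest(point)
        has_room = len(self.centres) < self.max_clusters
        if idx is None or (dist > self.threshold and has_room):
            self.centres.append(list(point))
            self.weights.append(1)
            return len(self.centres) - 1
        # Absorb the point into the nearest cluster (incremental mean).
        self.weights[idx] += 1
        w = self.weights[idx]
        self.centres[idx] = [c + (x - c) / w for c, x in zip(self.centres[idx], point)]
        return idx


# Hypothetical stream of 2-D readings.
stream = [(0.0, 0.0), (0.2, 0.0), (5.0, 5.0), (5.2, 5.0), (0.1, 0.1)]
clusterer = OnePassClusterer(threshold=1.0, max_clusters=2)
labels = [clusterer.add(p) for p in stream]
print(labels, clusterer.weights)
```

Each element is seen once and then discarded, so memory use depends on `max_clusters` rather than on the length of the stream. On a handheld device, the threshold could be loosened as memory or battery runs low, trading accuracy for resources.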
Data Mining E-Services
• “…data analysis and mining functions themselves will be offered as business intelligence e-services that accept operational data from clients and return models or rules”
Umesh Dayal, 2001
• Why?
– Knowledge is a key resource
– Cost of data mining infrastructure
Data Mining E-Services
• Current Commercial Landscape
– Several ASPs -> DigiMine, Information Discovery, WhiteCross Systems, ListAnalyst.com etc.
– Mode of Operation
• Hybrid Model & Data Mining ASPs
– Optimise Response Time
> Leads to improved throughput
– QoS Estimation
– Location Preferences of Clients
Anytime, Anywhere Data Mining E-Services
My Thoughts
• Data is a commodity, Analysis is a service
• Access anytime, anywhere
• By anyone…
– From large corporations to small business to individuals
• From home buyers to mobile salespersons to grocery shoppers…
My Thoughts
• A preliminary model for delivery
– Datacentric Grids
[Figure: preliminary delivery model. Components: High Performance Servers, Mining Algorithms, Model Repository, Mobile Agent Management System, Data Repository (Data 1, Data 2, Data n), Private Datacentric Grid, Datacentric Grid Management Module. Request types: Model Query; Compute New Model Request; Compute New Model Request + User Data; Compute New Model Request + Remote User Data; Compute New Model Request + User Computation; Compute New Model Request + User Data + User Computation.]
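Two of the request types in the delivery model, "Model Query" and "Compute New Model Request + User Data", can be sketched as a small service backed by a model repository. The class, the method names and the toy mining algorithm below are hypothetical and only illustrate the request flow, not an actual e-service interface.

```python
from collections import Counter


class MiningService:
    """Hypothetical e-service: serves stored models or builds new ones."""

    def __init__(self):
        self.model_repository = {}

    def query_model(self, name):
        """'Model Query': return a stored model, or None if there is none."""
        return self.model_repository.get(name)

    def compute_new_model(self, name, user_data, algorithm):
        """'Compute New Model Request + User Data': mine, store and return."""
        model = algorithm(user_data)
        self.model_repository[name] = model
        return model


def majority_class(rows):
    """Toy mining algorithm: always predict the most common label."""
    labels = Counter(label for _, label in rows)
    return {"type": "majority", "predict": labels.most_common(1)[0][0]}


# Hypothetical client data: (features, label) pairs.
shopper_data = [({"age": 30}, "buy"), ({"age": 45}, "buy"), ({"age": 22}, "skip")]

service = MiningService()
before = service.query_model("shoppers")   # nothing stored yet
model = service.compute_new_model("shoppers", shopper_data, majority_class)
after = service.query_model("shoppers")    # now served from the repository
print(before, model, after is model)
```

Passing the algorithm in as a function loosely mirrors the "+ User Computation" variants, where the client supplies the computation as well as the data.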
References
• http://www.csse.monash.edu.au/projects/MobileComponents/projects/dame/
• http://www.csse.monash.edu.au/~shonali/research.html
• http://www.csee.umbc.edu/~hillol/DDMBIB/
• http://www.csee.umbc.edu/~hillol/diadic.html
• http://www.csse.monash.edu.au/~mgaber/main.html