Institut für Softwarewissenschaft - Universität Wien
Toward Knowledge Discovery in Databases Attached to Grids
Peter Brezany
Institute for Software Science
University of Vienna
E-mail : [email protected]
Media That Radically Influenced Society
1500s Printing Press
1840s Penny Post
1850s Telegraph
1920s Telephone
1930s Radio
1950s TV
1990s Web
20xx Grid
Talk Outline
• Data Mining on the Grid – Background Information
• Application Examples
• Architecture of a Traditional Data Mining System
• GridMiner – A framework for Data Mining on the Grid
• GridMiner Architecture
• Functional and Data Access Model
• Conclusions
Data Mining on the Grid
• Data mining on the Grid (DMG): finding unknown data patterns in an environment with geographically distributed data and computation.
• Data may be highly heterogeneous and updated frequently.
• A good DMG algorithm analyzes data in a distributed fashion with modest data communication overhead.
• A typical DMG algorithm involves local data analysis followed by the generation of a global data model.
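The two-phase pattern in the last bullet (local analysis, then a global model) can be illustrated with a deliberately small sketch. It assumes each site holds numeric values and that the "global model" is just an overall mean and standard deviation. The site names, the data, and the functions `analyze_locally` and `build_global_model` are hypothetical and are not part of GridMiner. Each site sends only three numbers, which keeps communication overhead modest.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class LocalSummary:
    """Sufficient statistics a site sends instead of its raw records."""
    n: int
    total: float
    total_sq: float


def analyze_locally(values):
    """Local step: reduce one site's records to three numbers."""
    return LocalSummary(
        n=len(values),
        total=sum(values),
        total_sq=sum(v * v for v in values),
    )


def build_global_model(summaries):
    """Global step: merge the site summaries into one mean and std. deviation."""
    n = sum(s.n for s in summaries)
    total = sum(s.total for s in summaries)
    total_sq = sum(s.total_sq for s in summaries)
    mean = total / n
    variance = total_sq / n - mean * mean  # population variance
    return {"n": n, "mean": mean, "std": sqrt(max(variance, 0.0))}


# Hypothetical data held at three geographically separate sites.
sites = {
    "site_a": [2.0, 4.0],
    "site_b": [4.0, 4.0, 5.0],
    "site_c": [5.0, 7.0, 9.0],
}
model = build_global_model([analyze_locally(v) for v in sites.values()])
print(model)
```

The merged result is identical to what a centralized computation over all eight values would give, although no raw record leaves its site. Real DMG algorithms build richer models, but the same split between a local reduction and a global merge applies.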
Application Examples
• Determining how the emergence of hepatitis C depends on weather patterns: this requires access to a large hepatitis C DB at one location and an environmental DB at another location.
• Two major financial organizations want to cooperate. They need to share data patterns relevant to the data mining task, but they do not want to share the data itself because it is sensitive, so combining the databases may not be feasible.
• Federating Brain Data Project – integrating several neuroscience DBs
• A major multinational corporation wants to analyze customer transaction records in order to develop successful business strategies quickly.
- It has thousands of establishments throughout the world.
- Collecting all the data in a centralized data warehouse and then analyzing it with existing commercial data mining software takes too long.
Telemedical Applications
AMG – Austrian Medical Grid
Figure labels: Web; Raw Medical Data; Reconstructed Medical Data; Derived Medical Data; Database (two instances)
Telemedical Collaboration - Example
A patient living in a remote village has a heart problem.
An EEG is taken by the local doctor, and all the patient's details are stored in the doctor's PC-based telemedical system.
MRI and CT scans are taken in different departments of a general hospital and stored in the telemedical DB. A consultant compiles a report and saves it in the DB.
If necessary, a 3D ultrasound scan is taken in a specialized clinic and a further report is compiled.
If complicated surgery is required, an external specialist using virtual reality techniques defines how the surgery should be planned. The resulting operation is recorded on video, e.g., for education.
Data mining support/assistance is needed.
Architecture of a Data Mining System
Graphical user interface
Pattern evaluation
Data mining engine
Database or data warehouse server
Knowledge base
Database; Data warehouse
Filtering; Data cleaning, data integration
On-Line Analytical Mining (OLAM)
GridMiner – A Framework for Data Mining on Grids
System requirements:
- Algorithm and data publishing and integration
- Compatibility with Grid infrastructure and Grid awareness
- Openness
- Scalability
- Security and data privacy
Functionality requirements:
- Mining different kinds of knowledge in databases
- Incremental data mining algorithms
- Interactive mining of knowledge at multiple levels of abstraction
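To make the "incremental data mining algorithms" requirement concrete, here is a minimal sketch of one possible incremental learner: a categorical naive Bayes classifier whose counts are updated record by record, so earlier data never has to be rescanned. The choice of naive Bayes, the class `IncrementalNaiveBayes`, and the attribute names are assumptions made for illustration only. The slides do not say which incremental algorithms GridMiner uses.

```python
from collections import defaultdict
from math import log


class IncrementalNaiveBayes:
    """Categorical naive Bayes whose counts are updated one record at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # (label, attribute, value) -> number of records seen
        self.value_counts = defaultdict(int)
        # attribute -> set of values seen so far (used for Laplace smoothing)
        self.domains = defaultdict(set)

    def update(self, record, label):
        """Absorb one new labelled record without rescanning earlier data."""
        self.class_counts[label] += 1
        for attribute, value in record.items():
            self.value_counts[(label, attribute, value)] += 1
            self.domains[attribute].add(value)

    def predict(self, record):
        """Return the label with the highest smoothed log-probability."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for label, count in self.class_counts.items():
            score = log(count / total)
            for attribute, value in record.items():
                seen = self.value_counts.get((label, attribute, value), 0)
                domain_size = max(len(self.domains[attribute]), 1)
                score += log((seen + 1) / (count + domain_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalNaiveBayes()

# First batch of (hypothetical) records.
model.update({"income": "high", "debt": "low"}, "good")
model.update({"income": "low", "debt": "high"}, "poor")
prediction_before = model.predict({"income": "high", "debt": "low"})

# A later batch arrives; only the counts change.
model.update({"income": "high", "debt": "high"}, "poor")
model.update({"income": "high", "debt": "high"}, "poor")
prediction_after = model.predict({"income": "high", "debt": "high"})
print(prediction_before, prediction_after)
```

Because the model state is just a set of counters, updates from new data cost time proportional to the new records only, which is the property the requirement asks for.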
GridMiner (Layered) Architecture (based on K.F. Jeffery's idea)
Functional and Data Access Model
MDS
Example: Mining Patterns for Data Classification and Associations

use database dat1, dat2
mine classifications
analyze credit_rating
using g_parsimony
display as tree

use database DBs attributes
mine associations
using method attributes
display as rules
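As a rough illustration of how such mining queries could be processed, the sketch below splits a query into its clauses. It assumes one clause per line and only the five clause keywords that appear in the two examples. The function `parse_query` and the resulting dictionary layout are hypothetical and do not reproduce GridMiner's actual query processor.

```python
# Clause keywords taken from the two example queries; the set is an assumption.
CLAUSES = ("use database", "mine", "analyze", "using", "display as")


def parse_query(text):
    """Split a query into {clause keyword: argument string}."""
    query = {}
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        for keyword in CLAUSES:
            if line == keyword or line.startswith(keyword + " "):
                query[keyword] = line[len(keyword):].strip()
                break
        else:
            # No keyword matched this line.
            raise ValueError(f"unrecognized clause: {line!r}")
    return query


classification_query = """
use database dat1, dat2
mine classifications
analyze credit_rating
using g_parsimony
display as tree
"""

parsed = parse_query(classification_query)
databases = [name.strip() for name in parsed["use database"].split(",")]
print(databases, parsed["mine"], parsed["display as"])
```

A dictionary of clauses like this could then be handed to whatever component selects the mining algorithm and the data sources; that dispatch step is left out here because the slides do not describe it.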
Knowledge Grid Architecture Layers
High level layer: Data Access Service; Tools and Algorithms Access Service; Execution Plan Management; Result Presentation Service
Core layer: Knowledge Directory Service; Resource Allocation and Execution Management
Generic Grid and Data Grid Services
Conclusions
• Grid data mining is a relevant research topic
• The GridMiner approach may contribute to this research domain
• Collaborations are needed
• IPG (Information Power Grid) is the only Grid project that aims to address knowledge discovery issues
• Looking for pilot application(s)
• Open issues
- basic Grid technology: Globus, DataGrid, Jini, JXTA?
Data Storage and the Components
Site A Site B Site C Site D
Preprocessing Preprocessing Preprocessing Preprocessing
Local DM Local DM Local DM Local DM
Construction of the Global Model
GUI Site E
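The data flow in this diagram (per-site preprocessing, per-site local DM, then construction of a global model) can be sketched as a small pipeline. The example assumes market-basket style transactions and uses co-occurring item pairs as the mined pattern. The functions `preprocess`, `local_dm`, and `construct_global_model`, the sample data, and the support threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def preprocess(transactions):
    """Site-local cleaning: normalize item names, drop empties and duplicates."""
    cleaned = []
    for transaction in transactions:
        items = {item.strip().lower() for item in transaction if item and item.strip()}
        if items:
            cleaned.append(sorted(items))
    return cleaned


def local_dm(transactions):
    """Site-local mining: count how often each item pair occurs together."""
    counts = Counter()
    for items in transactions:
        counts.update(combinations(items, 2))
    return len(transactions), counts


def construct_global_model(local_results, min_support):
    """Merge the site counts and keep pairs that are frequent overall."""
    total = sum(n for n, _ in local_results)
    merged = Counter()
    for _, counts in local_results:
        merged.update(counts)
    return {pair: c / total for pair, c in merged.items() if c / total >= min_support}


# Hypothetical raw data stored at Sites A to D.
raw_data = {
    "Site A": [["Bread", "milk"], ["bread", "Milk", "eggs"]],
    "Site B": [["milk", "eggs"], ["bread", "milk "]],
    "Site C": [["bread", ""], ["eggs", "milk"]],
    "Site D": [[], ["bread", "milk", "eggs"]],
}

local_results = [local_dm(preprocess(data)) for data in raw_data.values()]
global_model = construct_global_model(local_results, min_support=0.5)

# The merged model is what a GUI at another site would display.
for pair, support in sorted(global_model.items()):
    print(pair, round(support, 3))
```

In this sketch the sites send unpruned pair counts, so the global supports are exact. Pruning locally would reduce traffic further but can miss pairs that are frequent only in aggregate, which is one of the trade-offs a DMG algorithm has to manage.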