Post on 28-Nov-2014
Knowledge discovery & data mining
Towards KD Support Environments
Fosca Giannotti and Dino Pedreschi
Pisa KDD Lab, CNUCE-CNR & Univ. Pisa
http://www-kdd.di.unipi.it/
A tutorial @ EDBT2000
Konstanz, 27-28.3.2000 EDBT2000 tutorial - KDSE 2
Module outline
Data analysis and KD Support Environments
Data mining technology trends: from tools … to suites … to solutions
Towards data mining query languages
DATASIFT: a logic-based KDSE
Future research challenges
Vertical applications
We outlined three classes of vertical data analysis applications that can be tackled using KDD & DM techniques:
  Fraud detection
  Market basket analysis
  Customer segmentation
Why are these applications challenging?
They require manipulation and reasoning over knowledge and data at different abstraction levels:
  conceptual
    semantic integration of domain knowledge, expert (business) rules and extracted knowledge
    semantic integration of different analysis paradigms
  logical/physical
    interoperability with external components: DBMSs, data mining tools, desktop tools
    querying/mining optimization: loose vs. tight coupling between query language and specialized mining tools
Why are these applications challenging?
The associated KDD process needs to be carefully specified, tuned and controlled
[Figure: the KDD process. Steps: Data Consolidation, Selection and Preprocessing, Data Mining, Interpretation and Evaluation. Artifacts: Data Sources, Consolidated Data (Warehouse), Prepared Data, Patterns & Models (e.g. p(x)=0.02), Knowledge]
Why are these applications challenging?
Still not properly supported by available KDD technology
  what is offered: horizontal, customizable toolkits/suites of data mining primitives
  what is needed: KD support environments for vertical applications
Data mining vs. traditional software development process
Traditional
  Focus on knowledge transfer, design and coding
  30% - analysis and design
  70% - program design, coding and testing
  Prototyping - expensive
  Development process has few loops
  Maintenance requires human analysis
Data mining
  Focus on data selection, representation and search
  70% - data preparation
  30% - model generation and testing
  Prototyping - cheap
  Development process is inherently iterative
  Maintenance requires re-learning model
From R. Agrawal’s invited lecture @ KDD’99
The greatest peril in the development of a high-tech market lies in making the transition from an early market dominated by a few visionaries to a mainstream market dominated by pragmatists.
[Figure: the chasm between the Early Market and the Mainstream Market]
Is data mining in the chasm?
Perceived to be sophisticated technology, usable only by specialists
Long, expensive projects
Stand-alone, loosely-coupled with data infrastructures
Difficult to infuse into existing mission-critical applications
Generation 1: data mining tools
~1980: first generation of DM systems
research-driven tools for single tasks, e.g.
  build a decision tree - say C4.5
  find clusters - say Autoclass (Cheeseman 88)
  …
Difficult to use more than one tool on the same data – lots of data/metadata transformation
Intended user: a specialist, technically sophisticated.
Generation 2: data mining suites
~1995: second generation of DM systems
toolkits for multiple tasks with support for data preparation and interoperability with DBMS, e.g.
  SPSS Clementine
  IBM Intelligent Miner
  SAS Enterprise Miner
  SFU DBMiner
Intended user: data analyst – suites require significant knowledge of statistics and databases
Growth of DM tools (source: kdnuggets.com)
From G. Piatetsky-Shapiro. The data-mining industry coming of age. IEEE Intelligent Systems, Dec. 1999.
Generation 3: data mining solutions
Beginning at the end of the 1990s
vertical data mining-based applications and solutions oriented to solving one specific business problem, e.g.
  detecting credit card fraud
  customer retention
  …
Address the entire KDD process, and push the result into a front-end application
Intended user: business user – the interfaces hide the data mining complexity
Emerging short-term technology trends
Tighter interoperability by means of standards which facilitate the integration of data mining with other applications:
  KDD process: e.g., the Cross-Industry Standard Process for Data Mining model (www.crisp-dm.org)
  representation of mining models: e.g., the PMML - predictive modeling markup language (www.dmg.org)
  DB interoperability: the Microsoft OLE DB for data mining interface
Approaches in data mining suites
Database-oriented approach: IBM Intelligent Miner
OLAP-based mining: DBMiner - Jiawei Han’s group @ SFU
Machine learning: CART, ID3/C4.5/C5.0, Angoss Knowledge Studio
Statistical approaches: The SAS Institute Enterprise Miner
Visualization approach: SGI MineSet, VisDB (Keim et al. 94)
Other approaches in data mining suites
Neural network approach: Cognos 4thoughts, NeuroRule (Lu et al. ’95)
Deductive DB integration: KnowledgeMiner (Shen et al. ’96), Datasift (Pisa KDD Lab - see refs)
Rough sets, fuzzy sets: Datalogic/R, 49er
Multi-strategy mining: INLEN, KDW+, Explora
SFU DBMiner: OLAP-centric mining
[Screenshot: DBMiner user interface, with callouts Warehouse, Workplace, Active Object, Elements]
IBM Intelligent Miner – DB-centric mining
[Screenshot: Intelligent Miner user interface, with callouts Mining Base Container, Contents Container, Work Area]
IBM – IM architecture
Angoss Knowledge Studio: ML-centric mining
[Screenshot: Knowledge Studio user interface, with callouts Project Outline, Work Area, Additional Visualizations]
KS project outline tool
(Limited) support to the KDD process
Support for data consolidation step
DBMiner
  ODBC databases – SQL + SmartDrives
  Single database – multiple tables
  Consolidation of heterogeneous sources unsupported
Intelligent Miner
  DB2 and text – SQL without SmartDrives
  Multiple databases
  Consolidation of heterogeneous sources supported
Knowledge Studio
  ODBC databases and text
  Single table
  Consolidation of heterogeneous sources unsupported
Support for selection and preprocessing
DBMiner
  SQL only
Intelligent Miner
  SQL + standard and advanced statistical functionalities
Knowledge Studio
  descriptive statistics
Support for data mining step
DBMiner
  Association rules
  Decision trees
  Prediction
Intelligent Miner
  Association rules
  Sequential patterns
  Clustering
  Classification
  Prediction
  Similar time series
Knowledge Studio
  Decision trees
  Clustering
  Prediction
Support for interpretation and evaluation
Predefined interestingness measures
Emphasis on visualization
Limited export capability of analysis results
Gain charts for comparison of predictive models (KS and IM)
Limited model combination capabilities (KS)
Data Mining Query Languages
A DMQL can provide the ability to support ad-hoc and interactive data mining
Hope: achieve the same effect that SQL had on relational databases.
Various proposals:
  DMQL (Han et al 96)
  mine operator (Meo et al 96)
  M-SQL (Imielinski et al 99)
  query flocks (Tsur et al 98)
MINE operator of (Meo et al 96)
References - DMQL
J. Han, Y. Fu, W. Wang, K. Koperski, and O. R. Zaiane. DMQL: A Data Mining Query Language for Relational Databases. In Proc. 1996 SIGMOD'96 Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD'96), pp. 27-33, Montreal, Canada, June 1996.
R. Meo, G. Psaila, S. Ceri. A New SQL-like Operator for Mining Association Rules. In Proc. VLDB96, 1996 Int. Conf. Very Large Data Bases, Bombay, India, pp. 122-133, Sept. 1996.
T. Imielinski and A. Virmani. MSQL: a query language for database mining. Data Mining and Knowledge Discovery, 3:373-408, 1999.
S. Tsur, J. Ullman, S. Abiteboul, C. Clifton, R. Motwani, S. Nestorov. Query flocks: a generalization of association rule mining. In Proc. 1998 ACM-SIGMOD, p. 1-12, 1998.
DATASIFT - towards a logic-based KDSE
DATASIFT is LDL++ (Logic Data Language, MCC & UCLA) extended with mining primitives (decision trees & association rules)
LDL++ syntax: Prolog-like deductive rules
LDL++ semantics: SQL extended with recursion (and more)
Integration of deduction and induction
Employed to systematically develop the methodology for MBA and audit planning
See Pisa KDD Lab references
Our position
A suitable integration of
  deductive reasoning (logic database languages)
  inductive reasoning (association rules & decision trees)
provides a viable solution to high-level problems in knowledge-intensive data analysis applications
Our goal
Demonstrate how we support design and control of the overall KDD process and the incorporation of background knowledge
  data preparation
  knowledge extraction
  post-processing and knowledge evaluation
  business rules
  autofocus data mining
With respect to other DMQL’s
extending logic query languages yields extra expressiveness, needed to bridge the gap between
  data mining (e.g., association rule mining)
  vertical applications (e.g., market basket analysis)
Architecture - client agent
User interface
Access to business rules and visualization of results through
  web browser to control interaction
  MS Excel objects (sheets and charts) to represent output of analysis (association rules)
Architecture - server agent
A query engine (mediator)
  record previous analyses
  metadata/meta knowledge
  interaction with other components
LDL++ server extended with external calls to DBMSs and to …
Inductive modules
  Apriori
  classifiers (decision trees)
Coupling with DBMS using the Cache-mine approach
  Performance comparable with SQL-based approaches on same mining queries (Giannotti et al 2000)
Deductive rules in LDL++
E.g.: select transactions involving milk
milk_basket(T,I) <- basket(T,I), basket(T,milk).
Querying ?- milk_basket(T,I)
milk_basket(2,bread). milk_basket(3,bread).
milk_basket(2,milk). milk_basket(3,orange).
milk_basket(2,onions). milk_basket(3,milk).
milk_basket(2,fish).
A small database of cash register transactions
basket(1,fish). basket(2,bread). basket(3,bread).
basket(1,bread). basket(2,milk). basket(3,orange).
basket(2,onions). basket(3,milk).
basket(2,fish).
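As a rough illustration of what the milk_basket rule computes, the following Python sketch evaluates the same self-join over the basket facts shown on the slide. The Python encoding (a set of (transaction, item) tuples and a function named milk_basket) is an assumption made for illustration only; it is not how LDL++ or DATASIFT evaluates rules.

```python
# Basket facts from the slide, encoded as (transaction_id, item) pairs.
BASKET = {
    (1, "fish"), (1, "bread"),
    (2, "bread"), (2, "milk"), (2, "onions"), (2, "fish"),
    (3, "bread"), (3, "orange"), (3, "milk"),
}


def milk_basket(basket):
    """Mimic: milk_basket(T,I) <- basket(T,I), basket(T,milk)."""
    # Transactions that contain milk (the second body literal).
    milk_transactions = {t for (t, item) in basket if item == "milk"}
    # All facts of those transactions (the first body literal, joined on T).
    return {(t, item) for (t, item) in basket if t in milk_transactions}


if __name__ == "__main__":
    for fact in sorted(milk_basket(BASKET)):
        print("milk_basket%s" % (fact,))
```

Transaction 1 contains no milk, so none of its items appear in the answer, which matches the query result listed on the slide.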
Aggregates in LDL++
E.g.: count occurrences of pairs of distinct items in all transactions
pair(I1,I2,count<T>) <- basket(T,I1), basket(T,I2), I1 ~= I2.
(count<T> is the aggregate)
A small database of cash register transactions
basket(1,fish). basket(2,bread). basket(3,bread).
basket(1,bread). basket(2,milk). basket(3,orange).
basket(2,onions). basket(3,milk).
basket(2,fish).
Querying ?- pair(fish,bread,N)
pair(fish,bread,2) (i.e., N=2)
Aggregates are the logical interface between deductive and inductive environment.
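The count aggregate above can be read as a group-by over the join of basket with itself. A minimal Python sketch of that reading follows; the function name pair_counts and the data encoding are assumptions for illustration, not part of LDL++.

```python
from collections import Counter
from itertools import permutations

# Basket facts from the slide, encoded as (transaction_id, item) pairs.
BASKET = {
    (1, "fish"), (1, "bread"),
    (2, "bread"), (2, "milk"), (2, "onions"), (2, "fish"),
    (3, "bread"), (3, "orange"), (3, "milk"),
}


def pair_counts(basket):
    """Mimic: pair(I1,I2,count<T>) <- basket(T,I1), basket(T,I2), I1 ~= I2."""
    items_by_transaction = {}
    for t, item in basket:
        items_by_transaction.setdefault(t, set()).add(item)

    counts = Counter()
    for items in items_by_transaction.values():
        # Ordered pairs of distinct items within one transaction.
        for i1, i2 in permutations(items, 2):
            counts[(i1, i2)] += 1
    return counts


if __name__ == "__main__":
    print(pair_counts(BASKET)[("fish", "bread")])
```

Fish and bread occur together in transactions 1 and 2, so the count for that pair is 2, as in the query on the slide.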
Association rules in LDL++
E.g., compute one-to-one association rules with at least 40% support
rules(patterns<0.4,0,{I1,I2}>) <- basket(T,I1), basket(T,I2).
basket(1,fish). basket(2,bread). basket(3,bread).
basket(1,bread). basket(2,milk). basket(3,orange).
basket(2,onions). basket(3,milk).
basket(2,fish).
patterns is the aggregate interfacing the computation of association rules
patterns<min_supp, min_conf, trans_set>
Association rules in LDL++
Result of the query ?- rules(X,Y,S,C)
rules({milk},{bread},0.66,1)   i.e. milk => bread [0.66,1]
rules({bread},{milk},0.66,0.66)
rules({fish},{bread},0.66,1)
rules({bread},{fish},0.66,0.66)
Same status for data and induced rules
basket(1,fish). basket(2,bread). basket(3,bread).
basket(1,bread). basket(2,milk). basket(3,orange).
basket(2,onions). basket(3,milk).
basket(2,fish).
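To make the support and confidence figures above concrete, here is a small Python sketch that recomputes one-to-one rules over the same three transactions by brute-force counting. It is not the Apriori module used by DATASIFT; the function name one_to_one_rules and the data encoding are assumptions for illustration.

```python
from itertools import permutations

# Basket facts from the slide, encoded as (transaction_id, item) pairs.
BASKET = {
    (1, "fish"), (1, "bread"),
    (2, "bread"), (2, "milk"), (2, "onions"), (2, "fish"),
    (3, "bread"), (3, "orange"), (3, "milk"),
}


def one_to_one_rules(basket, min_supp, min_conf):
    """Return (left, right, support, confidence) for rules left => right."""
    transactions = {}
    for t, item in basket:
        transactions.setdefault(t, set()).add(item)
    n = len(transactions)

    item_count = {}
    pair_count = {}
    for items in transactions.values():
        for item in items:
            item_count[item] = item_count.get(item, 0) + 1
        for pair in permutations(items, 2):
            pair_count[pair] = pair_count.get(pair, 0) + 1

    rules = []
    for (left, right), together in pair_count.items():
        support = together / n                    # fraction of all transactions
        confidence = together / item_count[left]  # fraction of those with `left`
        if support >= min_supp and confidence >= min_conf:
            rules.append((left, right, support, confidence))
    return sorted(rules)


if __name__ == "__main__":
    for left, right, s, c in one_to_one_rules(BASKET, 0.4, 0.0):
        print("%s => %s [%.2f, %.2f]" % (left, right, s, c))
```

With a 40% support threshold only the pairs (bread, milk) and (bread, fish) qualify, each in both directions. Their support is 2/3, which the slide shows truncated as 0.66.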
Reasoning on item hierarchies
[Figure: item hierarchy with levels Product (item), Family, Sector, Department]
Which rules survive/decay up/down the item hierarchy?
rules_at_level(I,pattern<S,C,Itemset>) <- itemset_abstraction(I,Tid,Itemset).
preserved_rules(Left,Right) <-
    rules_at_level(I,Left,Right,_,_),
    rules_at_level(I+1,Left,Right,_,_).
Business rules: reasoning on promotions
Which rules are established by a promotion?
interval(before, -, 3/7/1998).
interval(promotion, 3/8/1998, 3/30/1998).
interval(after, 3/31/1998, +).
established_rules(Left, Right) <-
    not rules_partition(before, Left, Right, _, _),
    rules_partition(promotion, Left, Right, _, _),
    rules_partition(after, Left, Right, _, _).
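Read as set operations, established_rules selects the rules that hold during and after the promotion but did not hold before it. The Python sketch below shows that reading on made-up rule sets; the rule contents in RULES and the function name are hypothetical and do not come from the tutorial's data.

```python
def established_rules(rules_by_interval):
    """Rules present in 'promotion' and 'after' but absent in 'before'."""
    before = rules_by_interval["before"]
    promotion = rules_by_interval["promotion"]
    after = rules_by_interval["after"]
    return (promotion & after) - before


# Hypothetical (left, right) rules mined separately on each time partition.
RULES = {
    "before": {("pasta", "cheese")},
    "promotion": {("pasta", "cheese"), ("biscuits", "cheese"), ("fruit", "cheese")},
    "after": {("pasta", "cheese"), ("biscuits", "cheese")},
}

if __name__ == "__main__":
    print(established_rules(RULES))
```

In this toy data only biscuits => cheese is established: pasta => cheese already held before the promotion, and fruit => cheese did not survive it.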
Business rules: temporal reasoning
How does rule support change along time?
[Chart: rule support over time. x-axis: days from 25/11/97 to 05/12/97; y-axis: Support, 0 to 35. Series (with the values shown in the legend): Pasta => Fresh Cheese 14; Bread Subsidiaries => Fresh Cheese 28; Biscuits => Fresh Cheese 14; Fresh Fruit => Fresh Cheese 14; Frozen Food => Fresh Cheese 14]
Decision tree construction in DATASIFT
construct training and test set using rules
training_set(P,Case_list) <- ...
test_tuple(ID,F1,...,F20,Rec,Act_rec,CAR) <- ...
construct classifier using external call to C5.0
tree_rules(Tree_name,P,PF,MC,BO,Rule_list) <-
    training_set(P,Case_list),
    tree_induction(Case_list,PF,MC,BO,Rule_list).
parameters: pruning factor PF, misclassification costs MC, boosting BO
tree_induction is the external call; Rule_list is the induced classifier
Putting decision trees at work
prediction of target variable
prediction(Tree_name,ID,CAR,Predicted_CAR) <-
    tree_rules(Tree_name, _, _, _, _, Rule_list),
    test_subject(ID, F1, …, F20, _, _, CAR),
    classify(Rule_list, [F1, …, F20], Predicted_CAR).
Model evaluation: actual recovery of a classifier (= sum of the recovery of tuples classified as positive)
actual_recovery(Tree_name,sum<Actual_Recovery>) <-
    prediction(Tree_name, ID, _, pos),
    test_subject(ID, F1, …, F20, _, Actual_Recovery, _).
(sum<Actual_Recovery> is the aggregate)
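The evaluation rule can be pictured as a filtered sum over the test set. The sketch below is a minimal Python rendering under stated assumptions: toy_classifier stands in for the induced C5.0 rule list, and the feature name declared_income and all figures in TEST_SUBJECTS are invented for illustration.

```python
def actual_recovery(classify, test_subjects):
    """Sum the actual recovery of the test subjects predicted as 'pos'."""
    return sum(
        subject["actual_recovery"]
        for subject in test_subjects
        if classify(subject["features"]) == "pos"
    )


def toy_classifier(features):
    """Hypothetical stand-in for an induced classifier (not a real C5.0 model)."""
    return "pos" if features["declared_income"] < 20000 else "neg"


# Hypothetical test tuples: features plus the recovery observed after an audit.
TEST_SUBJECTS = [
    {"id": 1, "features": {"declared_income": 15000}, "actual_recovery": 900.0},
    {"id": 2, "features": {"declared_income": 50000}, "actual_recovery": 400.0},
    {"id": 3, "features": {"declared_income": 10000}, "actual_recovery": 0.0},
]

if __name__ == "__main__":
    print(actual_recovery(toy_classifier, TEST_SUBJECTS))
```

Subjects 1 and 3 are predicted positive, so the classifier's actual recovery is 900.0 + 0.0; the 400.0 of subject 2 is missed because it is predicted negative.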
Combining decision trees
Model conjunction:
tree_conjunction(T1,T2,ID,CAR,pos) <-
    prediction(T1, ID, CAR, pos),
    prediction(T2, ID, CAR, pos).
tree_conjunction(T1, T2, ID, CAR, neg) <-
    test_subject(ID, F1, …, F20, _, _, CAR),
    ~ tree_conjunction(T1, T2, ID, CAR, pos).
More interesting combinations readily expressible: e.g. meta learning (Chan and Stolfo 93)
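A minimal Python sketch of the conjunction rule: the combined model answers pos only when both classifiers do, and neg otherwise. The two toy classifiers and their feature names are hypothetical stand-ins for induced trees.

```python
def tree_conjunction(classifier_1, classifier_2, features):
    """Predict 'pos' only if both classifiers predict 'pos'; otherwise 'neg'."""
    if classifier_1(features) == "pos" and classifier_2(features) == "pos":
        return "pos"
    return "neg"


def income_tree(features):
    """Hypothetical classifier keyed on declared income."""
    return "pos" if features["declared_income"] < 20000 else "neg"


def sector_tree(features):
    """Hypothetical classifier keyed on business sector."""
    return "pos" if features["sector"] == "retail" else "neg"


if __name__ == "__main__":
    subject = {"declared_income": 15000, "sector": "retail"}
    print(tree_conjunction(income_tree, sector_tree, subject))
```

The conjunction is more conservative than either tree alone, which suits a policy where auditing a subject is costly and false positives should be minimized.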
We proposed ...
a KDD methodology for audit planning:
  define an audit cost model
  monitor training- and test-set construction
  assess the quality of a classifier
  tune classifier construction to specific policies
and its formalization in a prototype logic-based KDSE, supporting:
  integration of deduction and induction
  integration of domain and induced knowledge
  separation of conceptual and implementation level
A data mining research agenda
1. Integration with data warehouse and relational DB
2. Scalable, parallel/distributed and incremental mining
3. Data mining query language optimization
4. Multiple, integrated data mining methods
5. KDSE and methodological support for vertical applications
6. Interactive, exploratory data mining environments
7. Mining on other forms of data:
   spatio-temporal databases
   text
   multimedia
   web
Scale up!
Scaling up existing algorithms (AI, ML, IR)
  Association rules
  Correlation rules
  Causal relationship
  Classification
  Clustering
  Bayesian networks
Background knowledge & constraints
Incorporating background knowledge and constraints into existing data mining techniques
Double benefit for DMQL: semantics and optimization!
Traditional algorithms
  Disproportionate computational cost for selective users
  Overwhelming volume of potentially useless results
Need user-controlled focus in mining process
  Association rules containing certain items
  Sequential patterns containing certain patterns
  Classification?
Vertical applications of data mining
More success stories needed!
Current data mining systems lack a thick semantic layer (similarly to the early relational database systems)
Verticalized data mining systems, e.g.
  Market analysis systems
  Fraud detection systems
Automated mining and interactive mining: how far are they?
Autofocus data mining
policy options, business rules
selection of data mining function
fine parameter tuning of mining function
DBMS coupling
Tight-coupling with DBMS
Most data mining algorithms are based on flat file data (i.e. loose-coupling with DBMS)
A set of standard data mining operators (e.g. sampling operator)
Web mining – why?
No standards on the web, enormous blob of unstructured and heterogeneous info
Very dynamic
  One new WWW server every 2 hours
  5 million documents in 1995
  320 million documents in 1998
Indices get obsolete very quickly
Better means needed for discovering resources and extracting knowledge
Web mining: challenges
Today's search engines are plagued by problems
– the abundance problem: 99% of info of no interest to 99% of people!
– limited coverage of the Web
– limited query interface based on keyword-oriented search
– limited customization to individual users
Web mining
Web content mining
  mining what Web search engines find
  Web document classification (Chakrabarti et al 99)
  warehousing a Meta-Web (Zaïane and Han 98)
  intelligent query answering in Web search
Web usage mining
  Web log mining: find access patterns and trends (Zaiane et al 98)
  customized user tracking and adaptive sites (Perkowitz et al 97)
Web structure mining
  discover authoritative pages: a page is important if important pages point to it (Chakrabarti et al 99, Kleinberg 98)
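The circular definition of authority (a page is important if important pages point to it) is typically resolved by iterating to a fixed point, as in Kleinberg's hub/authority scheme (HITS). The following Python sketch shows that iteration on a tiny, hypothetical link graph; the graph, the function name hits and the fixed iteration count are assumptions for illustration.

```python
def hits(links, iterations=50):
    """Iterate hub and authority scores over a dict page -> list of linked pages."""
    pages = set(links) | {q for targets in links.values() for q in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority of p: sum of hub scores of the pages linking to p.
        auth = {p: sum(hub[q] for q in pages if p in links.get(q, ())) for p in pages}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {p: v / norm for p, v in auth.items()}
        # Hub score of p: sum of authority scores of the pages p links to.
        hub = {p: sum(auth[q] for q in links.get(p, ())) for p in pages}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {p: v / norm for p, v in hub.items()}
    return hub, auth


# Hypothetical link graph: pages a and b both point to c, and c points to d.
LINKS = {"a": ["c"], "b": ["c"], "c": ["d"]}

if __name__ == "__main__":
    hub_scores, auth_scores = hits(LINKS)
    print(max(auth_scores, key=auth_scores.get))
```

In this graph page c ends up with the highest authority score because two hubs point to it, while a and b, which nobody links to, have authority zero.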
Warehousing a Meta-Web (Zaïane & Han 98)
Meta-Web: summarizes the contents and structure of the Web, which evolves with the Web
Layer0: the Web itself
Layer1: the lowest layer of the Meta-Web
  an entry: a Web page summary, including class, time, URL, contents, keywords, popularity, weight, links, etc.
Layer2 and up: summary/classification/clustering
Meta-Web is warehoused and incrementally updated
Querying and mining is performed on or assisted by Meta-Web
Is it feasible/sustainable? Is XML of any help?
Meta-Web from Jiawei Han’s panel talk @ SIGMOD99
[Figure: Meta-Web layers Layer0, Layer1, ..., Layern, with callouts Generalized Descriptions and More Generalized Descriptions]
Weblog mining
Web servers register a log entry for every single access.
A huge number of accesses (hits) are registered and collected in an ever-growing web log.
Why warehousing/mining web logs?
Enhance server performance by learning access patterns of general or particular users (guess what user will ask next and pre-cache!)
Improve system design of web applications
Identify potential prime advertisement locations
Greatest peril: the privacy pitfall
  See e.g. (Markoff 99), the rise of the Little Brother.
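As one simple way to picture the pre-caching idea mentioned above (guess what the user will ask next), the Python sketch below counts page-to-page transitions in logged sessions and predicts the most frequent follower of a page. The session data, the function names and the first-order transition model are all assumptions for illustration; the tutorial does not prescribe a particular technique.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count how often each page is immediately followed by each other page."""
    transitions = defaultdict(Counter)
    for pages in sessions:
        for current, following in zip(pages, pages[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(transitions, page):
    """Return the most frequent follower of `page`, or None if never seen."""
    followers = transitions.get(page)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Hypothetical sessions reconstructed from a web log (ordered page requests).
SESSIONS = [
    ["/", "/products", "/cart"],
    ["/", "/products", "/help"],
    ["/", "/products", "/cart"],
]

if __name__ == "__main__":
    model = build_transitions(SESSIONS)
    print(predict_next(model, "/products"))
```

A server could pre-cache the predicted page while the current one is being read. The same per-user access histories are exactly what raises the privacy concern noted on the slide.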
Some web mining references
M. Perkowitz and O. Etzioni. Adaptive sites: Automatically learning from user access patterns. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
J. Pitkow. In search of reliable usage data on the www. In Proc. 6th Int. World Wide Web Conf., Santa Clara, California, April 1997.
T. Sullivan. Reading reader reaction: A proposal for inferential analysis of web server log files. In Proc. 3rd Conf. Human Factors & the Web, Denver, Colorado, June 1997.
O. R. Zaiane, M. Xin, and J. Han. Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs. In Proc. Advances in Digital Libraries Conf. (ADL'98), pages 19-29, Santa Barbara, CA, April 1998.
O. R. Zaiane and J. Han. Resource and knowledge discovery in global information systems: a preliminary design and experiment. In Proc. KDD’95, p. 331-336, 1995.
O. R. Zaiane and J. Han. WebML: querying the world-wide web for resources and knowledge. In Proc. Int. Workshop on Web Information and Data Management (WIDM98), p. 9-12, 1998.
S. Chakrabarti, B. E. Dom, S. R. Kumar, P. Raghavan, et al. Mining the web’s link structure. COMPUTER, 32:60-67, 1999.
S. Chakrabarti, B. E. Dom, P. Indyk. Enhanced hypertext categorization using hyperlinks. In Proc. 1998 ACM-SIGMOD, p. 307-318, 1998.
J. Kleinberg. Authoritative sources in a hyperlinked environment. In Proc. ACM-SIAM Symp. on Discrete Algorithms, 1998.
J. Markoff. The Rise of Little Brother. Upside, Apr. 1999; http://www.upside.com/texis/mvm/story?id=36d4613c0
Pisa KDD Lab references
F. Giannotti and G. Manco. Making Knowledge Extraction and Reasoning Closer. In Proc. PAKDD 2000, The Fourth Pacific-Asia Conference on Knowledge Discovery and Data Mining, Kyoto, 2000.
F. Giannotti and G. Manco. Querying Inductive Databases via Logic-Based User Defined Aggregates. In Proc. PKDD'99, The Third Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases. Prague, Sept. 1999.
F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. Using Data Mining Techniques in Fiscal Fraud Detection. In Proc. DaWaK'99, First Int. Conf. on Data Warehousing and Knowledge Discovery. Florence, Italy, Sept. 1999.
F. Bonchi, F. Giannotti, G. Mainetto, D. Pedreschi. A Classification-based Methodology for Planning Audit Strategies in Fraud Detection. In Proc. KDD-99, ACM-SIGKDD Int. Conf. on Knowledge Discovery & Data Mining, San Diego (CA), August 1999.
F. Giannotti, G. Manco, D. Pedreschi and F. Turini. Experiences with a logic-based knowledge discovery support environment. In Proc. 1999 ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (SIGMOD'99 DMKD). Philadelphia, May 1999.
F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Integration of Deduction and Induction for Mining Supermarket Sales Data. In Proc. PADD'99, Practical Application of Data Discovery, Int. Conference, London, April 1999.
F. Giannotti, G. Manco, M. Nanni, D. Pedreschi. Nondeterministic, Nonmonotonic Logic Databases. IEEE Trans. on Knowledge and Data Engineering. 2000.
F. Giannotti, M. Nanni, G. Manco, D. Pedreschi and F. Turini. Using deduction for intelligent data analysis. Submitted, 2000. http://www-kdd.di.unipi.it/
P. Becuzzi, M. Coppola, S. Ruggieri and M. Vanneschi. Parallelisation of C4.5 as a particular divide and conquer computation. In Proc. 3rd Workshop on High Performance Data Mining, Springer-Verlag LNCS, 2000.