Bayesian Networks and Association Analysis

An Overview of Literature and Problem Statements Around Sensitivity and Interestingness in Belief Networks

Adnan Masood

scis.nova.edu/~adnan

adnan@nova.edu

Doctoral Candidate

Nova Southeastern University

Preliminaries

Data mining is a statistical process for extracting useful information, unknown patterns and interesting relationships from large databases. Many statistical methods are used in this process; two of them are Bayesian networks and association analysis.

Bayesian networks are probabilistic graphical models that encode relationships among a set of random variables in a database. Since they have both causal and probabilistic aspects, they make it easy to combine information from data with expert knowledge. Bayesian networks can also represent knowledge about an uncertain domain and support strong inferences.

Association analysis is a useful technique for detecting hidden associations and rules in large databases; it extracts previously unknown and surprising patterns from already known information. A drawback of association analysis is that many patterns are generated even if the data set is very small. Hence, suitable interestingness measures must be applied to eliminate uninteresting patterns.

Reference: D. Ersel, S. Günay / İstatistikçiler Dergisi 5 (2012) 51-64

Bayesian Networks

Bayesian networks are Directed Acyclic Graphs (DAGs) that encode probabilistic relationships between random variables. They make it possible to model the joint probability distribution of a set of random variables efficiently and to perform computations from this model.

Bayesian networks have emerged as an important method for adding uncertain expert knowledge to a system.

Bayesian networks provide support for inference and learning.

Bayesian networks can be learned directly from data; they can also be learned from expert opinion.

The results obtained from association analysis can be used to learn and update Bayesian networks.
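
As a minimal sketch of how a DAG lets the joint distribution be written as a product of local conditional probabilities, the Python fragment below encodes a hypothetical three-variable network (Rain -> Sprinkler, and Rain, Sprinkler -> WetGrass). The variable names and all probabilities are illustrative assumptions and do not come from the cited literature.

```python
from itertools import product

# Hypothetical network: Rain -> Sprinkler, (Rain, Sprinkler) -> WetGrass.
# Each table stores P(variable = True | parent values); all numbers are made up.
P_RAIN = 0.2
P_SPRINKLER = {True: 0.01, False: 0.4}             # keyed by Rain
P_WET = {(True, True): 0.99, (True, False): 0.8,   # keyed by (Rain, Sprinkler)
         (False, True): 0.9, (False, False): 0.0}


def bernoulli(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(rain, sprinkler, wet):
    """P(rain, sprinkler, wet) as the product of the local conditionals."""
    return (bernoulli(P_RAIN, rain)
            * bernoulli(P_SPRINKLER[rain], sprinkler)
            * bernoulli(P_WET[(rain, sprinkler)], wet))


def marginal_wet():
    """P(WetGrass = True), obtained by summing the joint over the parents."""
    return sum(joint(r, s, True) for r, s in product([True, False], repeat=2))


# Sanity check: the factorized joint sums to 1 over all 8 configurations.
total = sum(joint(r, s, w) for r, s, w in product([True, False], repeat=3))
```

Only 1 + 2 + 4 = 7 numbers are stored here, and the saving over a full joint table grows quickly as more variables with few parents are added.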


Association Analysis

Association analysis helps examine items that frequently occur together in a data set and reveals patterns that support decision making. These patterns are represented as “association rules” or “frequent itemsets”. Obtaining the patterns is complicated and time consuming when the data set is large. Patterns found by association analysis can also be deceptive, since some relationships may arise by chance. A further problem is that a great number of patterns are generated even if the data set is small, so millions of patterns can be obtained when the data set is large.

Therefore, patterns obtained by association analysis should be evaluated according to their interestingness levels and the patterns which are found uninteresting according to these measures should be eliminated.
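
The following sketch shows support, confidence and a brute-force frequent itemset search on a toy market-basket data set. The transactions, the function names and the minimum support of 0.4 are assumptions chosen for illustration; practical miners use algorithms such as Apriori instead of exhaustive enumeration.

```python
from itertools import combinations

# Hypothetical market-basket data; each transaction is a set of items.
TRANSACTIONS = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def frequent_itemsets(transactions, min_support):
    """Brute-force enumeration: fine for toy data, exponential in general."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                result[candidate] = s
    return result


frequent = frequent_itemsets(TRANSACTIONS, min_support=0.4)
```

Even with five transactions and six items, `frequent` holds many more itemsets than there are transactions, which is the pattern explosion that makes a filtering step necessary.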


Measuring Interestingness

Interestingness levels are measured by interestingness measures. These measures are categorized into “objective interestingness measures” and “subjective interestingness measures”. Objective measures are based on the data and the structure of the pattern; subjective measures are additionally based on expert knowledge.

Subjective interestingness measures are generally specified through belief systems. Since Bayesian networks are belief systems, these measures can be specified over them.


Interestingness Measures for Association Patterns (Tan et al, 2002)
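
As a hedged illustration of what objective measures look like, the sketch below computes four widely used ones (support, confidence, lift and leverage) for a rule A -> B from simple counts. The selection of measures, the function name and the tea/coffee counts are assumptions made for this example.

```python
def objective_measures(n_ab, n_a, n_b, n):
    """Support, confidence, lift and leverage for the rule A -> B.

    n_ab: number of transactions containing both A and B
    n_a, n_b: number of transactions containing A, respectively B
    n: total number of transactions
    """
    p_ab, p_a, p_b = n_ab / n, n_a / n, n_b / n
    return {
        "support": p_ab,
        "confidence": p_ab / p_a,
        "lift": p_ab / (p_a * p_b),      # 1.0 means A and B look independent
        "leverage": p_ab - p_a * p_b,    # 0.0 means A and B look independent
    }


# Hypothetical counts for the rule tea -> coffee in 1000 transactions.
tea_coffee = objective_measures(n_ab=150, n_a=200, n_b=800, n=1000)
```

With these counts the rule has a confidence of 0.75 but a lift below 1, so a measure that looks only at confidence would rank as strong a rule in which tea buyers are in fact slightly less likely than average to buy coffee.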

Mutual collaboration between Bayesian networks and Association analysis

Bayesian networks and association analysis can be used together in knowledge discovery. Bayesian networks are used to generate an objective or subjective interestingness measure, while the interesting patterns obtained via association analysis are used to learn Bayesian networks.

Bayesian networks encode expert belief systems, and they can be used to create a subjective or an objective measure. The patterns that differ most from the prior information (belief) encoded by a Bayesian network are therefore considered interesting. Several methods have been suggested in the literature for determining interesting patterns by using Bayesian networks. Two of them were suggested by Jaroszewicz and Simovici, and by Malhas and Aghbari.


Interestingness according to Jaroszewicz & Simovici

Jaroszewicz and Simovici define the interestingness of an itemset as the absolute difference between its support estimated from the data and its support estimated from the Bayesian network. If this difference exceeds a given threshold, the itemset is considered interesting. This method determines interesting itemsets rather than association rules.

The direction of rules is specified according to the user’s experience.
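
The definition above can be sketched in a few lines of Python. The two-node network (Smoker -> Cough), its probabilities, the ten data records and the threshold `EPSILON` are all hypothetical; the code illustrates the support-difference idea on single value assignments and is not the authors' algorithm.

```python
from itertools import product

# Hypothetical expert network over binary variables: Smoker -> Cough.
P_SMOKER = 0.3
P_COUGH_GIVEN_SMOKER = {1: 0.6, 0: 0.1}


def bn_joint(smoker, cough):
    """P(smoker, cough) according to the network."""
    p_s = P_SMOKER if smoker else 1 - P_SMOKER
    p_c = P_COUGH_GIVEN_SMOKER[smoker]
    return p_s * (p_c if cough else 1 - p_c)


def bn_support(assignment):
    """Marginal probability of a partial assignment, e.g. {"cough": 1}."""
    total = 0.0
    for smoker, cough in product([0, 1], repeat=2):
        full = {"smoker": smoker, "cough": cough}
        if all(full[k] == v for k, v in assignment.items()):
            total += bn_joint(smoker, cough)
    return total


def data_support(assignment, rows):
    """Fraction of records that match the partial assignment."""
    hits = sum(all(row[k] == v for k, v in assignment.items()) for row in rows)
    return hits / len(rows)


def interestingness(assignment, rows):
    """Absolute difference between data support and network support."""
    return abs(data_support(assignment, rows) - bn_support(assignment))


# Hypothetical data set of 10 records.
ROWS = ([{"smoker": 1, "cough": 1}] * 1 + [{"smoker": 1, "cough": 0}] * 2
        + [{"smoker": 0, "cough": 1}] * 4 + [{"smoker": 0, "cough": 0}] * 3)

EPSILON = 0.1  # user-chosen threshold (assumption)
candidates = [{"smoker": 1}, {"cough": 1}, {"smoker": 1, "cough": 1}]
interesting = [a for a in candidates if interestingness(a, ROWS) > EPSILON]
```

In this toy setting only `{"cough": 1}` passes the threshold: the network expects a cough in 25% of records, while the data show it in 50%.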

Interestingness according to Malhas & Aghbari

Another subjective interestingness measure based on Bayesian networks was suggested by Malhas and Aghbari. This measure is the sensitivity of the Bayesian network to the discovered patterns, and it is obtained by assessing the uncertainty-increasing potential of a pattern on the beliefs of the Bayesian network. The patterns with the highest sensitivity values are considered the most interesting. In this approach, mutual information serves as the measure of uncertainty: the sensitivity of a pattern is the sum of the increases in mutual information when the pattern is entered into the Bayesian network as evidence (a finding).
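
The building block of this measure, mutual information between two discrete variables, can be computed as in the sketch below. It covers only the mutual information term; how the increases are collected over the network after evidence is entered follows Malhas and Aghbari and is not reproduced here. The function name and the two example tables are assumptions.

```python
from math import log2


def mutual_information(joint):
    """I(X;Y) in bits from a joint table joint[x][y] whose entries sum to 1."""
    p_x = [sum(row) for row in joint]          # marginal of X (row sums)
    p_y = [sum(col) for col in zip(*joint)]    # marginal of Y (column sums)
    mi = 0.0
    for x, row in enumerate(joint):
        for y, p_xy in enumerate(row):
            if p_xy > 0:                       # 0 * log(0) is treated as 0
                mi += p_xy * log2(p_xy / (p_x[x] * p_y[y]))
    return mi


# Hypothetical joint distributions of two binary variables.
independent = [[0.25, 0.25], [0.25, 0.25]]
dependent = [[0.5, 0.0], [0.0, 0.5]]
```

For the `independent` table the mutual information is 0 bits, and for the perfectly coupled `dependent` table it is 1 bit.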

A Belief-Driven Method for Discovering Unexpected Patterns

Tuzhilin and Padmanabhan (AAAI)

Abstract

Several pattern discovery methods proposed in the data mining literature have the drawbacks that they discover too many obvious or irrelevant patterns and that they do not fully leverage the valuable prior domain knowledge that decision makers have. Tuzhilin and Padmanabhan proposed a new method of discovery that addresses these drawbacks.

In particular, they propose a new method of discovering unexpected patterns that takes into consideration the prior background knowledge of decision makers. This prior knowledge constitutes a set of expectations or beliefs about the problem domain.

Their method uses these beliefs to seed the search for patterns in the data that contradict the beliefs. To evaluate the practicality of the approach, the authors applied their algorithm to consumer purchase data from a major market research company and to web logfile data tracked at an academic Web site.

Fast Discovery of Unexpected Patterns in Data, Relative to a Bayesian Network (Jaroszewicz & Scheffer)

Jaroszewicz and Scheffer considered a model in which background knowledge on a given domain of interest is available in the form of a Bayesian network, in addition to a large database. The mining problem is to discover unexpected patterns: the goal is to find the strongest discrepancies between the network and the database.

This problem is intrinsically difficult because it requires inference in a Bayesian network and processing of the entire, potentially very large, database.

The authors introduce a sampling-based method that is efficient and yet provably finds the approximately most interesting unexpected patterns. Jaroszewicz & Scheffer give a rigorous proof of the method’s correctness. Experiments shed light on its efficiency and practicality for large-scale Bayesian networks and databases.

Scalable pattern mining with Bayesian networks as background knowledge (Szymon Jaroszewicz, Tobias Scheffer & Dan A. Simovici)

Abstract

The authors study a discovery framework in which background knowledge on variables and their relations within a discourse area is available in the form of a graphical model.

Starting from an initial, hand-crafted or possibly empty graphical model, the network evolves in an interactive process of discovery. The authors focus on the central step of this process:

Given a graphical model and a database, they address the problem of finding the most interesting attribute sets. They formalize the interestingness of an attribute set as the divergence between its behavior as observed in the data and the behavior that can be explained by the current model.

Jaroszewicz et al. derive an exact algorithm that finds all attribute sets whose interestingness exceeds a given threshold. They then consider the case of a very large network that renders exact inference infeasible, combined with a very large database or data stream, and devise an algorithm that efficiently finds the most interesting attribute sets with a prescribed approximation bound and confidence probability, even for very large networks and infinite streams.

Fast Discovery Of Interesting Patterns Based On Bayesian Network Background Knowledge by Rana Malhas & Zaher Al Aghbari

Abstract

The main problem faced by all association rule/pattern mining algorithms is that they produce a large number of rules, which incurs a secondary mining problem: mining the interesting association rules/patterns. The problem is compounded by the fact that discovered rules expressing ‘common knowledge’ are not interesting, yet they are usually strong rules with high support and confidence, the classical measures.

In their paper, the authors present a fast algorithm for discovering interesting (unexpected) patterns based on background knowledge represented by a Bayesian network. A pattern/rule is unexpected if it is ‘surprising’ to the user. The algorithm profiles a pattern as interesting (unexpected) if the absolute difference between its support estimated from the dataset and its support estimated from the Bayesian network exceeds a user-specified threshold (ε). Itemsets with the most strongly diverging supports are considered the most interesting.

Measuring the Interestingness

In the approach of Jaroszewicz and Simovici (2004), the most interesting variable set is determined by the absolute difference between the set’s support values obtained from the data and from the Bayesian network.

This interestingness value is based on the discrepancy between the Bayesian network and the data. By using different Bayesian networks depending on expert knowledge, more interesting variable sets may be obtained.

The Bayesian network can be updated using the most interesting variable sets, and the updated network can then be used to find interesting patterns again. As these steps repeat, the interestingness values of the variable sets decrease.

This is because the Bayesian network adapts to the data and the discrepancy between the network and the data shrinks.
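
A small sketch of one such update step, under strong simplifying assumptions: two binary variables A and B, an initial network with no edge (A and B independent), and an updated network with the edge A -> B whose parameters are estimated from the data. The data and the function names are hypothetical.

```python
def data_support_11(rows):
    """Observed fraction of records with A = 1 and B = 1."""
    return sum(a == 1 and b == 1 for a, b in rows) / len(rows)


def independent_estimate(rows):
    """P(A=1, B=1) under the empty graph: P(A=1) * P(B=1)."""
    p_a = sum(a for a, _ in rows) / len(rows)
    p_b = sum(b for _, b in rows) / len(rows)
    return p_a * p_b


def edge_estimate(rows):
    """P(A=1, B=1) under the graph A -> B: P(A=1) * P(B=1 | A=1)."""
    p_a = sum(a for a, _ in rows) / len(rows)
    b_given_a1 = [b for a, b in rows if a == 1]
    return p_a * sum(b_given_a1) / len(b_given_a1)


# Hypothetical data in which A and B are strongly associated.
ROWS = [(1, 1)] * 4 + [(1, 0)] * 1 + [(0, 1)] * 1 + [(0, 0)] * 4

# Interestingness of {A=1, B=1} before and after the network is updated.
before = abs(data_support_11(ROWS) - independent_estimate(ROWS))
after = abs(data_support_11(ROWS) - edge_estimate(ROWS))
```

Here `before` is 0.15 and `after` is essentially zero: once the edge has been added and its parameters fitted, the itemset no longer surprises the network.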


Using Interesting Patterns to Learn Bayesian Networks

Learning the structure and parameters of Bayesian networks is an important problem in the literature. Expert knowledge is generally used to solve it, but appropriate expert opinion is not always available. In current applications, the data set is therefore exploited to learn the Bayesian network. The learning problem in Bayesian networks is separated into “parameter learning” and “structure learning”.


Bayesian Network and Expert Opinion

Bayesian networks are generally created according to expert opinion about the problem and the data. Obtaining expert opinion is generally difficult and costly. If expert opinion about the data of interest cannot be obtained, knowledge derived from association analysis results can be used to create a Bayesian network. In addition, if no expert opinion exists and a Bayesian network has been created according to non-expert opinion, this network can be updated according to association analysis results. Hence, a suitable Bayesian network can be created without the need for expert opinion.

Expert Knowledge and Belief Network Learning

If there is no expert knowledge about the structure of the Bayesian network, the data are used to learn the DAG structure that best describes them. In principle, finding the best DAG structure for the variable set V requires establishing and comparing all possible DAG representations for V.
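
To see why this exhaustive comparison is feasible only in principle, the sketch below counts the labelled DAGs on n nodes using Robinson's recurrence. The recurrence is a known combinatorial result brought in here for illustration; it is not part of the cited papers, and the function name is an assumption.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of labelled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edge.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


# Number of candidate structures for 1 to 8 variables.
growth = {n: count_dags(n) for n in range(1, 9)}
```

Three variables already admit 25 structures and five admit 29281, so practical structure learning relies on heuristic search rather than full enumeration.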


Research Problem – Interestingness for Rare Beliefs

As seen in the earlier examples from the literature, the properties of several interestingness measures have been analyzed, and several frameworks have been proposed for selecting the right interestingness measure for extracting association rules.

However, rare beliefs, anomalies and outliers also contain useful knowledge, and researchers are still investigating efficient approaches to extract them.

The research problem is to analyze the properties of interestingness measures for determining the interestingness of outliers and rare beliefs.

Based on the analysis, researchers can suggest a set of measures and properties an expert should consider while selecting a measure to find the interestingness of rare associations.

Conclusion

Association analysis and Bayesian networks are two methods which are used to accomplish different goals in data mining. Whereas the aim of association analysis is to obtain interesting patterns in a data set, the aim of Bayesian networks is to calculate local probability distributions of the variables by modelling causal relationships between variables.

The output of one of these two methods can be used as an input to the other. Interesting patterns determined by association analysis are exploited in learning and updating Bayesian networks, and Bayesian networks are in turn exploited to create the interestingness measures used in association analysis.


Conclusion (cont)

In association analysis, objective interestingness measures are generally used to determine interesting patterns. Interestingness is the degree to which a pattern is incompatible with the prior knowledge of the researcher, and objective interestingness measures do not fully comply with this meaning.

These measures identify patterns that are frequently seen in the data set rather than interesting patterns. Subjective interestingness measures comply with the meaning of interestingness better than objective measures do. Subjective interestingness measures are defined over expert systems.

Bayesian networks are expert systems, and they may be used to define a subjective measure. The patterns that differ most from the knowledge represented by the Bayesian network are specified as the most interesting patterns.


Conclusion (cont)

Using these two data mining techniques together makes it possible to add prior information to the knowledge discovery process.

Even if this prior information does not come from an expert, suitable results can still be obtained by supporting the non-expert information with data. Hence, there is no need for complicated and time-consuming algorithms to learn Bayesian networks and to create interestingness measures, and the results obtained fit the real world better.


References and Bibliography

D. Ersel, S. Günay, 2012, Bayesian networks and association analysis in knowledge discovery process, İstatistikçiler Dergisi, 5, 51-64.

I. Ben-Gal, 2007, Bayesian Networks, Encyclopedia of Statistics in Quality & Reliability, F. Ruggeri, F. Faltin, R. Kenett (eds), Wiley & Sons.

M.W. Berry, M. Browne, 2006, Lecture Notes in Data Mining, World Scientific Publishing, Singapore, 222p.

Y. Dong-Peng, L. Jin-Lin, 2008, Research on personal credit evaluation model based on Bayesian network and association rules, Knowledge Discovery and Data Mining, 2008. WKDD 2008. First International Workshop on, 457-460.

D. Heckerman, 1995, Bayesian networks for data mining, Data Mining and Knowledge Discovery, 1, 79-119.

S. Jaroszewicz, D.A. Simovici, 2004, Interestingness of frequent itemsets using Bayesian networks as background knowledge, Proceedings of the 10th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, August 20-25, 2004, New York, USA, 178-186.

F.V. Jensen, 2001, Bayesian Networks and Decision Graphs, Springer-Verlag, New York, 268p.

R. Malhas, Z. Aghbari, 2007, Fast discovery of interesting patterns based on Bayesian network background knowledge, University of Sharjah Journal of Pure and Applied Science, 4 (3).

K. Murphy, 1998, A brief introduction to graphical models and Bayesian networks, http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html#infer.

A. Silberschatz, A. Tuzhilin, 1995, On subjective measures of interestingness in knowledge discovery, Proceedings of the 1st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, August 20-21, 1995, Montreal, Canada, 275-281.

P. Tan, V. Kumar, J. Srivastava, 2002, Selecting the right interestingness measure for association patterns, Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, July 23-26, 2002, Edmonton, Alberta, Canada, 32-41.

P. Tan, M. Steinbach, V. Kumar, 2006, Introduction to Data Mining, Addison-Wesley, Boston, 769p.

Date post:	12-May-2015
Upload:	adnanmasood