A Practical Approach To Data Mining Presentation

Date post: 08-Jun-2015
Description:
Presented at Project World and World Congress for Business Analysts in Anaheim, CA, November 2009
Page 1: A Practical Approach To Data Mining Presentation

A Practical Approach to Data Mining While Maintaining System Performance, Security and Privacy

Chuck Miller - PMP, SSBB

Project Manager

Prescription Solutions

Agenda

Introduction

What is Data Mining?

Data Mining Tools

Common Uses

Meaningful Data

Roadblocks

System Performance

Stability

Security

Privacy and Ethics

Knowledge Exercise

Resources

Introduction

In today’s world, security and privacy have become major concerns, especially where data is involved. Those concerns, along with system performance, can greatly affect your ability to gather meaningful data.

Whether you are an analyst just starting out or a seasoned veteran looking for a refresher, the following information will help you gain a clearer understanding of what data mining can do and provide you with the keys to unlocking your analytical potential.

What Data Mining is…

In its simplest form, data mining is the process of extracting hidden patterns from data.

Data mining is considered proactive because it lets you use historical information to predict future trends.

What Data Mining is… (cont.)

Data mining commonly involves four classes of tasks or techniques:

Classification - Arranges the data into predefined groups. For example, an email program might attempt to classify an email as legitimate or spam. Common algorithms include nearest neighbor, naive Bayes classifiers, and neural networks.
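As a rough illustration of the spam example, the sketch below trains a very small naive Bayes classifier. The training messages, the labels, and the function names `train` and `classify` are made up for this example; a real mail filter would use far more data and a tested library.

```python
import math
from collections import Counter

# Tiny hand-made training set (hypothetical data, for illustration only).
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "legitimate"),
    ("project status report attached", "legitimate"),
]

def train(examples):
    """Count words per label and messages per label."""
    word_counts = {}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    vocabulary = {w for counts in word_counts.values() for w in counts}
    total_messages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Prior: how common this label is among the training messages.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(counts.values())
        for word in text.split():
            # Add-one smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

word_counts, label_counts = train(TRAINING)
print(classify("claim your free prize", word_counts, label_counts))
print(classify("status report for monday meeting", word_counts, label_counts))
```

The groups ("spam" and "legitimate") are fixed before training, which is what makes this classification rather than clustering.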

Clustering - Like classification, but the groups are not predefined, so the algorithm tries to group similar items together.
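One common clustering technique is k-means, sketched below under simplifying assumptions: the customer data is invented, the starting centroids are chosen by hand so the run is repeatable, and the function name `k_means` is only illustrative.

```python
def k_means(points, centroids, iterations=10):
    """Assign each 2-D point to its nearest centroid, move each centroid
    to the mean of its group, and repeat a fixed number of times."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                (point[0] - c[0]) ** 2 + (point[1] - c[1]) ** 2
                for c in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        centroids = [
            (sum(p[0] for p in cluster) / len(cluster),
             sum(p[1] for p in cluster) / len(cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value).
customers = [(1, 20), (2, 25), (1, 22), (9, 80), (10, 85), (8, 78)]
centroids, clusters = k_means(customers, centroids=[customers[0], customers[3]])
print(clusters)
```

No labels are supplied; the two groups emerge only from how close the points are to each other.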

What Data Mining is… (cont.)

Regression - Attempts to find a function that models the data with the least error.
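The simplest case is fitting a straight line by ordinary least squares, which minimizes the sum of squared errors. The sketch below uses invented monthly sales figures, and `fit_line` is a name chosen for this example.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: sales (in thousands) for months 1 through 5.
months = [1, 2, 3, 4, 5]
sales = [12.0, 15.0, 15.0, 19.0, 20.0]

slope, intercept = fit_line(months, sales)
forecast_month_6 = slope * 6 + intercept
print(f"sales ~ {slope:.2f} * month + {intercept:.2f}; month 6 forecast {forecast_month_6:.2f}")
```

Using the fitted line to forecast month 6 is the "historical information to predict future trends" idea in miniature.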

Association rule learning - Searches for relationships between variables. For example, a supermarket might gather data on what each customer buys. Using association rule learning, the supermarket can determine which products are frequently bought together, which is useful for marketing purposes. This is sometimes referred to as "market basket analysis". It is widely used online today to track purchases and suggest other items you may be interested in; Amazon.com is a prime example.
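Association rules are usually scored by support (how often the items appear together) and confidence (how often the second item appears given the first). The sketch below computes both for item pairs; the baskets and the name `pair_rules` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per customer visit.
baskets = [
    {"shampoo", "conditioner", "soap"},
    {"shampoo", "conditioner"},
    {"shampoo", "toothpaste"},
    {"bread", "milk"},
    {"shampoo", "conditioner", "milk"},
]

def pair_rules(baskets, min_support=0.4):
    """Return (if_item, then_item, support, confidence) for frequent pairs."""
    item_counts = Counter(item for basket in baskets for item in basket)
    pair_counts = Counter(
        pair for basket in baskets for pair in combinations(sorted(basket), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / len(baskets)
        if support < min_support:
            continue  # the pair is too rare to be worth reporting
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            rules.append((lhs, rhs, support, confidence))
    return rules

rules = pair_rules(baskets)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support {support:.2f}, confidence {confidence:.2f}")
```

Production tools use algorithms such as Apriori to handle larger item sets efficiently; this sketch only looks at pairs.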

What Data Mining is Not…

The following terms are often referred to as data mining but are actually data mining tools.

Data Warehousing

SQL/Ad-Hoc Queries

Reporting

Data Visualization/Dashboards

Online Analytical Processing (OLAP)

Data Mining Tools

Most data mining tools can be classified into one of three categories:

Traditional data mining tools

Dashboards

Text-mining tools

Data Mining Tools - Traditional

Traditional data mining programs help you establish data patterns and trends by using a number of complex algorithms and techniques as outlined in the previous slides. These tools come in a myriad of programs and outputs. They are generally available for any operating system. In addition, while some may concentrate on one database type, most will be able to handle any data using OLAP or a similar technology.

Data Mining Tools - Dashboards

Dashboards are installed on computers to monitor information in a database and reflect data changes and updates onscreen. They are very popular because the graphical representation makes it easy to spot trends.

Data Mining Tools - Text

The third type of data mining tool is sometimes called a text-mining tool because of its ability to mine data from different kinds of text. Microsoft Word documents, Acrobat PDF files and simple text files are just a few examples. These tools scan content and convert the selected data into a format that is compatible with the tool's database, giving users an easy and convenient way of accessing data without the need to open different applications.

These tools are useful because scanned content can be unstructured (i.e., information is scattered almost randomly across the document, as in e-mails, Internet pages, audio and video data) or structured (i.e., the data's form and purpose are known, such as content found in a database).

Common Uses of Data Mining…

Market Basket Analysis - Customers are very likely to purchase shampoo and conditioner together, so a retailer would not put both items on promotion at the same time. The promotion of one would likely drive sales of the other.

Direct mail marketing - A company determines, based on data mining, who is likely to be interested in a particular product or promotion. It then uses that information to send mail or email to that targeted audience. This gives a much higher return on investment than an untargeted mailing.

Common Uses of Data Mining… (cont.)

Credit card fraud detection - Have you ever received a phone call from your credit card company after making a purchase, asking if it was you who made the purchase? This happens because your purchasing trends were modeled using data mining, and you bought something that fell outside your model.

Bioinformatics - Mapping the human genome and creating modeling sequences.
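For the credit card fraud example above, one simple way to express "outside your model" is to flag a purchase that lies many standard deviations from the cardholder's usual spending. This is only a sketch with invented amounts and an assumed threshold; real fraud models consider many more signals, and `is_unusual` is a name chosen here.

```python
from statistics import mean, stdev

def is_unusual(history, amount, threshold=3.0):
    """Flag an amount that is more than `threshold` standard deviations
    away from the average of past purchase amounts."""
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > threshold

# Hypothetical past purchase amounts for one cardholder.
history = [42.0, 38.5, 51.0, 47.25, 40.0, 45.5]

print(is_unusual(history, 48.0))   # a typical purchase
print(is_unusual(history, 900.0))  # far outside the usual range
```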

Example - Manufacturing

Example - Bioinformatics

Example – Customer Service

What is Meaningful Data?

Meaningful or useful data is data that you are relatively certain contains the information which you are mining. Mined data must still be interpreted for relevancy.

To ensure the data is meaningful, you need to create validation rules. Data validation can range from the simple, such as verifying that characters come from a valid set, to the complex, such as automated programs that check data against detailed, specific criteria.
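A minimal sketch of both kinds of rule follows. The record layout (`zip_code`, `quantity`, `order_date`, `ship_date`) and the function name `validate` are hypothetical and exist only to show the idea.

```python
import re
from datetime import date

def validate(record):
    """Return a list of rule violations for one record (empty means valid)."""
    problems = []
    # Simple rule: the field uses only characters from a valid set.
    if not re.fullmatch(r"[0-9]{5}", record["zip_code"]):
        problems.append("zip_code must be exactly five digits")
    # More detailed rules: values are checked against specific criteria.
    if record["quantity"] <= 0:
        problems.append("quantity must be positive")
    ordered = date.fromisoformat(record["order_date"])
    shipped = date.fromisoformat(record["ship_date"])
    if shipped < ordered:
        problems.append("ship_date is earlier than order_date")
    return problems

good = {"zip_code": "92801", "quantity": 2,
        "order_date": "2009-11-02", "ship_date": "2009-11-04"}
bad = {"zip_code": "9280A", "quantity": 0,
       "order_date": "2009-11-02", "ship_date": "2009-11-01"}

print(validate(good))
print(validate(bad))
```

Records that fail validation can be set aside for review before any model is built on them.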

What is Meaningful Data? (cont.)

Two terms are commonly used to describe data: positive and negative.

Positive Data - The most common type; it is used, as discussed previously, for forecasting or predicting future results and behavior.

What is Meaningful Data? (cont.)

Negative Data - Anomalies or discrete events that can skew your results.

For example, suppose a one-time promotion of a product occurred and will never happen again. Including those sales in your model will throw it off because, if not for that promotion, your customer would never have purchased the product.
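The effect of the promotion example can be seen with a few numbers. In the sketch below the weekly figures and the `one_time_promotion` flag are invented, and `baseline_units` is an illustrative name.

```python
from statistics import mean

# Hypothetical weekly unit sales; week 4 was a one-time promotion.
weekly_sales = [
    {"week": 1, "units": 100, "one_time_promotion": False},
    {"week": 2, "units": 104, "one_time_promotion": False},
    {"week": 3, "units": 98, "one_time_promotion": False},
    {"week": 4, "units": 400, "one_time_promotion": True},
    {"week": 5, "units": 102, "one_time_promotion": False},
]

def baseline_units(rows):
    """Average weekly units after dropping rows flagged as negative data."""
    return mean(r["units"] for r in rows if not r["one_time_promotion"])

skewed = mean(r["units"] for r in weekly_sales)
baseline = baseline_units(weekly_sales)
print(f"with promotion: {skewed}, without: {baseline}")
```

Leaving the promotion week in pushes the average well above what a normal week looks like, which is why such events are usually flagged and excluded before modeling.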

How to Navigate Around Data Access Roadblocks

High-level buy-in

It is always helpful to have executive support when access to data is required.

Look to tie your need in with a corporate initiative.

How to Navigate Around Data Access Roadblocks (cont.)

Explain what data mining is.

Cite some examples from companies similar to yours and the results they have produced with data mining.

ABC Co. has increased sales in this segment quarter to quarter by implementing sales suggestions on their website.

How to Navigate Around Data Access Roadblocks (cont.)

Take a small data set and prove what the benefits are.

This is usually the most difficult because you need access to enough meaningful data to create a model.

How to Navigate Around Data Access Roadblocks (cont.)

Suggest limited, timed access to the data.

It is always better to have the most current data, but if you are mining against monthly results you may only need access once the monthly cycle is complete. You get a window in which to run the data mining; after that, the data is in your chosen tool and you can work with it there.

Data Mining vs. System Performance

Never conduct data mining during peak operating hours.

Conduct a sample run on a smaller subset of data to check run time and performance degradation.

Conduct data mining on a backup copy or read-only version of the databases.

How to Maintain Data Stability

Limit access to data.

Always back up your data.

It is preferable to use a backup or read-only copy of data.

Security - Internal

Do the people running the data mining have clearance to access the systems?

Do the people reviewing that data have clearance or the need to know that information?

When multiple databases are mined, does the assembled information violate any security policies?

Security - External

Limit access to data.

Eliminate external references.

Summarize the data.

Data masking.
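Two of the measures above, summarizing and masking, can be sketched in a few lines. The record fields and the helper names `mask` and `summarize` are hypothetical; the point is that what leaves the organization contains no full identifiers.

```python
from collections import defaultdict

def mask(value, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    hidden = max(len(value) - visible, 0)
    return "*" * hidden + value[hidden:]

def summarize(records):
    """Total amounts by region, dropping every per-customer field."""
    totals = defaultdict(float)
    for record in records:
        totals[record["region"]] += record["amount"]
    return dict(totals)

# Hypothetical detail records that should not be shared as-is.
records = [
    {"card": "4111111111111111", "region": "West", "amount": 120.0},
    {"card": "5500005555555559", "region": "West", "amount": 80.0},
    {"card": "340000000000009", "region": "East", "amount": 50.0},
]

masked_cards = [mask(r["card"]) for r in records]
summary = summarize(records)
print(masked_cards)
print(summary)
```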

Privacy Concerns and Ethics

Some people believe that data mining itself is ethically neutral. However, the ways in which data mining is used can raise questions regarding privacy, legality, and ethics. In particular, mining government or commercial data sets for national security or law enforcement purposes has raised privacy concerns, but the same questions apply to every company, no matter how large or small, that collects customer information.

Privacy Concerns and Ethics (cont.)

Data mining requires data preparation which can uncover information or patterns which may compromise confidentiality and privacy obligations. A common way for this to occur is through data aggregation.

Data aggregation is when the data is accrued, possibly from various sources, and put together so that it can be analyzed. This is not data mining per se, but a result of the preparation of data before and for the purposes of the analysis. The threat to an individual's privacy comes into play when the data, once compiled, causes the data miner, or anyone who has access to the newly-compiled data set, to be able to identify specific individuals, especially when originally the data was anonymous.
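The re-identification risk described above can be shown with two small, invented data sets: purchase records with no names, and a separate directory that shares the same zip code and birth year fields. All names, fields, and the function `reidentified` are assumptions for this sketch.

```python
from collections import defaultdict

# Hypothetical "anonymous" purchase records (no names).
purchases = [
    {"zip": "92801", "birth_year": 1970, "item": "allergy medication"},
    {"zip": "92802", "birth_year": 1985, "item": "vitamins"},
]

# Hypothetical second source that does carry names.
directory = [
    {"name": "Customer A", "zip": "92801", "birth_year": 1970},
    {"name": "Customer B", "zip": "92802", "birth_year": 1985},
    {"name": "Customer C", "zip": "92802", "birth_year": 1985},
]

def reidentified(purchases, directory):
    """Return (name, item) pairs where the shared fields match one person."""
    names_by_key = defaultdict(list)
    for person in directory:
        names_by_key[(person["zip"], person["birth_year"])].append(person["name"])
    matches = []
    for purchase in purchases:
        names = names_by_key.get((purchase["zip"], purchase["birth_year"]), [])
        if len(names) == 1:  # a unique match exposes the individual
            matches.append((names[0], purchase["item"]))
    return matches

exposed = reidentified(purchases, directory)
print(exposed)
```

Neither source identifies a purchaser on its own; the exposure appears only once the two are combined, which is why aggregated data sets deserve their own privacy review.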

Privacy Concerns and Ethics (cont.)

Data aggregation example: Amazon displays items frequently purchased together on its website. This information alone is okay. However, to get this information, Amazon has to scan customer records from the multiple vendors that sell products through its website and combine them into a working model. If the compiled information includes the individual customers who made the purchases, there is a privacy issue.

Privacy Concerns and Ethics (cont.)

It is recommended that an individual is made aware of the following before data is collected:

The purpose of the data collection and any data mining projects.

How the data will be used.

Who will be able to mine the data and use it.

The security surrounding access to the data.

How collected data can be updated.

One may additionally modify the data so that it is anonymous, so that individuals may not be readily identified.

Privacy Concerns and Ethics (cont.)

Does this violate privacy, ethics, both or neither?

Suggested Reading

Competing on Analytics by Tom Davenport and Jeanne Harris (Hardcover / 2007)

This book focuses on the challenges of getting an organization to change its approach to problem solving, by increasing the use of analytics across a business.

Data Mining Explained by Delmater and Hancock (Paperback / 2001)

Many of the data mining books focus on the technology rather than the impact on the business process. This book provides a good introduction to data mining as well as a good discussion of the business impact that can be felt throughout an organization.

Resources

Web:

www.thearling.com

www.kdnuggets.com

Print:

Basic Statistics – Tools for Continuous Improvement by Mark J. Kiemele (Hardcover / 1997)

Questions

Thank You

If you have any follow up questions, I can be contacted at

[email protected]

