
MIS 531A Data Structure & Algorithm Report of Group Web Mining Project

WEB MINING PROJECT OF 531A

SUMMARY REPORT

Team members: Yi-Jen Ho, Yi-Ling Lin, Howard R. Zhao, Ying Ding

Instructor: Prof. Hsinchun Chen

12/16/2005

INDEX

1. Introduction

2. Research Questions

2.1 Friendly User Interface

2.2 Web-service APIs

2.3 Database and Data Mining Algorithm

3. Literature Review

4. Research Design

4.1 Functionality

4.2 Architecture

5. Findings

5.1 Database Schema / Data Mining Algorithm


5.1.1 Database Design

5.1.2 Schema of Data Tables

5.1.3 Data Mining Algorithm

5.2 APIs

5.3 User Interface

5.4 Sample Scenarios

6. System Novelty

6.1 Data Mining Algorithm

6.2 APIs

6.3 User Interface

6.4 Compare with others

7. Conclusion / Future Directions

Reference

1. Introduction

Standing on the shoulders of giants, we can see further and more widely. Encouraged by the promise of access to the huge, open data warehouses of Amazon, eBay, and Google, our team came up with a creative e-business idea: a platform where users can gather useful information about the specific products they want to sell or buy.

To make this idea a reality, we put all our effort into building a powerful website, which we named “Wishsky”, to meet the needs of our target users. We expect Wishsky to provide complete product information, such as retail prices on Amazon, auction prices and seller details on eBay, and current news about particular items. In the future, we hope Wishsky will become an all-purpose community that not only serves general Internet users but also provides manufacturers and other businesses with important information.

Based on this idea, we considered five perspectives in constructing Wishsky’s business model: target customers, services provided, potential business partners, information resources, and financials. Our target customers are users who want to compare prices before buying, who prefer buying products online, and who prefer applications that emphasize community interaction. The services we provide include product information gathered from eBay and Amazon, Google searches related to customers’ wish items, and information on consumer trends.

As for potential business partners, we expect to collaborate with e-marketplaces such as Yahoo! Auction and with online retailers such as Walmart.com. In addition, we will obtain information from eBay, Amazon, Google, and our business partners. The platform’s financial opportunities could be based on several different business values. In the short term, we may draw on advertising and our statistical information to cover Wishsky’s operating overhead.

In addition, we plan to collect information from Amazon “Wish-Lists” and from our own users. After statistical analysis, we will provide sellers with precise information about potential markets, such as a catalog of the most popular products, acceptable price ranges, and the geographical distribution of buyers. Furthermore, we will serve buyers by mining information that helps them get products at the lowest price and with the best service.

Based on this business model, Wishsky is implemented by accessing the APIs (Application Programming Interfaces) of Amazon, eBay, and Google, presenting a user-friendly environment, constructing a complete supporting database, and adopting an efficient data mining algorithm. Combining these components closely and effectively is a critical success factor for Wishsky, so we had to address several research questions, which are detailed below.


2. Research Questions

Once the overall business model of Wishsky was established, several important research challenges had to be overcome. These challenges fall into three topics: a friendly user interface, web-service APIs, and the database and data mining algorithm.

2.1 Friendly User Interface

Because Wishsky is designed for sophisticated Internet users as well as for older and younger users, it is very important for Wishsky to provide a friendly environment for users of all ages. There are therefore three major user-interface issues.

First, Wishsky’s operating processes should be very easy to use, with clear instructions that guide users of varying information literacy through its powerful functions. Second, an interesting and appealing interface is critical for attracting more traffic to Wishsky. Last, a well-designed interface should bring together and present all of Wishsky’s functionality.

Therefore, Wishsky uses many easily understandable pictures and figures to teach first-time users how to operate its functions, so that they do not spend much time on trial and error. Meanwhile, we expect a lively interface to significantly improve Wishsky’s reputation.

2.2 Web-service APIs

According to our initial plans, Wishsky uses three APIs, from Amazon, Google, and eBay. Each web-service provider’s API needs to be tested, because there could be conflicts between the API and our system architecture.

Moreover, combining the different APIs offered by different web-service providers requires particular attention. For instance, integrating the different data formats returned by these heterogeneous APIs is a critical technical issue.
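The integration problem can be illustrated with a minimal sketch, assuming each provider’s response has already been parsed into a simple Java object. The record types AmazonItem, EbayAuction, and ProductOffer, their fields, and the OfferAdapters class are hypothetical stand-ins, not the classes generated from the real Amazon or eBay WSDLs; the point is only that every source is converted into one common type before the rest of the application uses it.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified shapes of results from two different providers.
record AmazonItem(String asin, String title, String listPriceText) {}  // price as text, e.g. "$24.99"
record EbayAuction(long itemId, String title, double currentBid) {}   // price as a number

/** Hypothetical common format used inside the application, whatever the source. */
record ProductOffer(String source, String sourceId, String title, BigDecimal price) {}

class OfferAdapters {
    static ProductOffer fromAmazon(AmazonItem item) {
        // Strip the currency symbol and thousands separators before parsing.
        String digits = item.listPriceText().replaceAll("[^0-9.]", "");
        return new ProductOffer("Amazon", item.asin(), item.title(), new BigDecimal(digits));
    }

    static ProductOffer fromEbay(EbayAuction auction) {
        return new ProductOffer("eBay", Long.toString(auction.itemId()), auction.title(),
                BigDecimal.valueOf(auction.currentBid()));
    }

    /** Converts both sources to the common type and sorts the result cheapest first. */
    static List<ProductOffer> merge(List<AmazonItem> amazon, List<EbayAuction> ebay) {
        List<ProductOffer> all = new ArrayList<>();
        for (AmazonItem a : amazon) all.add(fromAmazon(a));
        for (EbayAuction e : ebay) all.add(fromEbay(e));
        all.sort((x, y) -> x.price().compareTo(y.price()));
        return all;
    }
}
```

Once every source is mapped to a single type, comparing a retail price with an auction price becomes an ordinary sort, and adding another provider only means writing one more adapter method.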

To achieve the planned functions, Wishsky uses several Amazon APIs, such as customer wish-list retrieval and item search. Furthermore, eBay’s search for available auctions is used to give users valuable auction details. As for Google’s APIs, search is used to find current news related to products.

2.3 Database and Data Mining Algorithm

To provide an effective mechanism for recommending related products, it is important to construct a correct database schema and to apply a proper, efficient data mining algorithm. Fortunately, there are many well-designed DBMSs and data mining applications and program packages that can be adapted with some revisions. The significant questions are therefore how to choose a proper DBMS, how to design an effective database schema, which of the four major types of data mining algorithms (predictive modeling, database segmentation, link analysis, and deviation detection) to use, and which algorithm package to integrate into Wishsky.

In addition, the time complexity of the algorithm is another major issue for the data mining implementation. Running the recommendation algorithm is time-consuming when the volume of data is very large, so the choice between real-time processing and back-end (batch) processing significantly affects Wishsky’s overall performance.
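The back-end option can be sketched as follows, assuming the mining step can be wrapped as a function that returns, for each item ID, a list of recommended item IDs. The class and method names are hypothetical. The slow computation runs on a schedule, while page requests only read the most recently computed result.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical cache of precomputed recommendations (back-end processing). */
class RecommendationCache {
    // Replaced as a whole on each refresh, so readers never see a half-built map.
    private volatile Map<String, List<String>> rulesByItem = Collections.emptyMap();

    /** Runs the (possibly slow) mining job and swaps in its result. */
    void refresh(Supplier<Map<String, List<String>>> miningJob) {
        rulesByItem = Map.copyOf(miningJob.get());
    }

    /** Cheap lookup performed while a page is being rendered. */
    List<String> recommendationsFor(String itemId) {
        return rulesByItem.getOrDefault(itemId, List.of());
    }

    /** Runs the mining job now and then once a day; the caller must shut the executor down. */
    ScheduledExecutorService refreshDaily(Supplier<Map<String, List<String>>> miningJob) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> refresh(miningJob), 0, 1, TimeUnit.DAYS);
        return scheduler;
    }
}
```

The trade-off is freshness: recommendations can be up to a day old, which is consistent with the daily rule generation described in Section 5.1.1, while each page request avoids re-running the mining algorithm.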


3. Literature Review

How to expand the market is always one of the most important issues in business. Soliciting customers’ wishes has proved to be one of the most efficient ways to understand what the market needs, and it has been widely adopted. Based on customers’ wish-lists, businesses send them product information through various channels, such as email and personalized dynamic web content.

On one hand, this strategy gives businesses a useful advertising channel; on the other hand, customers are informed about the products they are interested in. Many businesses, such as Half.com (www.half.com), have integrated this method into their marketing strategies.

However, many businesses are not satisfied with this simple use of customers’ wish-lists and go further to extract more value from them. Various recommendation algorithms, such as association rules and clustering algorithms, are designed to identify items that customers may be interested in, based on their shopping history or the items already in their wish-lists, so that businesses can develop a wider market. The logic behind these recommendation rules differs from one to another; it may come from market analysis or from studies in social science or psychology, yet all of them are created to expand the sellers’ market. A typical representative, and one of the pioneers in deploying this strategy, is Amazon.

No matter how innovative the techniques these businesses apply to make full use of customers’ wish-lists, they share one thing in common: they are trying to sell what they want to sell. Our project, in contrast, set out to provide a quite different kind of service to customers who create wish-lists on our website, where they can truly enjoy the pleasure of being treated like royalty.


4. Research Design

4.1 Functionality

The main function of the Wishsky application is to let users add and manage Amazon items in a feature-rich, wish-list-based setting. Users can add items to their wish-list in two ways. The first is to search for a wish-list on Amazon by email address. The second is to search for items by item type (e.g., Book, DVD) and keywords. In both cases, a list of items is displayed with a thumbnail image (if available), title, and list price. There is also a link to each item’s Amazon page, so users can see more details about it. To add an item to the Wishsky wish-list, users select the checkbox next to it and click the “Add Items to this Wishlist” button. Users can also delete an item from their wish-list by clicking the “Delete” button next to it.

The next major function of the application is its integration with eBay and Google. Once a user has finished managing the wish-list, the next web page shows the items on it. Above the wish-list is a list of news items from Google related to the wish-list items. Clicking a wish-list item displays a popup with matching auction items found on eBay. The bottom part of the page displays recommended items, which are determined in three ways: by association rules, by popular items among the user’s friends, and by the most popular items among all users.

4.2 Architecture

The diagram above displays the basic architecture of the Wishsky application. Wishsky is a Java application and uses a variety of Java libraries to interact with its components. Users access the application through a web browser (e.g., Firefox or Internet Explorer), which sends HTTP requests to the application hosted in a Tomcat web container. Through JDBC, Wishsky connects to a SQL Server database, which stores customer information and wish-list items and is also used for data mining. Finally, Wishsky connects to eBay, Google, and Amazon via web services; using SOAP calls, the application dynamically requests and receives information from each site.

5. Findings

Having constructed the basic architecture of Wishsky and linked its important components together, we are pleased to describe what we found while implementing it.


5.1 Database Schema / Data Mining Algorithm

To support effective and efficient operation of Wishsky’s functions, a supporting database is needed to record users’ data and to store product-related information retrieved from Amazon, Google, or eBay. Moreover, for efficiency, Wishsky’s back-end recommendation, which is implemented with association rules, also needs the database to record the rules generated by the Apriori algorithm. Hence, there are two groups of data tables, detailed below.

5.1.1 Database Design

The first group of data tables stores user-related data, such as basic profiles, wish-list information, the items in users’ wish-lists, and friend lists. The second group records the association rules generated every day by the back-end data mining algorithm. As the following figure shows, the relationships between the data tables reflect how Wishsky records users’ information and data mining results.


[Figure: Relationships among the Customer, Friend, Wishlist, Item, and Association tables]


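5.1.2 Schema of Data Tables

In each table below, a row gives the key type where applicable (PK = primary key, FK = foreign key), the column name, the SQL Server data type, the column length, and “Null” if the column allows null values.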

Table Name: Customer

PK C_id int 4

C_account char 20

C_pwd char 20

C_amazon_email nvarchar 50

C_amazon_listid char 10 Null

C_ebay_id nvarchar 50 Null

C_ebay_pwd nvarchar 20 Null

C_first_name nvarchar 20

C_last_name nvarchar 20

C_gender char 1

C_marry char 1

C_birth_year int 4

C_birth_month int 4

C_birth_day int 4

C_profession nvarchar 15

C_income char 6

C_state char 2

C_city nvarchar 15

Table Name: Wishlist

PK W_id int 4

FK C_id int 4

W_name char 20

Table Name: Item

PK I_id bigint 8

FK W_id int 4

I_amazon_id nvarchar 50

I_name nvarchar 50 Null


I_quantity int 4 Null

I_priority int 4

I_lowestprice decimal 9

I_set_lowprice decimal 9 Null

I_set_highprice decimal 9 Null

Table Name: Friend

PK F_no bigint 8

C_Account_one char 20

FK C_Account_two_id int 4

C_Account_two char 20

FK C_Account_two_email nvarchar 50

C_Account_two_W_id int 4

F_Nickname nvarchar 100 Null

Table Name: Association

PK A_no bigint 8 0

FK I_one char 20 0

FK I_two char 20 0

A_confidential decimal 9 Null

5.1.3 Data Mining Algorithm

Wishsky is user-friendly in that it shows recommended product information, including each product’s picture, title, and Amazon retail price, so that users can notice items they need but have forgotten. To recommend suitable products, Wishsky uses three recommendation mechanisms: “Associated Items”, “Friend’s Wish-Items”, and “Hot Product”.

First, “Associated Items” is implemented by overriding the Apriori algorithm in the WEKA software package. As mentioned before, we run the data mining process as a back-end job for efficiency, as shown in the following picture. Apriori is commonly used to find associations between items across transaction records, so we adapted it for the first type of recommendation.
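The idea behind the back-end job can be sketched as follows, assuming each wish-list is treated as one transaction and rules only relate single items, which matches the I_one, I_two, and A_confidential columns of the Association table. This is a minimal sketch of the first two levels of Apriori written for illustration, not WEKA’s Apriori implementation; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** One row of the Association table: wish-lists containing itemOne tend to contain itemTwo. */
record AssociationRule(String itemOne, String itemTwo, double confidence) {}

class PairwiseApriori {
    /**
     * Mines single-item rules X -> Y using the first two levels of Apriori.
     *
     * @param wishLists     each wish-list is treated as one transaction (a set of item IDs)
     * @param minSupport    minimum fraction of wish-lists that must contain an itemset
     * @param minConfidence minimum support(X, Y) / support(X) for a rule to be kept
     */
    static List<AssociationRule> mine(List<Set<String>> wishLists,
                                      double minSupport, double minConfidence) {
        int n = wishLists.size();

        // Level 1: count single items and keep the frequent ones.
        Map<String, Integer> itemCounts = new HashMap<>();
        for (Set<String> list : wishLists)
            for (String item : list) itemCounts.merge(item, 1, Integer::sum);
        Set<String> frequent = new HashSet<>();
        for (Map.Entry<String, Integer> e : itemCounts.entrySet())
            if ((double) e.getValue() / n >= minSupport) frequent.add(e.getKey());

        // Level 2: by the Apriori property, a frequent pair can only contain frequent
        // items, so infrequent items are dropped before pairs are counted.
        Map<List<String>, Integer> pairCounts = new HashMap<>();
        for (Set<String> list : wishLists) {
            List<String> items = new ArrayList<>();
            for (String item : list) if (frequent.contains(item)) items.add(item);
            Collections.sort(items); // canonical order so {a, b} and {b, a} share one key
            for (int i = 0; i < items.size(); i++)
                for (int j = i + 1; j < items.size(); j++)
                    pairCounts.merge(List.of(items.get(i), items.get(j)), 1, Integer::sum);
        }

        // Each frequent pair {a, b} yields up to two rules: a -> b and b -> a.
        List<AssociationRule> rules = new ArrayList<>();
        for (Map.Entry<List<String>, Integer> e : pairCounts.entrySet()) {
            int both = e.getValue();
            if ((double) both / n < minSupport) continue;
            String a = e.getKey().get(0);
            String b = e.getKey().get(1);
            double confAB = (double) both / itemCounts.get(a);
            double confBA = (double) both / itemCounts.get(b);
            if (confAB >= minConfidence) rules.add(new AssociationRule(a, b, confAB));
            if (confBA >= minConfidence) rules.add(new AssociationRule(b, a, confBA));
        }
        return rules;
    }
}
```

Pruning infrequent single items before counting pairs is what keeps the job tractable: the number of candidate pairs grows quadratically with the number of items that survive level 1, not with the full catalog.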


Second, we implement “Friend’s Wish-Items” by finding the most popular products across a user’s friends’ wish-lists, on the assumption that birds of a feather flock together; as a result, Wishsky recommends to you what your friends like. The last mechanism is hot-product recommendation: intuitively, the most popular items among all users are recommended.
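Both mechanisms reduce to counting how many wish-lists contain each item and keeping the top few. A minimal sketch, assuming wish-lists are available as sets of item IDs; the class name and the choice to skip items the user already wished for are illustrative assumptions. Passing the friends’ wish-lists gives “Friend’s Wish-Items”; passing every user’s wish-list gives “Hot Product”.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

/** Hypothetical popularity counter shared by the two mechanisms. */
class PopularItems {
    /**
     * Returns up to k item IDs ordered by how many of the given wish-lists contain them
     * (ties broken alphabetically), skipping items listed in exclude.
     */
    static List<String> topK(Collection<Set<String>> wishLists, Set<String> exclude, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (Set<String> list : wishLists)
            for (String item : list)
                if (!exclude.contains(item)) counts.merge(item, 1, Integer::sum);
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

Because each wish-list is a set, an item counts at most once per friend, so one friend listing an item cannot dominate the ranking.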

5.2 APIs

When these three major Internet companies (eBay, Google, and Amazon) announced that they would provide free access to their wealth of data, the Internet community applauded. The ways they chose to allow access are similar in some respects and different in others. Fortunately, all of them provide a SOAP interface usable from Java, and some supply Java libraries themselves, which made it easier to integrate each web service with the Wishsky application. Another similarity is that we were given unique ID strings to attach to each SOAP request. Through these IDs, each company can monitor and limit the number of requests a specific application makes to its system. During our testing and development, the request limits were never an issue, but if our application were used by hundreds or even thousands of customers, they would become a major issue.
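One way an application could stay within such limits is to count its own requests per provider and stop before the quota is reached. A minimal sketch, assuming a fixed per-day budget for each provider; the class, the provider names, and the budget values are hypothetical, and the real limits depend on each provider’s terms.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical client-side guard that counts the SOAP requests sent with each
 * provider's ID string and refuses new ones once a daily budget is used up.
 */
class DailyRequestBudget {
    private final Map<String, Integer> limits;                 // provider -> requests allowed per day
    private final Map<String, Integer> used = new HashMap<>(); // provider -> requests sent today
    private LocalDate day;

    DailyRequestBudget(Map<String, Integer> limits, LocalDate today) {
        this.limits = Map.copyOf(limits);
        this.day = today;
    }

    /** Returns true (and records the request) if the provider still has budget today. */
    synchronized boolean tryAcquire(String provider, LocalDate today) {
        if (!today.equals(day)) { // a new day has started: reset all counters
            used.clear();
            day = today;
        }
        int limit = limits.getOrDefault(provider, 0); // unknown providers get no budget
        int count = used.getOrDefault(provider, 0);
        if (count >= limit) return false;
        used.put(provider, count + 1);
        return true;
    }
}
```

When tryAcquire returns false, the application could fall back to cached results instead of failing the page, which matters most as the number of users grows.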

Because each Web Service provides different levels of functionality for users, there

was noticeable different level of effort required to setup the interface. By far, EBay took

the most amount of time, because not only does it allow searching of items being

auctioned, but it also allows bidding and posting items to sell. What this meant is that

EBay requires not one, but three IDs, a real eBay account, a registered runame (a unique

name of our Wishsky application), and a long unique string (token) associated with the

EBay account, just to do just the simplest search. The scope of our application did not

involve bidding or selling items on EBay, only searching. This meant we only needed

one EBay account and token string was needed for the whole application. Otherwise,

12

Page 13: Final report

MIS 531A Data Structure & Algorithm Report of Group Web Mining Project

each Wishsky user would be required to register an EBay account and go through an

authorization process. In comparison, Amazon and Google especially, were relatively

easy to setup, was not without its difficulties.

The good folks at Google provided a JAR file with the Java classes needed to access Google searches quickly. However, we found that the supplied JAR file was not compatible with Tomcat. I was forced to discard the JAR and create my own SOAP classes using the WSDL2Java tool of Apache Axis. Amazon is more similar to eBay, because we use it to search for items, either through an Amazon wish-list or by keyword. However, Amazon does not allow posting items for sale, so there is no extensive authorization process.

5.3 User Interface

The design rationale of the platform’s interface is to give our customers hope and make their dreams come true. Because we believe everybody is happy when they see a beautiful rainbow in the sky, we designed the interface around the sky and rainbows. We also hope our customers are happy when they visit our website, so we designed a happy little boy and girl to welcome them and give them a family feeling. In short, we designed the interface to meet these expectations:

Give users the impression of being in a dream

Make the interface appealing to multiple age groups

Be family friendly

5.4 Sample Scenarios

When a new user visits Wishsky, he can easily find out which products are hot. Clicking a product’s picture links directly to the related Amazon page, where he can get more details about the product. Meanwhile, Wishsky provides important news and review information for hot products: after the user clicks the product’s title, the right frame shows a list of related links generated from a Google search.


For instance, if he would like to read the first news item, a pop-up window links directly to that web page. In addition, this page lets users search for a wish-list on Amazon by entering the correct email address.

After entering basic data and clicking Submit, a newly registered user sees a friendly page with a clear guide that teaches first-time users how to create their own wish-lists on Wishsky. First, the user can import desired items directly from his Amazon wish-list using the email address registered with Amazon. After a successful import, he can choose items and add them to his wish-list.


Alternatively, users can type keywords from a product title and select a product category to search for products on Amazon. For instance, if he searches for products related to “Sony T7 Camera”, he can easily find what he wants and add it to his wish-list.


When users finish the initial wish-list setup, they click Done to go to the next page, a personalized page. Wishsky tries to build an environment just for you: on this page you can see your name, a cute icon based on your gender, and your wish items. Again, Wishsky provides key product information based on your wished items. Moreover, when users click an item title, a list of available auctions for that item pops up; to see more detail, they simply click the link to eBay.


In Wishsky, users can maintain their basic data, including their password, personal profile, and friend list. Users simply enter their friends’ IDs to add them to their friend list. After that, they can easily see what their friends want as Christmas gifts.


In addition, Wishsky provides three different mechanisms to recommend items users might be interested in. The first is based on the results of the association rule algorithm: Wishsky shows recommended items derived from association rules mined from our customers’ wish-lists.

The second is based on your friends’ wish-lists, so Wishsky recommends to you what your friends like. The last is based on the hot items on our website, so that users can keep up with the latest trends.


6. System Novelty

Wishsky is a website that integrates product information from Amazon, Google, and eBay and, at the same time, provides a friendly interface through which users of all ages can access useful functions. In this chapter, we discuss what is novel about Wishsky.

6.1 Data Mining Algorithm

Wishsky implements a recommendation function based on the results of the Apriori algorithm; this is the most common kind of recommendation and is applied by many electronic retailers, such as Amazon.

In addition, Wishsky provides another interesting recommendation mechanism, which relies on the items that appear most often in users’ friends’ wish-lists. We consider this a unique innovation that combines the wish-list with the friend list, since, as far as we know, Wishsky is the only community built around wish-lists. As a result, Wishsky has high potential to become an entirely new community that offers many business opportunities.

6.2 APIs

Ours is the only application known to us that interfaces with eBay, Amazon, and Google together. This lets users receive a great deal of relevant information quickly. Users could open multiple browser windows and search each site separately, but searching for many items that way would be tedious, and there would be no easy way to view all the information on one page.


6.3 User Interface

Following the design rationale of a family-friendly interface, we not only provide a friendly interface for different customers but also give clear directions for each function. Our interface uses soft colors and simple, playful logos, and each function comes with careful directions that help customers use it smoothly.

6.4 Compare with others

Our project differs from others not only technologically; our business idea is also interesting and fresh. This shows in two respects.

First, we create a new kind of business community. A business community is a place where customers with similar business interests gather. In the community our project creates, customers know each other’s wishes, which is very convenient for them. For instance, you can avoid the embarrassment of giving a friend an inappropriate gift if you know what he or she really wants. And you will probably be very excited to receive from a friend exactly the gift you have long been hoping for, because he or she had a chance to learn your wishes. As far as we know, this kind of business community has never been seen before.

7. Conclusion / Future Directions

We are very glad to see what we have achieved after a long semester of effort: a business website with an innovative business idea, comprehensive and considerate services, a friendly and warm interface and, most importantly, what we believe is a large potential market value. This is the result of a cooperative and hard-working team. At the beginning of our project, we proposed several different plans, each of which sounded reasonable, but we respected each other’s ideas and discussed them without bias in order to choose the best one. During the design period, everyone did what he or she could, and no one argued about whether the work distribution was fair. Furthermore, we helped each other out and never held anything back. We learned from each other and encouraged each other whenever we felt discouraged. Because we lacked practical experience in developing complete systems, we did go through a lot of pain during the project, as we kept coming up with more requirements than planned while having to meet the schedule. But the best thing was that we never gave up, and we came to understand that things can always be improved but never made completely perfect. In the end, we were very happy to see the good responses from the professor and our classmates during the demo. We sincerely hope that all users of “Wishsky” realize their wishes as we did.

To continue pursuing our dream of Wishsky, we will of course put more effort into it, and we have future plans to create more value-added functions, design more interesting interfaces, and integrate more web-service APIs. The major future plans are summarized in the following three directions.

Data Mining Algorithm

At present, Wishsky recommends products based on product associations across all users’ wish-lists, the items that appear most often in users’ friends’ wish-lists, and hot products in Wishsky. Although the current recommendation solution is sufficient to meet users’ needs, it could still be improved.

In the future, we would like to adopt an ART (Adaptive Resonance Theory) network to group our many users by their basic attributes; this is why we require new registrants to fill out many personal attributes. Because ART is an unsupervised machine learning algorithm, it is useful for categorizing users.

Users can then be divided into groups with shared interests, and the Apriori association rule algorithm can be used to find product association rules among the wish-lists of users in the same group. By doing so, Wishsky could make more effective and accurate product recommendations.
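The grouping step can be sketched with the basic ART1 procedure, assuming each user’s registration attributes are encoded as a binary vector (for example, one bit per gender, income bracket, or state value; this encoding and the class name are hypothetical). With fast learning, categories are tried in order of the choice value |I AND w| / (beta + |w|); the first one that passes the vigilance test |I AND w| / |I| >= rho absorbs the input, and its prototype shrinks to the overlap I AND w. If no category passes, a new one is created. Each resulting group’s wish-lists could then be mined separately with Apriori, as planned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Minimal ART1 clustering of binary vectors with fast learning. Names are hypothetical. */
class Art1 {
    private final double vigilance; // rho in (0, 1]: higher values create more, tighter groups
    private final double beta;      // small positive choice parameter
    private final List<boolean[]> prototypes = new ArrayList<>();

    Art1(double vigilance, double beta) {
        this.vigilance = vigilance;
        this.beta = beta;
    }

    private static int ones(boolean[] v) {
        int c = 0;
        for (boolean b : v) if (b) c++;
        return c;
    }

    private static boolean[] and(boolean[] a, boolean[] b) {
        boolean[] r = new boolean[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] && b[i];
        return r;
    }

    /** Assigns the input to a category (creating one if needed) and returns its index. */
    int learn(boolean[] input) {
        int inputSize = ones(input); // an all-zero input never passes vigilance, so it gets a new category
        boolean[] tried = new boolean[prototypes.size()];
        for (int attempt = 0; attempt < prototypes.size(); attempt++) {
            // Pick the untried category with the largest choice value |I AND w| / (beta + |w|).
            int best = -1;
            double bestScore = -1;
            for (int j = 0; j < prototypes.size(); j++) {
                if (tried[j]) continue;
                boolean[] w = prototypes.get(j);
                double score = ones(and(input, w)) / (beta + ones(w));
                if (score > bestScore) { bestScore = score; best = j; }
            }
            tried[best] = true;
            boolean[] match = and(input, prototypes.get(best));
            // Vigilance test: does the prototype cover enough of the input?
            if (inputSize > 0 && (double) ones(match) / inputSize >= vigilance) {
                prototypes.set(best, match); // fast learning: prototype shrinks to the overlap
                return best;
            }
            // Otherwise this category is reset and the next best one is tried.
        }
        prototypes.add(input.clone()); // no category resonated: create a new one
        return prototypes.size() - 1;
    }
}
```

A higher vigilance produces more, smaller groups, so choosing rho and the attribute encoding would need experimentation on real registration data.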

APIs

Web services are still a new technology that applications (web or desktop) are just starting to use, and this technology will most likely evolve and change rapidly. To take advantage of new features and bug fixes, Wishsky will need to update to newer versions regularly. Also, because of time limitations, the Wishsky application can currently only search for items on eBay and Amazon. Future development may allow users to actually buy items, or to post items for auction on eBay. A key idea of Wishsky is to let users see buyer demand before auctioning items, and to find out which items are popular and likely to receive more bids. It therefore seems only natural to let users quickly auction an item without ever leaving our interface; otherwise, they may be tempted to use a competing application’s software.

User Interface

In the future, we could provide more functions that let customers personalize their own interface. Customers could upload a personal photo to serve as the welcome logo and adjust the interface colors to their preferences. In addition, they could design and rearrange each block of the interface according to their interests, and the interface would show the functions each customer finds interesting.

Meanwhile, the platform will dynamically show different information and functions to different customers based on their individual interests and settings. In brief, we hope to provide a personalized space where customers can enjoy all the information they want.

We hope that our product, Wishsky, will make not only our clients’ dreams come true but also ours. For the success of our project, we greatly appreciate the instruction of Dr. Chen and the help of the teaching assistant, Xin Li, and the other teams.


Reference

Website

1. http://www.half.com

2. http://www.amazon.com

3. http://www.google.com

4. http://www.ebay.com

5. http://www.cs.waikato.ac.nz/ml/weka/

6. http://www.amazon.com/gp/aws/landing.html

7. http://www.google.com/apis/

8. http://developer.ebay.com

Paper

1. Yung-Hsiang Chiu, “The Study of Applying Neural Network and Data Mining Techniques to Course Recommendation Base on E-learning Environment”, 6 June 2003

2. Guan-Hua Sun, “A Data Mining Methodology for Library New Book Recommendation”, July 2000
