Finding Products on the Internet Using Neural Networks

Date post: 13-Jun-2015
Upload: shion-deysarkar
Description:
Datafiniti crawls the entire Internet to find product data. We use a machine learning technique called neural networks to automatically identify product listings and various heuristic techniques to extract product data.
Transcript
Page 1: Finding Products on the Internet Using Neural Networks

Finding Products on the Internet using Neural Networks

http://www.datafiniti.net

Page 2: Finding Products on the Internet Using Neural Networks

● Goals

○ Collect vast amounts of data through web crawling

○ Normalize and deduplicate data

○ Make it searchable and meaningful

Page 3: Finding Products on the Internet Using Neural Networks

Motivation

Page 4: Finding Products on the Internet Using Neural Networks

Challenges

48 billion pages on the Internet

○ Crawled 6 billion+ pages in the past year

Mostly unstructured data

Limitations of customized crawls

○ Non-scalable

○ Less robust

Page 5: Finding Products on the Internet Using Neural Networks

Solution: Intelligent Classifiers

Advantages

○ Generic code: Scalability

○ More robust

Challenges

○ More difficult to parse data of interest

Page 6: Finding Products on the Internet Using Neural Networks

Problem


Page 7: Finding Products on the Internet Using Neural Networks

Product Page

Page 8: Finding Products on the Internet Using Neural Networks

Product Category

Page 9: Finding Products on the Internet Using Neural Networks

Product Page

Page 10: Finding Products on the Internet Using Neural Networks

Some Other Page

Page 11: Finding Products on the Internet Using Neural Networks

Problem

Solution

Page 12: Finding Products on the Internet Using Neural Networks

Minimize dependency on HTML

Supervised learning for page classification

○ Neural networks

Heuristic algorithms for data parsing

Our Approach

Page 13: Finding Products on the Internet Using Neural Networks

[Figure: Page Features → Input Layer → Hidden Layer → Output Layer → AV(Product), AV(Product Category), AV(Other)]

AV: Activation Value, between 0 and 1

Neural Network

Classification_Type = Type with max. AV

Page 14: Finding Products on the Internet Using Neural Networks

Page Features

Buy Widget

Price

Image

Num Clickable Images with Price

Shipping Info

Page 15: Finding Products on the Internet Using Neural Networks

Page Features

Weight

Product Code

Keywords

Num. Words on Page

Page 16: Finding Products on the Internet Using Neural Networks

Trained offline

Dataset → Feature Vector → Normalization → Neural Network

Neural Network parameters:

○ Input Layer Parameter Set (P)

○ Hidden Layer Parameter Set (Q)

Training
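
The slide names a normalization step but does not say which method is used. The following is a minimal sketch assuming min-max scaling, with bounds learned from the training set and reused at deployment time. The function names, the example feature values, and the choice of min-max scaling itself are assumptions for illustration, not details from the talk.

```python
def fit_min_max(rows):
    """Return per-feature (min, max) pairs from equal-length feature vectors."""
    columns = list(zip(*rows))
    return [(min(col), max(col)) for col in columns]


def normalize(vector, bounds):
    """Scale each feature into [0, 1] using bounds learned from training data."""
    scaled = []
    for value, (lo, hi) in zip(vector, bounds):
        if hi == lo:
            # A feature that never varied in training carries no signal.
            scaled.append(0.0)
        else:
            # Clamp so unseen pages with out-of-range values stay in [0, 1].
            scaled.append(min(1.0, max(0.0, (value - lo) / (hi - lo))))
    return scaled


# Hypothetical raw feature vectors: [num images, has buy widget, num words]
training_rows = [
    [1, 0, 120],
    [3, 1, 900],
    [2, 1, 510],
]
bounds = fit_min_max(training_rows)
normalized_rows = [normalize(row, bounds) for row in training_rows]
```

Keeping the bounds from training and applying them unchanged to live pages means a page is scaled the same way in both phases.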

Page 17: Finding Products on the Internet Using Neural Networks

Web page → Feature Vector → Normalized Feature Vector (x) → Neural Network → AV(Prod), AV(ProdCat), AV(Other)

Neural Network parameters:

○ Input Layer Parameter Set (P)

○ Hidden Layer Parameter Set (Q)

Output of hidden layer: $L_1 = \mathrm{sigmoid}(P^T x)$

Final output: $L_2 = \mathrm{sigmoid}(Q^T L_1)$

$L_2 = \{ AV(Prod), AV(ProdCat), AV(Other) \}$

$\mathrm{sigmoid}(s) = 1 / (1 + e^{-s})$

Page_Type = type with the maximum value among { AV(Prod), AV(ProdCat), AV(Other) }

Deployment
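
The deployment formulas on this slide can be sketched as a two-layer forward pass in plain Python. The weight values, layer sizes, and function names below are hypothetical and chosen only so the example runs. Following the slide, no bias terms are used.

```python
import math

PAGE_TYPES = ["Product", "Product Category", "Other"]


def sigmoid(s):
    """sigmoid(s) = 1 / (1 + e^-s)"""
    return 1.0 / (1.0 + math.exp(-s))


def layer(weights, inputs):
    """Compute sigmoid(W^T v), where weights[i][j] links input i to unit j."""
    n_units = len(weights[0])
    return [
        sigmoid(sum(weights[i][j] * inputs[i] for i in range(len(inputs))))
        for j in range(n_units)
    ]


def classify(x, P, Q):
    """Return the page type with the largest activation value, plus all AVs."""
    l1 = layer(P, x)   # output of the hidden layer
    l2 = layer(Q, l1)  # final output: one activation value per page type
    best = max(range(len(l2)), key=lambda k: l2[k])
    return PAGE_TYPES[best], l2


# Hypothetical parameters: 3 input features, 2 hidden units, 3 outputs.
P = [[2.0, -1.0], [1.0, 0.5], [-1.5, 2.0]]
Q = [[3.0, -1.0, -2.0], [-2.0, 2.5, 0.0]]
x = [1.0, 1.0, 0.0]  # a normalized feature vector

page_type, activations = classify(x, P, Q)
```

In a real system P and Q would come from offline training rather than being written by hand.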

Page 18: Finding Products on the Internet Using Neural Networks

Notation

○ True Positive (TP)

○ False Positive (FP)

○ False Negative (FN)

Precision: TP / (TP + FP)

Recall: TP / (TP + FN)

F-score: 2PR / (P + R), where P is precision and R is recall

Known Dataset

○ Precision = 1.0

○ Recall = 0.985

○ F-score = 0.9925

Live System/Unknown Data

○ Precision = 0.854

○ Recall = Difficult to calculate

Evaluation
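
The three metrics can be computed directly from the counts defined on this slide. The sketch below uses made-up counts purely to show the arithmetic; they are not the talk's data.

```python
def precision(tp, fp):
    """Fraction of pages labeled as products that really are products."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of real product pages that the classifier found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(p, r):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn = 90, 10, 30
p = precision(tp, fp)
r = recall(tp, fn)
f = f_score(p, r)
```

Recall needs the false-negative count, which requires knowing every product page that was missed. That is why it is hard to measure on a live crawl of unlabeled pages.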

Page 19: Finding Products on the Internet Using Neural Networks

Problem

Data Extraction

Page 20: Finding Products on the Internet Using Neural Networks

Product Name

Product Price

Product Code

○ UPC, EAN, ISBN, ASIN

Fields to Collect

Page 21: Finding Products on the Internet Using Neural Networks

Product Page

Getting Product Name

Potential Names

<title>Pebble Smart Watch for Select Apple and Android Devices 301RD - Best Buy</title>

Match Found
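
One way to read this slide is that candidate names found in the page body are checked against the text of the `<title>` tag, and a candidate that appears in the title is accepted. The sketch below follows that reading. The candidate list, the function names, and the rule of preferring the longest match are assumptions for illustration; only the title string comes from the slide.

```python
import re


def extract_title(html):
    """Return the text inside the first <title> tag, or an empty string."""
    match = re.search(r"<title[^>]*>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else ""


def _norm(text):
    """Lowercase and collapse whitespace so comparisons ignore formatting."""
    return " ".join(text.lower().split())


def pick_product_name(html, candidates):
    """Return the longest candidate that appears in the page title, if any."""
    title = _norm(extract_title(html))
    matches = [c for c in candidates if _norm(c) and _norm(c) in title]
    return max(matches, key=len) if matches else None


html = (
    "<html><head><title>Pebble Smart Watch for Select Apple and Android "
    "Devices 301RD - Best Buy</title></head><body></body></html>"
)
# Hypothetical candidates, e.g. text taken from headings on the page.
candidates = [
    "Best Buy",
    "Pebble Smart Watch for Select Apple and Android Devices",
    "Customer Reviews",
]
name = pick_product_name(html, candidates)
```

Preferring the longest match is a simple way to avoid picking the store name, which also appears in the title.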

Page 22: Finding Products on the Internet Using Neural Networks

Product Page

Getting Product Price

Price values with text - discard

Old Price - discard

Current Price - Accept
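
The three rules above can be sketched as a filter over price candidates. How the original system detects an old price is not stated on the slide. This sketch assumes each candidate arrives with a hypothetical `is_struck_through` flag, and that a bare price is a currency symbol followed by a number.

```python
import re

# A candidate counts as a bare price only if nothing else surrounds the amount.
PRICE_ONLY = re.compile(r"^\s*[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?\s*$")


def pick_current_price(candidates):
    """Return the first bare, non-struck-through price, or None."""
    for text, is_struck_through in candidates:
        if not PRICE_ONLY.match(text):
            continue  # price value mixed with other text: discard
        if is_struck_through:
            continue  # old price: discard
        return text.strip()  # current price: accept
    return None


# Hypothetical (text, is_struck_through) pairs taken from a product page.
candidates = [
    ("Save $20.00 today", False),
    ("$169.99", True),
    ("$149.99", False),
]
current_price = pick_current_price(candidates)
```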

Page 23: Finding Products on the Internet Using Neural Networks

Improve classification accuracy

Increase/improve collection of data fields

Future Work

Page 24: Finding Products on the Internet Using Neural Networks

Questions?

https://www.datafiniti.net

http://blog.datafiniti.net

@datafiniti

Page 25: Finding Products on the Internet Using Neural Networks

Price

Image(s)

# clickable images adjacent to price values

"Add to cart", "Buy" widget

# words in page text

Keywords

○ Product detail, specifications, features, size, color, weight, shipping, availability, SKU, UPC, ISBN, ASIN

Page Features
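
Several of the features listed above can be approximated with simple text matching over the raw HTML. The sketch below is a rough illustration only. The regular expressions, function name, and sample page are assumptions, and the "clickable images adjacent to price values" feature is left out because it needs DOM or layout information.

```python
import re

KEYWORDS = [
    "product detail", "specifications", "features", "size", "color", "weight",
    "shipping", "availability", "sku", "upc", "isbn", "asin",
]
PRICE = re.compile(r"[$€£]\s?\d+(?:,\d{3})*(?:\.\d{2})?")
TAG = re.compile(r"<[^>]+>")


def page_features(html):
    """Build a small raw feature dictionary from an HTML string."""
    text = TAG.sub(" ", html)  # crude tag stripping, enough for a sketch
    lowered = text.lower()
    has_buy = "add to cart" in lowered or re.search(r"\bbuy\b", lowered) is not None
    return {
        "num_prices": len(PRICE.findall(text)),
        "num_images": len(re.findall(r"<img\b", html, re.IGNORECASE)),
        "has_buy_widget": int(has_buy),
        "num_words": len(text.split()),
        "num_keywords": sum(
            1 for k in KEYWORDS if re.search(r"\b" + re.escape(k) + r"\b", lowered)
        ),
    }


# Hypothetical page used only to exercise the function.
sample = (
    '<html><body><h1>Blue Widget</h1><img src="a.jpg">'
    "<p>Price: $19.99</p><p>Color: blue. Shipping: free.</p>"
    "<button>Add to Cart</button></body></html>"
)
features = page_features(sample)
```

The resulting values would still need to be turned into a normalized vector before being passed to the classifier.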

Page 26: Finding Products on the Internet Using Neural Networks

Some Other Page

Page 27: Finding Products on the Internet Using Neural Networks

Product Images

Related Products

Price

Widget to buy

Shipping Info

Classification Intuition

