Business Identification: Spatial Detection Alexander Darino Week 5.

Post on 02-Jan-2016


Outline

• Recap of Previous Work
• Business Name Detection
• Business Name Matching
• Business Spatial Detection
• Weaknesses to Current Approach
• Alternatives to Current Approach
• Acknowledgements

[Pipeline diagram. Stages: Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses; Image, OCR, Detected Text; Business Name Matching; Business Identification; Business Spatial Detection. Parts of the pipeline are marked Week 4 and Week 5.]

Previous Work


Image → Where Am I? → Latitude, Longitude

Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses

65
• George S Aiken Co
• Winghart's Burger & Whiskey Bar
• Market Square
• Bella Sera On the Square
• Chipotle
• NOLA
• Las Velas
• …
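One way to picture the "Nearby Businesses" step is as a radius filter around the photo's coordinates. The sketch below is only an illustration under that assumption; the slides do not say which service or method produced the list. The function names, the radius, and the directory (placeholder names with made-up coordinates) are all hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearby_businesses(lat, lon, directory, radius_m=150.0):
    """Return names from `directory` within `radius_m` of (lat, lon), nearest first."""
    hits = []
    for name, (blat, blon) in directory.items():
        distance = haversine_m(lat, lon, blat, blon)
        if distance <= radius_m:
            hits.append((distance, name))
    return [name for _, name in sorted(hits)]


# Hypothetical directory: placeholder names and made-up coordinates.
DIRECTORY = {
    "Business A": (40.4406, -80.0025),
    "Business B": (40.4410, -80.0020),
    "Business C": (40.4500, -80.0200),
}
```

Calling `nearby_businesses(40.4406, -80.0025, DIRECTORY)` keeps only the entries inside the radius, which is what bounds the later name-matching search to a short candidate list.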


Business Name Detection


…
<line dy="95" dx="1573" y="420" x="11" value="1">
  <space dy="26" dx="9" y="379" x="11"/>
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="25" dx="6" y="406" x="11" value="J" weights="98,62" numac="2" achars="p"/>
  <box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="5" dx="7" y="441" x="56" value="."/>
  <box dy="24" dx="5" y="401" x="57" value="."/>
  <box dy="13" dx="8" y="429" x="58" value="v" weights="98" numac="1"/>
  <box dy="26" dx="9" y="402" x="60" value="." weights="94" numac="1"/>
  <box dy="22" dx="5" y="406" x="67" value="0" weights="96" numac="1"/>
  <box dy="10" dx="12" y="444" x="71" value="."/>
</line>
…
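The OCR output above can be read programmatically. The sketch below is a minimal example, assuming that `x` and `y` give a box's top-left corner and `dx` and `dy` its width and height (the slides do not define these attributes). The `read_line` helper and the shortened `SAMPLE` fragment are illustrative only.

```python
import xml.etree.ElementTree as ET

# Shortened fragment in the same shape as the OCR output shown above.
SAMPLE = """
<line dy="95" dx="1573" y="420" x="11" value="1">
  <box dy="26" dx="9" y="379" x="11" value="0" weights="96" numac="1"/>
  <box dy="19" dx="5" y="382" x="19" value="n" weights="96" numac="1"/>
  <space dy="5" dx="30" y="441" x="25"/>
  <box dy="13" dx="8" y="429" x="58" value="v" weights="98" numac="1"/>
</line>
"""


def read_line(xml_text):
    """Return (text, bounding_box) for one <line> element.

    bounding_box is (left, top, right, bottom) over all <box> children,
    assuming x/y are the top-left corner and dx/dy are width/height.
    """
    line = ET.fromstring(xml_text.strip())
    chars, boxes = [], []
    for element in line:
        if element.tag == "space":
            chars.append(" ")
        elif element.tag == "box":
            chars.append(element.get("value", ""))
            x, y = int(element.get("x")), int(element.get("y"))
            dx, dy = int(element.get("dx")), int(element.get("dy"))
            boxes.append((x, y, x + dx, y + dy))
    text = "".join(chars)
    if not boxes:
        return text, None
    bounding_box = (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )
    return text, bounding_box
```

The per-character boxes are what would let a matched name be tied back to a region of the image, which is the input the spatial detection stage needs.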


Business Name Matching


• Developed Confidence Attribution Algorithm
  – Confidence of OCR Token being Name Token
    • Example: Confidence of “ESTUANT” representing “RESTAURANT”
    • Point-based system
  – Confidence of Name appearing in Image
    • Sum of points of matching OCR Text
    • Use logarithmically-normalized points to determine business inclusion threshold
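The bullets above can be sketched in code. The slides do not give the actual scoring rules, so everything specific here is an assumption: `difflib.SequenceMatcher` stands in for the token confidence, points are the similarity ratio times the name token's length, and the log normalization divides `log1p` of the points by `log1p` of the maximum possible points. The function names and both thresholds are hypothetical.

```python
import math
from difflib import SequenceMatcher


def token_points(ocr_token, name_token):
    """Points for an OCR token standing in for a name token.

    Assumed scoring: similarity ratio (0..1) times the name token's length.
    """
    ratio = SequenceMatcher(None, ocr_token.upper(), name_token.upper()).ratio()
    return ratio * len(name_token)


def name_points(ocr_tokens, business_name, min_ratio=0.6):
    """Sum the best points per name token, ignoring weak matches."""
    total = 0.0
    for name_token in business_name.split():
        best = max((token_points(t, name_token) for t in ocr_tokens), default=0.0)
        if best >= min_ratio * len(name_token):
            total += best
    return total


def normalized_confidence(points, max_points):
    """Log-normalize points into [0, 1] (assumed form of the normalization)."""
    if max_points <= 0:
        return 0.0
    return math.log1p(points) / math.log1p(max_points)


def businesses_in_image(ocr_tokens, names, threshold=0.8):
    """Return the names whose normalized confidence reaches the threshold."""
    included = []
    for name in names:
        max_points = sum(len(token) for token in name.split())
        confidence = normalized_confidence(name_points(ocr_tokens, name), max_points)
        if confidence >= threshold:
            included.append(name)
    return included
```

With OCR tokens such as `["ESTUANT", "BELLA", "SER", "xq"]`, a partial token like "SER" still earns most of the points for "Sera", which is the behavior the “ESTUANT” example describes.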


Note: k is usually 2 or 3


Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification.


Business Spatial Identification


Aiken George S Co
Category: Food, Grocery
Address: 218 Forbes Ave, Pittsburgh, PA 15222
Phone: (412) 391-6358
Rating: 4.5/5 (2 Reviews)


Bruegger's Bagels
Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated


Weaknesses to Current Approach


Lots of Garbage


Fragmented Word Detection

Fails with non-orthogonal perspective

Did I already mention lots of garbage?

Fails with non-Roman text

Not scale-invariant


Two Different Alternative Approaches


Alternative #1: Image Matching

[Pipeline diagram. Stages: Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses; Image; Match to Storefront Image; Business Identification; Business Spatial Detection.]

• Weaknesses
  – Storefront images aren’t always available for matching
  – Computationally Expensive
    • Hundreds of images to compare to
  – Nothing new
  – Boring!


Alternative #2: Template Matching

[Pipeline diagram. Stages: Latitude/Longitude, Geocoding, Reverse Geocoding, Nearby Businesses; Render Templates of Business Names in Different Fonts; Template Images; Image; Image Matching (e.g. SIFT, HAAR); Business Spatial Detection; Business Identification.]

[Example: the name "Tambellini" shown eight times, presumably one rendering per font.]


OCR
• Not Scale Invariant
• Unbounded Search
• Fragmented Recognition
• Roman-only font

Alternative #2
• Scale Invariant
• Bounded Search
• Whole-word recognition
• All fonts
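The idea of a bounded, whole-template search can be sketched without any image library. The sketch below is not the SIFT or HAAR matching named in the slides. It swaps in a brute-force pixel-agreement score and tries a few integer scales, purely to show how a small set of templates (one per nearby business name) bounds the search. The tiny hand-made bitmaps stand in for rendered business names, and all function names are hypothetical.

```python
def rescale(template, factor):
    """Nearest-neighbour upscaling of a 2-D list of 0/1 pixels by an integer factor."""
    return [[px for px in row for _ in range(factor)]
            for row in template for _ in range(factor)]


def match_score(image, template, top, left):
    """Fraction of template pixels that agree with the image at (top, left)."""
    height, width = len(template), len(template[0])
    agree = sum(image[top + r][left + c] == template[r][c]
                for r in range(height) for c in range(width))
    return agree / (height * width)


def best_match(image, template, scales=(1, 2, 3)):
    """Return (score, scale, top, left) for the best placement over all scales."""
    best = (0.0, None, None, None)
    for scale in scales:
        scaled = rescale(template, scale)
        height, width = len(scaled), len(scaled[0])
        # Templates larger than the image yield empty ranges and are skipped.
        for top in range(len(image) - height + 1):
            for left in range(len(image[0]) - width + 1):
                score = match_score(image, scaled, top, left)
                if score > best[0]:
                    best = (score, scale, top, left)
    return best


def identify(image, templates, scales=(1, 2, 3)):
    """Pick the best-scoring template; returns (name, score, scale, top, left)."""
    results = [(best_match(image, t, scales), name) for name, t in templates.items()]
    (score, scale, top, left), name = max(results, key=lambda r: r[0][0])
    return name, score, scale, top, left


# Hypothetical templates: padded bitmaps standing in for rendered names.
TEMPLATE_L = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [0, 0, 0, 0]]
TEMPLATE_I = [[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
```

Because the match returns a scale and a position along with the winning name, identification and spatial detection come out of the same search, which is the appeal of this alternative over matching fragmented OCR tokens.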


Acknowledgements

• Subh
  – Provided several ideas regarding template matching using SIFT, HAAR features, etc.

Thank You