Enrich Search User Experience Using Amazon CloudSearch (SVC302) | AWS re:Invent 2013

Post on 01-Nov-2014

description

Today's applications work across many different data assets - documents stored in Amazon S3, metadata stored in NoSQL data stores, catalogs and orders stored in relational database systems, raw files in filesystems, etc. Building a great search experience across all these disparate datasets and contexts can be daunting. Amazon CloudSearch provides simple, low-cost search, enabling your users to find the information they are looking for. In this session, we will show you how to integrate search with your application, including key areas such as data preparation, domain creation and configuration, data upload, integration of search UI, search performance and relevance tuning. We will cover search applications that are deployed for both desktop and mobile devices.

transcript

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.

Enrich Search User Experience for Different Parts of Your Application Using Amazon CloudSearch

Jon Handler, CloudSearch Solution Architect

November 15, 2013

Agenda

• Sourcing your documents

• Retrieval and ranking

• Search user interface

• Performance and scale

• Developer example: Peter Simpkin, Solution Architect, Elsevier

Architecting with Amazon CloudSearch

Hands-Off Operation

[Diagram: a grid of search instances, each holding one copy of one index partition (Index Partition 1..n by Copy 1..n). The number of partitions grows with document quantity and size; the number of copies grows with search request volume and complexity.]

MovieMate Application

Multiple

Sources

Multiple

Functions

[Screenshots: MovieMate mobile search results for "Iron Man", with tabs Movies, Search, Social, Account, Nearby]

• Iron Man (2008) – When wealthy industrialist Tony Stark is forced to build an armored suit after a life-threatening incident, he ultimately decides to use its technology to fight against evil.

• Iron Man 2 (2010) – Tony Stark has declared himself Iron Man and installed world peace... or so he thinks. He soon realizes that not only is there a mad man...

• Iron Man 3 (2013) – When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.

• The Man With The Iron Fists (2012) – On the hunt for a fabled treasure of gold, a band of warriors, assassins, and a rogue British soldier descend upon a village in feudal China, where a humble blacksmith...

Mobile Experience

Sourcing your documents

Retrieval and ranking

Search user interface

Performance and scale

Developer example

Amazon CloudSearch Documents

• Unique identifier

• Version

• Fields

– Indexed according to configuration

– Source of matches
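The three parts listed above (identifier, version, fields) can be sketched as a small document batch. This is a minimal illustration that assumes the JSON batch layout of the 2011-02-01 API referenced later in the deck; the document IDs, field names, and values are hypothetical and borrowed from the movie example.

```python
import json


def add_operation(doc_id, version, fields, lang="en"):
    """Build one 'add' operation: unique id, version, and the fields to index."""
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": fields}


def delete_operation(doc_id, version):
    """Build one 'delete' operation; the version must exceed the stored one."""
    return {"type": "delete", "id": doc_id, "version": version}


# Hypothetical movie documents; field names are assumptions for illustration.
batch = [
    add_operation("movie_1", 1, {"title": "Iron Man", "year": 2008,
                                 "genre": ["Action"]}),
    delete_operation("movie_2", 2),
]

# The serialized batch is what an uploader would send to the document endpoint.
payload = json.dumps(batch, indent=2)
```

A higher version on a later add for the same id replaces the earlier document, which is why the version is tracked alongside the identifier.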

Application Content

• Amazon RDS

– Movie data

– Theater data

– User reviews, lists etc.

• DynamoDB

– User actions

• Amazon S3

– Help files

– Media (clips, images)

– Articles

Bootstrap Strategy

[Diagram: Source System → Processing Script (Amazon EC2) → Queuing (Amazon SQS) → Batching (Amazon EC2) → Amazon CloudSearch]

Document Construction

• One source will be the master

for each record
    determine doc id and version
    create fields
    for each auxiliary source
        gather additional data
    send or queue the document
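The loop above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the record layout (`id`, `updated_at`, `title`, `description`), the `movie_` id prefix, and the idea of modelling each auxiliary source as a lookup function are all hypothetical, not taken from the talk.

```python
def build_document(record, auxiliary_sources):
    """Turn one master record plus auxiliary data into an 'add' operation."""
    # Determine doc id and version from the master source.
    doc_id = "movie_{}".format(record["id"])
    version = int(record["updated_at"])  # e.g. a last-modified timestamp

    # Create fields from the master record.
    fields = {"title": record["title"], "description": record["description"]}

    # Gather additional data from each auxiliary source.
    for lookup in auxiliary_sources:
        fields.update(lookup(record["id"]))

    return {"type": "add", "id": doc_id, "version": version,
            "lang": "en", "fields": fields}


def build_batch(records, auxiliary_sources):
    """Build documents for all records; a caller would send or queue them."""
    return [build_document(r, auxiliary_sources) for r in records]


# Hypothetical auxiliary sources, standing in for other data stores.
def likes_lookup(movie_id):
    return {"likes": {1: 1200}.get(movie_id, 0)}


def rating_lookup(movie_id):
    return {"user_rating": {1: 4}.get(movie_id, 0)}


records = [{"id": 1, "updated_at": 1384473600,
            "title": "Iron Man", "description": "..."}]
docs = build_batch(records, [likes_lookup, rating_lookup])
```

Using a timestamp from the master record as the version is one possible choice; any value that increases with each change to the record would serve the same purpose.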

Relational DB

• Movie: Title, Description, TheaterID

• Theater: Name, AddressesID, ShowtimesID

• Addresses: Street, City, State

• Showtimes: Date, Time, State

Amazon S3

• Clips, images, reviews

• Apache Tika to extract content

• Amazon S3 Metadata for additional fields

Amazon DynamoDB

DynamoDB | CloudSearch
Table | Domain
Item | Document
Attribute | Field
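The item-to-document and attribute-to-field mapping above can be sketched as a small conversion function. This assumes items arrive in DynamoDB's low-level JSON form, where each attribute carries a type descriptor such as `S`, `N`, or `SS`; the attribute names, the choice of key attribute, and the version value are hypothetical.

```python
def attribute_to_field(attribute):
    """Convert one DynamoDB attribute value into a plain field value."""
    if "S" in attribute:        # string
        return attribute["S"]
    if "N" in attribute:        # number, transported as a string
        # Assumes integer values; non-integers would need scaling first.
        return int(attribute["N"])
    if "SS" in attribute:       # string set becomes a multi-valued field
        return list(attribute["SS"])
    raise ValueError("unsupported attribute type: {}".format(attribute))


def item_to_document(item, key_name, version):
    """Map one item to one document: the key becomes the id, the rest fields."""
    fields = {name: attribute_to_field(value)
              for name, value in item.items() if name != key_name}
    return {"type": "add", "id": item[key_name]["S"], "version": version,
            "lang": "en", "fields": fields}


# Hypothetical item from a table of movies.
item = {
    "movie_id": {"S": "m1"},
    "title": {"S": "Iron Man"},
    "likes": {"N": "42"},
    "genres": {"SS": ["Action", "Sci-Fi"]},
}
doc = item_to_document(item, "movie_id", version=1)
```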

Searching Show Times

id | title | description | t_name | t_street | date | time
1 | Iron Man | ... | Galaxy | Main | 11/11 | 12:30pm
2 | Iron Man | ... | Galaxy | Main | 11/11 | 1:15pm
3 | Iron Man | ... | Galaxy | Main | 11/11 | 2:45pm
4 | Iron Man | ... | Galaxy | Main | 11/11 | 6:00pm
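The table above shows one document per show time, with the movie and theater columns repeated on every row. A minimal sketch of that denormalization, assuming simple dictionaries for the movie, theater, and show times (the input shapes and the id scheme are hypothetical; the output field names follow the table):

```python
def showtime_documents(movie, theater, showtimes):
    """Flatten movie, theater, and show time rows into one document each."""
    documents = []
    for number, show in enumerate(showtimes, start=1):
        documents.append({
            "id": str(number),
            "title": movie["title"],
            "description": movie["description"],
            "t_name": theater["name"],
            "t_street": theater["street"],
            "date": show["date"],
            "time": show["time"],
        })
    return documents


movie = {"title": "Iron Man", "description": "..."}
theater = {"name": "Galaxy", "street": "Main"}
showtimes = [
    {"date": "11/11", "time": "12:30pm"},
    {"date": "11/11", "time": "1:15pm"},
    {"date": "11/11", "time": "2:45pm"},
    {"date": "11/11", "time": "6:00pm"},
]
rows = showtime_documents(movie, theater, showtimes)
```

The repetition costs index space, but it lets a single query match on movie, theater, and time together without a join.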

Heterogeneous Data

Multi Domain

Updating CloudSearch

[Diagram components: Users, Web Server, Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon SQS, Update Processor (Amazon EC2), Amazon CloudSearch]

Section Summary

• Multiple sources

• Bootstrap / Update

• Heterogeneous data

Correct Matches

[Screenshot: MovieMate search results for "Iron Man"]

The Search Algorithm

• Locate documents that satisfy Boolean constraints

– Usually intersection

• Relevance rank those documents

– Differentiated from databases by relevance

Document Structure

Movie

title

description

user_rating

likes

release_date

latitude

longitude

Configuring for Search

• Text fields for individual word search

– User-generated and external text

– Titles, descriptions

• Literal fields for exact matches

– Application-generated text like facets

• Integer fields for range searching and ranking

Searching Text

http(s)://<endpoint>/2011-02-01/search?

• Simple searches

– q=<text>

• Filtering

– bq=(and title:'iron man' genre:'Action')

• Filtering with integer ranges

– bq=(and 'iron man' year:..2010)

• Geo filtering

– bq=(and 'iron man' latitude:12700..12900 longitude:5700..5800)
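The query forms above can be assembled programmatically. The sketch below builds `bq` expressions and a URL-encoded search request using only the Python standard library; the endpoint host name is a placeholder, and the helper names are hypothetical rather than part of any AWS SDK.

```python
from urllib.parse import urlencode

API_VERSION = "2011-02-01"


def field_match(field, value):
    """Match a quoted value in one field, e.g. title:'iron man'."""
    return "{}:'{}'".format(field, value)


def uint_range(field, low=None, high=None):
    """Integer range; an omitted bound leaves that side open, e.g. year:..2010."""
    return "{}:{}..{}".format(field,
                              "" if low is None else low,
                              "" if high is None else high)


def and_query(*clauses):
    """Intersect clauses in prefix notation: (and clause1 clause2 ...)."""
    return "(and {})".format(" ".join(clauses))


def search_url(endpoint, params):
    """Build the full search URL with URL-encoded parameters."""
    return "https://{}/{}/search?{}".format(endpoint, API_VERSION,
                                            urlencode(params))


bq = and_query("'iron man'",
               uint_range("latitude", 12700, 12900),
               uint_range("longitude", 5700, 5800))
# Placeholder endpoint; a real domain has its own search endpoint.
url = search_url("search-movies.example.invalid", {"bq": bq})
```

URL-encoding matters here because the expressions contain spaces, quotes, colons, and parentheses.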

Search Results

{
  "rank": "-text_relevance",
  "match-expr": "(label 'iron man')",
  "hits": {
    "found": 204,
    "start": 0,
    "hit": [
      { "id": "sontsst12cf5f88b42" },
      { "id": "sopvopr12ab017f082" },
      { "id": "sorzrpw12ac468a13b" }
    ]
  },
  ...
}
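A response of this shape can be unpacked with the standard `json` module. The sketch below uses a trimmed copy of the response shown on the slide (without the elided trailing members) and a hypothetical helper name.

```python
import json


def parse_hits(response_text):
    """Return (total matches, list of document ids on this page)."""
    body = json.loads(response_text)
    hits = body["hits"]
    return hits["found"], [hit["id"] for hit in hits["hit"]]


# Trimmed copy of the slide's response; the elided members are omitted.
sample_response = """
{"rank": "-text_relevance",
 "match-expr": "(label 'iron man')",
 "hits": {"found": 204, "start": 0,
          "hit": [{"id": "sontsst12cf5f88b42"},
                  {"id": "sopvopr12ab017f082"},
                  {"id": "sorzrpw12ac468a13b"}]}}
"""

found, ids = parse_hits(sample_response)
```

Note that `found` counts every match, while `hit` holds only the current page, so the two numbers normally differ.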

Relevant Results

[Screenshot: MovieMate search results for "Iron Man"]

Customizing Ranking

• text_relevance and cs.text_relevance

• Rank expressions

– Compute a score for each document

– &rank=<function>

• Defined in the console

• Defined at query-time

– &q='iron-man'&rank-recency=text_relevance + year&rank=recency

Field Weighting

• Adjust relative importance of fields

• &rank-title=cs.text_relevance({"weights":{"title":4.0},"default_weight":1})

Popularity

• Convert floating point to integer

• Weight by the number of ranks

• rank-pop=text_relevance + (user-rating - 2) * log10(number-user-ranks) * 10 + metascore * 3
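To see how the popularity expression behaves, it can be evaluated offline for sample values. The sketch below mirrors the arithmetic of the expression in Python; the input values are invented, the underscore variable names are Python spellings of the slide's names, and the guard against a zero rank count is an added assumption, since log10(0) is undefined.

```python
import math


def popularity_score(text_relevance, user_rating, number_user_ranks, metascore):
    """Mirror of: text_relevance + (rating - 2) * log10(ranks) * 10 + metascore * 3."""
    # Assumption: treat documents with no ranks as having one, so the
    # logarithm is defined and contributes zero.
    ranks = max(number_user_ranks, 1)
    return (text_relevance
            + (user_rating - 2) * math.log10(ranks) * 10
            + metascore * 3)


# Invented inputs: rating 4 with 1000 ranks adds (4 - 2) * 3 * 10 = 60.
well_rated = popularity_score(100, 4, 1000, 80)
# A rating below 2 pulls the score down as the number of ranks grows.
poorly_rated = popularity_score(100, 1, 1000, 80)
```

Subtracting 2 from the rating means ratings under 2 act as a penalty, and the logarithm keeps heavily rated titles from dominating purely by volume.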

Freshness

• Exponential decay function

• &rank-decay=text_relevance + 200*Math.exp(-0.1*days_ago)

$r = c\,e^{-\lambda t}$, with $c = 200$, $\lambda = 0.1$, and $t$ = days_ago
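The decay term can be tabulated to get a feel for how quickly the boost fades. This sketch evaluates only the freshness component with the constants from the expression above (200 and 0.1); the function name and the sample ages are illustrative.

```python
import math


def freshness_boost(days_ago, scale=200.0, decay_rate=0.1):
    """Exponential decay: scale * e^(-decay_rate * days_ago)."""
    return scale * math.exp(-decay_rate * days_ago)


# Boost for a few sample ages, in days.
boosts = {days: freshness_boost(days) for days in (0, 7, 30, 365)}

# With a decay rate of 0.1 per day, the boost halves about every
# ln(2) / 0.1 days, which is roughly one week.
half_life_days = math.log(2) / 0.1
```

A brand-new document gets the full 200 points, while one a year old gets effectively nothing, so text relevance takes over for older titles.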

Location Sort

• Latitude and longitude expressed as integers

• Denormalized for particular theaters with locations

Movie

title

description

user_rating

likes

release_date

latitude

longitude

Location Sort

• Cartesian distance function

• &rank-geo=sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))

• &rank=-geo

$\sqrt{(lat - lat_{user})^2 + (lon - lon_{user})^2}$
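The distance function can be checked offline with a few integer coordinates. The sketch below mirrors the expression in Python and orders some hypothetical theaters nearest-first; the theater names and coordinates are invented, and the integer values simply stand in for whatever scaled latitude and longitude the documents store.

```python
import math


def geo_distance(latitude, longitude, lat, lon):
    """Straight-line distance between a document and the user's position."""
    return math.sqrt(pow(latitude - lat, 2) + pow(longitude - lon, 2))


# Hypothetical theaters with integer-encoded coordinates.
theaters = [
    {"t_name": "Galaxy", "latitude": 12800, "longitude": 5750},
    {"t_name": "Orpheum", "latitude": 12703, "longitude": 5704},
    {"t_name": "Rialto", "latitude": 12900, "longitude": 5800},
]

user_lat, user_lon = 12700, 5700

# Order theaters by increasing distance from the user.
nearest_first = sorted(
    theaters,
    key=lambda t: geo_distance(t["latitude"], t["longitude"], user_lat, user_lon),
)
```

Treating latitude and longitude as flat Cartesian axes is an approximation that is reasonable over short distances, such as theaters within one city.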

Rank Expressions: Combined

• &rank-combined=text_relevance + 2.0 * geo + 0.5 * popularity + 0.3 * freshness

• &rank=combined

Section Summary

• Search API basics

• Customizing ranking – Field weighting, popularity, freshness, GEO, combined

• Rank expression comparison tool

Facets

Simple Faceting: Document

Movie

title

description

genre

Simple Faceting: Configuration

Simple Faceting: Query

q=iron+man&facet=genre

{
  "rank": "-text_relevance",
  "match-expr": "(label 'star wars')",
  "hits": { "found": 7, "start": 0, "hit": [] },
  "facets": {
    "genre": {
      "constraints": [
        { "value": "Family", "count": 62 },
        { "value": "Action/Adventure", "count": 21 },
        { "value": "Drama", "count": 5 },
        ...
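The facet portion of a response like this can be reduced to a simple value-to-count mapping before it is handed to the UI. The sketch below works on a trimmed, well-formed copy of the facet data from the slide; the helper name is hypothetical.

```python
import json


def facet_counts(response_text):
    """Return {facet field: {facet value: count}} from a search response."""
    body = json.loads(response_text)
    return {
        field: {c["value"]: c["count"] for c in data["constraints"]}
        for field, data in body.get("facets", {}).items()
    }


# Trimmed copy of the slide's facet data, closed off so it parses.
sample_response = """
{"hits": {"found": 7, "start": 0, "hit": []},
 "facets": {"genre": {"constraints": [
     {"value": "Family", "count": 62},
     {"value": "Action/Adventure", "count": 21},
     {"value": "Drama", "count": 5}]}}}
"""

counts = facet_counts(sample_response)
```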

Simple Faceting: UI

<div class='facet'>
  <ul class='facet_list'>
    <?php
    // Each constraint is an object holding a facet value and its hit count.
    $genres = $resultsObj->facets->genre->constraints;
    foreach ($genres as $curGenre) {
      $curName = htmlspecialchars($curGenre->value);
      $curCount = (int) $curGenre->count;
    ?>
    <li class='facet_item'>
      <div class='facet_name'><?=$curName?></div>
      <div class='facet_count'><?=$curCount?></div>
    </li>
    <?php } ?>
  </ul>
</div>

Facets

Document

• title: Lincoln

• description: ...

• oscar1: Awards

• oscar2: Awards/Best Actor

• oscar3: Awards/Best Actor/Daniel Day Lewis

Movie

title

description

oscar1

oscar2

oscar3

Query

&q=lincoln&facet=oscar1,oscar2,oscar3

{
  "rank": "-text_relevance",
  "hits": {...},
  "facets": {
    "oscar1": {
      "constraints": [
        {"value": "Awards", "count": 23},
        {"value": "Nominations", "count": 124}]},
    "oscar2": {
      "constraints": [
        {"value": "Awards/Best Actor", "count": 6},
        {"value": "Awards/Best Actress", "count": 3}...]},
    "oscar3": {
      "constraints": [
        {"value": "Awards/Best Actor/Daniel Day Lewis", "count": 1},
        {"value": "Awards/Best Actor/Denzel Washington", "count": 2}...]},

Drilldown

• bq=oscar1:'Awards'

• bq=oscar2:'Awards/Best Actor'

• bq=oscar3:'Awards/Best Actor/Daniel Day Lewis'

• bq=(and 'star' oscar2:'Awards/Best Actor')
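The hierarchical scheme (one field per level, each holding the path down to that level) and the matching drilldown queries can be sketched together. The `oscar` field prefix comes from the slides; the helper names are hypothetical.

```python
def facet_levels(path, prefix):
    """Expand a path into one field per level, each holding the path so far."""
    parts = path.split("/")
    return {
        "{}{}".format(prefix, depth): "/".join(parts[:depth])
        for depth in range(1, len(parts) + 1)
    }


def drilldown_query(prefix, selected_parts, text=None):
    """Build a bq that filters on the field matching the selected depth."""
    field = "{}{}".format(prefix, len(selected_parts))
    clause = "{}:'{}'".format(field, "/".join(selected_parts))
    if text is None:
        return clause
    return "(and '{}' {})".format(text, clause)


# Index time: fields to add to the Lincoln document.
fields = facet_levels("Awards/Best Actor/Daniel Day Lewis", "oscar")

# Query time: the user has clicked Awards, then Best Actor.
bq = drilldown_query("oscar", ["Awards", "Best Actor"], text="star")
```

Storing the full path at every level keeps values unique across branches, so a second-level value under Awards never collides with the same name under Nominations.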

Section Summary

• Simple faceting

• Hierarchical faceting

• Hierarchical data handling

The Search Algorithm

• Locate documents that satisfy Boolean constraints

– Usually intersection

• Relevance rank those documents

– Differentiated from databases by relevance

Performance Best Practices

• Match set size

• Text queries perform better than integer queries

• Complex relevance functions

Optimizing Index Size

• Trade off literal and uint for cost/performance

• Result fields matter most

• Enabling faceting increases size

Wrap Up

• Sourcing documents from various locations

• Building queries and ranking

• UI Components for faceting

• Getting the most out of your index

Peter Simpkin, Solution Architect, Elsevier

Agenda

• Elsevier Intro

• Search Problem Statement

• Enterprise Content Search

• Hints and Tips

• Amazon CloudSearch Observations

• 7,000+ employees in 26 countries

• 2,200 journals / article market share 25%

• $3B revenue

• Scientific, Technical & Medical

Content Systems

Content Challenges:

• No central place for consumers to discover content

• It is not currently possible to search and retrieve atomic assets

• Assets are not reusable across products

Consumer Platforms

Enterprise Content Search Engine

Search Opportunities:

• Create a comprehensive inventory to easily discover the content Elsevier owns

• Provide access to the granular / modular content consumers want, at will

• Assets must be uniquely addressable

Empower our product development partners

Enterprise Content Search eco-system

[Diagram components: Federated Content Warehouse, Product Platform Data center, E.U. Corporate Data center, U.S. Corporate Data center, Amazon S3, Amazon DynamoDB, Amazon SWF, Amazon CloudSearch (SDF metadata), Simple Search UI]

Elsevier Technical Drivers & Approach

• Fully managed, full-featured search service in the cloud

• Automatically scales for data & traffic

• Easy to set up and use

• PoC created in days

• Search engine as a service

• Pay-as-you-go pricing model

Hints & Tips

(and issn:'0022-1694'
  (and type:'1.2'
    (and (not action:'D')
      (or (and pubstartdate:..2013176 pubenddate:2005002..)
        (or (and pubstartdate:2005001
              (and pubstarttime:0.. pubstarttime:..235959))
          (or (and pubstartdate:2013177 pubstarttime:..235959)
            (or (and pubenddate:2005001 pubendtime:0..)
              (and pubenddate:2013177
                (and pubendtime:..235959 pubendtime:0..)))))))))

• Query Response Time = 5 seconds

Optimising Nested Queries

(and issn:'0022-1694' type:'1.2'
  (not action:'D')
  (or (and pubstartdate:..2013176 pubenddate:2005002..)
      (and pubstartdate:2005001 pubstarttime:0..235959)
      (and pubstartdate:2013177 pubstarttime:0..235959)
      (and pubenddate:2005001 pubendtime:0..)
      (and pubenddate:2013177 pubendtime:0..235959)))

• Response Time = 2.5 seconds

Optimised Nested Query

(and (not action:'D')
  (or (and issn:'0022-1694' type:'1.2'
           pubstartdate:..2013176 pubenddate:2005002..)
      (and issn:'0022-1694' type:'1.2'
           pubstartdate:2005001 pubstarttime:0..235959)
      (and issn:'0022-1694' type:'1.2'
           pubstartdate:2013177 pubstarttime:0..235959)
      (and issn:'0022-1694' type:'1.2'
           pubenddate:2005001 pubendtime:0..)
      (and issn:'0022-1694' type:'1.2'
           pubenddate:2013177 pubendtime:0..235959)))

• Response Time = 0.17ms
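The final rewrite repeats the selective issn and type terms inside every branch of the or, rather than intersecting them once at the top. A helper that generates this distributed form from a list of shared terms and branch-specific terms might look like the sketch below; the function names are hypothetical, and whether this form is faster for a given domain is something to measure rather than assume.

```python
def distribute(shared_terms, branches):
    """Build (or (and shared... branch...) ...) with shared terms in each branch."""
    clauses = [
        "(and {})".format(" ".join(list(shared_terms) + list(branch)))
        for branch in branches
    ]
    return "(or {})".format(" ".join(clauses))


def with_exclusion(excluded_clause, expression):
    """Wrap an expression so that documents matching excluded_clause are dropped."""
    return "(and (not {}) {})".format(excluded_clause, expression)


shared = ["issn:'0022-1694'", "type:'1.2'"]
branches = [
    ["pubstartdate:..2013176", "pubenddate:2005002.."],
    ["pubstartdate:2005001", "pubstarttime:0..235959"],
]
query = with_exclusion("action:'D'", distribute(shared, branches))
```

Generating the expression from data keeps the branches consistent and makes it easy to try alternative groupings while timing each one.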

Amazon CloudSearch Observations

• Facilitates knowledge sharing on content matters across Elsevier’s product platforms

• Ability to leverage content infrastructure and capabilities across Elsevier’s divisions

• Easy to integrate with existing on-premise content systems

• Speed to market; allows developers to focus on building other core content strategy components

• Need to spend time optimising queries to maximise performance

Resources

• Amazon CloudSearch Overview Page http://aws.amazon.com/cloudsearch/

– Developer Guide

– FAQs, Articles

– Community Forum

– Tutorial

• Free 30-day trial

• Contact: handler@amazon.com

Please give us your feedback on this presentation

As a thank you, we will select prize winners daily for completed surveys!

SVC302