Optimizing Unstructured Data
@ajkohn
@SEMpdx
#SearchFest
My name is AJ Kohn
Blind Five Year Old Since 2007
Making the complex simple
Semantic search is about understanding meaning
Natural Language Processing
Finding all expressions that refer to the same entity in a text
Coreference Resolution
Part of Speech (POS) Tagging
Assign a part of speech to each word in a text
The word quiet isn’t spelled wrong but Google knew that I probably meant to write quite awesome instead
Making predictions based on patterns and rules from prior data
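The "quiet awesome" correction can be sketched as a lookup against counts learned from prior text. This is a minimal illustration only: the counts and the `CONFUSABLE` table are invented for the example and say nothing about how Google actually does it.

```python
# Hypothetical bigram counts, standing in for patterns learned from prior data.
BIGRAM_COUNTS = {
    ("quite", "awesome"): 9500,
    ("quiet", "awesome"): 12,
    ("quiet", "room"): 4300,
    ("quite", "room"): 3,
}

# Words that are easily confused with one another (illustrative only).
CONFUSABLE = {"quiet": ["quite"], "quite": ["quiet"]}


def suggest(word: str, next_word: str) -> str:
    """Return the candidate that most often precedes next_word in prior data."""
    candidates = [word] + CONFUSABLE.get(word, [])
    return max(candidates, key=lambda c: BIGRAM_COUNTS.get((c, next_word), 0))


print(suggest("quiet", "awesome"))  # quite
print(suggest("quiet", "room"))     # quiet
```

Both spellings are valid words, so only the surrounding context decides which one was probably meant.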
Google is better at getting meaning from text because of access to more data
Letters and Words
“New York” hasPopulation: 8.046 Million
hasPointsofInterest: Empire State Building
hasAddress: 350 5th Avenue
hasHeight: 1,250 feet
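One way to picture these facts is as subject-predicate-object triples. The sketch below assumes the address and height belong to the Empire State Building rather than to New York. The function names are hypothetical.

```python
# Subject-predicate-object triples built from the facts on the slide.
# Assumption: hasAddress and hasHeight describe the Empire State Building.
TRIPLES = [
    ("New York", "hasPopulation", "8.046 Million"),
    ("New York", "hasPointsofInterest", "Empire State Building"),
    ("Empire State Building", "hasAddress", "350 5th Avenue"),
    ("Empire State Building", "hasHeight", "1,250 feet"),
]


def facts_about(entity: str) -> dict[str, str]:
    """Collect every predicate/object pair recorded for one entity."""
    return {pred: obj for subj, pred, obj in TRIPLES if subj == entity}


def hop(entity: str, predicate: str, then: str) -> str:
    """Follow one connection, then read a fact about the entity reached."""
    middle = facts_about(entity)[predicate]
    return facts_about(middle)[then]


print(hop("New York", "hasPointsofInterest", "hasHeight"))  # 1,250 feet
```

The second lookup works only because the object of one fact is itself an entity with facts of its own, which is what makes this a graph rather than a table.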
The Knowledge Graph
Connections and relationships between entities and documents
Named Entity Recognition (NER)
One size doesn’t fit all
Context-Dependent Fine-Grained Entity Type Tagging
Not just any entities but salient entities
66 entities on a page and less than 5% are salient
http://bit.ly/bigdealentities
How do you train a machine learning model to identify salient entities?
Word to your mother
“Keywords don’t matter anymore”
Ice Bear cried, but just inside
I love structured data but optimizing unstructured data is far more powerful
Text on the page is more important now
Words = Entities ^ Context ^ Meaning
We can turn unstructured content into structured data
How much do you trust Google?
Stop writing for people and start writing for search engines
http://bit.ly/focusedwriting
Most users don’t read but skim and scan instead
http://bit.ly/usersdontread
First you looked here
Then here
A penny for a paragraph return
Not only do we mirror body language we seek it out when searching
Keyword rich text and subheads allow users to resume reading at any time
Keyword is not a four letter word
Better to you query syntax call it
But what about user delight?
Task Completion > Aesthetics
Our job is to reduce friction
After writing your content go back and find where you can replace pronouns with nouns
Remember that readers won’t often ‘see’ these nouns but will use them as visual signposts
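The pronoun pass can be partly automated by flagging candidates for a human to review. This is a rough sketch: the `PRONOUNS` list is deliberately incomplete, the sentence splitting is naive, and words such as "this" or "that" will be flagged even when they are not acting as pronouns.

```python
import re

# A small, deliberately incomplete list of pronouns to flag for review.
PRONOUNS = {"it", "its", "they", "them", "their", "this", "that",
            "these", "those", "he", "she"}


def flag_pronouns(text: str) -> list[tuple[int, str]]:
    """Return (sentence_number, word) pairs for each flagged pronoun."""
    flagged = []
    # Naive split: a sentence ends at ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for number, sentence in enumerate(sentences, start=1):
        # Letters only, so "It's" is read as "It" plus "s".
        for word in re.findall(r"[A-Za-z]+", sentence):
            if word.lower() in PRONOUNS:
                flagged.append((number, word))
    return flagged


draft = ("Lobster and Cat is a painting by Pablo Picasso. "
         "It's such a gorgeous work of art.")
print(flag_pronouns(draft))  # [(2, 'It')]
```

The script only points at candidates. Deciding which noun to substitute, and whether the sentence still reads well, stays with the writer.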
“It’s such a gorgeous work of art”
“Lobster and Cat is a beautiful painting”
ArtworkType: painting
ArtworkTitle: Lobster and Cat
hasArtist: Pablo Picasso
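To see why the second sentence is easier for a machine, consider a simple text pattern of the form "<title> is a <type>". The regular expression below is a hypothetical illustration, not a description of any search engine's extraction rules.

```python
import re

# Hypothetical pattern: "<Title> is a/an [adjective] <artwork type>".
PATTERN = re.compile(
    r"^(?P<title>.+?) is an? (?:\w+ )?(?P<type>painting|sculpture|drawing)\b"
)


def extract_artwork(sentence: str):
    """Return structured fields if the sentence fits the pattern, else None."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    return {
        "ArtworkTitle": match.group("title"),
        "ArtworkType": match.group("type"),
    }


print(extract_artwork("Lobster and Cat is a beautiful painting"))
# {'ArtworkTitle': 'Lobster and Cat', 'ArtworkType': 'painting'}
print(extract_artwork("It's such a gorgeous work of art"))
# None
```

The pronoun version gives the pattern nothing to grab, while the noun version yields a title and a type.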
Google may better understand the meaning of my query but do they know why I’m searching?
Why are they really searching?
Common Problems with the Eureka 4870
Eureka 4870 Troubleshooting Tips
Local Vacuum Cleaner Repair Shops
Eureka 4870 Replacement Parts
Guide to Buying a New Vacuum Cleaner
Our job is to decode the intent from the query syntax
http://bit.ly/aggregatingintent
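Decoding intent from query syntax can be sketched as mapping query modifiers to the need behind them. The modifier table and intent labels below are invented for illustration, loosely following the Eureka 4870 examples.

```python
# Hypothetical mapping from query modifiers to the intent they suggest.
INTENT_MODIFIERS = {
    "problems": "diagnose",
    "troubleshooting": "diagnose",
    "repair": "fix locally",
    "parts": "fix it myself",
    "replacement": "fix it myself",
    "buying": "replace",
    "best": "replace",
}


def decode_intent(query: str) -> list[str]:
    """Return the distinct intents suggested by modifiers, in query order."""
    intents = []
    for word in query.lower().split():
        intent = INTENT_MODIFIERS.get(word)
        if intent and intent not in intents:
            intents.append(intent)
    return intents or ["unknown"]


print(decode_intent("Eureka 4870 troubleshooting"))    # ['diagnose']
print(decode_intent("Eureka 4870 replacement parts"))  # ['fix it myself']
print(decode_intent("Eureka 4870"))                    # ['unknown']
```

A bare model number returns "unknown", and in that case a page may need to serve several intents at once.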
Target the keyword
Optimize the intent
What are we really talking about?
This is a factbox triggered by entities and the Knowledge Graph
This answerbox is triggered by semi-structured data
This answerbox is triggered by specific patterns of text
Answerbox triggered by patterns of text and specific understanding
Answerbox triggered by patterns of text and semi-structured data
Game's the same, just got more fierce
Skate to where the puck is going to be, not to where it has been
The Link Graph + Scored Entities
[Diagram: entities A, B, C and D]
Entity authority could flow through links similar to anchor text
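This idea is speculative, so the following is only a thought experiment in code. It assumes each page carries scored entities and that a link passes a damped, evenly split share of those scores to its target. The page names, scores and `DAMPING` value are all made up.

```python
# Hypothetical per-page entity scores and links between pages.
ENTITY_SCORES = {
    "page1": {"entity A": 0.8, "entity B": 0.2},
    "page2": {"entity B": 0.5},
    "page3": {},
}
LINKS = [("page1", "page2"), ("page1", "page3"), ("page2", "page3")]
DAMPING = 0.5  # assumed fraction of a score that a page passes along


def propagate(scores, links, damping=DAMPING):
    """One pass: each page splits a damped share of its scores among its outlinks."""
    out_degree = {}
    for source, _ in links:
        out_degree[source] = out_degree.get(source, 0) + 1
    # Start from a copy so every share is computed from the original scores.
    result = {page: dict(entities) for page, entities in scores.items()}
    for source, target in links:
        for entity, score in scores[source].items():
            share = damping * score / out_degree[source]
            result[target][entity] = result[target].get(entity, 0.0) + share
    return result


for page, entities in propagate(ENTITY_SCORES, LINKS).items():
    print(page, {name: round(value, 2) for name, value in entities.items()})
```

In this toy graph, page3 starts with no entities of its own and ends up associated with entities A and B purely through the pages that link to it.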
We can help Google to find structure, entities and meaning in our content
The easier we make it, the more likely we are to satisfy robots and humans
AJ Kohn, Owner, Blind Five Year Old
www.blindfiveyearold.com
@ajkohn