
Powerset Natural Language and the Semantic Web Barney Pell, Ph.D. Founder and CTO ISWC 2007.


Powerset

Natural Language and the Semantic Web

Barney Pell, Ph.D., Founder and CTO

ISWC 2007

© 2007 Powerset Page 2

Outline

Natural language helps the semantic web

Powerset natural language search

Demos

Vision for ecosystem acceleration

Semantic Web Chicken and Egg

Semantic markup and resources are costly

People will produce them at scale only when valuable applications exist

Applications are not viable without semantic markup and resources

Hence, the semantic web is slow to realize its potential

NLP addresses Chicken and Egg

NLP reduces semantic development effort
► Create annotations from unstructured text
► Generate ontologies

Natural language search is a great application
► Consuming semantic web information
► Exposing semantic web services in response to natural language queries
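
As a rough illustration of "create annotations from unstructured text", the sketch below serializes relation tuples as N-Triples, the line-based RDF format. It assumes the tuples come from some upstream parser or relation extractor (not shown); the `http://example.org/` namespace, the `diedFrom` predicate, and the `iri`/`to_ntriples` helpers are hypothetical, and the slides do not describe Powerset's actual annotation format.

```python
# Minimal sketch: turn extracted (subject, predicate, object) tuples into
# N-Triples statements. Namespace, predicate name, and helpers are hypothetical.
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace, not a real vocabulary


def iri(term: str) -> str:
    """Map a surface string to an IRI under the example namespace."""
    return "<" + BASE + quote(term.replace(" ", "_")) + ">"


def to_ntriples(facts):
    """Emit one N-Triples statement (subject predicate object .) per fact."""
    return [f"{iri(s)} {iri(p)} {iri(o)} ." for s, p, o in facts]


# Pretend this tuple was produced by an NLP relation extractor.
extracted = [("Sir Edward Heath", "diedFrom", "pneumonia")]

for line in to_ntriples(extracted):
    print(line)
```

The point of the design is that the expensive step (reading the text) is automated, while the output is ordinary RDF that any semantic web tool can consume.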

Powerset: A natural language search company

Our goal is to enable people to interact with information and services as naturally and effectively as possible

We combine deep NL and scalable search technology

How we do it: natural language search
► Interpret the web
► Index
► Interpret the query
► Search… Match

Our system creates and uses Semantic Web information in multiple ways

What’s next in web search?

Goal: Matching query intents with document intents

Changes to the document model drive the largest innovations:
► Proximity: shift from “doc as bag-of-keywords” to “doc as vector-of-keywords”
► Anchor text: adding off-page text to the doc
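
To make the proximity shift concrete: a bag-of-keywords model keeps only term counts, so it cannot tell whether two query words sit next to each other, while keeping term positions makes a proximity check possible. A minimal sketch with a toy whitespace tokenizer; `min_distance` is a hypothetical scoring helper, not a description of any particular engine.

```python
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Toy tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def bag_of_keywords(text: str) -> Counter:
    # Order is discarded: only term frequencies remain.
    return Counter(tokenize(text))


def positions(text: str) -> dict[str, list[int]]:
    # Order is kept: each term maps to the offsets where it occurs.
    pos = defaultdict(list)
    for i, tok in enumerate(tokenize(text)):
        pos[tok].append(i)
    return dict(pos)


def min_distance(text: str, a: str, b: str):
    # Smallest gap between any occurrence of a and any occurrence of b,
    # or None if either term is missing.
    pos = positions(text)
    if a not in pos or b not in pos:
        return None
    return min(abs(i - j) for i in pos[a] for j in pos[b])


d1 = "new york has many tall buildings"
d2 = "york ate a new sandwich"

# The bags agree on the counts of "new" and "york" ...
print(bag_of_keywords(d1)["new"], bag_of_keywords(d2)["new"])  # 1 1
# ... but positions show the adjacent phrase only in d1.
print(min_distance(d1, "new", "york"), min_distance(d2, "new", "york"))  # 1 3
```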

What unexploited aspect of the document is next?

Linguistic Structure

Documents are loaded with linguistic structure

Currently this is mostly discarded and ignored
► Largely unaddressed on the document side due to computational cost and complexity
► Most previous work focuses on the query only

But this structure has immense value!

It is how the document’s intent is encoded.

Powerset’s Semantic Index

The semantic indexer cracks the linguistic structure to extract meaning

Applies deep NLP to the entire corpus to build a rich representation

This new kind of index is a platform for innovation that allows for greatly expanded capabilities

Natural language for consumer search

Economics
► Costs came down: Moore’s law, technology speedups
► Revenue came up: search ad monetization

User experience can be transformed with current NL capabilities

► Perfection not required
► Robust broad-coverage integrated systems
► Multilingual core platform, languages as plug-ins

Change user behavior
► Change to something easier
► Already happening (voice, mobile, Yahoo Answers, shortcuts)
► Can give results even with today’s behavior

Converging trends

Language technology
► (Relatively) efficient and mature language computation (XLE)
► Large-scale grammars (ParGram)

Lexical and ontological knowledge resources
► Existing: WordNet, VerbNet, FrameNet, SUMO…
► Data-driven acquisition methods

Moore’s law
► Computing and storage cheap enough now, getting cheaper

Open-source software
► Cluster management, MapReduce, Bigtable
► Effect: can now download Google/Yahoo core competencies

Commodity computing: Amazon EC2, S3, SDS
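
The map-reduce pattern named above is what lets corpus-wide work like indexing be split across many machines. The sketch below simulates the pattern in a single process to build a tiny inverted index; `map_doc`, `shuffle`, `reduce_term`, and `build_index` are illustrative names, and a real deployment (for example on Hadoop) would run the map and reduce calls on separate workers.

```python
from collections import defaultdict
from itertools import chain


def map_doc(doc_id, text):
    # Map: emit (term, doc_id) once for every distinct term in one document.
    for term in set(text.lower().split()):
        yield term, doc_id


def shuffle(pairs):
    # Shuffle: group all emitted values by key (term).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_term(term, doc_ids):
    # Reduce: collapse one term's values into a sorted posting list.
    return term, sorted(set(doc_ids))


def build_index(corpus):
    mapped = chain.from_iterable(map_doc(d, t) for d, t in corpus.items())
    return dict(reduce_term(k, v) for k, v in shuffle(mapped).items())


corpus = {
    1: "IBM purchased Lotus",
    2: "Lotus notes",
}
index = build_index(corpus)
print(index["lotus"])  # [1, 2]
```

Because each map call sees only one document and each reduce call sees only one term, both phases parallelize without shared state.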


Powerset reads each sentence

Sir Edward Heath died from pneumonia ...

► Parses each sentence on the page
► Extracts entities & semantic relationships
► Identifies and expands to similar entities, relationships & abstractions
► Indexes multiple facts for each sentence

[Diagram: die (verb), expanded to killed; subj: Sir Edward Heath (noun), expanded to Sir Edward Heath (name), UK Prime Minister, politician; from (expanded to by): pneumonia (noun), expanded to disease]
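
The four steps on this slide can be sketched in Python, with parsing skipped: the sketch starts from the relation in the diagram (die, with Sir Edward Heath as subject and pneumonia as cause) and uses a small hand-written expansion table as a stand-in for the lexical and ontological resources the deck mentions. The table, the (entity, event, cause) fact shape, and the helper names are hypothetical; this illustrates indexing several facts per sentence, not Powerset's implementation.

```python
from itertools import product

# Hypothetical expansion table: each term maps to itself plus
# synonyms and broader abstractions, mirroring the diagram.
EXPANSIONS = {
    "die": {"die", "kill"},
    "sir edward heath": {"sir edward heath", "uk prime minister", "politician"},
    "pneumonia": {"pneumonia", "disease"},
}


def expand(term):
    # Unknown terms expand only to themselves.
    return EXPANSIONS.get(term, {term})


def index_sentence(index, sent_id, entity, event, cause):
    # Store every combination of expanded terms as a separate fact,
    # each pointing back to the same source sentence.
    for e, v, c in product(expand(entity), expand(event), expand(cause)):
        index.setdefault((e, v, c), set()).add(sent_id)


index = {}
index_sentence(index, "heath-1", "sir edward heath", "die", "pneumonia")

# Differently worded queries, already reduced to (entity, event, cause):
queries = [
    ("politician", "die", "disease"),            # "politicians who died of a disease"
    ("uk prime minister", "kill", "pneumonia"),  # "prime minister killed by pneumonia"
]
for q in queries:
    print(q, index.get(q, set()))
```

Both queries find `heath-1` even though neither repeats the sentence's wording, which is the idea behind multiple queries retrieving the same facts. The cost is index size: this one sentence yields 3 × 2 × 2 = 12 stored facts.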

Multiple queries retrieve the same “facts”

Integrating Diverse Resources

• Powerset’s Natural Language Technology enables integrated search results

• Based on the answer to the query

• Can tap into multiple different sources, e.g., websites, newsfeeds, blogs, archives, metadata, video, podcasts, databases

• From web results

• From Freebase

• From Blinx

NL Database Query (Entertainment)

Natural language query converted into database query

Results from database drive further engagement
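
A toy version of converting a natural language query into a database query: one hypothetical question pattern is mapped to a parameterized SQL query over an in-memory SQLite table. The `films` table, its fictional rows, and the regular-expression template are invented for illustration; a deep-NLP system would derive the query from a parse rather than from a regex.

```python
import re
import sqlite3

# Hypothetical schema with fictional rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE films (title TEXT, director TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO films VALUES (?, ?, ?)",
    [("Example Film", "Jane Doe", 2005), ("Sample Story", "John Roe", 2006)],
)

# One hypothetical question template; the title becomes a query parameter.
PATTERN = re.compile(r"who directed (?P<title>.+?)\??$", re.IGNORECASE)


def answer(question):
    m = PATTERN.match(question.strip())
    if m is None:
        return None  # not a question this toy mapper understands
    row = conn.execute(
        "SELECT director FROM films WHERE lower(title) = lower(?)",
        (m.group("title"),),
    ).fetchone()
    return row[0] if row else None


print(answer("Who directed Example Film?"))  # Jane Doe
```

Passing the extracted title as a bound parameter (`?`) rather than splicing it into the SQL string keeps user text from being interpreted as SQL.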


Language Technologies: Parc ⇔ Powerset

Multidimensional, multilingual architecture from long-term research

► Processing layers: finite-state morphology, LFG syntax, transfer/glue semantics
► Operations: parse, generate, select, transfer, interpret
► Languages: English, French, German, Japanese, Chinese, Norwegian
► Foundations: theory, mathematics, algorithms, stochastic models, engineering, software, ambiguity management, scale
► Resources: XLE, ParGram, community knowledge resources
► Applications: consumer search, question answering, relation extraction, entity recognition, translation, dialog, smart summaries, note-taking, meaning-sensitive applications

We parse the Web.

Natural Language search architecture

► Content acquisition: open-text content → XLE parse (LFG grammar) → semantic map → content semantics → indexing → large-scale semantic index
► User search: NL questions → XLE parse (LFG grammar) → semantic map → question semantics → query → retrieval → ranking → result presentation
► Knowledge resources (acquisition: manual + ML) feed both semantic maps
► Example: the question “Who did IBM acquire in the last 10 years?” retrieves the indexed sentence “IBM purchased Lotus in 1998.”
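
The slide's example can be traced end to end in a small sketch. XLE parsing and semantic mapping are not reproduced: both the content sentence and the question are hand-written as semantic frames, and a hypothetical synonym table plays the role of the knowledge resources that map "purchased" and "acquire" to one predicate. The frame classes, the table, and the choice of 2007 as the current year (to resolve "the last 10 years") are assumptions; ranking is omitted.

```python
from dataclasses import dataclass

# Hypothetical knowledge resource: surface verbs -> canonical predicate.
CANONICAL = {"purchase": "acquire", "buy": "acquire", "acquire": "acquire"}


@dataclass(frozen=True)
class Fact:
    agent: str
    predicate: str
    patient: str
    year: int


@dataclass(frozen=True)
class Question:
    agent: str
    predicate: str
    min_year: int  # lower bound derived from "in the last N years"


def semantic_map_fact(agent, verb, patient, year):
    return Fact(agent, CANONICAL.get(verb, verb), patient, year)


def semantic_map_question(agent, verb, last_n_years, current_year=2007):
    return Question(agent, CANONICAL.get(verb, verb), current_year - last_n_years)


# Indexing: "IBM purchased Lotus in 1998." (parse output written by hand)
index = [semantic_map_fact("IBM", "purchase", "Lotus", 1998)]

# Question: "Who did IBM acquire in the last 10 years?"
q = semantic_map_question("IBM", "acquire", 10)

# Retrieval: match agent and canonical predicate, then apply the time filter.
answers = [
    f.patient
    for f in index
    if f.agent == q.agent and f.predicate == q.predicate and f.year >= q.min_year
]
print(answers)  # ['Lotus']
```

The match succeeds only because both sides pass through the same semantic map; a keyword engine would see "purchased" and "acquire" as unrelated strings.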

Also required…

Scalability
► Coordinated computing on thousands of machines
► Serving millions of users
► Managing processors, storage, communication

System integration
► Robust combination of complex components

Data resources
► For guidance, training, testing
► Parse banks, relevance and ranking sets…

Knowledge resources
► Lexical and conceptual mappings
► Facts, inferences
► Acquisition strategies
■ Manual, semi-automatic, automatic, community resources

User experience: create expectations, change behavior

Ecosystem acceleration

Wisdom of crowds can accelerate the semantic web

Publishers
► Upload ontologies to get more traffic
► Get feedback on improving their content

Users
► Play games to create and improve resources
► Provide feedback to get better search
► Create ontologies for personalization and groups

Developers
► Package knowledge for specialized apps

So what starts as a broad platform gets deeper faster

Realizing a semantic web faster than expected

Community involvement plans

Powerset Open Platform
► API
► Access to technology to build mashups and applications

Powerset Contributions
► Datasets
► Annotations
► Open-source software

Partnerships & Collaborations

Powerlabs
► Early access to Powerset technology
► Tuning product concept & design
► High-value feedback from users eager to help
► Sign up at www.powerset.com


Thank you

