Date posted: 29-Dec-2015
Uploaded by: cathleen-simon
© 2007 Powerset Page 2
Outline
Natural language helps the semantic web
Powerset natural language search
Demos
Vision for ecosystem acceleration
Semantic Web Chicken and Egg
Semantic markup and resources are costly
People will build these at scale only when valuable applications exist
Applications are not viable without semantic markup and resources
Hence, the semantic web is slow to realize its potential
NLP addresses Chicken and Egg
NLP reduces semantic development effort
► Create annotations from unstructured text
► Generate ontologies
Natural language search is a great application
► Consuming semantic web information
► Exposing semantic web services in response to natural language queries
Powerset: A natural language search company
Our goal is to enable people to interact with information and services as naturally and effectively as possible
We combine deep natural language processing (NLP) with scalable search technology
How we do it: natural language search
► Interpret the web
► Index
► Interpret the query
► Search and match
Our system creates and uses Semantic Web information in multiple ways
What’s next in web search?
Goal: match query intents with document intents
Changes to the document model drive the largest innovations:
► Proximity: shift from “doc as bag-of-keywords” to “doc as vector-of-keywords”
► Anchor text: adding off-page text to the doc
What unexploited aspect of the document is next?
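The proximity shift can be made concrete with a small Python sketch. This is an illustration only, not Powerset's code: the whitespace tokenizer, the sample documents and the function names are assumptions. A bag of keywords keeps only term counts, so two documents that both mention "new" and "york" look alike; an order-aware model keeps token offsets, which is what a proximity feature needs.

```python
from collections import Counter
from typing import Dict, List, Optional


def bag_of_keywords(text: str) -> Counter:
    """Order-free model: only term counts survive."""
    return Counter(text.lower().split())


def positional_model(text: str) -> Dict[str, List[int]]:
    """Order-aware model: each term keeps the token offsets where it occurs."""
    positions: Dict[str, List[int]] = {}
    for offset, term in enumerate(text.lower().split()):
        positions.setdefault(term, []).append(offset)
    return positions


def min_gap(model: Dict[str, List[int]], a: str, b: str) -> Optional[int]:
    """Smallest token distance between terms a and b, or None if either is absent."""
    if a not in model or b not in model:
        return None
    return min(abs(i - j) for i in model[a] for j in model[b])


doc1 = "new york hotels near the park"
doc2 = "hotels in new jersey near york street"

# Both bags contain "york" once, so counts alone cannot separate the documents...
print(bag_of_keywords(doc1)["york"], bag_of_keywords(doc2)["york"])  # 1 1
# ...but offsets show "new" and "york" are adjacent only in doc1.
print(min_gap(positional_model(doc1), "new", "york"))  # 1
print(min_gap(positional_model(doc2), "new", "york"))  # 3
```

A ranker could then score smaller gaps higher; how such a score is weighted is a design choice the slide does not specify.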
Linguistic Structure
Documents are loaded with linguistic structure
Currently this is mostly discarded and ignored
► Largely unaddressed on the document side due to computational cost and complexity
► Most previous work focuses on the query only
But this structure has immense value!
It is how the document’s intent is encoded.
Powerset’s Semantic Index
The semantic indexer decodes the linguistic structure to extract meaning
Applies deep NLP to the entire corpus to build a rich representation
This new kind of index is a platform for innovation that allows for greatly expanded capabilities
Natural language for consumer search
Economics
► Costs came down: Moore’s law, tech speedups
► Revenue came up: search ad monetization
User experience can be transformed with current NL capabilities
► Perfection not required
► Robust broad-coverage integrated systems
► Multilingual core platform, languages as plug-ins
Change user behavior
► Change to something easier
► Already happening (voice, mobile, Yahoo Answers, shortcuts)
► Can give results even with today’s behavior
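One way to read "multilingual core platform, languages as plug-ins" is a language-independent core that delegates language-specific steps to registered modules. The Python sketch below is hypothetical throughout: the `LanguagePlugin` fields, the whitespace tokenizer and the tiny lemma tables are assumptions for illustration and do not describe how XLE or Powerset actually packages grammars.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class LanguagePlugin:
    """Language-specific resources (hypothetical fields, for illustration only)."""
    code: str
    tokenize: Callable[[str], List[str]]
    lemmas: Dict[str, str] = field(default_factory=dict)  # surface form -> lemma


class CorePlatform:
    """Language-independent core that delegates language-specific steps to plug-ins."""

    def __init__(self) -> None:
        self._plugins: Dict[str, LanguagePlugin] = {}

    def register(self, plugin: LanguagePlugin) -> None:
        self._plugins[plugin.code] = plugin

    def analyze(self, text: str, lang: str) -> List[str]:
        # Raises KeyError if no plug-in is installed for `lang`.
        plugin = self._plugins[lang]
        return [plugin.lemmas.get(tok, tok) for tok in plugin.tokenize(text.lower())]


core = CorePlatform()
core.register(LanguagePlugin("en", str.split, {"died": "die", "purchased": "purchase"}))
core.register(LanguagePlugin("fr", str.split, {"acheté": "acheter"}))

print(core.analyze("IBM purchased Lotus", "en"))  # ['ibm', 'purchase', 'lotus']
print(core.analyze("IBM a acheté Lotus", "fr"))   # ['ibm', 'a', 'acheter', 'lotus']
```

Adding a language in this design means registering a new plug-in; the core's `analyze` code does not change.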
Converging trends
Language technology
► (Relatively) efficient and mature language computation (XLE)
► Large-scale grammars (Pargram)
Lexical and ontological knowledge resources
► Existing: WordNet, VerbNet, FrameNet, SUMO…
► Data-driven acquisition methods
Moore’s law
► Computing and storage are cheap enough now, and getting cheaper
Open source software
► Cluster management, MapReduce, Bigtable
► Effect: Google/Yahoo core competencies can now be downloaded
Commodity computing: Amazon EC2, S3, SDS
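The map-reduce pattern mentioned above can be sketched in a single Python process, here building a tiny inverted index. This is a teaching sketch under stated assumptions: the documents are invented, tokenization is whitespace splitting, and an open-source framework such as Hadoop would run the map, shuffle and reduce phases across many machines rather than in one function call.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(doc_id: int, text: str) -> Iterator[Tuple[str, int]]:
    """Mapper: emit a (term, doc_id) pair for every token in one document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for term, doc_id in pairs:
        groups[term].append(doc_id)
    return groups


def reduce_phase(term: str, doc_ids: List[int]) -> Tuple[str, List[int]]:
    """Reducer: collapse one term's postings into a sorted, de-duplicated list."""
    return term, sorted(set(doc_ids))


def build_inverted_index(docs: Dict[int, str]) -> Dict[str, List[int]]:
    mapped = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    return dict(reduce_phase(term, ids) for term, ids in shuffle(mapped).items())


docs = {1: "IBM purchased Lotus", 2: "Lotus Notes ships", 3: "IBM ships servers"}
index = build_inverted_index(docs)
print(index["lotus"], index["ibm"])  # [1, 2] [1, 3]
```

Because each mapper call sees one document and each reducer call sees one term, both phases parallelize independently; only the shuffle needs to move data between workers.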
Powerset reads each sentence
► Parses each sentence on the page
► Extracts entities & semantic relationships
► Identifies and expands to similar entities, relationships & abstractions
► Indexes multiple facts for each sentence
Figure: the sentence “Sir Edward Heath died from pneumonia ...” is parsed into die (verb) with subject Sir Edward Heath (noun, name) and “from” pneumonia (noun). Expansions: Sir Edward Heath → UK Prime Minister, politician; pneumonia → disease; die → killed (by).
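"Indexes multiple facts for each sentence" can be illustrated by storing one extracted fact under every combination of its expansions. The Python sketch below is an assumption-laden simplification: the expansion table is hand-built from the Heath example rather than drawn from WordNet or similar resources, and a fact is reduced to an (entity, event, cause) triple, ignoring the distinction between the "from" and "by" roles.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

# Hypothetical expansion table, hand-built from the example above; a real system
# would derive these from lexical and ontological resources.
EXPANSIONS: Dict[str, List[str]] = {
    "Sir Edward Heath": ["UK Prime Minister", "politician"],
    "die": ["killed"],
    "pneumonia": ["disease"],
}

Fact = Tuple[str, str, str]  # (entity, event, cause)


def variants(term: str) -> List[str]:
    """The term itself plus any broader or related terms it expands to."""
    return [term] + EXPANSIONS.get(term, [])


def index_fact(index: Dict[Fact, Set[int]], entity: str, event: str, cause: str,
               sentence_id: int) -> None:
    """Store the sentence under every combination of expanded entity, event and cause."""
    for key in product(variants(entity), variants(event), variants(cause)):
        index.setdefault(key, set()).add(sentence_id)


index: Dict[Fact, Set[int]] = {}
index_fact(index, "Sir Edward Heath", "die", "pneumonia", sentence_id=1)

print(len(index))                                  # 12 facts from one sentence
print(index[("politician", "killed", "disease")])  # {1}
```

Expanding at index time trades storage (3 × 2 × 2 keys here) for simple exact lookups at query time; expanding the query instead would make the opposite trade.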
Integrating Diverse Resources
• Powerset’s Natural Language Technology enables integrated search results
• Based on the answer to the query
• Can tap into many different sources, e.g.: websites, newsfeeds, blogs, archives, metadata, video, podcasts, databases
• From web results
• From Freebase
• From Blinx
NL Database Query (Entertainment)
Natural language query converted into database query
Results from database drive further engagement
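Converting a natural language query into a database query can be sketched with Python's standard `sqlite3` and `re` modules. The regex templates are a deliberately crude stand-in for Powerset's deep parsing, and the `movies` schema and rows are assumptions made for the example.

```python
import re
import sqlite3
from typing import List, Optional, Tuple

# Hypothetical question templates (regex -> parameterized SQL), standing in for real parsing.
TEMPLATES = [
    (re.compile(r"^(?:what|which) movies did (?P<person>.+?) direct\??$", re.IGNORECASE),
     "SELECT title FROM movies WHERE director = ? ORDER BY year"),
    (re.compile(r"^who directed (?P<title>.+?)\??$", re.IGNORECASE),
     "SELECT director FROM movies WHERE title = ?"),
]


def to_sql(question: str) -> Optional[Tuple[str, Tuple[str, ...]]]:
    """Return (sql, params) for the first matching template, or None."""
    for pattern, sql in TEMPLATES:
        match = pattern.match(question.strip())
        if match:
            return sql, tuple(match.groupdict().values())
    return None


def answer(conn: sqlite3.Connection, question: str) -> List[str]:
    query = to_sql(question)
    if query is None:
        return []  # a fuller system might fall back to ordinary search here
    sql, params = query
    return [row[0] for row in conn.execute(sql, params)]


# Toy schema and sample rows, assumed for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, director TEXT, year INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Jaws", "Steven Spielberg", 1975),
    ("E.T. the Extra-Terrestrial", "Steven Spielberg", 1982),
    ("Alien", "Ridley Scott", 1979),
])

print(answer(conn, "What movies did Steven Spielberg direct?"))
print(answer(conn, "Who directed Alien?"))
```

The extracted names are passed as SQL parameters rather than pasted into the query string, which keeps user text from being interpreted as SQL.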
Language Technologies: Parc ⇔ Powerset
Multidimensional, multilingual architecture from long-term research
Figure: layered architecture diagram. Operations: parse, generate, select, transfer, interpret. Components: finite-state morphology, LFG syntax, transfer/glue semantics, stochastic models, entity recognition, ambiguity management. Languages: English, French, German, Japanese, Chinese, Norwegian. Foundations: theory, mathematics, algorithms, software, engineering, scale. Platform and resources: XLE, Pargram, community knowledge resources. Meaning-sensitive applications: consumer search, question answering, translation, smart summaries, relation extraction, dialog, note-taking.
Natural Language search architecture
Figure: two parallel paths sharing an LFG grammar and knowledge resources (acquisition: manual + ML).
Content acquisition: open-text content → XLE parse → semantic map → content semantics → indexing → large-scale semantic index
User search: NL questions → XLE parse → semantic map → question semantics → retrieval → ranking → result presentation
Example: the question “Who did IBM acquire in the last 10 years?” retrieves the indexed sentence “IBM purchased Lotus in 1998.”
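The IBM/Lotus example can be traced with a small Python sketch of the retrieval step: question semantics and content semantics are compared after normalizing relation words, then a time constraint is applied. Everything here is assumed for illustration: the `Fact` and `Question` records are hand-built rather than produced by an XLE parse, the synonym map stands in for the knowledge resources, "ExampleCo" is a hypothetical company, and "the last 10 years" is resolved against 2007, the year of the slides.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical synonym map standing in for the knowledge resources:
# each relation word is normalized to a canonical concept.
CANONICAL: Dict[str, str] = {"acquire": "acquire", "purchase": "acquire", "buy": "acquire"}


@dataclass(frozen=True)
class Fact:
    """Content semantics for one indexed sentence (hand-built here, not parsed)."""
    agent: str
    relation: str
    patient: str
    year: int
    sentence: str


@dataclass(frozen=True)
class Question:
    """Question semantics: what did `agent` <relation> in or after `min_year`?"""
    agent: str
    relation: str
    min_year: int


def canonical(relation: str) -> str:
    return CANONICAL.get(relation, relation)


def retrieve(question: Question, index: List[Fact]) -> List[Fact]:
    """Match on agent and canonical relation, then apply the time constraint."""
    return [
        f for f in index
        if f.agent == question.agent
        and canonical(f.relation) == canonical(question.relation)
        and f.year >= question.min_year
    ]


index = [
    Fact("IBM", "purchase", "Lotus", 1998, "IBM purchased Lotus in 1998."),
    Fact("IBM", "buy", "ExampleCo", 1990, "IBM bought ExampleCo in 1990."),  # hypothetical
]

# "Who did IBM acquire in the last 10 years?", assuming it is asked in 2007.
q = Question(agent="IBM", relation="acquire", min_year=2007 - 10)
print([f.sentence for f in retrieve(q, index)])  # ['IBM purchased Lotus in 1998.']
```

Note that a keyword match on "acquire" would miss the Lotus sentence entirely; the relation normalization is what connects "acquire" in the question to "purchased" in the content.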
Also required…
Scalability
► Coordinated computing on thousands of machines
► Serving millions of users
► Managing processors, storage, and communication
System integration
► Robust combination of complex components
Data resources
► For guidance, training, and testing
► Parse banks, relevance and ranking sets…
Knowledge resources
► Lexical and conceptual mappings
► Facts, inferences
► Acquisition strategies
■ Manual, semi-automatic, automatic, community resources
User experience: create expectations, change behavior
Ecosystem acceleration
Wisdom of crowds can accelerate the semantic web
Publishers
► Upload ontologies to get more traffic
► Get feedback on improving their content
Users
► Play games to create and improve resources
► Provide feedback to get better search
► Create ontologies for personalization and groups
Developers
► Package knowledge for specialized apps
So what starts as a broad platform gets deeper faster
Realizing a semantic web faster than expected
Community involvement plans
Powerset Open Platform
► API
► Access to technology to build mashups and applications
Powerset Contributions
► Datasets
► Annotations
► Open source software
Partnerships & Collaborations
Powerlabs
► Early access to Powerset technology
► Tuning product concept & design
► High-value feedback from users eager to help
► Sign up at www.powerset.com