Date post: | 23-Jan-2018 |
Category: | Data & Analytics |
Upload: | ai-frontiers |
Naturali
A Beijing-based startup company
Upgrade apps with a speech interface
Naturali Sesami
✦ Translates speech input into action sequences in apps and executes them on the user’s behalf
✦ Chinese version launched on LeTV phones as a system app on April 12, 2017
✦ Available as a third-party app on all Android phones since Aug. 2017
Advantages of Speech
Speed:
✦ voice input is three times as fast as typing
Hands-free:
✦ send messages, play music, order food
✦ turn on hotspot: 5 clicks
Mind-free:
✦ where is my luggage?
Chat + API: the downsides
Chat assistants displace apps, but chat is not the best mode of interaction for everything:
✦ editing
✦ browsing
✦ viewing
Nonetheless, there are plenty of needs for voice interaction.
Who has access to this?
Chat + API: the downsides
Re-invention of user experience inside the chat window:
✦ usually not as good as specialized apps
✦ requires a great deal of repeated development effort
Chat + API: the downsides
Economic interests of the assistant and the backend services may not be aligned.
Naturali Sesami
A thin, transparent translation layer over apps:
✦ voice ➜ front-end UI actions
Seamless integration of speech and graphics:
✦ existing GUI interactions are still available
✦ voice interaction is made available on any app page
Voice to Actions in Three Steps
Speech Recognition: sound → text
✦ data
Semantic Interpretation: text → intent
✦ knowledge
Plan Generation: intent → actions
✦ grounding
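The three steps above can be sketched as a chain of three functions. This is only a toy illustration, not Naturali's implementation: the function names, the `SetHotspot` task, and the five UI steps are all hypothetical, and the "speech recognizer" simply decodes bytes as text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Intent:
    task: str
    parameters: Dict[str, str] = field(default_factory=dict)


def recognize_speech(audio: bytes) -> str:
    """Step 1, sound -> text. Stub: treats the bytes as UTF-8 text."""
    return audio.decode("utf-8").strip().lower()


def interpret(text: str) -> Intent:
    """Step 2, text -> intent. Toy keyword rule (hypothetical)."""
    if "hotspot" in text:
        state = "off" if "off" in text.split() else "on"
        return Intent("SetHotspot", {"state": state})
    return Intent("Unknown", {"text": text})


def generate_plan(intent: Intent) -> List[str]:
    """Step 3, intent -> actions. The UI steps are made up for illustration."""
    if intent.task == "SetHotspot":
        return [
            "open:Settings",
            "tap:Network & Internet",
            "tap:Hotspot & tethering",
            "tap:Wi-Fi hotspot",
            "toggle:" + intent.parameters["state"],
        ]
    return []


def voice_to_actions(audio: bytes) -> List[str]:
    return generate_plan(interpret(recognize_speech(audio)))


actions = voice_to_actions(b"Turn on hotspot")
```

Each stage depends only on the output of the previous one, so any stage could be swapped for a real model without touching the others.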
Naturali Speech
End-to-end DNN: CNN+LSTM+Attention+CTC
✦ built from scratch with TensorFlow
✦ trained with thousands of hours of transcribed speech
Personalized and contextualized language model:
✦ contact names
✦ app-specific vocabulary
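For readers unfamiliar with CTC, the snippet below shows the standard CTC collapsing rule applied to per-frame labels: merge consecutive repeats, then drop blanks. It is a generic greedy-decoding sketch, not a description of how Naturali decodes; the blank symbol `_` and the sample frames are assumptions.

```python
from itertools import groupby
from typing import Iterable

BLANK = "_"  # assumed blank symbol for this illustration


def ctc_greedy_collapse(frame_labels: Iterable[str]) -> str:
    """Merge consecutive duplicate labels, then remove blank labels."""
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != BLANK)


# One label per audio frame; the blank between the two "l" runs keeps
# the double letter from being merged into one.
decoded = ctc_greedy_collapse(list("hh_e_ll_lo_"))
```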
Semantic Interpretation: text → intent
An intent identifies a task and the necessary information (parameters) for the task.
Example:
✦ task: FlightSearch
✦ parameters: (to, from, date, airline, class)
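One possible way to represent such an intent in code is a task name plus a parameter dictionary checked against a per-task schema. The `TASK_SCHEMAS` table and `make_intent` helper are hypothetical names for this sketch; only the task and parameter names come from the slide.

```python
import json
from typing import Dict

# Parameter names each task accepts (taken from the FlightSearch example).
TASK_SCHEMAS = {
    "FlightSearch": ("to", "from", "date", "airline", "class"),
}


def make_intent(task: str, parameters: Dict[str, str]) -> dict:
    """Build an intent, rejecting parameters the task does not define."""
    allowed = TASK_SCHEMAS[task]
    unknown = sorted(set(parameters) - set(allowed))
    if unknown:
        raise ValueError("unknown parameters for %s: %s" % (task, unknown))
    return {"task": task, "parameters": dict(parameters)}


# A dict is used because "from" and "class" are reserved words in Python
# and cannot be passed as keyword arguments.
intent = make_intent("FlightSearch", {"to": "PEK", "from": "SFO"})
serialized = json.dumps(intent, sort_keys=True)
```

Parameters may be left unset at this point; an intent is allowed to be partially filled.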
Entities and Types
Persons: singers/directors/contacts
Locations: cities/POIs/addresses
Apps and Games
Media: songs/shows/movies/books
Time and Date
Food
Sports teams
……
Recognizing Thousands of Types
Manually labeling training examples is not an option at this scale.
An alternative is to use naturally annotated data:
✦ Hearst patterns: NP_type such as NP_inst
✦ Other examples: navigate to NP_loc
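The two patterns above can be approximated with regular expressions to show how text annotates itself. This is a deliberately crude sketch: a real system would use a noun-phrase chunker, whereas here an instance is assumed to be a run of capitalized words and the type is a single lowercase word.

```python
import re
from typing import List, Tuple

# Assumption: an instance NP is one or more capitalized words.
INST = r"[A-Z]\w*(?: [A-Z]\w*)*"
SEP = r"(?:, (?:and |or )?| and | or )"

HEARST = re.compile(rf"(?P<type>[a-z]+) such as (?P<insts>{INST}(?:{SEP}{INST})*)")
NAVIGATE = re.compile(rf"navigate to (?P<inst>{INST})")


def extract_typed_entities(text: str) -> List[Tuple[str, str]]:
    """Return (instance, type) pairs found by the two patterns."""
    pairs = []
    for match in HEARST.finditer(text):
        for inst in re.split(SEP, match.group("insts")):
            pairs.append((inst, match.group("type")))
    for match in NAVIGATE.finditer(text):
        pairs.append((match.group("inst"), "location"))
    return pairs


hearst_pairs = extract_typed_entities(
    "He has visited cities such as Beijing, New York and San Francisco last year."
)
nav_pairs = extract_typed_entities("navigate to Golden Gate Park")
```

The type is returned exactly as written in the text (for example the plural "cities"); normalizing it is left out of the sketch.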
Multi-round Conversation
Complex intents may not be articulated in one shot:
✦ FlightSearch(to, from, date, airline, class)
A multi-round conversation incrementally collects information from the user and guides the user through the process.
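A minimal slot-filling loop illustrates the idea: the assistant asks only for the parameters still missing, one round at a time. The `converse` function and the scripted user are hypothetical; the slot names come from the FlightSearch example, and the answer values are made up.

```python
from typing import Callable, Dict, List

REQUIRED_SLOTS = ("to", "from", "date", "airline", "class")


def missing_slots(filled: Dict[str, str]) -> List[str]:
    return [slot for slot in REQUIRED_SLOTS if slot not in filled]


def converse(filled: Dict[str, str], ask: Callable[[str], str]) -> Dict[str, str]:
    """Ask one question per missing slot until the intent is complete."""
    filled = dict(filled)
    for slot in missing_slots(filled):
        filled[slot] = ask(slot)
    return filled


# Scripted stand-in for the user, recording which questions were asked.
scripted_answers = {"date": "2017-11-04", "airline": "United", "class": "economy"}
asked = []


def scripted_user(slot: str) -> str:
    asked.append(slot)
    return scripted_answers[slot]


# First utterance only gave origin and destination.
result = converse({"to": "PEK", "from": "SFO"}, scripted_user)
```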
Composite Intents
“Messenger chat with Alex and say let’s meet on Saturday”
✦ OpenMessenger
✦ ChatWithPerson
✦ SendMessage
“Get an Uber Black ride to SFO”
✦ UberRide
✦ SetDest
✦ SelectUberBlack
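The first example above can be sketched as a function that splits one utterance into an ordered list of sub-intents. The single hand-written pattern is an assumption for illustration only; the sub-intent names are the ones on the slide, and the parameter keys `person` and `message` are hypothetical.

```python
import re
from typing import Dict, List

MESSENGER = re.compile(
    r"messenger chat with (?P<person>\w+) and say (?P<message>.+)",
    re.IGNORECASE,
)


def decompose(utterance: str) -> List[Dict[str, str]]:
    """Split a composite request into sub-intents, in execution order."""
    match = MESSENGER.fullmatch(utterance.strip())
    if match is None:
        return []
    return [
        {"task": "OpenMessenger"},
        {"task": "ChatWithPerson", "person": match.group("person")},
        {"task": "SendMessage", "message": match.group("message")},
    ]


steps = decompose("Messenger chat with Alex and say let's meet on saturday")
```

Order matters here: each sub-intent assumes the app state left behind by the previous one.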
Plan Generation: intent → actions
Grounding establishes the connection between the inside (the assistant) and the outside (apps and devices).
Example:
✦ intent:
{"task": "FlightStatus", "number": "UA888", "date": "2017-11-04"}
✦ action:
select * from flight_db where airline = 'United Airlines' and flight_num = '888' and year = 2017 and month = 11 and day = 4
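The mapping in this example can be sketched with Python's built-in `sqlite3`, using a parameterized query instead of string concatenation. The table layout, the `AIRLINE_NAMES` lookup, the two-letter airline-code assumption, and the sample row are all made up for the demonstration.

```python
import sqlite3
from datetime import date
from typing import Dict, Tuple

# Hypothetical lookup from airline code to the name stored in the table.
AIRLINE_NAMES = {"UA": "United Airlines"}


def ground_flight_status(intent: Dict[str, str]) -> Tuple[str, tuple]:
    """Turn a FlightStatus intent into SQL plus bound parameters."""
    # Assumes a two-letter airline code followed by the flight number.
    code, flight_num = intent["number"][:2], intent["number"][2:]
    day = date.fromisoformat(intent["date"])
    sql = (
        "SELECT * FROM flight_db WHERE airline = ? AND flight_num = ? "
        "AND year = ? AND month = ? AND day = ?"
    )
    return sql, (AIRLINE_NAMES[code], flight_num, day.year, day.month, day.day)


# In-memory table with one invented row, only to show the query running.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flight_db (airline TEXT, flight_num TEXT, "
    "year INTEGER, month INTEGER, day INTEGER, status TEXT)"
)
conn.execute(
    "INSERT INTO flight_db VALUES ('United Airlines', '888', 2017, 11, 4, 'on time')"
)

sql, params = ground_flight_status(
    {"task": "FlightStatus", "number": "UA888", "date": "2017-11-04"}
)
rows = conn.execute(sql, params).fetchall()
```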
Crowdsourced Skills
Skills are immediately usable by their creator.
✦ The user may share skills with others, e.g., tech support for parents
Vetted skills can be made available to the public.
Summary
Voice interaction is inevitable
Naturali Sesami translates user requests into sequences of actions in apps.
Sesami grows by crowdsourcing skills.
Join us! ✦ [email protected]