


Company Confidential – For Internal Use Only

Copyright © SAS Institute Inc. All rights reserved.

Deep Learning for Text Analytics

SAS User Group Malaysia

3rd May 2018


Agenda

• Natural Language in Machines

o What is it?

o Why do we need it?

• Deep Learning with Recurrent Neural Network (RNN)

o Basic RNN architecture

o Text classification

o Text generation

• Creating, training and scoring an RNN in a Jupyter notebook


Natural Language in Machines


SAS in a Chatbot

Source: https://becominghuman.ai/chatbots-using-aws-sas-viya-e8a7410ec256

Welcome! I’m your virtual assistant.

How can I help you?

In the Oil and Gas Industry: AI to Assist Customers

Provide answers and information on more than 3,000 company products, drawing on 100,000 data sheets, 1,000 different pack options and 1,100 different physical characteristics.

The information needed to answer each possible question was spread across many sources, including an external vehicle database with more than a million vehicle and engine combinations.

Recognize when a piece of information is missing and ask the customer follow-up questions to clarify or confirm.
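Detecting missing information and asking a follow-up question is often implemented as slot filling. The sketch below is a minimal, hypothetical illustration: the slot names (`product`, `vehicle`, `pack_size`), the keyword lists and the prompts are invented for the example and are not taken from the SAS solution, and naive substring matching stands in for a real language-understanding model.

```python
# Hypothetical slots and keyword lists; a real assistant would use a trained NLU model.
REQUIRED_SLOTS = {
    "product": ["engine oil", "gear oil", "grease"],
    "vehicle": ["car", "truck", "motorcycle"],
    "pack_size": ["1 litre", "4 litre", "20 litre"],
}

# Hypothetical follow-up question for each slot.
FOLLOW_UPS = {
    "product": "Which product are you asking about?",
    "vehicle": "What type of vehicle is it for?",
    "pack_size": "Which pack size do you need?",
}


def extract_slots(message):
    """Fill each slot with the first known value found in the message (naive substring match)."""
    text = message.lower()
    filled = {}
    for slot, values in REQUIRED_SLOTS.items():
        for value in values:
            if value in text:
                filled[slot] = value
                break
    return filled


def next_turn(message):
    """Ask about the first missing slot, or confirm once every slot is filled."""
    filled = extract_slots(message)
    missing = [slot for slot in REQUIRED_SLOTS if slot not in filled]
    if missing:
        return FOLLOW_UPS[missing[0]]
    return "To confirm: {product} for your {vehicle}, {pack_size} pack?".format(**filled)


print(next_turn("I need engine oil for my truck"))  # Which pack size do you need?
```

Asking for one slot at a time keeps each turn short; the same loop repeats until every required slot is filled and the assistant can confirm.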


Natural Language Processing: Interaction

Natural Language Processing (NLP)

Natural Language Understanding (NLU)

Natural Language Generation (NLG)

Natural Language Interaction (NLI)


Natural Language Processing

NLP Layer (Natural Language Processing)

Knowledge Base (Source Content)

Data Storage (Interaction History & Analytics)
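A minimal Python sketch of how these three components might fit together, assuming a dictionary as the knowledge base, a list as the interaction store and simple keyword lookup as the NLP layer. The class and method names (`Chatbot`, `understand`, `respond`) and the sample answers are hypothetical and not part of any SAS API.

```python
from dataclasses import dataclass, field


@dataclass
class Chatbot:
    """Toy layout: NLP layer + knowledge base (source content) + interaction history."""
    knowledge_base: dict                           # topic keyword -> answer text
    history: list = field(default_factory=list)    # stored interactions for later analytics

    def understand(self, message):
        """NLP layer (hypothetical keyword lookup): map a message to a knowledge-base topic."""
        words = set(message.lower().replace("?", " ").split())
        for topic in self.knowledge_base:
            if topic in words:
                return topic
        return None

    def respond(self, message):
        """Answer from the knowledge base and log the exchange in the data store."""
        topic = self.understand(message)
        answer = self.knowledge_base.get(topic, "Sorry, could you rephrase that?")
        self.history.append({"message": message, "topic": topic, "answer": answer})
        return answer


bot = Chatbot({
    "viscosity": "See the viscosity section of the product data sheet.",  # placeholder answer
    "pack": "Pack options are listed in the product catalogue.",           # placeholder answer
})
print(bot.respond("Which viscosity grades do you stock?"))
print(len(bot.history))
```

Keeping the history separate from the knowledge base mirrors the split in the diagram: the knowledge base holds source content, while the history collects interactions that can later be analyzed.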


Natural Language Processing: Other Use Cases

• Text-based applications

• Searching a database for a given topic

• Extracting information from a large document
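Topic search over a collection of text is commonly built on an inverted index. The sketch below is a minimal version under stated assumptions: the three documents are hypothetical stand-ins for database records, and results are ranked only by how many query words each document contains. A production system would add stemming and weighting such as TF-IDF.

```python
from collections import defaultdict

# Hypothetical documents standing in for database records.
DOCS = {
    1: "RNN models for text classification",
    2: "Chatbot answers product questions",
    3: "Text generation with a recurrent neural network",
}


def build_index(docs):
    """Inverted index: each lowercase word maps to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def search(index, query):
    """Rank documents by the number of distinct query words they contain (ties by id)."""
    hits = defaultdict(int)
    for word in set(query.lower().split()):
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    return sorted(hits, key=lambda d: (-hits[d], d))


index = build_index(DOCS)
print(search(index, "text generation"))  # [3, 1]
```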


Deep Learning with Recurrent Neural Network (RNN)


Deep Learning: Recurrent Neural Network (RNN)

• Designed to handle sequential data

o Text

o Speech

o Time series

• Performs the same task for every element of a sequence

• The output for each element depends on the current input and on a hidden state carried over from the preceding elements

• Common variants

o Gated Recurrent Unit (GRU)

o Long Short-Term Memory (LSTM)

[Figure: RNN diagram mapping each input in a sequence to an output]
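A minimal sketch of the recurrence described above, written in plain Python and assuming a vanilla (Elman) RNN with a tanh activation; GRU and LSTM cells replace the single tanh update with gated updates. The weights are random and the sizes arbitrary, so the numbers only illustrate the mechanics: the same weights are applied at every step, and each hidden state depends on the previous one.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_forward(inputs, w_xh, w_hh, b_h):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), same weights at every step."""
    h = [0.0] * len(b_h)  # h_0: all-zero initial state
    states = []
    for x in inputs:
        pre = [a + b + c for a, b, c in zip(matvec(w_xh, x), matvec(w_hh, h), b_h)]
        h = [math.tanh(p) for p in pre]
        states.append(h)
    return states


random.seed(0)
input_dim, hidden_dim = 3, 4
w_xh = [[random.uniform(-0.5, 0.5) for _ in range(input_dim)] for _ in range(hidden_dim)]
w_hh = [[random.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(hidden_dim)]
b_h = [0.0] * hidden_dim

sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # three one-hot "tokens"
states = rnn_forward(sequence, w_xh, w_hh, b_h)
print(len(states), len(states[-1]))  # 3 4: one hidden state per element
```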


Text Classification

The 16th American President

[Figure: the phrase annotated with the cues it carries: number, order, entity and context]
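A text classifier of this kind reads the whole sequence and predicts one label from the final hidden state (a many-to-one RNN). The sketch assumes a GRU cell, one of the variants listed earlier, with the update h_t = (1 - z_t) * h_{t-1} + z_t * h̃_t (some libraries swap the roles of z and 1 - z). The vocabulary, the two class labels and the sizes are hypothetical, and the weights are untrained, so the predicted label is arbitrary until the model is trained.

```python
import math
import random

random.seed(1)

VOCAB = ["<unk>", "who", "is", "the", "16th", "american", "president"]  # hypothetical vocabulary
CLASSES = ["question", "statement"]                                      # hypothetical labels
EMB, HID = 4, 5  # embedding and hidden sizes, chosen arbitrarily


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def add(*vectors):
    """Element-wise sum of equal-length vectors."""
    return [sum(xs) for xs in zip(*vectors)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Untrained, randomly initialized parameters (biases omitted for brevity).
embedding = rand_matrix(len(VOCAB), EMB)
W_z, U_z = rand_matrix(HID, EMB), rand_matrix(HID, HID)
W_r, U_r = rand_matrix(HID, EMB), rand_matrix(HID, HID)
W_h, U_h = rand_matrix(HID, EMB), rand_matrix(HID, HID)
W_out = rand_matrix(len(CLASSES), HID)


def gru_step(x, h):
    """One GRU update: the gates decide how much of the old state to keep."""
    z = [sigmoid(v) for v in add(matvec(W_z, x), matvec(U_z, h))]  # update gate
    r = [sigmoid(v) for v in add(matvec(W_r, x), matvec(U_r, h))]  # reset gate
    reset_h = [ri * hi for ri, hi in zip(r, h)]
    cand = [math.tanh(v) for v in add(matvec(W_h, x), matvec(U_h, reset_h))]  # candidate state
    return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(text):
    """Many-to-one: read every token, then classify from the final hidden state."""
    h = [0.0] * HID
    for token in text.lower().split():
        idx = VOCAB.index(token) if token in VOCAB else 0  # unknown words map to <unk>
        h = gru_step(embedding[idx], h)
    return dict(zip(CLASSES, softmax(matvec(W_out, h))))


probs = classify("Who is the 16th American President")
print(max(probs, key=probs.get))
```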


Word Vector

Unlabeled corpus:

The 15th American President

The 16th American President

The 17th American President

Alex reads this sentence

Alex read this sentence

Alex is reading this sentence

[Figure: a word vector algorithm maps the corpus to vectors in which 15th, 16th and 17th lie close together, as do read, reads and reading]

Words with similar contexts should have similar vectors.
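The slide does not name the word vector algorithm. As a stand-in, the sketch uses a simple count-based method rather than a trained embedding such as word2vec: each word is represented by counts of its immediate neighbors in the slide's corpus, and cosine similarity compares the vectors. Even this crude approach places 15th and 16th together and separates them from read.

```python
import math
from collections import Counter, defaultdict

# The unlabeled corpus from the slide.
CORPUS = [
    "The 15th American President",
    "The 16th American President",
    "The 17th American President",
    "Alex reads this sentence",
    "Alex read this sentence",
    "Alex is reading this sentence",
]


def cooccurrence_vectors(sentences, window=1):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


vectors = cooccurrence_vectors(CORPUS)
print(round(cosine(vectors["15th"], vectors["16th"]), 2))     # 1.0
print(round(cosine(vectors["reads"], vectors["reading"]), 2))  # 0.5
print(round(cosine(vectors["16th"], vectors["read"]), 2))      # 0.0
```

Trained embeddings generalize this idea by learning dense vectors from far larger corpora, but the principle on the slide is the same: shared contexts produce similar vectors.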


Text Generation

[Figure: text generation with an RNN. Stages shown: convert text into vectorized input; calculate vector weights; a vector representing a sentence based on the text; translate the vector back to text; use the weight vector to refine the model]
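A minimal sketch of this flow, assuming an encoder-decoder arrangement like the review-to-title model in the sample: one RNN folds the vectorized input into a single sentence vector, and a second RNN translates that vector back into text one word at a time with greedy decoding. The vocabulary, sizes and function names are hypothetical, and the training step (using the loss to refine the weights) is omitted, so the generated words are arbitrary.

```python
import math
import random

random.seed(2)

# Hypothetical toy vocabulary; a real model would build it from the training reviews.
VOCAB = ["<bos>", "<eos>", "<unk>", "love", "this", "game", "app", "great", "really"]
EMB, HID = 4, 6


def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rnn_step(x, h, w_x, w_h):
    """Vanilla RNN update: tanh(W_x x + W_h h)."""
    return [math.tanh(a + b) for a, b in zip(matvec(w_x, x), matvec(w_h, h))]


# Untrained, randomly initialized weights.
embedding = rand_matrix(len(VOCAB), EMB)
enc_wx, enc_wh = rand_matrix(HID, EMB), rand_matrix(HID, HID)
dec_wx, dec_wh = rand_matrix(HID, EMB), rand_matrix(HID, HID)
w_out = rand_matrix(len(VOCAB), HID)  # maps a hidden state back to vocabulary scores


def token_id(token):
    return VOCAB.index(token) if token in VOCAB else VOCAB.index("<unk>")


def encode(text):
    """Turn the text into vectors and fold them into one sentence vector."""
    h = [0.0] * HID
    for token in text.lower().split():
        h = rnn_step(embedding[token_id(token)], h, enc_wx, enc_wh)
    return h


def generate(text, max_len=5):
    """Greedy decoding: translate the sentence vector back into words, one at a time."""
    h = encode(text)
    prev = token_id("<bos>")
    words = []
    for _ in range(max_len):
        h = rnn_step(embedding[prev], h, dec_wx, dec_wh)
        scores = matvec(w_out, h)
        scores[token_id("<bos>")] = float("-inf")  # never emit the start token
        prev = max(range(len(VOCAB)), key=lambda i: scores[i])
        if VOCAB[prev] == "<eos>":
            break
        words.append(VOCAB[prev])
    return " ".join(words)


print(generate("Really enjoy playing this game"))
```

Greedy decoding always takes the highest-scoring word; beam search or sampling are common alternatives when greedy output becomes repetitive.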


Text Generation: Text Structure – Word Order

Who is the 16th American President

The 16th President who is American
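Both sentences contain exactly the same words, so any representation that ignores order cannot tell them apart. A quick check in Python, comparing a bag of words with the ordered word pairs (bigrams) of each sentence; bigrams are used here only as a simple order-aware contrast, whereas an RNN captures order through its step-by-step hidden state.

```python
from collections import Counter


def tokens(text):
    return text.lower().split()


def bigrams(text):
    """Set of adjacent word pairs, which depends on word order."""
    t = tokens(text)
    return set(zip(t, t[1:]))


a = "Who is the 16th American President"
b = "The 16th President who is American"

print(Counter(tokens(a)) == Counter(tokens(b)))  # True: identical bag of words
print(bigrams(a) == bigrams(b))                  # False: word order differs
print(sorted(bigrams(a) & bigrams(b)))           # [('the', '16th'), ('who', 'is')]
```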


Creating, Training and Scoring an RNN Using Deep Learning

Sample RNN Model: Loading the Action Sets

Sample RNN Model: The Dataset


Sample RNN Model: Text Classification Model

Sample RNN Model: Training the Text Classification Model

Sample RNN Model: Scoring the Text Classification Model

Sample RNN Model: Word Order

Sample RNN Model: Text Generation Model

Sample RNN Model: Training the Text Generation Model

NOTE: The Synchronous mode is enabled.

NOTE: The total number of parameters is 3822440.

NOTE: The approximate memory cost is 19739.00 MB.

NOTE: Loading weights cost 0.00 (s).

NOTE: Initializing each layer cost 147.37 (s).

NOTE: The total number of threads on each worker is 32.

NOTE: The total number of minibatch size per thread on each worker is 16.

NOTE: The maximum number of minibatch size across all workers for the synchronous mode is 512.

NOTE: Target variable: title

NOTE: Number of input variables: 1

NOTE: Number of numeric input variables: 2

NOTE: Batch nUsed Learning Rate Loss Fit Error Time (s) (Training)

NOTE: 0 512 0.05 11.679 1 0.79

NOTE: 1 512 0.05 11.414 1 0.77

NOTE: 2 512 0.05 11.2 1 1.39

NOTE: 3 512 0.05 11.093 1 0.72

NOTE: 4 512 0.05 10.858 0.9983 0.85

NOTE: 5 512 0.05 10.64 0.9835 1.18

NOTE: 6 512 0.05 10.64 0.9835 0.91
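The minibatch figures in the log can be cross-checked with simple arithmetic. This sketch assumes that the total minibatch equals the number of workers × threads per worker × minibatch size per thread; the log does not state that formula, so the implied worker count is an inference. It also computes the relative drop in loss over the batches shown.

```python
# Values copied from the training log above.
threads_per_worker = 32
minibatch_per_thread = 16
max_minibatch_all_workers = 512

per_worker = threads_per_worker * minibatch_per_thread
# Assumption: total minibatch = workers x threads x per-thread size.
workers = max_minibatch_all_workers // per_worker
print(per_worker, workers)  # 512 1

# Loss column for batches 0 to 6.
losses = [11.679, 11.414, 11.2, 11.093, 10.858, 10.64, 10.64]
drop = (losses[0] - losses[-1]) / losses[0]
print(f"{drop:.1%}")  # 8.9%
```

Under that assumption, the nUsed value of 512 in every row is consistent with a single worker.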

Sample RNN Model: Scoring the Text Generation Model

Out[15]: Scoreinfo

Descr Value

0 Number of Observations Read 9635

1 Number of Observations Used 9635

2 Misclassification Error (%) 90.71156

3 Loss Error 9.162203

Sample RNN Model: Text Generation Output

review: Really Really enjoy playing this game. It makes you feel very smart as you solve some puzzles quickly and then the next one will be a real stumper… play it ALL the time.

ground truth: Crazy Addicted

prediction: love this game

review: Don’t bother with this app. I wish I could delete it from my purchased app history so I’m not reminded that it was every on my phone. I should have known better.

ground truth: Really?

prediction: i love this


review: I really like this app! very easy to use and the audio is great. it is nice to see the spreading of God’s word is still free for some people. a++++.

ground truth: awesome!

prediction: very app


Useful Links

• What’s New In SAS Deep Learning (Documentation)

http://go.documentation.sas.com/?docsetId=casdlpg&docsetTarget=n0gv3jjm5obouun1uvducbzl8nlf.htm&docsetVersion=8.2&locale=en

• Understanding Recurrent Neural Networks

http://karpathy.github.io/2015/05/21/rnn-effectiveness/

• RNN Simplified

https://www.youtube.com/watch?v=_aCuOwF1ZjU

sas.com


Thank You