
Progress Report. Reihaneh Rabbany. Presented for the NLP Group, Computing Science Department, University of Alberta.

  • Progress Report
    Reihaneh Rabbany
    Presented for the NLP Group, Computing Science Department, University of Alberta, April 2009

  • Agenda
    Project Proposal for Guiding Agent by Speech
    Many-to-Many Alignment by Bayesian Networks
    Letter-to-Phoneme Alignment
    Evaluation of Phylogenetic Trees

  • Quick RL Overview
    An agent interacting with an environment perceives states, performs actions, and receives rewards.
    The agent computes the value of each action in each state: the long-term reward obtainable from that state by performing that action.

    It performs action selection by choosing the best action, or sometimes a random action (exploration vs. exploitation).
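
    A minimal sketch of this loop, assuming a tabular Q-learning agent with epsilon-greedy action selection; the environment interface (reset(), step(), actions), the learning rate, and epsilon are illustrative assumptions rather than anything specified here.

      import random
      from collections import defaultdict

      def q_learning_episode(env, q, alpha=0.1, gamma=0.95, epsilon=0.1):
          """One episode of tabular Q-learning with epsilon-greedy action selection."""
          state = env.reset()
          done = False
          while not done:
              # Exploration-exploitation: best action most of the time, random action sometimes.
              if random.random() < epsilon:
                  action = random.choice(env.actions)
              else:
                  action = max(env.actions, key=lambda a: q[(state, a)])
              next_state, reward, done = env.step(action)
              # Value of (state, action): long-term reward obtainable from this state by this action.
              best_next = max(q[(next_state, a)] for a in env.actions)
              q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
              state = next_state

      # Usage: q = defaultdict(float); call q_learning_episode(env, q) repeatedly on an
      # environment exposing reset(), step(action) -> (state, reward, done), and .actions.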

  • Project Proposal for Guiding Agent by Speech
    Accelerate learning using speech.
    The emotion in a speech signal carries a considerable amount of side information: happiness or anger in a speech signal can provide a shaping reinforcement signal.
    Goal: develop tools and methods to extract emotion from speech, and design a methodology to use it as a shaping signal.

  • Approaches to Using the Speech Signal as a Guide for Learning
    Extract prosodic features from speech and associate meaning to these features.
    Supervised learning-based approach:
      a data set of (prosodic features, emotion) pairs, with labels such as excited, happy, upset, sad, bored;
      a module that assigns a reward to the recognized emotion.
    Pure RL approach, inspired by the parent-infant learning process:
      the infant gradually learns to associate value to perceived speech and to use it to guide her exploration of the world.

  • RL Approach
    Two ways to develop this idea:
    Augment the observation space to include the prosodic features; the emotion then becomes state-dependent.
    Learn a separate instructor module that estimates the value of prosodic features; the instructions (learnt instructor values) affect the agent's action selection.

  • Instructions
    Different ways the instructions (learnt instructor values) could affect the agent's action selection:
    Balance exploration and exploitation: when the speaker is not happy with what the agent is doing, the agent should explore other actions.
    Use them directly in action selection with some weights: this motivates the agent to keep its previous action if the instructor is satisfied with its current action.
    Use them as a shaping reward, defining a new reward function by adding the instructor value to the actual reward received from the environment.

  • Agenda
    Project Proposal for Guiding Agent by Speech
    Many-to-Many Alignment by Bayesian Networks
    Letter-to-Phoneme Alignment
    Evaluation of Phylogenetic Trees

  • Many-to-Many Alignment by Bayesian Networks
    Finding an alignment between two sequences, assuming the order is preserved.

    I have applied it to two applications:
    Letter-to-phoneme alignment: aligning letters and phonemes for a given dictionary.
    Evaluating phylogenetic trees: showing how compatible the tree is with the given taxonomy.

  • Agenda
    Project Proposal for Guiding Agent by Speech
    Many-to-Many Alignment by Bayesian Networks
    Letter-to-Phoneme Alignment
    Evaluation of Phylogenetic Trees

  • Model
    Word: a sequence of letters.
    Pronunciation: a sequence of phonemes.
    Alignment: a sequence of subalignments.

    Problem: finding the most probable alignment.

    Assumption: subalignments are independent.

  • Many-to-Many EM
    1. Initialize prob(subalignments)
    // Expectation step
    2. For each word in the training set:
       2.1. Produce all possible alignments
       2.2. Choose the most probable alignment
    // Maximization step
    3. For all subalignments:
       3.1. Compute new_prob(subalignments)
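
    As a rough illustration of these steps, here is a minimal Python sketch of the hard (Viterbi-style) EM loop for many-to-many letter-to-phoneme alignment under the independence assumption; the maximum chunk sizes, the smoothing constant, and the toy dictionary are illustrative assumptions, not the settings used in this work.

      from collections import defaultdict
      from functools import lru_cache

      MAX_L, MAX_P = 2, 2    # largest letter / phoneme chunk allowed in one subalignment
      SMOOTH = 1e-6          # fallback probability for unseen subalignments

      def best_alignment(word, pron, prob):
          """Most probable ordered segmentation of two sequences into subalignments (DP)."""
          @lru_cache(maxsize=None)
          def best(i, j):
              if i == len(word) and j == len(pron):
                  return 1.0, ()
              candidates = []
              for dl in range(1, MAX_L + 1):
                  for dp in range(1, MAX_P + 1):
                      if i + dl <= len(word) and j + dp <= len(pron):
                          sub = (word[i:i + dl], pron[j:j + dp])
                          p_rest, rest = best(i + dl, j + dp)
                          candidates.append((prob.get(sub, SMOOTH) * p_rest, (sub,) + rest))
              return max(candidates, key=lambda c: c[0]) if candidates else (0.0, ())
          return best(0, 0)

      def many_to_many_em(dictionary, iterations=10):
          prob = {}                            # step 1: start from a uniform model
          for _ in range(iterations):
              counts = defaultdict(float)
              for word, pron in dictionary:    # step 2 (E): keep the most probable alignment
                  p, alignment = best_alignment(word, tuple(pron), prob)
                  if p > 0:
                      for sub in alignment:
                          counts[sub] += 1.0
              total = sum(counts.values())     # step 3 (M): re-estimate subalignment probabilities
              if total:
                  prob = {sub: c / total for sub, c in counts.items()}
          return prob

      # Toy usage: a two-word dictionary of (word, phoneme sequence) pairs.
      toy = [("chat", ["CH", "AE", "T"]), ("hat", ["HH", "AE", "T"])]
      subalignment_probs = many_to_many_em(toy)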

  • Dynamic Bayesian Network
    Subalignments: hidden variables.
    Learn the DBN by EM.

    [Figure: DBN model with letter nodes l_i, phoneme nodes p_i, and subalignment nodes a_i]

  • Context-Dependent DBN
    The context-independence assumption makes the model simpler, but it is not always correct.
    Example: the alignment probability for the letter 'h' differs between 'Chat' and 'Hat'.

    [Figure: context-dependent DBN model]

  • Agenda
    Project Proposal for Guiding Agent by Speech
    Many-to-Many Alignment by Bayesian Networks
    Letter-to-Phoneme Alignment
    Evaluation of Phylogenetic Trees

  • Evaluation of Phylogenetic Trees
    Phylogenetic trees show the evolution of species.

    Taxonomy:
    Caninae; True dogs; Canis; Coyote
    Caninae; True foxes; Vulpes; Kit Fox
    Caninae; True foxes; Vulpes; Fennec Fox
    Caninae; Basal Caninae; Otocyon; Bat-eared Fox
    ...

  • Tree Evaluation
    Label the inner nodes of the tree.
    For each species:
      a path in the tree gives a sequence of inner-node labels;
      a taxonomy description gives a taxonomy sequence.
    There should be a many-to-many alignment between these two sequences.

  • Tree Evaluation (Cont.)
    Find the alignment between these sequences for all the species.
    Find the most probable alignments.
    Measure the mean probability of these alignments: how probable is this tree given this taxonomy?
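
    A minimal sketch of this scoring, reusing the hypothetical best_alignment() helper and subalignment probabilities from the earlier EM sketch; the species data and the use of a mean log-probability are illustrative assumptions.

      import math

      def tree_score(species_paths, species_taxonomy, probs):
          """Mean log-probability of the best alignment between each species' sequence of
          inner-node labels (its path in the tree) and its taxonomy sequence."""
          log_probs = []
          for species, path_labels in species_paths.items():
              p, _ = best_alignment(tuple(path_labels), tuple(species_taxonomy[species]), probs)
              log_probs.append(math.log(p) if p > 0 else float("-inf"))
          return sum(log_probs) / len(log_probs)

      # Hypothetical example: inner-node labels on the path to a leaf vs. its taxonomy description.
      paths = {"Coyote": ("root", "true dogs", "canis")}
      taxa = {"Coyote": ("Caninae", "True dogs", "Canis")}
      # score = tree_score(paths, taxa, subalignment_probs)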

  • Taxonomy and Trees
    [Figure: taxonomy and tree]
    [Figure: aligned result]

  • Discussion

    The emotion in a speech signal carries a considerable amount of side information that can be used to accelerate learning. The happiness or anger of a speech signal can provide a shaping reinforcement signal. The goal of this research project is to develop tools and methods to extract emotion from speech and to design a rigorous methodology to use it as a shaping signal. A potential benchmark for these ideas is the Critterbot; by providing a speech-based shaping signal, one may hope that the robot performs better.

    I present several approaches to using the speech signal as a guide for learning. All of them use automatic algorithms to extract prosodic features from the speech signal. These prosodic features define a finite-alphabet sequence for each utterance, which is further used to extract the emotional state of the speaker. The approaches differ in how they deal with these prosodic features and associate `meaning' to them.

    The first approach is a supervised learning-based approach. It assumes that we have access to a labeled data set in the form of (prosodic features, emotion) pairs for training a supervised learning module; example emotion labels are excited, happy, upset, sad, and bored. Another module, designed a priori, assigns a reward to the recognized emotion [1]. The disadvantages of this approach are that it needs labeled data sets and that the assigned labels might not be the best abstraction of emotion for the task at hand.

    The alternative is inspired by the parent-infant learning process. The newborn infant does not know the meaning of speech; however, the infant gradually learns to associate value to perceived speech and, in later stages, uses it to guide her exploration of the world. In other words, at first the agent does not respond to the sound guides because it does not know their meaning, but after receiving actual reward from the environment it starts following the sounds that lead toward the goal.

    One advantage of this process is that it is completely unsupervised, so there is no need for labeled data. The other advantage is that one does not need to specify the value of the speech signal, but can let the agent discover it by itself.

    One can think of at least two ways to develop this idea. The first method is to augment the observation space to include the prosodic features; the agent then estimates this joint state-action value function by interacting with the environment. The pitfall of this method is that the value of emotion becomes state-dependent, which is not really the case: a motivating sound should always be interpreted as motivating, independent of the state of the environment, e.g. whether the Critterbot is in the corner of the room or in the middle of the room.

    The second method, which will be the focus of my research project, learns a separate instructor module (say I) apart from the action-value function. The instructor module estimates the value of prosodic features (say I(e)) by any reinforcement learning method, such as TD learning (similar to actor-critic, we keep a separate set of parameters for each of them).

    There are different ways that these instructions (learnt instructor values) could affect the agent's action selection. One can treat the instructions as a function I(e) and use it to balance exploration and exploitation (e.g. through a Boltzmann action-selection mechanism), so that a lower I(e) makes the agent explore more; this is the case when the speaker is not happy with what the agent is doing and it should explore other actions.
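
    A rough sketch of this first option, assuming the instructor value I(e) has been normalised into [0, 1]; the temperature schedule and its bounds are illustrative assumptions.

      import math
      import random

      def boltzmann_action(q_values, instructor_value, base_temp=0.1, max_temp=2.0):
          """Boltzmann action selection over Q(s, .), exploring more when I(e) is low."""
          # Low I(e) (unhappy speaker) -> high temperature -> closer-to-uniform choice.
          temperature = base_temp + (max_temp - base_temp) * (1.0 - instructor_value)
          m = max(q_values)
          weights = [math.exp((q - m) / temperature) for q in q_values]
          r, acc = random.random() * sum(weights), 0.0
          for action, w in enumerate(weights):
              acc += w
              if r <= acc:
                  return action
          return len(q_values) - 1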

    The second way is to consider the instructions as I(e, a_{t-1}, a) and use the following expression:

        a_t = argmax_a [ c Q(s, a) + (1 - c) I(e, a_{t-1}, a) ]

    This motivates the agent to keep its previous action if the instructor is satisfied with its current action. It might only be applicable to environments where the effect of action a in state s is similar to its effect in a state s' != s, which is to some extent valid for the Critterbot.
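
    A minimal sketch of this selection rule; the dictionary-based Q and instructor tables and the weight c are illustrative assumptions.

      def select_action(q_row, instructor, emotion, prev_action, actions, c=0.7):
          """a_t = argmax_a [ c Q(s, a) + (1 - c) I(e, a_{t-1}, a) ]."""
          def mixed_value(a):
              # The instructor term favours keeping prev_action when the speaker sounds satisfied.
              return c * q_row[a] + (1.0 - c) * instructor.get((emotion, prev_action, a), 0.0)
          return max(actions, key=mixed_value)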

    Finally, I(e) can be used as a shaping reward to define a new reward function by adding it to the actual reward received from the environment, i.e. r' = r + β I(e). The parameter β could be adjusted according to how confident the agent is about the value of I(e), for example proportional to the number of times e has guided the agent to the goal.
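
    A minimal sketch of the shaping variant; the confidence schedule for the weight (here a count of how often the sound e preceded reaching the goal) is an illustrative assumption.

      def shaped_reward(env_reward, instructor_value, goal_counts, emotion, scale=0.1):
          """r' = r + beta * I(e), with beta growing with confidence in the sound's value."""
          beta = scale * goal_counts.get(emotion, 0)   # trust sounds that have led to the goal
          return env_reward + beta * instructor_value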

    These instructor parameters might be initialized with some prior knowledge about the meaning of sounds. For example, shouts (loud intensity) are not good ones: parameters related to features with loud intensity could be initialized with lower values. This could guide the agent within the first episodes and speed up the learning.

    Subalignments are independent of each other, and therefore we can compute the probability of an alignment by multiplying the probabilities of its subalignments.
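
    A small sketch of this product, assuming the subalignment probability table from the earlier EM sketch; the smoothing constant is an illustrative assumption.

      from math import prod

      def alignment_probability(alignment, prob, smooth=1e-6):
          # Independence assumption: multiply the probabilities of the subalignments.
          return prod(prob.get(sub, smooth) for sub in alignment)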

    Each subalignment depends only on its own letters and phonemes, regardless of its position (we call this feature context independence).

    A taxonomy is a particular scientific classification organised by subtype-supertype relationships.
