Transcript
Page 1: College of Computer and Information Science - cs6140 lec11 · 2017-04-06 · Today’s Outline • Planning by Dynamic Programming ... The action-value function CIT(s, a) is the expected

4/6/17

CS6140: Machine Learning Spring 2017

Instructor: Lu Wang
College of Computer and Information Science
Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Email: [email protected]

Logistics

•  Grades for A2 are out.
•  Next week: course project presentation.
•  The final report is due on 4/24. All assignments have to be in by 4/29.
•  4/20: final exam
•  Additional office hours:
   –  4.17, 4-5pm (Lu, 448 WVH)
   –  4.18, 11am-12pm (TA, 166 WVH)
   –  4.19, 4-5pm (Lu, 448 WVH)

What we learned last time

•  Introduction to Reinforcement Learning
•  The Reinforcement Learning Problem
•  Markov Decision Process

Today’s Outline

•  Planning by Dynamic Programming
   –  Policy evaluation and policy improvement
   –  Value iteration

[Slides taken from David Silver’s reinforcement learning course]
