College of Computer and Information Science - cs6140 lec11 · 2017-04-06 · Today’s Outline •...

Date post: 28-Mar-2020
Transcript
Page 1: College of Computer and Information Science - cs6140 lec11 · 2017-04-06 · Today’s Outline • Planning by Dynamic Programming ... The action-value function CIT(s, a) is the expected

4/6/17


CS 6140: Machine Learning
Spring 2017

Instructor: Lu Wang
College of Computer and Information Science
Northeastern University
Webpage: www.ccs.neu.edu/home/luwang
Email: [email protected]

Logistics

•  Grades for A2 are out.
•  Next week: course project presentation.
•  The final report is due on 4/24. All assignments have to be in by 4/29.
•  4/20: final exam
•  Additional office hours:
–  4.17, 4-5pm (Lu, 448 WVH)
–  4.18, 11am-12pm (TA, 166 WVH)
–  4.19, 4-5pm (Lu, 448 WVH)

What we learned last time

•  Introduction to Reinforcement Learning
•  The Reinforcement Learning Problem
•  Markov Decision Process


Today’s Outline

•  Planning by Dynamic Programming
–  Policy evaluation and policy improvement
–  Value iteration

[Slides taken from David Silver’s reinforcement learning course]
