
Today's Topics: CS 540, Fall 2015 (Shavlik), Lecture 17, Week 8 (10/27/15)

Transcript

Slide 1: Today's Topics
Remember: no discussing the exam until next Tues! OK to stop by Thurs 5:45-7:15pm for HW3 help.
• More BN Practice (from Fall 2014 CS 540 Final)
• BNs for Playing Nannon (http://nannon.com/)
• Exploration vs. Exploitation Tradeoff
• Stationarity
• Nannon Class Tourney?
Read: Sections 18.6 (skim), 18.7, & 18.9 (artificial neural networks [ANNs] and support vector machines [SVMs])

Slide 2: From Fall 2014 Final
• What is the probability that A and C are true but B and D are false?
• What is the probability that A is false, B is true, and D is true?
• What is the probability that C is true given A is false, B is true, and D is true?

Slide 3: From Fall 2014 Final (worked answers)
What is the probability that A and C are true but B and D are false?
  = P(A) × (1 - P(B)) × P(C | A ∧ ¬B) × (1 - P(D | A ∧ ¬B ∧ C))
  = 0.3 × (1 - 0.8) × 0.6 × (1 - 0.6)

What is the probability that A is false, B is true, and D is true?
  = P(¬A ∧ B ∧ D)
  = P(¬A ∧ B ∧ ¬C ∧ D) + P(¬A ∧ B ∧ C ∧ D)
  = process 'complete world states' like the first question

What is the probability that C is true given A is false, B is true, and D is true?
  = P(C | ¬A ∧ B ∧ D)
  = P(C ∧ ¬A ∧ B ∧ D) / P(¬A ∧ B ∧ D)
  = process like the first and second questions

Slides 4-5: From Fall 2014 Final (Naïve Bayes)
Consider the following training set, where three Boolean-valued features are used to predict a Boolean-valued output. Assume you wish to apply the Naïve Bayes algorithm. Calculate the ratio below; use pseudo examples. Assume FOUR pseudo examples (ffft, tttt, ffff, tttf).

  Prob(Output = True  | A = False, B = False, C = False)     P(¬A | Out)  × P(¬B | Out)  × P(¬C | Out)  × P(Out)
  ------------------------------------------------------- = ------------------------------------------------------
  Prob(Output = False | A = False, B = False, C = False)     P(¬A | ¬Out) × P(¬B | ¬Out) × P(¬C | ¬Out) × P(¬Out)

      (3/5) × (2/5) × (2/5) × (5/8)
  = -------------------------------
      (2/3) × (2/3) × (1/3) × (3/8)

(A quick arithmetic check of these two worked answers appears after slide 7 below.)

Slide 6: The Big Picture (not a BN, but like mini-max)
• Provided s/w gives you the set of (annotated) legal moves
• If there are zero or one, the s/w passes or makes the only possible move
[Diagram: current Nannon board with three possible next boards; choose the move that gives the best odds of winning]

Four Effects of MOVES:
  HIT:    _XO_  →  __X_
  BREAK:  _XX_  →  _X_X
  EXTEND: _X_XX →  __XXX
  CREATE: _X_X_ →  __XX_

Slide 7: Reinforcement Learning (RL) vs. Supervised Learning
• Nannon is really an RL task; we'll treat it as a SUPERVISED ML task
• All moves in winning games considered GOOD
• All moves in losing games considered BAD
• Noisy data, but good play still results
• Random-move & hand-coded players provided
• Provided code can make 10^6 moves/sec
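The arithmetic in the two worked answers above is easy to check mechanically. The snippet below is only a sanity check written for these notes (it is not part of any provided course code); the values P(A) = 0.3, P(B) = 0.8, P(C | A ∧ ¬B) = 0.6, and P(D | A ∧ ¬B ∧ C) = 0.6 are read off slide 3, and the fractions come from slide 5.

  public class Fall2014FinalCheck {
      public static void main(String[] args) {
          // Question 1: P(A, not-B, C, not-D) via the chain rule over the network,
          // using the CPT values read off slide 3.
          double pA = 0.3, pB = 0.8, pC_given_A_notB = 0.6, pD_given_A_notB_C = 0.6;
          double q1 = pA * (1 - pB) * pC_given_A_notB * (1 - pD_given_A_notB_C);
          System.out.println("P(A, !B, C, !D) = " + q1);  // 0.3 * 0.2 * 0.6 * 0.4 = 0.0144

          // Naive Bayes ratio from slide 5 (numerator: Output = True; denominator: Output = False).
          double num = (3.0 / 5) * (2.0 / 5) * (2.0 / 5) * (5.0 / 8);
          double den = (2.0 / 3) * (2.0 / 3) * (1.0 / 3) * (3.0 / 8);
          System.out.println("Naive Bayes ratio = " + (num / den));  // 0.06 / 0.0556, about 1.08
      }
  }

Since the ratio comes out a bit above 1, Output = True is the (slightly) more probable label for the all-false example.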
Slide 8: What to Compute?
Multiple possibilities (pick only one):
• P(move in winning game | current board ∧ chosen move), OR
• P(move in winning game | next board), OR
• P(move in winning game | next board ∧ prev board), OR
• P(move in winning game | next board ∧ effect of move), OR    [effect = hit, break, extend, create, or some combo]
• Etc.

Slide 9: 'Raw' Random Variables Representing the Board
• # of safe pieces for O
• # of safe pieces for X
• # of home pieces for X
• # of home pieces for O
• What is on board cell i (X, O, or empty)
Board size varies (L cells); the number of pieces each player has also varies (K pieces).

Full joint size for the above = K × (K+1) × 3^L × (K+1) × K
  - for L = 12 and K = 5, |full joint| = 900 × 3^12 = 478,296,900

Some possible ways of encoding the move:
  - die value
  - which of the 12 (?) possible effect combos occurred
  - moved from cell i to cell j (L^2 possibilities; some not possible with a 6-sided die)
  - how many possible moves there were (L^2 possibilities)
You can also create derived features, e.g., inDanger.

Slide 10: HW3: Draw a BN, then Implement the Calculation for that BN (also do NB)
[BN diagram: a WIN node with children S1, S2, S3, ..., Sn]

               [ ∏_i P(S_i = value_i | WIN = true) ]  × P(WIN = true)
  Odds(WIN) = --------------------------------------------------------
               [ ∏_i P(S_i = value_i | WIN = false) ] × P(WIN = false)

Recall: choose the move that gives the best odds of winning.

Slide 11: Going Slightly Beyond NB
[BN diagram: as above, but with S2 also a parent of S1]

               P(S_1 = ? | S_2 = ? ∧ WIN)  × [ ∏_{i=2..n} P(S_i = ? | WIN) ]  × P(WIN)
  Odds(WIN) = --------------------------------------------------------------------------
               P(S_1 = ? | S_2 = ? ∧ ¬WIN) × [ ∏_{i=2..n} P(S_i = ? | ¬WIN) ] × P(¬WIN)

Here the PRODUCT runs from 2 to n.

Slide 12: Going Slightly Beyond NB (Part 2)

               P(S_1 = ? ∧ S_2 = ? | WIN)  × [ ∏_{i=3..n} P(S_i = ? | WIN) ]  × P(WIN)
  Odds(WIN) = --------------------------------------------------------------------------
               P(S_1 = ? ∧ S_2 = ? | ¬WIN) × [ ∏_{i=3..n} P(S_i = ? | ¬WIN) ] × P(¬WIN)

Here the PRODUCT runs from 3 to n.
Used: P(S_1 = ? ∧ S_2 = ? | WIN) = P(S_1 = ? | S_2 = ? ∧ WIN) × P(S_2 = ? | WIN)
A little bit of joint probability!

Slide 13: Some Possible Java Code Snippets for NB

  private static int boardSize = 6;  // Default width of board.
  private static int pieces    = 3;  // Default #pieces per player.

  int homeX_win[]  = new int[pieces + 1];  // Holds p(homeX = ? | win).
  int homeX_lose[] = new int[pieces + 1];  // Holds p(homeX = ? | !win).

  int safeX_win[]  = new int[pieces];      // NEED TO ALSO DO FOR 0!
  int safeX_lose[] = new int[pieces];      // Be sure to initialize!

  int board_win[][]  = new int[boardSize][3];
  int board_lose[][] = new int[boardSize][3];

  int wins   = 1;  // Remember m-estimates.
  int losses = 1;

Slide 14: Exploration vs. Exploitation Tradeoff
• We are not getting iid data, since the data we get depends on the moves we choose
• Always doing what we currently think is best (exploitation) might be a local minimum
• So we should try out seemingly non-optimal moves now and then (exploration), though we are then likely to lose the game
• Think about learning how to get from home to work: many possible routes; try various ones now and then, but most days take what has been best in the past
• Simple soln for HW3: observe 100,000 games where two random-move choosers play each other (a 'burn-in' phase)

Slide 15: Stationarity
• What about the fact that the opponent also learns? That changes the probability distributions we are trying to estimate!
• However, we'll assume that the probability distribution remains unchanged (i.e., is stationary) while we learn.
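To make slides 10, 13, and 14 concrete, here is a minimal sketch of how count arrays like those on slide 13 could be turned into Odds(WIN) with m-estimate-style smoothing, and how the counts get updated when every board in a winning (losing) game is labeled GOOD (BAD). The class name, method names, and the choice of just two features (homeX and safeX) are illustrative assumptions made for these notes, not the provided HW3 framework.

  import java.util.List;

  public class NaiveBayesSketch {
      static final int PIECES = 3;

      // One counter per possible feature value, kept separately for boards seen
      // in winning games vs. losing games.  (Illustrative; only two features.)
      int[] homeX_win = new int[PIECES + 1], homeX_lose = new int[PIECES + 1];
      int[] safeX_win = new int[PIECES + 1], safeX_lose = new int[PIECES + 1];
      int wins = 1, losses = 1;  // Start at 1, in the spirit of the slide's m-estimates.

      // P(feature = v | outcome) with add-1 (Laplace) smoothing over the feature's values.
      static double condProb(int[] counts, int v) {
          int total = 0;
          for (int c : counts) total += c;
          return (counts[v] + 1.0) / (total + counts.length);
      }

      // Odds(WIN | homeX, safeX), as on slide 10, using just these two features.
      double oddsOfWin(int homeX, int safeX) {
          double pWin = wins / (double) (wins + losses);
          double num = condProb(homeX_win,  homeX) * condProb(safeX_win,  safeX) * pWin;
          double den = condProb(homeX_lose, homeX) * condProb(safeX_lose, safeX) * (1 - pWin);
          return num / den;
      }

      // Supervised treatment of the RL task (slide 7): every board reached in a
      // winning game counts as GOOD, every board in a losing game as BAD.
      void recordGame(List<int[]> boardsAsFeaturePairs, boolean won) {
          for (int[] f : boardsAsFeaturePairs) {   // f[0] = homeX, f[1] = safeX
              if (won) { homeX_win[f[0]]++;  safeX_win[f[1]]++;  }
              else     { homeX_lose[f[0]]++; safeX_lose[f[1]]++; }
          }
          if (won) wins++; else losses++;
      }
  }

A move chooser would then loop over the annotated legal moves the provided software supplies, call oddsOfWin on the board each move leads to, and usually take the best one; the 100,000 random-vs-random burn-in games from slide 14 are one simple way to gather the initial counts before exploiting them.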
Slide 16: Have a Class Tourney?
• Everyone plays everyone else, many times, across various combos of board size × #pieces
• Won't impact course grade
• Opt in or opt out? (Student names not shared)
• Exceedingly slow, memory-hogging, or crashing code disqualified
• Yahoo! Research sponsored in the past, but not appropriate here (since most of you have jobs)

