
CS 109 Lecture 12

April 22nd, 2016


Today:
1. Multivariable RVs
2. Expectation with multiple RVs
3. Independence with multiple RVs


Review

• For two discrete random variables X and Y, the Joint Probability Mass Function is:

$$p_{X,Y}(a, b) = P(X = a,\ Y = b)$$

• Marginal distributions:

$$p_X(a) = P(X = a) = \sum_{y} p_{X,Y}(a, y) \qquad p_Y(b) = P(Y = b) = \sum_{x} p_{X,Y}(x, b)$$

• Example: X = value of die D1, Y = value of die D2:

$$P(X = 1) = \sum_{y} p_{X,Y}(1, y) = 6 \cdot \frac{1}{36} = \frac{1}{6}$$

Discrete Joint Mass Function
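Not from the slides, but a minimal Python sketch (the helper name marginal_x is made up) that stores this joint PMF for two fair dice and recovers the marginal:

from fractions import Fraction

# Joint PMF of two fair dice: p[(a, b)] = P(X = a, Y = b) = 1/36
p = {(a, b): Fraction(1, 36) for a in range(1, 7) for b in range(1, 7)}

def marginal_x(a):
    # p_X(a) = sum over y of p_{X,Y}(a, y)
    return sum(p[(a, y)] for y in range(1, 7))

print(marginal_x(1))  # 1/6, matching 6 * (1/36)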

Probability Table
• States all possible outcomes with several discrete variables
• Often is not "parametric"
• If # variables is > 2, you can have a probability table, but you can't draw it on a slide

[Table: rows span all values of A, columns span all values of B; the cell for row a, column b holds P(A = a, B = b).]

Remember "," means "and". Every outcome falls into a bucket.

Probability Table

• Random variables X and Y are Jointly Continuous if there exists a PDF f_{X,Y}(x, y) defined over −∞ < x, y < ∞ such that:

$$P(a_1 < X \le a_2,\ b_1 < Y \le b_2) = \int_{b_1}^{b_2}\!\int_{a_1}^{a_2} f_{X,Y}(x, y)\,dx\,dy$$

Jointly Continuous

[Figure: a 900 × 900 pixel region in the (x, y) plane, axes running from 0 to 900.]

Jointly Continuous

$$P(a_1 < X \le a_2,\ b_1 < Y \le b_2) = \int_{b_1}^{b_2}\!\int_{a_1}^{a_2} f_{X,Y}(x, y)\,dx\,dy$$

[Figure: the density surface f_{X,Y}(x, y), with the box a_1 ≤ x ≤ a_2, b_1 ≤ y ≤ b_2 shaded beneath it.]

Can calculate probabilities
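As a sanity check on that definition, a small pure-Python sketch (mine, not the lecture's) approximating the double integral with a midpoint Riemann sum, for a density where the exact answer is known: f(x, y) = e^(-x) e^(-y) for two independent Exp(1) variables.

from math import exp

def f(x, y):
    # joint density of two independent Exp(1) random variables
    return exp(-x) * exp(-y)

def prob(a1, a2, b1, b2, steps=500):
    # midpoint Riemann sum approximating P(a1 < X <= a2, b1 < Y <= b2)
    dx, dy = (a2 - a1) / steps, (b2 - b1) / steps
    total = 0.0
    for i in range(steps):
        x = a1 + (i + 0.5) * dx
        for j in range(steps):
            y = b1 + (j + 0.5) * dy
            total += f(x, y) * dx * dy
    return total

print(prob(0, 1, 0, 1))  # exact answer is (1 - e**-1)**2 ≈ 0.3996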

Darts!

[Figure: dart hits over the 900 × 900 pixel region, with the X-pixel marginal histogram along the x axis and the Y-pixel marginal along the y axis.]

$$X \sim \mathcal{N}\!\left(\tfrac{900}{2},\ \tfrac{900}{2}\right) \qquad Y \sim \mathcal{N}\!\left(\tfrac{900}{3},\ \tfrac{900}{5}\right)$$

Can calculate marginal probabilities


Transfer Learning


Way Back

Permutations

How many ways are there to order n distinct objects?

n!

Multinomial

How many ways are there to order n objects such that:
n_1 are the same (indistinguishable)
n_2 are the same (indistinguishable)
…
n_r are the same (indistinguishable)?

$$\frac{n!}{n_1!\, n_2! \cdots n_r!} = \binom{n}{n_1, n_2, \ldots, n_r}$$

Called the "multinomial" because of something from Algebra.

Binomial

How many ways are there to order n objects such that:
r are the same (indistinguishable)
(n − r) are the same (indistinguishable)?

Called the Binomial (Multi → Bi).

How many ways are there to make an unordered selection of r objects from n objects?

$$\frac{n!}{r!\,(n - r)!} = \binom{n}{r}$$

• Consider n independent trials of a Ber(p) rand. var.
§ X is the number of successes in n trials
§ X is a Binomial Random Variable: X ~ Bin(n, p)

$$P(X = i) = p(i) = \binom{n}{i} p^{i} (1 - p)^{n - i}, \quad i = 0, 1, \ldots, n$$

Binomial Distribution

Annotations on the formula: P(X = i) is the probability of exactly i successes; the binomial coefficient is the # of ways of ordering the successes; p^i (1 − p)^(n−i) is the probability of each ordering of i successes, which are all equal and mutually exclusive.
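A one-function check of this PMF in Python (my sketch; math.comb is in the standard library since 3.8):

from math import comb

def binomial_pmf(i, n, p):
    # (# of orderings of i successes) * (probability of each ordering)
    return comb(n, i) * p**i * (1 - p)**(n - i)

print(binomial_pmf(5, 10, 0.5))                          # ≈ 0.246
print(sum(binomial_pmf(i, 10, 0.5) for i in range(11)))  # 1.0: the PMF sums to 1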


End Review

• Multinomial distribution
§ n independent trials of an experiment are performed
§ Each trial results in one of m outcomes, with respective probabilities p_1, p_2, …, p_m, where $\sum_{i=1}^{m} p_i = 1$
§ X_i = number of trials with outcome i

$$P(X_1 = c_1, X_2 = c_2, \ldots, X_m = c_m) = \binom{n}{c_1, c_2, \ldots, c_m}\, p_1^{c_1}\, p_2^{c_2} \cdots p_m^{c_m}$$

where $\sum_{i=1}^{m} c_i = n$ and $\binom{n}{c_1, c_2, \ldots, c_m} = \frac{n!}{c_1!\, c_2! \cdots c_m!}$

Welcome Back the Multinomial

Annotations on the formula: the left-hand side is a joint distribution; the multinomial coefficient is the # of ways of ordering the successes; the probabilities of each ordering are equal and the orderings are mutually exclusive.

• A 6-sided die is rolled 7 times
§ Roll results: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes

• This is a generalization of the Binomial distribution
§ Binomial: each trial had 2 possible outcomes
§ Multinomial: each trial has m possible outcomes

$$P(X_1 = 1, X_2 = 1, X_3 = 0, X_4 = 2, X_5 = 0, X_6 = 3) = \frac{7!}{1!\,1!\,0!\,2!\,0!\,3!} \left(\frac{1}{6}\right)^{1} \left(\frac{1}{6}\right)^{1} \left(\frac{1}{6}\right)^{0} \left(\frac{1}{6}\right)^{2} \left(\frac{1}{6}\right)^{0} \left(\frac{1}{6}\right)^{3} = 420 \left(\frac{1}{6}\right)^{7}$$

Hello Die Rolls, My Old Friends
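Not from the slides, but a quick way to verify that arithmetic in Python (the helper name multinomial_pmf is made up):

from math import factorial

def multinomial_pmf(counts, probs):
    # P(X1 = c1, ..., Xm = cm) for n = sum(counts) independent trials
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)          # multinomial coefficient, exact integer
    prob = float(coef)
    for c, q in zip(counts, probs):
        prob *= q ** c                 # probability of each ordering
    return prob

# 7 die rolls: 1 one, 1 two, 0 threes, 2 fours, 0 fives, 3 sixes
print(multinomial_pmf([1, 1, 0, 2, 0, 3], [1/6] * 6))  # 420 * (1/6)**7 ≈ 0.0015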

• Ignoring order of words, what is the probability of any given word you write in English?
§ P(word = "the") > P(word = "transatlantic")
§ P(word = "Stanford") > P(word = "Cal")
§ The probability of each word is just a multinomial distribution

• What about the probability of those same words in someone else's writing?
§ P(word = "probability" | writer = you) > P(word = "probability" | writer = non-CS109 student)
§ After estimating P(word | writer) from known writings, use Bayes' Theorem to determine P(writer | word) for new writings!

Probabilistic Text Analysis

Example document: "Pay for Viagra with a credit-card. Viagra is great. So are credit-cards. Risk free Viagra. Click for free."  n = 18

Text is a Multinomial

Word counts: Viagra = 2, Free = 2, Risk = 1, Credit-card = 2, …, For = 2

$$P(\text{doc} \mid \text{spam}) = \frac{n!}{2!\, 2! \cdots 2!}\; p_{\text{viagra}}^{2}\, p_{\text{free}}^{2} \cdots p_{\text{for}}^{2}$$

Probability of seeing this document | spam

It’s a Multinomial!

(Here $p_{\text{viagra}}$ is the probability of a word in a spam email being "Viagra".)
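Putting the pieces together, a hedged Python sketch of that computation; the per-word spam probabilities below are invented placeholders, not estimates from real spam:

from math import factorial

# hypothetical P(word | spam) values -- placeholders for illustration only
p_spam = {"viagra": 0.06, "free": 0.05, "risk": 0.02, "credit-card": 0.03, "for": 0.04}
counts = {"viagra": 2, "free": 2, "risk": 1, "credit-card": 2, "for": 2}

coef = factorial(sum(counts.values()))   # counts only the words modeled here
for c in counts.values():
    coef //= factorial(c)                # multinomial coefficient

prob = float(coef)
for word, c in counts.items():
    prob *= p_spam[word] ** c            # probability of each ordering

print(prob)  # P(these word counts | spam) under this toy model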

• Authorship of the "Federalist Papers"
§ 85 essays advocating ratification of the US constitution
§ Written under the pseudonym "Publius"
o Really Alexander Hamilton, James Madison, and John Jay
§ Who wrote which essays?
o Analyzed the probability of words in each essay versus word distributions from known writings of the three authors

• Filtering Spam
§ P(word = "Viagra" | writer = you) << P(word = "Viagra" | writer = spammer)

Old and New Analysis


Expectation with Multiple Variables?

Joint Expectation

• Expectation over a joint isn't nicely defined, because it is not clear how to compose the multiple variables: add them? Multiply them?

• Lemma: for a function g(X, Y) we can calculate the expectation of that function:

$$E[g(X, Y)] = \sum_{x,y} g(x, y)\, p(x, y)$$

• By the way, this also holds for single random variables:

$$E[g(X)] = \sum_{x} g(x)\, p(x) \qquad E[X] = \sum_{x} x\, p(x)$$

E[X + Y] = E[X] + E[Y]

Generalized:

$$E\!\left[\sum_{i=1}^{n} X_i\right] = \sum_{i=1}^{n} E[X_i]$$

Holds regardless of dependency between the X_i's.

Expected Values of Sums
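A quick numeric check of the lemma and of linearity, reusing the two-dice joint PMF (my sketch, not the lecture's):

from fractions import Fraction

p = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# E[g(X, Y)] with g(x, y) = x + y, straight from the lemma
e_sum = sum((x + y) * pr for (x, y), pr in p.items())

# E[X] and E[Y] from the marginals
e_x = sum(x * sum(p[(x, y)] for y in range(1, 7)) for x in range(1, 7))
e_y = sum(y * sum(p[(x, y)] for x in range(1, 7)) for y in range(1, 7))

print(e_sum, e_x + e_y)  # both 7: E[X + Y] = E[X] + E[Y]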

$$\begin{aligned}
E[X + Y] = E[g(X, Y)] &= \sum_{x,y} g(x, y)\, p(x, y) \\
&= \sum_{x,y} (x + y)\, p(x, y) \\
&= \sum_{x,y} x\, p(x, y) + \sum_{x,y} y\, p(x, y) \\
&= \sum_{x} x \sum_{y} p(x, y) + \sum_{y} y \sum_{x} p(x, y) \\
&= \sum_{x} x\, p(x) + \sum_{y} y\, p(y) \\
&= E[X] + E[Y]
\end{aligned}$$

Skeptical Chris Wants a Proof!  Let g(X, Y) = X + Y.


Annotations on the steps:
• By the definition of g(x, y)
• What a useful lemma
• Break that sum into parts!
• Change the sum over (x, y) into separate sums
• That is the definition of marginal probability
• That is the definition of expectation


Independence and Random Variables

• Two discrete random variables X and Y are called independent if:

$$p_{X,Y}(x, y) = p_X(x)\, p_Y(y) \quad \text{for all } x, y$$

• Intuitively: knowing the value of X tells us nothing about the distribution of Y (and vice versa)
§ If two variables are not independent, they are called dependent

• Similar conceptually to independent events, but we are dealing with multiple variables
§ Keep your events and variables distinct (and clear)!

Independent Discrete Variables

• Flip a coin with probability p of "heads"
§ Flip the coin a total of n + m times
§ Let X = number of heads in the first n flips
§ Let Y = number of heads in the next m flips
§ X and Y are independent
§ Let Z = number of total heads in the n + m flips
§ Are X and Z independent? (See the simulation sketch after this slide.)
o What if you are told Z = 0?

$$P(X = x, Y = y) = \binom{n}{x} p^{x} (1-p)^{n-x} \binom{m}{y} p^{y} (1-p)^{m-y} = P(X = x)\, P(Y = y)$$

Coin Flips
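To see that X and Z are not independent, a small simulation sketch (mine; the parameters n = m = 5, p = 0.5 are arbitrary): compare P(X = 0) with P(X = 0 | Z = 0). If X and Z were independent, these would match.

import random

def trial(n=5, m=5, p=0.5):
    x = sum(random.random() < p for _ in range(n))  # heads in first n flips
    y = sum(random.random() < p for _ in range(m))  # heads in next m flips
    return x, x + y                                 # (X, Z)

random.seed(0)
samples = [trial() for _ in range(200_000)]
p_x0 = sum(x == 0 for x, z in samples) / len(samples)
given_z0 = [x for x, z in samples if z == 0]
p_x0_given_z0 = sum(x == 0 for x in given_z0) / len(given_z0)

print(p_x0, p_x0_given_z0)  # ≈ (1/2)**5 ≈ 0.031 versus exactly 1.0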

• Let N = # of requests to a web server/day
§ Suppose N ~ Poi(λ)
§ Each request comes from a human (probability = p) or from a "bot" (probability = 1 − p), independently
§ X = # requests from humans/day: (X | N) ~ Bin(N, p)
§ Y = # requests from bots/day: (Y | N) ~ Bin(N, 1 − p)

Condition on the total number of requests:

$$P(X = i, Y = j) = P(X = i, Y = j \mid X + Y = i + j)\, P(X + Y = i + j) + P(X = i, Y = j \mid X + Y \neq i + j)\, P(X + Y \neq i + j)$$

(The first factor is the probability of i human requests and j bot requests given we got i + j requests; the second is the probability that the number of requests in a day was i + j.)

§ Note: you cannot get i human requests and j bot requests if you did not get i + j requests:

$$P(X = i, Y = j \mid X + Y \neq i + j) = 0$$

The remaining pieces:

$$P(X + Y = i + j) = e^{-\lambda} \frac{\lambda^{i+j}}{(i+j)!} \qquad P(X = i, Y = j \mid X + Y = i + j) = \binom{i+j}{i} p^{i} (1-p)^{j}$$

$$P(X = i, Y = j) = \binom{i+j}{i} p^{i} (1-p)^{j}\, e^{-\lambda} \frac{\lambda^{i+j}}{(i+j)!}$$

Expanding the binomial coefficient and splitting $e^{-\lambda} = e^{-\lambda p}\, e^{-\lambda(1-p)}$:

$$P(X = i, Y = j) = \frac{(i+j)!}{i!\, j!} p^{i} (1-p)^{j}\, e^{-\lambda} \frac{\lambda^{i+j}}{(i+j)!} = e^{-\lambda p} \frac{(\lambda p)^{i}}{i!} \cdot e^{-\lambda (1-p)} \frac{(\lambda (1-p))^{j}}{j!} = P(X = i)\, P(Y = j)$$

§ Where X ~ Poi(λp) and Y ~ Poi(λ(1 − p))
§ X and Y are independent!

Web Server Requests
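A simulation sketch of this splitting result (not from the lecture; Knuth's multiplication method is one standard way to draw a Poisson): thin N ~ Poi(λ) with probability p and check that X behaves like Poi(λp).

import math
import random

def poisson(lam):
    # Knuth's multiplication method: count uniforms until the running
    # product drops below e^(-lambda); fine for modest lambda
    threshold = math.exp(-lam)
    n, prod = 0, random.random()
    while prod > threshold:
        prod *= random.random()
        n += 1
    return n

def simulate_day(lam=20.0, p=0.3):
    n = poisson(lam)                                   # N ~ Poi(lambda) requests
    return sum(random.random() < p for _ in range(n))  # X | N ~ Bin(N, p)

random.seed(1)
xs = [simulate_day() for _ in range(100_000)]
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(mean, var)  # both ≈ lambda * p = 6, consistent with X ~ Poi(6)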

• Two continuous random variables X and Y are called independent if:
P(X ≤ a, Y ≤ b) = P(X ≤ a) P(Y ≤ b) for any a, b

• Equivalently:

$$F_{X,Y}(a, b) = F_X(a)\, F_Y(b) \ \text{ for all } a, b \qquad f_{X,Y}(a, b) = f_X(a)\, f_Y(b) \ \text{ for all } a, b$$

• More generally, the joint density factors separately:

$$f_{X,Y}(x, y) = h(x)\, g(y) \quad \text{where } -\infty < x, y < \infty$$

Independent Continuous Variables

• Consider the joint density function of X and Y:

$$f_{X,Y}(x, y) = 6 e^{-3x} e^{-2y} \quad \text{for } 0 < x, y < \infty$$

§ Are X and Y independent? Yes! Let h(x) = 3e^{−3x} and g(y) = 2e^{−2y}, so f_{X,Y}(x, y) = h(x) g(y)

• Consider the joint density function of X and Y:

$$f_{X,Y}(x, y) = 4xy \quad \text{for } 0 < x, y < 1$$

§ Are X and Y independent? Yes! Let h(x) = 2x and g(y) = 2y, so f_{X,Y}(x, y) = h(x) g(y)
§ Now add the constraint that 0 < (x + y) < 1. Are X and Y independent? No!
o Cannot capture the constraint on x + y in a factorization!

Pop Quiz (just kidding)

• Two people set up a meeting for 12pm
§ Each arrives independently at a time uniformly distributed between 12pm and 12:30pm
§ X = # min. past 12pm person 1 arrives, X ~ Uni(0, 30)
§ Y = # min. past 12pm person 2 arrives, Y ~ Uni(0, 30)
§ What is P(first to arrive waits > 10 min. for the other)?

$$P(X + 10 < Y) + P(Y + 10 < X) = 2\,P(X + 10 < Y) \quad \text{by symmetry}$$

$$2\,P(X + 10 < Y) = 2\!\!\iint\limits_{x+10<y}\! f_{X,Y}(x, y)\,dx\,dy = 2\!\!\iint\limits_{x+10<y}\! f_X(x)\, f_Y(y)\,dx\,dy$$

$$= 2\int_{10}^{30}\!\int_{0}^{y-10} \left(\frac{1}{30}\right)^{2} dx\,dy = \frac{2}{30^{2}} \int_{10}^{30} (y - 10)\,dy = \frac{2}{30^{2}} \left[\frac{y^{2}}{2} - 10y\right]_{10}^{30} = \frac{2}{900} \cdot 200 = \frac{4}{9}$$

Dating at Stanford
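The 4/9 is easy to sanity-check by Monte Carlo (my sketch; the seed and trial count are arbitrary):

import random

random.seed(109)
trials = 1_000_000
hits = 0
for _ in range(trials):
    x = random.uniform(0, 30)   # person 1 arrival, minutes past 12pm
    y = random.uniform(0, 30)   # person 2 arrival
    if abs(x - y) > 10:         # first to arrive waits more than 10 minutes
        hits += 1

print(hits / trials)  # ≈ 4/9 ≈ 0.444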

• n random variables X_1, X_2, …, X_n are called independent if:

$$P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \prod_{i=1}^{n} P(X_i = x_i) \quad \text{for all subsets of } x_1, x_2, \ldots, x_n$$

• Analogously, for continuous random variables:

$$P(X_1 \le a_1, X_2 \le a_2, \ldots, X_n \le a_n) = \prod_{i=1}^{n} P(X_i \le a_i) \quad \text{for all subsets of } a_1, a_2, \ldots, a_n$$

Independence of Multiple Variables

• If random variables X and Y are independent, then
§ X is independent of Y, and Y is independent of X

• Duh!? Duh, indeed...
§ Let X_1, X_2, ... be a sequence of independent and identically distributed (I.I.D.) continuous random vars
§ Say X_n > X_i for all i = 1, ..., n − 1 (i.e., X_n = max(X_1, ..., X_n))
o Call X_n a "record value"
§ Let event A_i indicate that X_i is a "record value"
o Is A_{n+1} independent of A_n?
o Is A_n independent of A_{n+1}? Easier to answer: Yes!
o By symmetry, P(A_n) = 1/n and P(A_{n+1}) = 1/(n+1)
o P(A_n A_{n+1}) = (1/n)(1/(n+1)) = P(A_n) P(A_{n+1})  (a simulation sketch follows this slide)

Independence is Symmetric
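A simulation sketch of the record-value claim (mine; n = 4 is arbitrary): estimate P(A_n), P(A_{n+1}), and P(A_n A_{n+1}) and compare with 1/n, 1/(n+1), and their product.

import random

random.seed(2)
n, trials = 4, 200_000
count_an = count_an1 = count_both = 0
for _ in range(trials):
    xs = [random.random() for _ in range(n + 1)]  # X_1, ..., X_{n+1} i.i.d.
    a_n = xs[n - 1] == max(xs[:n])                # A_n: X_n is a record value
    a_n1 = xs[n] == max(xs)                       # A_{n+1}: X_{n+1} is a record value
    count_an += a_n
    count_an1 += a_n1
    count_both += a_n and a_n1

print(count_an / trials)    # ≈ 1/n = 0.25
print(count_an1 / trials)   # ≈ 1/(n+1) = 0.2
print(count_both / trials)  # ≈ 1/(n(n+1)) = 0.05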


Earth Day

Choosing a Random Subset
• From a set of n elements, choose a subset of size k such that all $\binom{n}{k}$ possibilities are equally likely
§ Only have random(), which simulates X ~ Uni(0, 1)

• Brute force:
§ Generate (an ordering of) all subsets of size k
§ Randomly pick one (divide (0, 1) into $\binom{n}{k}$ intervals)
§ Expensive with regard to time and space
§ Bad times!

(Happily) Choosing a Random Subset
• Good times:

int indicator(double p) {
    // return 1 with probability p, 0 otherwise
    if (random() < p) return 1; else return 0;
}

// array I[] indexed from 1 to n
subset rSubset(k, set of size n) {
    subset_size = 0;
    I[1] = indicator((double)k / n);
    for (i = 1; i < n; i++) {
        subset_size += I[i];
        // cast before dividing so integer division doesn't truncate to 0
        I[i+1] = indicator((double)(k - subset_size) / (n - i));
    }
    return (subset containing element[i] iff I[i] == 1);
}

$$P(I[1] = 1) = \frac{k}{n} \qquad \text{and} \qquad P(I[i+1] = 1 \mid I[1], \ldots, I[i]) = \frac{k - \sum_{j=1}^{i} I[j]}{n - i} \quad \text{where } 1 \le i < n$$

Random Subsets the Happy Way
• Proof (induction on (k + n)): (i.e., why this algorithm works)

§ Base Case: k = 1, n = 1, set S = {a}: rSubset returns {a} with $p = 1 = 1/\binom{1}{1}$

§ Inductive Hypothesis (IH): for k + x ≤ c, given a set S with |S| = x and k ≤ x, rSubset returns any subset S′ of S with |S′| = k with $p = 1/\binom{x}{k}$

§ Inductive Case 1 (where k + n ≤ c + 1): |S| = n (= x + 1), I[1] = 1
o Element 1 is in the subset; choose k − 1 elements from the remaining n − 1
o By IH: rSubset returns a subset S′ of size k − 1 with $p = 1/\binom{n-1}{k-1}$
o $P(I[1] = 1,\ \text{subset } S') = \dfrac{k}{n} \cdot \dfrac{1}{\binom{n-1}{k-1}} = \dfrac{1}{\binom{n}{k}}$

§ Inductive Case 2 (where k + n ≤ c + 1): |S| = n (= x + 1), I[1] = 0
o Element 1 is not in the subset; choose k elements from the remaining n − 1
o By IH: rSubset returns a subset S′ of size k with $p = 1/\binom{n-1}{k}$
o $P(I[1] = 0,\ \text{subset } S') = \dfrac{n-k}{n} \cdot \dfrac{1}{\binom{n-1}{k}} = \dfrac{1}{\binom{n}{k}}$
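An empirical companion to the proof: a Python re-implementation of the slide's algorithm (the names are mine), drawing many subsets of size k = 2 from n = 4 elements and checking that each of the C(4, 2) = 6 subsets shows up about 1/6 of the time.

import random
from collections import Counter

def r_subset(k, n):
    # include element i+1 with probability (slots left) / (elements left),
    # matching rSubset on the previous slide
    chosen = []
    for i in range(n):
        remaining_slots = k - len(chosen)
        if random.random() < remaining_slots / (n - i):
            chosen.append(i + 1)
    return tuple(chosen)

random.seed(3)
counts = Counter(r_subset(2, 4) for _ in range(60_000))
for subset, c in sorted(counts.items()):
    print(subset, c / 60_000)  # each of the 6 subsets ≈ 1/6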

Sum of Independent Binomial RVs
• Let X and Y be independent random variables
§ X ~ Bin(n1, p) and Y ~ Bin(n2, p)
§ X + Y ~ Bin(n1 + n2, p)

• Intuition:
§ X has n1 trials and Y has n2 trials
o Each trial has the same "success" probability p
§ Define Z to be n1 + n2 trials, each with success probability p
§ Z ~ Bin(n1 + n2, p), and also Z = X + Y

• More generally: if X_i ~ Bin(n_i, p) for 1 ≤ i ≤ N, then

$$\sum_{i=1}^{N} X_i \sim \text{Bin}\!\left(\sum_{i=1}^{N} n_i,\ p\right)$$
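A quick check by convolution in Python (my sketch; the specific n1, n2, p, m values are arbitrary):

from math import comb

def bin_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

n1, n2, p, m = 3, 5, 0.4, 4
# convolution: P(X + Y = m) = sum over k of P(X = k) P(Y = m - k)
conv = sum(bin_pmf(k, n1, p) * bin_pmf(m - k, n2, p) for k in range(m + 1))
print(conv, bin_pmf(m, n1 + n2, p))  # equal: X + Y ~ Bin(8, 0.4)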

Sum of Independent Poisson RVs
• Let X and Y be independent random variables
§ X ~ Poi(λ1) and Y ~ Poi(λ2)
§ X + Y ~ Poi(λ1 + λ2)

• Proof (just for reference):
§ Rewrite (X + Y = n) as (X = k, Y = n − k) where 0 ≤ k ≤ n:

$$P(X + Y = n) = \sum_{k=0}^{n} P(X = k,\ Y = n - k) = \sum_{k=0}^{n} P(X = k)\, P(Y = n - k)$$

$$= \sum_{k=0}^{n} e^{-\lambda_1} \frac{\lambda_1^{k}}{k!}\, e^{-\lambda_2} \frac{\lambda_2^{\,n-k}}{(n-k)!} = e^{-(\lambda_1 + \lambda_2)} \sum_{k=0}^{n} \frac{\lambda_1^{k}\, \lambda_2^{\,n-k}}{k!\,(n-k)!} = \frac{e^{-(\lambda_1 + \lambda_2)}}{n!} \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\, \lambda_1^{k}\, \lambda_2^{\,n-k}$$

§ Noting the Binomial theorem: $\sum_{k=0}^{n} \binom{n}{k} \lambda_1^{k} \lambda_2^{\,n-k} = (\lambda_1 + \lambda_2)^{n}$

$$P(X + Y = n) = \frac{e^{-(\lambda_1 + \lambda_2)}\, (\lambda_1 + \lambda_2)^{n}}{n!}$$

§ So X + Y = n ~ Poi(λ1 + λ2)