
Lecture 8 Probabilities and distributions

Date post: 08-Feb-2016
Upload: jill
Page 1: Lecture 8 Probabilities  and  distributions

Probability is the ratio of the number of desired events k to the total number of events n:

$$p(k) = \frac{k}{n}$$

The complementary event (not k) has the probability

$$p(\bar{k}) = \frac{n-k}{n} = 1 - \frac{k}{n} = 1 - p(k)$$

so that

$$p(k) + p(\bar{k}) = 1, \qquad 0 \le p \le 1$$

If it is impossible to count k and n, we can apply the stochastic definition of probability: the probability of an event j is approximately the relative frequency of j during n observations.
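The two definitions above can be compared in a minimal Python sketch. The function names and the list of die rolls are made up for illustration; they are not part of the lecture.

```python
from fractions import Fraction


def classical_probability(k, n):
    """Classical definition: desired events k divided by all events n."""
    return Fraction(k, n)


def relative_frequency(observations, event):
    """Stochastic definition: share of the observations in which `event` occurred."""
    hits = sum(1 for outcome in observations if outcome == event)
    return Fraction(hits, len(observations))


# Classical: one desired face (a six) out of six possible faces of a die.
p_six = classical_probability(1, 6)
p_not_six = 1 - p_six  # complementary event

# Stochastic: hypothetical record of 12 observed rolls (illustrative data only).
rolls = [3, 6, 1, 6, 2, 5, 4, 6, 2, 1, 3, 5]
freq_six = relative_frequency(rolls, 6)

print(p_six, p_not_six, freq_six)
```

With so few observations the relative frequency can differ clearly from the classical value; it only approximates the probability as the number of observations grows.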

Page 2: Lecture 8 Probabilities  and  distributions

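What is the probability of winning in Duży Lotek?

The number of desired events is 1. The number of possible events is the number of combinations of 6 numbers out of 49:

$$C^{49}_{6} = \binom{49}{6} = \frac{49!}{6!\,(49-6)!} = 13\,983\,816$$

$$p = \frac{1}{\binom{49}{6}} = \frac{6!\,(49-6)!}{49!} = \frac{1}{13\,983\,816}$$

In general, we need the number of combinations of k events out of a total of N events:

$$C^{N}_{k} = \binom{N}{k} = \frac{N!}{k!\,(N-k)!}$$

Useful identities:

$$0! = 1, \qquad \binom{n}{0} = 1, \qquad \binom{n}{k} = \binom{n}{n-k}$$

Bernoulli distribution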
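As a quick check of the lottery count above, the number of combinations can be computed with Python's standard library. This is only a sketch; the variable names are chosen for illustration.

```python
from math import comb, factorial

# Number of ways to choose 6 numbers out of 49 (order does not matter).
n_combinations = comb(49, 6)

# The same value from the factorial formula N! / (k! (N - k)!).
n_from_factorials = factorial(49) // (factorial(6) * factorial(49 - 6))

# Probability of hitting the single winning combination.
p_jackpot = 1 / n_combinations

print(n_combinations, p_jackpot)
```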

Page 3: Lecture 8 Probabilities  and  distributions

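What is the probability of winning any prize in Duży Lotek, that is, of matching 3, 4, 5, or 6 numbers?

$$p = \frac{1}{\binom{49}{6}} + \frac{1}{\binom{49}{5}} + \frac{1}{\binom{49}{4}} + \frac{1}{\binom{49}{3}} \approx 0.00006$$

Wrong!

Hypergeometric distribution

We need the probability that a sample of n elements, drawn from a universe of N elements of which K have the desired property, contains exactly k elements with the desired property and n - k elements without it.

$$C_{n,k,K} = \frac{\binom{N}{n}}{\binom{K}{k}\binom{N-K}{n-k}}, \qquad p_{n,k,K} = \frac{1}{C_{n,k,K}} = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$$

For the lottery, N = 49, K = 6, n = 6, and k runs from 3 to 6:

$$P = \sum_{k=3}^{6} \frac{\binom{6}{k}\binom{49-6}{6-k}}{\binom{49}{6}} = 0.0186$$

The same calculation as a spreadsheet (KOMBINACJE and SUMA are the Polish Excel names of COMBIN and SUM; the spreadsheet computes the denominator as KOMBINACJE(N;K), which equals the number of combinations of n out of N here because K = n = 6):

| | A | B | C | D | E | F | G | H | I |
|---|---|---|---|---|---|---|---|---|---|
| 1 | N | 49 | | 49 | | 49 | | 49 | |
| 2 | K | 6 | =+KOMBINACJE(B1;B2) | 6 | =+KOMBINACJE(D1;D2) | 6 | =+KOMBINACJE(F1;F2) | 6 | =+KOMBINACJE(H1;H2) |
| 3 | n | 6 | =+KOMBINACJE(B2;B4) | 6 | =+KOMBINACJE(D2;D4) | 6 | =+KOMBINACJE(F2;F4) | 6 | =+KOMBINACJE(H2;H4) |
| 4 | k | 3 | =+KOMBINACJE(B1-B2;B3-B4) | 4 | =+KOMBINACJE(D1-D2;D3-D4) | 5 | =+KOMBINACJE(F1-F2;F3-F4) | 6 | =+KOMBINACJE(H1-H2;H3-H4) |
| 5 | Combinations | | =+C2/(C3*C4) | | =+E2/(E3*E4) | | =+G2/(G3*G4) | | =+I2/(I3*I4) |
| 6 | Probability | | =1/C5 | | =1/E5 | | =1/G5 | | =1/I5 |
| 7 | Sum | =+SUMA(C6:I6) | | | | | | | |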
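The hypergeometric calculation can be reproduced in a few lines of Python. This is a sketch under the assumption that N is the size of the universe, K the number of desired elements in it, n the sample size, and k the number of desired elements in the sample; the function name is illustrative.

```python
from math import comb


def hypergeometric_pmf(k, N, K, n):
    """Probability of exactly k desired elements in a sample of n elements
    drawn without replacement from N elements, K of which are desired."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


# Lottery: 49 numbers, 6 winning numbers, 6 numbers on the ticket.
N, K, n = 49, 6, 6

# Probability of exactly 3, 4, 5, and 6 correct numbers.
p_by_matches = {k: hypergeometric_pmf(k, N, K, n) for k in range(3, 7)}

# Probability of at least 3 correct numbers.
p_any_prize = sum(p_by_matches.values())

print(round(p_any_prize, 4))  # 0.0186
```

Most of this probability comes from the case of exactly three correct numbers; the jackpot term is the value 1/13983816 from the previous slide.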

Page 4: Lecture 8 Probabilities  and  distributions

Assessing the number of infected persons
Assessing total population size

Capture-recapture methods

We take a sample of animals or plants and mark them. We then take a second sample and count the number of marked individuals.

The frequency of marked animals in the resample should equal the frequency of marked animals within the total population:

$$\frac{m_{resample}}{n_{resample}} = \frac{m_{total}}{N_{total}} \quad\Rightarrow\quad N_{total} = \frac{m_{total}\, n_{resample}}{m_{resample}}$$

Assumptions:
• Closed population
• Random catches
• Random dispersal
• Marked animals do not differ in behaviour

Example:

$$N = \frac{7 \cdot 6}{1} = 42, \qquad N_{real} = 38$$

Page 5: Lecture 8 Probabilities  and  distributions

The two sample case

How many persons have a certain infectious disease?

You take two samples and count the number of infected persons in the first sample (m1), in the second sample (m2), and the number of infected persons noted in both samples (k).

$$\frac{m_1}{N} = \frac{k}{m_2} \quad\Rightarrow\quad N = \frac{m_1 m_2}{k}$$

Example:

$$N = \frac{4 \cdot 3}{1} = 12$$
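Both the capture-recapture estimate and the two sample case use the same proportion, so one small function covers them. The sketch below is illustrative; the function and parameter names are assumptions, and the numbers are the ones from the two slide examples.

```python
def estimate_total(first_count, second_count, common_count):
    """Estimate the total size N from two samples.

    first_count:  individuals marked (or infected persons found) in the first sample
    second_count: size of the second sample
    common_count: individuals found in both samples
    """
    if common_count <= 0:
        # Without any recaptured or jointly recorded individual
        # the proportion is undefined.
        raise ValueError("at least one individual must occur in both samples")
    return first_count * second_count / common_count


n_animals = estimate_total(7, 6, 1)  # capture-recapture example
n_infected = estimate_total(4, 3, 1)  # two sample example

print(n_animals, n_infected)
```

The first example also shows the limits of the method: the estimate is 42 although the real population size given on the slide is 38.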

Page 6: Lecture 8 Probabilities  and  distributions

In ecology we often need to compare the species composition of two habitats. Habitat A contains m species, habitat B contains l species, and k species occur in both. The species overlap is measured by the Soerensen index:

$$S = \frac{2k}{m + l}$$

From this value alone we do not know whether S is large or small.

To assess the expectation we construct a null model. Both habitats contain species from a common species pool of n species. If the pool size n is known, we can estimate how many joint species k two random samples of sizes m and l drawn from the n species contain.

The expected number of joint species (mathematical expectation):

$$\frac{k}{l} = \frac{m}{n} \quad\Rightarrow\quad E(k) = \frac{m\,l}{n}$$

The probability of getting exactly k joint species (probability distribution):

$$p_k = \frac{\binom{n}{k}\binom{n-k}{m-k}\binom{n-m}{l-k}}{\binom{n}{m}\binom{n}{l}}$$
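The null model above can be evaluated numerically. The following Python sketch uses a small made-up pool (n = 10, m = 4, l = 3) purely for illustration; the function names are assumptions and not taken from the lecture.

```python
from math import comb


def soerensen(k, m, l):
    """Soerensen index for k joint species in habitats with m and l species."""
    return 2 * k / (m + l)


def expected_joint_species(n, m, l):
    """Expected number of joint species of two random samples from a pool of n."""
    return m * l / n


def joint_species_probability(k, n, m, l):
    """Probability that two random samples of m and l species from a pool
    of n species share exactly k species."""
    ways = comb(n, k) * comb(n - k, m - k) * comb(n - m, l - k)
    return ways / (comb(n, m) * comb(n, l))


n, m, l = 10, 4, 3  # hypothetical pool and sample sizes
probs = [joint_species_probability(k, n, m, l) for k in range(min(m, l) + 1)]
mean_k = sum(k * p for k, p in enumerate(probs))

print(probs, mean_k, expected_joint_species(n, m, l))
```

The probabilities add up to one, and their mean agrees with the expectation m·l/n, which is a useful consistency check before comparing an observed k with the null model.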

Page 7: Lecture 8 Probabilities  and  distributions

[Figure, panels A to D: probability (0 to 0.3) against the number of species in common (0 to 15).]

Ground beetle species of two poplar plantations and two adjacent wheat fields near Torun (Ulrich et al. 2004, Annales Zool. Fenn.)

Pool size 90 to 110 species.

There are many more species in common than expected just by chance. The ecological interpretation is that ground beetles colonize fields and adjacent seminatural habitats in a similar manner. Ground beetles do not colonize according to ecological requirements (niches) but according to spatial neighborhood.

$$p_k = \frac{\binom{n}{k}\binom{n-k}{m-k}\binom{n-m}{l-k}}{\binom{n}{m}\binom{n}{l}}$$

Page 8: Lecture 8 Probabilities  and  distributions

First steps in statistics

• Experimental or observational study
• Theory
• Envisioned method of analysis
• Motivation
• Data analysis

What is interesting? Why is it interesting? Cui bono?

Page 9: Lecture 8 Probabilities  and  distributions

How to perform a biological study

• Literature: defining the problem, identifying the state of the art
• Theory: formulating specific hypotheses to be tested
• Planning: study design, power analysis, choosing the analytical methods, design of the database
• Data: observations, experiments, meta-analysis
• Analysis: statistical analysis, modelling
• Interpretation: comparing with current theory
• Publication: scientific writing, expertise

Page 10: Lecture 8 Probabilities  and  distributions

Preparing the experimental or data collecting phase

Let’s look a bit closer at data collecting. Before you start collecting any data you need a clear vision of what you want to do with these data. Hence you have to answer some important questions:

• For what purpose do I collect data?
• Did I read the relevant literature?
• Have similar data already been collected by others?
• Is the experimental or observational design appropriate for the statistical tests I want to apply?
• Are the data representative?
• How many data points do I need for the statistical tests I want to apply?
• Does the data structure fit the hypothesis I want to test?
• Can I compare my data and results with other work?
• How large are the measurement errors? Do these errors prevent clear final results?
• How large may the errors be for the data to remain meaningful?

Page 11: Lecture 8 Probabilities  and  distributions

How to lie with statistics

[Figure: chart of poll results: PO 33%, PIS 19%, LiD 10%, Samoobrona 10%, Unknown 28%.]

Page 12: Lecture 8 Probabilities  and  distributions

Representative sampling

[Figure: four diagrams, each showing samples marked S and an area marked P:]
• Single sample, not representative
• Single sample, too small
• Multiple samples, not representative
• Multiple representative samples

Page 13: Lecture 8 Probabilities  and  distributions

[Figure: four histograms of the number of species against body length class [mm] (classes 0.2, 0.8, 3.2, 12.8, 51.2), two with a logarithmic y-axis (1 to 10000) and two with a linear y-axis (0 to 4500).]

Page 14: Lecture 8 Probabilities  and  distributions

[Figure: four histograms of events against classes, drawn with different numbers of classes, from 3 to more than 20.]

Page 15: Lecture 8 Probabilities  and  distributions

[Figure: four log-log plots of mean density against body weight [mg], two using body weight and two using body weight classes.]

Page 16: Lecture 8 Probabilities  and  distributions

[Figure: birthrate (0 to 5) against numbers of storks (0 to 120), titled "Number of stork nests and birthrates in Switzerland".]

[Figure: percentage of Catholics (0% to 100%) against mean body height (1.6 to 1.8).]

Page 17: Lecture 8 Probabilities  and  distributions

Worse / Better

[Figure: pairs of chart designs for the same data, labelled "Worse" and "Better": a variable plotted against categories A to E, and variable 2 plotted against variable 1 on linear and logarithmic axes. One panel is titled "Influence of variable 1 on variable 2" and annotated with y = f(x) and R2 = n.s.]

Page 18: Lecture 8 Probabilities  and  distributions

Scientific publications of any type are classically divided into 7 major parts

• Title, affiliations and abstract: In this part you give a short and meaningful title that may already contain an essential result. The abstract is a short text containing the major hypotheses and results. The abstract should make clear why the study has been undertaken.
• Introduction: The introduction should briefly discuss the state of the art and the theories the study is based on, describe the motivation for the present study, and explain the hypotheses to be tested. Do not review the literature extensively, but discuss all of the relevant literature necessary to put the present paper in a broader context. Explain who might be interested in the study and why this study is worth reading!
• Materials and methods: A short description of the study area (if necessary), the experimental or observational techniques used for data collection, and the techniques of data analysis used. Indicate the limits of the techniques used.
• Results: This section should contain a description of the results of your study. Here the majority of tables and figures should be placed. Do not duplicate data in tables and figures.
• Discussion: This part should be the longest part of the paper. Discuss your results in the light of current theories and scientific belief. Compare the results with the results of other comparable studies. Again discuss why your study has been undertaken and what is new. Also discuss possible problems with your data and misconceptions. Give hints for further work.
• Acknowledgments: Short acknowledgments, mentioning people who contributed material but did not figure as co-authors, and mentioning funding institutions.
• Literature

Page 19: Lecture 8 Probabilities  and  distributions

The source data base

The columns Min to Kurtosis carry the slide's group headers "ln Body weight" and "Body weight distribution".

| Country | Island/Mainland | Area [km2] | DeltaT [°C] | Lat | Long | Days below zero | Min | Max | Mean | Variance | Skewness | Kurtosis | Species | Sources |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Albania | m | 28748 | 17 | 41.33 | 19.92 | 34 | -4.28959 | 2.60059 | -1.31798 | 1.87086 | 0.0616831 | -0.210158 | 132 | Thibaud, 1992; Thibaud & Peja, 1994, 1996; Kontschan et al., 2003; Traser & Kontschan, 2004; Deharveng, 2007 |
| Andorra | m | 468 | 14.7 | 42.5 | 1.5 | 60 | -0.867014 | 1.58438 | -0.0465939 | 1.22393 | 1.79255 | 3.39576 | 4 | Deharveng, 2007 |
| Austria | m | 83871 | 20 | 48.12 | 14.57 | 92 | -4.84426 | 2.60059 | -1.26057 | 1.61122 | -0.0019179 | -0.0696599 | 486 | Pomorski, 2006; Querner, 2008 |
| Azores | i | 2200 | 7 | 37.73 | -28.01 | 1 | -4.13091 | 1.93892 | -1.15658 | 1.62273 | -0.045433 | 0.101475 | 94 | Gama, 2005a,b |
| Balearic Islands | i | 5014 | 15 | 39.55 | 2.65 | 18 | -3.98247 | 0.651808 | -1.74797 | 1.19506 | -0.0467805 | -0.179688 | 42 | Jordana et al., 2005; Deharveng, 2007; Palacios-Vargas & Simón Benito, 2009 |
| Belarus | m | 207650 | 23 | 53.87 | 28 | 144 | -4.13091 | 1.82671 | -1.02222 | 1.98246 | 0.0370745 | -0.386657 | 48 | Kuznetsova, 2002; Deharveng, 2007 |
| Belgium | m | 30528 | 15 | 50.9 | 4 | 50 | -4.64414 | 1.93892 | -1.14785 | 1.77014 | -0.0967065 | -0.216933 | 209 | Janssens, 2008 |
| Bosnia and Herzegovina | m | 51197 | 20 | 43.82 | 18 | 114 | -4.84426 | 2.60059 | -1.03804 | 1.38211 | 0.327301 | 1.01551 | 145 | Bogojević, 1968; Deharveng, 2007; Lučić et al., 2007a; Curcic et al., 2007 |
| Bulgaria | m | 110971 | 21 | 42.65 | 25 | 102 | -4.84426 | 1.93892 | -1.0666 | 1.74795 | -0.143106 | -0.0350317 | 209 | Rusek, 1965; Tsonev & Kazandjicva, 1991; Thibaud, 1995b; Pomorski & Skarzynski, 1999; Smolis et al., 2004; Pomorski, 2006; Deharveng, 2007 |
| Canary Islands | i | 7270 | 5 | 27.93 | -15.4 | 1 | -6.95173 | 0.651808 | -2.06754 | 1.85064 | -0.0919176 | 0.779021 | 115 | Gama, 2005b; Deharveng, 2007 |
| Corsica | i | 8680 | 13 | 41.92 | 8.73 | 11 | -3.87025 | 1.82671 | -1.1579 | 1.17258 | 0.315027 | 0.669108 | 60 | Deharveng, 2007 |
| Crete | i | 8259 | 13 | 35.33 | 24.83 | 1 | -3.76326 | 1.58438 | -1.39088 | 1.74242 | 0.192635 | -0.766043 | 108 | Ellis, 1976; Schultz & Lymberakis, 2006 |
| Croatia | m | 56594 | 21 | 45.82 | 15.5 | 114 | -3.98247 | 2.60059 | -0.85965 | 1.71272 | 0.321948 | -0.213202 | 141 | Bogojević, 1968; Ozimec, 2002; Deharveng, 2007 |
| Czech Republic | m | 78866 | 19 | 50.1 | 15.5 | 119 | -4.92946 | 2.60059 | -1.43186 | 1.78157 | 0.0042882 | -0.0565949 | 361 | Rusek, 1977, 1979, 1996, 2001, 2003, 2004; Rusek & Rusek, 1999; Rusek & Subrt, 1999; Jilova & Rusek, 2005; Deharveng, 2007 |
| Denmark | m | 43093 | 16 | 55.63 | 12.57 | 85 | -4.84426 | 1.93892 | -1.47086 | 2.00751 | -0.062116 | -0.483545 | 222 | Fjellberg, 2007a |
| Dodecanese Is. | i | 2663 | 14 | 36.4 | 23.73 | 2 | -3.46924 | 1.58438 | -1.16453 | 1.22197 | 0.235818 | 0.207184 | 36 | Deharveng, 2007 |
| Estonia | m | 45227 | 21 | 59.35 | 26 | 143 | -3.87025 | 1.82671 | -1.19182 | 1.52952 | 0.43837 | 0.424804 | 40 | Kanal, 2004; Deharveng, 2007 |
| Faroe Is. | i | 1399 | 7 | 62 | -7 | 30 | -4.64414 | 1.93892 | -1.34386 | 1.79648 | -0.164351 | -0.0688218 | 85 | Fjellberg, 2007a |
| Finland | m | 338145 | 23 | 60.32 | 25 | 169 | -4.64414 | 1.93892 | -1.39185 | 1.7577 | 0.0182062 | -0.334913 | 225 | Fjellberg, 2007a |
| France | m | 543965 | 15 | 48.73 | 2.3 | 50 | -5.06348 | 1.93892 | -1.43424 | 1.46779 | 0.014345 | 0.0427404 | 641 | Liste des Collemboles français, available at: http://www.insecte.org/forum/viewtopic.php?p=405014#p405014 |
| Franz Josef Land | i | 16134 | 27 | 79.85 | 57.42 | 310 | -2.8658 | -0.280761 | -1.00011 | 0.50044 | -1.70537 | 3.45228 | 15 | Babenko & Fjellberg, 2006 |
| Germany | m | 357021 | 19 | 52.38 | 13.42 | 97 | -5.06348 | 2.60059 | -1.422 | 1.69377 | -0.0851031 | -0.0590463 | 420 | Pallisa, 2000; Deharveng, 2007 |
| Greece | m | 131992 | 17 | 37.9 | 23.73 | 2 | -3.98247 | 1.58438 | -1.06325 | 1.45253 | -0.120441 | -0.209457 | 103 | Schultz & Lymberakis, 2006; Pomorski, 2006; Deharveng, 2007; Ramel et al., 2008 |
| Hungary | m | 93054 | 22 | 47.43 | 20 | 100 | -4.84426 | 2.60059 | -1.29947 | 1.74949 | 0.0120114 | -0.205994 | 408 | Traser & Dányi, 2008; Dányi & Traser, 2008 |

• Each row holds a single data record.
• Columns contain variables.
• Variables can be of text or metric type.

• Never use the original data base for calculations. Use only a replicate.
• Take care of empty cells.
• In calculated cells take care of impossible values.
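The rules for handling the source data base can be sketched in Python. The records below copy a few fields of the first two table rows and add one deliberately broken, made-up row; the field names, the plausibility limits, and the function name are assumptions for illustration only.

```python
import copy

# "Original" data base; the third record is hypothetical and deliberately broken.
original = [
    {"Country": "Albania", "Area_km2": 28748, "Lat": 41.33, "Species": 132},
    {"Country": "Andorra", "Area_km2": 468, "Lat": 42.5, "Species": 4},
    {"Country": "Hypothetical", "Area_km2": None, "Lat": 141.0, "Species": -3},
]

# Never calculate on the original: work on an independent replicate.
working = copy.deepcopy(original)


def find_problems(records):
    """Return (row number, field, problem) for empty cells and impossible values."""
    problems = []
    for row_no, record in enumerate(records, start=1):
        for field, value in record.items():
            if value is None or value == "":
                problems.append((row_no, field, "empty cell"))
        lat = record.get("Lat")
        if lat is not None and not -90 <= lat <= 90:
            problems.append((row_no, "Lat", "impossible value"))
        species = record.get("Species")
        if species is not None and species < 0:
            problems.append((row_no, "Species", "impossible value"))
    return problems


problems = find_problems(working)

# Changes to the replicate leave the original untouched.
working[0]["Species"] = 0

print(problems)
```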

Page 20: Lecture 8 Probabilities  and  distributions

http://folk.uio.no/ohammer/past/

Page 21: Lecture 8 Probabilities  and  distributions

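| No | Raw data | Classes | Class means | Counter | Number of occasions | Frequencies | Cumulative frequencies |
|---|---|---|---|---|---|---|---|
| 1 | 0.154497 | 0-0.1 | 0.05 | 20 | 20 | 0.1 | 0.1 |
| 2 | 0.919498 | 0.1-0.2 | 0.15 | 48 | 28 | 0.14 | 0.24 |
| 3 | 0.517978 | 0.2-0.3 | 0.25 | 83 | 35 | 0.175 | 0.415 |
| 4 | 0.742013 | 0.3-0.4 | 0.35 | 107 | 24 | 0.12 | 0.535 |
| 5 | 0.295932 | 0.4-0.5 | 0.45 | 127 | 20 | 0.1 | 0.635 |
| 6 | 0.819647 | 0.5-0.6 | 0.55 | 149 | 22 | 0.11 | 0.745 |
| 7 | 0.693982 | 0.6-0.7 | 0.65 | 172 | 23 | 0.115 | 0.86 |
| 8 | 0.194982 | 0.7-0.8 | 0.75 | 185 | 13 | 0.065 | 0.925 |
| 9 | 0.276991 | 0.8-0.9 | 0.85 | 198 | 13 | 0.065 | 0.99 |
| 10 | 0.054868 | 0.9-1 | 0.95 | 200 | 2 | 0.01 | 1 |
| 11 | 0.386411 | | | | | | |
| 12 | 0.00286 | | +D10+0.1 | +LICZ.JEŻELI(B$2:B$201;"<1") | =E11-E12 | =F11/E$11 | =G11+H12 |
| 13 | 0.129657 | | | | | | |

Row 12 shows the spreadsheet formulas behind the calculated columns (LICZ.JEŻELI is the Polish Excel name of COUNTIF).

[Figure: number of occasions N (0 to 40) against X (0 to 1).]

[Figure: frequency f(X) (0 to 0.2) against X (0 to 1).]

Frequency distribution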

Page 22: Lecture 8 Probabilities  and  distributions

[Figure: frequency f(X) (0 to 0.2) against X (0 to 1), titled "Frequency distribution", shown next to the same data table as on the previous slide.]

[Figure: cumulative frequency F(X) (0 to 1) against X (0 to 1).]

Cumulative frequency distribution
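The steps of the spreadsheet (counter, number of occasions, frequencies, cumulative frequencies) can be sketched in Python. The sketch assumes classes of width 0.1 and uses only the 13 raw values visible on the slide, not the full set of 200 values, so its numbers differ from those in the table.

```python
from itertools import accumulate

# The 13 raw values visible on the slide (the full data set has 200 values).
raw = [0.154497, 0.919498, 0.517978, 0.742013, 0.295932, 0.819647, 0.693982,
       0.194982, 0.276991, 0.054868, 0.386411, 0.00286, 0.129657]

n_classes = 10
# Upper class bounds 0.1, 0.2, ..., 1.0 (rounded to avoid float drift).
upper_bounds = [round((i + 1) * 0.1, 10) for i in range(n_classes)]

# Counter: number of values below each upper class bound (like COUNTIF "<bound").
counter = [sum(1 for x in raw if x < bound) for bound in upper_bounds]

# Number of occasions per class: difference between successive counters.
occasions = [counter[0]] + [counter[i] - counter[i - 1] for i in range(1, n_classes)]

# Frequencies and cumulative frequencies.
frequencies = [count / len(raw) for count in occasions]
cumulative = list(accumulate(frequencies))

for bound, count, f, F in zip(upper_bounds, occasions, frequencies, cumulative):
    print(f"< {bound:.1f}: {count:2d}  f = {f:.3f}  F = {F:.3f}")
```

Plotting `frequencies` against the class means gives the frequency distribution, and plotting `cumulative` gives the cumulative frequency distribution, which always ends at 1.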

Page 23: Lecture 8 Probabilities  and  distributions

Discrete and continuous distributions

Statistical or probability distributions add up to one.

Continuous distribution: probability density function (pdf)

$$F(x_{max}) = \int_{x_{min}}^{x_{max}} f(x)\,dx = 1$$

Discrete distribution: probability generating function (pgf)

$$F(x_{max}) = \sum_{i=1}^{x_{max}} f(x_i) = 1$$

[Figure: two plots of f(X) (0 to 0.2) against X (0 to 1), one for a discrete distribution and one for a continuous distribution.]
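That distributions add up to one can be checked numerically. The sketch below sums the class frequencies from the earlier frequency table for the discrete case; for the continuous case it integrates the standard normal density, which is chosen here only as an illustrative example and is not taken from the slide.

```python
from math import exp, pi, sqrt

# Discrete case: class frequencies from the frequency table of the lecture.
frequencies = [0.1, 0.14, 0.175, 0.12, 0.1, 0.11, 0.115, 0.065, 0.065, 0.01]
discrete_total = sum(frequencies)


def normal_pdf(x):
    """Density of the standard normal distribution (illustrative choice)."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)


def integrate_midpoint(f, x_min, x_max, steps=10_000):
    """Approximate the integral of f from x_min to x_max with the midpoint rule."""
    width = (x_max - x_min) / steps
    return sum(f(x_min + (i + 0.5) * width) for i in range(steps)) * width


# Continuous case: the density integrates to (almost exactly) one
# when the limits cover practically the whole range of x.
continuous_total = integrate_midpoint(normal_pdf, -8.0, 8.0)

print(discrete_total, continuous_total)
```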

Page 24: Lecture 8 Probabilities  and  distributions

Shapes of frequency distributions

[Figure: six plots of f(xi) against x (0 to 15), showing the following shapes:]
• symmetric
• left skewed
• right skewed
• bimodal
• decreasing
• U-shaped

Page 25: Lecture 8 Probabilities  and  distributions

Many statistical methods rely on a comparison of observed frequency distributions with theoretical distributions.

Deviations of the observed frequencies from theory (from expectation), the so-called residuals Δf(x), are the basis for measuring statistical significance.

[Figure: observed f(X) (0 to 0.3) against X (0 to 1) together with a theoretical distribution; the deviations between the two are marked Δf(x).]

If the Δf(x) are too large we accept the hypothesis that our observations differ from the theoretical expectation.

The problem in statistical inference is to find the appropriate theoretical distribution that can be applied to our data.
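The residuals can be computed directly once a theoretical distribution has been chosen. The sketch below compares the class frequencies from the earlier frequency table with a uniform expectation of 0.1 per class; the uniform distribution is an assumption made here for illustration, and the sketch stops short of a formal significance test.

```python
# Observed class frequencies (from the frequency table of the lecture).
observed = [0.1, 0.14, 0.175, 0.12, 0.1, 0.11, 0.115, 0.065, 0.065, 0.01]

# Assumed theoretical distribution: uniform, i.e. 0.1 in each of the 10 classes.
expected = [1 / len(observed)] * len(observed)

# Residuals: deviation of observation from expectation in each class.
residuals = [o - e for o, e in zip(observed, expected)]

# One simple summary of how large the deviations are overall.
sum_of_squares = sum(r * r for r in residuals)

# Class with the largest absolute deviation (index 0 is the class 0-0.1).
worst_class = max(range(len(residuals)), key=lambda i: abs(residuals[i]))

print([round(r, 3) for r in residuals], round(sum_of_squares, 5), worst_class)
```

Whether such residuals are "too large" depends on the sample size and on the test statistic used, which is why the choice of the appropriate theoretical distribution matters.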

Page 26: Lecture 8 Probabilities  and  distributions

Homework and literature

Refresh:

• Arithmetic, geometric, harmonic mean
• Variance, standard deviation, standard error
• Central moments
• Third and fourth central moment
• Mean and variance of power and exponential function statistical distributions
• Pseudocorrelation
• Sample bias
• Coefficient of variation
• Representative sample

Prepare for the next lecture:

• Bernoulli distribution
• Pascal distribution
• Hypergeometric distribution
• Linear random number

Literature:

Mathe-online
Łomnicki: Statystyka dla biologów.