Measures and indexes of variability - Stanford...

Measures and indexes of variability

Stats48n

Measures of spread/dispersion/variability

- A measure of center needs to be complemented by a measure of spread around this center.
- The definitions of averages that we explored naturally lend themselves to measures of variability.
- Variance: average squared distance from the mean

$$V(x_1, \dots, x_n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

- Standard deviation:

$$\sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$

- Note that R actually divides by $n - 1$ rather than $n$. This is because, when $x_1, \dots, x_n$ are a sample from a larger population of possible values, dividing by $n - 1$ gives a “better” (unbiased) estimator of the population variance.
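As a small illustration of the two divisors, the sketch below computes the variance and standard deviation with both $n$ and $n - 1$. It is written in Python purely for illustration (the slides refer to R); the data are the income values from the plotting example in these slides, and all variable names are assumptions of this sketch.

```python
import math
import statistics

# Illustrative data: the income values used in the plotting example of these slides.
x = [1, 2, 3, 10, 15, 15, 30, 50]

n = len(x)
mean = sum(x) / n
sum_sq_dev = sum((xi - mean) ** 2 for xi in x)

var_n = sum_sq_dev / n                # divide by n, as in the definition above
var_n_minus_1 = sum_sq_dev / (n - 1)  # divide by n - 1, as R's var() does

sd_n = math.sqrt(var_n)
sd_n_minus_1 = math.sqrt(var_n_minus_1)

# Python's standard library offers both conventions.
assert math.isclose(var_n, statistics.pvariance(x))
assert math.isclose(var_n_minus_1, statistics.variance(x))

print(var_n, var_n_minus_1)
print(sd_n, sd_n_minus_1)
```

The two versions differ only by the factor $n / (n - 1)$, so the gap shrinks as $n$ grows.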

A note: data with frequencies

- Often data is summarized so that we have counts of occurrences of the same values: we have a set $v_1, \dots, v_m$ of possible values, with their frequencies $f_i$

$$\begin{array}{cccc} v_1 & v_2 & \cdots & v_m \\ f_1 & f_2 & \cdots & f_m \end{array}$$

- Calculating averages and standard deviations has to adapt to this different set-up

$$\bar{v} = \frac{1}{\sum_{i=1}^{m} f_i} \sum_{i=1}^{m} v_i f_i$$

$$\text{Variance} = \frac{1}{\sum_{i=1}^{m} f_i} \sum_{i=1}^{m} (v_i - \bar{v})^2 f_i$$
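The frequency-weighted formulas can be checked against the raw-data formulas with a short sketch. The values and frequencies below are an assumed toy example (the slides' income values with the repeated 15 collapsed into one entry), and the variable names are hypothetical.

```python
import math

# Assumed toy example: distinct values and how often each occurs.
values = [1, 2, 3, 10, 15, 30, 50]
freqs = [1, 1, 1, 1, 2, 1, 1]

total = sum(freqs)

# Frequency-weighted mean and variance (dividing by the total count).
mean_v = sum(v * f for v, f in zip(values, freqs)) / total
var_v = sum((v - mean_v) ** 2 * f for v, f in zip(values, freqs)) / total

# Expand back to one entry per observation and use the raw-data formulas.
expanded = [v for v, f in zip(values, freqs) for _ in range(f)]
mean_x = sum(expanded) / len(expanded)
var_x = sum((x - mean_x) ** 2 for x in expanded) / len(expanded)

print(mean_v, var_v)
assert math.isclose(mean_v, mean_x)
assert math.isclose(var_v, var_x)
```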

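A note: the maximal variance of $x_1, \dots, x_n$

- Generally speaking, the variance of a dataset can be arbitrarily large
- Let’s consider some restrictions that make the question of a maximal variance meaningful
  - $x_i \ge 0 \ \forall i$
  - fix the total sum of values $\sum_{i=1}^{n} x_i = n\bar{x}$

$$\begin{aligned}
\sum_{i=1}^{n} (x_i - \bar{x})^2 &= \sum_{i=1}^{n} (x_i^2 + \bar{x}^2 - 2 x_i \bar{x}) \\
&= \sum_{i=1}^{n} x_i^2 + \sum_{i=1}^{n} \bar{x}^2 - 2\bar{x} \sum_{i=1}^{n} x_i \\
&= \sum_{i=1}^{n} x_i^2 + n\bar{x}^2 - 2\bar{x}(n\bar{x}) \\
&= \sum_{i=1}^{n} x_i^2 - n\bar{x}^2
\end{aligned}$$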

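A note: the maximal variance of $x_1, \dots, x_n$

So, $V(x_1, \dots, x_n) = \left(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\right)/n$. Now, since all $x_i \ge 0$, the cross terms in $\left(\sum_{i=1}^{n} x_i\right)^2$ are nonnegative, so

$$\sum_{i=1}^{n} x_i^2 - n\bar{x}^2 \le \left(\sum_{i=1}^{n} x_i\right)^2 - n\bar{x}^2 = n^2\bar{x}^2 - n\bar{x}^2 = \bar{x}^2 n(n - 1)$$

Which means that

$$V(x_1, \dots, x_n) \le \bar{x}^2 (n - 1)$$

- Can we imagine a set of values of $x_1, \dots, x_n$ for which the variance is actually equal to this max?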
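One way to explore the question is numerically. The sketch below compares the bound $\bar{x}^2(n - 1)$ with the variance of two datasets that share the same total: the slides' income values, and a candidate in which a single observation holds the whole total. The candidate and the helper name `variance` are assumptions of this sketch, not part of the slides.

```python
import math

def variance(xs):
    """Variance with divisor n, as defined in these slides."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

spread_out = [1, 2, 3, 10, 15, 15, 30, 50]
n = len(spread_out)
total = sum(spread_out)
mean = total / n

# Upper bound derived above: mean^2 * (n - 1).
bound = mean ** 2 * (n - 1)

# Candidate: all values zero except one, which carries the entire total.
concentrated = [0] * (n - 1) + [total]

print(variance(spread_out), variance(concentrated), bound)
assert variance(spread_out) <= bound
assert math.isclose(variance(concentrated), bound)
```

For this candidate the cross terms in the squared sum are all zero, which is exactly when the inequality above becomes an equality.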

Index of concentration

The opposite of spread-out is “concentrated.”

Let’s consider variables like the one we just talked about, that is, with only nonnegative values. One such variable might be the income of households in a nation.

It is interesting to study how “concentrated” or not such income is. One can imagine that the total income of a nation is the total amount of a resource that one could distribute.

Income inequality in the media

Income inequality in politics

How can we measure “income inequality”?

- Let’s say we have a population of $n$ individuals, with incomes $x_1, \dots, x_n$.
- $n\bar{x}$ is the total income in the population (with $\bar{x} = \sum_{i=1}^{n} x_i / n$)
- What would be the values of $x_1, \dots, x_n$ in the case of maximal “income equality”?
- What would be the values of $x_1, \dots, x_n$ in the case of maximal “income inequality”?
- How are we going to judge cases in the middle?
  - Any known measure?
  - Any measure we can come up with given what we already know?

A graphical display for income distribution

We take the values $x_1, \dots, x_n$ and order them

$$x_{(1)} \le x_{(2)} \le \cdots \le x_{(n)}$$

For simplicity, we are going to drop the parentheses from the index notation, and just remember that $x_1$ is the smallest value. We now calculate two quantities:

$$F_i = \frac{i}{n} \qquad Q_i = \frac{\sum_{j=1}^{i} x_j}{\sum_{j=1}^{n} x_j}$$

- $F_1$ is the fraction of the population that corresponds to the bottom earner; $F_2$ is the fraction of the population that corresponds to the two bottom earners, etc.
- $Q_1$ is the fraction of the national income earned by the bottom earner; $Q_2$ is the fraction of the national income earned by the two bottom earners, etc.
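A minimal sketch of these two quantities, using the income values from the slides' plotting example. The function name `cumulative_shares` is hypothetical, and the sketch assumes the total income is strictly positive.

```python
from itertools import accumulate

def cumulative_shares(incomes):
    """Return (F, Q): cumulative population and income fractions.

    Incomes are sorted first, so index i refers to the i lowest earners.
    Assumes nonnegative incomes with a strictly positive total.
    """
    xs = sorted(incomes)
    n = len(xs)
    cumulative = list(accumulate(xs))   # running sums x_1 + ... + x_i
    total = cumulative[-1]
    F = [i / n for i in range(1, n + 1)]
    Q = [c / total for c in cumulative]
    return F, Q

F, Q = cumulative_shares([1, 2, 3, 10, 15, 15, 30, 50])
for f, q in zip(F, Q):
    print(f"{f:.3f}  {q:.3f}")
```

Plotting `Q` against `F` (together with the starting point at the origin) gives the kind of curve shown in the slides.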

A graphical display for income distribution

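- Let’s think about the relation between $F_i$ and $Q_i$ in the case of perfect income equality
- In general, $Q_i \le F_i$. To see this, let’s look at their definitions, multiply both sides by $\sum_{j=1}^{n} x_j$ and divide by $i$:

$$Q_i \le F_i \iff \frac{\sum_{j=1}^{i} x_j}{\sum_{j=1}^{n} x_j} \le \frac{i}{n} \iff \frac{\sum_{j=1}^{i} x_j}{i} \le \frac{\sum_{j=1}^{n} x_j}{n}$$

and remember that the $x_i$ are increasing, so the average of the $i$ smallest values cannot exceed the overall average.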
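The inequality can also be spot-checked on simulated data. The sketch below draws assumed random incomes (the uniform distribution and the seed are arbitrary choices of this sketch) and confirms $Q_i \le F_i$ at every index, up to floating-point tolerance.

```python
import random
from itertools import accumulate

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical incomes: 50 draws between 0 and 100, sorted increasingly.
xs = sorted(rng.uniform(0, 100) for _ in range(50))
n = len(xs)

cumulative = list(accumulate(xs))
total = cumulative[-1]

F = [i / n for i in range(1, n + 1)]
Q = [c / total for c in cumulative]

# Q_i <= F_i for every i, allowing for rounding error.
holds = all(q <= f + 1e-12 for f, q in zip(F, Q))
print(holds)
```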

A graphical display for income distribution

Income values = 1, 2, 3, 10, 15, 15, 30, 50

[Figure: Q (vertical axis) plotted against F (horizontal axis) for these income values; both axes run from 0 to 1.]

A graphical display for income distribution

[Figure, "Perfect equality": Q (vertical axis) plotted against F (horizontal axis); both axes run from 0 to 1.]

[Figure, "Maximal inequality": Q (vertical axis) plotted against F (horizontal axis); both axes run from 0 to 1.]

How could we use this to construct an Index?

An idea for the index

[Figure: Q (vertical axis) plotted against F (horizontal axis); both axes run from 0 to 1.]

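From area to index

- Index varies between 0 and 1
- Area in between the curves (the perfect-equality diagonal and the bottom curve): $A = 1/2 -$ area under bottom curve
- Area under bottom curve: sum of areas of trapezoids. Thus, with the convention $F_0 = Q_0 = 0$,

$$A = \frac{1}{2} - \sum_{i=1}^{n} \frac{(F_i - F_{i-1})(Q_i + Q_{i-1})}{2}$$

- Gini’s index:

$$G = \frac{A}{1/2} = 1 - \sum_{i=1}^{n} (F_i - F_{i-1})(Q_i + Q_{i-1})$$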
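The trapezoid formula translates directly into code. The following is a minimal sketch, not a reference implementation: the function name `gini` is hypothetical, it uses the convention $F_0 = Q_0 = 0$, and it assumes nonnegative incomes with a strictly positive total.

```python
from itertools import accumulate

def gini(incomes):
    """Gini index via the trapezoid formula, with F_0 = Q_0 = 0.

    Assumes nonnegative incomes whose total is strictly positive.
    """
    xs = sorted(incomes)
    n = len(xs)
    cumulative = list(accumulate(xs))
    total = cumulative[-1]
    F = [0.0] + [i / n for i in range(1, n + 1)]
    Q = [0.0] + [c / total for c in cumulative]
    return 1.0 - sum(
        (F[i] - F[i - 1]) * (Q[i] + Q[i - 1]) for i in range(1, n + 1)
    )

print(gini([1, 2, 3, 10, 15, 15, 30, 50]))  # the slides' example values
print(gini([5] * 8))                        # everyone earns the same
print(gini([0] * 7 + [126]))                # one person earns everything
```

With a finite population the last case gives $1 - 1/n$ rather than exactly 1, because the single top earner still makes up a fraction $1/n$ of the population.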

How do things change if we have repetition?

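- Data in the form

$$\begin{array}{cccc} x_1 \le & x_2 \le & \cdots \le & x_k \\ n_1 & n_2 & \cdots & n_k \end{array}$$

with $\sum_{j} n_j = n$

- Define

$$F_i = \frac{\sum_{j=1}^{i} n_j}{n} \qquad Q_i = \frac{\sum_{j=1}^{i} n_j x_j}{\sum_{j=1}^{k} n_j x_j}$$

- Everything else stays the same.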
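A sketch of the grouped version, under the same assumptions as before (nonnegative values, positive total). The function name `gini_grouped` and the toy counts are hypothetical; the example collapses the two incomes of 15 in the slides' data into one value with count 2.

```python
def gini_grouped(values, counts):
    """Gini index from distinct values x_1..x_k with counts n_1..n_k."""
    pairs = sorted(zip(values, counts))       # order by value
    n = sum(c for _, c in pairs)
    total = sum(v * c for v, c in pairs)

    g = 1.0
    f_prev = q_prev = 0.0                     # F_0 = Q_0 = 0
    cum_count = 0
    cum_income = 0.0
    for v, c in pairs:
        cum_count += c
        cum_income += v * c
        f, q = cum_count / n, cum_income / total
        g -= (f - f_prev) * (q + q_prev)      # subtract one trapezoid term
        f_prev, q_prev = f, q
    return g

g = gini_grouped([1, 2, 3, 10, 15, 30, 50], [1, 1, 1, 1, 2, 1, 1])
print(g)
```

Grouping tied values changes nothing here: between tied observations the curve is a straight line, so one wide trapezoid has the same area as the narrow ones it replaces.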

Income distribution in USA 2015

Current Population Survey, Income Data

[Figure: proportion in income bracket (vertical axis, 0.00 to 0.08) plotted against average income in bracket (horizontal axis, 0 to 4e+05), with separate series for All, Whites, Blacks, Asians, and Hispanics.]

https://www.census.gov/programs-surveys/cps/about.html

https://www.census.gov/data/tables/time-series/demo/income-poverty/cps-hinc/hinc-06.html

Revisiting the Income data

[Figure: Gini index (horizontal axis, roughly 0.47 to 0.50) by race (vertical axis: All, Asian, Black, Hispanic, White).]

Gini index for other data


Something to note

We can calculate the following summary of “mutual variability”

$$\Delta = \sum_{i=1}^{k} \sum_{j=1}^{k} |x_i - x_j| \, \frac{n_i}{n} \, \frac{n_j}{n}$$

And one can show that

$$G = \frac{\Delta}{2\bar{x}}$$
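The identity can be checked numerically on a small example. The sketch below computes $\Delta$ from values and counts, then compares $\Delta / (2\bar{x})$ with the trapezoid-based index. The function names and the toy data (the slides' income values in grouped form) are assumptions of this sketch.

```python
import math

def mean_difference(values, counts):
    """Delta: average absolute difference over all ordered pairs."""
    n = sum(counts)
    return sum(
        abs(xi - xj) * (ni / n) * (nj / n)
        for xi, ni in zip(values, counts)
        for xj, nj in zip(values, counts)
    )

def gini_trapezoid(values, counts):
    """Trapezoid-based Gini index for grouped data, with F_0 = Q_0 = 0."""
    pairs = sorted(zip(values, counts))
    n = sum(c for _, c in pairs)
    total = sum(v * c for v, c in pairs)
    g, f_prev, q_prev, cc, ci = 1.0, 0.0, 0.0, 0, 0.0
    for v, c in pairs:
        cc, ci = cc + c, ci + v * c
        f, q = cc / n, ci / total
        g -= (f - f_prev) * (q + q_prev)
        f_prev, q_prev = f, q
    return g

values = [1, 2, 3, 10, 15, 30, 50]
counts = [1, 1, 1, 1, 2, 1, 1]

n = sum(counts)
mean = sum(v * c for v, c in zip(values, counts)) / n
delta = mean_difference(values, counts)
g_from_delta = delta / (2 * mean)

print(delta, g_from_delta, gini_trapezoid(values, counts))
assert math.isclose(g_from_delta, gini_trapezoid(values, counts))
```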

Date post:	21-Aug-2020