Draft. Comments appreciated.
Decomposition of Inequality Based on Incomplete Information
A contributed paper to the IARIW 24th General Conference Lillehammer, Norway, August 18-24, 1996
Yuri Dikhanov
Statistical Advisory Services
International Economics Department, IECDD
The World Bank
1818 H Street, N.W.
Room N2-038
Washington, D.C. 20433 U.S.A.
phone: (202) 458-2667
fax: (202) 522-3669
e-mail: [email protected]
Abstract
In this paper, the author examines five measures of inequality: the Gini coefficient, two entropy (Theil) indexes, normalized variance and decile ratio. It is shown how to decompose these indexes into intra-group and between-group inequalities. These indexes are used to study inequalities in the former Soviet republics in 1990. This study is based on incomplete information on income intervals (only income boundaries and population shares have been used). The robustness of the approximating procedure (piecewise polynomial interpolation of the cumulative distribution function) is discussed. Two alternative representations of the Gini coefficient are discussed as well.
The views presented in this paper are the author’s and do not necessarily represent those of the World Bank or its Board of Directors.
I. Introduction
Y. Dikhanov, Decomposition of Inequality Based on Incomplete Information
Analysis of income or wealth distribution often includes decomposing inequality for the total population into between-group and within-group inequalities. Not all inequality measures are decomposable, and not all of the decomposable ones are decomposable in the same way. Theoretically, the second Theil measure (T2) probably has the best properties. The Gini index, however, is the most widely cited measure. In this paper we attempt to decompose the Gini index in a meaningful way (see Section IV).
The Gini index, along with the two Theil measures, normalized variance and decile ratio, was then used to analyze income inequalities in the former Soviet Union and its Republics in 1990 (see Section II and Annex). We found that the share of inter-group inequality in total inequality was in the range of 7.7-15.8 percent, depending on the index. As inputs into this exercise, we used official interval data: interval boundaries and population shares for seven income intervals for each of the former Soviet Republics.
To process these discrete data we used interpolation with a polynomial of order four on each interval. These polynomials are chosen to be twice continuously differentiable at all points of the distribution, which allows differential and integral operations with a distribution function and its derivatives in explicit form. Section III discusses the robustness of this procedure using two numerical examples: a “bad” one, a hypothetical mixture of two normal distributions with different means and variances, presented as five income intervals (quintiles); and a “good” one, a log-normal distribution, presented as ten intervals. As expected, in the “good” case the precision of approximation is one to two orders of magnitude better than in the “bad” case (0.004-0.39 percent depending on the parameter versus 0.2-1.3 percent).
Section V discusses two alternative graphical and analytical representations of the Gini coefficients that are based on the original distribution function rather than on the Lorenz curve.
II. Decomposition of income inequalities in the former Soviet Union.
There were two major reasons we used the former Soviet Union data from 1990: first, the data were available (there were not many countries where regional inequality data were collected on a regular and comparable basis); and, second, since 1990 the former Soviet Republics have become independent countries, and as economies in transition, they attract the special attention of academics and policy makers.
Original information included boundaries and population shares for seven intervals (see Table 1 below). To process this information we used a version of our Gini ToolPak.
Table 1. Original data on income distribution shares in the former Soviet Union for 1990 (population shares, percent)

| Interval boundaries | USSR | Russia | Ukraine | Belarus | Estonia | Latvia | Lithuania | Moldova | Armenia | Azerbaijan | Georgia | Kazakhstan | Uzbekistan | Kyrgyzstan | Tajikistan | Turkmenistan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| <75 | 7.8 | 3.2 | 2.7 | 1.5 | 0.6 | 0.9 | 1.2 | 6.1 | 5.4 | 29.7 | 6.5 | 10.0 | 34.1 | 24.8 | 45.1 | 26.9 |
| 75-100 | 10.6 | 8.2 | 8.6 | 5.9 | 2.7 | 3.8 | 4.5 | 12.5 | 11.3 | 19.7 | 11.2 | 14.4 | 23.0 | 21.7 | 22.7 | 22.3 |
| 100-150 | 28.0 | 27.2 | 31.2 | 27.0 | 15.4 | 19.5 | 20.9 | 32.9 | 31.6 | 26.8 | 28.7 | 31.1 | 26.8 | 30.8 | 21.6 | 29.6 |
| 150-200 | 23.9 | 26.0 | 28.0 | 28.9 | 23.6 | 26.1 | 25.8 | 24.5 | 24.6 | 13.0 | 23.1 | 21.5 | 10.1 | 13.7 | 6.8 | 12.7 |
| 200-250 | 14.9 | 17.3 | 16.2 | 19.1 | 21.7 | 21.3 | 20.5 | 13.0 | 14.3 | 6.0 | 14.5 | 11.9 | 3.7 | 5.5 | 2.4 | 5.1 |
| 250-300 | 8.0 | 9.6 | 7.9 | 10.0 | 16.2 | 13.9 | 13.3 | 6.4 | 7.1 | 2.7 | 8.2 | 6.0 | 1.4 | 2.1 | 0.9 | 2.0 |
| >300 | 6.8 | 8.5 | 5.4 | 7.6 | 19.8 | 14.5 | 13.8 | 4.6 | 5.7 | 2.1 | 7.8 | 5.1 | 0.9 | 1.4 | 0.5 | 1.4 |
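As an illustration of how interval data of this kind become inputs to the interpolation, the sketch below converts the USSR column of Table 1 into cumulative points (Y_i, F(Y_i)). The function name `cumulative_points` is hypothetical and is not part of the Gini ToolPak; the open-ended top interval contributes no finite boundary.

```python
# Upper boundaries of the first six income intervals in Table 1
# (the seventh interval, >300, is open-ended).
boundaries = [75, 100, 150, 200, 250, 300]
# Population shares (percent) for the USSR column of Table 1.
ussr_shares = [7.8, 10.6, 28.0, 23.9, 14.9, 8.0, 6.8]


def cumulative_points(bounds, shares_pct):
    """Return [(Y_i, F(Y_i))] for every finite interval boundary."""
    total = sum(shares_pct)
    points, running = [], 0.0
    for bound, share in zip(bounds, shares_pct):
        running += share
        points.append((bound, running / total))
    return points


cdf_points = cumulative_points(boundaries, ussr_shares)
for y, f in cdf_points:
    print(f"F({y}) = {f:.3f}")
```

These six points, together with F = 0 at zero income and F = 1 at the top of the distribution, are the only information the interpolation has to work with.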
The overall results can be assessed from Figure A-2 of the Annex, which presents normalized values of various inequality measures (inequality indexes normalized by their standard deviations). As we can see, the lowest inequality was observed in Belarus and Ukraine, followed by Estonia, Latvia and Lithuania. That the Baltics had higher inequality than Belarus and Ukraine has to be attributed to the fact that, although minimum wages were the same in all of these former republics, the means were higher in the Baltics. Russia had a higher income inequality than these economies, which is to be expected given its size. A factor that additionally increased the inequality for Russia was the relatively high prices (and hence wages) in Siberia. The highest inequality was registered for Azerbaijan and the Central Asian states (Uzbekistan, Kyrgyzstan, Turkmenistan and Tajikistan). The results for Azerbaijan are not obvious given the much lower numbers for neighboring Armenia and Georgia.
Another piece of information that Figure A-2 provides relates to the correlation between the indexes. We can see that, in general, all the indexes for this set of countries produce highly consistent results. Table A-3 of the Annex provides correlation coefficients. One of the highest values of the correlation coefficients is observed for the Theil 1 - Theil 2 pair: r² = 0.9964. By absolute value, the difference between them is around 2 percent, which can be seen as a measure of the deviation of the actual distribution from the log-normal one (as we know, under the assumption of log-normal distribution, the two Theil indexes coincide). For some economies the deviation between the two Theil indexes is insignificant, 0.1-0.2 percent, though a part of that can be attributed to the fact that the approximation errors go in opposite directions. The two Theil indexes and the Gini coefficient are correlated even more tightly: r² = 0.9979-0.9987. A very high correlation was also registered for the Theil 1 - Decile ratio pair: r² = 0.9980. Tight correlation is also observed for the Theil 2 - Decile ratio pair: r² = 0.9932. The lowest value of the correlation coefficient is registered for the Variance - Decile ratio pair: r² = 0.9908. We have to say, though, that this value is still very high. The overall conclusion is that all these inequality measures produce coherent results.
Table A-1 of the Annex provides the results of actual estimations. Shares of inter-group variance presented in the table are of special significance for this paper. The two Theil indexes and variance display similar results in the 12.9-15.8 percent range. The share of inter-group variance for the Gini coefficient, on the other hand, is only 7.7 percent, which is roughly half of those for other measures. One has to bear in mind, however, that the ways these indexes decompose are different, and, thus, the shares are not directly comparable. The two Theil indexes, for example, produce identical results only under the assumption of “log-normality” of the distribution. However, shares of inter-group variance will still be different because they are aggregated with different weights (income and population shares, respectively). The inter-group results produced using the second Theil index (0.0170) can be compared to those estimated by H. Theil (1989, Development of International Inequality, Journal of Econometrics, Vol. 42, No. 1, North-Holland). For 1985, he found the inequality between the OECD countries (without Australia) to be 0.0859; for tropical America, 0.0580; for tropical Asia, 0.2003; and for tropical Africa, 0.1871. Figure A-5 of the Annex provides a graphical representation of the Theil index for the combined distribution versus the between-group Theil index.
Figure A-3 of the Annex presents density functions of income distribution in the former republics. It is interesting to note that the Estonian distribution has slight irregularities in the upper part of the distribution. This might indicate urban/rural or Tallinn/rest of the country income differentials (footnote 1). More likely, a factor that might have contributed to that situation was the advance of reforms in Estonia: in 1990 this country had the highest share of non-agricultural private sector in the former Soviet Union, which provided much higher salaries than the state sector.
Figure A-4 of the Annex is a histogram on a logarithmic scale. It shows shares of population within proportional boundaries (each boundary is in fixed proportion to the previous one). It has to be noted that in this case the highest point would not be the mode, as in a distribution density function, but the mean. Using this type of histogram requires, however, some compliance with the assumption of “log-normality” of the distribution.
Table A-2 of the Annex presents income shares by decile. That Azerbaijan had the highest inequality and Belarus and Ukraine had the lowest can be directly inferred from this table.
Footnote 1: Tallinn had 35 percent of the Estonian population and 45 percent of the income.

III. Robustness of the computational procedure
For this exercise the Gini ToolPak was used. In this section we briefly explore the robustness of the procedure. We use two numerical examples: a “good” case (ten income intervals for a log-normal distribution); and a “bad” case (five intervals, i.e., quintile data, for a mixture of two normal distributions with different means and variances).
The essence of the procedure (polynomial interpolation) is the following. Let’s assume that we are given only a set {F(Y_i)} of M elements, which describes the values that the cumulative distribution function takes at Y_i. We need to approximate all other points of the distribution, i.e., to estimate F(y) for y ∈ [0, +∞). Within each interval [Y_i, Y_{i+1}], we interpolate the distribution function by a polynomial of order 4 in the form:

$$F_{i,i+1}(y) = \sum_{n=0}^{3} a_{i,n} \left( \frac{y - Y_i}{Y_{i+1} - Y_i} \right)^{n}$$

where a_{i,n} are the polynomial coefficients for interval i. At the boundaries the polynomials are exact, and are not interpolations, i.e.:

$$F_{i-1,i}(Y_i) = F_{i,i+1}(Y_i) = F(Y_i)$$

These polynomials are chosen to be twice continuously differentiable across the boundaries. This is a very important property, because it allows differential and integral operations with F and its derivatives in explicit form. For example, the mean of the distribution would be calculated as follows:

$$\mu = \int y \, dF = \sum_{i=0}^{M-1} \sum_{n=1}^{3} a_{i,n} \, \frac{n Y_{i+1} + Y_i}{n + 1}$$

where M is the number of intervals. Other characteristics of the distribution function can be derived in a similar way.
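The closed-form mean can be checked on a small example. The sketch below is illustrative only: the knots and coefficients are hypothetical values chosen so that F is monotone and continuous, not coefficients produced by the author's fitting procedure, and the function names are assumptions. It compares the explicit sum with a midpoint-rule integral of y dF.

```python
def piecewise_mean(knots, coeffs):
    """Mean of a distribution whose CDF on [Y_i, Y_{i+1}] is
    sum_n a[i][n] * t**n, with t = (y - Y_i) / (Y_{i+1} - Y_i).

    Closed form: sum_i sum_{n>=1} a[i][n] * (n * Y_{i+1} + Y_i) / (n + 1).
    """
    total = 0.0
    for i, a in enumerate(coeffs):
        y_lo, y_hi = knots[i], knots[i + 1]
        for n in range(1, len(a)):
            total += a[n] * (n * y_hi + y_lo) / (n + 1)
    return total


def numeric_mean(knots, coeffs, steps=2000):
    """Midpoint-rule integral of y dF over the same piecewise CDF."""
    total = 0.0
    for i, a in enumerate(coeffs):
        y_lo, y_hi = knots[i], knots[i + 1]
        width = y_hi - y_lo
        for k in range(steps):
            t = (k + 0.5) / steps
            # dF/dt on this interval
            d_f = sum(n * a[n] * t ** (n - 1) for n in range(1, len(a)))
            total += (y_lo + width * t) * d_f / steps
    return total


# Hypothetical example: F rises from 0 to 0.5 on [0, 100]
# and from 0.5 to 1 on [100, 300].
knots = [0.0, 100.0, 300.0]
coeffs = [
    [0.0, 0.0, 1.5, -1.0],  # F = 1.5 t^2 - t^3
    [0.5, 0.0, 1.5, -1.0],  # F = 0.5 + 1.5 t^2 - t^3
]

mean_closed = piecewise_mean(knots, coeffs)
mean_numeric = numeric_mean(knots, coeffs)
print(mean_closed, mean_numeric)
```

In this example each interval carries half of the population with a density symmetric about the interval midpoint, so both calculations should give a mean of 125.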
Errors of estimation in polynomial interpolation
Using logic similar to that behind the remainder term of the Taylor formula in Lagrange form, we arrive at the following expression for estimation errors (footnote 2):

$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{(Y_{i+1} - Y_i)^4}{2^4 \, 4!} \left| F^{(4)}(\xi) \right|, \quad \text{where } \xi = \arg\max_{y \in [Y_i, Y_{i+1}]} \left| F^{(4)}(y) \right|$$
In the case of the normal (standard) distribution the above boils down to:

$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{384} (Y_{i+1} - Y_i)^4 \left| F^{(4)}(\xi) \right|, \quad F^{(4)}(\xi) = (3\xi - \xi^3) \frac{1}{\sqrt{2\pi}} e^{-\xi^2/2}$$

Or, in the case when the intervals are separated by σ/2, we obtain that the biggest errors will be in the interval [0.5σ, σ] (that can be seen from the first-order condition for F^{(4)}, which gives ξ = \sqrt{3 - \sqrt{6}}), and the errors in this interval are expressed as follows:

$$\left| F_{i,i+1}(y) - F(y) \right| \le \frac{1}{384} \cdot \frac{1}{16} \cdot (3\xi - \xi^3) \frac{1}{\sqrt{2\pi}} e^{-\xi^2/2} \approx 0.01\%, \quad \xi = \sqrt{3 - \sqrt{6}}$$

Footnote 2: Dividing the interval [Y_i, Y_{i+1}] in half simply reflects the fact that, because at the ends of the interval the polynomial becomes exact again, maximum errors are attained around the middle of the interval. The coefficient 1/384 [= 1/(2^4 · 4!)] is the absolute theoretical minimum for the errors. The minimum is attained when the polynomial coefficients for the interval are determined (almost) independently of other intervals. In other cases, the inequality is somewhat different, although the order of magnitude for errors remains the same.
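The order of magnitude of this bound can be reproduced numerically. The sketch below assumes a standard normal distribution (σ = 1), intervals of width σ/2, and the fact that the fourth derivative of the normal CDF is (3x - x³) times the normal density; the variable names are illustrative.

```python
import math


def normal_cdf_fourth_derivative(x):
    """Fourth derivative of the standard normal CDF,
    i.e. the third derivative of the density: (3x - x^3) * phi(x)."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return (3.0 * x - x ** 3) * phi


# Stationary points of F^(4) solve x^4 - 6 x^2 + 3 = 0;
# the root inside [0.5, 1] is sqrt(3 - sqrt(6)).
xi = math.sqrt(3.0 - math.sqrt(6.0))

interval_width = 0.5  # intervals separated by sigma / 2, with sigma = 1
bound = (interval_width ** 4 / (2 ** 4 * math.factorial(4))
         * abs(normal_cdf_fourth_derivative(xi)))

print(f"xi = {xi:.4f}, error bound = {100 * bound:.4f} percent")
```

Under these assumptions the bound comes out at roughly 0.009 percent, in line with the approximately 0.01 percent quoted above.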
A. “Good” case
As a “good” case, we used ten income intervals for the log-normal distribution LN(5,0.25).
The results are presented in the table below. Graphical results are presented in Figure 1. As can be seen from the graph, the actual distribution cannot be readily distinguished from the simulation. The largest difference is for the mode, which is notoriously difficult to estimate.

| | Actual values | Simulation | Difference |
|---|---|---|---|
| Mean Income | 153.12 | 153.09 | -0.02% |
| Gini-coefficient | 0.14032 | 0.14023 | -0.06% |
| Median Income | 148.41 | 148.41 | 0.00% |
| Mode Income | 139.42 | 139.97 | 0.39% |
| Variance | 38.887 | 38.923 | 0.09% |
| Income less than mean | 0.5497 | 0.5494 | -0.06% |
| Theil index | 0.03125 | 0.03123 | -0.07% |
| Theil index 2 | 0.03125 | 0.03126 | 0.03% |
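The "actual values" column can be reproduced from the standard closed-form expressions for a log-normal distribution. The sketch below assumes that LN(5, 0.25) means that ln(y) has mean 5 and standard deviation 0.25; this reading is an assumption, adopted because it reproduces the tabulated figures. Under the same assumption, the row labelled "Variance" matches the standard deviation of income rather than its square.

```python
import math
from statistics import NormalDist

# Assumed reading of LN(5, 0.25): mean and standard deviation of ln(y).
theta, sigma = 5.0, 0.25
std_normal = NormalDist()

mean = math.exp(theta + sigma ** 2 / 2)
median = math.exp(theta)
mode = math.exp(theta - sigma ** 2)
std_dev = mean * math.sqrt(math.exp(sigma ** 2) - 1)
gini = 2 * std_normal.cdf(sigma / math.sqrt(2)) - 1
share_below_mean = std_normal.cdf(sigma / 2)
theil = sigma ** 2 / 2  # T1 and T2 coincide for a log-normal distribution

for label, value in [
    ("Mean income", mean),
    ("Gini coefficient", gini),
    ("Median income", median),
    ("Mode income", mode),
    ("Standard deviation", std_dev),
    ("Income less than mean", share_below_mean),
    ("Theil index (T1 = T2)", theil),
]:
    print(f"{label:24s} {value:.5f}")
```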
Figure 1. Deviation of simulation from actual distribution: a "good" case
B. “Bad” case
As a “bad” case, we used five income intervals for the mixture of two normal distributions N(40,10) and N(60,5). The results are presented in the tables below.
Graphical results are presented in Figure 2. As can be seen from the graph, the actual distribution is visually readily distinguishable from the simulation. The largest difference is again for the mode.
Inputs into the procedure

| Interval boundaries | Quintiles of population |
|---|---|
| < 37.4696 | Quintile I |
| 37.4696 to 48.10972 | Quintile II |
| 48.10972 to 56.60144 | Quintile III |
| 56.60144 to 61.47081 | Quintile IV |
| > 61.47081 | Quintile V |

Results of the simulation

| | Actual values | Simulation | Difference |
|---|---|---|---|
| Mean Income | 50.00 | 49.67 | -0.7% |
| Median Income | 53.33 | 53.23 | -0.2% |
| Mode Income | 59.64 | 58.87 | -1.3% |
| Income less than mean | 43.20% | 42.82% | -0.9% |
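The inputs and several of the "actual values" above can be approximately reproduced under two assumptions that the text does not state explicitly: that N(40,10) and N(60,5) give means and standard deviations, and that the two components have equal weights. The sketch below inverts the mixture CDF by bisection; the helper names are hypothetical.

```python
from statistics import NormalDist

# Assumed parameterization: (mean, standard deviation), equal weights.
low, high = NormalDist(40, 10), NormalDist(60, 5)


def mixture_cdf(y):
    """CDF of the equally weighted mixture of the two normals."""
    return 0.5 * low.cdf(y) + 0.5 * high.cdf(y)


def mixture_quantile(p, lo=-100.0, hi=200.0, tol=1e-9):
    """Invert the mixture CDF by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mixture_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


quintile_bounds = [mixture_quantile(p) for p in (0.2, 0.4, 0.6, 0.8)]
median = mixture_quantile(0.5)
mean = 0.5 * low.mean + 0.5 * high.mean
share_below_mean = mixture_cdf(mean)

print([round(b, 3) for b in quintile_bounds])
print(round(median, 2), mean, round(share_below_mean, 4))
```

With these assumptions the quintile boundaries land close to those listed under "Inputs into the procedure", and the mean, median and share of income below the mean agree with the "Actual values" column.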
Figure 2. Deviation of simulation from actual distribution: "bad" case (vertical axis: distribution density)
IV. Decomposition of inequality measures
IV.1. Decomposition of the Gini coefficient
Let’s consider a distribution F defined by its cumulative distribution function F(y). The respective distribution density function is F'. The mean of that distribution is defined as $\mu = \int y \, dF(y)$ using Lebesgue-Stieltjes integrals. (Hereinafter a plain integral sign describes integrating from 0 to +∞.) Then the essence of the Gini coefficient can be seen from the graph of the Lorenz curve (see Figure 3).
The Gini coefficient is defined as equal to twice the area between the 45° line and the Lorenz curve:

$$G = 1 - \frac{2}{\mu} \int_0^1 \left( \int_0^F y \, dF \right) dF$$

or,

$$G = \frac{2}{\mu} \int_0^1 F \, d\!\left( \int_0^F y \, dF \right) - 1 = \frac{2}{\mu} \int F y \, dF - 1$$
Figure 3. Lorenz curve (horizontal axis: F(y); vertical axis: cumulative income share $\frac{1}{\mu} \int_0^y y \, dF(y)$)

Let’s consider two distributions F1 and F2, where the distributions are defined by their respective cumulative distribution functions F_i(y). The respective distribution density functions are F_i'. Means are defined as $\mu_i = \int y \, dF_i$. Thus, we can define Gini coefficients G_i for the respective functions as follows:

$$G_1 = \frac{2}{\mu_1} \int y \left( F_1 - \tfrac{1}{2} \right) F_1' \, dy = \frac{2}{\mu_1} \int F_1 \, y \, F_1' \, dy - 1$$

$$G_2 = \frac{2}{\mu_2} \int y \left( F_2 - \tfrac{1}{2} \right) F_2' \, dy = \frac{2}{\mu_2} \int F_2 \, y \, F_2' \, dy - 1 \qquad (1)$$
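Then, for the combined distribution we can write:

$$G = \frac{2}{p_1 \mu_1 + p_2 \mu_2} \int y \left( p_1 F_1 + p_2 F_2 - \tfrac{1}{2} \right) \left( p_1 F_1' + p_2 F_2' \right) dy \qquad (2)$$

where:
$\pi_i = \frac{p_i \mu_i}{p_1 \mu_1 + p_2 \mu_2}$ - income share of the i-th distribution
$p_i$ - population share
$\mu = p_1 \mu_1 + p_2 \mu_2$ - mean income for the combined distribution

Or, after some simple operations we obtain:

$$G = \pi_1 G_1 + \pi_2 G_2 - \frac{2 p_1 p_2}{\mu} \int y \, (F_1 - F_2)(F_1' - F_2') \, dy \qquad (3)$$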
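Expression (3) is obtained as follows:

$$G = \frac{2}{p_1 \mu_1 + p_2 \mu_2} \int y \left( p_1 F_1 + p_2 F_2 - \tfrac{1}{2} \right) \left( p_1 F_1' + p_2 F_2' \right) dy$$

$$= \frac{2}{\mu} \int y \left[ p_1^2 F_1 F_1' + p_2^2 F_2 F_2' + p_1 p_2 \left( F_1 F_2' + F_2 F_1' \right) - \tfrac{1}{2} \left( p_1 F_1' + p_2 F_2' \right) \right] dy$$

$$= \frac{2}{\mu} \int y \left[ (p_1 - p_1 p_2) F_1 F_1' + (p_2 - p_1 p_2) F_2 F_2' + p_1 p_2 \left( F_1 F_2' + F_2 F_1' \right) - \tfrac{1}{2} \left( p_1 F_1' + p_2 F_2' \right) \right] dy$$

$$= \frac{2}{\mu} \int y \left[ p_1 \left( F_1 - \tfrac{1}{2} \right) F_1' + p_2 \left( F_2 - \tfrac{1}{2} \right) F_2' - p_1 p_2 (F_1 - F_2)(F_1' - F_2') \right] dy$$

where the third line uses $p_1 + p_2 = 1$; and, because $\pi_i = p_i \mu_i / \mu$,

$$G = \pi_1 G_1 + \pi_2 G_2 - \frac{2 p_1 p_2}{\mu} \int y \, (F_1 - F_2)(F_1' - F_2') \, dy$$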
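It is easy to see how the above expression can be expanded for a multi-component case:

$$G = \frac{2}{\sum_i p_i \mu_i} \int y \left( \sum_i p_i F_i - \tfrac{1}{2} \right) \left( \sum_i p_i F_i' \right) dy$$

$$= \frac{2}{\sum_i p_i \mu_i} \int y \left\{ \sum_i \left[ p_i^2 F_i F_i' - \tfrac{1}{2} p_i F_i' \right] + \sum_{i,j;\, j \ne i} p_i p_j F_i F_j' \right\} dy$$

$$= \frac{2}{\sum_i p_i \mu_i} \int y \left\{ \sum_i p_i \left( F_i - \tfrac{1}{2} \right) F_i' - \tfrac{1}{2} \sum_{i,j} p_i p_j (F_i - F_j)(F_i' - F_j') \right\} dy$$

The above expression can be rewritten as follows:

$$G = \sum_i \pi_i G_i - \frac{1}{\mu} \sum_{i,j} p_i p_j \int y \, (F_i - F_j)(F_i' - F_j') \, dy \qquad (4)$$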
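It is also easy to see that the Gini coefficient can be expressed through the covariance:

$$G_i = \frac{2}{\mu_i} \, COV(y, F_i)$$

and the combined Gini coefficient can be written as:

$$G = 2 \sum_i \frac{\pi_i}{\mu_i} \, COV(y, F_i) - \sum_{i,j} \frac{p_i p_j}{\mu} \, COV(y, F_i - F_j) \qquad (5)$$

Or,

$$G = \frac{1}{\mu} \left\{ 2 \sum_i p_i \, COV(y, F_i) - \sum_{i,j} p_i p_j \, COV(y, F_i - F_j) \right\}$$

where $COV(y, F_i - F_j)$ stands for the cross term $\int y \, (F_i - F_j)(F_i' - F_j') \, dy$ of Expression (4).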
The first component stands for intra-group covariances, whereas the second stands for inter-group covariance. As we can see from Expression (3), the Gini coefficient for the combined distribution consists of two parts: intra-group and inter-group variances. Similar to the Theil coefficient T1, the individual Gini coefficients are added up with income weights.
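The decomposition can be checked numerically. The sketch below is an illustration under stated assumptions, not the author's computation: it uses two hypothetical groups with uniform income distributions, evaluates the within-group part as the income-weighted sum of the group Gini coefficients and the between-group part as the cross term of Expression (4), and compares their sum with the Gini coefficient of the combined distribution computed independently from the area form G = (1/μ)∫F(1 - F)dy (equivalent to Expression (13) of Section V).

```python
def uniform_cdf(y, a, b):
    return min(1.0, max(0.0, (y - a) / (b - a)))


def uniform_pdf(y, a, b):
    return 1.0 / (b - a) if a < y < b else 0.0


def integrate(func, lo, hi, steps=6000):
    """Midpoint rule on [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(func(lo + (k + 0.5) * h) for k in range(steps))


# Hypothetical groups: (population share, lower bound, upper bound)
groups = [(0.6, 1.0, 3.0), (0.4, 2.0, 6.0)]
upper = 6.0  # largest income in any group

means = [(a + b) / 2 for _, a, b in groups]
mu = sum(p * m for (p, _, _), m in zip(groups, means))
income_shares = [p * m / mu for (p, _, _), m in zip(groups, means)]


def gini_of_cdf(cdf, mean):
    """Area form: G = (1/mean) * integral of F (1 - F) dy."""
    return integrate(lambda y: cdf(y) * (1.0 - cdf(y)), 0.0, upper) / mean


group_ginis = [
    gini_of_cdf(lambda y, a=a, b=b: uniform_cdf(y, a, b), m)
    for (_, a, b), m in zip(groups, means)
]
within = sum(s * g for s, g in zip(income_shares, group_ginis))

# Between-group part: -(1/mu) * sum_{i != j} p_i p_j * int y (F_i - F_j)(F_i' - F_j') dy
between = 0.0
for i, (p_i, a_i, b_i) in enumerate(groups):
    for j, (p_j, a_j, b_j) in enumerate(groups):
        if i != j:
            cross = integrate(
                lambda y: y
                * (uniform_cdf(y, a_i, b_i) - uniform_cdf(y, a_j, b_j))
                * (uniform_pdf(y, a_i, b_i) - uniform_pdf(y, a_j, b_j)),
                0.0, upper)
            between -= p_i * p_j * cross / mu

total = gini_of_cdf(
    lambda y: sum(p * uniform_cdf(y, a, b) for p, a, b in groups), mu)

print(f"within = {within:.5f}, between = {between:.5f}, total = {total:.5f}")
```

In this example the sum of the two parts agrees with the directly computed total up to the accuracy of the numerical integration, and the between-group part is positive.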
IV.2. Decomposition of entropy (Theil) indexes
In his book, H. Theil (1967, Economics and Information Theory, North-Holland, Amsterdam) introduced, for income inequality measurement, the entropy measure used in thermodynamics and information theory. He suggested using the entropy index in two forms: as income-weighted and population-weighted entropy indexes. In this paper we will call them T1 and T2 respectively.
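These indexes can be represented as follows:

$$T_1 = \sum_i \frac{Y_i}{Y} \log\!\left( \frac{Y_i / Y}{N_i / N} \right)$$

$$T_2 = \sum_i \frac{N_i}{N} \log\!\left( \frac{N_i / N}{Y_i / Y} \right)$$

where:
Y_i is income of group i;
N_i is number of people in group i.

Or, using Lebesgue-Stieltjes integrals:

$$T_1 = \int \frac{y}{\mu} \log\!\left( \frac{y}{\mu} \right) dF(y)$$

$$T_2 = \int \log\!\left( \frac{\mu}{y} \right) dF(y)$$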
As can be shown, these indexes are easily decomposable in the multi-group case. For the Theil index T1 we have:

$$T_{1,i} = \sum_j \frac{Y_i^j}{Y_i} \log\!\left( \frac{Y_i^j / Y_i}{N_i^j / N_i} \right)$$

where:
Y_i^j is income of sub-group j of group i;
N_i^j is number of people in sub-group j of group i;
Y_i is income of group i;
N_i is number of people in group i.

Or, using Lebesgue-Stieltjes integrals:

$$T_{1,i} = \int \frac{y}{\mu_i} \log\!\left( \frac{y}{\mu_i} \right) dF_i(y)$$

The Theil index T1 decomposes into:

$$T_1 = \sum_i \pi_i T_{1,i} + \sum_i \pi_i \log\!\left( \frac{\pi_i}{p_i} \right) = \sum_i \pi_i \sum_j \frac{Y_i^j}{Y_i} \log\!\left( \frac{Y_i^j / Y_i}{N_i^j / N_i} \right) + \sum_i \pi_i \log\!\left( \frac{\pi_i}{p_i} \right)$$

T2 decomposes in a similar way with the population weights p.
As has been shown by F. Bourguignon (1979, Decomposable Income Inequality Measures, Econometrica, Vol. 47, No. 4) and A.F. Shorrocks (1980, Inequality Measures, Econometrica, Vol. 48, No. 3), the Theil indexes are the only income-weighted and population-weighted indexes, respectively, that can be decomposed in that way, i.e., into a weighted sum of individual Theil indexes and the Theil index constructed of individual distributions as if they were elements of the combined distribution. In this sense, the decomposition of the Theil indexes is different from that of the Gini.
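A small numerical example makes the decomposition concrete. The sketch below uses hypothetical individual incomes in two groups; the function name `theil_indexes` is an assumption. It checks that T1 equals the income-weighted sum of the group indexes plus the between-group index, and that T2 does the same with population weights.

```python
import math


def theil_indexes(income_shares, population_shares):
    """T1 (income-weighted) and T2 (population-weighted) from matching share vectors."""
    t1 = sum(s * math.log(s / p) for s, p in zip(income_shares, population_shares))
    t2 = sum(p * math.log(p / s) for s, p in zip(income_shares, population_shares))
    return t1, t2


# Hypothetical data: individual incomes in two groups.
groups = {
    "A": [80.0, 100.0, 120.0, 150.0],
    "B": [150.0, 200.0, 250.0, 300.0, 400.0, 500.0],
}

all_incomes = [y for ys in groups.values() for y in ys]
total_income, total_people = sum(all_incomes), len(all_incomes)

# Overall indexes: every individual is treated as a unit.
t1_total, t2_total = theil_indexes(
    [y / total_income for y in all_incomes],
    [1.0 / total_people] * total_people,
)

pi = {g: sum(ys) / total_income for g, ys in groups.items()}  # income shares
p = {g: len(ys) / total_people for g, ys in groups.items()}   # population shares

# Indexes inside each group, and the index between groups.
inside = {
    g: theil_indexes([y / sum(ys) for y in ys], [1.0 / len(ys)] * len(ys))
    for g, ys in groups.items()
}
t1_between, t2_between = theil_indexes(list(pi.values()), list(p.values()))

t1_within = sum(pi[g] * inside[g][0] for g in groups)  # income weights
t2_within = sum(p[g] * inside[g][1] for g in groups)   # population weights

print(f"T1: total {t1_total:.6f} = within {t1_within:.6f} + between {t1_between:.6f}")
print(f"T2: total {t2_total:.6f} = within {t2_within:.6f} + between {t2_between:.6f}")
```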
IV.3. Decomposition of normalized variance
Normalized variance can be seen as a simple way of describing income inequalities.

$$s^2(y) = \frac{1}{\mu^2} \sum_{i,j} p_i p_j \, COV(y_i, y_j) = \sum_{i,j} \pi_i \pi_j \, COV\!\left( \frac{y_i}{\mu_i}, \frac{y_j}{\mu_j} \right)$$

Or,

$$s^2(y) = \sum_k \pi_k^2 \, s^2(y_k) + \sum_{i,j;\, i \ne j} \pi_i \pi_j \, COV\!\left( \frac{y_i}{\mu_i}, \frac{y_j}{\mu_j} \right)$$
IV.4. Decomposition of decile ratio
The decile ratio is a simple and transparent inequality measure; however, it cannot be meaningfully decomposed.
IV.5. Lorenz curve
The Lorenz function L is the function of income shares on population shares. The Lorenz curve associated with this function is plotted in Figure 3. The Lorenz curve plays an enormous role in income distribution analysis. Some important relationships between the Lorenz curve and the cumulative distribution function, as well as a graphical representation of the Theil index, are shown below.

Figure 4. First derivative of the Lorenz curve (vertical axis: L'(F) = y/μ; horizontal axis: F, from 0 to 1)

Figure 4 shows the first derivative of the Lorenz curve, L'(F). It can be easily seen that L'(F) is essentially the normalized income y/μ, and, thus, is the inverse (normalized) of the cumulative distribution function. The graph is also related to the Theil (T2) index. The logarithm of this graph is a graphical representation of the index, because the index can be presented as:

$$T_2 = \int \log\!\left( \frac{\mu}{y} \right) dF(y)$$

Figure 5. Graphical representation of the Theil index (T2) (vertical axis: log(L'(F)) = log(y/μ); horizontal axis: F, from 0 to 1)

The second derivative of the Lorenz curve is also an important characteristic of a distribution: L''(F) = y'_F/μ, where y'_F = dy/dF = 1/F'(y). It is essentially the expression for the (normalized) reciprocal of the distribution density function F'(y).

Figure 6. Second derivative of the Lorenz curve (vertical axis: L''(F) = y'_F/μ; horizontal axis: F, from 0 to 1)
IV.6. Some properties of log-normal distribution
The log-normal distribution plays an important role in inequality measurements. It is thought that real distributions of wealth and income can be at least partially approximated by it. An extensive treatment of the log-normal distribution is contained in J. Aitchison and J. Brown (1957, The Lognormal Distribution, Cambridge University Press). Here we mention just a few relevant properties.
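$$F'(y) = \frac{1}{y \sigma \sqrt{2\pi}} \, e^{-\frac{(\ln y - \theta)^2}{2\sigma^2}}$$

$$\mu = e^{\theta + \sigma^2/2}$$

$$Median = e^{\theta}$$

$$Mode = e^{\theta - \sigma^2}$$

$$S^2 = e^{2\theta + \sigma^2} \left( e^{\sigma^2} - 1 \right)$$

$$s^2 = \frac{S^2}{\mu^2} = e^{\sigma^2} - 1$$

$$E(y^m) = e^{m\theta + m^2 \sigma^2 / 2}$$

where θ and σ² are the mean and variance of ln y, S² is the variance of y, and s² is the normalized variance.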
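A convenient feature of the log-normal distribution is the simplicity of the Gini and Theil indexes. For the first Theil index, $T_1 = \int (\ln y - \ln \mu) \, dL$, and, with $dL = \frac{y}{\mu} dF$:

$$T_1 = \int \frac{y}{\mu} \ln\!\left( \frac{y}{\mu} \right) \frac{1}{y \sigma \sqrt{2\pi}} \, e^{-\frac{(\ln y - \theta)^2}{2\sigma^2}} \, dy$$

Substituting $t = \ln y - \theta$, so that $\ln(y/\mu) = t - \sigma^2/2$ and $y/\mu = e^{t - \sigma^2/2}$:

$$T_1 = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{+\infty} \left( t - \frac{\sigma^2}{2} \right) e^{t - \sigma^2/2} \, e^{-\frac{t^2}{2\sigma^2}} \, dt = \frac{1}{\sigma \sqrt{2\pi}} \int_{-\infty}^{+\infty} \left( t - \frac{\sigma^2}{2} \right) e^{-\frac{(t - \sigma^2)^2}{2\sigma^2}} \, dt = \sigma^2 - \frac{\sigma^2}{2} = \frac{\sigma^2}{2}$$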
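And, in the case of the second Theil index, $T_2 = \int (\ln \mu - \ln y) \, dF$, we can obtain the following expression:

$$T_2 = \int (\ln \mu - \ln y) \, \frac{1}{y \sigma \sqrt{2\pi}} \, e^{-\frac{(\ln y - \theta)^2}{2\sigma^2}} \, dy = \left( \theta + \frac{\sigma^2}{2} \right) - \theta = \frac{\sigma^2}{2}$$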
We can use the test of T1 = T2 to examine how closely a given distribution approaches a log-normal one.
The relationship of the Theil measures to normalized variance can be expressed as follows:

$$T_1 = T_2 = \frac{1}{2} \sigma^2 = \frac{1}{2} \ln(1 + s^2)$$

In the log-normal case, we can also think of the Theil indexes as the (logarithmic) difference between the mean and the median:

$$T_1 = T_2 = \log\!\left( \frac{\mu}{Median} \right) = \frac{\sigma^2}{2}$$

And, finally, as can be easily seen, the Gini coefficient for the log-normal distribution can be written as follows:

$$G = 2 \Phi\!\left( \frac{\sigma}{\sqrt{2}} \right) - 1$$

where Φ(.) is the standard normal cumulative distribution.
V. Two alternative representations of the Gini coefficient
Apart from the traditional visualization of the Gini index using the Lorenz curve, it is possible to represent the Gini using a simple graph of the distribution function. Two such representations are discussed below.
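1. Let’s start from the following expression for the Gini coefficient:

$$G = \frac{2}{\mu} \int y \left( F - \tfrac{1}{2} \right) dF \qquad (6)$$

Or, as it is easy to see, Expression (6) can be written as:

$$G = \frac{2}{\mu} \int y F \, dF - 1 \qquad (7)$$

Expressions (6) and (7) are equivalent to:

$$G = \frac{2}{\mu} \, Cov(y, F) \qquad (8)$$

We can rewrite Expression (8) using the slope coefficient as follows:

$$G = \frac{2}{\mu} \cdot \frac{\int (y - \mu) \left( F - \tfrac{1}{2} \right) dF}{\int \left( F - \tfrac{1}{2} \right)^2 dF} \cdot \int \left( F - \tfrac{1}{2} \right)^2 dF = \frac{2}{\mu} \cdot \frac{1}{12} \cdot Slope(F, y)$$

because

$$\sigma_F^2 = \int \left( F - \tfrac{1}{2} \right)^2 dF = \left. \left( \frac{F^3}{3} - \frac{F^2}{2} + \frac{F}{4} \right) \right|_0^1 = \frac{1}{12} \qquad (9)$$

where Slope is the slope coefficient (footnote 3).
Or, finally,

$$G = \frac{1}{6} \, Slope(F, \hat{y}) \qquad (10)$$

where $\hat{y} = y / \mu$.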
Expression (10) can be obtained from Expression (8) in a different way as well. Let’s start from rewriting Expression (8) using the correlation coefficient (footnote 4):

$$G = \frac{2}{\mu} \, Cov(y, F) = \frac{2}{\mu} \, \rho(y, F) \, \sigma_y \sigma_F = \frac{1}{\sqrt{3}} \, \rho(y, F) \, \frac{\sigma_y}{\mu} \qquad (11)$$

because $\sigma_F^2 = \frac{1}{12}$ [see Expression (9)].
Now, using

$$\rho(y, F) = Slope(F, y) \, \frac{\sigma_F}{\sigma_y}$$

we obtain Expression (10) again:

$$G = \frac{1}{\sqrt{3}} \, Slope(F, y) \, \frac{\sigma_F}{\sigma_y} \cdot \frac{\sigma_y}{\mu} = \frac{1}{6} \, Slope(F, \hat{y})$$

Footnote 3: $Slope(x, y) = \frac{\sum_i \omega_i (x_i - E(x))(y_i - E(y))}{\sum_i \omega_i (x_i - E(x))^2}$, where ω_i are weights, or, in the continuous case, $Slope(x, y) = \frac{\int (x - E(x))(y - E(y)) \, dF}{\int (x - E(x))^2 \, dF}$.
Footnote 4: The discrete case of using correlation coefficients in expressing the Gini coefficient [Expression (11)] was shown in Milanovic (1996).
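Figure 7. Graphical representation of the Gini coefficient as one sixth of the slope coefficient between income y and distribution function F. (Vertical axis: normalized income; horizontal axis: F, from 0 to 1, with the midpoint 1/2 marked; annotation: Slope(F, ŷ) = 6*Gini)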
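Expression (10) translates directly into a regression calculation. The sketch below is a minimal illustration under stated assumptions: the sample is a hypothetical set of evenly spaced quantiles of a log-normal distribution with ln(y) having mean 5 and standard deviation 0.25, the empirical F is taken as (k + 0.5)/n for the k-th smallest observation, and the function name `gini_from_slope` is not from the paper.

```python
import math
from statistics import NormalDist


def gini_from_slope(incomes):
    """Gini as one sixth of the least-squares slope of y/mean on F."""
    ys = sorted(incomes)
    n = len(ys)
    mean_y = sum(ys) / n
    ranks = [(k + 0.5) / n for k in range(n)]   # empirical F at each observation
    normalized = [y / mean_y for y in ys]       # income normalized by the mean
    mean_f = sum(ranks) / n
    cov = sum((f - mean_f) * (z - 1.0) for f, z in zip(ranks, normalized)) / n
    var_f = sum((f - mean_f) ** 2 for f in ranks) / n  # tends to 1/12 as n grows
    slope = cov / var_f
    return slope / 6.0


# Hypothetical sample: evenly spaced quantiles of the assumed log-normal distribution.
std_normal = NormalDist()
n = 5000
sample = [math.exp(5.0 + 0.25 * std_normal.inv_cdf((k + 0.5) / n)) for k in range(n)]

gini_slope = gini_from_slope(sample)
gini_closed_form = 2 * std_normal.cdf(0.25 / math.sqrt(2)) - 1
print(f"slope / 6 = {gini_slope:.5f}, closed form = {gini_closed_form:.5f}")
```

For a finite sample the variance of F is slightly below 1/12, so the two values agree only approximately; the gap shrinks as n grows.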
2. The next representation of the Gini coefficient can be obtained using Expression (7):

$$G = \frac{2}{\mu} \int y F \, dF - 1 = \frac{1}{\mu} \int y \, 2F \, dF - 1 = \frac{1}{\mu} \int y \, d(F^2) - 1 = \frac{\mu(F^2)}{\mu} - 1 \qquad (12)$$

where $\mu(F^2) = \int y \, dF^2$, or the mean for the square of distribution F.
Or, equivalently:

$$G = \frac{1}{\mu} \left( \int y \, dF^2 - \int y \, dF \right) = \frac{1}{\mu} \left( \int F \, dy - \int F^2 \, dy \right) \qquad (13)$$

It is easy to see that distribution F² has all the properties of a regular distribution. F² is a monotonic transformation of F, and, hence, is itself a monotonically increasing function bounded by [0,1].
Expression (12) essentially says that the Gini coefficient is equal to the (normalized) difference between the mean for the square of the distribution (F²) and the regular mean. The expression is presented in Figure 8 in graphical form. In the case when income is normalized by the mean, the Gini coefficient is equal to the area between the distribution function F and the squared distribution function F².
Figure 8. Graphical representation of the Gini coefficient as the area between the distribution function F and the squared distribution function F². (Curves: F and F²; axes: y and F, from 0 to 1, with 1/2 marked; annotation: Area = Gini)
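The area interpretation can be verified numerically. The sketch below assumes a log-normal distribution with ln(y) having mean 5 and standard deviation 0.25, integrates F - F² over income by the midpoint rule, divides by the mean to normalize income, and compares the result with the closed-form log-normal Gini coefficient. The truncation point and step count are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

# Assumed log-normal parameters: mean and standard deviation of ln(y).
theta, sigma = 5.0, 0.25
std_normal = NormalDist()
mu = math.exp(theta + sigma ** 2 / 2)


def cdf(y):
    """Log-normal cumulative distribution function F(y)."""
    return std_normal.cdf((math.log(y) - theta) / sigma)


def area_between_f_and_f_squared(upper=1000.0, steps=20000):
    """Midpoint-rule integral of F - F^2, expressed in units of normalized income."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        f = cdf((k + 0.5) * h)
        total += (f - f * f) * h
    return total / mu  # dividing by mu rescales dy to d(y / mu)


area = area_between_f_and_f_squared()
closed_form = 2 * std_normal.cdf(sigma / math.sqrt(2)) - 1
print(f"area = {area:.5f}, closed form = {closed_form:.5f}")
```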
References
Aitchison J. and J. Brown, 1957, The Lognormal Distribution, Cambridge University Press, Cambridge.
Bourguignon F., 1979, Decomposable Income Inequality Measures, Econometrica, Vol. 47, No. 4.
Shorrocks A.F., 1980, Inequality Measures, Econometrica, Vol. 48, No. 3.
Theil H., 1967, Economics and Information Theory, North-Holland, Amsterdam.
Theil H., 1989, Development of International Inequality, Journal of Econometrics, Vol. 42, No. 1, North-Holland, Amsterdam.
ANNEX
Figure A-1. Inequality in the former Soviet Union, 1990 (various indexes by absolute value). (Logarithmic vertical axis, 0.01 to 10; horizontal axis: USSR and the 15 republics; series: Gini-coefficient, Variance, Theil index, Theil 2 index, Decile ratio)
Figure A-2. Correlation between various inequality measures in the former Soviet Union, 1990 (inequality indexes normalized by standard deviation). (Vertical axis, -1.5 to 2.5; horizontal axis: USSR and the 15 republics; series: Gini-coefficient, Variance, Theil index, Theil 2 index, Decile ratio)
Table A-1. Inequality indexes for the former Soviet Union, 1990

| Characteristics | Share of inter-group variance | USSR | Russia | Ukraine | Belarus | Estonia | Latvia | Lithuania | Moldova | Armenia | Azerbaijan | Georgia | Kazakhstan | Uzbekistan | Kyrgyzstan | Tajikistan | Turkmenistan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gini-coefficient | 7.7% | 0.2599 | 0.2407 | 0.2155 | 0.2150 | 0.2294 | 0.2313 | 0.2272 | 0.2393 | 0.2431 | 0.3017 | 0.2583 | 0.2646 | 0.2777 | 0.2725 | 0.2753 | 0.2768 |
| Variance | 12.9% | 0.4760 | 0.4414 | 0.4003 | 0.3970 | 0.4209 | 0.4303 | 0.4229 | 0.4458 | 0.4524 | 0.5780 | 0.4794 | 0.4948 | 0.5323 | 0.5153 | 0.5372 | 0.5265 |
| Theil index | 14.3% | 0.1109 | 0.0946 | 0.0747 | 0.0748 | 0.0856 | 0.0871 | 0.0839 | 0.0935 | 0.0989 | 0.1525 | 0.1128 | 0.1194 | 0.1298 | 0.1268 | 0.1260 | 0.1302 |
| Theil 2 index | 15.8% | 0.1144 | 0.0946 | 0.0744 | 0.0749 | 0.0858 | 0.0888 | 0.0831 | 0.0928 | 0.0959 | 0.1489 | 0.1072 | 0.1134 | 0.1266 | 0.1212 | 0.1260 | 0.1254 |
| Decile ratio | N/A | 5.65 | 4.64 | 3.88 | 3.89 | 4.37 | 4.39 | 4.24 | 4.58 | 4.85 | 7.12 | 5.42 | 5.79 | 6.27 | 6.23 | 6.05 | 6.35 |
| Mean income | | 170.6 | 186.0 | 173.7 | 188.5 | 234.7 | 217.3 | 210.4 | 161.4 | 167.0 | 116.8 | 172.2 | 155.0 | 103.3 | 116.1 | 91.1 | 113.4 |
Table A-2. Income shares by decile, former Soviet Union, 1990

| Deciles | USSR | Russia | Ukraine | Belarus | Estonia | Latvia | Lithuania | Moldova | Armenia | Azerbaijan | Georgia | Kazakhstan | Uzbekistan | Kyrgyzstan | Tajikistan | Turkmenistan |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Decile 1 | 3.58% | 4.27% | 4.71% | 4.76% | 4.48% | 4.51% | 4.48% | 4.28% | 4.04% | 3.20% | 3.70% | 3.54% | 3.45% | 3.38% | 3.62% | 3.37% |
| Decile 2 | 5.44% | 5.79% | 6.16% | 6.20% | 5.99% | 5.99% | 5.97% | 5.81% | 5.80% | 4.91% | 5.50% | 5.46% | 5.30% | 5.31% | 5.46% | 5.27% |
| Decile 3 | 6.59% | 6.81% | 7.13% | 7.15% | 6.99% | 6.97% | 6.98% | 6.82% | 6.81% | 5.96% | 6.56% | 6.51% | 6.38% | 6.47% | 6.47% | 6.42% |
| Decile 4 | 7.62% | 7.74% | 7.99% | 8.01% | 7.88% | 7.86% | 7.89% | 7.75% | 7.73% | 6.96% | 7.56% | 7.50% | 7.35% | 7.47% | 7.34% | 7.41% |
| Decile 5 | 8.65% | 8.67% | 8.86% | 8.87% | 8.77% | 8.73% | 8.80% | 8.67% | 8.67% | 8.05% | 8.57% | 8.52% | 8.35% | 8.46% | 8.23% | 8.39% |
| Decile 6 | 9.74% | 9.68% | 9.80% | 9.78% | 9.72% | 9.66% | 9.77% | 9.66% | 9.68% | 9.28% | 9.68% | 9.63% | 9.44% | 9.56% | 9.26% | 9.48% |
| Decile 7 | 10.98% | 10.82% | 10.86% | 10.81% | 10.77% | 10.71% | 10.87% | 10.81% | 10.87% | 10.74% | 10.96% | 10.92% | 10.74% | 10.85% | 10.56% | 10.78% |
| Decile 8 | 12.50% | 12.22% | 12.17% | 12.07% | 12.01% | 11.99% | 12.20% | 12.25% | 12.34% | 12.62% | 12.57% | 12.55% | 12.43% | 12.49% | 12.28% | 12.46% |
| Decile 9 | 14.64% | 14.19% | 14.04% | 13.84% | 13.84% | 13.77% | 14.04% | 14.35% | 14.43% | 15.45% | 14.86% | 14.90% | 14.95% | 14.93% | 14.85% | 14.99% |
| Decile 10 | 20.25% | 19.83% | 18.29% | 18.51% | 19.55% | 19.82% | 19.00% | 19.60% | 19.63% | 22.82% | 20.05% | 20.48% | 21.62% | 21.06% | 21.92% | 21.43% |
Table A-3. Correlation coefficients between various inequality measures

| | Gini-coefficient | Variance | Theil index | Theil 2 index | Decile ratio |
|---|---|---|---|---|---|
| Gini-coefficient | 1 | | | | |
| Variance | 0.99512 | 1 | | | |
| Theil index | 0.99873 | 0.993 | 1 | | |
| Theil 2 index | 0.99791 | 0.9969 | 0.9964 | 1 | |
| Decile ratio | 0.99517 | 0.9908 | 0.998 | 0.9932 | 1 |
Figure A-3. Income distribution density, former Soviet Union, 1990. (Horizontal axis: rubles, 0 to 600; one curve each for USSR, Russia, Ukraine, Belarus, Estonia, Latvia, Lithuania, Moldova, Armenia, Azerbaijan, Georgia, Kazakhstan, Uzbekistan, Kyrgyzstan, Tajikistan and Turkmenistan)
Figure A-4. Histogram of income distribution, former Soviet Union, 1990. (Horizontal axis: proportional income boundaries from 30 to 472; vertical axis: share of population, 0.0% to 10.0%; one series each for USSR, Russia, Ukraine, Belarus, Estonia, Latvia, Lithuania, Moldova, Armenia, Azerbaijan, Georgia, Kazakhstan, Uzbekistan, Kyrgyzstan, Tajikistan and Turkmenistan)
Figure A-5. Graphical representation of the Theil index for combined distribution and between-group Theil index, former Soviet Union, 1990. (Horizontal axis: 0 to 1000)