THE AUSTRALIAN NATIONAL UNIVERSITY
WORKING PAPERS IN ECONOMICS AND ECONOMETRICS
Nonparametric Density Estimation for Stratified Samples
Robert Breunig The Australian National University∗
Working Paper No. 457
First version: February, 2001 Current version: November, 2005
ISBN: 086831 457 9
∗ Center for Economic Policy Research, Economics Program, Research School of Social Sciences, The Australian National University, Canberra ACT 0200, AUSTRALIA phone: +61 (0)2 6125-2148; fax: +61 (0)2 6125-0087, e-mail: [email protected]
First version: February, 2001 Current version: November 1, 2005
Abstract
In this paper, we consider the non-parametric, kernel estimate of the density, f(x), for data drawn from stratified samples. Much of the data used by social scientists is gathered in some type of complex survey violating the usual assumptions of independently and identically distributed data. Such effects induced by the survey structure are rarely considered in the literature on non-parametric density estimation, yet they may have serious consequences for our analysis, as shown in this paper.
A weighted estimator is developed which provides asymptotically unbiased density estimation for stratified samples. A data-based method for choosing the optimal bandwidth is suggested, using information on within-stratum variances and means. The weighted estimator and proposed bandwidth are shown to give smaller mean squared error for stratified samples than an unweighted estimator and a commonly used method of choosing the bandwidth. Several illustrations from simulation are provided. We also show that the optimal sampling scheme in this case is always stratified sampling proportional to size, irrespective of the stratum-specific densities.
Keywords: nonparametric density estimation, bandwidth selection, stratified sampling, optimal sampling
JEL Classification: C14, C42
∗ Center for Economic Policy Research, Economics Program, Research School of Social Sciences, The Australian National University, Canberra ACT 0200, AUSTRALIA phone: +61 (02) 6125-2148; fax: +61 (02) 6125-0087, e-mail: [email protected]. I am grateful to Aman Ullah for his suggestions and comments on this paper. I have also benefitted from conversations with Chris Skeels and Nilanjana Roy and the comments of participants in seminars at Georgia State University, the U.S. Bureau of Labor Statistics, University of California, Riverside, the Econometric Society Australasian Meetings 1998, University of Sydney and the University of New South Wales.
1 Introduction
The properties of kernel-based non-parametric density estimation are well-known for independent and identically distributed (i.i.d.) data. The use of such estimators has become quite common in applied econometric work for both descriptive and analytic purposes. Yet the vast majority of applied work in development and labor (as well as in other areas) uses survey data which violate the i.i.d. assumption. In contrast to the textbook assumption of a random sample from an infinite population, the data sets which economists use are generated using some type of complex sampling design.
Stratified sampling is probably the most commonly encountered sampling design in data used by applied economists. (Details of stratified sampling and more complex sampling schemes may be found in the econometric literature in Pudney (1989), Deaton (1997), Ullah and Breunig (1998) or in one of the traditional statistics texts such as Kish (1965) or Thompson (1992).) Though stratified sampling may be quite complicated in application, the primary effect of such sampling is that the population elements enter the sample with unequal probabilities. Therefore, in order to estimate model parameters we need to account for these unequal sampling probabilities.
The purpose of this paper is to begin the task of developing non-parametric density estimation for stratified survey data. Breunig (2001) extends the non-parametric kernel estimator to clustered data and demonstrates the large pointwise bias which results from ignoring the clustering in the data. This paper complements that work by considering stratified sampling.
The plan of this paper is as follows. We give a brief introduction to nonparametric density estimation. We then proceed to the main result: development of a weighted non-parametric density estimator which is asymptotically unbiased for stratified samples. Incorporating the sampling information into the choice of bandwidth, we provide the optimal bandwidth for the case where the data in each stratum are normally distributed. We examine the properties of the proposed bandwidth numerically and through simulation. We then derive the optimal sampling allocation for the stratified density estimator. Surprisingly, it differs from the optimal allocation for estimation of the mean.
2 Nonparametric Density Estimation
The problem of estimating the density of a random variable given a sample of data has long been of interest to economists. When examining variables such as river flow or income distribution, density estimation allows the comparison of distributions across regions and over time. Density estimation can also be a useful descriptive tool to give a visual picture of data.
Traditionally, density estimation involved the specification of a particular functional form combined with estimates of parameters through maximum likelihood or some other method. For example, if an underlying distribution is known to be normal, it suffices to estimate the mean and standard deviation by usual methods to fully characterize the density. Log-normal, exponential, Pareto, and Gamma distributions have all been used as parametric specifications for examining the distribution of income.
The great drawback of parametric methods, however, is that they require knowledge of the true underlying density. This is rarely, if ever, known. In examining income distribution changes over time, different parametric densities may show different changes in distribution, thus making analysis sensitive to the parameterization chosen.
Non-parametric density estimation, on the other hand, allows for estimation of variable densities without the imposition of particular distributional forms. The histogram is a popular non-parametric method of analyzing densities. However, it is not a smooth representation of data (smoothness, in addition to being aesthetically desirable, is important when representing continuous distributions). This and its many other mathematical disadvantages have led to an extensive literature on smoothed, non-parametric estimates of density beginning with the important contributions of Rosenblatt (1956) and Parzen (1962). For summaries of the non-parametric literature and subsequent developments, see Silverman (1986), Hardle (1990), Pagan and Ullah (1999).
In the next section, we extend the non-parametric techniques of density estimation to the analysis of data gathered using stratified survey methods.
2.1 Stratified Samples
Let us consider density estimation for data chosen under stratified sampling. Consider the following population model,
$$ Y_{ij}, \quad i = 1, \ldots, M, \quad j = 1, \ldots, N_i. $$
The total number of elements in the population is $\sum N_i = N$ and the proportion of elements in each stratum, $i$, is $\omega_i = N_i/N$. We treat the finite population within each stratum as an i.i.d. random observation from an infinite (super) population with density $g_i^*$, with mean $\mu_i$ and variance $\sigma_i^2$. We will only restrict these densities by the requirement that the first two moments exist and are finite for each stratum.
Now consider a sample, where $n_i$ elements are drawn independently from each stratum (i.e. a stratified sample). The total sample size is $\sum n_i = n$. The $n_i$ may or may not be equal. Since both the $n_i$ and the $\omega_i$ may vary, the sample inclusion probabilities are not equal for all elements in the sample. They will, however, be equal for all elements in the same stratum. The probability that the $j$-th element in the $i$-th stratum is included in the sample is $\pi_{ij} = \pi_i = n_i/N_i$.
Rosenblatt's (1956) kernel estimator for the density in the $i$th stratum, based on the sample of size $n_i$, may be written as
$$ \hat g_i(y) = \frac{1}{h_n N_i \pi_i} \sum_{j=1}^{N_i} d_j K_j \tag{1} $$
where $K_j = K\left(\frac{y_{ij}-y}{h_n}\right)$ is a kernel function and $\pi_i = n_i/N_i$. The $d_j$ are equal to one if $Y_{ij}$ is in the sample and zero otherwise. For the case where the finite population in the $i$th stratum, $Y_{ij}$, $j = 1, \ldots, N_i$, is an i.i.d. random observation from an infinite population with density $g_i^*$, we assume that $(n_i, N_i) \to \infty$ such that $\pi_i \to c > 0$ and $h_n$ satisfies $\lim_{n_i \to N_i} h_n = h_N$, in which case
$$ E\,\hat g_i(y) = (N_i \pi_i)^{-1} N_i\, (E\, d_j) \left( E\, \frac{1}{h_N} K\left(\frac{Y_{ij}-y}{h_N}\right) \right) \to g_i^* \tag{2} $$
and
$$ (n_i h_n)\, V(\hat g_i(y)) \to g_i^* \int K^2(\psi)\, d\psi. \tag{3} $$
The details of (1) through (3) can be worked out by following Rosenblatt (1956) or Pagan and Ullah (1999).
The parameter of interest is the overall distribution of data in the population,
$$ f(Y) = \sum_{i=1}^{M} \omega_i g_i^*. \tag{4} $$
For ease of exposition, we drop the $*$ in the notation that follows.
Using the sample data from all strata ($l = 1, \ldots, n$), the usual estimator for the density at a point $y$ is
$$ \hat f(y) = \frac{1}{N \pi h_n} \sum_{l=1}^{N} d_l K_l \tag{5} $$
where $h$ is the window width, and the kernel $K(\cdot)$ is a symmetric function which satisfies:
(A1) (i) $\int K(\psi)\, d\psi = 1$
(ii) $\int \psi K(\psi)\, d\psi = 0$
(iii) $\int \psi^2 K(\psi)\, d\psi = \mu_2 < \infty$
We can re-write (5) as
$$ \hat f(y) = \frac{1}{nh} \sum_{i=1}^{M} \sum_{j=1}^{n_i} K\left(\frac{y_{ij}-y}{h}\right) = \sum_{i=1}^{M} \frac{n_i}{n}\, \hat g_i(y) \tag{6} $$
where $\hat g_i(y)$ is the estimate of the density for stratum $i$ at the point $y$. This estimate of the population density, $f(y)$, is thus a sample-weighted average of the density estimates for each stratum. This estimator will not be unbiased for the parameter of interest. To see this, we write
$$ E \hat f(y) = \sum_{i=1}^{M} \frac{n_i}{n}\, E \hat g_i(y) \tag{7} $$
and by Taylor's series expansion
$$ E \hat g_i(y) = g_i(y) + bias_i(h) \tag{8} $$
where $bias_i(h)$ represents bias terms which will depend upon $h$. This provides
$$ E \hat f(y) = \sum_{i=1}^{M} \frac{n_i}{n}\, g_i(y) + \sum_{i=1}^{M} \frac{n_i}{n}\, bias_i(h). \tag{9} $$
Usually we choose $h$ such that $h \to 0$ as $n_i \to \infty$, therefore the $bias_i(h)$ terms will become small as the $n_i$ become large. Even then, however, we still have bias arising from the fact that we are implicitly weighting the stratum-specific densities by the sample proportions.
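As a small worked illustration of this implicit weighting, take the non-proportional design used in the simulation study of this paper: two strata of equal population size sampled with $n_1 = 1000$ and $n_2 = 2000$. Writing $\omega_i = N_i/N$ for the population proportions (the symbol $\omega$ is a notational assumption made here), the limit of the unweighted estimator differs from the target (4) by a fixed multiple of the difference between the two stratum densities, no matter how small $h$ becomes:

```latex
\omega_1 = \omega_2 = \tfrac{1}{2}, \qquad
\frac{n_1}{n} = \tfrac{1}{3}, \quad \frac{n_2}{n} = \tfrac{2}{3}
\;\Longrightarrow\;
\sum_{i=1}^{2} \frac{n_i}{n}\, g_i(y) - \sum_{i=1}^{2} \omega_i\, g_i(y)
  = \tfrac{1}{6}\,\bigl( g_2(y) - g_1(y) \bigr).
```

The difference vanishes only when $g_1 = g_2$ or when the sample shares equal the population shares.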
It is thus clear that the density estimate, $\hat f(y)$, will only be asymptotically unbiased for (4) when
$$ \text{(i)}\ \frac{n_i}{N_i} = \frac{n}{N} \quad \text{or} \quad \text{(ii)}\ g_i = g \ \ \forall i. \tag{10} $$
These conditions are unlikely to be met in most surveys. It is a common feature of surveys that sampling is disproportionate, violating condition (i). Even when the original survey design is such that the sample inclusion probabilities are equal in all strata, varying rates of non-response and other factors usually make the sampling disproportionate. Disproportionate sampling is often a desired trait when particular populations of interest are sampled more heavily relative to the rest of the population (SIPP of the US Census, for example) or when cost restricts sampling (e.g. the case of LSMS data from the World Bank, where the lower cost of sampling in urban areas leads to higher sampling proportions in these areas). Though we are interested in an overall estimate of the density, it is problematic to assume that variables of interest will be identically distributed in different strata. Ignoring either this disproportionality in the survey design or the differences between strata will lead to biased estimation, even in the simple case of non-parametric kernel density estimation.
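The solution is a weighted estimator
$$ \hat f_w(y) = \frac{1}{h \sum w_i} \sum_{i=1}^{M} \sum_{j=1}^{n_i} w_i K\left(\frac{y_{ij}-y}{h}\right) \tag{11} $$
where $w_i \propto N_i/n_i$. We set the weights proportional to the inverse of the selection probabilities. If we further require that $\sum w_i = 1$, then $w_i = \frac{N_i}{N n_i}$. Then
$$ \hat f_w(y) = \sum_{i=1}^{M} \frac{N_i}{N}\, \hat g_i(y) = \sum_{i=1}^{M} \omega_i\, \hat g_i(y). \tag{12} $$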
As noted above, however, this is not unbiased for (4) since $\hat g_i(y)$ is not unbiased for $g_i$. This bias will depend upon the choice of window width, $h$. Writing $bias_i$ for $bias_i(h)$, we can write
$$ E \hat f_w(y) = \sum_{i=1}^{M} \omega_i (g_i + bias_i) \tag{13} $$
and
$$ bias\left(\hat f_w(y)\right) = \sum_{i=1}^{M} \omega_i\, bias_i \tag{14} $$
where the typical bias term up to $O(h^2)$ will depend on the second derivative of the true underlying density
$$ bias_i = \frac{h^2}{2}\, g_i''\, \mu_2. \tag{15} $$
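Assuming that the sampling is independent between strata (which is usually the case), we can also write
$$ Var\left(\hat f_w(y)\right) = \omega_1^2\, var(\hat g_1) + \omega_2^2\, var(\hat g_2) + \ldots + \omega_M^2\, var(\hat g_M) \tag{16} $$
and up to $O\left(\frac{1}{nh}\right)$
$$ \int Var\left(\hat f_w(y)\right) dy = \frac{1}{h} \left( \int (K(\psi))^2\, d\psi \right) \sum_{i=1}^{M} \frac{\omega_i^2}{n_i}. \tag{17} $$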
Silverman (1986) provides details of the non-stratified case for sampling with replacement. If we consider each stratum as such a sample, it is then straightforward to work out (13) through (17).
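The weighted estimator in (11) and (12) can be sketched in code as follows. This is a minimal illustration under stated assumptions, not the author's implementation: it assumes a standard normal kernel, and the function name `weighted_kde`, its arguments, the window width 0.3 and the example strata are all hypothetical. Passing the sample shares instead of the population shares reproduces the unweighted estimator in (6).

```python
import numpy as np

def weighted_kde(y_grid, samples, shares, h):
    """Return sum_i shares[i] * g_hat_i(y) on y_grid, with a standard normal kernel."""
    y_grid = np.asarray(y_grid, dtype=float)
    estimate = np.zeros_like(y_grid)
    for x, share in zip(samples, shares):
        x = np.asarray(x, dtype=float)
        u = (x[None, :] - y_grid[:, None]) / h            # (y_ij - y) / h
        kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        g_hat = kernel.sum(axis=1) / (x.size * h)         # stratum-level estimate
        estimate += share * g_hat
    return estimate

# Hypothetical example: equal-sized strata, stratum 2 sampled twice as heavily.
rng = np.random.default_rng(0)
stratum_1 = rng.normal(0.0, 1.0, size=1000)
stratum_2 = rng.normal(3.0, 1.0, size=2000)
grid = np.linspace(-6.0, 9.0, 601)

f_w = weighted_kde(grid, [stratum_1, stratum_2], [0.5, 0.5], h=0.3)      # eq. (12)
f_u = weighted_kde(grid, [stratum_1, stratum_2], [1 / 3, 2 / 3], h=0.3)  # eq. (6)
```

In this example the weighted estimate places roughly half of its mass on each mode, while the unweighted estimate shifts mass toward the oversampled stratum.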
Proposition 1: If the densities of strata 1 through $M$ are given as $g_1$ through $g_M$, the population density $f(y)$ is estimated using a kernel satisfying (A1), and a stratified sample of data is drawn independently in each stratum, then the window width which minimizes the integrated mean-squared error of $\hat f_w(y)$ will be
$$ h_{st} = \mu_2^{-2/5} \left[ \int (K(\psi))^2\, d\psi \right]^{1/5} \left( \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} \right)^{1/5} \left( \int \Big[ \sum_{i=1}^{M} \omega_i g_i'' \Big]^2 dy \right)^{-1/5}. \tag{18} $$
Proof: Using (13) through (17) we write the mean squared error of $\hat f_w(y)$ as $Var\left(\hat f_w(y)\right) + \left[ bias\left(\hat f_w(y)\right) \right]^2$. The integrated mean squared error is then
$$ \int \left\{ Var\left(\hat f_w(y)\right) + \left[ bias\left(\hat f_w(y)\right) \right]^2 \right\} dy. $$
We minimize this expression with respect to $h$ to get the result in Proposition 1.
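The minimization step can be spelled out as follows. The shorthand $A$ and $B$ is introduced here purely for illustration and does not appear in the paper; $\omega_i$ denotes the population proportion of stratum $i$. Substituting (14), (15) and (17) into the integrated mean squared error gives a function of $h$ with one decreasing and one increasing term:

```latex
\mathrm{IMSE}(h) = \frac{A}{h} + B h^{4}, \qquad
A = \int (K(\psi))^{2}\, d\psi \, \sum_{i=1}^{M} \frac{\omega_i^{2}}{n_i}, \qquad
B = \frac{\mu_2^{2}}{4} \int \Big[ \sum_{i=1}^{M} \omega_i g_i'' \Big]^{2} dy,
\\[4pt]
\frac{d\,\mathrm{IMSE}}{dh} = -\frac{A}{h^{2}} + 4 B h^{3} = 0
\;\Longrightarrow\;
h^{5} = \frac{A}{4B}
\;\Longrightarrow\;
h = \mu_2^{-2/5} \Big[ \int (K(\psi))^{2}\, d\psi \Big]^{1/5}
    \Big( \sum_{i=1}^{M} \frac{\omega_i^{2}}{n_i} \Big)^{1/5}
    \Big( \int \Big[ \sum_{i=1}^{M} \omega_i g_i'' \Big]^{2} dy \Big)^{-1/5}.
```

Since $A > 0$ and $B > 0$, the second derivative $2A/h^{3} + 12 B h^{2}$ is positive, so the stationary point is a minimum.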
In order to implement this result, we need to know the second derivative of the true underlying density. Of course, this will normally not be available. One solution to this problem is to specify a family of distributions which will allow a value to be assigned to the term $\int \left[ \sum_{i=1}^{M} \omega_i g_i'' \right]^2 dy$ in (18). For the i.i.d. case, it has been shown that when $f(y)$ is a normal density the optimal window width will be $h^* = 1.06\, \sigma n^{-1/5}$, where $\sigma$ is the standard deviation of $y$. This choice of window width is commonly employed in econometrics software packages (Shazam, for example) and is a frequently used starting point for other bandwidth selection techniques such as cross-validation.
Here it is natural to ask whether a similar reference window width can be derived based upon underlying normal distributions in all of the strata. Corollary 1 gives the value of that reference window width.
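Corollary 1: If $g_1$ through $g_M$ are normally distributed with mean $\mu_i$ and variance $\sigma_i^2$ and the density is estimated using a standard normal kernel, then the optimal window width (in the mean squared error sense) will be
$$ h_{st} = 0.87 \left( \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} \right)^{1/5} (\theta_1 + \theta_2)^{-1/5} \tag{19} $$
where $\theta_1$ is a weighted sum of stratum-specific standard deviations
$$ \theta_1 = \frac{3}{8} \sum_{i=1}^{M} \omega_i^2 \sigma_i^{-5} $$
and $\theta_2$ is a weighted sum of a function of the distance between stratum means
$$ \theta_2 = \sum_{i=1}^{M} \sum_{j \neq i} \omega_i \omega_j \left( \sigma_i^2 + \sigma_j^2 \right)^{-5/2} \frac{1}{\sqrt{2}} \left( 3 - 6\, \frac{(\mu_i - \mu_j)^2}{\sigma_i^2 + \sigma_j^2} + \frac{(\mu_i - \mu_j)^4}{\left( \sigma_i^2 + \sigma_j^2 \right)^2} \right) e^{-\frac{1}{2} \frac{(\mu_i - \mu_j)^2}{\sigma_i^2 + \sigma_j^2}}. $$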
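Proof: For the case of a standard normal kernel $\mu_2^2 = 1$ and $\int (K(\psi))^2\, d\psi = \frac{1}{2\sqrt{\pi}}$. We can write
$$ \int \Big[ \sum_{i=1}^{M} \omega_i g_i'' \Big]^2 dy = \int \sum_{i=1}^{M} \omega_i^2 \left( g_i'' \right)^2 dy + \int \sum_{i=1}^{M} \sum_{j \neq i} \omega_i \omega_j\, g_i'' g_j''\, dy $$
and for normal densities replace $g_i''$ with
$$ \frac{1}{\sigma_i^3 \sqrt{2\pi}} \left[ \left( \frac{y - \mu_i}{\sigma_i} \right)^2 - 1 \right] e^{-\frac{1}{2} \left( \frac{y - \mu_i}{\sigma_i} \right)^2}. $$
Then the first term, $\int \sum_{i=1}^{M} \omega_i^2 (g_i'')^2\, dy$, becomes $\frac{3}{8\sqrt{\pi}} \sum_{i=1}^{M} \sigma_i^{-5} \omega_i^2$. The second term can be calculated by integrating the product of $g_i''$ and $g_j''$. Using these results, calculate $\theta_1$ and $\theta_2$ and replace in the formula for $h_{st}$.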
We note that the optimal window width depends on the strata sample sizes, $n_i$, only through the weighted sum $\sum \omega_i^2/n_i$, and shrinks as the $n_i$ grow. In the case where $n_i = \frac{n}{M}$ and $\omega_i = \frac{1}{M}$, then $\sum_{i=1}^{M} \frac{\omega_i^2}{n_i} = \frac{1}{n}$ and the window width will be proportional to $n^{-1/5}$ as in the non-stratified case, but the proportionality constant will differ from the usual $1.06\sigma$. When strata share common means and variances, and the population and sample proportions are equal in all strata, this result collapses to the usual optimal window width for normal density: $h^* = 1.06\, \sigma n^{-1/5}$. Note, however, that if $\sigma_i = \sigma$ for all strata but the stratum means differ, the population standard deviation will still differ from $\sigma$ and $h_{st} \neq 1.06\, \sigma n^{-1/5}$.
When strata share common means and variances, $h_{st} = 1.06\, \sigma \left( \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} \right)^{1/5}$. Thus even in the case of homogeneous populations in all strata, the optimal window width is different from the usual $h^*$ unless $\omega_i = \frac{n_i}{n}$. This is analogous to the case of estimation of the mean, where even when all strata have identical means, the variance of the estimator $\bar y$ is different for a stratified sample than for a simple random sample.
In practice we can replace $\sigma_i$ with some consistent estimator like $\sqrt{s_i^2}$ and $\mu_i$ with its estimate, $\bar y_i$.
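The reference window width of Corollary 1 can be sketched as a short function. This is an illustrative sketch, not code from the paper: the name `h_st_normal` and its arguments are hypothetical, `theta_1` and `theta_2` are labels chosen here for the two sums in the corollary, the cross term carries the factor $1/\sqrt{2}$ because the double sum runs over all ordered pairs $j \neq i$, and the unrounded constant $(1/2)^{1/5} \approx 0.8706$ is used in place of 0.87.

```python
import numpy as np

def h_st_normal(pop_shares, sample_sizes, means, sds):
    """Reference window width for normal strata and a standard normal kernel."""
    w = np.asarray(pop_shares, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(sds, dtype=float)

    # Own-stratum terms: (3/8) * sum_i w_i^2 * sigma_i^(-5)
    theta_1 = (3.0 / 8.0) * np.sum(w**2 * sd**-5.0)

    # Cross-stratum terms over all ordered pairs (i, j) with j != i
    s2 = sd[:, None]**2 + sd[None, :]**2      # sigma_i^2 + sigma_j^2
    d2 = (mu[:, None] - mu[None, :])**2       # (mu_i - mu_j)^2
    terms = (w[:, None] * w[None, :] * s2**-2.5 / np.sqrt(2.0)
             * (3.0 - 6.0 * d2 / s2 + d2**2 / s2**2) * np.exp(-0.5 * d2 / s2))
    theta_2 = terms.sum() - np.trace(terms)   # drop the i == j diagonal

    return 0.5**0.2 * np.sum(w**2 / n)**0.2 * (theta_1 + theta_2)**-0.2

# Two equal-sized strata, n_1 = n_2 = 500, unit standard deviations, means 0 and 2.
# Table 1 reports h_st = 0.36956 for this configuration.
h_example = h_st_normal([0.5, 0.5], [500, 500], [0.0, 2.0], [1.0, 1.0])
```

In applied work the means and standard deviations passed to such a function would be the stratum-level sample estimates described above.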
2.1.1 Numerical Properties
To examine the effects of using the optimally chosen window width, $h_{st}$, as opposed to $h^*$, the reference window width for the i.i.d. case, we consider the simplest case of two strata. Consider two normally distributed strata with means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$. In Figure 1, we see the effects on the window width as we vary the means of the two strata, holding the standard deviations constant. Then we hold means constant, while allowing the standard deviations of the two strata to become increasingly different.
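For the case of two strata, we can write the window width in (19) as a function of the difference in means, $\phi = \mu_2 - \mu_1$, and the ratio of standard deviations, $\gamma = \frac{\sigma_2}{\sigma_1}$. In that case, $\theta_1$ and $\theta_2$ become
$$ \theta_1 = \frac{3}{8}\, \sigma_1^{-5} \left( \omega_1^2 + \gamma^{-5} \omega_2^2 \right) $$
and
$$ \theta_2 = \sqrt{2}\, \omega_1 \omega_2 \left( \sigma_1^2 (1 + \gamma^2) \right)^{-5/2} \left( 3 - 6\, \frac{\phi^2}{\sigma_1^2 (1 + \gamma^2)} + \frac{\phi^4}{\left( \sigma_1^2 (1 + \gamma^2) \right)^2} \right) e^{-\frac{\phi^2}{2 \sigma_1^2 (1 + \gamma^2)}}. $$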
For the purpose of the illustration, we fix $\omega_1 = \omega_2 = \frac{1}{2}$, $n_1 = n_2 = 500$, and $\sigma_1^2 = 1$. In Figure 1a, we can see that as the difference between strata means increases, $h^*$ grows without bound. $h_{st}$, on the other hand, increases for a period, but will asymptotically approach 0.305435. (This is $\lim_{\phi \to \infty} h_{st}$ for this set of parameter values.) Intuitively, the optimal window width, $h_{st}$, increases when the combined strata are unimodal, but once the means are far enough apart for the density to exhibit bi-modality, $h_{st}$ begins to decrease. The effect of this is that the density estimation is essentially being conducted separately on each stratum, the small window width giving near zero weight to comparisons between elements in different strata. Figure 1b provides the same illustration for strata with identical means, but increasingly different standard deviations. Again, $h_{st}$ is not going to grow without bound because it takes into account the fact that the increasing sample variation is the result of two strata with two different underlying distributions. (It is possible to show that $\lim_{\gamma \to \infty} h_{st} = 0.35085284$.)
Table 1 presents the values of $h^*$ and $h_{st}$ at various points from Figure 1. The last column of Table 1 gives the ratio of the approximate IMSE (up to $O\left(\frac{1}{nh}\right)$) of $\hat f_w(y)$ using a standard normal kernel and employing both $h^*$ and $h_{st}$. We compare their ratio as a measure of the efficiency loss of using $h^*$. Here the integrated mean squared error is numerically calculated using
$$ IMSE\left[ \hat f_w(y) \right] = \int \left\{ Var\left(\hat f_w(y)\right) + \left[ bias\left(\hat f_w(y)\right) \right]^2 \right\} dy = \frac{1}{h} \left( \int (K(\psi))^2\, d\psi \right) \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} + \frac{h^4}{4}\, \mu_2^2 \int \Big[ \sum_{i=1}^{M} \omega_i g_i'' \Big]^2 dy $$
and the appropriate values for $h$ and the other variables based upon a standard normal kernel, and two normal densities with $\mu_1 = 0$, $\sigma_1 = 1$, and mean and standard deviation of the second stratum as specified. The top half of the table compares the loss of efficiency for two strata with equal standard deviations as the difference between strata means increases. In the bottom half of the table, the means are held constant while strata standard deviations vary. The results from the last column of Table 1 are presented in graphical form in Figures 2 and 3.
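The numerical IMSE calculation described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the paper's own code: the function name `approx_imse`, the grid limits (ten standard deviations beyond each stratum mean) and the grid size are hypothetical choices, and the integral of the squared second derivative is approximated by a simple Riemann sum.

```python
import numpy as np

def approx_imse(h, pop_shares, sample_sizes, means, sds):
    """Approximate IMSE of the weighted estimator (normal strata, normal kernel)."""
    w = np.asarray(pop_shares, dtype=float)
    n = np.asarray(sample_sizes, dtype=float)
    mu = np.asarray(means, dtype=float)
    sd = np.asarray(sds, dtype=float)

    # Second derivative of the population density sum_i w_i g_i on a fine grid
    y = np.linspace(np.min(mu - 10.0 * sd), np.max(mu + 10.0 * sd), 20001)
    z = (y[None, :] - mu[:, None]) / sd[:, None]
    g2 = (z**2 - 1.0) * np.exp(-0.5 * z**2) / (sd[:, None]**3 * np.sqrt(2.0 * np.pi))
    f2 = (w[:, None] * g2).sum(axis=0)
    roughness = np.sum(f2**2) * (y[1] - y[0])   # Riemann sum for int (f'')^2 dy

    r_k = 1.0 / (2.0 * np.sqrt(np.pi))          # int K^2 for the normal kernel
    variance_part = r_k * np.sum(w**2 / n) / h
    bias_part = 0.25 * h**4 * roughness         # mu_2 = 1 for the normal kernel
    return variance_part + bias_part

# Proportional sampling, means 0 and 2, unit standard deviations (a Table 1 row)
design = ([0.5, 0.5], [500, 500], [0.0, 2.0], [1.0, 1.0])
imse_reference = approx_imse(0.53252, *design)    # window width h* from Table 1
imse_stratified = approx_imse(0.36956, *design)   # window width h_st from Table 1
efficiency_ratio = imse_stratified / imse_reference
```

For this row Table 1 reports IMSE values of 0.00135 and 0.00095 and a ratio of 0.70549, which the sketch should reproduce up to rounding.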
Given the use of the weighted estimator for the density, using the proper window width gives large improvements in mean squared error over the standard reference window width. This is true even when the sampling is proportional (to stratum size). As the two strata become increasingly different (either in mean or in standard deviation) the gains in integrated mean squared error become quite large.
As can be seen in Figure 2, for the case of proportional sampling, when the difference between means is greater than 2, using $h^*$ results in very large efficiency losses compared to using $h_{st}$. This corresponds to the results presented in the simulation below. (See Figures 5 and 6.) Comparing Figure 3 with Figure 2, we see that for the case of disproportionate sampling, the relative loss of IMSE is not much different than in the case of proportional sampling. As we will see from the simulation, however, the bias is much greater using $h^*$. Since the efficiency measure considered here includes integrated bias, it is perhaps not a good measure of the pointwise bias from using $h^*$. However, it is quite clear from the figures presented in the simulation exercise below that this pointwise bias will be unacceptably large.
2.1.2 Simulation study
In a simulation study we consider the proposed optimal window width, $h_{st}$, versus $h^* = 1.06\, \sigma n^{-1/5}$ and $h_a = 0.9 \cdot \mathrm{MIN}\left( \sigma, \frac{\text{interquartile range}}{1.34} \right) n^{-1/5}$. $h_a$ has been shown to be superior to $h^*$ for mixtures of normals and bimodal densities, see Silverman (1986). For clarity, we consider sampling from two strata, where the population in each stratum is equal and the underlying densities are normal with mean $\mu_i$ and standard deviation $\sigma_i$. For the simulation, we fix $n_1 = 1000$, $\mu_1 = 0$ and $\sigma_1 = 1$ while varying the sample size, the mean, and the standard deviation of stratum 2 only. Proportional sampling thus implies $\frac{n_2}{n_1} = 1$, otherwise the sampling is disproportionate.
For proportional sampling, we consider the benchmark case when there is no difference in mean or standard deviation between the two strata. We then consider how estimation changes using the proposed $h_{st}$ as the difference between the two strata means increases, as the difference between the standard deviations increases, and as both change. We then consider the same cases for disproportionate sampling.
We conduct 1000 repetitions for each case. Each repetition involves drawing a sample from the two strata, estimating the three candidate window widths ($h^*$, $h_a$, and $h_{st}$) based upon the formulas above (with $\sigma$ replaced by $s$, and $\mu$ by $\bar y$), and estimating the non-parametric density at 200 points. The figures give the average estimate of the density over the 1000 repetitions. Table 2 presents the average calculated window widths and the various combinations which have been considered in the simulation exercise.
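One step of such a repetition, the calculation of the two pooled-sample reference window widths, might look like the sketch below. It is an assumption-laden illustration: the function name `reference_bandwidths`, the random seed and the example design (means 0 and 3, unit standard deviations, 1000 draws per stratum) are hypothetical, and the robust rule is taken to be Silverman's $0.9 \min(s, \mathrm{IQR}/1.34)\, n^{-1/5}$.

```python
import numpy as np

def reference_bandwidths(x):
    """Return (h_star, h_a) computed from a pooled sample x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    s = x.std(ddof=1)                               # pooled sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])
    h_star = 1.06 * s * n**-0.2                     # normal reference rule
    h_a = 0.9 * min(s, (q75 - q25) / 1.34) * n**-0.2  # robust (IQR-based) rule
    return h_star, h_a

# One hypothetical repetition: proportional sampling, strata differ only by mean
rng = np.random.default_rng(1)
pooled = np.concatenate([rng.normal(0.0, 1.0, 1000), rng.normal(3.0, 1.0, 1000)])
h_star, h_a = reference_bandwidths(pooled)
```

Averaging such values over many repetitions gives quantities comparable to the window-width columns of Table 2; the stratified window width would additionally require the stratum-level means and standard deviations.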
We first consider the case of proportional sampling ($n_1 = n_2 = 1000$) when both strata have mean zero and variance one. As can be seen in Figure 4 and from the corresponding row in Table 2, the average $h_{st}$ is the same as the average $h^*$ and the two average density estimates are identical. This is as expected given the discussion above. In this case, the stratification is spurious since the two strata are exactly identical. Note that in practice, however, the density estimate using $h^*$ will be superior to that using $h_{st}$ since calculation of $h_{st}$ involves computing two strata means and two strata standard deviations instead of one total sample standard deviation. The estimation of four quantities instead of one introduces more variability into the estimate of $h_{st}$ than $h^*$.
In Figures 5 and 6, we compare density estimation when both strata have standard deviation equal to one, but have different means. (This gives a population which is a mixture of normals.) When the difference in strata means is such that the overall population density remains unimodal (Figure 5), $h_{st}$ performs better than either $h^*$ or $h_a$. The integrated mean squared error using $h_{st}$ is 27% less than that using $h^*$. (See Figure 2 and Table 1 in section 2.1.1.)
The improvement provided by using $h_{st}$ as opposed to $h^*$ or $h_a$ is dramatic when the difference between the strata means grows and the overall density becomes bi-modal. As can be seen in Figure 6, $h^*$ tends to oversmooth the peaks. $h_a$ gives improved performance and reduces this over-smoothing, but $h_{st}$ can be seen to match the peaks even better than either $h^*$ or $h_a$.
When means between strata are equal, but variances differ, the same result holds. $h_a$ improves performance over $h^*$, but $h_{st}$ matches the density better than either. This case is considered in Figure 7. Figure 8 presents the density estimates when both strata means and standard deviations differ, but sampling is proportional. Using $h_{st}$ provides a much better match of the true underlying density, since it takes into account the different strata-specific distributions.
As noted above, proportional sampling will tend to be the exception in most cross-sectional data sets used by economists. The proposed optimal window width, $h_{st}$, combined with the weighted density estimator of (11), proves to be a very powerful tool for non-proportional sampling. This is examined in the remaining figures.
When the sampling is not proportionate and the strata differ in either means or variances, the unweighted estimator will be biased as discussed above. This is clear from Figures 9 and 10 where we compare density estimation for two strata with equal standard deviations but different means. In both cases, the weighted estimator using $h_{st}$ clearly outperforms unweighted estimation with any window width. (We present, therefore, only the comparison between weighted estimation using $h_{st}$ and unweighted density estimation using $h^*$.) Here stratum 2 is sampled twice as intensively as stratum 1, thus the elements from stratum 2 receive a weight that is half that of elements in stratum 1. We would point out that this is not a particularly large difference in weights. In many of the data sets used in applied economic work, the sampling disproportion between certain strata exceeds a factor of 10, so the consequences of ignoring the weighting will be even more dramatic, with even larger resulting bias.
Figures 11 and 12 illustrate the case of equal strata means and different variances and the case of variation between strata of both means and standard deviations. Again, the same results hold. Large bias is incurred by ignoring the structure of the sampling.
2.1.3 Optimal allocation
If we have some information about the means and standard deviations in the various strata (perhaps from a previous survey), can we use that information to construct an optimal sampling allocation to minimize the integrated mean squared error of the estimator of $f(y)$? We know that in the case of stratified sampling for mean estimation, oversampling (relative to population proportions) strata with higher variance can give a more precise estimate of the mean. Does a similar result hold here?
Curiously, it turns out that proportional sampling will be the optimal allocation in all cases, given that we are optimally choosing $h_{st}$.
Proposition 2: If the densities of strata 1 through $M$ are given as $g_1$ through $g_M$, the population density $f(y)$ is estimated using a kernel satisfying (A1), and the window width is chosen as (18), then the sampling allocation which minimizes the integrated mean squared error of $\hat f_w(y)$ is sampling proportional to stratum size, $n_i = n \omega_i$.
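Proof: The integrated mean squared error of $\hat f_w(y)$ is
$$ \frac{1}{h} \left( \int (K(\psi))^2\, d\psi \right) \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} + \frac{h^4}{4}\, \mu_2^2 \int \Big[ \sum_{i=1}^{M} \omega_i g_i'' \Big]^2 dy. $$
Replacing $h$ with $h_{st}$ yields
$$ IMSE\left( \hat f_w(y) \right) = \frac{5}{4}\, \mu_2^{2/5} \left( \int (K(\psi))^2\, d\psi \right)^{4/5} \left( \int \Big[ \sum_{i=1}^{M} \omega_i g_i'' \Big]^2 dy \right)^{1/5} \left( \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} \right)^{4/5} = k^* \cdot \left( \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} \right)^{4/5}. $$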
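If we minimize this quantity with respect to $n_1, \ldots, n_M$ constrained by $\sum n_i = n$, a typical equation will be
$$ \frac{\partial IMSE}{\partial n_i} = -\frac{4}{5}\, k^* \left( \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} \right)^{-1/5} \frac{\omega_i^2}{n_i^2} + \lambda = 0 $$
where $\lambda$ is the Lagrange multiplier. If we move $\lambda$ to the other side, multiply both sides of the equation by $n_i^2$, take the square root of both sides of the equation, and sum over strata, we obtain
$$ \sqrt{\lambda} \sum n_i = \sqrt{\lambda}\, n = \left( \frac{4}{5}\, k^* \right)^{1/2} \left( \sum_{i=1}^{M} \frac{\omega_i^2}{n_i} \right)^{-1/10}. $$
Replacing $\lambda$ in the above equation then provides $\frac{\omega_i^2}{n_i^2} = \frac{1}{n^2}$ and $n_i = n \omega_i$.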
This is a somewhat surprising result given the intuition from the mean estimation problem. However, in this case, we are not estimating any single point from each stratum, but instead the entire distribution. Even from a stratum whose distribution has a small variance we will need a sample size sufficiently large to estimate the contribution of that stratum to the overall population density.
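The allocation result can be checked numerically with a short sketch. Once the window width is chosen optimally, the sample sizes enter the integrated mean squared error only through the sum of squared population proportions divided by stratum sample sizes, so it suffices to compare that sum across allocations. Everything specific in the example (the function name `allocation_objective`, three strata with proportions 0.2, 0.3 and 0.5, a total sample of 3000, and random allocations drawn from a flat Dirichlet distribution) is a hypothetical choice made for illustration.

```python
import numpy as np

def allocation_objective(sample_sizes, pop_shares):
    """Sum of squared population shares over stratum sample sizes."""
    n = np.asarray(sample_sizes, dtype=float)
    w = np.asarray(pop_shares, dtype=float)
    return np.sum(w**2 / n)

rng = np.random.default_rng(2)
omega = np.array([0.2, 0.3, 0.5])      # hypothetical population proportions
n_total = 3000

# Proportional allocation n_i = n * omega_i gives exactly 1 / n
proportional = allocation_objective(n_total * omega, omega)

# 1000 random allocations with the same total sample size
random_allocations = n_total * rng.dirichlet(np.ones(3), size=1000)
smallest_random = min(allocation_objective(a, omega) for a in random_allocations)
```

By the Cauchy-Schwarz inequality the objective can never fall below $1/n$, so no random allocation should beat the proportional one, in line with Proposition 2.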
3 Conclusion
This paper is an attempt to begin unifying the literatures on survey design and nonparametric density estimation. As such we begin by analyzing the plug-in window width for normal data. This is the point of departure for most theoretical considerations of nonparametric density estimation, as well as a useful bandwidth to generate a first guess at the distribution or to serve as a starting point for other data-driven bandwidth selection techniques such as nearest neighbor and cross-validation. It is instructive to see how the standard results change when stratification is introduced.
The framework here is general and does not depend upon any minimal strata sample sizes. For the simple examples considered in the simulations, it may be that using a different bandwidth for each stratum, estimating individual stratum-specific densities and then combining them using (12) will provide an adequate alternative. However, for cases where there are many strata, some with only a handful of elements, such a technique is not feasible. The technique presented in this paper, to choose one bandwidth for all the data which takes into account the strata differences, will work in this case. Of course, knowledge of stratum-specific means and variances (or access to reasonable estimates thereof) is necessary. This is a problem which is frequently faced by survey statisticians in designing an optimal allocation. Using pretests, previous survey samples, or simple aggregation rules to combine similar strata are all ways around this problem, though all are imperfect. The technique does not relieve the researcher of the need to make intelligent choices according to the particular application.
Many problems remain to be addressed. One problem for economists is the dearth of information on the survey design behind the data. Occasionally we know something about survey weights, but rarely do we know which observations come from which particular strata or clusters. Lack of such information remains an impediment to improving our analytical techniques and we need to make a more concerted effort to have such information included with data.
Acknowledgements
I am grateful to Aman Ullah for his suggestions and comments on this paper. I have also benefitted from conversations with Chris Skeels and the comments of participants in seminars at Georgia State University, the U.S. Bureau of Labor Statistics, University of California, Riverside, the Econometric Society Australasian Meetings 1998, University of Sydney and the University of New South Wales.
References
[1] Breunig, R., 2001, Density Estimation for Clustered Data, Econometric Reviews, forthcoming.
[2] Deaton, A., 1997, The Analysis of Household Surveys: A Microeconometric Approach to Development Policy (Johns Hopkins University Press, Baltimore).
[3] Hardle, W., 1990, Applied Nonparametric Regression (Cambridge University Press, New York).
[4] Kish, L., 1965, Survey Sampling (John Wiley, New York).
[5] Pagan, A. and A. Ullah, 1999, Nonparametric Econometrics (Cambridge University Press, New York).
[6] Parzen, E., 1962, On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics 33, 1065-1076.
[7] Pudney, S., 1989, Modelling Individual Choice: The Econometrics of Corners, Kinks, and Holes (Basil Blackwell, Oxford).
[8] Rosenblatt, M., 1956, Remarks on Some Nonparametric Estimates of Density Function, Annals of Mathematical Statistics 27, 832-837.
[9] Silverman, B., 1986, Density Estimation for Statistics and Data Analysis (Chapman and Hall, London).
[10] Shazam User's Reference Manual, Version 8.0, 1997, McGraw-Hill.
[11] Survey of Income and Program Participation, Users Guide, 1991 (U.S. Department of Commerce, Economics and Statistics Administration, Bureau of the Census, Washington, DC).
[12] Thompson, S., 1992, Sampling (John Wiley & Sons, New York).
[13] Ullah, A. and R. Breunig, 1998, Econometric Analysis in Complex Surveys, in: D. Giles and A. Ullah, eds., Handbook of Applied Economic Statistics (Klewer, New York).
Table 1
Comparison of IMSE from weighted and unweighted estimation

Identical standard deviations (rows indexed by μ2 − μ1)

Sampling                  μ2 − μ1   h*        hst       IMSE(h*)   IMSE(hst)   IMSE(hst)/IMSE(h*)
Proportional              0.00      0.26626   0.26607   0.00133    0.00133     1.00000
                          0.50      0.28290   0.27455   0.00129    0.00128     0.99815
                          1.00      0.33283   0.30181   0.00119    0.00117     0.97922
                          1.50      0.41603   0.34725   0.00110    0.00102     0.92610
                          2.00      0.53252   0.36956   0.00135    0.00095     0.70549
                          2.50      0.68229   0.34059   0.00375    0.00104     0.27622
                          3.00      0.86535   0.31439   0.01320    0.00112     0.08496
                          3.50      1.08168   0.30216   0.03859    0.00117     0.03024
                          4.00      1.33130   0.29899   0.09292    0.00118     0.01269
                          4.50      1.61420   0.30016   0.19669    0.00118     0.00597
                          5.00      1.93039   0.30242   0.38729    0.00117     0.00301
Dis-proportional,         0.00      0.26626   0.27243   0.00146    0.00146     0.99897
n2/n1 = 2                 0.50      0.28290   0.28112   0.00141    0.00141     0.99992
                          1.00      0.33283   0.30903   0.00130    0.00128     0.98825
                          1.50      0.41603   0.35556   0.00118    0.00112     0.94466
                          2.00      0.53252   0.37840   0.00142    0.00105     0.73915
                          2.50      0.68229   0.34874   0.00380    0.00114     0.29948
                          3.00      0.86535   0.32191   0.01324    0.00123     0.09310
                          3.50      1.08168   0.30939   0.03862    0.00128     0.03321
                          4.00      1.33130   0.30615   0.09295    0.00130     0.01395
                          4.50      1.61420   0.30735   0.19671    0.00129     0.00657
                          5.00      1.93039   0.30966   0.38731    0.00128     0.00331

Identical means (rows indexed by σ2/σ1)

Sampling                  σ2/σ1     h*        hst       IMSE(h*)   IMSE(hst)   IMSE(hst)/IMSE(h*)
Proportional              1.00      0.26626   0.26607   0.00133    0.00133     1.00000
                          1.50      0.43267   0.31478   0.00145    0.00112     0.77163
                          2.00      0.66565   0.33664   0.00363    0.00105     0.28885
                          2.50      0.96519   0.34506   0.01280    0.00102     0.07982
                          3.00      1.33130   0.34834   0.04341    0.00101     0.02332
                          3.50      1.76397   0.34971   0.13070    0.00101     0.00772
                          4.00      2.26321   0.35034   0.35068    0.00101     0.00287
                          4.50      2.82901   0.35066   0.85215    0.00101     0.00118
                          5.00      3.46138   0.35082   1.90508    0.00101     0.00053
                          5.50      4.16031   0.35092   3.97038    0.00101     0.00025
                          6.00      4.92581   0.35097   7.79642    0.00101     0.00013
Dis-proportional,         1.00      0.26626   0.27243   0.00146    0.00146     0.99897
n2/n1 = 2                 1.50      0.43267   0.32231   0.00153    0.00123     0.80293
                          2.00      0.66565   0.34470   0.00368    0.00115     0.31292
                          2.50      0.96519   0.35333   0.01284    0.00112     0.08749
                          3.00      1.33130   0.35668   0.04343    0.00111     0.02562
                          3.50      1.76397   0.35809   0.13072    0.00111     0.00848
                          4.00      2.26321   0.35873   0.35070    0.00111     0.00316
                          4.50      2.82901   0.35905   0.85216    0.00111     0.00130
                          5.00      3.46138   0.35922   1.90509    0.00111     0.00058
                          5.50      4.16031   0.35932   3.97039    0.00111     0.00028
                          6.00      4.92581   0.35937   7.79643    0.00110     0.00014
Table 2
Results of simulation exercise
Weighted Non-Parametric Density Estimation for Stratified Samples
Average results of 1000 repetitions
Stratum 1 values fixed: n1 = 1000, μ1 = 0, σ1 = 1

Sampling           μ2   σ2   n2     h*        ha        hst       Figure   Case
Proportional       0    1    1000   0.23159   0.19553   0.23138   4        No difference between strata
                   2    1    1000   0.32764   0.27818   0.32071   5        Strata differ only by mean
                   3    1    1000   0.41773   0.35468   0.27351   6        Strata differ only by mean
                   0    3    1000   0.51800   0.31662   0.30318   7        Strata differ only by standard deviation
                   3    3    1000   0.62402   0.49159   0.30584   8        Both means and standard deviations differ
Non-proportional   2    1    2000   0.29363   0.24931   0.30272   9        Strata differ only by mean
                   3    1    2000   0.37006   0.31420   0.25824   10       Strata differ only by mean
                   0    3    2000   0.53760   0.36005   0.28605   11       Strata differ only by standard deviation
                   3    3    2000   0.61692   0.52379   0.28921   12       Both means and standard deviations differ