THE AUSTRALIAN NATIONAL UNIVERSITY

WORKING PAPERS IN ECONOMICS AND ECONOMETRICS

Nonparametric Density Estimation for Stratified Samples

Robert Breunig The Australian National University∗

Working Paper No. 457

First version: February, 2001 Current version: November, 2005

ISBN: 086831 457 9

∗ Center for Economic Policy Research, Economics Program, Research School of Social Sciences, The Australian National University, Canberra ACT 0200, AUSTRALIA phone: +61 (0)2 6125-2148; fax: +61 (0)2 6125-0087, e-mail: [email protected]


Abstract

In this paper, we consider the non-parametric, kernel estimate of the density, f(x), for data drawn from stratified samples. Much of the data used by social scientists is gathered in some type of complex survey violating the usual assumptions of independently and identically distributed data. Such effects induced by the survey structure are rarely considered in the literature on non-parametric density estimation, yet they may have serious consequences for our analysis, as shown in this paper.

A weighted estimator is developed which provides asymptotically unbiased density estimation for stratified samples. A data-based method for choosing the optimal bandwidth is suggested, using information on within-stratum variances and means. The weighted estimator and proposed bandwidth are shown to give smaller mean squared error for stratified samples than an unweighted estimator and a commonly used method of choosing the bandwidth. Several illustrations from simulation are provided. We also show that the optimal sampling scheme in this case is always stratified sampling proportional to size, irrespective of the stratum-specific densities.

Keywords: nonparametric density estimation, bandwidth selection, stratified sampling, optimal sampling

JEL Classification: C14, C42

∗ I am grateful to Aman Ullah for his suggestions and comments on this paper. I have also benefitted from conversations with Chris Skeels and Nilanjana Roy and the comments of participants in seminars at Georgia State University, the U.S. Bureau of Labor Statistics, University of California, Riverside, the Econometric Society Australasian Meetings 1998, University of Sydney and the University of New South Wales.

1 Introduction

The properties of kernel-based non-parametric density estimation are well-known for independent and identically distributed (i.i.d.) data. Their use has become quite common in applied econometric work for both descriptive and analytic purposes. Yet the vast majority of applied work in development and labor (as well as in other areas) uses survey data which violate the i.i.d. assumption. In contrast to the textbook assumption of a random sample from an infinite population, the data sets which economists use are generated using some type of complex sampling design.

Stratified sampling is probably the most commonly encountered sampling design in data used by applied economists. (Details of stratified sampling and more complex sampling schemes may be found in the econometric literature in Pudney (1989), Deaton (1997), Ullah and Breunig (1998) or in one of the traditional statistics texts such as Kish (1965) or Thompson (1992).) Though stratified sampling may be quite complicated in application, the primary effect of such sampling is that the population elements enter the sample with unequal probabilities. Therefore, in order to estimate model parameters we need to account for these unequal sampling probabilities.

The purpose of this paper is to begin the task of developing non-parametric density estimation for stratified survey data. Breunig (2001) extends the non-parametric kernel estimator to clustered data and demonstrates the large pointwise bias which results from ignoring the clustering in the data. This paper complements that work by considering stratified sampling.

The plan of this paper is as follows. We give a brief introduction to nonparametric density estimation. We then proceed to the main result: development of a weighted non-parametric density estimator which is asymptotically unbiased for stratified samples. Incorporating the sampling information into the choice of bandwidth, we provide the optimal bandwidth for the case where the data in each stratum are normally distributed. We examine the properties of the proposed bandwidth numerically and through simulation. We then derive the optimal sampling allocation for the stratified density estimator. Surprisingly, it differs from the optimal allocation for estimation of the mean.

2 Nonparametric Density Estimation

The problem of estimating the density of a random variable given a sample of data has long been of interest to economists. When examining variables such as river flow or income distribution, density estimation allows the comparison of distributions across regions and over time. Density estimation can also be a useful descriptive tool to give a visual picture of data.

Traditionally, density estimation involved the specification of a particular functional form combined with estimates of parameters through maximum likelihood or some other method. For example, if an underlying distribution is known to be normal, it suffices to estimate the mean and standard deviation by usual methods to fully characterize the density. Log-normal, exponential, Pareto, and Gamma distributions have all been used as parametric specifications for examining the distribution of income.

The great drawback of parametric methods, however, is that they require knowledge of the true underlying density. This is rarely, if ever, known. In examining income distribution changes over time, different parametric densities may show different changes in distribution, thus making analysis sensitive to the parametrization chosen.

Non-parametric density estimation, on the other hand, allows for estimation of variable densities without the imposition of particular distributional forms. The histogram is a popular non-parametric method of analyzing densities. However, it is not a smooth representation of data (smoothness, in addition to being aesthetically desirable, is important when representing continuous distributions). This and its many other mathematical disadvantages have led to an extensive literature on smoothed, non-parametric estimates of density, beginning with the important contributions of Rosenblatt (1956) and Parzen (1962). For summaries of the non-parametric literature and subsequent developments, see Silverman (1986), Härdle (1990), Pagan and Ullah (1999).

In the next section, we extend the non-parametric techniques of density estimation to the analysis of data gathered using stratified survey methods.

2.1 Stratified Samples

Let us consider density estimation for data chosen under stratified sampling. Consider the following population model,

$$Y_{ij}, \quad i = 1, \ldots, M, \quad j = 1, \ldots, N_i.$$

The total number of elements in the population is $\sum N_i = N$ and the proportion of elements in each stratum, $i$, is $W_i = N_i/N$. We treat the finite population within each stratum as an i.i.d. random observation from an infinite (super) population with density $g_i^*$, with mean $\mu_i$ and variance $\sigma_i^2$. We will only restrict these densities by the requirement that the first two moments exist and are finite for each stratum.

Now consider a sample, where $n_i$ elements are drawn independently from each stratum (i.e. a stratified sample). The total sample size is $\sum n_i = n$. The $n_i$ may or may not be equal. Since both the $n_i$ and the $N_i$ may vary, the sample inclusion probabilities are not equal for all elements in the sample. They will, however, be equal for all elements in the same stratum. The probability that the $j$-th element in the $i$-th stratum is included in the sample is $\pi_{ij} = \pi_i = n_i/N_i$.

Rosenblatt's (1956) kernel estimator for the density in the $i$th stratum, based on the sample of size $n_i$, may be written as

$$\hat g_i(y) = \frac{1}{h_n N_i \pi_i} \sum_{j=1}^{N_i} d_j K_j \tag{1}$$

where $K_j = K\!\left(\frac{Y_{ij} - y}{h_n}\right)$ is a kernel function and $\pi_i = n_i/N_i$. The $d_j$ are equal to one if $Y_{ij}$ is in the sample and zero otherwise. For the case where the finite population in the $i$th stratum, $Y_{ij}$, $j = 1, \ldots, N_i$, is an i.i.d. random observation from an infinite population with density $g_i^*$, we assume that $(n_i, N_i) \to \infty$ such that $\pi_i \to c > 0$ and $h_n$ satisfies $\lim_{n_i \to \infty} h_n = 0$, in which case

$$E\,\hat g_i(y) = (N_i \pi_i)^{-1} N_i \,(E\, d_j)\, E\!\left[\frac{1}{h_n} K\!\left(\frac{Y_{ij} - y}{h_n}\right)\right] \to g_i^*(y) \tag{2}$$

and

$$n_i h_n \, V(\hat g_i(y)) \to g_i^*(y) \int K^2(\psi)\, d\psi. \tag{3}$$

The details of (1) through (3) can be worked out by following Rosenblatt (1956) or Pagan and Ullah (1999).
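
As a minimal sketch of (1), assume a standard normal kernel and assume that only the $n_i$ sampled values of stratum $i$ are passed in, so that the indicator $d_j$ is implicit and $N_i \pi_i = n_i$. The function names below are illustrative and are not taken from the paper.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, one kernel that satisfies the usual symmetry conditions."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def stratum_density(y, sample, h):
    """Kernel estimate of one stratum's density at the point y.

    `sample` holds the n_i sampled values of the stratum, so the sum over
    d_j * K_j in (1) reduces to a sum over the sampled values and the
    divisor h * N_i * pi_i reduces to h * n_i.
    """
    n_i = len(sample)
    return sum(gaussian_kernel((y_ij - y) / h) for y_ij in sample) / (n_i * h)
```

With a single observation at the evaluation point and $h = 1$ the estimate equals the kernel's peak height, which is a convenient sanity check.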

The parameter of interest is the overall distribution of data in the population,

$$f(Y) = \sum W_i\, g_i^*. \tag{4}$$

For ease of exposition, we drop the $*$ in the notation that follows.

Using the sample data from all strata ($l = 1, \ldots, n$), the usual estimator for the density at a point $y$ is

$$\hat f(y) = \frac{1}{N \pi h_n} \sum_{l=1}^{N} d_l K_l \tag{5}$$

where $\pi = n/N$, $h = h_n$ is the window width, and the kernel $K(\cdot)$ is a symmetric function which satisfies:

(A1) (i) $\int K(\psi)\, d\psi = 1$

(ii) $\int \psi K(\psi)\, d\psi = 0$

(iii) $\int \psi^2 K(\psi)\, d\psi = \mu_2 < \infty$

We can re-write (5) as

$$\hat f(y) = \frac{1}{nh} \sum_{i=1}^{M} \sum_{j=1}^{n_i} K\!\left(\frac{y_{ij} - y}{h}\right) = \sum_{i=1}^{M} \frac{n_i}{n}\, \hat g_i(y) \tag{6}$$

where $\hat g_i(y)$ is the estimate of the density for stratum $i$ at the point $y$. This estimate of the population density, $f(y)$, is thus a sample-weighted average of the density estimates for each stratum. This estimator will not be unbiased for the parameter of interest. To see this, we write

$$E\,\hat f(y) = \sum_{i=1}^{M} \frac{n_i}{n}\, E\,\hat g_i(y) \tag{7}$$

and by Taylor's series expansion

$$E\,\hat g_i(y) = g_i(y) + \mathrm{bias}_i(h) \tag{8}$$

where $\mathrm{bias}_i(h)$ represents bias terms which will depend upon $h$. This provides

$$E\,\hat f(y) = \sum_{i=1}^{M} \frac{n_i}{n}\, g_i(y) + \sum_{i=1}^{M} \frac{n_i}{n}\, \mathrm{bias}_i(h). \tag{9}$$

Usually we choose $h$ such that $h \to 0$ as $n_i \to \infty$, therefore the $\mathrm{bias}_i(h)$ terms will become small as the $n_i$ become large. Even then, however, we still have bias arising from the fact that we are implicitly weighting the stratum-specific densities by the sample proportions.

It is thus clear that the density estimate, $\hat f(y)$, will only be asymptotically unbiased for (4) when

$$\text{(i)}\ \frac{n_i}{N_i} = \frac{n}{N} \quad \text{or} \quad \text{(ii)}\ g_i = g \ \ \forall i. \tag{10}$$

These conditions are unlikely to be met in most surveys. It is a common feature of surveys that sampling is disproportionate, violating condition (i). Even when the original survey design is such that the sample inclusion probabilities are equal in all strata, varying rates of non-response and other factors usually make the sampling disproportionate. Disproportionate sampling is often a desired trait, either when particular populations of interest are sampled more heavily relative to the rest of the population (SIPP of the US Census, for example) or when cost restricts sampling (for example, the LSMS data from the World Bank, where lower cost of sampling in urban areas leads to higher sampling proportions in these areas). Though we are interested in an overall estimate of the density, it is problematic to assume that variables of interest will be identically distributed in different strata. Ignoring either this disproportionality in the survey design or the differences between strata will lead to biased estimation, even in the simple case of non-parametric kernel density estimation.
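
To make the source of this bias concrete, the following worked case uses hypothetical numbers that are not taken from the paper: two strata of equal population size, with stratum 2 sampled twice as heavily as stratum 1, and the smoothing bias ignored by letting $h \to 0$.

```latex
% Hypothetical illustration: W_1 = W_2 = 1/2 and n_2 = 2 n_1.
\begin{align*}
f(y) &= \tfrac12\, g_1(y) + \tfrac12\, g_2(y), \\
\lim_{h \to 0} E\,\hat f(y) &= \frac{n_1}{n}\, g_1(y) + \frac{n_2}{n}\, g_2(y)
  = \tfrac13\, g_1(y) + \tfrac23\, g_2(y), \\
\lim_{h \to 0} E\,\hat f(y) - f(y) &= \tfrac16 \bigl( g_2(y) - g_1(y) \bigr).
\end{align*}
```

The remaining bias does not shrink with the sample size. It vanishes only at points where the two stratum densities coincide, or everywhere if the strata are identically distributed.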

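The solution is a weighted estimator

$$\hat f_w(y) = \frac{1}{h \sum_{i=1}^{M} \sum_{j=1}^{n_i} w_i} \sum_{i=1}^{M} \sum_{j=1}^{n_i} w_i K\!\left(\frac{y_{ij} - y}{h}\right) \tag{11}$$

where $w_i \propto N_i/n_i$. We set the weights proportional to the inverse of the selection probabilities. If we further require that the weights sum to one over all sample observations, then $w_i = \frac{N_i}{N n_i}$. Then

$$\hat f_w(y) = \sum_{i=1}^{M} \frac{N_i}{N}\, \hat g_i(y) = \sum_{i=1}^{M} W_i\, \hat g_i(y). \tag{12}$$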

As noted above, however, this is not unbiased for (4) since $\hat g_i(y)$ is not unbiased for $g_i$. This bias will depend upon the choice of window width, $h$. Writing $\mathrm{bias}_i$ for $\mathrm{bias}_i(h)$, we can write

$$E\,\hat f_w(y) = \sum_{i=1}^{M} W_i\, (g_i + \mathrm{bias}_i) \tag{13}$$

and

$$\mathrm{bias}\bigl(\hat f_w(y)\bigr) = \sum_{i=1}^{M} W_i\, \mathrm{bias}_i \tag{14}$$

where the typical bias term up to $O(h^2)$ will depend on the second derivative of the true underlying density

$$\mathrm{bias}_i = \frac{h^2}{2}\, g_i''\, \mu_2. \tag{15}$$

Assuming that the sampling is independent between strata (which is usually the case), we can also write

$$Var\bigl(\hat f_w(y)\bigr) = W_1^2\, var(\hat g_1) + W_2^2\, var(\hat g_2) + \ldots + W_M^2\, var(\hat g_M) \tag{16}$$

and up to $O\!\left(\frac{1}{nh}\right)$

$$\int Var\bigl(\hat f_w(y)\bigr)\, dy = \frac{1}{h} \left[\int (K(\psi))^2\, d\psi\right] \sum_{i=1}^{M} \frac{W_i^2}{n_i}. \tag{17}$$

Silverman (1986) provides details of the non-stratified case for sampling with replacement. If we consider each stratum as such a sample, it is then straightforward to work out (13) through (17).
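
The weighted estimator can be sketched in a few lines. This is an illustration under stated assumptions rather than the paper's own code: it assumes a standard normal kernel, takes the stratum population sizes $N_i$ as known, and uses the normalised weights $N_i/(N n_i)$ so that the weights sum to one over the sample. All function and argument names are hypothetical.

```python
import math


def gaussian_kernel(u):
    """Standard normal density used as the kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def weighted_density(y, strata, pop_sizes, h):
    """Weighted kernel density estimate at y for a stratified sample.

    strata    : list of lists, the sampled values of each stratum
    pop_sizes : list of stratum population sizes N_i (assumed known)
    h         : a single window width shared by all strata
    """
    N = sum(pop_sizes)
    total = 0.0
    for sample, N_i in zip(strata, pop_sizes):
        # Weight per observation, proportional to the inverse inclusion
        # probability N_i / n_i and normalised to sum to one over the sample.
        w_i = N_i / (N * len(sample))
        total += w_i * sum(gaussian_kernel((y_ij - y) / h) for y_ij in sample)
    return total / h
```

Because each stratum's kernel sum is multiplied by $N_i/(N n_i)$, the result equals the population-share-weighted average of the stratum estimates, whatever the sampling fractions are.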

Proposition 1: If the densities of strata 1 through $M$ are given as $g_1$ through $g_M$, the population density $f(y)$ is estimated using a kernel density satisfying (A1), and a stratified sample of data is drawn independently in each stratum, then the window width which minimizes the integrated mean squared error of $\hat f_w(y)$ will be

$$h_{opt} = \mu_2^{-2/5} \left[\int (K(\psi))^2\, d\psi\right]^{1/5} \left(\sum_{i=1}^{M} \frac{W_i^2}{n_i}\right)^{1/5} \left[\int \left(\sum_{i=1}^{M} W_i\, g_i''\right)^{2} dy\right]^{-1/5}. \tag{18}$$

Proof: Using (13) through (17) we write the mean squared error of $\hat f_w(y)$ as $Var\bigl(\hat f_w(y)\bigr) + \bigl[\mathrm{bias}\bigl(\hat f_w(y)\bigr)\bigr]^2$. The integrated mean squared error is then

$$\int \left\{ Var\bigl(\hat f_w(y)\bigr) + \bigl[\mathrm{bias}\bigl(\hat f_w(y)\bigr)\bigr]^2 \right\} dy.$$

We minimize this expression with respect to $h$ to get the result in Proposition 1.
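
The minimization step can be written out explicitly. The shorthand $R(K)$, $S$ and $B$ is introduced here only for compactness and is not the paper's notation: $R(K) = \int (K(\psi))^2 d\psi$, $S = \sum_i W_i^2/n_i$, and $B = \int (\sum_i W_i g_i'')^2 dy$.

```latex
\begin{align*}
\mathrm{IMSE}(h) &= \frac{R(K)\, S}{h} + \frac{h^4}{4}\, \mu_2^2\, B, \\
\frac{d\, \mathrm{IMSE}}{dh} &= -\frac{R(K)\, S}{h^2} + h^3\, \mu_2^2\, B = 0
  \;\Longrightarrow\; h^5 = \frac{R(K)\, S}{\mu_2^2\, B}, \\
h_{opt} &= \mu_2^{-2/5}\, R(K)^{1/5}\, S^{1/5}\, B^{-1/5}.
\end{align*}
```

The second derivative of the IMSE with respect to $h$ is positive for $h > 0$, so this stationary point is the minimum.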

In order to implement this result, we need to know the second derivative of the true underlying density. Of course, this will normally not be available. One solution to this problem is to specify a family of distributions which will allow a value to be assigned to the term $\int \left[\sum_{i=1}^{M} W_i\, g_i''\right]^2 dy$ in (18). For the i.i.d. case, it has been shown that when $f(y)$ is normally distributed the optimal window width will be $h^* = 1.06\, \sigma n^{-1/5}$, where $\sigma$ is the standard deviation of $y$. This choice of window width is commonly employed in econometrics software packages (Shazam, for example) and is a frequently used starting point for other bandwidth selection techniques such as cross-validation.

Here it is natural to ask whether a similar reference window width can be derived based upon underlying normal distributions in all of the strata. Corollary 1 gives the value of that reference window width.

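Corollary 1: If $g_1$ through $g_M$ are normally distributed with mean $\mu_i$ and variance $\sigma_i^2$ and the density is estimated using a standard normal kernel, then the optimal window width (in the mean squared error sense) will be

$$h_{opt} = 0.87 \left(\sum_{i=1}^{M} \frac{W_i^2}{n_i}\right)^{1/5} (\xi_1 + \xi_2)^{-1/5} \tag{19}$$

where $\xi_1$ is a weighted sum of stratum-specific standard deviations

$$\xi_1 = \frac{3}{8} \sum_{i=1}^{M} W_i^2\, \sigma_i^{-5}$$

and $\xi_2$ is a weighted sum of a function of the distance between stratum means

$$\xi_2 = \sum_{i=1}^{M} \sum_{l \neq i} W_i W_l\, \frac{(\sigma_i^2 + \sigma_l^2)^{-5/2}}{\sqrt{2}} \left(3 - 6\, \frac{(\mu_i - \mu_l)^2}{\sigma_i^2 + \sigma_l^2} + \frac{(\mu_i - \mu_l)^4}{(\sigma_i^2 + \sigma_l^2)^2}\right) e^{-\frac{1}{2} \frac{(\mu_i - \mu_l)^2}{\sigma_i^2 + \sigma_l^2}}.$$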

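Proof: For the case of a standard normal kernel $\mu_2 = 1$ and $\int (K(\psi))^2\, d\psi = \frac{1}{2\sqrt{\pi}}$. We can write

$$\int \left[\sum_{i=1}^{M} W_i\, g_i''\right]^2 dy = \int \sum_{i=1}^{M} W_i^2\, (g_i'')^2\, dy + \int \sum_{i=1}^{M} \sum_{l \neq i} W_i W_l\, g_i''\, g_l''\, dy$$

and for normal densities replace $g_i''$ with

$$\frac{1}{\sigma_i^3 \sqrt{2\pi}} \left[\left(\frac{y - \mu_i}{\sigma_i}\right)^2 - 1\right] e^{-\frac{1}{2} \left(\frac{y - \mu_i}{\sigma_i}\right)^2}.$$

Then the first term, $\int \sum_{i=1}^{M} W_i^2 (g_i'')^2\, dy$, becomes $\frac{3}{8\sqrt{\pi}} \sum_{i=1}^{M} \sigma_i^{-5} W_i^2$. The second term can be calculated by integrating the product of $g_i''$ and $g_l''$. Using these results, calculate $\xi_1$ and $\xi_2$ and replace in the formula for $h_{opt}$.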

We note that the optimal window width depends on the strata sample sizes $n_i$ only through the weighted sum $\sum_{i=1}^{M} W_i^2/n_i$, and becomes smaller as the $n_i$ increase. In the case where $n_i = n/M$ and $W_i = 1/M$, then $\sum_{i=1}^{M} \frac{W_i^2}{n_i} = \frac{1}{n}$ and the window width will be proportional to $n^{-1/5}$ as in the non-stratified case, but the proportionality constant will differ from the usual $1.06\sigma$. When strata share common means and variances, and the population and sample proportions are equal in all strata, this result collapses to the usual optimal window width for a normal density: $h^* = 1.06\, \sigma n^{-1/5}$. Note, however, that even if $\sigma_i = \sigma$ for all strata, the population standard deviation may still differ from $\sigma$ (when the stratum means differ), in which case $h_{opt} \neq 1.06\, \sigma n^{-1/5}$.

When strata share common means and variances, $h_{opt} = 1.06\, \sigma \left(\sum_{i=1}^{M} \frac{W_i^2}{n_i}\right)^{1/5}$. Thus even in the case of homogeneous populations in all strata, the optimal window width is different from the usual $h^*$ unless $W_i = n_i/n$. This is analogous to the case of estimation of the mean, where even when all strata have identical means, the variance of the estimator $\bar y$ is different for a stratified sample than for a simple random sample.

In practice we can replace $\sigma_i$ with some consistent estimator like $\sqrt{s_i^2}$ and $\mu_i$ with its estimate, $\bar y_i$.
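
The reference window width for normal strata can be computed directly from stratum shares, sample sizes, means and standard deviations. The sketch below is one possible transcription of that formula under the assumption of a standard normal kernel; the function name and argument layout are hypothetical, and in practice the means and standard deviations would be the stratum sample estimates.

```python
import math


def h_opt_normal(W, n, mu, sigma):
    """Reference window width for normal strata and a standard normal kernel.

    W     : stratum population shares N_i / N
    n     : stratum sample sizes n_i
    mu    : stratum means (or their estimates)
    sigma : stratum standard deviations (or their estimates)
    """
    M = len(W)
    # Sample-size term: sum of W_i^2 / n_i.
    S = sum(W[i] ** 2 / n[i] for i in range(M))
    # Own-stratum term: (3/8) * sum of W_i^2 * sigma_i^(-5).
    xi1 = 0.375 * sum(W[i] ** 2 * sigma[i] ** -5 for i in range(M))
    # Cross-stratum term: sum over ordered pairs (i, l) with l != i.
    xi2 = 0.0
    for i in range(M):
        for l in range(M):
            if l == i:
                continue
            s2 = sigma[i] ** 2 + sigma[l] ** 2
            t = (mu[i] - mu[l]) ** 2 / s2
            poly = 3.0 - 6.0 * t + t * t
            xi2 += (W[i] * W[l] * s2 ** -2.5 / math.sqrt(2.0)
                    * poly * math.exp(-0.5 * t))
    return 0.87 * S ** 0.2 * (xi1 + xi2) ** -0.2
```

For two identical, proportionally sampled strata the value agrees with $1.06\, \sigma n^{-1/5}$ up to the rounding of the constant 0.87, which is a quick way to check a transcription of the formula.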

2.1.1 Numerical Properties

To examine the effects of using the optimally chosen window width, $h_{opt}$, as opposed to $h^*$, the reference window width for the i.i.d. case, we consider the simplest case of two strata. Consider two normally distributed strata with means $\mu_1$ and $\mu_2$ and standard deviations $\sigma_1$ and $\sigma_2$. In Figure 1, we see the effects of the window width as we vary the means of the two strata, holding the standard deviations constant. Then we hold means constant, while allowing the standard deviations of the two strata to become increasingly different.

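For the case of two strata, we can write the window width in (19) as a function of the difference in means, $\phi = \mu_2 - \mu_1$, and the ratio of standard deviations, $\delta = \sigma_2/\sigma_1$. In that case, $\xi_1$ and $\xi_2$ become

$$\xi_1 = \frac{3}{8}\, \sigma_1^{-5} \left(W_1^2 + \delta^{-5} W_2^2\right)$$

and

$$\xi_2 = \sqrt{2}\, W_1 W_2 \left(\sigma_1^2 (1 + \delta^2)\right)^{-5/2} \left(3 - 6\, \frac{\phi^2}{\sigma_1^2 (1 + \delta^2)} + \frac{\phi^4}{\left(\sigma_1^2 (1 + \delta^2)\right)^2}\right) e^{-\frac{\phi^2}{2 \sigma_1^2 (1 + \delta^2)}}.$$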

For the purpose of the illustration, we fix $W_1 = W_2 = \frac{1}{2}$, $n_1 = n_2 = 500$, and $\sigma_1^2 = 1$. In Figure 1a, we can see that as the difference between strata means increases, $h^*$ grows without bound. $h_{opt}$, on the other hand, increases for a period, but will asymptotically approach .305435. (This is $\lim_{\phi \to \infty} h_{opt}$ for this set of parameter values.) Intuitively, the optimal window width, $h_{opt}$, increases when the combined strata are unimodal, but once the means are far enough apart for the density to exhibit bi-modality, $h_{opt}$ begins to decrease. The effect of this is that the density estimation is essentially being conducted separately on each stratum, the small window width giving near zero weight to comparisons between elements in different strata. Figure 1b provides the same illustration for strata with identical means, but increasingly different standard deviations. Again, $h_{opt}$ is not going to grow without bound because it takes into account the fact that the increasing sample variation is the result of two strata with two different underlying distributions. (It is possible to show that $\lim_{\delta \to \infty} h_{opt} = .35085284$.)
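
Both limits can be checked by hand from the two-strata expressions. The calculation below assumes the parameter values of the illustration (equal population shares, 500 observations per stratum, unit variance in stratum 1, and unit variance in stratum 2 for the first limit) and uses the rounded constant 0.87.

```latex
\begin{align*}
\sum_i \frac{W_i^2}{n_i} &= 2 \cdot \frac{1/4}{500} = 0.001, \\
\phi \to \infty:\quad & \xi_2 \to 0,\quad \xi_1 = \tfrac38 \left(\tfrac14 + \tfrac14\right) = \tfrac{3}{16},
  \quad h_{opt} \to 0.87\, (0.001)^{1/5} \left(\tfrac{3}{16}\right)^{-1/5} \approx 0.3054, \\
\delta \to \infty:\quad & \xi_2 \to 0,\quad \xi_1 \to \tfrac38 \cdot \tfrac14 = \tfrac{3}{32},
  \quad h_{opt} \to 0.87\, (0.001)^{1/5} \left(\tfrac{3}{32}\right)^{-1/5} \approx 0.3509.
\end{align*}
```

In both cases the cross-stratum term disappears, so the limiting window width is determined by the own-stratum curvature terms alone.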

Table 1 presents the values of $h^*$ and $h_{opt}$ at various points from Figure 1. The last column of Table 1 gives the ratio of the approximate IMSE (up to $O\!\left(\frac{1}{nh}\right)$) of $\hat f_w(y)$ using a standard normal kernel and employing both $h^*$ and $h_{opt}$. We compare their ratio as a measure of the efficiency loss of using $h^*$. Here the integrated mean squared error is numerically calculated using

$$IMSE\bigl[\hat f_w(y)\bigr] = \int \left\{ Var\bigl(\hat f_w(y)\bigr) + \bigl[\mathrm{bias}\bigl(\hat f_w(y)\bigr)\bigr]^2 \right\} dy = \frac{1}{h} \left[\int (K(\psi))^2\, d\psi\right] \sum_{i=1}^{M} \frac{W_i^2}{n_i} + \frac{h^4}{4}\, \mu_2^2 \int \left[\sum_{i=1}^{M} W_i\, g_i''\right]^2 dy$$

and the appropriate values for $h$ and the other variables based upon a standard normal kernel, and two normal densities with $\mu_1 = 0$, $\sigma_1 = 1$, and mean and standard deviation of the second stratum as specified. The top half of the table compares the loss of efficiency for two strata with equal standard deviations as the difference between strata means increases. In the bottom half of the table, the means are held constant while strata standard deviations vary. The results from the last column of Table 1 are presented in graphical form in Figures 2 and 3.
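
A rough way to reproduce this kind of efficiency comparison for two normal strata is sketched below. It is an illustration under assumptions, not the paper's computation: it uses the closed forms available for a standard normal kernel, and it assumes that $h^*$ is evaluated with the standard deviation of the pooled (mixture) population and the total sample size. All names are hypothetical.

```python
import math

# Integral of K^2 for the standard normal kernel.
R_K = 1.0 / (2.0 * math.sqrt(math.pi))


def xi_sum_two_strata(W1, phi, delta, sigma1=1.0):
    """xi_1 + xi_2 for two normal strata.

    phi = mu_2 - mu_1, delta = sigma_2 / sigma_1.
    """
    W2 = 1.0 - W1
    s2 = sigma1 ** 2 * (1.0 + delta ** 2)
    xi1 = 0.375 * sigma1 ** -5 * (W1 ** 2 + delta ** -5 * W2 ** 2)
    t = phi ** 2 / s2
    xi2 = (math.sqrt(2.0) * W1 * W2 * s2 ** -2.5
           * (3.0 - 6.0 * t + t * t) * math.exp(-0.5 * t))
    return xi1 + xi2


def imse(h, S, xi):
    """Approximate IMSE: integrated variance plus integrated squared bias (mu_2 = 1)."""
    B = xi / math.sqrt(math.pi)  # integral of the squared weighted second derivative
    return R_K * S / h + h ** 4 / 4.0 * B


def imse_ratio(W1, n1, n2, phi, delta, sigma1=1.0):
    """IMSE using h* divided by IMSE using h_opt, for the weighted estimator."""
    W2 = 1.0 - W1
    S = W1 ** 2 / n1 + W2 ** 2 / n2
    xi = xi_sum_two_strata(W1, phi, delta, sigma1)
    h_opt = 0.87 * S ** 0.2 * xi ** -0.2
    # Assumption: h* uses the standard deviation of the pooled population.
    pop_var = W1 * sigma1 ** 2 + W2 * (delta * sigma1) ** 2 + W1 * W2 * phi ** 2
    h_star = 1.06 * math.sqrt(pop_var) * (n1 + n2) ** -0.2
    return imse(h_star, S, xi) / imse(h_opt, S, xi)
```

With identical strata the ratio is essentially one, and it rises well above one once the stratum means are several standard deviations apart, which is the qualitative pattern the text describes.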

Given the use of the weighted estimator for the density, using the proper window width gives large improvements in mean squared error over the standard reference window width. This is true even when the sampling is proportional (to stratum size). As the two strata become increasingly different (either in mean or in standard deviation) the gains in integrated mean squared error become quite large.

As can be seen in Figure 2, for the case of proportional sampling, when the difference between means is greater than 2, using $h^*$ results in very large efficiency losses compared to using $h_{opt}$. This corresponds to the results presented in the simulation below. (See Figures 5 and 6.) Comparing Figure 3 with Figure 2, we see that for the case of disproportionate sampling, the relative loss of IMSE is not much different than in the case of proportional sampling. As we will see from the simulation, however, the bias is much greater using $h^*$. Since the efficiency measure considered here includes integrated bias, it is perhaps not a good measure of the pointwise bias from using $h^*$. However, it is quite clear from the figures presented in the simulation exercise below that this pointwise bias will be unacceptably large.

2.1.2 Simulation study

In a simulation study we consider the proposed optimal window width, $h_{opt}$, versus $h^* = 1.06\, \sigma n^{-1/5}$ and $h_a = 0.9 \times \min\!\left(\sigma, \frac{\text{interquartile range}}{1.34}\right) n^{-1/5}$. $h_a$ has been shown to be superior to $h^*$ for mixtures of normals and bimodal densities, see Silverman (1986). For clarity, we consider sampling from two strata, where the population in each stratum is equal and the underlying densities are normal with mean $\mu_i$ and standard deviation $\sigma_i$. For the simulation, we fix $n_1 = 1000$, $\mu_1 = 0$ and $\sigma_1 = 1$ while varying the sample size, the mean, and the standard deviation of stratum 2 only. Proportional sampling thus implies $\frac{n_2}{n_1} = 1$, otherwise the sampling is disproportionate.

For proportional sampling, we consider the benchmark case when there is no difference in mean or standard deviation between the two strata. We then consider how estimation changes using the proposed $h_{opt}$ as the difference between the two strata means increases, as the difference between the standard deviations increases, and as both change. We then consider the same cases for disproportionate sampling.

We conduct 1000 repetitions for each case. Each repetition involves drawing a sample from the two strata, estimating the three candidate window widths ($h^*$, $h_a$, and $h_{opt}$) based upon the formulas above (with $\sigma$ replaced by $s$, and $\mu$ by $\bar y$), and estimating the non-parametric density at 200 points. The figures give the average estimate of the density over the 1000 repetitions. Table 2 presents the average calculated window widths and the various combinations which have been considered in the simulation exercise.
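
One repetition's two pooled-sample window widths can be sketched as follows. This is a hedged illustration, not the paper's simulation code: it draws a single two-stratum sample with Python's `random` module, computes $h^*$ and the interquartile-range rule from Silverman (1986) on the pooled data, and leaves out the stratified window width. The function names, the seed argument and the use of `statistics.quantiles` for the quartiles are all choices made here.

```python
import random
import statistics


def draw_two_strata(n1, n2, mu2, sigma2, seed=0):
    """Draw one stratified sample: stratum 1 is N(0, 1), stratum 2 is N(mu2, sigma2^2)."""
    rng = random.Random(seed)
    stratum1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
    stratum2 = [rng.gauss(mu2, sigma2) for _ in range(n2)]
    return stratum1, stratum2


def h_star(sample):
    """Normal reference window width, 1.06 * s * n^(-1/5), on the pooled sample."""
    return 1.06 * statistics.stdev(sample) * len(sample) ** -0.2


def h_a(sample):
    """Robust reference width: 0.9 * min(s, IQR / 1.34) * n^(-1/5)."""
    q1, _, q3 = statistics.quantiles(sample, n=4)
    spread = min(statistics.stdev(sample), (q3 - q1) / 1.34)
    return 0.9 * spread * len(sample) ** -0.2


stratum1, stratum2 = draw_two_strata(1000, 1000, mu2=4.0, sigma2=1.0)
pooled = stratum1 + stratum2
widths = {"h_star": h_star(pooled), "h_a": h_a(pooled)}
```

Since the robust rule multiplies a spread no larger than $s$ by 0.9 rather than 1.06, it is always the smaller of the two widths, which is why it oversmooths bimodal densities less.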

We first consider the case of proportional sampling ($n_1 = n_2 = 1000$) when both strata have mean zero and variance one. As can be seen in Figure 4 and from the corresponding row in Table 2, the average $h_{opt}$ is the same as the average $h^*$ and the two average density estimates are identical. This is as expected given the discussion above. In this case, the stratification is spurious since the two strata are exactly identical. Note that in practice, however, the density estimate using $h^*$ will be superior to that using $h_{opt}$ since calculation of $h_{opt}$ involves computing two strata means and two strata standard deviations instead of one total sample standard deviation. The estimation of four quantities instead of one introduces more variability into the estimate of $h_{opt}$ than $h^*$.

In Figures 5 and 6, we compare density estimation when both strata have standard deviation equal to one, but have different means. (This gives a population which is a mixture of normals.) When the difference in strata means is such that the overall population density remains unimodal (Figure 5), $h_{opt}$ performs better than either $h^*$ or $h_a$. The integrated mean squared error using $h_{opt}$ is 27% less than that using $h^*$. (See Figure 2 and Table 1 in section 2.1.1.)

The improvement provided by using $h_{opt}$ as opposed to $h^*$ or $h_a$ is dramatic when the difference between the strata means grows and the overall density becomes bi-modal. As can be seen in Figure 6, $h^*$ tends to oversmooth the peaks. $h_a$ gives improved performance and reduces this over-smoothing, but $h_{opt}$ can be seen to match the peaks even better than either $h^*$ or $h_a$.

When means between strata are equal, but variances differ, the same result holds. $h_a$ improves performance over $h^*$, but $h_{opt}$ matches the density better than either. This case is considered in Figure 7. Figure 8 presents the density estimates when both strata means and standard deviations differ, but sampling is proportional. Using $h_{opt}$ provides a much better match of the true underlying density, since it takes into account the different strata-specific distributions.

As noted above, proportional sampling will tend to be the exception in most cross-sectional data sets used by economists. The proposed optimal window width, $h_{opt}$, combined with the weighted density estimator of (11), proves to be a very powerful tool for non-proportional sampling. This is examined in the remaining figures.

When the sampling is not proportionate and the strata differ in either means or variances, the unweighted estimator will be biased as discussed above. This is clear from Figures 9 and 10 where we compare density estimation for two strata with equal standard deviations but different means. In both cases, the weighted estimator using $h_{opt}$ clearly outperforms unweighted estimation with any window width. (We present, therefore, only the comparison between weighted estimation using $h_{opt}$ and unweighted density estimation using $h^*$.) Here stratum 2 is sampled twice as intensively as stratum 1, thus the elements from stratum 2 receive a weight that is half that of elements in stratum 1. We would point out that this is not a particularly large difference in weights. In many of the data sets used in applied economic work, the sampling disproportion is greater than 10 between certain strata, so the results from ignoring the weighting in this case will be even more dramatic with even larger resulting bias.

Figures 11 and 12 illustrate the case of equal strata means and different variances and the case of variation between strata of both means and standard deviations. Again, the same results hold. Large bias is incurred by ignoring the structure of the sampling.

2.1.3 Optimal allocation

If we have some information about the means and standard deviations in the various strata (perhaps from a previous survey), can we use that information to construct an optimal sampling allocation to minimize the integrated mean squared error of the estimator of $f(y)$? We know that in the case of stratified sampling for mean estimation, oversampling (relative to population proportions) strata with higher variance can give a more precise estimate of the mean. Does a similar result hold here?

Curiously, it turns out that proportional sampling will be the optimal allocation in all cases, given that we are optimally choosing $h_{opt}$.

Proposition 2: If the densities of strata 1 through $M$ are given as $g_1$ through $g_M$, the population density $f(y)$ is estimated using a kernel density satisfying (A1), and the window width is chosen as (18), then the sampling allocation which minimizes the integrated mean squared error of $\hat f_w(y)$ is sampling proportional to stratum size, $n_i = n W_i$.

Proof: The integrated mean squared error of $\hat f_w(y)$ is

$$\frac{1}{h} \left[\int (K(\psi))^2\, d\psi\right] \sum_{i=1}^{M} \frac{W_i^2}{n_i} + \frac{h^4}{4}\, \mu_2^2 \int \left[\sum_{i=1}^{M} W_i\, g_i''\right]^2 dy.$$

Replacing $h$ with $h_{opt}$ yields

$$IMSE(\hat f_w(y)) = \frac{5}{4}\, \mu_2^{2/5} \left[\int (K(\psi))^2\, d\psi\right]^{4/5} \left[\int \left[\sum_{i=1}^{M} W_i\, g_i''\right]^2 dy\right]^{1/5} \left(\sum_{i=1}^{M} \frac{W_i^2}{n_i}\right)^{4/5} = k^* \left(\sum_{i=1}^{M} \frac{W_i^2}{n_i}\right)^{4/5}.$$

If we minimize this quantity with respect to $n_1, \ldots, n_M$ constrained by $\sum n_i = n$, a typical equation will be

$$\frac{\partial IMSE}{\partial n_i} = -\frac{4}{5}\, k^* \left(\sum_{i=1}^{M} \frac{W_i^2}{n_i}\right)^{-1/5} \frac{W_i^2}{n_i^2} + \lambda = 0$$

where $\lambda$ is the Lagrange multiplier. If we solve for $\lambda$, multiply both sides of the equation by $n_i^2$, take the square root of both sides of the equation, and sum over strata, we obtain

$$\sqrt{\lambda} \sum n_i = \sqrt{\lambda}\, n = \left(\frac{4}{5}\, k^*\right)^{1/2} \left(\sum_{i=1}^{M} \frac{W_i^2}{n_i}\right)^{-1/10}.$$

Replacing $\lambda$ in the above equation then provides $\frac{W_i^2}{n_i^2} = \frac{1}{n^2}$ and $n_i = n W_i$.

This is a somewhat surprising result given the intuition from the mean estimation problem. However, in this case, we are not estimating any single point from each stratum, but instead the entire distribution. Even from a stratum whose distribution has a small variance we will need a sample size sufficiently large to estimate the contribution of that stratum to the overall population density.
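The allocation result can also be checked numerically. The sketch below is illustrative only: it assumes two hypothetical strata with population shares 0.3 and 0.7 and a total sample of 1000, and it evaluates only the allocation-dependent factor $(\sum_j w_j^2/n_j)^{4/5}$ of the optimized IMSE, since the remaining constant does not involve the $n_j$. The function name `allocation_term` is not from the paper.

```python
import numpy as np

def allocation_term(weights, n_alloc):
    """S = sum_j w_j^2 / n_j, the only part of the optimized IMSE
    that depends on how the sample is split across strata."""
    weights = np.asarray(weights, dtype=float)
    n_alloc = np.asarray(n_alloc, dtype=float)
    return float(np.sum(weights**2 / n_alloc))

# Hypothetical population shares and total sample size.
weights = np.array([0.3, 0.7])
n_total = 1000

# Try every split (n1, n_total - n1) with at least one draw per stratum.
n1_grid = np.arange(1, n_total)
imse_factor = np.array(
    [allocation_term(weights, [n1, n_total - n1]) ** 0.8 for n1 in n1_grid]
)

best_n1 = int(n1_grid[np.argmin(imse_factor)])
print(best_n1, n_total - best_n1)
```

The grid search selects the split in which each stratum's sample share equals its population share, which is the proportional allocation $n_j = n w_j$. Note that the stratum variances never enter the calculation.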

Conclusion

This paper is an attempt to begin unifying the literatures on survey design and nonparametric density estimation. As such we begin by analyzing the plug-in window width for normal data, the point of departure for most theoretical considerations of nonparametric density estimation as well as a useful bandwidth to generate a first guess at the distribution or for use as a starting point for other data-driven bandwidth selection techniques such as nearest neighbor and cross-validation. It is instructive to see how the standard results change when stratification is introduced.

The framework here is general and does not depend upon any minimal strata sample sizes. For the simple examples considered in the simulations, it may be that using a different bandwidth for each stratum, estimating individual stratum-specific densities and then combining them using (12) will provide an adequate alternative. However, for cases where there are many strata, some with only a handful of elements, such a technique is not feasible. The technique presented in this paper, to choose one bandwidth for all the data which takes into account the strata differences, will work in this case. Of course, knowledge of stratum-specific means and variances (or access to reasonable estimates thereof) is necessary. This is a problem which is frequently faced by survey statisticians in designing an optimal allocation. Using pretests, previous survey samples, or simple aggregation rules to combine similar strata are all ways around this problem, though all are imperfect. The technique does not relieve the researcher of the need to make intelligent choices according to the particular application.
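A single-bandwidth weighted estimate of the kind discussed here can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the estimator is assumed to take the form $\hat{f}(y) = \sum_j w_j \frac{1}{n_j h}\sum_i K\big((y - y_{ij})/h\big)$, which is consistent with the variance term $\sum_j w_j^2/n_j$ in the proof of Proposition 2. The Gaussian kernel, the simulated strata, the weights, and the bandwidth value 0.35 are all hypothetical, and the function name `weighted_kde` is not from the paper.

```python
import numpy as np

def weighted_kde(y_grid, samples, weights, h):
    """Weighted kernel density estimate with one bandwidth for all strata.

    samples : list of 1-D arrays, one array of observations per stratum
    weights : population shares w_j of the strata (should sum to one)
    h       : common window width (supplied by the caller)
    """
    y_grid = np.asarray(y_grid, dtype=float)
    estimate = np.zeros_like(y_grid)
    for w_j, y_j in zip(weights, samples):
        y_j = np.asarray(y_j, dtype=float)
        # Scaled distances between every grid point and every observation.
        u = (y_grid[:, None] - y_j[None, :]) / h
        # Gaussian kernel (an assumption of this sketch).
        kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        # Stratum-level density estimate, weighted by population share.
        estimate += w_j * kernel.sum(axis=1) / (y_j.size * h)
    return estimate

# Hypothetical data: stratum 2 is oversampled relative to its share.
rng = np.random.default_rng(0)
samples = [rng.normal(0.0, 1.0, size=200), rng.normal(3.0, 1.0, size=400)]
weights = [0.5, 0.5]

y_grid = np.linspace(-8.0, 11.0, 1901)
density = weighted_kde(y_grid, samples, weights, h=0.35)

# Riemann-sum check that the estimate integrates to roughly one.
area = float(np.sum(density) * (y_grid[1] - y_grid[0]))
print(round(area, 3))
```

Because each stratum's contribution is divided by its own sample size before being weighted, oversampling a stratum does not distort its share of the estimated density. The weights restore the population proportions even when the allocation is not proportional.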

Many problems remain to be addressed. One problem for economists is the dearth of information on the survey design behind the data. Occasionally we know something about survey weights, rarely do we know which observations come from which particular strata or clusters. Lack of such information remains an impediment to improving our analytical techniques and we need to make a more concerted effort to have such information included with data.

Acknowledgements

I am grateful to Aman Ullah for his suggestions and comments on this paper. I have also benefited from conversations with Chris Skeels and the comments of participants in seminars at Georgia State University, the U.S. Bureau of Labor Statistics, University of California, Riverside, the Econometric Society Australasian Meetings 1998, University of Sydney and the University of New South Wales.


References

[1] Breunig, R., 2001, Density Estimation for Clustered Data, Econometric Reviews, forthcoming.

[2] Deaton, A., 1997, The Analysis of Household Surveys: A Microeconometric Approach to Development Policy (Johns Hopkins University Press, Baltimore).

[3] Härdle, W., 1990, Applied Nonparametric Regression (Cambridge University Press, New York).

[4] Kish, L., 1965, Survey Sampling (John Wiley, New York).

[5] Pagan, A. and A. Ullah, 1999, Nonparametric Econometrics (Cambridge University Press, New York).

[6] Parzen, E., 1962, On Estimation of a Probability Density Function and Mode, Annals of Mathematical Statistics 33, 1065-1076.

[7] Pudney, S., 1989, Modelling Individual Choice: The Econometrics of Corners, Kinks, and Holes (Basil Blackwell, Oxford).

[8] Rosenblatt, M., 1956, Remarks on Some Nonparametric Estimates of Density Function, Annals of Mathematical Statistics 27, 832-837.

[9] Silverman, B., 1986, Density Estimation for Statistics and Data Analysis (Chapman and Hall, London).

[10] Shazam User's Reference Manual, Version 8.0, 1997, McGraw-Hill.

[11] Survey of Income and Program Participation, Users Guide, 1991 (U.S. Department of Commerce, Economics and Statistics Administration, Bureau of the Census, Washington, DC).

[12] Thompson, S., 1992, Sampling (John Wiley & Sons, New York).

[13] Ullah, A. and R. Breunig, 1998, Econometric Analysis in Complex Surveys, in: D. Giles and A. Ullah, eds., Handbook of Applied Economic Statistics (Klewer, New York).


Table 1
Comparison of IMSE from weighted and unweighted estimation

Identical standard deviations

                      μ2 − μ1   h*        hst       IMSE(h*)   IMSE(hst)   IMSE(hst)/IMSE(h*)
Proportional          0.00      0.26626   0.26607   0.00133    0.00133     1.00000
sampling:             0.50      0.28290   0.27455   0.00129    0.00128     0.99815
                      1.00      0.33283   0.30181   0.00119    0.00117     0.97922
                      1.50      0.41603   0.34725   0.00110    0.00102     0.92610
                      2.00      0.53252   0.36956   0.00135    0.00095     0.70549
                      2.50      0.68229   0.34059   0.00375    0.00104     0.27622
                      3.00      0.86535   0.31439   0.01320    0.00112     0.08496
                      3.50      1.08168   0.30216   0.03859    0.00117     0.03024
                      4.00      1.33130   0.29899   0.09292    0.00118     0.01269
                      4.50      1.61420   0.30016   0.19669    0.00118     0.00597
                      5.00      1.93039   0.30242   0.38729    0.00117     0.00301

Dis-proportional      0.00      0.26626   0.27243   0.00146    0.00146     0.99897
sampling:             0.50      0.28290   0.28112   0.00141    0.00141     0.99992
n2/n1 = 2             1.00      0.33283   0.30903   0.00130    0.00128     0.98825
                      1.50      0.41603   0.35556   0.00118    0.00112     0.94466
                      2.00      0.53252   0.37840   0.00142    0.00105     0.73915
                      2.50      0.68229   0.34874   0.00380    0.00114     0.29948
                      3.00      0.86535   0.32191   0.01324    0.00123     0.09310
                      3.50      1.08168   0.30939   0.03862    0.00128     0.03321
                      4.00      1.33130   0.30615   0.09295    0.00130     0.01395
                      4.50      1.61420   0.30735   0.19671    0.00129     0.00657
                      5.00      1.93039   0.30966   0.38731    0.00128     0.00331

Identical means

                      σ2/σ1     h*        hst       IMSE(h*)   IMSE(hst)   IMSE(hst)/IMSE(h*)
Proportional          1.00      0.26626   0.26607   0.00133    0.00133     1.00000
sampling:             1.50      0.43267   0.31478   0.00145    0.00112     0.77163
                      2.00      0.66565   0.33664   0.00363    0.00105     0.28885
                      2.50      0.96519   0.34506   0.01280    0.00102     0.07982
                      3.00      1.33130   0.34834   0.04341    0.00101     0.02332
                      3.50      1.76397   0.34971   0.13070    0.00101     0.00772
                      4.00      2.26321   0.35034   0.35068    0.00101     0.00287
                      4.50      2.82901   0.35066   0.85215    0.00101     0.00118
                      5.00      3.46138   0.35082   1.90508    0.00101     0.00053
                      5.50      4.16031   0.35092   3.97038    0.00101     0.00025
                      6.00      4.92581   0.35097   7.79642    0.00101     0.00013

Dis-proportional      1.00      0.26626   0.27243   0.00146    0.00146     0.99897
sampling:             1.50      0.43267   0.32231   0.00153    0.00123     0.80293
n2/n1 = 2             2.00      0.66565   0.34470   0.00368    0.00115     0.31292
                      2.50      0.96519   0.35333   0.01284    0.00112     0.08749
                      3.00      1.33130   0.35668   0.04343    0.00111     0.02562
                      3.50      1.76397   0.35809   0.13072    0.00111     0.00848
                      4.00      2.26321   0.35873   0.35070    0.00111     0.00316
                      4.50      2.82901   0.35905   0.85216    0.00111     0.00130
                      5.00      3.46138   0.35922   1.90509    0.00111     0.00058
                      5.50      4.16031   0.35932   3.97039    0.00111     0.00028
                      6.00      4.92581   0.35937   7.79643    0.00110     0.00014


Table 2
Results of simulation exercise
Weighted Non-Parametric Density Estimation for Stratified Samples
Average results of 1000 repetitions
Stratum 1 values fixed: n1 = 1000, μ1 = 0, σ1 = 1

                    μ2   σ2   n2     hst       ha        h*        Figure   Description
Proportional        0    1    1000   0.23159   0.19553   0.23138   4        No difference between strata
sampling:           2    1    1000   0.32764   0.27818   0.32071   5        Strata differ only by mean
                    3    1    1000   0.41773   0.35468   0.27351   6        Strata differ only by mean
                    0    3    1000   0.51800   0.31662   0.30318   7        Strata differ only by standard deviation
                    3    3    1000   0.62402   0.49159   0.30584   8        Both means and standard deviations differ

Non-proportional    2    1    2000   0.29363   0.24931   0.30272   9        Strata differ only by mean
sampling:           3    1    2000   0.37006   0.31420   0.25824   10       Strata differ only by mean
                    0    3    2000   0.53760   0.36005   0.28605   11       Strata differ only by standard deviation
                    3    3    2000   0.61692   0.52379   0.28921   12       Both means and standard deviations differ
