IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 58, NO. 4, APRIL 2010

Refining Decisions After Losing Data: The Unlucky Broker Problem

Stefano Marano, Vincenzo Matta, and Fabio Mazzarella

Abstract—Consider a standard statistical hypothesis test, leading to a binary decision made by exploiting a certain dataset. Suppose that, later, part of the data is lost, and we want to refine the test by exploiting both the surviving data and the previous decision. What is the best one can do? Such a question, here referred to as the unlucky broker problem, can be addressed by very standard tools from detection theory, but the solution gives intriguing insights and is by no means obvious. We provide the general form of the optimal detectors and discuss in depth their modus operandi, ranging from simple likelihood ratio tests to more complex behaviors. Limiting cases, where either the surviving data or the initial decision is almost useless, are also discussed.

Index Terms—Decision refinement, detection, detection with side information.

I. MOTIVATION

A. The Unlucky Broker

Bernard is an unlucky broker. He is faced with the problem of suggesting an appropriate portfolio assessment to his customers by using two data sets: one in the public domain and another made of certain confidential information he has received. To suggest the appropriate investments, Bernard must decide between, say, a positive or a negative market trend during this week. His decision criterion is to fix a certain probability of wrongly predicting a positive trend, and to maximize the corresponding probability of correctly making such a prediction. Bernard makes his decision.

Just before visiting his customers to communicate his decision and to propose to them the appropriate course of action, Bernard is informed that his colleague Charlie lost his job as a consequence of having proposed many bad investments. Charlie, indeed, wrongly predicted a positive trend of the market the week before. At this point, Bernard wants to revise his own decision: he must ensure that the probability of wrongly predicting a positive trend be much lower than initially thought. Clearly, he understands that this would unavoidably imply also a much lower probability of correctly predicting a positive trend and, consequently, that his potential income will be substantially decreased. However, saving his job is Bernard's main priority.

Unfortunately, the unlucky Bernard has lost the files containing the confidential information. His new decision about the weekly trend must be based only on the public domain data set. What should Bernard do? Should he simply retain the previous decision, or should he ignore it and use only the currently available data set for a completely new decision? Or, what else?

Manuscript received May 06, 2009; accepted December 20, 2009. First published January 12, 2010; current version published March 10, 2010. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Andreas Jakobsson.

The authors are with the Department of Information and Electrical Engineering, University of Salerno, I-84084 Fisciano (SA), Italy.

Digital Object Identifier 10.1109/TSP.2010.2040412

B. To Operate or Not to Operate

Mark is a medical doctor and Paul is one of his patients. Based on the ensemble of certain clinical parameters, Mark should make a decision about two alternatives: to operate or not to operate on Paul. As in the previous case, the decision is made so as to maximize the probability of operating when it is appropriate, while limiting the probability of an unnecessary operation. Mark makes his decision about the surgery.

The day after, Mark is informed that many patients operated upon for the same disease that is suspected to affect Paul have incurred heart troubles. At this point, Mark wants to revise his original decision in order to cope with a much smaller probability of performing an unnecessary surgery. Unfortunately, some of the clinical tests are no longer available. Can we suggest what Mark should do at this point? He remembers his original decision, but now has available only a fraction of the clinical tests made by Paul. Is it better to maintain the original decision (after all, that was made using a more complete clinical picture), or is the original decision useless, so that a completely new decision based on the currently available data must be conceived? Perhaps Mark's intuition suggests that neither of these two extremes is the best option. But it is not obvious how to proceed.

C. Abstraction

In statistical terminology, the problem considered can be described as follows. One is initially faced with a standard binary statistical test between two simple hypotheses, such that a decision obeying the Neyman–Pearson (NP) optimality criterion is made on the basis of the available data set, made of independent, identically distributed (i.i.d.) observations. Then, part of the data is lost and one wants to make a new decision about the binary state of nature using the surviving data together with the previous decision; see Fig. 1.

Note that a similar problem arises in decentralized detection with a tandem (serial) architecture, in which each unit makes a decision on the basis of its own data and of the decision made by one neighbor; an abundant literature is available on this topic, see, e.g., [1]–[7]. In our case, however, the original decision does depend upon the surviving data as well, and this makes the problem essentially different from that considered in the literature.

If the decision must be taken at the same false alarm level as the original one, then the surviving data become irrelevant and the best one can do is to retain the original inference. This, as we will show, is well known and understood. On the other hand, it is interesting to investigate what happens when a different false alarm level is desired.

Fig. 1. Conceptual scheme of the unlucky broker problem: the two vectors represent the data, while the two decisions are the optimal decisions made by the two entities at their respective false alarm levels. The focus here is on how the second entity, called the unlucky broker, upon observing the surviving data and the previous decision, exploits such information in order to make the optimal final decision. Note that if the first entity had no access to the surviving data, then the problem would reduce to that of a two-sensor tandem distributed detection, as mentioned in Section I-C.

Since both decisions are binary variables, the issue can be rephrased by saying that our interest is in understanding when and why the original decision should be retained, or flipped.

We will consider such problems in this paper, which is organized as follows. In Section II we formalize the problem. The solution exploits standard tools from detection theory and is provided in Section III. In Section IV two specific problems of practical interest are discussed: the Gaussian shift-in-mean and the exponential shift-in-scale hypothesis tests. The main findings for arbitrary distributions are discussed in Section V. Section VI deals with a Bayesian variation of the problem, while Section VII concludes the paper. The Appendix contains some mathematical derivations.

II. PROBLEM FORMALIZATION

The examples discussed so far pose a common inference problem that we call the unlucky broker problem. We now provide a suitable formalization of the described situations. Then, the unlucky broker problem is solved by exploiting standard tools from statistical decision theory, and its special features are emphasized.

We use upper case letters to denote random variables, with the corresponding lower case letters representing the associated realizations; vectors are displayed in bold.

Let the two data vectors be independent continuous-valued random vectors. The following binary hypothesis test is to be solved:

(1)

where the notation is as follows: the pdfs appearing under the two hypotheses are the marginal probability density functions (pdf, for short) of the entries of the first vector, while the corresponding quantities pertaining to the second vector are denoted similarly. We assume throughout the paper that both vectors have mutually i.i.d. entries.

The optimal NP strategy (see, e.g., [8]–[10]) amounts to comparing the log-likelihood ratio

(2)

to a threshold, yielding the decision rule

Borrowing standard terminology from detection theory, we introduce the false alarm and detection probabilities
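Since the displayed expressions are not reproduced above, the following Python sketch illustrates the structure just described; the densities, sample sizes, and threshold are illustrative assumptions, not the paper's values. The block log-likelihood ratio is the sum of per-sample log-likelihood ratios, it is compared to a threshold, and the false alarm and detection probabilities are the probabilities that it exceeds the threshold under the two hypotheses.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Illustrative per-sample densities (assumed): N(0,1) under H0, N(1,1) under H1.
f0 = lambda z: norm.pdf(z, loc=0.0, scale=1.0)
f1 = lambda z: norm.pdf(z, loc=1.0, scale=1.0)

def block_llr(samples):
    # Sum of per-sample log-likelihood ratios for i.i.d. data.
    return np.sum(np.log(f1(samples)) - np.log(f0(samples)))

def monte_carlo_pf_pd(n_samples, threshold, trials=20000):
    # Estimate the false alarm and detection probabilities of the NP test.
    t0 = np.array([block_llr(rng.normal(0.0, 1.0, n_samples)) for _ in range(trials)])
    t1 = np.array([block_llr(rng.normal(1.0, 1.0, n_samples)) for _ in range(trials)])
    return np.mean(t0 >= threshold), np.mean(t1 >= threshold)

print(monte_carlo_pf_pd(n_samples=10, threshold=2.0))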

With reference to Fig. 1, the basic problem addressed in this paper is as follows. A first entity observes the full data and accordingly implements an optimal NP test at a prescribed false alarm level, ending up with a decision at (the best) detection probability level. Another entity, the unlucky broker, observes a portion of the data, namely the surviving vector, and also receives the decision from the first entity. Note that the broker has no access to the lost data vector. The goal is to make the best NP decision, whose symbol is mnemonic of "broker," based upon the data accessible at the second site, i.e., the surviving observation vector and the binary decision.

The road toward deriving the optimal detector based upon the set of information available to the broker is well traced and understood: it amounts to computing the likelihood ratio of the pair and comparing it to a threshold. From a signal processing perspective, the interest is in characterizing the real modus operandi of the detector and, in this connection, several questions arise. Should the broker simply retain the original decision? One could naively guess that a decision based upon the full data set should be better than one based only on the reduced data set available to the broker. After a closer look, this choice exhibits a severe limitation, in that it does not leave to the second entity the chance of changing the false alarm probability originally set by the first one.

At the other extreme, should the broker simply ignore the received decision and implement an optimal NP decision based on the surviving data, i.e., a likelihood ratio threshold test on these data? While it is true that part of the data is irremediably lost, the (one-bit) quantized information packed into the received decision acts as side information [11], [12] (or, more specifically, a side decision) for the detector, integrating the data still available, and, as such, it should be taken into account. Accordingly, in general, the answer is expected to lie at neither of these extremes.

It should be clear that, as long as the broker works at the same false alarm level chosen by the first entity, retaining the original decision is the best possible final decision. On the other hand, when the false alarm constraint is different, the decision may have to be refined. Thus decision refinement, as used in the title of this paper, refers to the possibility allowed to the broker of working at a different false alarm level than that used at the first site, and to the consequent potential change in the optimum decision made by the broker based on the available data and on the decision obtained at the first site. It remains true that the Receiver Operating Characteristic (ROC) achievable by the broker must be upper bounded by the ROC available at the first site. Therefore, here refinement does not refer to the final operating characteristic, but rather to the possibility of arbitrarily modifying the optimization constraint.1

III. THE UNLUCKY BROKER’S OPTIMAL DECISION

A. Structure of the Optimal Decision Maker

The data set available at the second site consists of the surviving vector and of the decision, a binary variable, taken at the first site. The corresponding detection statistic can thus be written as the ratio [9]

(3)

where the quantities involved can be explicitly written, for each hypothesis, as:

(4)

and also further expanded as

where the conditioning has been removed thanks to the assumed independence between the two data vectors.

Let two functions denote, respectively, the false alarm and detection probabilities of an optimal NP test based upon the lost data. We then obtain (5).

By considering separately the two possible values of the received decision, it is readily seen that the log-likelihood ratio can be written as

(6)

where a suitable convention2 is adopted for the indeterminate cases.

1It should also be clear that the notion of decision refinement is not to be confused with that of sample-space refinement used in information theory; see, e.g., [13].

2As can be seen, (6) is indeterminate when the two probabilities appearing in (5) are both equal to zero, or both equal to one. However, note that, according to (4), the former case means that a positive decision at the first entity has probability zero under both hypotheses, and, similarly, the latter case means that it has probability one under both. Roughly speaking, in the first case the received decision is 0 with probability one, and in the second it is 1 with probability one. In these cases, (5) tells us that the detection statistic reduces to the likelihood ratio of the surviving data alone, so that formula (6) can be safely used by employing the indicated convention.

Equation (6) emphasizes that the detection statistic is only a function of the received decision and of the log-likelihood ratio of the surviving data. This suggests defining, with a slight abuse of notation, the corresponding random variable. Now, by further defining

(7)

we get

(8)

Note that, in general, the two functions so defined may grow at different rates with their argument, and the functions in (8) are not necessarily monotone in it.
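Because the displayed formulas (3)–(8) are not reproduced above, the following sketch shows one consistent way to compute the statistic described in the text: the likelihood of the pair factorizes into the density of the surviving data times the conditional probability of the received decision, so the log-statistic equals the surviving-data log-likelihood ratio plus a decision-dependent correction. All names and the way the conditional probabilities are supplied are assumptions for illustration, not the paper's notation.

import numpy as np

def broker_log_lr(ly, d_received, p1_h1, p1_h0):
    # ly: log-likelihood ratio of the surviving data.
    # d_received: binary decision received from the first entity.
    # p1_h1, p1_h0: P(received decision = 1 | surviving data, H1) and the same under H0,
    #               obtained in closed form or by Monte Carlo for the model at hand.
    if d_received == 1:
        correction = np.log(p1_h1 / p1_h0)
    else:
        correction = np.log((1.0 - p1_h1) / (1.0 - p1_h0))
    return ly + correction

The two branches of this correction are, in the paper's notation, the functions defined in (7); whether they are monotone in the surviving-data log-likelihood is exactly the issue discussed next.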

The optimal decision rule for entity is well known to be

(9)

where the threshold is set to ensure the prescribed false alarm probability.

Throughout the paper, we shall assume that the random variable defined above has no point masses, a condition grounded on the original working hypothesis of continuous-valued observations. Note also that the functions in (7) are continuous.

B. How the Detector Works: A Closer Look

It is of interest to understand how the detector at the second site works, in comparison with the decision maker at the first site. In practice, we want to understand when and why the broker changes the decision originally made. As a matter of fact, the decision rule has a physical interpretation and implications which can be grasped by exploring the properties of the functions defined in (7). By the concavity of the ROC, it follows [8]

yielding

(10)

Now, let us choose a value for the detection threshold larger than the threshold used by the first entity. Suppose that one of the two possible decisions has been made by that entity. In this case, from the second inequality in (10), the immediate implication, in view of (9), is that such a decision is never to be changed. The opposite decision, instead, implies comparing the detection statistic with the new threshold. Looking at the first inequality in (10), this, however, does not help, in general, in knowing whether the decision should be retained.

Fig. 2. The two functions defined in (7) (upper plot) and the final decisions in the "plane" spanned by the surviving-data log-likelihood and the received decision (lower plot), for the exponential shift-in-scale problem. The grey and white regions in the lower plot correspond to the two possible final decisions of the unlucky broker.

The rule employed by the detector can be summarized by stating that, in the region where the new threshold exceeds the original one, a decision in favor of one of the hypotheses must always be retained, while a decision in favor of the other needs a further check: it is retained only if the statistic is larger than, or equal to, the new threshold; see also Fig. 2. Similar arguments apply to the dual case of a smaller threshold, with the roles of the two decisions interchanged: one decision is always retained, while the other is retained only if the statistic is less than the new threshold. Finally, when the two thresholds coincide, the original decision is retained with probability one: the unlucky broker ignores the surviving data, and makes decisions with the same detection and false alarm probabilities of the first entity.

The behavior of the detector, which optimally solves the unlucky broker problem, is summarized in the following definition, where the unit step function is used, and where the arguments of the functions defined in (7) are omitted for notational simplicity.

Definition 1: The detector is a "multiple-threshold" detector with side decision if

(11)

with one branch for each of the two possible values of the side decision.

Suppose now that the functions defined in (7) are invertible and strictly increasing. Then comparing each of them with the new threshold is tantamount to comparing the surviving-data log-likelihood to a suitably transformed threshold. Let

(12)

The detector then simplifies to the following.

Definition 2: The detector is a "single-threshold" detector with side decision if

(13)

again with one branch for each value of the side decision.

The terminology should be clear. In both cases, the detector exploits the received decision (whence the reference to the side decision) and the surviving data, the latter entering the computation only through their log-likelihood. In the single-threshold case, the log-likelihood is compared to a single threshold, i.e., the detector works, as regards the data, just as a log-likelihood threshold test. The multiple-threshold detector, instead, may involve much trickier log-likelihood comparisons: it requires checking whether the log-likelihood belongs to some arbitrarily shaped subset of the real line, which typically amounts to comparing it to a set of different thresholds defining these regions.

At first glance, the existence of multiple-threshold detectors with side decision might appear counterintuitive. Recall, however, that a single-threshold detector requires monotonicity of the functions in (7), which is not always guaranteed, as stated in the comment just below (8). We shall return to this point later.
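As a concrete illustration of Definitions 1 and 2, here is a minimal sketch of the decision rule, assuming the two functions of (7) (called A0 and A1 below, an assumed naming) are available in closed form or numerically; it simply evaluates the branch selected by the side decision and compares it with the broker's threshold.

def broker_decide(ly, d_received, gamma, A0, A1):
    # ly: surviving-data log-likelihood ratio; d_received: side decision from the first entity;
    # gamma: the broker's threshold; A0, A1: the two functions of (7), supplied by the caller.
    stat = A1(ly) if d_received == 1 else A0(ly)
    return 1 if stat >= gamma else 0

When A0 and A1 are strictly increasing (Definition 2), the comparison can equivalently be performed on ly itself against the transformed thresholds of (12); when they are not monotone, the acceptance region in ly may be a union of intervals, which is the multiple-threshold behavior of Definition 1.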

C. Performance Evaluation

The performance pertaining to the second entity can be evaluated by appealing to (11). Let us first consider the case of a new threshold larger than the original one. We have

(14)

where the pdf appearing in (14) is that of the random variable defined above, under the alternative hypothesis. The integral in (14) represents the probability that the original decision (although correct) is changed by the second entity; recall that the first term is the detection probability of the detector operating on the full data set.

Similarly, for the false alarm probability one gets

In the opposite case of , the same arguments yield
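The closed-form expressions referred to in this subsection are not reproduced above; as a stand-in, the following Monte Carlo sketch estimates the broker's false alarm and detection probabilities for an assumed Gaussian shift-in-mean model (all parameter values, block sizes, and thresholds below are illustrative assumptions, not the paper's).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu, n, m = 1.0, 8, 4          # assumed mean shift and sizes of the lost and surviving blocks
tau, gamma = 0.0, 2.0         # assumed thresholds of the first entity and of the broker

def block_llr(z):
    # Log-likelihood ratio of an i.i.d. block, N(0,1) versus N(mu,1).
    return mu * np.sum(z) - z.size * mu**2 / 2

def p_decide1(ly, d):
    # P(first decision = 1 | surviving data, H_d): the lost block's LLR is Gaussian under H_d.
    mean = (d - 0.5) * n * mu**2
    return norm.sf(tau - ly, loc=mean, scale=np.sqrt(n) * mu)

def broker_stat(ly, d_received):
    p1, p0 = p_decide1(ly, 1), p_decide1(ly, 0)
    return ly + (np.log(p1 / p0) if d_received == 1 else np.log((1 - p1) / (1 - p0)))

def trial(h):
    loc = mu if h == 1 else 0.0
    x, y = rng.normal(loc, 1.0, n), rng.normal(loc, 1.0, m)
    ly = block_llr(y)
    d_first = int(block_llr(x) + ly >= tau)
    return int(broker_stat(ly, d_first) >= gamma)

pf = np.mean([trial(0) for _ in range(20000)])
pd = np.mean([trial(1) for _ in range(20000)])
print(pf, pd)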

IV. SPECIFIC DECISION PROBLEMS OF PRACTICAL INTEREST

We now discuss two specific hypothesis tests, commonly encountered in practice: the Gaussian shift-in-mean and the exponential shift-in-scale, starting with the latter.

A. Exponential Shift-in-Scale

Let us consider the following decision problem:

(15)

where the notation denotes an exponential pdf with the indicated expectation. Without loss of generality, a fixed ordering of the two expectations is assumed.
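A small sketch of this model (with assumed, purely illustrative expectations, since the paper's numerical values are not reproduced above): the block log-likelihood ratio is affine in the sample sum, which is why, as stated around (17), it is a shifted Gamma random variable under either hypothesis.

import numpy as np

lam0, lam1 = 1.0, 2.0   # assumed expectations under H0 and H1

def block_llr_exp(z):
    # Log-likelihood ratio of an i.i.d. exponential block, expectation lam0 versus lam1.
    z = np.asarray(z)
    return z.size * np.log(lam0 / lam1) + np.sum(z) * (1.0 / lam0 - 1.0 / lam1)

rng = np.random.default_rng(2)
print(block_llr_exp(rng.exponential(lam1, size=5)))   # a surviving block drawn under H1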

Proposition 1: Under the exponential shift-in-scale model (15), the optimal solution to the unlucky broker problem is provided by a single-threshold detector.

To prove this claim, let us start by considering the pertinent log-likelihood ratios:

(16)

Let the required shorthand quantities be defined accordingly. Also, let a Gamma pdf with given shape parameter and scale parameter be introduced [14]. The random variables just defined are distributed as follows3

(17)

where the index denotes the hypothesis. Introducing now the function , for , and

(18)

otherwise [15], [16], we get

(19)

and

(20)

where we used the fact that , and that, to avoid trivialities, . We note explicitly that, complying with the discussion in footnote 2, the quantities involved are well defined for all arguments in the relevant range, and it makes no sense to consider arguments outside that range.

Using these relationships, the proof of Proposition 1, which amounts to showing that the (continuous) functions defined in (7) are, for the exponential shift-in-scale problem (15), strictly increasing, is detailed in the Appendix. There, it is also shown that, at the extremes of their domain, these functions verify

(21)

One implication is that in the indicated region the threshold is to be chosen accordingly. Note the gap between the original threshold and the minimum value at which the new threshold begins to make a difference.

3With expressions like those in (17) we mean that the random variable is distributed as the sum of the indicated constant plus a random variable with the indicated Gamma distribution. Similarly, later we shall write analogous expressions to denote the probability that a random variable distributed according to a given Gamma law exceeds a given value.

Similarly, in the other region, there is a corresponding interesting range of thresholds. For these meaningful choices, we can state the following proposition, whose proof is deferred to the Appendix. Let

(22)

Proposition 2: Under the exponential shift-in-scale model (15), the performance of the single-threshold detector is as follows. In the first of the two regions discussed above,

(23)

(24)

In the region ,

(25)

In the above, the two transformed thresholds are defined as in (12). In the upper plot of Fig. 2 the two functions are displayed for a representative choice of the parameters. Note how the horizontal dashed line corresponding to the original threshold represents a marked separation between the two curves. We see that both functions are strictly increasing, and that their limiting values agree with the predictions of (21).

The lower plot in Fig. 2 illustrates schematically how the detector works. In the white regions the final decision is in favor of one hypothesis, while in the grey region a decision in favor of the other is taken. We see that if the original decision (given on the vertical axis) is of one kind, no matter what the surviving-data log-likelihood is, the final decision will be unchanged. On the other hand, the opposite original decision is retained only if that log-likelihood is large enough. Note also that if the broker's threshold approaches the original one, then the original decisions are always retained (with probability one).

In Fig. 3, we compare three different systems, namely: the optimal NP detector having access to the full data; the optimal NP detector having access to the surviving data only; and the detector that is the subject of this paper, having access to the pair of surviving data and received decision. The ROCs of these systems are easily computed for the first two detectors, and have been derived in Proposition 2 for the unlucky broker case. As expected, the optimal detector using the pair outperforms that exploiting only the surviving data, and it is outperformed by the detector that uses the full dataset. Note also how the unlucky broker's ROC intersects the upper bound at the point whose abscissa is the false alarm level selected by the first entity. At this point the thresholds of the two detectors are the same.
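The three ROCs can be reproduced numerically. Below is a Monte Carlo sketch for an assumed exponential shift-in-scale setting (expectations, block sizes, and the first entity's threshold are illustrative assumptions): it simulates the two-entity chain, forms the three statistics, and sweeps a threshold over each to trace the corresponding ROC.

import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(3)
lam0, lam1, n, m, tau = 1.0, 2.0, 8, 4, 0.0              # assumed values
c0, c1 = np.log(lam0 / lam1), 1.0 / lam0 - 1.0 / lam1    # block LLR = size*c0 + c1*sum(z)

def llr(z):
    return z.size * c0 + c1 * np.sum(z)

def p_decide1(ly, d):
    # P(first decision = 1 | surviving data, H_d): the lost block's sum is Gamma(n, lam_d).
    s_min = (tau - ly - n * c0) / c1
    p = gamma.sf(s_min, a=n, scale=(lam1 if d == 1 else lam0))
    return np.clip(p, 1e-12, 1 - 1e-12)   # numerical guard for the logarithms below

def broker_stat(ly, d_first):
    p1, p0 = p_decide1(ly, 1), p_decide1(ly, 0)
    return ly + (np.log(p1 / p0) if d_first == 1 else np.log((1 - p1) / (1 - p0)))

def simulate(h, trials=5000):
    lam = lam1 if h == 1 else lam0
    rows = []
    for _ in range(trials):
        x, y = rng.exponential(lam, n), rng.exponential(lam, m)
        lx, ly = llr(x), llr(y)
        d_first = int(lx + ly >= tau)
        rows.append((lx + ly, ly, broker_stat(ly, d_first)))
    return np.array(rows)

def roc(s0, s1):
    # Empirical (PF, PD) pairs obtained by sweeping the threshold over the pooled values.
    thr = np.sort(np.concatenate([s0, s1]))
    return [(np.mean(s0 >= t), np.mean(s1 >= t)) for t in thr]

s0, s1 = simulate(0), simulate(1)
roc_full, roc_y, roc_broker = (roc(s0[:, k], s1[:, k]) for k in range(3))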

It is also of interest to see what happens in an unbalanced situation where the two data set sizes are different. In Fig. 4 we display the ROCs for this scenario: in the upper plot the lost data set is much larger than the surviving one, and the information contained in the received decision is expected to play a major role in determining the final decision, as compared to the role of the surviving data.

Fig. 3. ROCs for the exponential shift-in-scale problem. Three optimal NP detectors are considered, whose decisions are based on the dataset specified in the legend.

Fig. 4. Comparison between ROCs evaluated in the exponential shift-in-scale problem for the two unbalanced scenarios described in the text (upper and lower plots).

Indeed, the behavior of the broker's detector approaches a limiting case where the only information used for the final decision is the decision taken by the first entity, while the surviving data become irrelevant. In such a limit, in order to vary the false alarm rate, the decision maker can only randomize the test among three available operating points. The first and the third are always part of any ROC, while the intermediate point (visible in the figure) is the one pertaining to the decision made by the first entity.

In the lower plot of Fig. 4 the situation is reversed. Here we see that the performances of the systems are quite close to each other, and the unlucky broker's ROC closely approaches that of the detector operating on the surviving data, for false alarm levels just slightly different from that selected by the first entity. In this situation, the initial decision is of minor relevance and the best option (except at the original false alarm rate) is to base the decision almost exclusively on the surviving data.

B. Gaussian Shift-in-Mean

Let us consider the following hypothesis test

(26)

where the usual notation stands for a Gaussian pdf with the indicated mean and variance.

Proposition 3: Under the Gaussian shift-in-mean model (26), the optimal solution to the unlucky broker problem is provided by a single-threshold detector.

As for Proposition 1, the proof is deferred to the Appendix, and amounts to showing that the functions defined in (7) are, for the Gaussian shift-in-mean problem (26), strictly increasing. To characterize these functions, let us start by considering the log-likelihood ratios, which are

By defining suitable constants, the random variables just introduced are distributed as

(27)

where the negative sign refers to one hypothesis and the positive sign to the other. The functions defined in (7) can be evaluated explicitly in terms of the standard Gaussian exceedance function:

(28)

where the positive sign applies to one of the functions and the negative sign refers to the other.
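Since (27) and (28) themselves are not reproduced above, here is a small numerical sketch written directly from the description in the text, using the Gaussian exceedance function Q(.) = norm.sf(.); the mean shift, block size, and threshold are illustrative assumptions, not the paper's values.

import numpy as np
from scipy.stats import norm

mu, n, tau = 1.0, 8, 0.0          # assumed shift, lost-block size, first entity's threshold
s = np.sqrt(n) * mu               # std. dev. of the lost block's log-likelihood ratio

def A(ell, d):
    # One consistent way to write the functions of (7) for this model:
    # exceedance probabilities of the lost block's Gaussian LLR at the residual threshold.
    q1 = norm.sf((tau - ell - n * mu**2 / 2) / s)   # P(first decision = 1 | ., H1)
    q0 = norm.sf((tau - ell + n * mu**2 / 2) / s)   # P(first decision = 1 | ., H0)
    return ell + (np.log(q1 / q0) if d == 1 else np.log((1 - q1) / (1 - q0)))

grid = np.linspace(-6.0, 6.0, 200)
a1 = np.array([A(e, 1) for e in grid])
print(np.all(np.diff(a1) > 0))    # numerically increasing, in line with Proposition 3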

The above expressions are used in the Appendix to prove Proposition 3. There, it is also shown that

In Fig. 5, two case studies are considered, differing only in the value of one parameter between the upper and lower plots. Comments are similar to those made in connection with Fig. 4: in the upper plot the surviving data are "too noisy" and the original decision is more informative, while in the lower plot the opposite is true.

V. ARBITRARY DISTRIBUTIONS

Proving that the functions defined in (7) are strictly monotone for the observation models (15) and (26) is by no means trivial. Nonetheless, one might guess that this difficulty is only a matter of algebra, and that the monotonicity property may hold under fairly general observation scenarios.

Fig. 5. Comparison between ROCs evaluated in the Gaussian shift-in-mean problem for the two case studies described in the text (upper and lower plots).

We have, however, the following proposition.

Proposition 4: Under arbitrary observation models, the optimal solution to the unlucky broker problem is provided by a multiple-threshold detector.

The meaning of this claim is that the simplification to a single-threshold detector is a special case and does not hold in general. The proof simply consists in providing an example in which the functions defined in (7) are not monotone.

To this end, we now consider a generalized Gaussian detection problem [17]. Let us introduce the generalized Gaussian distribution:

(29)

with the indicated parameter constraints. The corresponding shift-in-mean hypothesis test can be formalized as

Suppose a particular choice of the parameters is made. In the upper plot of Fig. 6 one of the functions defined in (7), obtained numerically, is depicted, and we see that it is no longer monotone; the same can be shown to hold for the other.
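A numerical sketch of this kind of computation (SciPy's gennorm plays the role of the generalized Gaussian; the shape, shift, block size, and threshold below are assumptions, since the values used for Fig. 6 are not reproduced above): the correction term is built from Monte Carlo estimates of the lost block's exceedance probabilities, and its monotonicity can then be inspected on a grid.

import numpy as np
from scipy.stats import gennorm

rng = np.random.default_rng(4)
beta, shift, n, tau = 0.5, 1.0, 4, 0.0     # assumed shape, mean shift, lost-block size, threshold

Z0 = gennorm.rvs(beta, loc=0.0, size=(50000, n), random_state=rng)
Z1 = gennorm.rvs(beta, loc=shift, size=(50000, n), random_state=rng)

def block_llr(Z):
    return np.sum(gennorm.logpdf(Z, beta, loc=shift) - gennorm.logpdf(Z, beta, loc=0.0), axis=1)

L0, L1 = block_llr(Z0), block_llr(Z1)      # lost-block LLR samples under H0 and H1

def A1(ell):
    # Statistic for a received decision equal to 1, with empirical exceedance probabilities.
    b = max(np.mean(L1 >= tau - ell), 1.0 / L1.size)   # guard against empty Monte Carlo tails
    a = max(np.mean(L0 >= tau - ell), 1.0 / L0.size)
    return ell + np.log(b / a)

grid = np.linspace(-2.0, 2.0, 80)
vals = np.array([A1(e) for e in grid])
print(np.all(np.diff(vals) > 0))           # need not be True: the correction can undo monotonicity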

This non-monotone behavior has a strong impact on the decision rule, as can be appreciated by looking at the lower plot in Fig. 6. The figure shows the final decisions taken by the detector, with a given threshold. The decisions are plotted on the "plane" spanned by the surviving-data log-likelihood and the received decision: in the white regions the final decisions are for one hypothesis, while in the grey regions a final decision for the other is made. As predicted by the theory, all the decisions of one kind are retained. However, the rule for retaining the decisions of the other kind is markedly different from that observed in the previous cases.

Fig. 6. One of the functions defined in (7) (upper plot) and the final decisions in the "plane" spanned by the surviving-data log-likelihood and the received decision (lower plot), for the generalized Gaussian shift-in-mean problem. The grey and white regions in the lower plot correspond to the two possible final decisions.

Indeed, the grey region is no longer simply connected, meaning that the detector structure does not simply involve the comparison of the log-likelihood ratio with a single threshold.

The generalized Gaussian distribution has a smooth shape, and this example is enough to show that the general approach to the unlucky broker problem requires a multiple-threshold detector. On the other hand, in specific applications involving less regular distributions, the operating modality of the multiple-threshold detector can be shown to be even more complicated than that observed in Fig. 6.

As an example, let us consider the detection problem (1) in which the entries of both vectors are zero-mean Gaussian with a common variance under the null hypothesis, while under the alternative hypothesis they come from a Gaussian population with expectation uniformly selected among a set of allowable values. Therefore, under the alternative the variables are identically distributed as a balanced homoscedastic mixture of Gaussian random variables with the given means and variance. Formally

and

To get insight into the detector structure, let us consider the above problem for a specific choice of the parameters, with the mixture means selected as equally spaced points in a symmetric range. The upper plot in Fig. 7 shows one of the functions defined in (7), while in the lower plot we display the corresponding decision regions at the broker's site. It can be noticed that, as in the generalized Gaussian problem, the region for retaining the decision is not simply connected, giving rise to a multiple-threshold detector. In this case, however, it can be appreciated how the detection regions exhibit an even more complex behavior, corroborating the physical relevance of Proposition 4.
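A brief sketch of this mixture alternative (the number of mixture components, their spacing, and the variance are assumptions, since the paper's values are not reproduced above); plugging these densities into the earlier Monte Carlo construction of the correction term can reproduce the kind of fragmented decision regions shown in Fig. 7.

import numpy as np
from scipy.stats import norm

sigma = 1.0
means = np.linspace(-3.0, 3.0, 5)      # assumed equally spaced mixture means

def f0(z):
    return norm.pdf(z, loc=0.0, scale=sigma)

def f1(z):
    # Balanced homoscedastic Gaussian mixture under the alternative hypothesis.
    return np.mean([norm.pdf(z, loc=mk, scale=sigma) for mk in means], axis=0)

def block_llr(z):
    return np.sum(np.log(f1(z)) - np.log(f0(z)))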

VI. THE BAYESIAN UNLUCKY BROKER

In this section, we introduce a variation of the unlucky broker problem considered so far, making reference to the Bayesian paradigm where a priori probabilities of the hypotheses to be tested are given.

Fig. 7. One of the functions defined in (7) (upper plot) and the final decisions in the plane spanned by the surviving-data log-likelihood and the received decision (lower plot), for the example involving the balanced mixture of Gaussians; values of the parameters are detailed in the main text. As for the previous figures, the gray and white regions in the lower plot correspond to the two possible final decisions.

As is well known [9], in this framework the hypothesis test (1) operating on the full data set amounts to comparing the log-likelihood ratio in (2) to a threshold determined by the a priori probabilities of the hypotheses and by the costs incurred by choosing one hypothesis when the other is true. Let us consider a uniform cost assignment, that is, unit cost for every wrong decision and zero cost for every correct one, whence the threshold is simply determined by the ratio of the a priori probabilities.
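A minimal sketch of this Bayesian rule with uniform costs (the use of the log-domain prior ratio as threshold is the standard form; the names below are illustrative):

import numpy as np

def bayes_decide(log_lr_value, pi1):
    # Uniform cost assignment: decide for the alternative iff the log-likelihood ratio
    # exceeds log(pi0 / pi1), with pi0 = 1 - pi1 the prior of the null hypothesis.
    return int(log_lr_value >= np.log((1.0 - pi1) / pi1))

When the priors are later refined, the unlucky broker applies the same kind of rule, with the new prior ratio, to the statistic in (3) computed from the surviving data and the original decision.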

Let the optimal Bayesian detector be used and assume that a decision is made. Suppose that, later, the priors used in the test are revealed to be inaccurate and, instead, a refined pair is made available. Unfortunately, at that time, only the surviving observation vector is still available, along with the original decision. The unlucky broker problem is that of refining this original decision, taking advantage of the new pair of a priori probabilities and exploiting the available data. It should be clear that the optimal decision made by the unlucky broker amounts to comparing the log-likelihood ratio in (3) to a new threshold determined by the refined priors. The corresponding error probability can be defined as

(30)

where the associated detection and false alarm probabilities appear.

With an approach that mirrors the one developed in the previous sections, and exploiting similar tools, the described detection problem can thus be solved, and the above error probability can be derived. Here, we only comment on the final findings, omitting the details of the derivation.

Fig. 8 shows the error probability as a function of the refined prior, for three different systems: the optimal Bayesian detector having access to the full data set, the optimal Bayesian detector having access only to the surviving data, and the detector used by the unlucky broker, which exploits the surviving data together with the original decision.

Fig. 8. Bayesian error probabilities versus the refined a priori probability, for the Gaussian shift-in-mean problem with the parameter values given in the text. Three detectors are considered, each exploiting the data shown in the legend.

Fig. 8 refers to the Gaussian shift-in-mean test (26), for a representative choice of the parameters. We see that the curve of the detector using the full data set and that of the one exploiting the broker's data get in contact when the refined version of the priors agrees with the one used in the original test (and therefore the priors are actually not refined at all). Otherwise, the unlucky broker detector outperforms the one using only the surviving data, and is outperformed by the detector using the full data.

It is worth noting how, for refined priors close to the original value, the error probability of the unlucky broker stays almost constant. This can be explained by observing that the broker is likely to maintain the original decision, which was made using a priori probabilities that were only slightly wrong, and therefore the detection and false alarm probabilities remain close to those of the original decision. Then, since the false alarm probability approximately equals the miss probability for symmetric priors and cost functions, we see from (30) that the error probability must be almost constant. The availability of a refined prior, in this situation, seems of limited use.
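This flatness can be spelled out with a short calculation; the symbols below are generic stand-ins for those of (30), with $\pi_1'$ the refined prior of the alternative and $P_F$, $P_D$ the broker's false alarm and detection probabilities:

$$P_e = (1-\pi_1')\,P_F + \pi_1'\,(1-P_D).$$

If the broker essentially keeps the original decision, made at the symmetric operating point where $P_F \approx 1-P_D$, then

$$P_e \approx (1-\pi_1')\,P_F + \pi_1'\,P_F = P_F,$$

which does not depend on the refined prior, consistently with the flat portion of the curve in Fig. 8.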

Conversely, when the refined prior moves away from the original value (as in the left and right portions of Fig. 8), the original decision is no longer considered reliable and is often changed. From Fig. 8 we see that the error probability decreases and rapidly approaches that of the detector exploiting only the surviving data. In that regime, the original decision, based on very wrong priors, is basically ignored, and a Bayesian test based upon the surviving data is almost optimal.

A behavior similar to that in Fig. 8 is observed when the initial a priori probability is different from 0.5. One main difference, in that case, is that the error probability varies almost linearly (instead of staying constant) with the refined prior, in the region where the latter is sufficiently close to the initial value.

VII. SUMMARY AND FINAL REMARKS

We have considered a novel topic in detection theory, that we call the unlucky broker problem, with a broad range of potential applications. Consider a statistical test between two hypotheses, exploiting a given data set and leading to a decision obeying the NP optimality criterion.

Then, suppose that part of the data is lost and one wants to make a new decision using the surviving data and the previous decision. If the new decision is made at the same false alarm level as the original one, then the surviving data are irrelevant and retaining the original decision is NP-optimal.

When the desired false alarm level for the final decision is different, however, some decisions can be safely retained, while others require a deeper analysis. As one might expect, we find that the sufficient statistic for the final decision is the pair formed by the original decision and the log-likelihood of the surviving data: both the original decision and the data influence the final decision, with the data playing their role only through the related log-likelihood.

Intuition might perhaps also suggest that the final detection structure amounts to comparing this log-likelihood with a suitable threshold level (what is commonly called a threshold test). This, however, is not true. In general, we show that the optimal decision consists of checking whether or not the log-likelihood belongs to some subset of the real axis having a complicated structure.

We have mainly addressed the NP framework, but the Bayesian counterpart has also been considered. As to the latter, we note that the refinement is not necessarily due to an improvement in the knowledge of the priors, as assumed in Section VI. Of practical interest is also the case where the priors are held fixed and the refinement involves instead a different assignment of the costs. Since this amounts to a different threshold setting, the two scenarios can be addressed in exactly the same way.

Other variations and extensions of the unlucky broker problem can be conceived and may be of interest for applications. In particular, two intriguing generalizations of the model are as follows. The first refers to some problems encountered in decentralized detection [18]–[22], and in quantization for signal detection [23]–[27]. In fact, while the received decision can be interpreted as a maximally compressed version of the overall log-likelihood, we have assumed so far that the surviving data vector is unquantized. The interesting issue is how it should be compressed, in light of the final detection task to be carried out by the second entity.

In this respect, it is well understood that, in the presence of i.i.d. observations, the optimal quantization rules involve only comparisons of the pertinent likelihood with a single threshold [20], [25]. Here we have consistently found that, for the optimal (unquantized) decision rule, it is sufficient to know the log-likelihood of the surviving data, instead of the whole vector. On the other hand, we have also shown that, in general, this log-likelihood should be compared to multiple thresholds in order to make the optimal decision. Therefore the following questions may be posed. What is the impact of our findings on the issue of optimally compressing the surviving data? How does the statistical dependence between the received decision and the surviving data influence the quantization regions?

A second generalization consists in regarding the received decision as a random channel connecting the two entities. In fact, one might consider it as a strongly degraded version of the overall log-likelihood, where the degradation can be caused by different physical reasons, such as noisy effects over a communication medium or privacy and security limitations (think of the second entity as an intruder which monitors the activity of the first one). In this respect, the interplay between the detection task and the communication channel [28], as well as hypothesis testing with additional secrecy constraints (see, e.g., [29]), may certainly enrich the problem.

APPENDIX

Proof of Proposition 1: The proof requires only some algebra. Let us prove the claim for one of the two functions, that for the other being afforded similarly. It is expedient to work in terms of

which, in view of (18), (19) and (20), becomes

(31)

for arguments in the relevant range. By continuity arguments, it will thus suffice to show that the function is strictly increasing in that range. To this end, let us perform a change of variable and focus on the corresponding range. With a slight abuse of notation, we can write the quantity of interest up to a positive proportionality factor. In view of the definition, we must show that this quantity is strictly decreasing. Its first derivative is proportional (with a positive factor) to

The term in brackets is equivalent to

which is negative under the standing assumptions. This completes the proof.

As to the properties in (21), the first trivially follows by direct inspection. As to the right limit, we have

Applying de l’Hôpital’s rule this becomes

and the last term reduces to the quantity defined just below (16). Therefore, the stated limit follows.

The inequalities at the left extreme can be verified as follows. The weak inequality is already established in (10). To show the strict inequality, we write the equation shown at the bottom of the page, where we have used the above definitions. Consider now the argument of the logarithm, which is in the form of a ratio involving positive numbers. By exploiting an elementary inequality for such ratios, and observing the ordering of the terms involved, we get the required bound for all admissible arguments. Consequently the argument of the logarithm is strictly greater than its limiting value, namely the logarithm is strictly greater than the corresponding bound, and the desired result follows. Finally, for the left limit of the other function we have

The argument of the logarithm is the ratio of the cumulative distributions of the relevant variable under the two hypotheses, evaluated at a point which has been assumed strictly positive as stated just after (20). The limit therefore remains finite, as assumed at the beginning of Section IV-A.

Proof of Proposition 2: Suppose that the first of the two cases holds. In view of (17), the pdf of the surviving-data log-likelihood is

From Proposition 1, we know that the relevant function is strictly increasing, so that the condition on the statistic maps into a condition on the surviving-data log-likelihood, with the transformed threshold defined in (12). From (14) we therefore get the corresponding expression. Now, assume first one of the two subcases.

From (20) and using the above expression

With the indicated change of variable, and using the binomial theorem

where the shorthand notation is as defined above. The last integral can be computed in closed form, yielding the desired result, with the quantity defined in (22).

Let us now switch to the other subcase. From (20), we have the complementary expression, such that

where the first integral equals the previously computed quantity by definition. This completes the proof of (23). The proof of (24) is straightforwardly obtained once the appropriate quantity is replaced by its counterpart. The proofs of (25) are obtained by reasoning similarly, and are omitted for reasons of space.

Proof of Proposition 3: Let us introduce an auxiliary function built from the pdf of a standard Gaussian random variable. This function is clearly always positive on the negative axis, while, from a well-known property, it is positive on the positive axis as well; thus it is positive everywhere. Now, let us consider the expression given in (28) for one of the two functions. A suitable substitution yields

and we now prove that the resulting quantity is strictly decreasing. Taking its first derivative, we get

Setting a further shorthand, the above can be rewritten, with a slight abuse of notation, in compact form. Proving the desired monotonicity will thus be implied by proving that this compact expression is strictly decreasing. To this end, we compute its derivative, and we must show that this latter is negative.

Since the boundary values have the appropriate signs, invoking continuity, it is enough to show that the relevant auxiliary function is strictly increasing. We have its expression in terms of a further auxiliary quantity. Now the claim follows if this auxiliary quantity is strictly decreasing, because of the signs involved. Computing the derivative yields

which is indeed negative, since the factor involved is positive. This completes the proof of the strict monotonicity for the first function; that of the second is simply obtained by similar derivations. As to the limiting behavior, one limit is obvious, while a simple application of de l'Hôpital's rule gives the other. By symmetry, the results for the second function also follow.

ACKNOWLEDGMENT

The authors would like to acknowledge the anonymous reviewers for their useful hints.

REFERENCES

[1] R. R. Tenney and N. R. Sandell, "Detection with distributed sensors," IEEE Trans. Aerosp. Electron. Syst., vol. 17, pp. 501–510, Jul. 1981.
[2] L. K. Ekchian, "Optimal design of distributed detection networks," Ph.D. dissertation, Mass. Inst. Technol., Cambridge, MA, 1982.
[3] L. K. Ekchian and R. R. Tenney, "Detection networks," in Proc. 21st IEEE Conf. Decision and Control, Orlando, FL, 1982, pp. 686–691.
[4] J. D. Papastavrou, "Distributed detection with selective communications," Ph.D. dissertation, Mass. Inst. Technol., Cambridge, MA, 1986.
[5] R. Viswanathan, S. Thomopoulos, and R. Tumuluri, "Optimal serial distributed decision fusion," IEEE Trans. Aerosp. Electron. Syst., vol. 24, pp. 366–376, Jul. 1988.
[6] J. Pothiawala, "Analysis of a two-sensor tandem distributed detection network," Master's thesis, Mass. Inst. Technol., Cambridge, MA, 1989.
[7] J. D. Papastavrou and M. Athans, "Distributed detection by a large team of sensors in tandem," IEEE Trans. Aerosp. Electron. Syst., vol. 28, pp. 639–653, Jul. 1992.
[8] H. L. Van Trees, Detection, Estimation, and Modulation Theory, Part I. New York: Wiley, 2001.
[9] H. V. Poor, An Introduction to Signal Detection and Estimation. New York: Springer-Verlag, 1988.
[10] S. M. Kay, Fundamentals of Statistical Signal Processing, Volume II: Detection Theory. Englewood Cliffs, NJ: Prentice-Hall, 1998.
[11] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley-Interscience, 2006.
[12] S. Kullback, Information Theory and Statistics. New York: Dover, 1968.
[13] R. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.
[14] A. Papoulis, Probability, Random Variables, and Stochastic Processes, 3rd ed. New York: McGraw-Hill, 1991.
[15] N. L. Johnson, S. Kotz, and N. Balakrishnan, Continuous Univariate Distributions, Volume 1. New York: Wiley, 1994.
[16] M. Abramowitz and I. A. Stegun, Handbook of Mathematical Functions. New York: Dover, 1970.
[17] S. A. Kassam, Signal Detection in Non-Gaussian Noise. New York: Springer-Verlag, 1987.
[18] P. K. Varshney, Distributed Detection and Data Fusion. New York: Springer, 1997.
[19] L. Pescosolido, S. Barbarossa, and G. Scutari, "Decentralized detection and localization through sensor networks designed as a population of self-synchronizing oscillators," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), 2006, vol. 4.
[20] J. N. Tsitsiklis, "Decentralized detection," in Advances in Signal Processing, H. V. Poor and J. B. Thomas, Eds. New York: JAI Press, 1993, pp. 297–344.
[21] J.-F. Chamberland and V. V. Veeravalli, "Decentralized detection in sensor networks," IEEE Trans. Signal Process., vol. 51, no. 2, pp. 407–416, Feb. 2003.
[22] G. Mergen and L. Tong, "Type based estimation over multiaccess channels," IEEE Trans. Signal Process., vol. 54, no. 2, pp. 613–626, Feb. 2006.
[23] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Norwell, MA: Kluwer Academic, 1992.
[24] S. A. Kassam, "Optimum quantization for signal detection," IEEE Trans. Commun., vol. 25, pp. 479–484, May 1977.
[25] D. Warren and P. Willett, "Optimum quantization for detection fusion: some proofs, examples, and pathology," J. Franklin Inst., vol. 336, pp. 323–359, 1999.
[26] P. Venkitasubramaniam, L. Tong, and A. Swami, "Score-function quantization for distributed estimation," in Proc. Conf. Inf. Sci. Syst. 2006 (CISS'06), NJ, Mar. 2006.
[27] H. V. Poor, "Fine quantization in signal detection and estimation," IEEE Trans. Inf. Theory, vol. IT-34, no. 5, pp. 960–972, Sep. 1988.
[28] B. Chen, L. Tong, and P. Varshney, "Channel-aware distributed detection in wireless sensor networks," IEEE Signal Process. Mag., vol. 23, no. 4, pp. 16–26, Jul. 2006.
[29] T. He and L. Tong, "Distributed detection of information flows with side information," in Proc. Annu. Asilomar Conf. Signals, Syst., Comput. (ASILOMAR 2007), Pacific Grove, CA, Nov. 2007.

Stefano Marano received the Laurea degree in electronic engineering (cum laude) and the Ph.D. degree in electronic engineering and computer science, both from the University of Naples, Italy, in 1993 and 1997, respectively.

Currently, he is an Associate Professor with the University of Salerno, Italy, where he formerly served as Assistant Professor. His areas of interest include statistical signal processing with emphasis on distributed inference, sensor networks, and information theory. He has published approximately 90 papers, including several invited ones, in leading international journals/transactions and in the proceedings of leading international conferences; he has also given several invited talks in the area of statistical signal processing.

Prof. Marano was awarded the IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION 1999 Best Paper Award (jointly with G. Franceschetti and F. Palmieri) for his work on stochastic modeling of electromagnetic propagation in urban areas. He was also a coauthor of the paper that received the Best Student Paper Award (2nd place) at the 12th Conference on Information Fusion in 2009. As a reviewer, he has handled hundreds of papers, mainly for the IEEE TRANSACTIONS, and was selected as an Appreciated Reviewer by the IEEE TRANSACTIONS ON SIGNAL PROCESSING during 2007 and 2008. He has been on the Organizing Committee of several international conferences in the field of signal processing and data fusion, and on the Technical Program Committee of the main international symposia in those fields. He is currently serving as an Associate Editor for the IEEE TRANSACTIONS ON AEROSPACE AND ELECTRONIC SYSTEMS, and is a Guest Editor of an upcoming special issue of the EURASIP Journal on Advances in Signal Processing.

Vincenzo Matta received the Laurea degree in electronic engineering and the Ph.D. degree in information engineering from the University of Salerno, Fisciano, Italy, in 2001 and 2005, respectively.

He is currently an Assistant Professor with the University of Salerno. His main research interests include detection and estimation theory, signal processing, wireless communications, multiterminal inference, and sensor networks.

Fabio Mazzarella received the Laurea (B.Sc.) and the Laurea Specialistica (M.Sc.) degrees in electronic engineering from the University of Salerno, Italy, in 2004 and 2008, respectively.

He is currently a Ph.D. student at the Department of Information and Electrical Engineering, University of Salerno. His main research interests focus on statistical signal processing, detection and estimation, and information theory.

