Some Models of Information Aggregation and Consensus in Networks
John N. Tsitsiklis, MIT
ACCESS, KTH, January 2011
Transcript
Page 1: Some Models of Information Aggregation and Consensus in Networks

Some Models of Information Aggregation and Consensus in Networks

John N. Tsitsiklis, MIT

ACCESS, KTH, January 2011

Page 2: Some Models of Information Aggregation and Consensus in Networks

Overview

• Information propagation/aggregation in networks

– Engineered versus social networks

– Bayesian versus naive updates

• Review a few key models and results

• Is this applied probability?

– Theoretical AP: random graphs

– Applied AP: ad campaigns, etc.

– Narrative AP: “explain” social phenomena

– Recreational AP: play with toy models

Page 3: Some Models of Information Aggregation and Consensus in Networks

[Slide shows a screenshot of the Amazon.com product page for The Wisdom of Crowds (Paperback) by James Surowiecki.]

Myopia and Herding

The Wisdom or Madness of Crowds

The Wisdom of Crowds

Wisdom versus Herding of Rational but Selfish Agents

Consensus and Averaging

Page 4: Some Models of Information Aggregation and Consensus in Networks

Social sciences

• Merging of “expert” opinions

• Evolution of public opinion

• Reputation systems

• Modeling of jurors

• Language evolution

• Charles Mackay (London, 1841): Extraordinary Popular Delusions and the Madness of Crowds

– “Men, it has been well said, think in herds; it will be seen that they go mad in herds, . . .”

Myopia and Herding

Wisdom or Madness of Crowds?

The Wisdom of Crowds

Wisdom versus Herding of Rational but Selfish Agents

Consensus and Averaging

Page 5: Some Models of Information Aggregation and Consensus in Networks

Sensor and other Engineered Networks

• Fusion of available information

– we get to design the nodes’ behavior

12

Page 6: Some Models of Information Aggregation and Consensus in Networks

The Basic Setup

• Each node i endowed with private information Xi

– Nodes form opinions, make decisions

– Nodes observe opinions/decisions or receive messages

– Nodes update opinions/decisions

– Everyone wants a “good” decision; common objective

• Questions

– Convergence? To what? How fast? Quality of limit?

– Does the underlying graph matter? (Tree, acyclic, random,. . .)

• Variations

– Hypothesis testing (binary decisions): P(H0 | Xi)

– Parameter estimation: E[Y | Xi]

– Optimization: min_u E[c(u, Y) | Xi]

13
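A toy numerical instance of the three “variations” above may help fix ideas (a minimal sketch; the prior, the likelihood values, and the quadratic cost below are made-up assumptions, not taken from the slides):

```python
# Toy illustration of the three "variations" for one node, after observing Xi.
# All numerical values here are assumed for the sketch.
prior = {"H0": 0.6, "H1": 0.4}          # prior over the two hypotheses
likelihood = {"H0": 0.2, "H1": 0.7}     # P(Xi = observed value | H)

# Hypothesis testing: P(H0 | Xi) via Bayes' rule
joint = {h: prior[h] * likelihood[h] for h in prior}
post = {h: joint[h] / sum(joint.values()) for h in joint}
print("P(H0 | Xi) =", round(post["H0"], 3))

# Parameter estimation: E[Y | Xi], with Y = 0 under H0 and Y = 1 under H1 (an assumption)
y_value = {"H0": 0.0, "H1": 1.0}
print("E[Y | Xi] =", round(sum(post[h] * y_value[h] for h in post), 3))

# Optimization: min_u E[c(u, Y) | Xi] for the quadratic cost c(u, y) = (u - y)^2
candidates = [u / 100 for u in range(101)]
best_u = min(candidates, key=lambda u: sum(post[h] * (u - y_value[h]) ** 2 for h in post))
print("argmin_u E[(u - Y)^2 | Xi] =", best_u)
```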

Page 7: Some Models of Information Aggregation and Consensus in Networks

The Basic Setup

• Social science: postulate update mechanism

• Engineering: design/optimize update mechanism

14


Page 9: Some Models of Information Aggregation and Consensus in Networks

Myopia and Herding

Wisdom or Madness of Crowds?

The Wisdom of Crowds

Wisdom versus Herding of Rational but Selfish Agents

Consensus and Averaging

Bayesian Models

Page 10: Some Models of Information Aggregation and Consensus in Networks

Choosing the Best Tavern

Where do we eat tonight?
(Bikhchandani, Hirshleifer, Welch, 1992; Banerjee, 1992)

• Private info points to the better one, with some probability of error

• Information is there, but is not used


Page 13: Some Models of Information Aggregation and Consensus in Networks

Where do we eat tonight?
(Bikhchandani, Hirshleifer, Welch, 1992; Banerjee, 1992)

[Figure: a sequence of customers choosing between taverns A and B; after the first few choose A, everyone else copies and chooses A.]

• Private info points to the better one, with some probability of error

• Prior slightly in favor of A

• Information is there, but is not used
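The herding dynamics on this slide can be reproduced with a short simulation (a minimal sketch, assuming symmetric binary signals of accuracy p, a small prior tilt toward A, and myopic Bayesian agents who break exact ties by following their own signal; these modeling choices are assumptions, not part of the slides):

```python
import random

def simulate_cascade(n_agents=30, p=0.7, prior_bias=0.1, seed=0):
    """Myopic-Bayesian herding in a line, in the spirit of Bikhchandani-Hirshleifer-Welch
    (1992) and Banerjee (1992).  Tavern 'A' is taken to be the better one; each private
    signal points to it with probability p > 1/2.  Agents act in order, observe all
    earlier choices, and pick the option with the larger posterior probability."""
    rng = random.Random(seed)
    step = 1.0                  # one signal's worth of log-likelihood, in units of log(p/(1-p))
    public_llr = prior_bias     # prior slightly in favor of A (assumed value)

    def decide(pub, sig):
        total = pub + sig
        if total > 0:
            return "A"
        if total < 0:
            return "B"
        return "A" if sig > 0 else "B"      # exact tie: follow own signal

    choices = []
    for _ in range(n_agents):
        sig = step if rng.random() < p else -step       # private signal
        act = decide(public_llr, sig)
        choices.append(act)
        # Observers update the public belief only when the action could reveal the signal.
        if decide(public_llr, step) != decide(public_llr, -step):
            public_llr += step if act == "A" else -step
        # otherwise a cascade has started: the action carries no information
    return choices

if __name__ == "__main__":
    print("".join(simulate_cascade()))      # typically a long run of copied choices
```

Once the public evidence outweighs any single private signal, agents copy regardless of what they observe privately, which is exactly the “information is there, but is not used” point.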

Page 14: Some Models of Information Aggregation and Consensus in Networks

Tandem networks

[Figure: two-state diagram for the message Yn ∈ {0, 1}. Under hypothesis Hj, the message flips from 0 to 1 with probability pj(0→1) and from 1 to 0 with probability pj(1→0).]

[Excerpt from a related paper:]

Let πj > 0 be the prior probability of hypothesis Hj and P_t(n) = π0 P0(Y_{n,n} = 1) + π1 P1(Y_{n,n} = 0) be the probability of error at sensor n. The goal of a system designer is to design a strategy so that the probability of error P_t(n) is minimized. Let P*_t(n) = inf P_t(n), where the infimum is taken over all possible strategies.

Figure 1: A tandem network (sensors 1, 2, . . . , n with observations X1, X2, . . . , Xn).

The problem of finding optimal strategies has been studied in [1–3], while the asymptotic performance of a long tandem network (i.e., n → ∞) is considered in [2, 4–8] (some of these works do not restrict the message sent by each sensor to be binary). In the case of binary communications, [4, 8] find necessary and sufficient conditions under which the error probability goes to zero in the limit of large n. To be specific, the error probability stays bounded away from zero iff there exists a B < ∞ such that |log(dP1/dP0)| ≤ B. In the case when the log-likelihood ratio is unbounded, numerical examples have indicated that the error probability goes to zero much slower than an exponential rate. In a parallel configuration, in which all sensors send their messages directly to a single fusion center (i.e., γi is now a function of only Xi), the rate of decay of the error probability is known to be exponential [9]. This suggests that a tandem configuration performs worse than a parallel configuration, when n is large. It has been conjectured in [2, 8, 10, 11] that indeed, the rate of decay of the error probability is sub-exponential. However, no rigorous proof is known for this result. The goal of this paper is to prove this conjecture.

We first note that there is a caveat to the sub-exponential decay conjecture: the probability measures P0 and P1 are assumed to be equivalent measures, i.e., they are absolutely continuous w.r.t. each other. Indeed, if there exists a measurable set A such that P0(A) > 0 and P1(A) = 0, then an exponential decay rate can be achieved as follows: each sensor always declares 1 till the […]

• Estimation with limited memory (Cover, 1969; Hellman & Cover, 1970; Koplowitz, 1975)

• en = optimal P(Yn is incorrect)

• en → 0? How fast?

[Figure: a line of sensors passing binary messages Y1, Y2, . . . , Yn−1.]

• Xi are i.i.d.

• H0 : Xi ~ P0, H1 : Xi ~ P1

• γ : R → {0, 1}

• minimize P(error) over {γi}

• en = min P(Yn is incorrect)

• Private info: likelihood ratio Li = P1(Xi)/P0(Xi)

• Two regimes:

– 0 < a ≤ Li ≤ b < ∞: Engineered en ↛ 0 ⇒ Myopic en ↛ 0

– Li ranges over (0, ∞): Myopic en → 0 ⇒ Engineered en → 0 (but slowly)

Tandem networks

• Xi are i.i.d.; H0 : Xi ~ P0, H1 : Xi ~ P1

• binary message/decision functions γi: Yi = γi(Xi, Yi−1)

• “Social network” view: each γi is myopic, Bayes-optimal

• “Engineered system” view: {γi} designed for best end-decision

– simple sensor network

– estimation with limited memory [Cover, 1969]

Page 15: Some Models of Information Aggregation and Consensus in Networks


Tandem Network Results

• en = P(Yn is incorrect) → 0 ?
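Whether en → 0, and how fast, can be checked numerically for a concrete observation model. The sketch below assumes Gaussian observations (H0: Xi ~ N(0, 1), H1: Xi ~ N(μ, 1), equal priors), which puts us in the unbounded-likelihood-ratio regime, and computes the exact error probability of the myopic-Bayes tandem by propagating the distribution of the message; the Gaussian model is an illustrative assumption, not the slides' setting.

```python
import math

def norm_sf(x):
    """P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def myopic_tandem_error(n=50, mu=1.0):
    """Error probability of each sensor in a myopic-Bayesian tandem.

    Assumed model: H0: Xi ~ N(0,1), H1: Xi ~ N(mu,1), equal priors, binary
    messages Yi = gamma_i(Xi, Y_{i-1}) with each gamma_i the myopic Bayes rule."""
    # q[h][y] = P(Y_i = y | H_h).  Sensor i declares 1 iff
    # mu*Xi - mu^2/2 + log(q[1][y]/q[0][y]) > 0, i.e. Xi above a threshold that
    # depends on the received bit y.  Sensor 1 has no predecessor message.
    q = None
    errs = []
    for _ in range(n):
        if q is None:
            prior_llr = {None: 0.0}
            weights = {0: {None: 1.0}, 1: {None: 1.0}}
        else:
            prior_llr = {y: math.log(q[1][y] / q[0][y]) for y in (0, 1)}
            weights = {h: {y: q[h][y] for y in (0, 1)} for h in (0, 1)}
        new_q = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 0.0}}
        for y, llr in prior_llr.items():
            thr = mu / 2.0 - llr / mu                       # threshold on Xi given y
            declare1 = {0: norm_sf(thr), 1: norm_sf(thr - mu)}  # P_h(Xi > thr)
            for h in (0, 1):
                w = weights[h][y]
                new_q[h][1] += w * declare1[h]
                new_q[h][0] += w * (1.0 - declare1[h])
        q = new_q
        errs.append(0.5 * (q[0][1] + q[1][0]))              # P(Y_i wrong), equal priors
    return errs

if __name__ == "__main__":
    e = myopic_tandem_error()
    print(e[0], e[9], e[49])   # the error keeps shrinking, but slowly, as in the U-LR regime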

Page 16: Some Models of Information Aggregation and Consensus in Networks

The Two Regimes

[Figure: densities P0(Xi) and P1(Xi) as functions of x, and the likelihood ratio L of the observation under H1 vs. H0.]

Bounded likelihood ratios (B-LR): 0 < a < (dP1/dP0)(Xi) < b < ∞

– never get compelling evidence

Unbounded likelihood ratios (U-LR): 0 < (dP1/dP0)(Xi) < ∞

– arbitrarily compelling evidence is possible

Page 17: Some Models of Information Aggregation and Consensus in Networks


Tandem Network Results

• en = P(Yn is incorrect)

• Private info: likelihood ratio Li = P1(Xi)/P0(Xi)

Two regimes:

• range of Li is [a, b], 0 < a < b < ∞: Engineered en ↛ 0 ⇒ Myopic en ↛ 0 (Cover, 1969)

• range of Li is (0, ∞): Myopic en → 0 (Papastavrou & Athans, 1992) ⇒ Engineered en → 0 (Cover, 1969), but slowly (Tay, JNT & Win, 2008)

The Two Regimes

                          Social                               Engineered
U-LR:                     en → 0 (Papastavrou & Athans, 92)    ⇒ en → 0 (Cover, 69), but slowly (Tay, Win & JNT, 08)
B-LR:                     en ↛ 0                               ⇐ en ↛ 0 (Koplowitz, 75)
B-LR, ternary messages:   en ↛ 0 (Dia & JNT, 09)               en → 0 (Koplowitz, 75)

Page 18: Some Models of Information Aggregation and Consensus in Networks

Engineered — Architectural Comparisons

(Tay, Win & JNT, 08)
(Zoumpoulis & JNT, 09)

[Figure: several network architectures compared; some comparisons are marked “unclear”.]

• less information, but “same performance” (Tay, Win & JNT, 08)

• more information, but “same performance” (Zoumpoulis & JNT, 09)

• but same exponent (Neyman-Pearson version)

Page 19: Some Models of Information Aggregation and Consensus in Networks

General Acyclic Networks — Engineered

(Tay, Win & JNT, 08)
(Zoumpoulis & JNT, 09)

[Figure: example acyclic network architectures; some cases are marked “unclear”.]

• en ≈ exp{−Λ∗n}, and the exponent Λ∗ is known (JNT, 88)

Page 20: Some Models of Information Aggregation and Consensus in Networks

Another example

[Figure: fusion node f with subtrees rooted at v1, v2, . . . , vm, each with branching factor 2 near the leaves.]

• Leaves make Bernoulli observations Xi

• Easy to verify that Λ∗_{tree,NP} < Λ∗_{star,NP}

• Maybe low branching factor near the leaves is the culprit

General Acyclic Networks — Engineered

(Tay, Win & JNT, 08)
(Zoumpoulis & JNT, 09)

[Figure: architecture comparison; some cases are marked “unclear”.]

• information loss, but same exponent (Neyman-Pearson version)

• worse error exponent

• en ≈ exp{−Λ∗n}, and the exponent Λ∗ is known (JNT, 88)

Page 21: Some Models of Information Aggregation and Consensus in Networks

A Simple Tree

[Figure: fusion node f with two children v1 and v2, each the root of a subtree with m leaves.]

• Each subtree: optimal NP test for P0(false alarm) = α/2

• P1(missed detection) ≈ e^{−m Λ∗_{star,NP}}

• Fusion rule: declare H1 iff [Yv1 = 1 or Yv2 = 1]

Page 22: Some Models of Information Aggregation and Consensus in Networks

Engineered — Architectural Comparisons

(Tay, Win & JNT, 08)
(Kreidl, Zoumpoulis & JNT, 10)

[Figure: architecture comparison; some cases are marked “unclear”.]

• less information, but “same performance” (Tay, Win & JNT, 08)

• more information, but “same performance” (Kreidl, Zoumpoulis & JNT, 10)

• but same exponent (Neyman-Pearson version)

Page 23: Some Models of Information Aggregation and Consensus in Networks

A Perspective on Bayesian Optimality

• Engineering

– Optimal rules are hard to find

– Can design “good” schemes (asymptotic, etc.)

• Social Networks

– Is Bayesian updating plausible?

• More realistic settings?

– Correlated private information

– Multiple hypotheses

– Complex topologies or interaction sequences

26

Page 24: Some Models of Information Aggregation and Consensus in Networks

“Naive” Information Aggregation
(Consensus and Averaging)

Page 25: Some Models of Information Aggregation and Consensus in Networks

The Setting

• n agents

– starting values xi(0)

• reach consensus on some x∗, with either:

– min_i xi(0) ≤ x∗ ≤ max_i xi(0) (consensus)

– x∗ = (x1(0) + · · · + xn(0))/n (averaging)

– averaging when xi ∈ {−1, +1} (voting)

• interested in:

– genuinely distributed algorithm

– no synchronization

– no “infrastructure” such as spanning trees

• simple updates, such as: xi := (xi + xj)/2
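A minimal sketch of the kind of update the slide has in mind: repeated pairwise averaging xi := (xi + xj)/2 along the edges of a connected graph (the ring topology and step count below are assumed for the example). The sum, and hence the average, is preserved at every step, and the values merge toward it.

```python
import random

def gossip_averaging(x0, edges, n_steps=2000, seed=0):
    """Pairwise gossip: pick a random edge (i, j) and set xi, xj to their average."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        i, j = rng.choice(edges)
        avg = (x[i] + x[j]) / 2.0
        x[i] = x[j] = avg            # the sum x1 + ... + xn is unchanged
    return x

if __name__ == "__main__":
    n = 8
    x0 = [float(i) for i in range(n)]            # starting values xi(0)
    ring = [(i, (i + 1) % n) for i in range(n)]  # assumed topology: a ring
    x = gossip_averaging(x0, ring)
    print("target average:", sum(x0) / n)
    print("final values  :", [round(v, 4) for v in x])   # all close to the average
```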

Page 26: Some Models of Information Aggregation and Consensus in Networks


Page 27: Some Models of Information Aggregation and Consensus in Networks

Social sciences

• Merging of “expert” opinions

• Evolution of public opinion

• Evolution of reputation

• Modeling of jurors

• Language evolution

• Preference for “simple” models

– behavior described by “rules of thumb”

– less complex than Bayesian updating

• interested in modeling, analysis (descriptive theory)

– . . . and narratives


Page 28: Some Models of Information Aggregation and Consensus in Networks

Engineering

• Distributed computation and sensor networks

– Fusion of individual estimates
– Distributed Kalman filtering
– Distributed optimization
– Distributed reinforcement learning

• Networking

– Load balancing and resource allocation
– Clock synchronization
– Reputation management in ad hoc networks
– Network monitoring

• Multiagent coordination and control

– Coverage control
– Monitoring
– Creating virtual coordinates for geographic routing
– Decentralized task assignment
– Flocking

Page 29: Some Models of Information Aggregation and Consensus in Networks

Distributed optimization

min_x Σ_{i=1}^{n} fi(x)

• centralized gradient iteration: x := x − α Σ_{i=1}^{n} ∇fi(x)

• distributed gradient algorithm: xi := xi − α ∇fi(xi)

• reconciling distributed updates: xi := (xi + xj)/2

converges (when stepsize α is small enough) under minimal assumptions

– time-varying interconnection topology

min_x f(x)

• centralized: x := x − α ∇f(x) + noise

• distributed: xi := xi − α ∇f(xi) + noise_i

• reconcile: xi := (xi + xj)/2

• consensus algorithm suffices

• averaging algorithm needed
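A minimal sketch of the “distributed gradient step, then reconcile” recipe above, assuming quadratic local costs fi(x) = (x − ci)²/2 (so that Σi fi is minimized at the mean of the ci), a ring of agents with equal-weight neighborhood averaging, and a diminishing stepsize; all of these modeling choices are assumptions made for the illustration.

```python
def distributed_gradient(c, n_iters=2000):
    """Consensus-plus-gradient sketch: a ring of agents, equal-weight averaging with
    the two ring neighbors and oneself, followed by a diminishing-stepsize gradient
    step on the local cost f_i(x) = (x - c[i])^2 / 2."""
    n = len(c)
    x = [0.0] * n
    for t in range(n_iters):
        # averaging (consensus) step over the ring
        x = [(x[(i - 1) % n] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
        # local gradient step with diminishing stepsize
        alpha = 1.0 / (t + 2)
        x = [xi - alpha * (xi - ci) for xi, ci in zip(x, c)]
    return x

if __name__ == "__main__":
    c = [1.0, 4.0, 7.0, 10.0]               # sum of f_i is minimized at mean(c) = 5.5
    print([round(v, 3) for v in distributed_gradient(c)])   # all values close to 5.5
```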

Page 30: Some Models of Information Aggregation and Consensus in Networks


Page 31: Some Models of Information Aggregation and Consensus in Networks

Distributed optimization

minimize Σ_{i=1}^{n} fi(x)

• agent i: update xi in the direction of improvement of fi; reconcile updates through a consensus or averaging algorithm

minimize f(x1, . . . , xn)

• agent i: update xi, based on (possibly outdated) values of xj

Page 32: Some Models of Information Aggregation and Consensus in Networks

Asynchronous computation model
(JNT, Bertsekas, Athans, 1986)

xi(t + 1) = Σj aij(t) xj(t − dij(t))

aij(t): nonzero whenever i receives a message from j

dij(t): delay of that message
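A minimal sketch of the asynchronous update xi(t + 1) = Σj aij(t) xj(t − dij(t)), with one randomly chosen pair per step and bounded random delays; the particular pairing rule and delay distribution are assumptions of the sketch, not of the 1986 model, which is more general.

```python
import random

def async_consensus(x0, n_steps=400, max_delay=3, seed=1):
    """Each step, one agent i averages its value with a delayed value of another
    agent j:  xi := (xi + xj(t - d)) / 2  with  d <= max_delay."""
    rng = random.Random(seed)
    n = len(x0)
    history = [list(x0)]                     # history[t][i] = xi(t)
    for t in range(n_steps):
        x = list(history[-1])
        i = rng.randrange(n)
        j = rng.randrange(n)
        if i != j:
            d = rng.randint(0, min(max_delay, t))     # delay d_ij(t)
            outdated_xj = history[len(history) - 1 - d][j]
            x[i] = 0.5 * x[i] + 0.5 * outdated_xj     # a_ii = a_ij = 1/2
        history.append(x)
    return history[-1]

if __name__ == "__main__":
    print([round(v, 3) for v in async_consensus([0.0, 1.0, 5.0, 9.0])])
    # the values agree (consensus), though not necessarily on the initial average
```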

The DeGroot opinion pooling model (1974)

xi(t + 1) = Σj aij xj(t),   aij ≥ 0,  Σj aij = 1

x(t + 1) = Ax(t),   A: stochastic matrix

• Markov chain theory + “mixing conditions”

⇒ convergence of A^t to a matrix with equal rows

⇒ convergence of xi to Σj πj xj

⇒ convergence rate estimates

• x(t + 1) = A(t)x(t): mixing conditions for nonstationary Markov chains (Chatterjee and Seneta, 1977)

Θ(n²) for line graphs
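A minimal sketch of the DeGroot iteration x(t + 1) = Ax(t); the 4-agent trust matrix below is an assumed example, not taken from the slides. Opinions converge to a common value Σj πj xj(0), a convex combination of the initial opinions.

```python
def degroot(A, x0, n_iters=100):
    """Iterate x(t+1) = A x(t) for a row-stochastic trust matrix A."""
    x = list(x0)
    for _ in range(n_iters):
        x = [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]
    return x

if __name__ == "__main__":
    # Assumed trust matrix: each row is nonnegative and sums to 1.
    A = [[0.50, 0.50, 0.00, 0.00],
         [0.25, 0.25, 0.50, 0.00],
         [0.00, 0.25, 0.50, 0.25],
         [0.00, 0.00, 0.50, 0.50]]
    x0 = [1.0, 0.0, 0.0, 4.0]
    print([round(v, 4) for v in degroot(A, x0)])   # all entries (nearly) equal: consensus
```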

Page 33: Some Models of Information Aggregation and Consensus in Networks


Averaging algorithms

• A doubly stochastic: 1′Ax = 1′x, where 1′ = [1 1 . . . 1]

– x1 + · · · + xn is conserved

– convergence to x∗ = (x1(0) + · · · + xn(0))/n
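One standard way to obtain a doubly stochastic A on an undirected graph is with Metropolis-type weights; the sketch below assumes agents know their neighbors' degrees and uses a small example graph. With such weights the sum Σi xi is conserved, so the iteration converges to the exact average.

```python
def metropolis_weights(n, edges):
    """Doubly stochastic weights on an undirected graph:
    a_ij = 1 / (1 + max(deg_i, deg_j)) for each edge {i, j}, a_ii = 1 - sum_j a_ij."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        w = 1.0 / (1 + max(deg[i], deg[j]))
        A[i][j] = A[j][i] = w
    for i in range(n):
        A[i][i] = 1.0 - sum(A[i])        # symmetric and row-stochastic => doubly stochastic
    return A

if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]   # assumed small graph
    A = metropolis_weights(4, edges)
    x = [3.0, -1.0, 8.0, 2.0]
    print("average to preserve:", sum(x) / 4)
    for _ in range(200):
        x = [sum(A[i][j] * x[j] for j in range(4)) for i in range(4)]
    print([round(v, 4) for v in x])      # every xi converges to the average
```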



Page 35: Some Models of Information Aggregation and Consensus in Networks

Convergence time of consensus algorithms
(time to get close to “steady-state”)

Equal weight to all neighbors:

– Directed graphs: exponential in n

– Undirected graphs: O(n³), tight (Landau and Odlyzko, 1981)

Better results for special graphs (Erdős–Rényi, geometric, small world)


Page 36: Some Models of Information Aggregation and Consensus in ... · Audio Download (Audible.com) $25.95 $13.63 Show more editions and formats Get Free Two-Day Shipping Get Free Two-Day

A critique

• Social Networks
  – "The process that it describes is intuitively appealing." (DeGroot, 74)
  – How plausible is this type of synchronism?

• Engineering
  – If the graph is indeed fixed: elect a leader, form a spanning tree, accumulate on tree
  – Want simplicity, and robustness w.r.t. changing topologies, failures, etc.
  – Different models and specs?

45


Time-Varying/Chaotic Environments

Asynchronous computation model (JNT, Bertsekas, Athans, 1986)

xi(t + 1) = Σj aij(t) xj(t − dij(t))

aij(t): nonzero whenever i receives a message from j
dij(t): delay of that message

zero-delay case: x(t + 1) = A(t)x(t)  (inhomogeneous chain)
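To make the delayed update concrete, here is a toy simulation (my own construction, not from the slides): each node of a small ring averages its own current value with delayed values from its two neighbors, the delays dij(t) being drawn at random up to a bound D. The values still settle on a common limit.

import random

n, D, T = 6, 3, 3000                     # nodes, maximum delay, time horizon
hist = [[random.random()] * (D + 1) for _ in range(n)]    # recent values of each node

for t in range(T):
    new = []
    for i in range(n):
        vals = [hist[i][-1]]                               # own current value (no delay)
        for j in ((i - 1) % n, (i + 1) % n):
            d = random.randint(0, D)                       # dij(t): delay of the message from j
            vals.append(hist[j][-1 - d])                   # xj(t - dij(t))
        new.append(sum(vals) / 3.0)                        # aij(t) = 1/3 for self and both neighbors
    for i in range(n):
        hist[i] = (hist[i] + [new[i]])[-(D + 1):]          # keep only the last D+1 values

print([round(h[-1], 4) for h in hist])                     # the entries (nearly) coincide

The limit here is some convex combination of the initial values, not necessarily their average; that distinction is exactly the consensus-versus-averaging split on the slides that follow.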


Variants

xi(t + 1) = Σj aij xj(t),   aij ≥ 0,  Σj aij = 1

x(t + 1) = Ax(t)    A: stochastic matrix

• Fixed matrix A, subject to a given graph/zero pattern: optimize A via SDP (Boyd & Xiao, 2003)
• i.i.d. random graphs: same (in expectation) as fixed graphs; convergence rate ↔ "mixing times" (Boyd et al., 2005)
• Fairly arbitrary sequence of graphs/matrices A(t): worst-case analysis
• "equal-neighbor model": xi := average of messages and own value
• bidirectional model: ∀ t: i → j iff j → i
• doubly stochastic A(t): sum & average preserving


Consensus convergence

xi(t + 1) = Σj aij(t) xj(t)

• aii(t) > 0;  aij(t) > 0 ⇒ aij(t) ≥ α > 0
• "strong connectivity in bounded time": over B time steps the "communication graph" is strongly connected
• Convergence to consensus: ∀ i: xi(t) → x∗ = convex combination of initial values (JNT, Bertsekas, Athans, 86; Jadbabaie et al., 03)
• "convergence time": exponential in n and B
  – even with: symmetric graph at each time, equal weight to each neighbor (Cao, Spielman, Morse, 05)

48
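A quick illustration of the theorem (my own toy instance): at each time only a single edge of a ring is active, so the "communication graph" is strongly connected only over windows of B = n steps, yet the values still reach consensus on a convex combination of the initial values.

import numpy as np

n, T = 6, 5000
x = np.random.rand(n)
x0 = x.copy()

for t in range(T):
    i = t % n                            # one active (bidirectional) edge per step
    j = (i + 1) % n
    x[i] = x[j] = 0.5 * (x[i] + x[j])    # equal weight to self and the one active neighbor

print(np.ptp(x))                         # max - min -> 0: consensus
print(x[0] - x0.mean())                  # each pairwise average is doubly stochastic,
                                         # so here the limit happens to be the average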


Averaging in Time-Varying Setting

• x(t + 1) = A(t)x(t)
  – A(t) doubly stochastic, for all t
  – nonzero aij(t) ≥ α > 0
  – convergence time: O(n²/α)

• Improved convergence rate
  – exchange "load" with up to two neighbors at a time
  – can use α = O(1)
  – convergence time: O(n²)

• Is there a Ω(n²) bound to be discovered?

52
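The role of double stochasticity, spelled out (a standard observation; the notation V for the quadratic disagreement is my own here): row sums equal to 1 keep each update a weighted average, while column sums equal to 1 preserve the total, so the mean x̄ is a constant of motion and the disagreement can only shrink.

\[
\mathbf{1}^\top x(t+1) = \mathbf{1}^\top A(t)\,x(t) = \mathbf{1}^\top x(t),
\qquad
V(t) := \sum_i \bigl(x_i(t) - \bar{x}\bigr)^2,
\qquad
V(t+1) \le V(t),
\]
since a doubly stochastic \(A(t)\) is (by Birkhoff's theorem) a convex combination of permutation matrices, hence nonexpansive in the Euclidean norm, and \(A(t)\bigl(x(t)-\bar{x}\mathbf{1}\bigr) = x(t+1)-\bar{x}\mathbf{1}\). The bounds on this slide quantify how fast V must actually decrease.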


Averaging algorithms – analysis

• Suppose initially Σi xi(0) = 0 and Σi xi² = 1
  – some node has |xi| ≥ 1/√n; some node has the opposite sign
  – total gap at least 1/√n, so some gap is at least 1/n^1.5

• Over B steps, communicate across the gap: V improves by ≥ α/n³
  – Convergence time: O(n³/α);  α ≤ 1/degree ⇒ O(n⁴)
  – Account for simultaneous contributions of all gaps: O(n²/α)
  – Keep degree bounded ⇒ 1/α bounded

• Averaging in time-varying bidirectional graphs: no harder than consensus on fixed graphs

• Various convergence proofs of optimization algorithms remain valid
  – Improves the convergence time estimate for subgradient methods [Nedic, Olshevsky, Ozdaglar, JNT, 09]
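A worked version of the first two bullets, with the (assumed) choice V(t) = Σi xi(t)² as the Lyapunov function:

\[
\max_i |x_i| \ge \frac{1}{\sqrt{n}}
\;\Bigl(\text{otherwise } \sum_i x_i^2 < n\cdot\tfrac{1}{n} = 1\Bigr),
\qquad
\max_i x_i - \min_i x_i \ge \frac{1}{\sqrt{n}}
\;\Bigl(\text{since } \sum_i x_i = 0\Bigr),
\]
so after sorting the values, one of the \(n-1\) consecutive gaps is at least \(\frac{1}{\sqrt{n}\,(n-1)} \ge \frac{1}{n^{1.5}}\). When an edge of weight at least \(\alpha\) finally straddles such a gap (which joint connectivity forces within \(B\) steps), averaging across it lowers \(V\) by on the order of \(\alpha/n^{3}\); since \(V\) starts at \(1\) and never increases, this can happen only \(O(n^{3}/\alpha)\) times, which is the first bound above.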


Closing Thoughts

• Engineering
  – Bayesian: near-optimal designs possible
  – Naive: interesting, manageable design questions

• Social Networks
  – What are the plausible models?


Thank you!

