Online Mechanisms


Seminar on game theory, David C. Parkes. Presentation by Alon Baram.

Talk Layout

Review of mechanism design.
Online mechanism definitions.
Single value domains.

Review – Mechanism design

A mechanism for n players is given by:

Players' type spaces $T_1, \dots, T_n$
Players' action spaces $X_1, \dots, X_n$
Alternatives set $A$
Valuations $v_i : T_i \times A \to \mathbb{R}$
Outcome function $a : X_1 \times \dots \times X_n \to A$
Payment functions $p_i : X_1 \times \dots \times X_n \to \mathbb{R}$
Linear utility: the player's valuation minus the player's price

Social choice function: $f : T_1 \times \dots \times T_n \to A$
Strategy: $s_i : T_i \to X_i$
A mechanism can implement a social choice function.

Review – Mechanism design (2)

Implementation in dominant strategies: if, for some dominant-strategy equilibrium and for all types of the players, the social choice function chooses the same outcome as the mechanism does under the equilibrium strategies, then the function is implemented by the mechanism.

A mechanism is incentive compatible (truthful) if telling the truth about its type is a dominant strategy for every player.

Revelation principle – any mechanism with a dominant-strategy profile can be converted into a truthful one.

Individual rationality – at equilibrium, all agents have non-negative utility.

Ice Cream Stand

Moshe has an ice cream stand. He is very lazy and makes one cone per hour.

If no one buys the cone within that hour the ice cream melts, and he is very sad.

Buyers come and stand in line. Upon arrival they declare how many hours they are willing to wait, and how much the cone is worth to them.

Ice Cream Stand – Cont.

Every buyer holds private information:
The true time he is willing to wait for the cone.
The true value of the cone to him.

Thus a player's type is a triplet: (time he arrives in line, time he has to leave, cone value).

This triplet is declared to Moshe (but not necessarily truthfully) once the buyer arrives.

Moshe decides who gets this hour's cone.

Ice Cream Stand - example

There are 3 buyers, with types: (9:00,11:00,100), (9:00,11:00,80), (10:00,11:00,60).

Let's say Moshe uses a generalization of the Vickrey auction for his decision: every hour he sells to the highest bidder at the second-highest price.

If every bidder is truthful: Buyer 1 will get his cone for 80 in the first hour. Buyer 2 will get his cone for 60 in the second hour.

But what if buyer one chooses to lie?

Ice Cream Stand – Manipulation

Buyer 1 may declare (9:00,11:00,61):
The cone in the first hour is won by buyer 2 for 61.
The cone in the second hour is won by buyer 1 for 60 instead of 80.

Buyer 1 may come to the line at 10:00 instead of 9:00:
Buyer 2 wins the first cone for 0.
Buyer 1 wins the second cone for 60.

The Vickrey auction in an online setting is not truthful, because buyers may choose which auction to participate in.
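The two manipulations can be replayed with a small simulation. This is only a sketch: the function name `hourly_second_price`, the buyer labels, and the convention that a buyer with departure 11:00 is present for the 9:00 and 10:00 cones are assumptions made for illustration, not part of the talk.

```python
from typing import Dict, List, Tuple

# A bid is (arrival_hour, departure_hour, value); 9 stands for 9:00.
Bid = Tuple[int, int, int]


def hourly_second_price(bids: Dict[str, Bid], hours: List[int]) -> Dict[str, Tuple[int, int]]:
    """Each hour, sell one cone to the highest active, unserved bidder
    at the second-highest active bid (0 if the winner is alone).
    Returns {buyer: (hour, price)}."""
    sold: Dict[str, Tuple[int, int]] = {}
    for hour in hours:
        # Active buyers have arrived, have not left yet, and hold no cone.
        active = sorted(
            ((value, name) for name, (arrive, depart, value) in bids.items()
             if arrive <= hour < depart and name not in sold),
            reverse=True,
        )
        if not active:
            continue
        price = active[1][0] if len(active) > 1 else 0
        sold[active[0][1]] = (hour, price)
    return sold


HOURS = [9, 10]  # assumed: cones are made at 9:00 and 10:00
truthful = {"buyer1": (9, 11, 100), "buyer2": (9, 11, 80), "buyer3": (10, 11, 60)}
low_bid = dict(truthful, buyer1=(9, 11, 61))   # buyer 1 understates his value
late = dict(truthful, buyer1=(10, 11, 100))    # buyer 1 shows up an hour late

for label, bids in [("truthful", truthful), ("low bid", low_bid), ("late arrival", late)]:
    hour, price = hourly_second_price(bids, HOURS)["buyer1"]
    # Buyer 1's utility is always measured against his true value of 100.
    print(f"{label}: buyer1 served at {hour}:00, pays {price}, utility {100 - price}")
```

In both manipulations buyer 1 still gets a cone before leaving but pays 60 instead of 80, which is exactly the incentive problem described above.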

Dynamic auction with expiring items

Formal model for our example.

Discrete time periods: $T = \{1, 2, \dots\}$
$N$ agents, with types denoted $\theta_i = (a_i, d_i, w_i) \in T \times T \times \mathbb{R}$
$a_i$ is the arrival time.
$d_i$ is the departure time.
$w_i$ is the value for allocation of the single item.
The value is for allocating one unit during $t \in [a_i, d_i]$.
Payment $p_i$ is collected from the agent.
Utility is quasi-linear: $u_i = w_i - p_i$

Online Mechanisms motivation

Model the notion of dynamic environments:
Selling seats on airplanes to buyers arriving over time.
Allocating computational resources to jobs arriving over time.
Selling adverts on a search engine to changing groups of buyers with uncertain future supply (AdWords).
Allocating tasks to a dynamically changing team of agents.

Online Mechanisms challenges

Agents may arrive or depart at any time.
Uncertainty about feasible actions in the future.
The types of agents who have not arrived yet are unknown.
Agents can lie about their arrival and departure times as well as their valuation.
We may restrict the types of lies an agent is capable of.

Example: an agent may not report an arrival time earlier than its actual arrival.

Online Mechanisms formal model

Discrete time periods: $T = \{1, 2, \dots\}$
Set of feasible outcomes: $O = (O_1, O_2, \dots)$, where $O_t$ is the set of possible outcomes at time $t$.
Sequence of decisions: $k = (k_1, k_2, \dots) \in O$, where $k_t$ is the decision at time $t$.
Agent $i$'s type is denoted $\theta_i = (a_i, d_i, w_i) \in T \times T \times W_i$.
Valuation function: $v_i(\theta_i, k) = v_i(\theta_i, k_{[a_i, d_i]})$, so only decisions made between arrival and departure matter.
Quasi-linear utility function.
The arrival period is the first time the agent may report its type.
The valuation component may depend on choices and time.

Direct Revelation Mechanisms

An agent may send one message to the mechanism regarding its type.
The agent gets no information prior to this report.

For the ice cream example:
Buyers reported their values and departure times upon arrival.
Buyers didn't know about the other buyers before arriving.

Direct Revelation Mechanisms – Formally

Mechanism state: captures all information relevant for the time-$t$ decision.
Stochastic events are allowed.
The state space may be finite, countable, or continuous.
The state determines the feasible decisions at time $t$.

Direct Revelation Mechanisms – Formally (2)

Each agent makes a single claim about its type.
The mechanism consists of a decision policy and a payment policy, defined for every active agent.
The decision policy may be stochastic.
The payment policy may collect payments over several periods.

Ice cream revisited

The ice cream stand is a direct revelation mechanism.
The state is the list of currently active agents.
The decision policy is to allocate to the highest active unallocated bidder.
The payment policy is to charge the second-highest active bid.

Limited Misreports

Scenarios where agents have restrictions on the possible lies they can make.

Formally, each agent with a given type has a set of available misreports, which is a subset of its type space.

No early-arrival misreports – an agent cannot report an arrival time earlier than its actual arrival.

No late-departure misreports – an agent cannot report a departure time later than its actual departure.

In the ice cream example, an agent could not report an earlier arrival, because he was not there yet.

Truthful online mechanisms

An online mechanism is truthful if, for each agent i, given a known set of available misreports for i, for every fixed choice of the other players' reported types, and for every stochastic event:

The utility of i when reporting its true type is greater than or equal to its utility when reporting any available lie.

Here the utility is:
The valuation of the agent, given its true type and the policy's decisions on its reported type and the other agents' types,
minus the corresponding payment for the current state.

Truthful online mechanisms – cont.

For a stochastic decision policy, the expected utility of being truthful should be maximal over all other reports, for all events.

For Bayes-Nash incentive compatibility:
All agents know the distribution of agent types and events.
For every agent, the expected utility of truth-telling is at least that of telling any available lie, provided the other agents are truthful.
This is weaker than the previous definition.

Online revelation principle

The revelation principle for offline mechanism design states that an arbitrary mechanism that implements a social choice function in dominant strategies can be emulated by a truthful one.

In the general setting this might not hold for online mechanism design.

But it does hold when we limit the misreports to no early arrivals and no late departures.

Single value Domains

Introduction
Dynamic auction with expiring items
Adaptive limited supply auction

Interesting groups

“Moshe city travel” sells equipment for travelling in the city. As we all know, you can't get around the city by car. They sell the following items:

Bicycle.
Bicycle pumps.
Segway.

There are 4 types of agents:
(1,1,({Bicycle,Bicycle pumps},20))
(2,2,({Bicycle,Bicycle pumps},15))
(1,1,({Bicycle pumps},30))
(1,2,({Segway},40))

Interesting groups-cont

Assume Moshe is lazy and works 2 hours a day.
Today Moshe has one of each item to offer.
Say we have 4 buyers, one of each type.
Possible allocation outcomes, where each entry represents one buyer's allocation:

$L_1 = (\{\text{bikes}, \text{pump}\}, \{\text{segway}\})$
$L_2 = (\{\text{bikes}\}, \{\text{pump}\}, \{\text{segway}\})$
$L_3 = (\{\text{bikes}, \text{segway}, \text{pump}\})$

In the eyes of agent 1 (bikes + pumps) we can define the order:

$L_2 \prec L_1 \prec L_3$

Single value domain

In a single value domain, an agent wants its preference to be at least some part of the decision, at some time while it is present in the mechanism.

The value the agent gets is either a constant, when its request is satisfied, or 0 otherwise.

We can use the language of interesting sets to define what is interesting for a player, and to find all choices that include this interest.

This is done by creating, for every agent, a collection of sets L. Each set represents some subset of decisions. By defining a partial order on them we can check when the agent is satisfied.

Single value domain formally

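Let $L_i$ be the set of interesting sets for agent i; each interesting set is a subset of decisions.

Define a partial order on the interesting sets. Now $r_i$ is the agent's value on an interesting set. Formal definition of a single value domain: the agent gets value $r_i$ if some decision made while it is present lies in one of its interesting sets, and 0 otherwise.

Let's assume the mechanism knows each type's interesting sets.

Under this assumption, define a partial order on types, which sorts conflicting types by their value.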

Single value combinatorial auction

Multiple units of indivisible items.
Uncertain supply, no storage between periods.
Single value preferences: the interesting sets are all allocations that contain the items of interest to agent i.
A partial order on the interesting sets, for agent i with type $\theta_i$.

Critical value

The store's policy is to use Vickrey pricing, where each price is now set by the smallest contained subset.
I.e., for {Bicycle,Bicycle pumps} and {Bicycle pumps}, the smallest contained subset is {Bicycle pumps}.

After choosing some policy, we can calculate the critical value for an agent: fixing the other agents, it is the minimum value the agent must have in order to win its interest at some point.

In our example:
Agent 1 needs to beat agent 3 in the first round, so his critical value is 30+epsilon.
Agent 2 cannot win, so his critical value is infinite.
Agent 3 has to beat agent 1, so his critical value is 20+epsilon.
Agent 4 wins anyway, so his critical value is 0.

Revisit last example

(1,1,({Bicycle,Bicycle pumps},20))
(2,2,({Bicycle,Bicycle pumps},15))
(1,1,({Bicycle pumps},30))
(1,2,({Segway},40))

Critical value - formally

Here an allocation indicator equal to 1 means that the agent was allocated in some period.
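One way to write the definition down is sketched here. The notation is an assumption: $\pi_i(\theta_i, \theta_{-i}) \in \{0, 1\}$ stands for the indicator that agent $i$ is allocated in some period, $r_i$ for the agent's value, and $L_i$ for its interesting sets; the infinite case matches the example of an agent who cannot win at any value.

```latex
v^c_{(a_i, d_i, L_i)}(\theta_{-i}) =
\begin{cases}
\min r_i' \ \text{ s.t. } \ \pi_i\big((a_i, d_i, (r_i', L_i)),\, \theta_{-i}\big) = 1,
  & \text{if such an } r_i' \text{ exists},\\[4pt]
\infty, & \text{otherwise.}
\end{cases}
```

The critical value depends on the agent's own report only through its time window and interesting sets, not through its reported value.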

Monotonic policy

A deterministic policy is monotonic if the following holds. Fix agent i. For every choice of types for all the other agents, replace agent i's type with a “bigger” one (i.e. the value is strictly larger and the time frame includes the original one).

If i was allocated with the original type, it is also allocated with the bigger one.

The strictness is there to ensure the value is higher.

The previous policy is monotonic (I think).

Non monotonic example

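Two agents, one bike. Types: 1=(1,1,(10,{bike})), 2=(1,2,(15,{bike})).

In this policy type 2 is bigger than type 1, but in our example an agent of type 2 isn't allocated when faced with another agent of type 2, although it is allocated with type 1. Writing $\pi_i$ for the indicator that agent i is allocated:

$\pi_1(1,2) = 1,\ \pi_1(2,2) = 0 \quad (\pi_2(1,2) = 0,\ \pi_2(2,2) = 1)$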
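The violation in this example can be checked mechanically. The sketch below assumes a policy given as a table of agent 1's allocation (1 or 0) for each pair (own type, other agent's type), and simplifies "bigger" to: same bundle, a time window that contains the original one, and a strictly higher value. The names `is_bigger` and `monotonicity_violations` are made up for illustration.

```python
from typing import Dict, FrozenSet, List, Tuple

# A type is (arrival, departure, value, bundle).
Type = Tuple[int, int, int, FrozenSet[str]]


def is_bigger(big: Type, small: Type) -> bool:
    """Assumed order: wider-or-equal window, strictly higher value, same bundle."""
    return (big[0] <= small[0] and big[1] >= small[1]
            and big[2] > small[2] and big[3] == small[3])


def monotonicity_violations(
    alloc: Dict[Tuple[Type, Type], int]
) -> List[Tuple[Type, Type, Type]]:
    """Return (type, bigger_type, other_type) triples where the agent wins
    with `type` but loses with `bigger_type` against the same opponent."""
    violations = []
    for (own, other), won in alloc.items():
        for (own_big, other_big), won_big in alloc.items():
            if other_big == other and won == 1 and won_big == 0 and is_bigger(own_big, own):
                violations.append((own, own_big, other))
    return violations


bike = frozenset({"bike"})
type1: Type = (1, 1, 10, bike)
type2: Type = (1, 2, 15, bike)

# Agent 1's allocation under the policy from the example, and under a
# hypothetical alternative in which the bigger type wins instead.
slide_policy = {(type1, type2): 1, (type2, type2): 0}
other_policy = {(type1, type2): 0, (type2, type2): 1}

print(len(monotonicity_violations(slide_policy)))
print(len(monotonicity_violations(other_policy)))
```

The first table reports one violation (winning with type 1 but losing with the bigger type 2), and the alternative table reports none.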

Lemma

Meaning – the critical value is determined by the other agents and the time interval.

Proof: fix the other players and the events. Assume that for two types of agent i, critical_value(theta') < critical_value(theta) but theta' <= theta.

Replace the values in both types with a common value just above critical_value(theta') and still below critical_value(theta).

We still have theta' <= theta.

Proof – cont.

But critical_value(theta') < critical_value(theta), so theta isn't allocated and theta' is, a contradiction to monotonicity.

Truthfulness in single value domains

It is possible to truthfully implement any monotonic deterministic policy, given no early-arrival or late-departure misreports.

This is done by charging an allocated agent its critical value at its time of departure.

Truthfulness in single value domains

Proof: fix the other agents' types and the events, and fix agent i's type.

Case a, the agent is allocated. Any legal misreport either:
narrows the time range, which gives a type with a larger critical value (by the previous lemma), or
just changes r(i).
In either case the payment cannot decrease, so the agent cannot gain utility.

Case b, the agent is not allocated. Its critical value is larger than its value. To be allocated it must report a bigger value, at least the critical value, and it would then pay more than its true value, giving negative utility.

Necessary conditions for truthfulness

A mechanism satisfies individual rationality when every agent has non-negative utility in equilibrium.

We now examine the necessary conditions for truthfulness.

Reasonable misreporting – an agent can at least lie by reporting a later arrival time, an earlier departure time, and any value.

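Necessary conditions for truthfulness

Proof: fix the other players and the events. Assume theta <= theta', theta is allocated, theta' isn't, and r(i) > vc(theta).

By the lemma (1.4) and individual rationality, the payment of the allocated agent is at most its critical value vc(theta).

=> an agent of type theta has strictly positive utility.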

Necessary conditions for truthfulness

An agent of type theta', which is not allocated, has weakly negative utility (he might be charged).

An agent of type theta' should therefore lie and report type theta, which is an available misreport since theta <= theta', and will profit.

Thus the mechanism is untruthful.

Dynamic auction with expiring items

Examples:
Ice cream stand.
Time on a shared computer.
Network resources.

Model assumptions:
No early arrivals.
No late departures.

Can be justified by withholding the item or result until departure.

Competitive analysis

Use the adversary model of competitive analysis.

Competitive – how good our algorithm is versus the optimal offline algorithm with full information.

Optimality criterion – the value of the best possible offline allocation.

Adversary – chooses the worst input he can find. He has a model indicating his power to select bad input.

Competitive analysis formally

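y_i – whether bid i was allocated (0 or 1).
x_it – whether bid i was allocated in period t.

Define c-competitive, for c >= 1, over Z, the set of available inputs: on every input in Z, the value of the online mechanism is at least 1/c of the optimal offline value.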

Ice Cream Stand - Reloaded

There are 3 buyers, with types: (9:00,11:00,100), (9:00,11:00,80), (10:00,11:00,60).

Moshe complained that the customers lie all the time and asked for agent Smith's help in choosing a better policy.

Smith suggested that he should use the critical value to decide the payments of his customers.

Also, ties between highest bidders should be broken randomly.

Result: the cone is sold to agent 1 for 60 in round one, and to agent 2 for 60 in round two.
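The two prices of 60 can be recomputed by brute force. The sketch below assumes integer bids, hourly cones at 9:00 and 10:00, and a deterministic tie-break by name in place of the random one; `greedy_winners` and `critical_value` are hypothetical helper names. The critical value is found by trying a bid just above each candidate threshold (0 and the other agents' bids) while the other bids stay fixed.

```python
from typing import Dict, List, Optional, Tuple

# A bid is (arrival_hour, departure_hour, value); 9 stands for 9:00.
Bid = Tuple[int, int, float]


def greedy_winners(bids: Dict[str, Bid], hours: List[int]) -> Dict[str, int]:
    """Each hour, give the cone to the highest active, unserved bidder.
    Returns {winner: hour}. Ties are broken by name (an assumption)."""
    served: Dict[str, int] = {}
    for hour in hours:
        active = [(value, name) for name, (arrive, depart, value) in bids.items()
                  if arrive <= hour < depart and name not in served]
        if active:
            served[max(active)[1]] = hour
    return served


def critical_value(name: str, bids: Dict[str, Bid], hours: List[int]) -> Optional[float]:
    """Smallest threshold such that `name` wins when bidding just above it,
    with the other bids fixed. None means the agent cannot win at all."""
    arrive, depart, _ = bids[name]
    other_values = sorted({value for other, (_, _, value) in bids.items() if other != name})
    for threshold in [0] + other_values:
        trial = dict(bids)
        # 0.5 is "just above" the threshold because bids are assumed to be integers.
        trial[name] = (arrive, depart, threshold + 0.5)
        if name in greedy_winners(trial, hours):
            return threshold
    return None


HOURS = [9, 10]
bids = {"agent1": (9, 11, 100), "agent2": (9, 11, 80), "agent3": (10, 11, 60)}

winners = greedy_winners(bids, HOURS)
payments = {name: critical_value(name, bids, HOURS) for name in winners}
print(winners)   # who is served, and in which hour
print(payments)  # each winner pays its critical value
```

Scanning thresholds from the lowest upward relies on the policy being monotonic in the value: once a bid wins, every higher bid wins as well.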

The auction formally

Truthfulness and 2 competitive

In our setting the auction is truthful and 2-competitive.

Proof:
For truthfulness, it is enough to see that the policy is monotonic. If agent i won in some period, it will obviously still win if it extends its period and/or increases its value.

For competitiveness, look at each allocation of the offline algorithm:
If agent i was allocated by the offline algorithm but another agent was allocated by the online algorithm, charge the value to that online agent.
If the same agent was allocated by both, charge him in the online.
Every online agent is charged at most twice, and each time for a value that is at most his own value.
Therefore the total value of the offline algorithm is at most twice that of the online algorithm.
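A tiny instance shows how the online greedy policy can lose almost half of the offline value. This is a sketch under assumptions: periods are integers, an agent is present in periods arrival..departure inclusive, ties are broken by index, and the offline optimum is found by exhaustive search, which is only practical for very small inputs.

```python
from typing import FrozenSet, List, Tuple

# A bid is (arrival, departure, value); the agent is present in
# periods arrival..departure, both inclusive.
Bid = Tuple[int, int, int]


def online_greedy_value(bids: List[Bid], periods: List[int]) -> int:
    """Total value when each period's item goes to the highest active, unserved bid."""
    served = set()
    total = 0
    for t in periods:
        active = [(v, i) for i, (a, d, v) in enumerate(bids)
                  if a <= t <= d and i not in served]
        if active:
            value, winner = max(active)
            served.add(winner)
            total += value
    return total


def offline_optimal_value(bids: List[Bid], periods: List[int]) -> int:
    """Best total value with full knowledge of all bids (exhaustive search)."""
    def best(k: int, served: FrozenSet[int]) -> int:
        if k == len(periods):
            return 0
        t = periods[k]
        result = best(k + 1, served)  # option: leave this period's item unsold
        for i, (a, d, v) in enumerate(bids):
            if a <= t <= d and i not in served:
                result = max(result, v + best(k + 1, served | {i}))
        return result
    return best(0, frozenset())


# An impatient agent worth 10 and a patient agent worth 11, both arriving in period 1.
example = [(1, 1, 10), (1, 2, 11)]
online = online_greedy_value(example, [1, 2])
offline = offline_optimal_value(example, [1, 2])
print(online, offline)
```

Greedy serves the patient agent first and the impatient one leaves unserved, so the online value is 11 against an offline value of 21, which stays within the factor of 2.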

Lower bounds

The secretary problem

There are N job applicants. Each has a rank.
While interviewing, the rank of the current applicant is learnt relative to those who were already interviewed.
The interviewer must decide on the spot whether to hire.
The adversary may choose the qualities but not the order. The applicants are sampled uniformly.
The optimal policy is to interview t-1 applicants and then hire the next one who is better than all of them.

The secretary problem - cont

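The threshold t is chosen to maximize the probability of hiring the best applicant.

As N goes to infinity:
The probability of hiring the best applicant goes to 1/e.
So does the ratio t/N.
The policy is to sample N/e applicants and then accept the next one who is better than all those interviewed.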
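The 1/e limits can be checked numerically with the standard closed-form success probability of a threshold policy: skipping the first t-1 applicants wins with probability ((t-1)/N) times the sum of 1/(j-1) for j from t to N. The function names below are chosen for illustration only.

```python
import math


def success_probability(n: int, t: int) -> float:
    """Probability of hiring the best of n applicants when the first t-1 are
    only observed and the next applicant better than all of them is hired."""
    if t == 1:
        return 1.0 / n  # nobody is observed, so the first applicant is hired
    return (t - 1) / n * sum(1.0 / (j - 1) for j in range(t, n + 1))


def best_threshold(n: int) -> int:
    """Threshold t that maximizes the success probability (by direct search)."""
    return max(range(1, n + 1), key=lambda t: success_probability(n, t))


n = 1000
t = best_threshold(n)
print(t / n, success_probability(n, t), 1 / math.e)
```

For N = 1000 both the ratio t/N and the success probability come out close to 1/e, roughly 0.368.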

Adaptive limited supply auction

N agents.
Single indivisible item.
No early-arrival misreports.

The differences between the secretary problem and the auction:
Bidders have entry and exit times.
Bidders are strategic – they can misreport.

The adversary creates a set of arrival-departure intervals and a set of values; types are defined by randomly sampling (uniformly, without replacement) from both sets.

Revenue optimality criterion – compare total payments against the offline Vickrey (second-price) auction.

Adaptive limited supply auction

The competitive ratio is measured against the offline benchmark.

The optimal policy is divided into:
Learning phase.
Accepting phase.

Naïve solution:
Observe $\lfloor N/e \rfloor$ reports and set p = max_value.
Sell at price p to the first agent to report an equal or greater value.

Adaptive limited supply auction

Consider the following example:
Six types: 1=(1,6,6), 2=(3,7,2), 3=(4,8,4), 4=(6,7,8), and two more arriving later.
The transition to the accepting phase occurs after 2 bids.
Agent 4 wins in period 6 and pays 6.
If agent 1 reports (5,7,6), he wins in period 5 for a payment of 4.

The naïve solution doesn't work. But it would have worked if all agents were impatient.

Adaptive limited supply auction

Try the following auction:
When the $\lfloor N/e \rfloor$-th bid is received, let p >= q be the two leading bids so far.
If the agent who bid p is still present, sell to him at price q (break ties randomly).
Else, sell to the next agent to bid at least p, at price p.

Now revisiting the example:
Six types: 1=(1,6,6), 2=(3,7,2), 3=(4,8,4), 4=(6,7,8), and two more arriving later.
The transition to the accepting phase occurs when agent 2 bids.
p=6, q=2.
Agent 1 wins for 2.

Consider 1' = (1,2,6):
The transition occurs after agent 2 bids.
The item is sold in period 6 to agent 4 for p=6.
1' won't want to report type 1, because he will not be present to receive the item.
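Both runs of the example can be replayed with a short sketch of the auction. Several details are assumptions made for illustration: bids are reported on arrival and processed in arrival order, ties are broken by that order instead of randomly, a later buyer is served in his arrival period, and q is taken as 0 if the learning phase contains a single bid. Only the four listed bids are supplied; the two later arrivals are represented just through `n_total = 6`.

```python
import math
from typing import List, Optional, Tuple

# A bid is (name, arrival, departure, value).
Bid = Tuple[str, int, int, int]


def adaptive_auction(bids: List[Bid], n_total: int) -> Optional[Tuple[str, int, int]]:
    """Return (winner, period, price), or None if the item stays unsold.
    The learning phase covers the first floor(n_total / e) bids."""
    learn = math.floor(n_total / math.e)
    ordered = sorted(bids, key=lambda b: b[1])  # process bids in arrival order
    if learn < 1 or len(ordered) < learn:
        return None
    sample = ordered[:learn]
    transition = sample[-1][1]  # period in which the last learning-phase bid arrives
    values = sorted((b[3] for b in sample), reverse=True)
    p = values[0]
    q = values[1] if len(values) > 1 else 0  # assumption: no second bid means price 0
    # If a leading bidder is still present at the transition, he buys at price q.
    for name, _, departure, value in sample:
        if value == p and departure >= transition:
            return (name, transition, q)
    # Otherwise the next agent to bid at least p buys at price p.
    for name, arrival, _, value in ordered[learn:]:
        if value >= p:
            return (name, arrival, p)
    return None


patient = [("1", 1, 6, 6), ("2", 3, 7, 2), ("3", 4, 8, 4), ("4", 6, 7, 8)]
impatient = [("1'", 1, 2, 6), ("2", 3, 7, 2), ("3", 4, 8, 4), ("4", 6, 7, 8)]

print(adaptive_auction(patient, n_total=6))
print(adaptive_auction(impatient, n_total=6))
```

With the patient agent 1 the item is sold to him at the transition for q=2; with the impatient 1' it is sold to agent 4 in period 6 for p=6, as in the example.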

Truthfulness and e+o(1) competitive
