
Measurement-Based Models for Cognitive Medium Access in WLAN Bands

Date post: 19-Aug-2015
Upload: soumya-das
Description:
journal paper
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, VOL. 26, NO. 1, JANUARY 2008 95

Cognitive Medium Access: Constraining Interference Based on Experimental Models

Stefan Geirhofer, Student Member, IEEE, Lang Tong, Fellow, IEEE, and Brian M. Sadler, Fellow, IEEE

Abstract: In this paper we design a cognitive radio that can coexist with multiple parallel WLAN channels while abiding by an interference constraint. The interaction between both systems is characterized by measurement and coexistence is enhanced by predicting the WLAN's behavior based on a continuous-time Markov chain model. Cognitive Medium Access (CMA) is derived from this model by recasting the problem as one of constrained Markov decision processes. Solutions are obtained by linear programming. Furthermore, we show that optimal CMA admits structured solutions, simplifying practical implementations. Preliminary results for the partially observable case are presented. The performance of the proposed schemes is evaluated for a typical WLAN coexistence setup and shows a significant performance improvement.

Index Terms: Cognitive Radio, Dynamic Spectrum Access, Standards Coexistence, IEEE 802.11b, Resource Management.

Manuscript received March 2007; revised August 2007. This work is supported in part by the U.S. Army Research Laboratory under the Collaborative Technology Alliance Program, Cooperative Agreement DAAD19-01-2-0011. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation thereon. This work has been presented in part at the IEEE Global Communications Conference, Washington D.C., November 2007.
Stefan Geirhofer and Lang Tong are with the Department of Electrical and Computer Engineering, Cornell University, Ithaca NY 14853 (e-mail: {sg355,lt35}@cornell.edu).
Brian Sadler is with the U.S. Army Research Laboratory, Adelphi, MD 20783 (e-mail: [email protected]).
Digital Object Identifier 10.1109/JSAC.2008.080109.
0733-8716/08/$25.00 © 2008 IEEE

I. INTRODUCTION

THE GROWING demand in wireless technology has resulted in a dense allocation of relevant frequency bands. So far, regulators avoid mutual interference by assigning bands that do not overlap in frequency. This traditional static approach, however, leads to inefficient usage in both spatial and temporal domains. In fact, recent measurements [1] report that generally less than 5% of the spectrum is used at any given time and location.

A static frequency allocation not only leads to inefficient spectrum usage, it moreover confines many services to unlicensed bands. A sizeable portion of the wireless consumer equipment falls into this category. Consider, for instance, WLAN, Bluetooth, cordless phones, and similar applications.

As a consequence, we face today overly crowded unlicensed bands whose performance is typically limited by mutual interference, and inefficient spectrum usage in those bands statically assigned by regulators. Is inefficient usage the price we have to pay for avoiding interference?

The emerging area of dynamic spectrum access (DSA) sparked by recent advances in software-defined and cognitive radios presents a possible solution to this problem. This paper focuses on hierarchical schemes [2], in which the secondary system, i.e. the cognitive radio, is designed such that no or only insignificant interference is generated towards the primary user. Orthogonality between both systems can be achieved by exploiting different degrees-of-freedom in designing the system. While most of the research has focused on limiting mutual interference by spatial separation [3], we achieve orthogonality in the time domain by reusing idle periods that remain between the bursty packet transmissions of the primary system, represented by multiple, independently evolving WLAN channels.

The cognitive radio considered in this paper is based on a frequency hopping setup with a physical layer similar to Bluetooth. This allows us to draw some conceptual parallels to Bluetooth/WLAN coexistence setups. We emphasize that implementing Cognitive Medium Access (CMA) as an extension of Bluetooth requires additional sensing and processing that current hardware designs may not easily accommodate. Our results indicate the significant potential for incorporating CMA techniques in future systems.

A. Main contribution

The main contribution of this paper is the derivation of CMA, a protocol that enhances WLAN coexistence based on sensing and prediction. Our analysis is based on measurement-based models, both for predicting the WLAN's behavior and for characterizing the cognitive radio's impact.

CMA is fundamentally based on a stochastic model for the WLAN's packet transmissions. We have previously shown through theory and experiment that a semi-Markov model captures this stochastic behavior accurately [4]. We consider this model and derive practical access schemes based on a continuous-time Markov chain (CTMC) approximation. This model, together with a sense-before-transmit strategy, allows us to constrain the interference generated towards the primary user.

The cognitive radio's throughput is optimized by recasting the problem as a constrained Markov decision process (CMDP). The formulation of the CMDP depends on the sensing capabilities of the radio frontend. If all parallel channels can be observed simultaneously we say the system is fully observable. On the other hand, if only a limited number of channels can be sensed at a time, we have to address this partial observability, which significantly complicates the derivation of CMA.

It is well known that CMDPs can be solved via linear programs, and we briefly state standard solution techniques. Additionally, we show that our problem setup admits structured solutions that allow us to gain further insight into the problem and can be used to solve the optimization problem with reduced complexity.

The numerical assessment of our algorithms is based on typical WLAN deployments.
We examine the performance for the CTMC model, and investigate its robustness by running the algorithms on data generated using the accurate semi-Markov model. A comparison with a blind reference scheme, which does not perform any sensing, shows a significant performance improvement.

B. Related work

Dynamically accessing spectrum in the time domain has received increasing interest [3], [5]. A taxonomy of existing architectures is introduced in [2]. Among the first to consider time domain spectrum sharing are [6], [7], where it is assumed that primary and secondary system share the same slot structure. Based on a Partially-Observed Markov Decision Process (POMDP) framework, access strategies for the secondary system are derived. The assumption of both systems having the same slot structure is, however, too restrictive in many cases.

A semi-Markov model for predicting the WLAN transmissions has been introduced in [4], [8]. In order to simplify the derivation of access schemes, a CTMC approximation has been considered in [9] and [10].

Coexistence in unlicensed bands has previously received significant attention outside the cognitive radio community [11] with Bluetooth/WLAN coexistence being a prominent example of practical concern. A possible approach to interference mitigation in this context is adaptive frequency hopping [12]. The CMA protocol is conceptually similar to such schemes but differs in its physical layer sensing. In current Bluetooth devices such sensing is not required, and consequently interference information needs to be inferred from higher layers.

The remainder of the paper is organized as follows. In Sec. II, the system setup is introduced. Sec. III presents the measurement-based interference models. In Sec. IV, optimal CMA is derived for fully and partially observed systems, constituting this paper's main contribution. Numerical performance results are presented in Sec. V.

II. SYSTEM SETUP

The physical layer setup considered in this paper consists of M parallel, independently evolving WLAN channels, as shown in Fig. 1.
The cognitive radio shares the same frequency band and dynamically hops through the WLAN bands based on sensing and statistical prediction. A value of M = 3 bands is typical since practical WLAN setups in the ISM band at 2.4 GHz support three non-overlapping channels [13].

A. Physical layer setup

Although there are no restrictions in designing the cognitive radio, other than our ultimate goal of minimizing mutual interference, we shall focus on the frequency hopping (FH) setup depicted in Fig. 1. Let each of the M WLAN bands overlap with N narrowband hopping channels.

Fig. 1. System setup. The cognitive radio is a time-slotted frequency hopping system.

The choice of a FH setup is considered for two reasons. First, WLAN is an unslotted system that performs medium access based on Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). A logical approach toward enforcing an interference constraint is thus to sense the medium periodically, and transmit in a slotted fashion. Second, mutual interference is reduced by exploiting the fact that WLAN uses spread spectrum communications and thus has some inherent robustness to narrowband interference.

The above considerations are motivated by practical experience. The FH setup considered in this work has conceptual similarities to Bluetooth. We use this similarity to find realistic deployment parameters (slot size, modulation characteristics, etc.) based on the standard [14]. CMA is therefore a potential cognitive extension to FH systems such as Bluetooth, based on sensing and statistical prediction of the WLAN.

Apart from the FH setup, we also considered a direct-sequence spread-spectrum physical layer for the cognitive radio, spanning the same M frequency bands as the WLAN. The spread-spectrum setup is also slotted and designed to dynamically hop across the WLAN bands.
For the spread-spectrum setup, we found by experiment [15] that as long as the spreading code is different from the WLAN's, interference properties are similar to the FH setup. As a consequence, CMA extends to this setup as well, as it is only concerned with finding a hopping sequence across bands (but not within bands for the FH setup).

B. Operational block diagram

The operation of the cognitive radio is illustrated in Fig. 2. An RF frontend is used for up- and down-conversion of the signals, and sampling is performed using an acquisition board. At the beginning of every slot a spectrum sensor determines whether the medium is busy, either based on energy detection or by exploiting features of the WLAN standard. The sensing result is processed within the CMA controller, which determines whether it is safe to transmit, and if yes, in which band. The secondary transmitter is tuned accordingly and a transmission may be initiated.

Fig. 2. Block diagram of the cognitive radio's operation. The cognitive radio is shown on the left and coexists with the WLAN on the right.

According to the above, on a slot level, the following operations are performed in sequence. At the beginning of every slot the medium is sensed, the CMA controller decides on which band, if any, to transmit on, the transmitter is retuned accordingly, and a transmission takes place for the remainder of the slot period.

The sensing time, the run time of the controller, and the time it takes to retune the transmitter contribute to the overhead of the system. We have previously analyzed the performance of energy detection on WLAN signals [8]. With a detection error probability of $10^{-5}$ and a signal-to-noise ratio (SNR) of 5 dB we require a sensing time of less than 5 µs. Thus the sensing overhead at the beginning of every slot is fairly small.
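
The slot-level sequence described above (sense, decide, retune, transmit) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `run_slot` and `pick_random_idle_band` are hypothetical, bands are indexed from 0, and the placeholder decision rule simply picks a random idle band. It is not the optimal CMA policy derived in Sec. IV.

```python
import random

def pick_random_idle_band(sensing, rng):
    """Placeholder controller (an assumption, NOT the optimal CMA policy):
    choose uniformly among the bands sensed idle, or None if all are busy."""
    idle = [band for band, busy in enumerate(sensing) if busy == 0]
    return rng.choice(idle) if idle else None

def run_slot(channel_busy, controller, rng):
    """One slot of the sense-before-transmit cycle.

    channel_busy: sequence of 0 (idle) / 1 (busy), one entry per WLAN band,
                  standing in for the spectrum sensor output at the slot start.
    Returns the 0-based index of the band used, or None if the radio stays silent.
    """
    sensing = tuple(channel_busy)     # 1) sense the medium at the slot start
    band = controller(sensing, rng)   # 2) controller decides on a band, if any
    if band is None:
        return None                   #    no band deemed safe: stay silent
    # 3) retune the transmitter to `band`;
    # 4) transmit for the remainder of the slot (both omitted in this sketch)
    return band

rng = random.Random(0)
print(run_slot([1, 0, 1], pick_random_idle_band, rng))  # only band 1 is idle
print(run_slot([1, 1, 1], pick_random_idle_band, rng))  # all bands busy
```

Swapping `pick_random_idle_band` for another decision rule leaves the sense/decide/retune/transmit loop unchanged, which is why the controller is passed in as a parameter here.
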
The CMA controller implements a randomized control policy (to be described in detail later), which can be implemented by biased coin flips. The delay associated with it is small as well. Finally, current technology yields a frequency retuning time on the order of 100 µs [16], and thus dominates the processing overhead time. In our numerical evaluations we choose a slot size of $T_s = 625$ µs, which is the same as in Bluetooth. We believe this is a practical choice given the similarities in the physical layer setup.

We stress that since every WLAN channel overlaps with N hopping channels, the hopping within each band decouples from selecting one of the M WLAN bands. This paper solely deals with the optimal selection of one of the M bands. Hopping within each band can, for instance, be performed pseudo randomly.

In this paper, we assume that the secondary system is synchronized. In a practical system, maintaining identical hop sequences within the cognitive system may be challenging, as sensing results obtained at different nodes could potentially differ. While it is beyond the scope of this paper to address this issue in detail, we believe that collaborative sensing techniques could be used to provide hop sequence coordination. By exchanging sensing metrics across subsequent slots, master and slave could perform sensing jointly and thus arrive at identical results. Another potential scheme for maintaining synchronization is to employ acknowledgement feedback [10].

III. MEASUREMENT-BASED INTERFERENCE MODEL

CMA is a protocol that dynamically hops across multiple parallel WLAN bands in an optimal fashion. It is fundamentally based on a measurement-based prediction model, which will be described in detail in Sec. IV. In addition we present physical-layer coexistence models that describe the interaction between both systems, should a collision occur. This experimental coexistence model is described in the following.

In particular, we first evaluate whether the cognitive radio affects the WLAN's carrier sensing.
Second, we obtain empirical results for the probability that a collision between both systems leads to a WLAN packet error. Due to space limitations, we focus on a qualitative assessment; detailed quantitative results can be found in a separate technical report, available online [15].

A. Impact on WLAN carrier sensing

The design of the cognitive radio needs to ensure that its transmissions remain transparent to the WLAN. This implies that the WLAN's carrier sensing must not be altered. Otherwise not only our paradigm of hierarchical DSA would be undermined, but also the cognitive radio's dynamic effect on the WLAN would render our prediction model useless, unless it incorporated the WLAN's retransmission behavior.

We evaluated the cognitive radio's impact by measuring the probability that the WLAN detects the cognitive radio's transmission. The experimental setup is shown in Fig. 3(a). It consists of an 802.11b router and an RF signal source, generating the WLAN and the FH signal, respectively. More precisely, we consider a static (non-hopping) FH signal with Bluetooth's modulation parameters [14]. As the signal remains static in one of the channels it is possible to examine the mutual interference resulting from a specific channel.

The WLAN adapter and the signal source are connected via circulators, which couple generator and router while providing isolation in the reverse direction. A WLAN adapter is used to capture the received signal, and a vector signal analyzer is used to verify the correct operation of the setup.

The impact of the FH signal is assessed by measuring its effect on the WLAN packet rate. The WLAN router continuously transmits packets, and a WLAN adapter card is used to measure its average rate (by capturing packets over long periods of time). In the presence of the cognitive system, if the WLAN detected the interference, its rate would decrease as back-off periods would need to be accommodated.

The impact on the WLAN largely depends on the interferer's channel index. Our results suggest that the WLAN adapter performs energy detection in narrow bands spaced about 10 MHz apart.
No impact was observed outside these channels, even for fairly high interference powers. Due to space limitations quantitative results are not included in this paper, but given in a technical report, available online [15].

Fig. 3. Experimental setup for evaluating the cognitive radio's impact on the WLAN's carrier sensing and packet error rate. (a) Impact on WLAN's carrier sensing. (b) Impact on WLAN's packet error rate.

Based on our quantitative results [15] and given typical setups [17] and path loss models [18], we conclude that the cognitive radio does not alter the WLAN's medium access. This solidifies our hierarchical approach and renders the stochastic prediction model applicable.

Furthermore, we have analyzed whether measuring rate changes is an appropriate method to determine the cognitive radio's impact. We validated our measurement approach by performing the same analysis for a different type of interferer, namely a WLAN-type signal using the same spreading code as standardized for 802.11b [19]. We observed that the adapter card is significantly more sensitive to this type of signal. The power level above which an impact occurs was determined to be -77 dBm (see [15] for details). This is in accordance with the 802.11b standard [19, p. 58] which mandates the sensitivity to be -76 dBm or better.

B. Effect on packet error rate

The second component of our interference model focuses on the cognitive radio's impact on WLAN's packet error rate. Specifically, we measure the probability that a collision between both systems leads to a WLAN packet error. The measurement setup is shown in Fig. 3(b). It consists of a WLAN adapter card and a signal source generating the WLAN signal and the FH interferer, respectively. The signals are combined and captured via another WLAN card and commercial packet capturing software.
A vector signal analyzer is used to verify the operation of the setup.

The packet error probability is measured in the following way. A continuous stream of packets is generated and captured at the receiver to determine the rate of packets with the interferer turned off. Subsequently, in the presence of interference the rate will decrease since some packets will be too distorted to be captured by the adapter. Other packets will be captured but will show an invalid redundancy check. By comparing the number of successfully received packets with the interference-free case, we can determine the probability of a packet error.

Fig. 4. Semi-Markov model (SMM) and its continuous-time Markov chain (CTMC) approximation. The SMM is shown on the left. By lumping states and approximating the holding times by exponential distributions we arrive at the CTMC approximation on the right.

The impact of the FH interferer depends on the channel index. Close to the center frequency, a significant impact is observed for signal to interference ratios (SIR) of less than 0 dB. For instance, for an offset of 3 MHz from center frequency and an SIR of -3 dB, we observe a packet error rate of 85%. If the SIR drops below -5 dB virtually every packet is lost. The impact of the FH interferer decreases as we move away from the center frequency. This is not surprising and has previously been reported [20]. It is due to the downconversion and filtering performed within the WLAN receiver. More quantitative results for this measurement scenario are given in [15].

IV. COGNITIVE MEDIUM ACCESS

The derivation of CMA is based on the empirical interference model. In short, we have seen that while the WLAN's carrier sensing remains unaltered, a packet collision is likely to cause a packet error. For ease of analysis, and because such an assumption has frequently been made in other papers [11], we assume that every collision inevitably results in a packet error. This is a worst case assumption given our measurement results.

A. Empirical WLAN Model and CTMC Approximation

Since collisions cause packet errors we need to constrain the rate of collisions between both systems. The derivation of CMA is based on a previously established WLAN prediction model [4], [8]. The key components of this model are briefly reviewed, specifically the semi-Markov model and its CTMC approximation.
We stress that both models have been based on and validated through experimental data gathered via a sensing testbed [8]. The model is based on empirical data for the idle and busy durations of the medium. In short, the stochastic model enables us to predict white space and direct our cognitive radio such that it preferably hops to bands not currently used by the WLAN.

1) Semi-Markov model: Analyzing the idle and busy durations obtained by measurement we found that almost always the transmission of a data packet is followed by a short interframe space (SIFS) and an acknowledgement from the receiver. This does not come as a surprise, given that such acknowledgements are mandated by the standard and merely correspond to a successful packet exchange [13].

We thus arrive at the transition diagram depicted in Fig. 4, which consists of alternating packet transmissions (including their mandatory acknowledgements) and idle periods, respectively. The transitions between DATA, SIFS, and ACK state are deterministic. For CMA it is crucial to predict how long the system remains in either state. In statistical terminology this corresponds to finding distributions $F_t(t)$ and $F_i(t)$ for the holding (or sojourn) times in the TRANSMIT and IDLE state, respectively.

Clearly, the holding time in the TRANSMIT state corresponds to the length of the packets, which is largely influenced by the type of traffic and the specific properties of the scheduler. Our measurements suggest that over short time periods, $F_t(t)$ consists of several (less than five) discrete values. In the traffic scenario to be analyzed in this paper the packet lengths are, in fact, almost deterministic [21].

Predicting the idle durations is more involved. As a matter of fact, an idle channel can either be due to inactivity of the medium or be a residue of the WLAN's medium access, which has to guarantee each station an equal transmission opportunity.
The latter is implemented via a backoff procedure requiring stations to defer medium access for uniformly-distributed random time periods. In essence, the station with the smallest backoff gets to access the medium first. As a consequence we observe uniformly distributed idle durations resulting from the above operation.

If the medium is inactive, i.e. if none of the stations has any data to transmit, we have shown by experiment that the idle durations are well approximated by a generalized Pareto distribution [8].

The existence of both effects leads to a mixture distribution to model the holding time in the IDLE state. We arrive at

$$F_i(t) = p_{cw} F_u(t) + (1 - p_{cw}) F_{gp}(t), \tag{1}$$

where $p_{cw}$ denotes the probability that an idle duration is due to the contention window, $F_u(t)$ is a uniform distribution within [0, 0.7 ms] [8], and

$$F_{gp}(t; k, \sigma) = 1 - \left(1 + k \frac{t}{\sigma}\right)^{-1/k} \tag{2}$$

is the CDF of a generalized Pareto distribution with shape parameter $k$ and scale parameter $\sigma$. By fitting the mixture distribution (1) to the empirical data we can estimate the above parameters. The accuracy has been validated by statistical measures of fit in [8].

2) CTMC approximation: The semi-Markov model presented above provides for an excellent fit with empirical data. However, in deriving CMA the semi-Markov model is difficult to analyze since it does not possess the continuous-time Markov property. In order to simplify analysis we consider a CTMC approximation, which corresponds to fitting exponential distributions to the idle and busy periods. The exponential fit provides a good approximation although it is not strongly validated by statistical measures of fit. More details can be found in [15]. The parameters of the CTMC model are thus $\mu$ and $\lambda$, leading to $F_t(t) = 1 - \exp(-\mu t)$ and $F_i(t) = 1 - \exp(-\lambda t)$, respectively.

Fig. 5. Illustration of fully and partially observable CMA. (a) Fully observable CMA. (b) Partially observable CMA. Circles and squares denote idle and busy sensing results, respectively. In the fully observable scenario, all bands are sensed simultaneously and depending on this result, a transmission may be initiated in one of the channels. In the partially observed case, only one band can be observed at a time. Consequently, actions are limited to either staying in the current band, or hopping to another.

B. Fully observable CMA (FO-CMA)

The sensing based access schemes presented in this paper can be categorized according to the sensing capabilities of the cognitive radio. If the spectrum sensor supports a high enough bandwidth, the state of all M channels can be observed simultaneously at the beginning of every slot. This is illustrated in Fig. 5(a). Based on the sensing result the cognitive controller decides in which, if any, channel to transmit.

If the spectrum sensor has limited bandwidth, we assume that only one of the M bands can be sensed at a time; the state of the other bands remains hidden. Furthermore, we assume that for practical reasons, a transmission can only be initiated in the channel that has just been sensed. This assumption makes the system partially observable and significantly complicates the analysis. This is illustrated in Fig. 5(b), where only one channel is observed at the beginning of any slot.

We start with the fully observed case and first recast the problem mathematically as a constrained Markov decision process. The standard solution technique [22] is briefly reviewed. We then show that our problem setup admits a structured solution. The partially observed case is addressed separately in Sec. IV-E.

In order to find optimal access strategies we need to formulate a constrained optimization problem [9]. Let each of the WLAN bands $i = 1, \ldots, M$ evolve as a CTMC $\{X_i(t), t \ge 0\}$ with states 0 (IDLE) and 1 (TRANSMIT). The holding times are exponentially distributed with parameters $\lambda_i$ and $\mu_i$, respectively, as shown in Fig. 4. The generator matrix $G_i$ for channel $i$ is hence given by

$$G_i = \begin{bmatrix} -\lambda_i & \lambda_i \\ \mu_i & -\mu_i \end{bmatrix}, \tag{3}$$

which leads to the stationary distribution

$$\pi_0^{(i)} = \frac{\mu_i}{\lambda_i + \mu_i}, \qquad \pi_1^{(i)} = \frac{\lambda_i}{\lambda_i + \mu_i} \tag{4}$$

and transition matrix [23, p. 391]

$$P^{(i)} = \frac{1}{\lambda_i + \mu_i} \begin{bmatrix} \mu_i + \lambda_i e^{-(\lambda_i + \mu_i)t} & \lambda_i - \lambda_i e^{-(\lambda_i + \mu_i)t} \\ \mu_i - \mu_i e^{-(\lambda_i + \mu_i)t} & \lambda_i + \mu_i e^{-(\lambda_i + \mu_i)t} \end{bmatrix}. \tag{5}$$

The secondary system senses the state of the entire system at the beginning of every slot, inducing a discrete-time chain $\{Y_i[k], k \ge 0\}$ of sensing results for each channel $i$. For notational convenience let us define the vector valued random process $\{\mathbf{Y}[k], k \ge 0\}$ that contains the latest sensing result for all channels,

$$\mathbf{Y}[k] = \left[ Y_1[k], \ldots, Y_M[k] \right]^T. \tag{6}$$

It is straightforward to verify that $\mathbf{Y}[k]$ is a discrete-time Markov chain with state space $\mathcal{X} = \{0, 1\}^M$. The transition matrix becomes, due to the independence of the WLAN bands,

$$P_{xy} = \prod_{i=1}^{M} P^{(i)}_{x_i y_i}, \qquad x, y \in \mathcal{X} \tag{7}$$

and we arrive at the following expression for the stationary distribution

$$\pi_x = \prod_{i=1}^{M} \pi^{(i)}_{x_i}. \tag{8}$$

Given the sensing results in each slot, the CMA controller decides whether to transmit and if yes, in which channel. The action set is thus $\mathcal{A} = \{0, 1, \ldots, M\}$, where $a = 0$ denotes that no transmission takes place, and $a \ge 1$ means that a transmission is scheduled in channel $a$.

Transmitting across channel $a$ accrues a unit reward provided no collision occurs. The expected immediate reward of choosing action $a$ in state $y$ thus becomes

$$r(y, a) = \begin{cases} 1_{[y_a = 0]} \, e^{-\lambda_a T_s}, & a \ge 1 \\ 0, & a = 0, \end{cases} \tag{9}$$

where $1_{[\cdot]}$ represents the indicator function and $T_s$ denotes the slot duration.

The interference constraint can be formulated in several ways. We shall call slot $k$ in band $i$ busy, and denote this as

$$C_i[k] = 1, \quad \text{if } \exists\, t \in [kT_s, (k+1)T_s) \text{ s.t. } X_i(t) \ne 0. \tag{10}$$

The interference constraint can then be formulated as

$$D_{CIC} = \lim_{N \to \infty} \frac{\sum_{k=1}^{N} 1_{[A_k = i,\, C_i[k] = 1]}}{N}, \tag{11}$$

where $A_k$ refers to the action taken in slot $k$. It is capitalized to stress that $A_k$ is random; it depends on the current sensing result and the action (randomly) chosen by the CMA controller. The above equation corresponds to the long run fraction of slot collisions per unit time. We will refer to (11) as the cumulative interference constraint (CIC).

While the CIC seems an intuitive measure, it quantifies the interference from the secondary user's perspective. The density of the WLAN traffic is not taken into account. The formulation can be better tailored to the primary user by imposing a packet error rate constraint (PERC) for all bands

$$D^{(i)}_{PERC} = \lim_{N \to \infty} \frac{\sum_{k=1}^{N} 1_{[A_k = i,\, C_i[k] = 1]}}{N_i(N T_s)}, \qquad 1 \le i \le M, \tag{12}$$

where $N_i(t)$ counts the number of transmitted WLAN packets in band $i$ up to time $t$,

$$N_i(t) = \sum_{n=0}^{\infty} 1_{[S_n^{(i)} \le t]}. \tag{13}$$

In the above equation, $S_n^{(i)}$ denotes the arrival times of WLAN packets in band $i$. Therefore (12) is the long-run fraction of collisions per transmitted WLAN packet. In short, this represents the fraction of WLAN packets that get dropped due to the cognitive radio's interference.

Based on the above definitions, we define the expected immediate costs for the CIC,

$$d_{CIC}(y, a) = \begin{cases} 1 - e^{-\lambda_a T_s} & \text{if } y_a = 0,\ a \ge 1 \\ 1 & \text{if } y_a = 1,\ a \ge 1 \\ 0 & \text{if } a = 0, \end{cases} \tag{14}$$

and for the PERC,

$$d_{PERC}(y, a) = \begin{cases} \dfrac{(\lambda_a + \mu_a)\left(1 - e^{-\lambda_a T_s}\right)}{\lambda_a \mu_a T_s} & \text{if } y_a = 0,\ a \ge 1 \\ 1 & \text{if } y_a = 1,\ a \ge 1 \\ 0 & \text{if } a = 0. \end{cases} \tag{15}$$

Having introduced rewards and costs, the CMDP can now be defined. CMA maximizes $J(\cdot, \cdot) = \lim_{N \to \infty} \frac{1}{N} \sum^{N}$

t=1Er(Yt, At) (16)withrespect topolicy,subject to aCICDCIC(, ) = limN1NN

t=1EdCIC(Yt, At) , (17)orsubject to PERCs,(seeEq. 18).Intheaboveformulasdenotestheinitialdistributionofthesystem, andthepolicywemaximizefor. Theexpec-tationoperator isthustakenwithrespect totheprobabilitydistribution induced bygiven initial distribution.C. Linear programming solutionIt iswell knownthat aCMDPsoptimal policyisinthespaceof Markovianrandomizedpolicies [24]. Theoptimalpolicy is hence a functionthat maps state-actionpairs(y, a) to the probability of choosing action a in state y,: X A [0, 1]. Since, inCMDPsboththerewardandthe constraints can be expressed using the frequency of state-actionpairs (y, a)theoptimalpolicy canbefound bylinearprogramming [22].GEIRHOFER et al.: COGNITIVEMEDIUMACCESS: CONSTRAINING INTERFERENCE BASED ON EXPERIMENTAL MODELS 101D(i)PERC(, ) = limN1NN

t=1E1[At=i]dPERC(Yt, At) i, 1 i M. (18)Theorem 1 [22, p.38]: The linear programmax(y,a)

yX

aA(y)(y, a)r(y, a) (19)subject to

yX

aA(y)(y, a)di(y, a) i, 1 i M, (20)where(y, a) Q()andQ() =___(y, a), y X, a A(y) :

yX

aA(y)(y, a)(y(x) Pxay) = 0

yX

aA(y)(y, a) = 1, (y, a) 0___(21)is equivalent tothe CMDPformulations (16)-(18). TheCMDPsoptimal policyiscompletelydeterminedbythestate-actionfrequencies. After obtaining(y, a) viathelinear program the probabilitywy(a) of choosing actionainstate ysimply becomes,wy(a) =(y, a)

aA(y)(y, a), y X,a A(y), (22)provided the denominator isnon-zero (arbitrary otherwise).D. Optimal FO-CMA structureInthis sectionweshowthat theabovelinear programs,under some conditions, admit structured solutions that help togain insight intotheproblem. Moreover, usingthestructuredresults simplies the implementation of CMA.1) Cumulativeinterferenceconstraint: First, considerthecaseof thecumulativeinterferenceconstraint (20). Withoutloss of generalityassumethat 1 M. Weshowthat it is optimal touseonlychannels withsmall . Howmany channels should be used can be derived from a thresholdmodel with respect to the constraint . The solution structureis depicted in Fig.6.Algorithm 1 (Threshold solution)i. Denethe maximum interference level for channeli asi =

yX1[yi=0]1[yj=1,j (24)iii. Adopt the following randomized policy. With probabilitywi, transmitintheidlechannelwiththelowest i, i.e.,instate ytransmit in channeli ifand only ifyi = 0 andyj = 1 j< i, (25)andchoosenottotransmitotherwise.Theprobabilitieswiaregiven withkas dened instep(ii),wj = 1, 1 j< k, wk = k1k k1, wj = 0,j> k(26)1 2 3 M ch. #123

do not transmittransmitinterference levelMFig.6. Solutionstructureforthecumulativeinterferencecase.Theorem2: The policyinducedbyAlgorithm1is asolutiontotheLP(19)-(20) andhenceequivalent toCMA.
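The threshold structure of (25)-(26) can be illustrated numerically. The following Python sketch is not the authors' implementation: it assumes channels are sorted by increasing rate of leaving the idle state and fills the cumulative interference budget greedily in that order, which is consistent with the ordering argument in the appendix. The function name `threshold_policy`, its arguments, and the use of three Table I traffic loads (0.1, 0.2, 0.3) as three example channels are hypothetical choices made for illustration.

```python
import math

def threshold_policy(mean_idle, mean_busy, slot, alpha):
    """Greedy threshold sketch for the CIC (illustrative, not from the paper).

    mean_idle[i], mean_busy[i]: mean idle and transmit durations of channel i,
    i.e. 1/lambda_i and 1/mu_i, sorted so that lambda_i is non-decreasing.
    Returns (weights, throughput, interference).
    """
    budget = alpha        # interference budget not yet used
    p_lower_busy = 1.0    # P(all lower-indexed channels are busy)
    weights, throughput, interference = [], 0.0, 0.0
    for t_idle, t_busy in zip(mean_idle, mean_busy):
        p_idle = t_idle / (t_idle + t_busy)      # pi_0 from (4)
        p_state = p_lower_busy * p_idle          # channel is the first idle one, cf. (25)
        p_clear = math.exp(-slot / t_idle)       # idle channel stays idle for a slot, cf. (9)
        cost = p_state * (1.0 - p_clear)         # collision mass if always used, cf. (14)
        if cost <= budget:
            w = 1.0                              # channel fully used
            budget -= cost
        else:
            w = budget / cost                    # threshold channel, used fractionally
            budget = 0.0
        weights.append(w)
        throughput += w * p_state * p_clear
        interference += w * cost
        p_lower_busy *= 1.0 - p_idle
    return weights, throughput, interference

# Example inputs: Table I CTMC values in ms, slot of 0.625 ms, 5% budget.
w, thr, interf = threshold_policy([9.10, 4.48, 2.90], [1.08, 1.05, 1.03], 0.625, 0.05)
```

With these example inputs the first channel alone would exceed the 5% budget, so it is used with a probability below one and the remaining channels are not used at all, mirroring the "1, fraction, 0" pattern of (26).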

Proof of Theorem 2: see appendix.

2) Packet error rate constraints: In the case of PERCs, separate constraints for each channel need to be considered simultaneously. Intuitively, the maximum reward would be achieved if all constraints could be made tight (otherwise transmission opportunities would be wasted). It may not be feasible, however, to tighten all constraints, given that a transmission can be initiated in at most one channel per slot. Whether this is possible depends on how loosely the PERCs are chosen.

If the $M$ constraints can be made tight, we show that the problem decouples, and the optimal policy can be found by considering each channel individually. This is the case if the following condition is met for all channels,

$$ \theta_a = \sum_{x \in \mathcal{X}} \frac{1_{[x_a = 0]}\, \pi_x}{\sum_{l=1}^{M} 1_{[x_l = 0]}} \ge \frac{\alpha_a}{d_a}, \quad a \in \mathcal{A}, \tag{27} $$

where

$$ d_a = \frac{(\lambda_a + \mu_a)(1 - e^{-\lambda_a T_s})}{\lambda_a \mu_a T_s} \tag{28} $$

is the expected average cost associated with a collision.

The intuition behind (27) is to find a condition under which the interference constraint can be made tight by considering channels separately. In fact, a tradeoff on which channel to transmit in need only be struck if $x$ contains multiple zeros. The denominator accordingly normalizes by the number of transmission opportunities in state $x$ and thus ensures that, although the $M$ bands are considered separately, the probability of transmission for a given state will never exceed one. We can thus adopt the following algorithm.

Algorithm 2:

i. The maximum activity level $\theta_i$ in channel $i$ is given by (27). Disregarding other transmission opportunities, we transmit in band $i$ with probability

$$ w_i = \frac{\alpha_i}{d_i\, \theta_i}, \quad i \in \mathcal{A}. \tag{29} $$

ii. Since (27) is met, the constraints can be made tight, and we obtain

$$ w_y(a) = \frac{w_a\, 1_{[y_a = 0]}}{\sum_{l=1}^{M} 1_{[y_l = 0]}}. \tag{30} $$

Theorem 3: The policy induced by Algorithm 2 is a solution to the LP (19)-(21) and hence equivalent to CMA.
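The two steps of Algorithm 2 can be checked numerically for a small number of channels. The sketch below is a hedged illustration rather than the authors' code: the function name `perc_policy`, the dictionary it returns, and the example inputs (Table I values for loads 0.1 and 0.2, treated as two channels) are assumptions. It enumerates all joint states, evaluates the maximum activity level of (27), the per-collision cost of (28), the probabilities of (29), and the state-dependent policy of (30).

```python
import itertools
import math

def perc_policy(mean_idle, mean_busy, slot, alphas):
    """Decoupled PERC policy sketch (illustrative, not from the paper).

    mean_idle[a] = 1/lambda_a and mean_busy[a] = 1/mu_a for each channel a.
    Raises ValueError if condition (27) fails for some channel.
    """
    m = len(mean_idle)
    p_idle = [ti / (ti + tb) for ti, tb in zip(mean_idle, mean_busy)]  # pi_0, (4)
    # Product-form stationary distribution over joint sensing states, cf. (8).
    pi = {
        y: math.prod(p_idle[i] if y[i] == 0 else 1.0 - p_idle[i] for i in range(m))
        for y in itertools.product((0, 1), repeat=m)
    }
    # Cost per collision, (28): (lambda + mu) / (lambda * mu) equals ti + tb.
    d = [(ti + tb) * (1.0 - math.exp(-slot / ti)) / slot
         for ti, tb in zip(mean_idle, mean_busy)]
    # Maximum activity level: idle opportunities shared equally, left side of (27).
    theta = [sum(p / y.count(0) for y, p in pi.items() if y[a] == 0)
             for a in range(m)]
    w = [alphas[a] / (d[a] * theta[a]) for a in range(m)]  # (29)
    if any(wa > 1.0 for wa in w):
        raise ValueError("condition (27) violated: constraints cannot all be tight")
    # Probability of transmitting in channel a when the sensed state is y, (30).
    policy = {y: [w[a] / y.count(0) if y[a] == 0 else 0.0 for a in range(m)]
              for y in pi}
    return {"pi": pi, "d": d, "theta": theta, "w": w, "policy": policy}

# Example inputs: two channels, slot of 0.625 ms, 10% PERC per channel.
res = perc_policy([9.10, 4.48], [1.08, 1.05], 0.625, [0.1, 0.1])
```

Summing `pi[y] * policy[y][a] * d[a]` over all states `y` recovers the bound for channel `a`, which is the sense in which the constraints are made tight.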

Proof of Theorem 3: see appendix.

E. Partially Observable CMA (PO-CMA)

In the last section we assumed that the state of all $M$ channels can be observed simultaneously. In this section we relax this assumption and assume that only a single channel can be observed at a time, as shown in Fig. 5(b). Furthermore, a transmission can only be initiated in the channel that has just been sensed. The action set thus reduces to $\mathcal{A} = \{0, 1\}$, denoting whether or not a transmission is initiated.

The partial observability severely complicates the problem. It is now necessary to trade off the exploration of the system (by frequently sensing different bands) against the exploitation of transmission opportunities. In order to illustrate how such a tradeoff can be struck, we present some preliminary results for the special case of $M = 2$ channels.

A fundamental consideration in designing PO-CMA is which of the past observations and actions are useful for making optimal decisions. Given that all bands are modeled as CTMCs, the continuous-time Markov property leads to the conclusion that the latest sensor reading of every channel is sufficient for predicting its behavior.

This is illustrated in Fig. 7 for the special case of $M = 2$ bands. The states are labeled according to whether the current sensing result is busy or idle. A busy channel is simply denoted b, whereas for an idle channel we also keep track of how many consecutive slots the channel has been sensed idle. Hence the states labeled $0, \dots, N$ all correspond to an idle sensing result.

The sensing history is incorporated in brackets, where the first number reflects the currently active channel and the second index denotes the latest sensing result in the other channel. Note that the time since the other channel has last been sensed is given, in this special case of $M = 2$, by the number of slots spent in the current channel.

According to the above, the setup is fully described by the triple $(y, i, x)$, where $y \in \mathcal{Y} = \{b, 0, \dots, N\}$ denotes the current channel's sensing history, $i$ the currently active channel index, and $x \in \{0, 1\}$ the last sensing result in the other band. The transition behavior can be understood as follows. Under action 1, i.e., keep transmitting, the following transitions are possible

$$ (y, i, x) \to (y + 1, i, x), \quad 0 \le y < N \tag{31} $$
$$ (y, i, x) \to (0, \bar{i}, 1) \tag{32} $$
$$ (y, i, x) \to (b, \bar{i}, 1), \tag{33} $$

where the first line denotes the channel staying idle. The other two possible transitions correspond to the channel becoming busy and the cognitive radio thus relocating to the other band. This other band can in turn be either idle or busy, and thus two different transitions can occur. The notation $\bar{i}$ represents the other band, that is, $\bar{i} = (i \bmod 2) + 1$.

Under action 0, i.e., relocate to the other band, the following transitions may occur

$$ (y, i, x) \to (0, \bar{i}, 0) \tag{34} $$
$$ (y, i, x) \to (b, \bar{i}, 0), \tag{35} $$

denoting a relocation to the other band and finding it either idle or busy, respectively. The transition behavior is shown in Fig. 7. In order to keep a finite state space, we cannot stay with any channel for longer than $N$ slots. The optimal policy for the partially observable case can be found by linear programming, as shown in Sec. IV-C.

Fig. 7. Markov chain model for the partially observable case. Only some example transitions are shown. The indices denote the sensing result, the active channel number, and the last sensing result, respectively.

V. NUMERICAL RESULTS

In this section we present numerical results for the CMA schemes and evaluate their performance gain compared to a blind reference scheme that does not perform sensing. The schemes are evaluated in terms of throughput and interference for varying WLAN traffic load. Such analysis is standard in coexistence literature [25]. The results are based on simulations using the CTMC approximation. Furthermore, we examine the robustness of this approximation by running algorithms derived from the CTMC model on data generated via the semi-Markov model. We will see that the results match closely, justifying the approximation.

A. Simulation parameters

The numerical results reflect the throughput and interference behavior for varying WLAN traffic load. While performance is assessed by simulation, the model parameters (for both the SMM and its CTMC approximation) are extracted from real trace data. The experimental setup consisted of a wireless router and three workstations with adapter cards [8] to gather packet traces. Using the Distributed Internet Traffic Generator [26] on each workstation we generated constant-payload UDP traffic. The interdeparture time between packets was chosen to be exponentially distributed with a varying rate parameter. By initially choosing this parameter very large we first determined the maximum traffic load supported by the setup. Each setting of the traffic load was then normalized with respect to this maximum value, i.e., expressed as the ratio of the chosen parameter to its maximum.

TABLE I
Measurement parameters for the semi-Markov model and its continuous-time Markov chain approximation. This table was obtained by estimating model parameters from empirical data obtained from real packet traces.

| Parameter \ normalized WLAN traffic load | 0.05 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 1.0 |
|---|---|---|---|---|---|---|---|
| CTMC approximation | | | | | | | |
| $\lambda^{-1}$ [ms] | 15.9 | 9.10 | 4.48 | 2.90 | 1.98 | 1.39 | 0.21 |
| $\mu^{-1}$ [ms] | 1.11 | 1.08 | 1.05 | 1.03 | 1.02 | 1.03 | 1.03 |
| Semi-Markov model | | | | | | | |
| $\sigma$ [ms] | 18.7 | 11.3 | 5.46 | 3.95 | 3.09 | 2.35 | 0.04 |
| $k / 10^{-2}$ | -2.11 | -2.50 | 2.47 | 1.51 | 2.61 | 1.69 | 50.1 |
| $p_c$ [%] | 13.2 | 18.3 | 21.2 | 30.1 | 40.0 | 47.7 | 98.8 |
| $T$ [ms] | 1.11 | 1.08 | 1.05 | 1.03 | 1.02 | 1.03 | 1.03 |

For each setting of the WLAN traffic load, we captured the raw complex baseband data of the transmissions using a vector signal analyzer and processed the data to find the busy and idle durations of the WLAN. Based on these results we obtained the parameters of the semi-Markov model and its CTMC approximation, shown in Tab. I; see [15] for details. Remaining parameters were chosen such that they reflect a typical WLAN/Bluetooth coexistence setup. The slot duration was chosen as $T_s = 625\ \mu\text{s}$.

B. Performance of FO-CMA and PO-CMA

First, we evaluate FO-CMA's throughput for a CIC of $\alpha = 5\%$, and PERCs of $\alpha_i = 10\%$ for each channel.
The throughput for both scenarios is shown in Fig. 8 for $M = 1$, 2, and 3 parallel bands. We see that for the CIC, we obtain the same throughput regardless of how many parallel bands exist. Indeed, this is to be expected, since we deal with a cumulative constraint. Having multiple parallel bands available does not loosen this limitation. In contrast, the throughput for a PERC in each channel is shown in the right plot of Fig. 8. In this case, the throughput does increase with the number of channels, since every constraint is only applicable within one channel and adding channels provides additional transmission opportunities.

Second, we show the performance of PO-CMA for $M = 2$ and a CIC as well as a PERC, respectively. We note that the performance of PO-CMA is close to that of the corresponding FO-CMA scheme. While the performance gap may be more pronounced for larger $M$, this result suggests that performance degrades gracefully with the number of bands. This is an encouraging result, given that for PO-CMA it suffices to have a simple transceiver instead of a more sophisticated radio frontend.

From Fig. 8, we can also infer that, for small WLAN traffic load, the CIC is less restrictive than the PERC. On the other hand, for large traffic load, this situation is reversed. This behavior is inherently linked to the definitions of both constraints. The PERC implicitly conditions on the WLAN being active and is thus more restrictive if the number of packets is small.

Fig. 8. Performance with CIC (left side) and PERC (right side) for increasing number of bands $M$. The normalized throughput represents the time-averaged fraction of successful transmissions out of the total number of slots. (Both panels plot normalized throughput against WLAN traffic load; legend entries: FO-CIC, PO-CIC, FO-PERC, PO-PERC, $M = 1$, $M = 2$, $M = 3$, and $M = 2$ PO-CMA.)

C. Comparison with a blind hopping scheme

In Fig. 9 we compare the throughput and interference of our CMA schemes with a blind reference scheme that performs oblivious hopping throughout the entire band. We evaluate the performance for $M = 3$ since the ISM band at 2.4 GHz supports exactly three parallel WLAN bands [13]. The traffic load was chosen such that the blind reference transmits in every, every other, or every third slot. The hopping pattern was pseudo-random across the entire band.

We can see that while the blind hopper's throughput can be higher than CMA's, its interference may become prohibitively large. Even if the blind reference is transmitting only in every third slot, the packet error rate can be 20% or higher. For a fully loaded system the packet error rate can be larger than 60%. We stress that for interference this high, the WLAN's throughput will inevitably decrease, violating our paradigm of a hierarchical scheme. With CMA, on the other hand, a small interference level is guaranteed, and as a consequence we assume that the cognitive radio's impact on the WLAN will be small.

Fig. 9. Comparison with a blind reference scheme. The blind hopper operates with constant rate and is completely oblivious of the WLAN. (Panels plot normalized throughput and normalized packet error rate against WLAN traffic load; legend entries: FO-PERC and the blind reference at rates 1, 1/2, and 1/3.)

D. Robustness to CTMC approximation

Lastly, we evaluate the robustness to the CTMC approximation by heuristically running the CMA schemes derived from the CTMC model on data generated via the semi-Markov model. The simulation parameters for both models are shown in Tab. I. The performance and interference for a PERC are shown in Fig. 10. We can see that the deviation between the semi-Markov model and its CTMC approximation is small. While the throughput curves almost coincide, the interference constraint is slightly violated for high WLAN traffic load. However, the deviation is still small enough to justify the use of the CTMC model.

VI. CONCLUSIONS

In conclusion, we proposed Cognitive Medium Access (CMA) schemes for sharing spectrum with a set of parallel WLAN bands.
Its derivation was based on a measurement-based interference study as well as a stochastic model which captures the WLAN's medium access. The CMA schemes proposed in this paper can be classified according to fully observable and partially observable systems. In both cases optimal policies can be obtained via linear programming, but in the case of FO-CMA we furthermore specified structured solutions. The paper evaluated the performance of these schemes for various scenarios. We found that PO-CMA achieves almost the same performance as FO-CMA. Both schemes significantly outperform blind hopping schemes.

APPENDIX

Proof of Theorem 2

We need to show that Algorithm 1 is a solution to the LP (19)-(20). First, rewrite the LP in terms of its stationary distribution. This is possible since the transition behavior does not depend on the actions $a$. We obtain

$$ \max_{w_y(a)} \sum_{(y,a) \in \mathcal{X} \times \mathcal{A}} \pi_y\, w_y(a)\, r(y, a) \tag{36} $$

subject to

$$ \sum_{(y,a) \in \mathcal{X} \times \mathcal{A}} \pi_y\, w_y(a)\, d_{\mathrm{CIC}}(y, a) \le \alpha. \tag{37} $$

Note that for the CIC, $d_{\mathrm{CIC}}(y, a) = 1 - r(y, a)$. Furthermore, define $\mathcal{X}_1 = \{x \in \mathcal{X} : x_1 = 0\}$ and

$$ \mathcal{X}_i = \{x \in \mathcal{X} : x_i = 0,\ x_j = 1,\ j < i\}, \quad 1 \le i \le M. \tag{38} $$

Then, states for which

$$ \pi_x\, w_x(i) = \pi_y\, w_y(i), \quad x, y \in \mathcal{X}_i, \tag{39} $$

clearly offer the same reward at the same cost. Due to this and our assumption that $\lambda_1 \le \dots \le \lambda_M$, channel $i$ offers higher (or equal) reward at lower (or equal) cost than channel $j$ for $i < j$. [...]

