
An Empirical Analysis of the Auction Bids for

Keyword Types in Sponsored Search

By Gabe Rich

Faculty Advisor: Igal Hendel

Senior Honors Thesis Mathematical Methods in the Social Sciences

Northwestern University Spring, 2008

Rich 2

Acknowledgements

I would first like to acknowledge Randy Heeb; working with him this past

summer inspired my research question. Inviting me to work with him and our team got

me interested in search advertising and sparked the questions I attempt to answer in this

thesis.

I would also like to thank my advisor, Professor Igal Hendel, whose advice and

experience were invaluable as I worked through the analysis of the thesis. His patience

and helpfulness through the entire thesis writing process helped me keep everything in

perspective.

Lastly, I would like to thank my parents for always supporting and encouraging

me through the writing of this paper and my entire life. Their need to always stay on my

case was crucial to my ability to complete these tasks.


Abstract

This paper evaluates a way advertisers use internet advertising. It focuses on how

much advertisers are willing to pay for a given click from a search engine user.

Advertisers can pick specific searches on which they would like their advertisement to

appear by choosing different keywords. When a keyword is searched, an auction occurs,

the advertisers’ per-click bids are compared, and the advertisements are placed along a

sponsored search results area of the web page. Only when an advertisement is clicked by

the search user will the advertiser pay the search engine.

This paper uses historical bid data given by Google to see what prices and

advertisement positions an advertisement would have when advertising on a list of

keywords. Using this information, the paper analyzes whether advertisers pay more for

keywords that are more complex and keywords that include brand names. These factors

are ways the advertiser can focus his consumer base. This paper will show that

advertisers pay more for specific keywords and less for keywords that include a brand

name.

This paper also shows that the premium advertisers pay for keywords that are searched

more often is larger for advertisements in the highest position than for advertisements in

the lower positions. This result might arise because only the top advertisement is seen and

has a benefit even when it is not clicked. The amount advertisers pay to be viewed, in

addition to being clicked, from that position is therefore even greater on keywords with

higher search volumes.


Introduction

When one is researching a potential future purchase on the internet, he often starts

by simply typing what he is looking for into a search engine and pressing return. What

he doesn’t know is how much goes into deciding what comes up on the next page. This

statement is not referring to the search results he sees going down the middle of the page.

These results are chosen and placed by what is often a very complicated and

secret algorithm. The topic of this paper is the small section often placed on the

right side of the page titled Sponsored Links or Sponsor Results.

The sponsored search industry is an extremely important source of revenue for

search engines, and has become one of the largest sources of income on the internet. The

system works by providing additional sponsored search results to every search engine

user generally along the right side of the search results page. These results are placed

differently than the natural results that go down the middle of the page which are chosen

to best respond to the user’s search inquiry. The search engines make a large portion of

their revenue, in most cases, by auctioning off the spots in the sponsored links section to

companies to advertise and hyperlink their websites. Because advertisers pay the search

engine every time a search user clicks their ad, they, of course, want users who are

interested in the products that they have to offer. They control this by choosing a list of

keywords on which they would like to advertise. When a search user searches on one of

their keywords, the advertiser is put into an instantaneous auction which decides if and

where his advertisement will be placed in the sponsored search results.

What this paper evaluates is how much advertisers are willing to pay for different

keywords. Advertisers, of course, choose keywords that relate to the product or service


they offer on the internet because they want users who are searching for that product.

This is a simple way that the advertisers segment the population of search engine users

into two groups: those that are looking for what the advertiser is trying to sell and those

that are not. They can then target their advertising to the former. This paper researches

whether advertisers pay more for specific and complex keywords because those

keywords can segment and target the population even further. Keywords that are more

specific and complex are of more value to any given advertiser because of their ability to

further target potential consumers. They will still advertise only on the specific keywords

that relate to their products. Therefore they will be even more likely to match the search

user’s desired search results than when using a less specific keyword relating to their

products. Advertisers that realize that advertising on more specific keywords is more

valuable to them will bid higher amounts of money for the clicks of people searching on

the specific and complex keywords.

In addition to this, the paper will also ask whether advertisers pay more for their

advertisements that land in the best position (the one at the top of the sponsored search

results links) for keywords that are searched more often. One hypothesis might be that

the advertisers who are in the first position are most benefited by the search user reading

the advertisement, even if it is not clicked. This may have added value to keywords with

high search volumes because those advertisements will be seen most often.

Introduction to Google’s AdWords

Google’s AdWords is the program that Google uses to sell advertising spots on

the internet. Advertisers subscribe to the service to advertise on Google and other


websites within Google’s advertising network, the Search Network and the Content

Network1. For the purpose of this study, I will only refer to the part of the system that

involves advertising on Google’s Search Network, specifically on search results pages.

This means that advertisements are placed on specific search results pages chosen by the

advertiser.

To advertise in the Search Network the advertiser chooses which search results

pages it would like to advertise on by choosing a list of keywords. When these keywords

are searched, the advertisement is eligible to be shown on the sponsored results of that

user’s search. Because each search results page has spots for only up to eleven

advertisements2, Google uses a mechanism to determine which advertisements get

positions, and which positions they get. Google, like most of the other

search engines, chooses which eligible advertisements are shown through an auction.

However, its auction uses a combination of factors to determine which advertisements get

which position on the search results page. (The top position, being the best, will be seen

and, probably, clicked most often.) Although different search engines use different

ranking criteria for the placement of ads within the auction, the one that Google

AdWords uses has been shown by previous researchers (Feng, Bhargava, and Pennock,

and Lahaie) to be the most stable and best for the revenue of the search engine.

The way the advertisements are ordered on the page is by an “Ad Rank”, which is

the product of two factors. The first is an advertiser’s maximum cost-per-click (CPC)

bid, which is the most an advertiser is willing to pay for each time a user clicks his

1 This Content Network involves advertising on Google’s partner sites by matching an advertisement with the contents of websites. Though this also uses bidding, it does not involve bidding on keywords to land on search results pages.

2 More spots are allocated on subsequent search results pages, but each spot is viewed less as it is placed further from the front of the search.


advertisement. (Maximum CPC bids can be placed for an advertiser’s entire group of

keywords or be set individually for each keyword.) The other factor is a quality score

that combines the AdWords history of the advertiser and, specifically, of the to-be-displayed

advertisement with how well the advertisement matches the search

query of the search engine user. Because the Ad Rank for each advertisement is created

in real time, a new order may be made for each search that happens on Google.
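
As a minimal sketch of the ordering just described, the following Python code ranks a few hypothetical advertisements by the product of maximum CPC bid and quality score. The advertiser names, bids, and quality score values are invented for illustration; Google does not publish how the quality score is computed, so treating it as a single number on an arbitrary scale is only an assumption.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    advertiser: str       # hypothetical advertiser label
    max_cpc: float        # maximum cost-per-click bid, in dollars
    quality_score: float  # assumed numeric quality score (scale is arbitrary)


def ad_rank(ad: Ad) -> float:
    """Ad Rank as described in the text: max CPC bid times quality score."""
    return ad.max_cpc * ad.quality_score


def order_ads(ads, slots=11):
    """Return the ads that receive a position, best Ad Rank first.

    The default of eleven slots follows the first results page described above.
    """
    ranked = sorted(ads, key=ad_rank, reverse=True)
    return ranked[:slots]


ads = [
    Ad("A", max_cpc=2.00, quality_score=0.5),
    Ad("B", max_cpc=1.50, quality_score=1.0),
    Ad("C", max_cpc=1.00, quality_score=0.8),
]
ordering = [ad.advertiser for ad in order_ads(ads)]
print(ordering)
```

In this made-up example advertiser A places the highest bid but lands below advertiser B, because B's higher quality score gives it the larger Ad Rank.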

An important distinction is that the bid is on a cost-per-click and not a cost-per-

impression basis. This means that the advertiser’s bid will only be paid when the search

user clicks the advertisement and is redirected to the website to either learn more about

the product or service or make a purchase. In this manner, advertisers are only charged

when a potential customer accesses their websites. This also means that advertisers whose

advertisements are never clicked never pay Google. For this reason, Google also uses a

quality score alongside the advertiser’s bid when ranking advertisements.

The quality score that is used is determined by a few factors. Though Google

does not explain exactly how the score is made, it does explain the main contributors to it.

The most important factor is the advertisement’s historical click-through-rate (CTR),

which is the ratio of the number of times the advertisement is clicked to the number of

times it is shown. The other major factor is the relevance of the search query to the

advertisement itself and the keyword the advertiser chose. Because most advertisers

choose their keywords to broad match the search queries, the advertisement will show on

any search using a combination of the words in the keyword the advertiser chose to

advertise on. For this reason, search queries may match better with some advertisers’

keywords than others. For example, an advertiser choosing to advertise on the keyword


“car parts” will be eligible for the search query “What parts of the world does my

driver’s license allow me to drive a car?”, but will have a worse keyword match

component of the quality score than an advertiser with the keyword “world driver’s

license.”

Google uses each of the main factors in the quality score for a reason related to

the probability a user will click the advertisements. The reason for using the CTR is so

the ad rank score is closer to ranking the advertisement by expected revenue (Feng,

Bhargava, and Pennock.) This is because using the product of the CTR and the CPC bid

would make an estimate of how often it is clicked times how much the search engine

makes per click. In fact, this method is sometimes referred to as a rank-by-revenue

method. Using relevancy factors ensures the search results will be attractive to the

specific searchers. Google needs to make sure that their searchers continue to find the

sponsored search results relevant to what they are looking for, or users will no longer

click or look to the sponsored results. The relevancy factor is another way that Google

tries to increase the chance that the user will click the sponsored results and generate

revenue.
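
The rank-by-revenue reasoning above can be illustrated with a short sketch. It compares ranking purely by CPC bid with ranking by the product of CTR and CPC bid, which estimates the search engine's revenue each time the advertisement is shown. The two advertisements and their numbers are hypothetical, and the function name is my own.

```python
def expected_revenue_per_impression(ctr, cpc):
    """Estimated revenue each time the ad is shown: click probability times price per click."""
    return ctr * cpc


# Hypothetical ads: name -> (historical CTR, CPC bid in dollars)
ads = {
    "high_bid_low_ctr": (0.01, 3.00),
    "low_bid_high_ctr": (0.05, 1.00),
}

# Ranking by bid alone puts the highest bidder first.
top_by_bid = max(ads, key=lambda name: ads[name][1])

# Ranking by expected revenue weights each bid by how often the ad is clicked.
top_by_revenue = max(
    ads, key=lambda name: expected_revenue_per_impression(*ads[name])
)

print(top_by_bid, top_by_revenue)
```

With these invented numbers, the advertisement with the lower bid is expected to earn the search engine more per showing, so a rank-by-revenue ordering places it first even though a rank-by-bid ordering would not.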

Another concept that Google uses in its auctions is what it calls the AdWords

Discounter. This service that Google offers to advertisers works in effect to turn the

auction into a generalized second price auction. In a normal generalized second price

auction, each bidder pays a marginally higher amount than the bid of the bidder below

him. (The highest bidder would get the first prize and pay the amount of the second

highest bidder. The second highest bidder would get the second prize and pay the

amount of the third highest bidder, and so on [Edelman, Ostrovsky, and Schwarz].) The


Discounter achieves this effect in the auction using the advertisements’ quality

scores. It lowers the amount an advertiser pays just enough that its

ad rank would still be higher than the ad rank of the advertiser in the next position. This

way an advertiser never pays more than it needs to maintain its position. Specifically,

the product of the CPC payment and quality score is lowered to just above the ad rank of

the bidder below it and then divided by the advertiser’s own quality score to determine the actual cost

the advertiser would pay for a click on that search. This function allows the advertisers

to bid their actual value of a click without worrying about paying too much money

because their payment will only be what is necessary to reach the highest possible

position.
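
The pricing rule described above can be sketched as follows. This is an illustration of the description in the text, not Google's actual implementation: the one-cent `increment`, the advertiser labels, and the numeric quality scores are assumptions, and the price of the lowest-ranked advertisement is left undefined because the text does not say what that advertiser pays.

```python
def discounted_cpcs(ads, increment=0.01):
    """Compute per-click prices under the discounting rule described in the text.

    ads: list of (name, max_cpc, quality_score) tuples.
    Each advertiser pays the Ad Rank of the advertiser directly below,
    divided by its own quality score, plus a small increment (assumed $0.01),
    and never more than its own maximum CPC bid.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    prices = {}
    for position, (name, max_cpc, quality) in enumerate(ranked):
        if position + 1 < len(ranked):
            _, next_cpc, next_quality = ranked[position + 1]
            price = next_cpc * next_quality / quality + increment
            prices[name] = round(min(price, max_cpc), 2)
        else:
            prices[name] = None  # not specified in the text (no bidder below)
    return prices


prices = discounted_cpcs([
    ("A", 2.00, 0.5),
    ("B", 1.50, 1.0),
    ("C", 1.00, 0.8),
])
print(prices)
```

In this hypothetical auction every advertiser with a competitor below it pays less than its maximum bid, which is the sense in which the Discounter lets advertisers bid their value without overpaying.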

Literature Review

The majority of the literature that studies the sponsored search industry has

developed and evaluated theoretical models that correspond to the different auction

methods that search engines use to determine the positioning and prices for the

advertisers. This research is helpful in showing exactly what advertisers’ bid prices

represent. Previous research has shown that the CTR-weighted auction method Google uses

is the most stable and yields bids that are closest to the true values of clicks to

advertisers. The research done by Feng et al. explains that the method Google uses, with

click-through rates determining the ordering, is close to inducing truthful bids from

advertisers. Their findings are that when using past CTRs for an advertiser, search engines should

control for the position in which an ad is placed. By doing this the search engine can

fairly attribute CTRs to ads regardless of where they were initially placed in searches. By


modeling this mechanism, Feng et al. show that the best way to evaluate how much

revenue ads will make for a search engine takes an advertiser’s CTR into account

when ranking ads. This assurance demonstrates the importance of the rank-by-revenue

system that Google uses and reinforces the use of Google estimates in this study of

advertisers’ willingness to pay for a given keyword. Lahaie also discusses how the

Google method comes closer to eliciting bidders’ true values, while explaining that

under some circumstances there is no equilibrium for bidders acting within the auction’s

rules.

In the two papers Edelman and Ostrovsky have published, they explain that the

second price auction that is used does not have an equilibrium in which bidders truthfully

report their values in their bids. They explain how the outcome of the generalized second-price

auction is different than the Vickrey-Clarke-Groves (VCG) mechanism because the pricing

methods differ. They go on to say that the VCG method would induce truthful bids in

“Internet Advertising and the Generalized Second-Price Auction” and continue to discuss

how bidders dynamically change their bids depending on their competitors for each

keyword in “Strategic Bidder Behavior in Sponsored Search Auctions.” This second

publication developed a theory of bidders moving along a spectrum of bid values

depending on where their competitor was and observed this behavior among different

keywords in both major search engines. Thus, bid amounts fluctuate over cycles of time

as the competitors respond to each other’s actions and move continually. Though this

behavior would skew the results if this analysis were using real, instantaneous bid data,

I assume that the bid estimates Google gives to prospective advertisers average out these

effects. Because the website claims that some of the estimates they give for CPCs will


reach the top position 85% of the time, I will assume that these estimates are based on a

long enough time period to be confident that the numbers accurately represent an average

of what advertisers are paying for the given keyword. I also assume that these effects

would not differ across keywords or would correlate directly with the level

of competition within each keyword. Because Google gives a metric of competitive level

for each keyword, whatever effect the bid fluctuation has on the data will be

accounted for by using that rating as a variable in the regression on each keyword’s

value.

Another study, by Zhou and Lukose, describes behavior that might skew the results of this study.

It shows that advertisers might act vindictively and choose bids that would not necessarily

directly benefit themselves, but do more harm to their competitors. This works by raising

one’s own bid to increase the CPC of the advertiser in a higher position. Because the

auction is a generalized second price auction, the vindictive bidder will not see higher

CPCs because the advertiser below has not changed its bid, but the advertiser above will

see a raised CPC effect3. Therefore a competitor can raise its bid to impose a higher cost

on the advertiser who has the spot above him. Not only does this make that advertiser

lose money, but because many advertisers put spending limits on their accounts, the

victim advertiser may reach its limit and no longer display ads. The vindictive bidder

then benefits by having the better advertising spot without having to pay a larger amount

for it, because the original advertiser is no longer bidding for a position. If

advertisers bid this way, Google’s estimates for bids at lower positions would change, and

3 This is different than click fraud, which is clicking many times on someone else’s ad to run up their advertising bill and reach their account cap. Though this can be done, search engines explain that there are systems in place to monitor and stop click fraud from occurring and advertisers’ accounts from being charged for this.


possibly those for the top position if many advertisers reach their limit often. Though

vindictive bidding would have an effect on a study using historical bid estimates, I make

two assumptions about why the historical estimates used in this study can be

representative of advertisers’ true value of a click from different keywords. The first

assumption is that vindictive bidding is very hard to accomplish on keywords with many

bidders. It is difficult to target where your opponents’ bids are4 and to consistently be in

an advertising position to have the desired effect. I also make the assumption that if

vindictive bidding does occur and does have an effect on historical bid data compared

to advertisers’ true values, then this effect will be consistent with how much advertiser

competition there is for a given keyword. Again, by including the level of

competition for each keyword as another independent variable, this effect on CPC

prices can hopefully be taken out of the regression.
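
The vindictive bidding mechanism attributed above to Zhou and Lukose can be illustrated with a simplified generalized second price auction. The sketch below ignores quality scores and the marginal increment, lets the lowest bidder pay nothing, and uses invented bidder names and bids, so it is only an assumption-laden illustration of the mechanism.

```python
def gsp_prices(bids):
    """Per-click prices in a simplified generalized second price auction.

    bids: dict mapping bidder name to bid in dollars.
    Each bidder pays the bid of the bidder ranked directly below;
    the lowest bidder pays 0.0 here (a simplification, no reserve price).
    """
    ranked = sorted(bids.items(), key=lambda item: item[1], reverse=True)
    prices = {}
    for position, (name, _bid) in enumerate(ranked):
        has_bidder_below = position + 1 < len(ranked)
        prices[name] = ranked[position + 1][1] if has_bidder_below else 0.0
    return prices


# Hypothetical bids before and after "rival" raises its bid vindictively.
before = gsp_prices({"leader": 3.00, "rival": 1.00, "third": 0.50})
after = gsp_prices({"leader": 3.00, "rival": 2.99, "third": 0.50})

print(before["leader"], after["leader"])  # the leader's price per click
print(before["rival"], after["rival"])    # the rival's price per click
```

In this example the rival keeps its position and its own price, since the bidder below it has not moved, while the price charged to the leader rises from the rival's old bid to its new one.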

These studies, which create and use theoretical models, demonstrate that the

generalized second price auction works in search engines’ advertisement sales and that the

bids advertisers offer can be taken as the value of a click to the advertisers. This paper

attempts to determine some of the reasons behind why advertisers bid certain amounts for

different keywords, and what makes clicks from some keywords more valuable than

others.

Data Being Used

The data used for this paper was gathered from tools in Google AdWords which

give information on the performance of keywords within the AdWords product. The data

4 Google explains that advertisers cannot see their competitors’ bids.


comes from two different outputs on the site that are used to help potential advertisers

who might be interested in advertising on Google AdWords. They both provide

estimates about what advertisers can expect to pay to advertise on the Google Search

Network. Because these estimates are created from historical AdWords data, I assume

that the data is an accurate representation of what advertisers paid and experienced while

advertising on these keywords. The words chosen for this analysis were also provided by

the keyword tools. In addition to the data about the keywords, the AdWords tools also

suggest keywords for the advertiser to use. The paper uses these keywords to obtain the best

sample of keywords that are advertised on.

The keywords that were used in the analysis come from the AdWords External

Keyword Tool located at https://adwords.google.com/select/KeywordToolExternal. This

program will, when given a keyword or list of keywords, produce additional keywords

(up to a total of 200) related to and including the initial listed words. This page also

reports, for each keyword, normalized ratings (from zero to one5) of advertiser

competition, search volume of the past month, average search volume over the past year,

a search volume for each month of the past year, and which month of the last year had the

highest search volume for that keyword. When given a maximum Cost-per-Click bid, the

External Keyword Tool will also provide an average CPC price the advertiser will pay,

and a range of ad positions (first to third, fourth to sixth, or seventh to tenth) where the

advertisement will most likely land. This data can be used to determine what advertisers

have been bidding and paying per click to have their advertisements shown in certain

5 On the website display, the normalized ratings for competitive level and volume are shown in the form of a bar chart; however, when the tables are downloaded, they are each converted to numbers with two decimal places.


position groups. These tables of data can be downloaded directly into .csv format from

the Google website.

A concern with using the normalized data about search volume and level of

competition is that the values may only be normalized over the list of keywords being

retrieved during any single use of the External Keyword Tool. In order to make sure

the values would at least be consistent over the entire system of keywords, and not just

those whose data was being collected in the specific retrieval, I checked keywords on the

keyword tool to make sure the values for search volume and advertiser competition

were normalized against the whole system, and not just the keywords in the set. I did this

by using some of the same keywords in two different sets of keywords, and then

comparing the values of the advertiser competition and search volume from each set.

Because the numbers were the same for each keyword, the tool does not

base these values on a scale within the given keyword set, but rather always gives a

consistent value for the estimations.
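
A consistency check of this kind could be sketched as below. The function name, the dictionary layout, and all of the values are hypothetical placeholders, not figures from the data set; the sketch only shows the comparison of shared keywords across two retrievals.

```python
def mismatched_keywords(retrieval_a, retrieval_b):
    """Return the shared keywords whose normalized values differ between two retrievals.

    Each retrieval maps keyword -> (advertiser_competition, search_volume),
    both normalized to the zero-to-one scale reported by the tool.
    """
    shared = retrieval_a.keys() & retrieval_b.keys()
    return sorted(k for k in shared if retrieval_a[k] != retrieval_b[k])


# Placeholder values for illustration only.
retrieval_1 = {"stereo": (0.93, 0.80), "car stereo": (0.86, 0.60)}
retrieval_2 = {"car stereo": (0.86, 0.60), "home stereo": (0.73, 0.40)}

differences = mismatched_keywords(retrieval_1, retrieval_2)
print(differences)
```

An empty result means that every keyword appearing in both retrievals received identical values, which is what one would expect if the tool normalizes against the whole system and not against the keywords in each set.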

The other source of keyword data provided by Google is the Sandbox

Traffic Estimator, which can be found at

https://adwords.google.com/select/TrafficEstimatorSandbox. Within this tool potential

advertisers can suggest a list of keywords and a maximum CPC bid for the list. Though

the site does not provide the user with additional keyword suggestions, it does provide

additional information about each keyword and CPC bid combination including a visual

metric of keyword traffic, minimum and maximum values for estimated CPC, a range of

ad placements similar to the first tool, a range of how many clicks the ad will get at that

position, and a range of the cost per day of running the ad at that CPC on that keyword.


This tool differs slightly from the first in that the potential advertiser does not need to

provide a maximum CPC bid. When that input is left blank, the Estimator will give a

maximum CPC estimate that will reach the top ad position 85% of the time. This is an

important function because it gives a consistent estimate for the highest CPCs advertisers

have been paying to reach the top position on the search results page. This estimate will

be used as a representative of the CPC price to reach the top position.

The dependent variables used for the main regressions in the analysis are all the

suggested CPC payments the advertiser would pay. They come from how much the

potential Google advertiser would pay to reach the provided position for the given

keyword. Though these are explicitly stated as how much current advertisers pay to

advertise, they are derived from the historical data Google has on its advertisers.

The effect the AdWords Discounter has is that the advertiser would not have to pay more

than the advertiser below him would pay for the spot, so I assume that these values are on

average what advertisers are bidding to advertise on each keyword. Using these as how

much advertisers bid, we know how much the advertisers value advertising in each spot

for the given keywords.

The data actually used for this analysis is taken from three different keyword

groups. Each of the three lists is provided from the External Keyword Tool and is

generated from one keyword. The initial keywords are stereo, insurance, and therapist;

the Keyword Tool provided a list of 200 words for each of these three. I chose these

three to be a small sample of different products that people might research online. A

stereo is an actual tangible product someone could purchase online. Insurance is a

financial service that people can research and purchase online. Thirdly, a therapist provides a


service most people probably would not purchase online, but could be researched for

purchase of the service in the near future and could therefore be profitable to advertise

online. These are to represent a small cross section of commercial searches people make

on the internet. For each of these 600 keywords, the analysis will use estimates from the

traffic estimator and keyword tool for a list of 23 maximum CPC bids ranging from $50

to $.10 to cover the spectrum of where advertisers bid. The increments between the

highest and lowest bids are not even but are formulated to best retrieve precise data about

CPC payments and advertisement positioning by becoming smaller and smaller as the

bids approach $.10. See the appendix for the full list of keywords and bids used in collecting

the data.

It is important to note that the variables provided by the tools do overlap, and the

choice of variables from which tool to use can be important. The variables provided by

the External Keyword Tool are: average CPC, estimated bin of advertisement position

(The bins given suggest the advertisement will be from the first to third position, fourth to

sixth, or seventh to tenth.), normalized search volumes for the past month, each month in

the past year, and the entire past year, and a normalized metric of advertiser

competitiveness. The variables given by the traffic estimator are an

estimate, on a one to five scale, of the search volume of the keyword, and minimums and

maximums for the estimated CPC, advertisement position location (The bin ranges from

the traffic estimator use the same bins as the keyword tool, but do not stop at three

bins. This data goes up to seven bins, adding bins for the eleventh to fifteenth spot,

sixteenth to twentieth, twenty-first to thirtieth, and thirty-first to fortieth.), expected clicks

per day on that keyword, and expected cost per day on the keyword. The appendix


includes a list of the position locations in which the advertisements would land, and how each

is estimated in the regression. Each analysis done in this paper uses variables from both

of the sources about each keyword and bid observation, but uses each for a reason

described for the specific analysis.

Because AdWords ranks and places advertisements in positions based on more

than just the CPC bid, the Traffic Estimator explains that the estimates are all based on

average historical CTRs. This means that the number of clicks per day the estimator

provides is based on the assumption that search users will click the advertisement in the

given position in question at the same rate as they had on average in the past. It also

means that when evaluating what position bin the advertisement will land in for each keyword

and bid observation, it uses a quality score based on the keyword history. Because

historical averages are used to generate the data, I assume that the prices advertisers

actually bid and pay are similar to the CPC estimates given by the External Keyword

Tool and the Traffic Estimator.

In addition to the variables provided by Google, a few are created to be used in

the regressions to account for other factors. The first three dummy variables are made to

distinguish from which keyword group each keyword came: stereo, insurance, or

therapist. By using these variables in the regressions, the inherent differences in keyword

value are taken into account. For instance, insurance advertisers might value the click of

someone searching for insurance more than stereo advertisers value stereo searchers’

clicks, because the profit from selling an insurance policy is much more than that from selling a

stereo. This effect is shown in the regression, and does not influence the regression’s

measurement of the value of keyword complexity. Another variable added is a word


count of how many words are in each keyword. This is a simple measure of the

complexity and specificity of the keywords advertisers choose. The last variable added is

a “brand” variable. Each of the six hundred keywords was flagged on

whether or not it includes a brand name of a certain company. This dummy variable indicates

whether the keyword the advertisers are bidding on contains the name of an existing company.

The variable for how many words the keyword has and the brand dummy variable

are the measures of keyword complexity and specificity. The number of words does

this because with searches in general, each additional word used in the search query

narrows the search results to those including the additional word. Therefore someone

who enters a search query with more words is looking for something more specific in

their search results. The brand dummy variable has this same effect of narrowing the

possible results to those produced or sold by the brand name used in the search query.
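
The constructed variables described above could be built along the following lines. This is a sketch under assumptions: the brand flag in the paper was assigned keyword by keyword, whereas the code uses a lookup against a placeholder brand list, and "acme" is an invented brand name used only so the example runs.

```python
GROUPS = ("stereo", "insurance", "therapist")

# Placeholder brand list for illustration; not the brands flagged in the paper.
HYPOTHETICAL_BRANDS = {"acme"}


def build_variables(keyword, group, brands=HYPOTHETICAL_BRANDS):
    """Build the group dummies, word count, and brand dummy for one keyword."""
    words = keyword.lower().split()
    row = {f"is_{g}": int(g == group) for g in GROUPS}  # keyword-group dummies
    row["word_count"] = len(words)                      # complexity measure
    row["brand"] = int(any(w in brands for w in words)) # 1 if a brand name appears
    return row


example = build_variables("acme car stereo", "stereo")
print(example)
```

If all three group dummies were entered into a regression that also has a constant term, they would be perfectly collinear with it, so in practice one group is usually left out as the baseline.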

Analysis

Max CPC to Reach Top Position

The first regression in this paper is on the prices suggested by AdWords to reach

the top position. The CPC variable used for this regression is the upper bound of the

estimates from the Traffic Estimator when the keywords were input without a maximum

CPC bid. The reason for using this variable for the CPC payment and not the estimate

from the External Keyword Tool or the lower bound from the Traffic Estimator is

because the site describes that the estimates given when no maximum CPC bid is entered

will put the advertisement in the top position for searches on that keyword 85% of the


time. Because this is explicitly stated on the Google page, the regression uses this as the

most consistent price estimator for CPC bids of the top position.

In this regression a group of independent variables is used to explain the CPC of

the top position for each keyword. The first is the average of the upper and lower bounds

of the estimated number of clicks the advertisement might receive. As explained by Edelman,

Ostrovsky, and Schwarz, the actual advertisement location may matter less to the

advertiser than the number of clicks it receives. Including the estimated number of

clicks an advertisement would receive in the regression therefore captures one

of the reasons advertisers might pay a certain amount, even though what they pay is on a

per-click basis. The second independent variable is the External Keyword Tool’s volume

estimator. It does not give an actual number of how often the keywords are searched but

is a metric used to represent this factor. This is an important aspect of how much

advertisers are willing to pay because the volume, of course, will affect the number of

clicks an advertisement will receive. It could also be important to an advertiser because a

search user just seeing the advertisement might be of some value, albeit a lesser value than

the user clicking it.

The next two variables used are measures of the competitive levels on keywords.

The first is the variable given by the External Keyword Tool to describe the competitive

situation in the bidding of the keyword observation. The other is a different variable

generated from the AdWords data. Because the data set has observations for each

keyword of bids ranging from $50 to $.10, there is a variety of advertisement position

location estimates for lower bids. As the bids get lower, different keywords get moved to

worse and worse position bins. The second variable that estimates the level of

competition in a given keyword is an estimate of how many advertisers bid on that

keyword, based on the lowest position bin the advertisement could be in. The number

used for the variable is the estimate of the average number of advertisements that are

bidding on a keyword, given by the worst bin an advertisement could be placed in for that

word. Both variables are included to control for the effects of the number of

competitors bidding to advertise on certain keywords.
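
One way to read the construction of this second competition variable is sketched below. The bin boundaries come from the Position Bins table in the Appendix; the assumption that the variable equals the average position of the worst bin observed for a keyword is an interpretation of the description above, and the function names are made up for the illustration.

```python
# Position bins from the Appendix: bin number -> (min position, max position).
POSITION_BINS = {
    1: (1, 3),
    2: (4, 6),
    3: (7, 10),
    4: (11, 15),
    5: (16, 20),
    6: (21, 30),
    7: (31, 40),
}


def bin_average(bin_number):
    """Average position of a bin, matching the Avg Position column."""
    low, high = POSITION_BINS[bin_number]
    return (low + high) / 2


def estimate_num_advertisers(bins_seen):
    """Assumed rule: use the average position of the worst (highest) bin
    reported for a keyword across all of its bid observations."""
    return bin_average(max(bins_seen))


if __name__ == "__main__":
    # A keyword whose ad never falls below bin 3 as the bid is lowered.
    print(estimate_num_advertisers([1, 1, 2, 3]))
```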

The next two variables are the word count and the brand dummy variable. The

word count is a measure of the complexity and specificity of the keyword observation.

The effect of this variable will be a gauge of how valuable complex and specific

keywords are to advertisers. The brand variable is another way in which keywords are

made more specific; it measures specificity and uniqueness. Because,

theoretically, only one website (the brand’s) will actually match the search query, it is

interesting to see how much all advertisers are willing to pay to advertise at the top of

that keyword search.

Because these variables specify and narrow the searches, they are likely to have a

much smaller search volume. This is shown in Regression A of the Appendix. For this

reason, the prices of keywords that correlate with them could be lower, but by using the

search volume variables, the regression counters this effect to actually see how much

more valuable more specific keywords are to advertisers independent of how often they

are searched.
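
As a rough illustration of the volume effect, the coefficients of Regression A in the Appendix can be plugged into a small function. The function name is hypothetical, and the predictions are on whatever scale the avgvol variable uses in that regression.

```python
# Coefficients copied from Regression A in the Appendix.
CONS = 0.6371395
B_WORDS = -0.0766555
B_BRAND = -0.0212383


def predicted_volume(words, brand):
    """Fitted avgvol for a keyword with the given word count and brand dummy."""
    return CONS + B_WORDS * words + B_BRAND * brand


if __name__ == "__main__":
    for words in (1, 2, 3):
        print(words, round(predicted_volume(words, 0), 4),
              round(predicted_volume(words, 1), 4))
```

Each additional word lowers the fitted volume by about 0.077, and a brand name lowers it by a further 0.021, which is why the price regressions control for search volume.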

The last variables used in the regression are the dummy variables for what

keyword group the observation comes from: insurance, stereo, or therapist. Because the

actual value of a customer clicking through to an internet provider of one of these things

differs greatly, these variables were included to remove that effect from the analysis of

the keyword construction. By using these dummy variables the regression overcomes the

inherent differences in values of clicks from people searching for insurance, a stereo, or a

sort of therapy. Only the variables for stereo and therapist were included because using

all three together with a constant would cause perfect collinearity. Insurance is therefore the omitted

base category, and the constant in the regression represents the baseline value of

advertising on an insurance keyword. The regression is as follows:

regress cpcmax dayclicksavg avgvol comp num_advts words brand stereo therapist

cpcmax          Coef.         Std. Err.    P>|t|
dayclicksavg    -0.0000461    0.0000196    0.019
avgvol          5.065737      2.563023     0.049
comp            1.520214      1.081611     0.16
num_advts       0.0599272     0.047263     0.205
words           1.207616      0.3115602    0.000
brand           -1.86452      0.6432874    0.004
stereo          -9.516739     0.4540914    0.000
therapist       -8.613147     0.4920606    0.000
_cons           4.678337      1.701982     0.006

The R-squared of this regression is .531. The variables that are significant (at the 95%

confidence level) in this regression are the number of clicks, the search volume, the

number of words in the keyword, the brand dummy variable, and, of course, the

dummy variables identifying the keyword list from which each observation came. The two insignificant variables

are the two measures of the level of advertiser competitiveness in the keyword.
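
The significance pattern can be checked roughly from the reported coefficients and standard errors. The sketch below assumes a large-sample two-sided cutoff of 1.96 for the t ratio; the sample size is not restated here, so the exact critical value is an assumption, and the reported p-values remain the authoritative figures.

```python
# (coefficient, standard error) pairs copied from the first regression table.
ESTIMATES = {
    "dayclicksavg": (-0.0000461, 0.0000196),
    "avgvol": (5.065737, 2.563023),
    "comp": (1.520214, 1.081611),
    "num_advts": (0.0599272, 0.047263),
    "words": (1.207616, 0.3115602),
    "brand": (-1.86452, 0.6432874),
    "stereo": (-9.516739, 0.4540914),
    "therapist": (-8.613147, 0.4920606),
}

CRITICAL = 1.96  # assumed large-sample cutoff for a two-sided 5% test


def t_ratio(coef, std_err):
    """t statistic for the null hypothesis that the coefficient is zero."""
    return coef / std_err


significant = sorted(
    name for name, (coef, se) in ESTIMATES.items()
    if abs(t_ratio(coef, se)) > CRITICAL
)

if __name__ == "__main__":
    for name, (coef, se) in ESTIMATES.items():
        print(f"{name:13s} t = {t_ratio(coef, se):7.2f}")
    print("significant:", significant)
```

Under this approximation the search volume coefficient clears the cutoff only narrowly, which matches its reported p-value of 0.049.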

The positive coefficient for search volume shows that advertisers who are paying

for the first advertisement position value how often the keyword is searched. This

contrasts with the negative coefficient for the number of clicks an advertisement receives.

It shows that advertisers want the ads shown more often, but not necessarily clicked the

most when they are in the top position. This could make sense because as the advertiser

tries to reach the top position, his CPC bid and therefore CPC payments will rise. So

the CPC payment and the number of clicks, that is, the number of times he would pay the CPC, both

increase, making the total payment to the search engine much larger. Because of this

larger overall increase in payment, some advertisers may not find paying for the top

position of a keyword whose advertisements get clicked very often worth its price.

An interesting observation of the explanatory variables used to measure word

specificity is the positive coefficient for the number of words but the negative coefficient

for the brand variable. This shows that advertisers are paying more for keywords with

more words, but less for the brand names. Because advertisers want clicks from people

who are genuinely interested in the products offered, the more specific a search, the more

likely a purchase will be made, and the more valuable the click is. This is why

advertisers are willing to pay more for keywords that result in specific search queries.

The reason the specificity of a consumer’s search for a specific brand may not be

valuable to advertisers is probably for reasons related to what the consumer is actually

looking for. One reason is that the consumer is looking for that brand, so he might not be

interested in a competitor’s advertisement. In addition to that, his search may result in

the brand’s website being given by the search engine’s natural results. Specifying a

certain type of product through using specific keywords will get customers with more

specific and concrete purchasing intentions. If a customer uses a brand name, however,

the natural search results are likely to bring up exactly what the search user is looking for.

Because of this, sponsored results become redundant next to the natural search results.

The next regression, of the number of clicks on the brand dummy and the search volume,

shows that a brand name in the keyword significantly reduces the number of clicks

advertisements receive in the highest position.

regress dayclicksavg brand avgvol

dayclicksavg    Coef.       Std. Err.    P>|t|
brand           -645.242    276.2339     0.020
avgvol          53292.92    753.9589     0.000
_cons           -23031.3    369.6808     0.000

These results show that the number of clicks can drop by an estimated 645 clicks a

day if the keyword being searched has a brand name. This regression also takes into

account how often the advertisements are shown by including the average search volume

of the keywords, so it focuses on the click-through rate of advertisements on brand

keywords.
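
To make the size of the brand effect concrete, the fitted equation can be evaluated for two otherwise identical keywords. The volume value of 0.6 below is hypothetical and is assumed to be on the same scale as avgvol in the regression; the function name is made up for the illustration.

```python
# Coefficients copied from the clicks regression above.
CONS = -23031.3
B_BRAND = -645.242
B_AVGVOL = 53292.92


def predicted_daily_clicks(avgvol, brand):
    """Fitted daily clicks in the top position for a keyword."""
    return CONS + B_BRAND * brand + B_AVGVOL * avgvol


if __name__ == "__main__":
    volume = 0.6  # hypothetical search-volume value
    generic = predicted_daily_clicks(volume, brand=0)
    branded = predicted_daily_clicks(volume, brand=1)
    print(round(generic, 1), round(branded, 1), round(generic - branded, 3))
```

Because the model is linear, the gap between the two predictions equals the brand coefficient regardless of the volume chosen.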

CPCs Over All Advertisement Positions

Another way to see how the complexity of keywords can affect how much

advertisers pay for them is to examine how much advertisers paid across the board to

advertise on keywords, not just how much they paid to reach the top position. The main

difference between this regression and the one done before is that instead of having one

observation for each keyword to reach the top bid, this regression includes up to 23 for

each keyword to explore advertisers’ CPC payments over the whole position spectrum and

uses their estimated advertisement position as an additional independent variable.

The dependent variable for this regression is estimated CPC payment. For

consistency’s sake, the same variable is used as in the first regression: the

maximum estimated CPC value provided by the Traffic Estimator. This allows direct

comparisons of the results from this regression to those from the regression using only

the values to reach the top advertisement position.

In order to run this regression, two new variables needed to be created. The

first is an independent variable, ad-position. This variable is calculated very similarly to

the number of advertisements from the previous regression; however, it uses the

estimated position given by AdWords for the advertisement with that specific bid

observation. It uses the lower and upper bound for estimated advertisement position to

create an average position at which the advertisement would land, given that

observation’s bid. This variable is added to control for the wide variance in CPC values

across the bids ranging from $50 to $.10. Because of the inclusion of this variable, I did

not include the bottom advertisement number as an additional measure of competition in

the regression. At lower bids, those two variables would begin to converge, because the

lower bids would cause the advertisement position to drop to its lowest position. For this

reason, the only measure of advertiser competition is the one provided by the External

Keyword Tool.

The second variable used for this regression is used as a filter. Because the bids

used to make the data observations begin at $50 and move down, a portion of them are

extremely high bids, even for the most expensive keywords. The average CPC of the

keywords over this bid range is $1.80, and the maximum of any keyword is under $37.

For these reasons, many of the bids on keywords are extremely high, and including them

in the regression might skew the results. In order to compensate, a variable filter is used

to prevent the “over-bids” from being included in the regression. This was done by

identifying the highest maximum CPC payment estimate for each keyword, and

then tagging every bid above the lowest bid that still reaches that estimate. For example, say the CPC estimates for the

keyword “acceptance insurance” provided by AdWords for the bids of $50, $35, $25, and

$20 are all $5.84, but the CPC estimate for the $15 bid is $5.79. In this case, the

observations whose bids are $50, $35, and $25 are all tagged as over-bids. This way the

$20 bid at $5.84 is still included in the regression.
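
The tagging rule in this example can be sketched as follows. The function name and data layout are assumptions made for illustration; the rule itself (keep the lowest bid that still reaches the keyword's highest CPC estimate and tag every higher bid) is inferred from the “acceptance insurance” example above.

```python
def tag_overbids(observations):
    """Tag over-bids for ONE keyword.

    observations: list of (bid, cpc_estimate) pairs.
    Returns a dict mapping each bid to 1 (over-bid, excluded) or 0 (kept).
    """
    top_cpc = max(cpc for _, cpc in observations)
    # Lowest bid that still reaches the highest CPC estimate.
    # Exact float comparison is fine here because the tied estimates
    # are identical reported values, not computed quantities.
    threshold = min(bid for bid, cpc in observations if cpc == top_cpc)
    return {bid: int(bid > threshold) for bid, _ in observations}


if __name__ == "__main__":
    acceptance_insurance = [(50, 5.84), (35, 5.84), (25, 5.84),
                            (20, 5.84), (15, 5.79)]
    print(tag_overbids(acceptance_insurance))
```

Only observations whose flag is 0 would then enter the regression, which corresponds to the `if overbid == 0` condition in the command.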

The rest of the variables in the regression are the same as in the previous regression

of the CPCs to reach the top position.

regress cpcmax dayclicksavg estadpos avgvol comp words brand stereo therapist

if overbid == 0

cpcmax          Coef.         Std. Err.    P>|t|
dayclicksavg    7.62E-06      3.56E-06     0.032
estadpos        -0.5162933    0.013869     0.000
avgvol          -0.1604301    0.3618728    0.658
comp            1.284278      0.1712169    0.000
words           0.4790947     0.0452695    0.000
brand           -0.7889787    0.0959657    0.000
stereo          -3.182362     0.0657782    0.000
therapist       -2.541234     0.0711544    0.000
_cons           3.388384      0.2523916    0.000

The R-squared value for this regression is .2624. This means that even including the

variable for the expected advertisement position, these variables explain much less about

the CPCs across the spectrum of advertiser bidding than the regression on just the

payments to get to the top position.

The regression similarly shows that the value of a click over all the bids is

affected by the word count and brand factor in the same direction as they affect the top

bid. However, their coefficients have, on average, a little less than half the effect on all

CPCs as they do on the ones to reach the top positions. This most likely has to do with

the smaller average CPCs over all bids compared with the ones to reach the top. This effect

can also be seen in the decrease in the size of the coefficients for the stereo and therapist

identifiers. Though they both still decrease the value of the keyword (relative to a keyword tagged

in the insurance category), the coefficients have a smaller effect on the entire range of

CPCs.
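
The comparison of coefficient sizes can be verified with simple arithmetic on the two regression tables. The sketch below only divides the reported coefficients; the dictionary names are made up for the illustration.

```python
# Coefficients from the top-position regression and the all-positions regression.
TOP_POSITION = {"words": 1.207616, "brand": -1.86452,
                "stereo": -9.516739, "therapist": -8.613147}
ALL_POSITIONS = {"words": 0.4790947, "brand": -0.7889787,
                 "stereo": -3.182362, "therapist": -2.541234}

# Ratio of each all-positions coefficient to its top-position counterpart.
ratios = {name: ALL_POSITIONS[name] / TOP_POSITION[name] for name in TOP_POSITION}

if __name__ == "__main__":
    for name, ratio in ratios.items():
        print(f"{name:10s} {ratio:.3f}")
    print("words/brand mean:", round((ratios["words"] + ratios["brand"]) / 2, 3))
```

The word count and brand ratios come out at roughly 0.40 and 0.42, and the keyword-group dummies shrink further, to about 0.33 and 0.30 of their top-position size.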

A major difference in this regression is that search volume is no longer significant

and that the number of clicks is now significant and positive. For positions other than the

first, a larger number of clicks becomes more important, whereas at just the first position

a large number of clicks makes the keyword less valuable. It seems that beyond being in

the top position of the sponsored search results, the amount people are willing to pay to

advertise on keywords does not depend on how often the keyword is searched as much as

how often the advertisements are clicked.

These differences, in the value of number of clicks and search volume, between

the advertisers paying for the top position and the advertisers paying for the other

positions could result from the added benefit of a search user viewing advertisements in

the top position. It is possible that the only advertisement that gets any benefit from being

on a search results page that is viewed but receives no clicks on the sponsored

advertisements is the advertisement in the top position. This makes sense because a

search user might look at the first advertisement on the list without deciding to click any

of them. The search user would still have seen the first advertisement, and that is a value

to advertisers. The top position gains additional value on high-volume keywords

because those mean more times the advertisement is viewed. In this way, the amount the

advertiser pays for each click could be more than his actual value of a click, because the

value of being in the position is from both the clicks he pays for and the unrecorded

impressions that he receives.

This effect is clear from the data in this and the first regression. Advertisers value

the added benefit of more impressions when they have the highest position. In this

manner, the advertisements in the top positions function similarly to more

traditional advertising media like radio, print, and television. The goal of those types

is to get exposure of the product and the name of the company, compared to the added

new benefit of advertising by linking to the website. Though the benefit of these

impressions is much less traceable, it can still be of great value to advertisers.

Conclusions

The results of this analysis show three main findings. Advertisers do in fact value

more specific keywords for all different advertisement positions, when controlled for how

often the keywords are searched and clicked. Specificity brings customers who are more

likely to be interested in the products the company offers and are, therefore, more

valuable to reach than those searching less specific keywords. Search users who search a

specific brand, however, are less valuable. This is most likely because the sponsored ads

are clicked less often because of the likelihood of the natural search results containing the

desired websites.

These effects are shown in both the CPCs to reach the highest advertisement

position and in the CPCs to reach any position. The effects of the keyword complexity

and brand name are smaller on the CPCs to reach any position. This can be explained by

the much larger CPC values to reach the highest position than to reach any position. The

number of words in a keyword and whether it is a brand name both have smaller effects

on the smaller CPC values.

The third finding is that advertisers are willing to pay more, specifically for the

top position, when there is a large search volume. An explanation for this is that the

top position gets the added benefit of being seen much more often than the other

advertisements. This fact makes having the best position much more valuable for

keywords that are searched often, because this means the advertisements are seen more often.

This effect makes the top-position advertisements (when they are not clicked) function

much more like traditional advertising through television or print. The user only sees the

name of the brand and it registers in their memory.

What this means for potential advertisers is a couple of things. Though this

research does not discuss the success of the companies who are advertising on more

specific keywords, it shows that previous advertisers do know that specific keywords are

more valuable. Though new advertisers would have to pay more for these, the

knowledge that other advertisers have done this shows that specific keywords might be

worth the extra cost. The impact to advertisers is that it might be more important to

advertise to specificity to find the right customers, rather than to generality for a larger

number of clicks.

Recommendations for Future Research

Some of the ways this topic could be researched further would involve better data

measuring some of the variables. Because the data was retrieved using only the tools

Google provides to potential advertisers, more accurate measures were not possible. One

of these would be more developed measurements of the keywords’ specificity and

complexity. By using a linguistics algorithm or an expert opinion to determine how

specific the keywords are, one would have a much more accurate variable describing how

specific keywords are. Instead of just using how many words are in a keyword, the actual

words could be measured on how specific they are to the keyword’s group, and that

would be a better way to analyze the effect of specific keywords.

Another variable that could be more precise is the variable for search volume.

The one used in this analysis was on a one to one hundred scale. If the actual search

volume of the keywords were used, many more variables could be more accurate. One

could create a theoretical click-through-rate for the words, and this could be used as a

measure of the quality score. Using this value, a researcher might be able to delve further

into the details of Google’s ad rank auction system.

A last piece of data which could be very informative but would be hard to

retrieve is the conversion rate for advertisers. This is the percentage of ad

clicks after which the search user actually follows through with the process and makes a

purchase from the advertising company. With this data, one might be able to tell more

about how the way the user searches relates to how he follows through with the

advertising company. By analyzing these success rates, better recommendations could be

made to advertisers to actually see how keyword complexity affects their return on

investment and profitability of advertising.

Appendix

Word List

Insurance Stereo Therapist acceptance insurance american hi fi acne treatment aetna health insurance american hi fi lyrics addiction counseling aetna insurance amp adolescent counseling affordable health insurance amplifier alternative therapy affordable insurance amplifiers american physical therapy association all state insurance amps anger management american family insurance apples in stereo animal assisted therapy american insurance appliances anxiety annuities audio anxiety therapist annuity audio adrenaline apartment therapy auto audio amplifier aquatic therapy auto insurance audio bible aroma therapy auto insurance companies audio book art therapist auto insurance company audio books art therapy auto insurance quote audio cable asian massage auto insurance quotes audio cables behavior therapy auto owners insurance audio clips behavioral therapy automobile audio codec cancer treatment automobile insurance audio codecs chelation therapy boat insurance audio com chemo therapy business insurance audio control child counseling buy insurance audio converter child counselor california department of insurance audio device child therapist california insurance audio driver child therapy car audio drivers cognitive behavior therapy car ins audio editing cognitive behavioral therapy car insurance audio editor cognitive therapist car insurance quote audio engineering cognitive therapy car insurance quotes audio equipment colon therapy car insurance rates audio express color therapy car quote audio files consumer credit counseling cars audio hijack counseling center cash advance audio interface counseling psychologist cheap audio mixer counseling services cheap auto insurance audio recorder counseling therapy cheap car insurance audio recording counsellors cheap health insurance audio research counselor cheap insurance audio review counselors chubb insurance audio software couple counseling citizens insurance audio speakers couples counselor claims audio systems couples 
therapist cna insurance audio technica couples therapy cobra insurance audio visual cranial sacral therapy

combined insurance audiophile craniosacral therapy commerce insurance auto stereo credit counseling commercial insurance best home theater dance therapy condo insurance big screen decompression therapy country insurance big screen tv depression dental blaqk audio depression counseling dental insurance bluetooth stereo headphones depression counselor department of insurance bluetooth stereo headset depression therapist direct insurance boat stereo depression therapy disability insurance bookshelf stereo dialectical behavior therapy encompass insurance boombox dialectical behavioral therapy erie insurance bose home theater divorce esure buy stereo divorce counseling farm bureau insurance buy stereo system drug therapy finance car electroconvulsive therapy financial advisor car audio enzymatic therapy financial planning car audio systems equine therapy fire insurance car cd players family counselor flood insurance car stereo family therapist florida department of insurance car stereo installation family therapy florida insurance car stereo removal find a psychiatrist foremost insurance car stereo system find a therapist free home insurance quote car stereo systems find therapist gap insurance car stereos gene therapy general insurance cars gestalt therapy general liability insurance cassette deck group therapy grange insurance cassette decks hand therapy group health insurance cassette stereo hormone replacement therapy hanover insurance cassette stereo system hormone therapy hartford insurance cb radios hyperbaric oxygen therapy health cd player individual counseling health care insurance cd players individual therapy health insurance cd stereo infusion therapy health insurance companies cd stereo system inversion therapy health insurance plan compact stereo iv therapy health insurance plans death by stereo laser therapy health insurance quote diamond audio licensed professional counselor health insurance quotes dvd licensed therapist hmo dvd audio life coach home 
dvd player light therapy home insurance dvd players magnet therapy home loans electrical magnetic therapy home owner insurance electronics manual therapy home owners insurance free audio books marital counseling homeowner insurance headphones marriage and family therapist homeowners hi fi marriage and family therapy homeowners insurance hi fi buys marriage counseling homeowner's insurance home audio marriage counselor house insurance home stereo marriage counselors

individual health insurance home stereo system marriage therapist ins home stereos marriage therapy insurace home theater massage insuranc home theater furniture massage chair insurance home theater installation massage envy insurance adjuster home theater magazine massage school insurance agencies home theater master massage table insurance agency home theater pc massage therapist insurance agent home theater projector massage therapist salary insurance agent companies home theater projectors massage therapists insurance agent company home theater receiver massage therapy insurance agents home theater review massage therapy school insurance broker home theater seating massage therapy schools insurance brokers home theater speakers medical psychotherapists insurance claims home theater system mental health insurance co home theater systems mental health counseling insurance com home theaters mental health counselor insurance commissioner home theatre mental health therapist insurance companies jet audio message therapist insurance companies quotes jl audio message therapy insurance company kenwood stereo music therapist insurance coverage laptop music therapy insurance estimate legacy audio narrative therapy insurance fraud m audio new york therapist insurance institute ma audio occupational therapist insurance jobs magnolia home theater occupational therapists insurance leads marine stereo occupational therapy insurance license media centers occupational therapy assistant insurance plans memphis audio occupational therapy association insurance policies mini stereo occupational therapy jobs insurance policy monitor audio online therapy insurance provider mp3 player oxygen therapy insurance providers mp3 players ozone therapy insurance quote multimedia audio controller pediatric physical therapy insurance quotes music pet therapy insurance rate music system phone therapy insurance rates no audio device photodynamic therapy insurance ratings online car stereo 
physical therapist insurances open air stereo physical therapist assistant insurane outdoor stereo physical therapist salary insure pioneer audio physical therapists insure my car pioneer car audio physical therapy insureance pioneer car stereo physical therapy aide insurence pioneer stereo physical therapy assistant insurers plasma physical therapy association investment plasma tv physical therapy equipment investments polk audio physical therapy exercises liability insurance preamp physical therapy jobs life pro audio physical therapy program

life insurance projector physical therapy programs life insurance companies radio physical therapy salary life insurance policy radio stereo physical therapy school life insurance quote radio stereo system physical therapy schools life insurance quotes real audio play therapy loan realtek audio premarital counseling loans realtek hd audio professional therapist long term care receiver proton therapy long term care insurance receivers psychiatrist low cost health insurance rims psychiatrist directory low cost insurance shelf stereo psycho therapist malpractice insurance sigmatel audio psychologist medical insurance soda stereo psychologists meloche sony car stereo psychotherapist md money sound systems psychotherapists mortgage speaker psychotherapy mortgage insurance speakers radiation therapist mortgages stereo radiation therapy motorcycle insurance stereo advantage reality therapy mutual funds stereo amplifier recreation therapy national insurance stereo bluetooth recreational therapy nationwide insurance stereo cabinet relationship counseling new york life insurance stereo cable relationship therapy online insurance stereo dealer relationships oxford health insurance stereo equipment release therapy pet insurance stereo headphones respiratory therapist private mortgage insurance stereo installation respiratory therapists professional liability insurance stereo lyrics respiratory therapy property stereo receiver san francisco therapist property insurance stereo receivers seattle therapist real estate stereo repair seattle therapists rental insurance stereo retailer shock therapy renters insurance stereo review sound therapy renter's insurance stereo shop speech therapist retirement stereo speakers speech therapists retirement planning stereo store speech therapy rv insurance stereo system sports physical therapy short term health insurance stereo system store sports therapy standard insurance stereo systems stem cell therapy student health insurance stereo total 
stress student insurance stereogram teen counseling term insurance stereos testosterone therapy term life insurance stereoscopic thai massage texas department of insurance streaming audio therapist texas insurance subwoofer therapist directory title insurance subwoofers therapist jobs travel surround sound therapists travel insurance surround sound systems therapy travelers insurance television therapy dog

truck insurance the apples in stereo therapy dogs unemployment insurance turntable therapy nyc vehicle insurance turntables trigger point therapy vision insurance universal audio urine therapy whole life insurance usb audio vision therapy work at home wholesale water therapy work from home wireless audio water treatment workers compensation insurance wireless home theater wilderness therapy workmans wireless stereo window treatment zurich insurance yamaha stereo yoga therapy

Position Bins

Bin # Min Position Max Position Avg Position

1 1 3 2

2 4 6 5

3 7 10 8.5

4 11 15 13

5 16 20 18

6 21 30 25.5

7 31 40 35.5

Bid List

$0.10, $0.15, $0.20, $0.30, $0.50, $0.75, $1, $1.25, $1.50, $2, $2.50, $3,

$4, $5, $6, $8, $10, $12.50, $15, $20, $25, $35, $50

Reg. A

regress avgvol words brand

avgvol    Coef.         Std. Err.    P>|t|
words     -0.0766555    0.0012082    0.000
brand     -0.0212383    0.0028039    0.000
_cons     0.6371395     0.0026828    0.000

Bibliography

Aggarwal, G., Goel, A., Motwani, R. Truthful Auctions for Pricing Search Keywords. theory.stanford.edu. <http://theory.stanford.edu/~gagan/papers/keyword_auctions_EC06.pdf>

Edelman, B., Ostrovsky, M. Strategic Bidder Behavior in Sponsored Search Auctions. Decision Support Systems, 2007. Elsevier. <http://www.benedelman.org/publications/cycling-060703.pdf>

Edelman, B., Ostrovsky, M., Schwarz, M. Internet Advertising and the Generalized Second-Price Auction, 2005. atypon-link.com. <http://faculty-gsb.stanford.edu/ostrovsky/papers/gsp.pdf>

Fain, D. C., Pedersen, J. O. Sponsored Search: A Brief History. <http://www.business.ualberta.ca/kasdemir/ssa2/fain_pedersen.PDF>

Feng, J., Bhargava, H. K., Pennock, D. M. Implementing Sponsored Search in Web Search Engines: Computational Evaluation of Alternative Mechanisms. INFORMS Journal on Computing, 2006. bear.cba.ufl.edu. <http://bear.cba.ufl.edu/feng/JOC.pdf>

Lahaie, S. An Analysis of Alternative Slot Auction Designs for Sponsored Search. eecs.harvard.edu. <http://www.eecs.harvard.edu/~slahaie/pubs/fp185-lahaie.pdf>

Zhou, Y., Lukose, R. Vindictive Bidding in Keyword Auctions. cse.wustl.edu. <http://www.cse.wustl.edu/~yzhou/yunhongzhou/documents/06-ssa-vindictive.pdf>

AdWords Help Center. Can I see what my competitors are bidding? google.com. <https://adwords.google.com/support/bin/answer.py?answer=12395&topic=10264>

AdWords Help Center. How are ads ranked? google.com. <https://adwords.google.com/support/bin/answer.py?answer=6111&hl=en_US&ctx=SetPricing>

AdWords Help Center. How does Google detect invalid clicks? google.com. <http://adwords.google.com/support/bin/answer.py?hl=en&answer=6114>

Google AdWords. Google AdWords: Keyword Tool. <https://adwords.google.com/select/KeywordToolExternal>

Google AdWords. Google AdWords: Traffic Estimator. <https://adwords.google.com/select/TrafficEstimatorSandbox>