An Empirical Analysis of the Auction Bids for
Keyword Types in Sponsored Search
By Gabe Rich
Faculty Advisor: Igal Hendel
Senior Honors Thesis, Mathematical Methods in the Social Sciences
Northwestern University, Spring 2008
Rich 2
Acknowledgements
I would first like to thank Randy Heeb, whose invitation to work with him and our team this past summer inspired my research question. That work got me interested in search advertising and sparked the questions I attempt to answer in this thesis.
I would also like to thank my advisor, Professor Igal Hendel, whose advice and experience were invaluable as I worked through the analysis in this thesis. His patience and helpfulness throughout the writing process helped me keep everything in perspective.
Lastly, I would like to thank my parents for supporting and encouraging me through the writing of this paper and throughout my life. Their insistence on always staying on my case was crucial to my completing this work.
Abstract
This paper evaluates one way advertisers use internet advertising: sponsored search. It focuses on how much advertisers are willing to pay for a single click from a search engine user. Advertisers pick the specific searches on which they would like their advertisement to appear by choosing keywords. When a keyword is searched, an auction occurs: the advertisers’ per-click bids are compared, and the advertisements are placed in the sponsored search results area of the web page. The advertiser pays the search engine only when the search user clicks its advertisement.
This paper uses historical bid data provided by Google to estimate the prices and advertisement positions an advertiser would obtain when advertising on a list of keywords. Using this information, the paper analyzes whether advertisers pay more for keywords that are more complex and for keywords that include brand names. Both factors are ways an advertiser can narrow the consumers it reaches. This paper will show that advertisers pay more for specific keywords and less for keywords that include a brand name.
This paper also shows that advertisers pay more for keywords that are searched more often, and that this effect is larger for advertisements in the highest position than for advertisements in lower positions. This result might be because only the top advertisement is seen and provides a benefit even when it is not clicked. The amount advertisers pay to be viewed, in addition to being clicked, from that position therefore appears even greater on keywords with higher search volumes.
Introduction
When someone researches a potential purchase on the internet, he often starts by simply typing what he is looking for into a search engine and pressing return. What he doesn’t know is how much goes into deciding what appears on the next page. This does not refer to the search results running down the middle of the page, which are usually chosen and placed by a complicated, secret algorithm. This paper concerns the small section, often placed on the right side of the page, titled “Sponsored Links” or “Sponsored Results.”
The sponsored search industry is an extremely important source of revenue for search engines and has become one of the largest sources of income on the internet. The system provides additional sponsored search results to every search engine user, generally along the right side of the search results page. These results are placed separately from the natural results down the middle of the page, which are chosen to best answer the user’s query. In most cases, search engines earn a large portion of their revenue by auctioning off the spots in the sponsored links section to companies that want to advertise and link to their websites. Because advertisers pay the search engine every time a search user clicks their ad, they naturally want clicks from users who are interested in the products they offer. They control this by choosing a list of keywords on which they would like to advertise. When a search user searches on one of their keywords, the advertiser enters an instantaneous auction that decides whether and where its advertisement will be placed in the sponsored search results.
What this paper evaluates is how much advertisers are willing to pay for different keywords. Advertisers choose keywords that relate to the product or service they offer on the internet because they want users who are searching for that product. This is a simple way for advertisers to segment the population of search engine users into two groups: those looking for what the advertiser is trying to sell and those who are not. They can then target their advertising to the former. This paper researches whether advertisers pay more for specific and complex keywords, because those keywords can segment and target the population even further. Keywords that are more specific and complex are more valuable to an advertiser because they target potential consumers more narrowly. Advertisers will still advertise only on specific keywords that relate to their products, so their ads are even more likely to match what the search user wants than ads on a less specific keyword related to the same products. Advertisers who realize that advertising on more specific keywords is more valuable to them will bid more for the clicks of people searching on specific and complex keywords.
In addition, the paper asks whether advertisers pay more for advertisements in the best position (the one at the top of the sponsored search results) on keywords that are searched more often. One hypothesis is that advertisers in the first position benefit most from the search user reading the advertisement, even if it is not clicked. This benefit may add value on keywords with high search volumes, because those advertisements will be seen most often.
Introduction to Google’s AdWords
Google’s AdWords is the program Google uses to sell advertising spots on the internet. Advertisers subscribe to the service to advertise on Google and on other websites within Google’s advertising network, which consists of the Search Network and the Content Network¹. For the purpose of this study, I will refer only to the part of the system that involves advertising on Google’s Search Network, specifically on search results pages. This means that advertisements are placed on the search results pages of specific searches chosen by the advertiser.
To advertise in the Search Network, the advertiser chooses which search results pages it would like to advertise on by choosing a list of keywords. When one of these keywords is searched, the advertisement is eligible to be shown in the sponsored results of that user’s search. Because each search results page has spots for at most eleven advertisements², Google uses a mechanism to determine which advertisements get positions and which positions they get. Google, like most other search engines, chooses which eligible advertisements are shown through an auction. Its auction, however, uses a combination of factors beyond the bid to determine which advertisement gets which position on the search results page. (The top position, being the best, will be seen and, probably, clicked most often.) Although different search engines use different ranking criteria for placing ads within the auction, previous researchers (Feng, Bhargava, and Pennock; Lahaie) have shown the one Google AdWords uses to be the most stable and the best for the search engine’s revenue.
The advertisements are ordered on the page by an “Ad Rank,” which is the product of two factors. The first is the advertiser’s maximum cost-per-click (CPC) bid, the most an advertiser is willing to pay each time a user clicks its advertisement. (Maximum CPC bids can be set for an advertiser’s entire group of keywords or individually for each keyword.) The other factor is a quality score that combines the AdWords history of the advertiser, and specifically of the advertisement to be displayed, with how well the advertisement matches the search engine user’s query. Because the Ad Rank of each advertisement is computed in real time, a new order may be produced for each search on Google.
¹ The Content Network involves advertising on Google’s partner sites by matching an advertisement with the contents of those websites. Though it also uses bidding, it does not involve bidding on keywords to land on search results pages.
² More spots are allocated on subsequent search results pages, but each spot is viewed less the further it is placed from the front of the search.
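The ordering rule just described can be sketched in a few lines of Python. This is a minimal illustration, assuming Ad Rank is exactly the maximum CPC bid times a numeric quality score; the `Ad` class, the quality-score values, and the three example advertisers are hypothetical, and Google’s real ranking may use inputs not described here. The cap of eleven slots follows the page limit mentioned earlier.

```python
from dataclasses import dataclass


@dataclass
class Ad:
    name: str
    max_cpc: float        # maximum cost-per-click bid, in dollars
    quality_score: float  # hypothetical numeric stand-in for Google's quality score


def ad_rank(ad: Ad) -> float:
    """Ad Rank as described in the text: maximum CPC bid times quality score."""
    return ad.max_cpc * ad.quality_score


def order_ads(ads: list[Ad], slots: int = 11) -> list[Ad]:
    """Sort eligible ads by Ad Rank, highest first, and keep at most `slots` of them."""
    return sorted(ads, key=ad_rank, reverse=True)[:slots]


# Hypothetical advertisers: (name, max CPC bid, quality score)
ads = [Ad("A", 2.00, 0.5), Ad("B", 1.50, 0.9), Ad("C", 0.80, 1.0)]
print([ad.name for ad in order_ads(ads)])
```

Advertiser A bids the most but lands second, because B’s higher quality score gives it the larger Ad Rank (1.35 versus 1.00).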
An important distinction is that the bid is on a cost-per-click, not a cost-per-impression, basis. The advertiser’s bid is paid only when the search user clicks the advertisement and is redirected to the website, either to learn more about the product or service or to make a purchase. In this manner, advertisers are charged only when a potential customer accesses their websites, and advertisers whose advertisements are never clicked do not pay Google at all. For this reason, Google also factors a quality score into the ranking alongside the advertiser’s bid: an advertisement that is never clicked earns Google nothing, however high its bid.
The quality score is determined by a few factors. Though Google does not explain exactly how the score is computed, it does explain its main contributors. The most important factor is the advertisement’s historical click-through rate (CTR), the number of times the advertisement is clicked divided by the number of times it is shown. The other major factor is the relevance of the search query to the advertisement itself and to the keyword the advertiser chose. Because most advertisers use broad match for their keywords, an advertisement can show on any search that uses a combination of the words in the chosen keyword. For this reason, a search query may match some advertisers’ keywords better than others. For example, an advertiser choosing to advertise on the keyword “car parts” will be eligible for the search query “What parts of the world does my driver’s license allow me to drive a car?”, but its keyword-match component of the quality score will be worse than that of an advertiser with the keyword “world driver’s license.”
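The two quality-score inputs named above can be illustrated with a small Python sketch. CTR is computed exactly as defined (clicks divided by impressions). Broad match, however, is simplified here to the hypothetical rule “every word of the keyword appears somewhere in the query,” and `overlap_share` is a toy stand-in for the relevance component; neither is Google’s actual matching or scoring method.

```python
import re


def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: clicks divided by impressions (0.0 if the ad was never shown)."""
    return clicks / impressions if impressions else 0.0


def words(text: str) -> set[str]:
    """Lower-case word set; apostrophes are kept so "driver's" stays one word."""
    return set(re.findall(r"[a-z']+", text.lower()))


def broad_match_eligible(keyword: str, query: str) -> bool:
    """Simplified broad match: every word of the keyword appears somewhere in the query."""
    return words(keyword) <= words(query)


def overlap_share(keyword: str, query: str) -> float:
    """Toy relevance proxy: share of the query's distinct words that the keyword covers."""
    q = words(query)
    return len(words(keyword) & q) / len(q) if q else 0.0


query = "What parts of the world does my driver's license allow me to drive a car?"
for kw in ["car parts", "world driver's license"]:
    print(kw, broad_match_eligible(kw, query), round(overlap_share(kw, query), 3))
```

Both keywords are eligible for the driver’s license query, but “world driver’s license” covers more of the query’s words, which is the intuition behind its better keyword-match component.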
Google uses each of the main factors in the quality score for a reason related to the probability that a user will click the advertisements. CTR is used so that the Ad Rank comes closer to ranking advertisements by expected revenue (Feng, Bhargava, and Pennock): the product of the CTR and the CPC bid estimates how often the advertisement is clicked times how much the search engine earns per click. In fact, this method is sometimes called a rank-by-revenue method. Relevancy factors ensure that the sponsored results will be attractive to the specific searcher. Google needs its searchers to keep finding the sponsored results relevant to what they are looking for; otherwise users will stop clicking on, or even looking at, the sponsored results. The relevancy factor is therefore another way Google tries to increase the chance that the user will click the sponsored results and generate revenue.
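To see why weighting bids by CTR approximates ranking by expected revenue, the following Python sketch compares a pure bid ranking with a ranking by CTR times bid. It assumes, for simplicity, that each advertiser pays its own bid per click (ignoring the Discounter discussed next); the advertisers and CTRs are hypothetical.

```python
def expected_revenue_per_impression(cpc_bid: float, ctr: float) -> float:
    """Expected revenue per impression: clicks per impression times price per click."""
    return ctr * cpc_bid


# Hypothetical advertisers: (name, CPC bid in dollars, estimated CTR)
ads = [("X", 3.00, 0.01), ("Y", 1.00, 0.05)]

# Rank by bid alone.
by_bid = [name for name, _, _ in sorted(ads, key=lambda a: a[1], reverse=True)]
# Rank by expected revenue (rank-by-revenue).
by_revenue = [name for name, _, _ in
              sorted(ads, key=lambda a: expected_revenue_per_impression(a[1], a[2]),
                     reverse=True)]
print(by_bid, by_revenue)
```

X bids three times as much as Y, but Y is clicked five times as often, so Y is expected to earn the search engine $0.05 per impression against X’s $0.03 and is ranked first under rank-by-revenue.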
Another concept Google uses in its auctions is what it calls the AdWords Discounter. This feature, which Google offers to advertisers, in effect turns the auction into a generalized second price auction. In a standard generalized second price auction, each bidder pays the bid of the bidder ranked just below him, or a marginally higher amount (Edelman, Ostrovsky, and Schwarz). That is, the highest bidder gets the first prize and pays the second highest bid, the second highest bidder gets the second prize and pays the third highest bid, and so on. The Discounter achieves this effect using the advertisements’ quality scores. It lowers the amount an advertiser pays just enough that its Ad Rank is still higher than the Ad Rank of the advertiser in the next position, so an advertiser never pays more than it needs to keep its position. Concretely, the product of the CPC payment and the advertiser’s quality score is lowered to the Ad Rank of the bidder below it, and this amount is then divided by the advertiser’s quality score to give the actual cost of a click on that search. This function allows advertisers to bid their actual value of a click without worrying about overpaying, because their payment will be only what is necessary to reach the highest position they can attain.
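The Discounter’s pricing rule, as described above, can be sketched as follows: each advertiser’s per-click price is the next advertiser’s Ad Rank divided by its own quality score, capped at its own bid. With all quality scores equal, this reduces to the plain generalized second price rule. The tuple format, the `reserve` price charged to the last-placed ad, and the example numbers are assumptions of this sketch; the text does not say what the lowest-ranked advertiser pays, and the small increment implied by “marginally higher” is omitted.

```python
def discounted_prices(ads, reserve=0.0):
    """Per-click prices under a quality-weighted second price rule.

    ads: list of (name, max_cpc, quality_score) tuples.
    Each ad pays the Ad Rank of the ad directly below it divided by its own
    quality score, never more than its own bid. The last-placed ad pays
    `reserve`, a hypothetical floor not specified in the text.
    Returns (name, price) pairs in ranked order, prices rounded to cents.
    """
    ranked = sorted(ads, key=lambda ad: ad[1] * ad[2], reverse=True)
    prices = []
    for i, (name, bid, qs) in enumerate(ranked):
        if i + 1 < len(ranked):
            _, next_bid, next_qs = ranked[i + 1]
            price = (next_bid * next_qs) / qs  # next Ad Rank / own quality score
        else:
            price = reserve
        prices.append((name, round(min(price, bid), 2)))
    return prices


# Equal quality scores: reduces to the plain generalized second price rule.
print(discounted_prices([("P", 3.00, 1.0), ("Q", 2.00, 1.0), ("R", 1.00, 1.0)]))
# Unequal quality scores change both the order and the prices.
print(discounted_prices([("A", 2.00, 0.5), ("B", 1.50, 0.9), ("C", 0.80, 1.0)]))
```

In the second call, B pays about $1.11 because 1.00 / 0.9 is the price at which its Ad Rank would just equal A’s, and A pays 0.80 / 0.5 = $1.60, well below its $2.00 bid.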
Literature Review
The majority of the literature on the sponsored search industry develops and evaluates theoretical models of the different auction methods search engines use to determine advertisers’ positions and prices. This research helps show exactly what advertisers’ bid prices represent. Previous research has shown that Google’s CTR-weighted auction method is the most stable and produces bids closest to advertisers’ true values of clicks. Feng et al. explain that the method Google uses, which orders advertisements using click-through rates, comes close to inducing truthful bids from advertisers. They find that when a search engine uses an advertiser’s past CTRs, it should control for the position in which the ad was placed. By doing this, the search engine can fairly attribute CTRs to ads regardless of where they were initially placed. By modeling this mechanism, Feng et al. show that the best way to evaluate how much revenue ads will make for a search engine takes an advertiser’s CTR into account when ranking ads. This result demonstrates the importance of the rank-by-revenue system Google uses and supports the use of Google’s estimates in this study of advertisers’ willingness to pay for a given keyword. Lahaie also discusses how Google’s method comes closer to eliciting bidders’ true values, while explaining that in some circumstances no equilibrium exists for bidders acting within the auction’s rules.
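One simple way to “control for position” when using past CTRs, in the spirit of the Feng et al. finding but not necessarily their estimator, is to compare an ad’s actual clicks with the clicks an average ad would have received in the same positions. The baseline CTR by position and the click counts below are hypothetical.

```python
def position_normalized_ctr(impressions, clicks, baseline_ctr):
    """Ratio of actual clicks to the clicks expected given where the ad was shown.

    impressions, clicks: dicts mapping position -> counts for one ad.
    baseline_ctr: dict mapping position -> average CTR of ads in that position.
    A value above 1 means the ad is clicked more than a typical ad in the same spots.
    """
    expected = sum(n * baseline_ctr[pos] for pos, n in impressions.items())
    actual = sum(clicks.values())
    return actual / expected if expected else 0.0


baseline = {1: 0.05, 2: 0.03, 3: 0.02}  # hypothetical average CTR by position
imps = {1: 100, 3: 200}                 # hypothetical impressions by position
clks = {1: 10, 3: 8}                    # hypothetical clicks by position
print(position_normalized_ctr(imps, clks, baseline))
```

The raw CTR here is 18 / 300 = 0.06, which mixes impressions from a good and a poor position; the normalized value of 2.0 says the ad draws twice the clicks a typical ad would in the same spots.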
In their two papers, Edelman and Ostrovsky explain that the generalized second price auction in use does not have an equilibrium in which bidders truthfully report their values in their bids. They explain how the outcome of the generalized second price auction differs from that of the Vickrey-Clarke-Groves (VCG) mechanism because the pricing methods differ. In “Internet Advertising and the Generalized Second-Price Auction” they show that the VCG mechanism would induce truthful bids, and in “Strategic Bidder Behavior in Sponsored Search Auctions” they discuss how bidders dynamically change their bids in response to their competitors on each keyword. This second publication develops a theory of bidders moving along a spectrum of bid values depending on where their competitors are, and observes this behavior across different keywords on both major search engines. Bid amounts therefore fluctuate in cycles over time as competitors continually respond to each other’s actions. Though this behavior would skew the results if this analysis used real, instantaneous bid data, I assume that the bid estimates Google gives to prospective advertisers average out these effects. Because the website states that some of its CPC estimates will reach the top position 85% of the time, I assume that these estimates are based on a long enough time period to accurately represent an average of what advertisers pay for a given keyword. I also assume that these effects either do not differ across keywords or correlate directly with the level of competition for each keyword. Because Google gives a competition metric for each keyword, using that rating as a variable in the regression on each keyword’s value should account for whatever effect bid fluctuation has on the data.
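The pricing difference between GSP and VCG can be made concrete in the standard position-auction model, in which each slot has a fixed CTR that does not depend on which ad occupies it. The following Python sketch assumes that model, ignores quality scores, and uses hypothetical bids and slot CTRs. Under GSP the ad in each slot pays the next-highest bid; under VCG it pays the value, at their bids, of the clicks its presence takes away from the ads below it, spread over its own clicks.

```python
def gsp_prices(bids, slot_ctrs):
    """Per-click GSP prices: the ad in each slot pays the next-highest bid."""
    b = sorted(bids, reverse=True) + [0.0]
    return [b[i + 1] for i in range(min(len(slot_ctrs), len(bids)))]


def vcg_prices(bids, slot_ctrs):
    """Per-click VCG prices in the standard position-auction model.

    slot_ctrs: click-through rate of each slot, best slot first; assumed to
    depend only on the slot, not on which ad is shown there.
    The ad in slot i pays, in total, the value (at their bids) of the clicks the
    ads below it lose because of its presence; dividing by its own clicks gives
    a per-click price.
    """
    k = min(len(slot_ctrs), len(bids))
    b = sorted(bids, reverse=True) + [0.0]
    a = list(slot_ctrs[:k]) + [0.0]  # the position below the last slot gets no clicks
    prices = []
    for i in range(k):
        total = sum((a[j - 1] - a[j]) * b[j] for j in range(i + 1, k + 1))
        prices.append(total / a[i])
    return prices


bids = [3.00, 2.00, 1.00]  # hypothetical bids
slot_ctrs = [0.10, 0.05]   # hypothetical CTRs for two slots
print("GSP:", gsp_prices(bids, slot_ctrs))
print("VCG:", vcg_prices(bids, slot_ctrs))
```

With bids of $3, $2, and $1 and two slots, GSP charges $2.00 and $1.00 per click while VCG charges about $1.50 and $1.00, so for the same bids the GSP payments are at least as high as the VCG payments.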
Another study, by Zhou and Lukose, describes behavior that might skew the results of this study: advertisers may bid vindictively, choosing bids that do not directly benefit themselves but instead harm their competitors. This works by raising one’s own bid to increase the CPC of the advertiser in the position above. Because the auction is a generalized second price auction, the vindictive bidder does not see a higher CPC, since the advertiser below it has not changed its bid, but the advertiser above does³. A competitor can therefore raise its bid to impose a higher cost on the advertiser in the spot above it. Not only does this cost the targeted advertiser money, but because many advertisers put spending limits on their accounts, the targeted advertiser may reach its limit and stop displaying ads. The vindictive bidder then benefits by taking the better advertising spot without paying a larger amount for it, because the original advertiser is no longer bidding for a position. If advertisers bid this way, Google’s estimates for bids at lower positions would change, and possibly those for the top position as well if many advertisers often reach their limits. Though vindictive bidding would affect a study using historical bid estimates, I make two assumptions under which the historical estimates used in this study can still represent advertisers’ true value of a click from different keywords. The first assumption is that vindictive bidding is very hard to accomplish on keywords with many bidders: it is difficult to know where one’s opponents’ bids are⁴ and to consistently be in the advertising position needed to have the desired effect. The second assumption is that if vindictive bidding does occur and pulls historical bid data away from advertisers’ true values, the effect will be consistent with how much advertiser competition there is for a given keyword. Again, by including each keyword’s level of competition as another independent variable, this effect on CPC prices can hopefully be taken out of the regression.
³ This is different from click fraud, which is clicking many times on someone else’s ad to run up its advertising bill and reach its account cap. Though this can be done, search engines explain that they have systems in place to monitor and stop click fraud and to keep advertisers’ accounts from being charged for it.
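The vindictive-bidding mechanism can be seen in a tiny example of the plain generalized second price rule. The bidder names and bids are hypothetical; quality scores and spending limits are ignored, and the lowest bidder is assumed to pay nothing.

```python
def gsp_prices(bids):
    """Map bidder -> per-click price under a plain generalized second price auction.

    bids: dict of bidder -> bid. Each bidder pays the bid ranked just below its own;
    the lowest bidder pays 0.0 here (a simplifying assumption).
    """
    ranked = sorted(bids, key=bids.get, reverse=True)
    return {name: (bids[ranked[i + 1]] if i + 1 < len(ranked) else 0.0)
            for i, name in enumerate(ranked)}


before = {"leader": 3.00, "vindictive": 2.00, "third": 1.00}
after = dict(before, vindictive=2.90)  # raise the bid but stay below the leader

print(gsp_prices(before))
print(gsp_prices(after))
```

Raising the middle bid from $2.00 to $2.90 leaves the vindictive bidder’s own price at $1.00 but raises the leader’s price from $2.00 to $2.90 per click.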
These studies, which build and use theoretical models, demonstrate that the generalized second price auction works for search engine advertisement sales and that advertisers’ bids can be taken as their assumed value of a click. This paper attempts to determine some of the reasons why advertisers bid certain amounts for different keywords, and what makes clicks from some keywords more valuable than others.
Data Being Used
The data used for this paper was gathered from tools in Google AdWords that give information on the performance of keywords within the AdWords product. The data comes from two different outputs on the site that are intended to help potential advertisers who might be interested in Google AdWords. Both provide estimates of what advertisers can expect to pay to advertise on the Google Search Network. Because these estimates are created from historical AdWords data, I assume that the data accurately represents what advertisers paid and experienced while advertising on these keywords. The keywords chosen for this analysis were also provided by the keyword tools: in addition to data about keywords, the AdWords tools suggest keywords for the advertiser to use. The paper uses these suggestions to obtain a sample of keywords that advertisers actually bid on.
⁴ Google explains that advertisers cannot see their competitors’ bids.
The keywords used in the analysis come from the AdWords External Keyword Tool located at https://adwords.google.com/select/KeywordToolExternal. Given a keyword or list of keywords, this program produces additional keywords related to them (up to a total of 200, including the initial words). For each keyword, the page also reports normalized ratings (from zero to one⁵) of advertiser competition, search volume over the past month, average search volume over the past year, search volume for each month of the past year, and the month of the last year with the highest search volume for that keyword. When given a maximum cost-per-click bid, the External Keyword Tool also provides the average CPC price the advertiser will pay and a range of ad positions (first to third, fourth to sixth, or seventh to tenth) where the advertisement will most likely land. This data can be used to determine what advertisers have been bidding and paying per click to have their advertisements shown in certain position groups. These tables of data can be downloaded directly in .csv format from the Google website.
⁵ On the website, the normalized ratings for competitive level and volume are displayed as bar charts; when the tables are downloaded, however, each is converted to a number with two decimal places.
One concern with the normalized search volume and competition data is that the values might be normalized only within the list of keywords retrieved in a single use of the External Keyword Tool. To make sure the values are consistent across the entire system of keywords, and not just within the set retrieved at one time, I checked whether the normalized values for search volume and advertiser competition were normalized against the whole system rather than against the keywords in the set. I did this by including some of the same keywords in two different keyword sets and comparing the advertiser competition and search volume values from each set. Because the numbers were the same for each keyword, the tool does not base these values on a scale within the given keyword set, but instead gives consistent values for its estimates.
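The consistency check described above amounts to comparing the normalized values of keywords that appear in both downloads. A Python sketch follows; it assumes each download has already been parsed into a dictionary from keyword to (competition, search volume), and the function name, keywords, and values shown are hypothetical. The check is only informative when the two sets share at least one keyword.

```python
def consistent_across_sets(set_a, set_b, tol=0.0):
    """Check that keywords appearing in both downloads have identical normalized values.

    set_a, set_b: dicts mapping keyword -> (competition, search_volume),
    e.g. parsed from two separate External Keyword Tool exports.
    Returns (all_match, sorted list of mismatched keywords).
    """
    shared = set_a.keys() & set_b.keys()
    mismatches = sorted(
        kw for kw in shared
        if any(abs(x - y) > tol for x, y in zip(set_a[kw], set_b[kw]))
    )
    return not mismatches, mismatches


# Hypothetical parsed downloads that share the keyword "car stereo".
first = {"car stereo": (0.87, 0.40), "home stereo": (0.60, 0.13)}
second = {"car stereo": (0.87, 0.40), "stereo speakers": (0.75, 0.27)}
print(consistent_across_sets(first, second))
```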
The other source of keyword data is Google’s Sandbox Traffic Estimator, located at https://adwords.google.com/select/TrafficEstimatorSandbox. With this tool, potential advertisers can submit a list of keywords and a maximum CPC bid for the list. Though the site does not suggest additional keywords, it does provide additional information about each keyword and CPC bid combination, including a visual metric of keyword traffic, minimum and maximum values for the estimated CPC, a range of ad positions similar to that of the first tool, a range of how many clicks the ad will get at that position, and a range of the daily cost of running the ad at that CPC on that keyword. This tool also differs from the first in that the potential advertiser does not need to provide a maximum CPC bid. When that input is left blank, the Estimator gives a maximum CPC estimate that will reach the top ad position 85% of the time. This is an important function because it gives a consistent estimate of the highest CPCs advertisers have been paying to reach the top position on the search results page. This estimate will be used to represent the CPC price of reaching the top position.
The dependent variables in the main regressions are the suggested CPC payments an advertiser would make, that is, how much a potential Google advertiser would pay to reach the given position for the given keyword. Though these are presented as what current advertisers pay to advertise, they are derived from Google’s records of its advertisers’ history. Because of the AdWords Discounter, an advertiser never pays more than it needs to stay above the advertiser below it, so I assume that these values are, on average, what advertisers are bidding to advertise on each keyword. Treating these values as advertisers’ bids, we can infer how much advertisers value advertising in each spot for the given keywords.
The data actually used for this analysis comes from three keyword groups. Each of the three lists is provided by the External Keyword Tool and is generated from one initial keyword. The initial keywords are stereo, insurance, and therapist; the Keyword Tool provided a list of 200 keywords for each. I chose these three as a small sample of different products that people might research online. A stereo is a tangible product someone could purchase online. Insurance is a financial service that people can research and purchase online. Third, a therapist provides a service most people probably would not purchase online, but which they might research online before purchasing it in the near future, so it could still be profitable to advertise online. Together these are meant to represent a small cross section of the commercial searches people make on the internet. For each of these 600 keywords, the analysis uses estimates from the Traffic Estimator and the Keyword Tool for a list of 23 maximum CPC bids ranging from $50 down to $0.10, to cover the spectrum of where advertisers bid. The increments between the highest and lowest bids are not even; they become smaller as the bids approach $0.10, in order to retrieve more precise data about CPC payments and advertisement positions. See the appendix for the full list of keywords and bids used in collecting the data.
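The thesis’s actual list of 23 bids is in its appendix and is not reproduced here. As a purely hypothetical illustration of a grid whose steps shrink as bids approach $0.10, a geometric sequence from $50 down to $0.10 has that property: each bid is a fixed fraction of the one before, so the dollar gaps get smaller toward the bottom.

```python
def geometric_bid_grid(high=50.0, low=0.10, n=23):
    """Hypothetical grid of n bids from high down to low, each a constant fraction of the previous one."""
    ratio = (low / high) ** (1 / (n - 1))
    return [high * ratio ** k for k in range(n)]


grid = geometric_bid_grid()
print([round(b, 2) for b in grid])
```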
It is important to note that the variables provided by the two tools overlap, so the choice of which tool’s variable to use can matter. The variables provided by the External Keyword Tool are: average CPC; the estimated bin of advertisement position (the bins indicate that the advertisement will be in the first to third position, fourth to sixth, or seventh to tenth); normalized search volumes for the past month, for each month in the past year, and for the entire past year; and a normalized metric of advertiser competitiveness. The variables given by the Traffic Estimator are: an estimate of the keyword’s search volume on a one to five scale; minimums and maximums for the estimated CPC and the advertisement position location (the Traffic Estimator uses the same bins as the Keyword Tool but does not stop at three bins; it goes up to seven, adding bins for the eleventh to fifteenth spot, sixteenth to twentieth, twenty-first to thirtieth, and thirty-first to fortieth); expected clicks per day on the keyword; and expected cost per day on the keyword. The appendix includes a list of the position locations where the advertisements would land and how each is represented in the regression. Each analysis in this paper uses variables from both sources for each keyword and bid observation, and the reason for using each variable is described in the specific analysis.
Because AdWords ranks and places advertisements based on more than just the CPC bid, the Traffic Estimator explains that its estimates are all based on average historical CTRs. This means that the number of clicks per day the estimator provides assumes that search users will click the advertisement in the given position at the same rate as they have on average in the past. It also means that, when estimating which position bin the advertisement will land in for each keyword and bid observation, the estimator uses a quality score based on the keyword’s history. Because historical averages are used to generate the data, I assume that the prices advertisers actually bid and pay are similar to the CPC estimates given by the External Keyword Tool and the Traffic Estimator.
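The Traffic Estimator does not publish its formulas, but the assumption described above implies simple arithmetic: expected clicks are impressions times the historical average CTR for the expected position, and daily cost is those clicks times the average CPC. The function and numbers below are a hypothetical illustration of that reasoning, not Google’s method.

```python
def daily_estimates(searches_per_day, position_ctr, avg_cpc):
    """Rough daily clicks and cost, assuming users click at the historical average rate.

    searches_per_day: hypothetical daily impressions of the ad on this keyword
    position_ctr: historical average CTR for the position the ad is expected to land in
    avg_cpc: average price paid per click, in dollars
    """
    clicks = searches_per_day * position_ctr
    return clicks, clicks * avg_cpc


clicks, cost = daily_estimates(1000, 0.02, 1.50)
print(f"{clicks:.1f} clicks/day, ${cost:.2f}/day")  # 20.0 clicks/day, $30.00/day
```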
In addition to the variables provided by Google, a few variables are created for the regressions to account for other factors. The first three are dummy variables indicating which keyword group each keyword came from: stereo, insurance, or therapist. Including these variables in the regressions takes the inherent differences in keyword value across groups into account. For instance, insurance advertisers might value the click of someone searching for insurance more than stereo advertisers value a stereo searcher’s click, because the profit from selling an insurance policy is much greater than that from selling a stereo. The group dummies capture this effect, so it does not influence the regression’s measurement of the value of keyword complexity. Another added variable is a count of the words in each keyword. This is a simple measure of the complexity and specificity of the keywords advertisers choose. The last added variable is a “brand” variable: each of the six hundred keywords was flagged according to whether or not it is the brand name of a company. This dummy variable indicates whether the keyword advertisers are bidding on is the name of an existing company.
The word count and the brand dummy variable are the measures of keyword complexity and specificity. The word count works as a measure because, in search generally, each additional word in a search query narrows the results to those that include that word. Someone who enters a search query with more words is therefore looking for something more specific. The brand dummy variable has the same narrowing effect, restricting the possible results to those produced or sold by the brand named in the search query.
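The constructed variables can be sketched in Python. Word count is a whitespace split. The brand flag here is a lookup against a hypothetical brand list, flagging a keyword if any of its words is in the list; this approximates, but is not necessarily the same as, the flagging done for the thesis, and multi-word brand names would need phrase matching. If these dummies enter a regression that also has an intercept, one of the three group dummies has to be left out as the baseline, since the three always sum to one.

```python
GROUPS = ("stereo", "insurance", "therapist")

# Hypothetical brand list used only for illustration.
BRANDS = {"sony", "geico"}


def keyword_features(keyword, group, brands=BRANDS):
    """Word count, brand dummy, and keyword-group dummies for one keyword."""
    words = keyword.lower().split()
    features = {
        "word_count": len(words),
        "brand": int(any(w in brands for w in words)),
    }
    for g in GROUPS:
        features[g] = int(group == g)
    return features


print(keyword_features("sony car stereo", "stereo"))
```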
Analysis
Max CPC to Reach Top Position
The first regression in this paper is on the prices AdWords suggests for reaching the top position. The CPC variable used for this regression is the upper bound of the Traffic Estimator’s estimates when the keywords are input without a maximum CPC bid. This variable is used for the CPC payment, rather than the External Keyword Tool’s estimate or the Traffic Estimator’s lower bound, because the site states that the estimates given when no maximum CPC bid is entered will put the advertisement in the top position for searches on that keyword 85% of the time. Because this is explicitly stated on the Google page, the regression uses it as the most consistent estimate of the CPC bid for the top position.
In this regression, a group of independent variables is used to explain the CPC of the top position for each keyword. The first is the average of the upper and lower bounds of the estimated number of clicks the advertisement might receive. As explained by Edelman, Ostrovsky, and Schwarz, the actual advertisement position may matter less to the advertiser than the number of clicks it receives, so including the estimated number of clicks in the regression captures one of the reasons advertisers might pay a certain amount, even though they pay on a per-click basis. The second independent variable is the External Keyword Tool’s search volume estimate. It does not give an actual count of how often a keyword is searched but is a metric representing that quantity. Search volume is an important determinant of how much advertisers are willing to pay, because it affects the number of clicks an advertisement will receive. It could also matter to an advertiser because a search user merely seeing the advertisement might be of some value, albeit less than the value of the user clicking it.
The next two variables measure the level of competition on each keyword. The first is
the competition variable the External Keyword Tool reports for the keyword. The second
is generated from the AdWords data. Because the data set has observations for each
keyword at bids ranging from $50 down to $0.10, it contains a range of estimated
advertisement positions at the lower bids: as the bids fall, keywords move into worse
and worse position bins. The second competition variable uses the worst position bin an
advertisement on a keyword could be placed in to estimate how many advertisers bid on
that keyword. The value used is the average number of advertisements implied by that
worst bin (see Position Bins in the Appendix). Both variables are included to control
for the number of competitors bidding to advertise on a given keyword.
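As a minimal sketch of how this competition variable could be built, the following Python assumes each bid observation records the AdWords position bin number (1 to 7, taken from the Position Bins table in the Appendix) and uses the average position of the worst bin a keyword reaches. The function names and example bins are hypothetical, not the author's code.

```python
# Position bins from the Position Bins table in the Appendix:
# bin number -> (best position, worst position).
POSITION_BINS = {1: (1, 3), 2: (4, 6), 3: (7, 10), 4: (11, 15),
                 5: (16, 20), 6: (21, 30), 7: (31, 40)}


def bin_average(bin_number):
    """Average position of a bin: the midpoint of its min and max positions."""
    low, high = POSITION_BINS[bin_number]
    return (low + high) / 2


def num_advertisers(bins_for_keyword):
    """Hypothetical helper: competition proxy for one keyword.

    bins_for_keyword lists the position bin AdWords reports at each bid.
    The worst (highest-numbered) bin reached, typically at the lowest bids,
    is taken as a sign of how deep the list of advertisers goes, and its
    average position is used as the estimate.
    """
    return bin_average(max(bins_for_keyword))


# A keyword whose lowest bids land in bin 3 (positions 7 to 10).
print(num_advertisers([1, 1, 1, 2, 2, 3, 3]))  # 8.5
```

Computing the averages as bin midpoints reproduces the Avg Position column of the Appendix table, so the lookup and the table stay consistent.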
The next two variables are the word count and the brand dummy variable. The word count
measures the complexity and specificity of the keyword, so its coefficient gauges how
valuable complex and specific keywords are to advertisers. The brand variable captures a
further degree of specification and uniqueness. Because, in theory, only one website
(the brand's own) actually matches the search query, it is interesting to see how much
advertisers as a group are willing to pay to advertise at the top of that keyword's
results.
Because these variables specify and narrow the searches, keywords with more words or a
brand name are likely to have a much smaller search volume, as Regression A in the
Appendix shows. For this reason, the prices of such keywords could be lower. By
including the search volume variables, the regression counters this effect and measures
how much more valuable specific keywords are to advertisers independent of how often
they are searched.
The last variables in the regression are dummy variables for the keyword group each
observation comes from: insurance, stereo, or therapist. Because the value of a customer
clicking through to an internet provider of one of these differs greatly, these
variables are included to remove that effect from the analysis of keyword construction.
With these dummy variables, the regression controls for the inherent differences in the
value of clicks from people searching for insurance, a stereo, or some kind of therapy.
Only the stereo and therapist dummies are included, because including all three
alongside the constant would make them perfectly collinear. Insurance is therefore the
omitted category, and the constant captures the baseline for insurance keywords. The
regression is as follows:
regress cpcmax dayclicksavg avgvol comp num_advts words brand stereo therapist
cpcmax          Coef.        Std. Err.    P>t
dayclicksavg    -0.0000461   0.0000196    0.019
avgvol           5.065737    2.563023     0.049
comp             1.520214    1.081611     0.16
num_advts        0.0599272   0.047263     0.205
words            1.207616    0.3115602    0.000
brand           -1.86452     0.6432874    0.004
stereo          -9.516739    0.4540914    0.000
therapist       -8.613147    0.4920606    0.000
_cons            4.678337    1.701982     0.006
The R-squared of this regression is 0.531. The variables that are significant at the 5%
level are the number of clicks, the search volume, the number of words in the keyword,
the brand dummy variable, and, as expected, the dummies identifying which keyword list
each keyword came from. The two insignificant variables are the two measures of
advertiser competition on the keyword.
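As a minimal sketch of the dummy coding described earlier, the following Python assumes each keyword carries a group label and encodes it with insurance as the omitted category; the function name is hypothetical. It also checks the identity behind the collinearity problem: the three group dummies always sum to one, which duplicates the regression constant.

```python
VALID_GROUPS = ("insurance", "stereo", "therapist")


def group_dummies(group):
    """Hypothetical helper: return (stereo, therapist) dummies for a keyword.

    Insurance is the omitted category, so an insurance keyword is (0, 0)
    and its baseline is absorbed by the regression constant.
    """
    if group not in VALID_GROUPS:
        raise ValueError(f"unknown keyword group: {group}")
    return int(group == "stereo"), int(group == "therapist")


for group in VALID_GROUPS:
    stereo, therapist = group_dummies(group)
    insurance = int(group == "insurance")
    # A third dummy would add nothing: the three always sum to the constant 1.
    assert insurance + stereo + therapist == 1
    print(group, stereo, therapist)
```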
The positive coefficient on search volume shows that advertisers paying for the first
advertisement position value how often the keyword is searched. This contrasts with the
negative coefficient on the number of clicks an advertisement receives. Together they
suggest that advertisers in the top position want their ads shown more often, but not
necessarily clicked the most. This could make sense because as the advertiser tries to
reach the top position, his CPC bid, and therefore his CPC payments, will rise. Both the
CPC payment and the number of clicks (the number of times he pays the CPC) increase,
making the total payment to the search engine much larger. Because of this compounded
increase, some advertisers may not find the top position worth its price on a keyword
whose advertisements are clicked very often.
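The compounding described above can be made concrete with hypothetical numbers (not taken from the data). Because the total payment is the CPC times the number of clicks, a 50% rise in both more than doubles the daily bill. The function name is illustrative only.

```python
def daily_cost(cpc, clicks_per_day):
    """Total daily payment to the search engine: price per click times clicks."""
    return cpc * clicks_per_day


# Hypothetical numbers: moving to the top raises the CPC by 50% and the
# number of daily clicks by 50%.
lower_position = daily_cost(cpc=2.00, clicks_per_day=100)
top_position = daily_cost(cpc=3.00, clicks_per_day=150)
print(lower_position, top_position, top_position / lower_position)  # 200.0 450.0 2.25
```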
An interesting pattern among the variables measuring keyword specificity is the positive
coefficient on the number of words but the negative coefficient on the brand variable.
Advertisers pay more for keywords with more words, but less for keywords containing brand
names. Advertisers want clicks from people who are genuinely interested in the products
offered; the more specific a search, the more likely a purchase, and the more valuable
the click. This is why advertisers are willing to pay more for keywords that produce
specific search queries.
The specificity of a search for a particular brand, however, is probably not valuable to
advertisers because of what that consumer is actually looking for. First, the consumer
is looking for that brand, so he might not be interested in a competitor's
advertisement. Second, his search may return the brand's website among the search
engine's natural results. Using specific keywords to name a type of product attracts
customers with more concrete purchasing intentions, but when a customer uses a brand
name, the natural search results are likely to show exactly what he is looking for. The
sponsored results then become largely redundant next to the natural results. The next
regression, of the number of clicks on the brand dummy and search volume, shows that a
brand name in the keyword significantly reduces the number of clicks advertisements
receive in the highest position.
regress dayclicksavg brand avgvol
dayclicksavg    Coef.       Std. Err.   P>t
brand           -645.242    276.2339    0.020
avgvol          53292.92    753.9589    0.000
_cons           -23031.3    369.6808    0.000
These results indicate that the estimated number of daily clicks falls by about 645 when
the keyword being searched contains a brand name. Because the regression also includes
the average search volume of the keywords, which accounts for how often the
advertisements are shown, the brand coefficient reflects a difference in clicks at a
given search volume, in effect a lower click-through rate for advertisements on brand
keywords.
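Because the regression is linear, the brand effect does not depend on the search volume chosen. A minimal Python check, using the coefficients reported in the table above and two hypothetical search-volume values, shows that the fitted gap between brand and non-brand keywords equals the brand coefficient at any volume. The function name is illustrative.

```python
# Coefficients from the regression of daily clicks on brand and avgvol above.
COEF = {"brand": -645.242, "avgvol": 53292.92, "_cons": -23031.3}


def predicted_clicks(brand, avgvol):
    """Fitted daily clicks for the top position at a given search volume."""
    return COEF["_cons"] + COEF["brand"] * brand + COEF["avgvol"] * avgvol


# Two hypothetical search-volume values; the gap is the same for both.
for vol in (0.5, 0.8):
    gap = predicted_clicks(1, vol) - predicted_clicks(0, vol)
    print(vol, round(gap, 3))  # gap is -645.242 in both cases
```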
CPCs Over All Advertisement Positions
Another way to see how keyword complexity affects what advertisers pay is to examine
CPCs across all advertisement positions, not just the amount needed to reach the top
position. The main difference from the previous regression is that, instead of one
observation per keyword for the top position, this regression includes up to 23
observations per keyword, one for each bid in the Bid List in the Appendix. This lets it
explore advertisers' CPC payments across the full range of positions, and it uses the
estimated advertisement position as an additional independent variable.
The dependent variable for this regression is the estimated CPC payment. For
consistency, it is the same variable used in the first regression: the maximum estimated
CPC provided by the Traffic Estimator. This allows direct comparison of the results from
this regression with those from the regression on the values needed to reach the top
advertisement position.
To run this regression, two new variables had to be created. The first is an
independent variable, the estimated ad position (estadpos). It is calculated much like
the number of advertisements in the previous regression, but it uses the position
AdWords estimates for the advertisement at that specific bid: the average of the lower
and upper bounds of the estimated position gives the position at which the
advertisement would land for that observation's bid. This variable controls for the
wide variance in CPC values across bids ranging from $50 to $0.10. Because it is
included, I did not include the bottom advertisement number as an additional measure of
competition. At lower bids the two variables would begin to converge, since lower bids
push the advertisement down to its lowest position. For this reason, the only measure
of advertiser competition in this regression is the one provided by the External
Keyword Tool.
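A minimal sketch of the estadpos calculation, assuming AdWords reports a best and a worst estimated position for each bid. The function name, bids, and position ranges below are hypothetical and only illustrate the averaging step.

```python
def est_ad_position(best_position, worst_position):
    """Hypothetical helper: midpoint of the estimated position range for one bid."""
    if best_position > worst_position:
        raise ValueError("best position must not exceed worst position")
    return (best_position + worst_position) / 2


# Hypothetical estimates for one keyword at three bids (bid -> position range).
estimates = {5.00: (1, 2), 1.00: (3, 5), 0.10: (8, 12)}
for bid, (best, worst) in sorted(estimates.items(), reverse=True):
    print(f"${bid:.2f}: estadpos = {est_ad_position(best, worst)}")
```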
The second new variable is used as a filter. Because the bids used to generate the
observations start at $50 and move down, some of them are extremely high, even for the
most expensive keywords: the average CPC of the keywords over this bid range is $1.80,
and the maximum for any keyword is under $37. Including these bids might skew the
results, so a filter variable is used to exclude these “over-bids” from the regression.
For each keyword, I identified the highest maximum CPC estimate and the lowest bid that
reaches it, and then tagged every bid above that lowest bid as an over-bid. For example,
suppose the CPC estimates AdWords provides for the keyword “acceptance insurance” at bids
of $50, $35, $25, and $20 are all $5.84, while the estimate for the $15 bid is $5.79. The
observations with bids of $50, $35, and $25 are then tagged as over-bids, while the $20
bid, with its $5.84 estimate, remains in the regression.
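The tagging rule can be sketched in a few lines of Python. This is an illustrative reconstruction, assuming one keyword's data are held as a mapping from bid to the Traffic Estimator's maximum CPC estimate; the function name is hypothetical. It reproduces the “acceptance insurance” example from the text.

```python
def tag_overbids(bid_to_cpc):
    """Hypothetical helper: flag bids above the lowest bid reaching the max CPC.

    bid_to_cpc maps each bid (in dollars) on one keyword to the maximum CPC
    estimate for that bid. Returns a dict mapping each bid to True
    (over-bid, excluded) or False (kept in the regression).
    """
    max_cpc = max(bid_to_cpc.values())
    # Lowest bid whose estimate already equals the keyword's maximum CPC.
    cutoff = min(bid for bid, cpc in bid_to_cpc.items() if cpc == max_cpc)
    return {bid: bid > cutoff for bid in bid_to_cpc}


# The "acceptance insurance" example from the text.
example = {50: 5.84, 35: 5.84, 25: 5.84, 20: 5.84, 15: 5.79}
flags = tag_overbids(example)
print(sorted(b for b, over in flags.items() if over))  # [25, 35, 50]
```

Comparing against the lowest bid that attains the maximum, rather than dropping every bid that attains it, keeps exactly one observation at the keyword's highest CPC, as in the $20 example.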
The rest of the variables are the same as in the previous regression on the CPCs needed
to reach the top position.
regress cpcmax dayclicksavg estadpos avgvol comp words brand stereo therapist if overbid == 0
cpcmax          Coef.         Std. Err.    P>t
dayclicksavg     7.62E-06     3.56E-06     0.032
estadpos        -0.5162933    0.013869     0.000
avgvol          -0.1604301    0.3618728    0.658
comp             1.284278     0.1712169    0.000
words            0.4790947    0.0452695    0.000
brand           -0.7889787    0.0959657    0.000
stereo          -3.182362     0.0657782    0.000
therapist       -2.541234     0.0711544    0.000
_cons            3.388384     0.2523916    0.000
The R-squared for this regression is 0.2624. Even with the expected advertisement
position included, these variables explain much less of the variation in CPCs across
the full range of bids than the regression on payments needed to reach the top position.
This regression shows that word count and the brand dummy affect the value of a click
across all bids in the same direction as they affect the top-position bid. However,
their coefficients are a little less than half as large as in the top-position
regression. This most likely reflects the lower average CPC across all bids compared
with the CPC needed to reach the top. The same effect appears in the smaller
coefficients on the stereo and therapist dummies: both still lower a keyword's CPC
relative to the insurance category, but by less across the full range of CPCs.
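The “a little less than half” comparison can be checked directly from the two regression tables. This short Python snippet divides each coefficient in the all-positions regression by its top-position counterpart; it is an illustrative calculation on the reported numbers, not part of the original analysis.

```python
# Coefficients copied from the two regression tables above.
top_position = {"words": 1.207616, "brand": -1.86452,
                "stereo": -9.516739, "therapist": -8.613147}
all_positions = {"words": 0.4790947, "brand": -0.7889787,
                 "stereo": -3.182362, "therapist": -2.541234}

# Ratio of each all-positions coefficient to its top-position counterpart;
# a positive ratio means the sign is the same in both regressions.
ratios = {name: all_positions[name] / top_position[name] for name in top_position}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.2f}")
# words: 0.40, brand: 0.42, stereo: 0.33, therapist: 0.30
```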
A major difference in this regression is that search volume is no longer significant,
while the number of clicks is now significant and positive. For positions other than
the first, a larger number of clicks makes a keyword more valuable, whereas for the
first position alone a large number of clicks makes the keyword less valuable. It seems
that outside the top position of the sponsored results, what advertisers are willing to
pay depends less on how often the keyword is searched than on how often its
advertisements are clicked.
These differences in how clicks and search volume are valued by advertisers paying for
the top position versus other positions could result from an added benefit of search
users seeing the top advertisement. When a search results page is viewed but none of
its sponsored advertisements is clicked, the top advertisement may be the only one that
still benefits. A search user might read the first advertisement on the list without
deciding to click any of them; the user has still seen it, and that exposure is of value
to the advertiser. For high-volume keywords, the top position gains additional value
because the advertisement is viewed more times. In this way, the amount the advertiser
pays per click could exceed his value of the click itself, because the value of the
position comes both from the clicks he pays for and from the unrecorded impressions he
receives.
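To make the argument concrete, here is a minimal sketch under hypothetical assumptions (the values are illustrative, not estimated from the data): an advertiser values each click at v_click and each impression that is seen but not clicked at a much smaller v_imp. Since he pays only for clicks, the highest CPC at which the position still breaks even is the total value divided by the number of clicks, which rises with search volume even when the click count stays fixed.

```python
def position_value_per_click(v_click, v_imp, impressions, clicks):
    """Hypothetical helper: highest per-click price at which a position pays off.

    Total value = clicks * v_click + (impressions - clicks) * v_imp.
    The advertiser pays only per click, so divide total value by clicks.
    """
    if clicks <= 0 or impressions < clicks:
        raise ValueError("need 0 < clicks <= impressions")
    unclicked = impressions - clicks
    return (clicks * v_click + unclicked * v_imp) / clicks


# Same number of clicks, double the search volume: the break-even CPC rises.
for impressions in (10_000, 20_000):
    print(impressions, round(position_value_per_click(1.00, 0.01, impressions, 200), 2))
```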
The data from this and the first regression are consistent with this effect: advertisers
value the added benefit of more impressions when they hold the highest position. In this
respect, top-position advertisements work much like more traditional advertising media
such as radio, print, and television, whose goal is to expose consumers to the product
and the company name, rather than to provide the newer benefit of a link to the
advertiser's website. Though the benefit of these impressions is much harder to trace,
it can still be of great value to advertisers.
Conclusions
The results of this analysis show three main findings. Advertisers do value more
specific keywords across all advertisement positions, when controlling for how often the
keywords are searched and clicked. Specificity brings customers who are more likely to
be interested in the products the company offers and who are therefore more valuable to
reach than those searching less specific keywords. Search users who search for a
specific brand, however, are less valuable, most likely because sponsored ads on those
searches are clicked less often, since the natural search results probably contain the
desired websites.
These effects appear both in the CPCs to reach the highest advertisement position and in
the CPCs to reach any position, though the effects of keyword complexity and brand names
are smaller on the latter. This is likely explained by the much larger CPCs needed to
reach the highest position than to reach any position: the number of words in a keyword
and whether it contains a brand name both have smaller effects on the smaller CPC
values.
The third finding is that advertisers are willing to pay more, specifically for the top
position, when search volume is large. One explanation is that the top position gets the
added benefit of being seen much more often than the other advertisements. This makes
the best position much more valuable on keywords that are searched often, because the
advertisement is then seen more often. As a result, top-position advertisements (when
they are not clicked) function much like traditional advertising on television or in
print: the user only sees the brand name, and it registers in his memory.
These results have implications for potential advertisers. Though this research does not
examine the success of companies advertising on more specific keywords, it shows that
existing advertisers treat specific keywords as more valuable. New advertisers would
have to pay more for these keywords, but the fact that other advertisers already do so
suggests that specific keywords may be worth the extra cost. For advertisers, it may be
more important to target specific keywords to find the right customers than to target
general keywords for a larger number of clicks.
Recommendations for Future Research
This topic could be researched further with better measurements of some of the
variables. Because the data were retrieved using only the tools Google provides to
potential advertisers, more accurate measures were not possible. One improvement would
be a more developed measure of keyword specificity and complexity. Using a linguistic
algorithm or expert judgment to rate how specific each keyword is would give a much more
accurate variable. Instead of counting the words in a keyword, one could rate how
specific the actual words are to the keyword's group, which would be a better way to
analyze the effect of specific keywords.
Another variable that could be more precise is search volume. The measure used in this
analysis was on a one to one hundred scale. With the actual search volume of the
keywords, several other variables could be measured more accurately. For example, one
could compute a theoretical click-through rate for each keyword and use it as a measure
of the quality score. With this value, a researcher might be able to examine Google's ad
rank auction system in more detail.
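As a minimal sketch of the click-through-rate idea, assuming the actual daily search volume could stand in for impressions (which treats every search as displaying the ad), the estimated clicks from the Traffic Estimator would be divided by that volume. The function name and numbers are hypothetical.

```python
def click_through_rate(clicks, impressions):
    """Share of impressions that lead to a click."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


# Hypothetical keyword: 250 estimated daily clicks on 10,000 daily searches.
print(click_through_rate(250, 10_000))  # 0.025
```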
A final variable that would be very informative but hard to obtain is the advertiser's
conversion rate: the percentage of ad clicks after which the search user goes through
with the process and makes a purchase from the advertising company. With this data, one
might learn more about how the way a user searches relates to whether he follows through
with the advertising company. By analyzing these success rates, one could make better
recommendations to advertisers about how keyword complexity affects their return on
investment and the profitability of their advertising.
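As an illustration of how conversion data would feed into such recommendations, here is a minimal sketch assuming a constant profit per sale. The expected value of a click is the conversion rate times that profit, and one common definition of return on investment compares it with the CPC. With a hypothetical 2% conversion rate, $40 profit per sale, and a $0.50 CPC, a click is worth about $0.80 and returns about 60% on its cost. All names and numbers are hypothetical.

```python
def expected_value_per_click(conversion_rate, profit_per_sale):
    """Expected profit from one click: P(purchase | click) * profit per sale."""
    if not 0.0 <= conversion_rate <= 1.0:
        raise ValueError("conversion_rate must be between 0 and 1")
    return conversion_rate * profit_per_sale


def roi(cpc, conversion_rate, profit_per_sale):
    """Return on investment per click: (expected profit - cost) / cost."""
    return (expected_value_per_click(conversion_rate, profit_per_sale) - cpc) / cpc


# Hypothetical keyword: 2% of clicks convert, $40 profit per sale, $0.50 CPC.
print(round(expected_value_per_click(0.02, 40.0), 2))
print(round(roi(0.50, 0.02, 40.0), 2))
```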
Appendix
Word List
Insurance Stereo Therapist acceptance insurance american hi fi acne treatment aetna health insurance american hi fi lyrics addiction counseling aetna insurance amp adolescent counseling affordable health insurance amplifier alternative therapy affordable insurance amplifiers american physical therapy association all state insurance amps anger management american family insurance apples in stereo animal assisted therapy american insurance appliances anxiety annuities audio anxiety therapist annuity audio adrenaline apartment therapy auto audio amplifier aquatic therapy auto insurance audio bible aroma therapy auto insurance companies audio book art therapist auto insurance company audio books art therapy auto insurance quote audio cable asian massage auto insurance quotes audio cables behavior therapy auto owners insurance audio clips behavioral therapy automobile audio codec cancer treatment automobile insurance audio codecs chelation therapy boat insurance audio com chemo therapy business insurance audio control child counseling buy insurance audio converter child counselor california department of insurance audio device child therapist california insurance audio driver child therapy car audio drivers cognitive behavior therapy car ins audio editing cognitive behavioral therapy car insurance audio editor cognitive therapist car insurance quote audio engineering cognitive therapy car insurance quotes audio equipment colon therapy car insurance rates audio express color therapy car quote audio files consumer credit counseling cars audio hijack counseling center cash advance audio interface counseling psychologist cheap audio mixer counseling services cheap auto insurance audio recorder counseling therapy cheap car insurance audio recording counsellors cheap health insurance audio research counselor cheap insurance audio review counselors chubb insurance audio software couple counseling citizens insurance audio speakers couples counselor claims audio systems couples 
therapist cna insurance audio technica couples therapy cobra insurance audio visual cranial sacral therapy
combined insurance audiophile craniosacral therapy commerce insurance auto stereo credit counseling commercial insurance best home theater dance therapy condo insurance big screen decompression therapy country insurance big screen tv depression dental blaqk audio depression counseling dental insurance bluetooth stereo headphones depression counselor department of insurance bluetooth stereo headset depression therapist direct insurance boat stereo depression therapy disability insurance bookshelf stereo dialectical behavior therapy encompass insurance boombox dialectical behavioral therapy erie insurance bose home theater divorce esure buy stereo divorce counseling farm bureau insurance buy stereo system drug therapy finance car electroconvulsive therapy financial advisor car audio enzymatic therapy financial planning car audio systems equine therapy fire insurance car cd players family counselor flood insurance car stereo family therapist florida department of insurance car stereo installation family therapy florida insurance car stereo removal find a psychiatrist foremost insurance car stereo system find a therapist free home insurance quote car stereo systems find therapist gap insurance car stereos gene therapy general insurance cars gestalt therapy general liability insurance cassette deck group therapy grange insurance cassette decks hand therapy group health insurance cassette stereo hormone replacement therapy hanover insurance cassette stereo system hormone therapy hartford insurance cb radios hyperbaric oxygen therapy health cd player individual counseling health care insurance cd players individual therapy health insurance cd stereo infusion therapy health insurance companies cd stereo system inversion therapy health insurance plan compact stereo iv therapy health insurance plans death by stereo laser therapy health insurance quote diamond audio licensed professional counselor health insurance quotes dvd licensed therapist hmo dvd audio life coach home 
dvd player light therapy home insurance dvd players magnet therapy home loans electrical magnetic therapy home owner insurance electronics manual therapy home owners insurance free audio books marital counseling homeowner insurance headphones marriage and family therapist homeowners hi fi marriage and family therapy homeowners insurance hi fi buys marriage counseling homeowner's insurance home audio marriage counselor house insurance home stereo marriage counselors
individual health insurance home stereo system marriage therapist ins home stereos marriage therapy insurace home theater massage insuranc home theater furniture massage chair insurance home theater installation massage envy insurance adjuster home theater magazine massage school insurance agencies home theater master massage table insurance agency home theater pc massage therapist insurance agent home theater projector massage therapist salary insurance agent companies home theater projectors massage therapists insurance agent company home theater receiver massage therapy insurance agents home theater review massage therapy school insurance broker home theater seating massage therapy schools insurance brokers home theater speakers medical psychotherapists insurance claims home theater system mental health insurance co home theater systems mental health counseling insurance com home theaters mental health counselor insurance commissioner home theatre mental health therapist insurance companies jet audio message therapist insurance companies quotes jl audio message therapy insurance company kenwood stereo music therapist insurance coverage laptop music therapy insurance estimate legacy audio narrative therapy insurance fraud m audio new york therapist insurance institute ma audio occupational therapist insurance jobs magnolia home theater occupational therapists insurance leads marine stereo occupational therapy insurance license media centers occupational therapy assistant insurance plans memphis audio occupational therapy association insurance policies mini stereo occupational therapy jobs insurance policy monitor audio online therapy insurance provider mp3 player oxygen therapy insurance providers mp3 players ozone therapy insurance quote multimedia audio controller pediatric physical therapy insurance quotes music pet therapy insurance rate music system phone therapy insurance rates no audio device photodynamic therapy insurance ratings online car stereo 
physical therapist insurances open air stereo physical therapist assistant insurane outdoor stereo physical therapist salary insure pioneer audio physical therapists insure my car pioneer car audio physical therapy insureance pioneer car stereo physical therapy aide insurence pioneer stereo physical therapy assistant insurers plasma physical therapy association investment plasma tv physical therapy equipment investments polk audio physical therapy exercises liability insurance preamp physical therapy jobs life pro audio physical therapy program
life insurance projector physical therapy programs life insurance companies radio physical therapy salary life insurance policy radio stereo physical therapy school life insurance quote radio stereo system physical therapy schools life insurance quotes real audio play therapy loan realtek audio premarital counseling loans realtek hd audio professional therapist long term care receiver proton therapy long term care insurance receivers psychiatrist low cost health insurance rims psychiatrist directory low cost insurance shelf stereo psycho therapist malpractice insurance sigmatel audio psychologist medical insurance soda stereo psychologists meloche sony car stereo psychotherapist md money sound systems psychotherapists mortgage speaker psychotherapy mortgage insurance speakers radiation therapist mortgages stereo radiation therapy motorcycle insurance stereo advantage reality therapy mutual funds stereo amplifier recreation therapy national insurance stereo bluetooth recreational therapy nationwide insurance stereo cabinet relationship counseling new york life insurance stereo cable relationship therapy online insurance stereo dealer relationships oxford health insurance stereo equipment release therapy pet insurance stereo headphones respiratory therapist private mortgage insurance stereo installation respiratory therapists professional liability insurance stereo lyrics respiratory therapy property stereo receiver san francisco therapist property insurance stereo receivers seattle therapist real estate stereo repair seattle therapists rental insurance stereo retailer shock therapy renters insurance stereo review sound therapy renter's insurance stereo shop speech therapist retirement stereo speakers speech therapists retirement planning stereo store speech therapy rv insurance stereo system sports physical therapy short term health insurance stereo system store sports therapy standard insurance stereo systems stem cell therapy student health insurance stereo total 
stress student insurance stereogram teen counseling term insurance stereos testosterone therapy term life insurance stereoscopic thai massage texas department of insurance streaming audio therapist texas insurance subwoofer therapist directory title insurance subwoofers therapist jobs travel surround sound therapists travel insurance surround sound systems therapy travelers insurance television therapy dog
truck insurance the apples in stereo therapy dogs unemployment insurance turntable therapy nyc vehicle insurance turntables trigger point therapy vision insurance universal audio urine therapy whole life insurance usb audio vision therapy work at home wholesale water therapy work from home wireless audio water treatment workers compensation insurance wireless home theater wilderness therapy workmans wireless stereo window treatment zurich insurance yamaha stereo yoga therapy
Position Bins
Bin # Min Position Max Position Avg Position
1 1 3 2
2 4 6 5
3 7 10 8.5
4 11 15 13
5 16 20 18
6 21 30 25.5
7 31 40 35.5
Bid List
$0.10, $0.15, $0.20, $0.30, $0.50, $0.75, $1, $1.25, $1.50, $2, $2.50, $3, $4, $5, $6,
$8, $10, $12.50, $15, $20, $25, $35, $50
Reg. A
regress avgvol words brand
avgvol    Coef.         Std. Err.    P>t
words     -0.0766555    0.0012082    0.000
brand     -0.0212383    0.0028039    0.000
_cons      0.6371395    0.0026828    0.000
Bibliography
Aggarwal, G., Goel, A., Motwani, R. Truthful Auctions for Pricing Search Keywords. theory.stanford.edu. <http://theory.stanford.edu/~gagan/papers/keyword_auctions_EC06.pdf>
Edelman, B., Ostrovsky, M. Strategic Bidder Behavior in Sponsored Search Auctions. Decision Support Systems, 2007. Elsevier. <http://www.benedelman.org/publications/cycling-060703.pdf>
Edelman, B., Ostrovsky, M., Schwarz, M. Internet Advertising and the Generalized Second-Price Auction, 2005. atypon-link.com. <http://faculty-gsb.stanford.edu/ostrovsky/papers/gsp.pdf>
Fain, D. C., Pedersen, J. O. Sponsored Search: A Brief History. <http://www.business.ualberta.ca/kasdemir/ssa2/fain_pedersen.PDF>
Feng, J., Bhargava, H. K., Pennock, D. M. Implementing Sponsored Search in Web Search Engines: Computational Evaluation of Alternative Mechanisms. INFORMS Journal on Computing, 2006. bear.cba.ufl.edu. <http://bear.cba.ufl.edu/feng/JOC.pdf>
Lahaie, S. An Analysis of Alternative Slot Auction Designs for Sponsored Search. eecs.harvard.edu. <http://www.eecs.harvard.edu/~slahaie/pubs/fp185-lahaie.pdf>
Zhou, Y., Lukose, R. Vindictive Bidding in Keyword Auctions. cse.wustl.edu. <http://www.cse.wustl.edu/~yzhou/yunhongzhou/documents/06-ssa-vindictive.pdf>
AdWords Help Center. Can I see what my competitors are bidding? google.com. <https://adwords.google.com/support/bin/answer.py?answer=12395&topic=10264>
AdWords Help Center. How are ads ranked? google.com. <https://adwords.google.com/support/bin/answer.py?answer=6111&hl=en_US&ctx=SetPricing>
AdWords Help Center. How does Google detect invalid clicks? google.com. <http://adwords.google.com/support/bin/answer.py?hl=en&answer=6114>
Google AdWords. Google AdWords: Keyword Tool. <https://adwords.google.com/select/KeywordToolExternal>
Google AdWords. Google AdWords: Traffic Estimator. <https://adwords.google.com/select/TrafficEstimatorSandbox>